๐ŸŒฟ Overview

BIOTICA is an open-source, multi-dimensional framework for the integrated assessment, predictive modeling, and cosmological contextualization of ecosystem resilience. The system integrates nine analytical parameters into a single operational composite โ€” the Integrated Biotic Resilience (IBR) index โ€” validated across 3,412 ecosystem plots from 22 biome types spanning 6 continents.

โœ… Production Ready

BIOTICA has been validated across 3,412 plots achieving 92.6% classification accuracy and 89.4% agreement with expert field surveys. The framework is actively used for REDD+ carbon accounting audits and ecosystem tipping-point early warning.

Key Capabilities

  • High Accuracy: 92.6% IBR classification across 22-biome leave-one-biome cross-validation
  • AI-Assisted Classification: 89.4% agreement with expert field surveys on 682 held-out plots
  • Early Warning: 8โ€“14 month tipping point lead time via critical slowing-down detection
  • Carbon Accounting: ยฑ31 Mg Cยทhaโปยน precision vs. ยฑ180 Mg Cยทhaโปยน for allometric estimates
  • Phenological Precision: ยฑ6.2 days across 180 eddy covariance flux tower sites
  • Legacy Audit: 14.7% error rate flagged across 2,100 REDD+ database units

System Statistics

IBR Classification Accuracy

92.6%

22-biome leave-one-biome CV

AI Agreement

89.4%

vs. expert field surveys

Ecosystem Plots

3,412

Across 22 biomes, 6 continents

Tipping Point Warning

8โ€“14 mo

Before observed state transitions

Quick Navigation

๐Ÿ’ป Installation

System Requirements

  • Python: 3.11 or higher
  • R: 4.3 or higher (for Bayesian weight estimation and tipping-point modules)
  • RAM: 8 GB minimum (16 GB recommended for MI-CNN classifier)
  • Storage: ~5 GB for reference database and biome thresholds
  • CUDA: 11.8+ optional (GPU acceleration for AI classifier)
  • GDAL: 3.6+ (for geospatial raster processing)

Install from GitLab

# Clone the repository git clone https://gitlab.com/gitdeeper07/biotica.git cd biotica # Create conda environment (recommended) conda env create -f environment.yml conda activate biotica # Install Python package in editable mode pip install -e . # Install with all optional extras pip install -e ".[dev,docs,ai,r-bridge]" # Install R dependencies Rscript -e "install.packages(c('brms','igraph','adegenet','poppr','earlywarnings','vegan'))" # Run tests to verify installation pytest tests/unit/ -v

Install from PyPI

pip install biotica

Pull Reference Data (DVC)

# Pull reference data only (~2 GB) dvc remote add -d zenodo https://zenodo.org/record/biotica2026 dvc pull data/reference/ # Pull complete dataset (~48 GB) dvc pull
โš ๏ธ Note on full dataset

The complete 48 GB dataset requires institutional storage credentials for metagenomics (EBI MGnify) and genomics (NCBI SRA) archives. The 2 GB reference package is fully open and sufficient for most use cases.

๐Ÿš€ Quick Start

1

Compute a Single Parameter

2

Assemble the IBR Index

3

Classify and Generate Report

1 ยท Compute the MDI for a Plot

from biotica.parameters import MDI # Instantiate with path to shotgun metagenome mdi = MDI(metagenome_path="data/processed/metagenomes/plot_0042.tsv") score = mdi.compute() # โ†’ float in [0, 1] uncertainty = mdi.uncertainty() # โ†’ ยฑ value report = mdi.report() # โ†’ diagnostic dict print(f"MDI = {score:.3f} ยฑ {uncertainty:.3f}") # MDI = 0.847 ยฑ 0.031

2 ยท Compute the IBR Composite Index

from biotica.ibr import IBRComposite ibr = IBRComposite(plot_id="amazon_plot_0042") ibr.load_parameters({ "VCA": 0.831, "MDI": 0.847, "PTS": 0.911, "HFI": 0.789, "BNC": 0.802, "SGH": 0.714, "AES": 0.923, "TMI": 0.823, "RRC": 0.761, }) result = ibr.compute() print(result.score) # โ†’ 0.834 print(result.classification) # โ†’ "FUNCTIONAL" print(result.report()) # โ†’ full diagnostic dict

3 ยท Run the Full Pipeline

# Full pipeline: raw data โ†’ IBR scores for all plots python scripts/compute_ibr.py \ --input data/raw/ \ --output data/processed/ibr_scores/ \ --biome-ref data/reference/biome_thresholds.csv \ --cores 16

4 ยท Tipping Point Detection

from biotica.statistics import TippingPointDetector import pandas as pd ibr_ts = pd.read_csv("data/processed/ibr_timeseries/amazon_plot_0042.csv") detector = TippingPointDetector(window=24, lag=1) signals = detector.analyze(ibr_ts) if signals.critical_slowing_down: print(f"โš ๏ธ Warning: collapse risk in ~{signals.estimated_months} months")

๐Ÿ”ฌ The Nine IBR Parameters

Each parameter captures a distinct and statistically independent dimension of ecosystem identity and resilience. Weights were determined through a three-stage Bayesian principal component analysis across all 3,412 validation plots.

#SymbolParameterWeightDomainKey Instrument
1VCAVegetative Carbon Absorption20%Remote Sensing ยท CarbonDESIS/PRISMA hyperspectral
2MDIMicrobial Diversity Index15%Soil MetagenomicsIllumina NovaSeq shotgun
3PTSPhenological Time Shift12%Climate EcologyPhenoCam ยท Landsat archive
4HFIHydrological Flux Index11%EcohydrologyEddy covariance ยท MODIS ET
5BNCBiogeochemical Nutrient Cycle10%Soil ScienceICP-MS ยท CNS elemental analysis
6SGHSpecies Genetic Heterogeneity9%Population GenomicsRADseq ยท whole-genome reseq
7AESAnthropogenic Encroachment Score8%Land Use ScienceESA World Cover ยท FRAGSTATS
8TMITrophic Metadata Integration8%Food Web EcologyMetabarcoding ยท camera traps
9RRCRegenerative Recovery Capacity7%Disturbance EcologyChronosequence sampling

Composite Formula

// Integrated Biotic Resilience โ€” Composite Formula IBR = 0.20 ยท VCA* // Vegetative Carbon Absorption + 0.15 ยท MDI* // Microbial Diversity Index + 0.12 ยท PTS* // Phenological Time Shift + 0.11 ยท HFI* // Hydrological Flux Index + 0.10 ยท BNC* // Biogeochemical Nutrient Cycle + 0.09 ยท SGH* // Species Genetic Heterogeneity + 0.08 ยท AES* // Anthropogenic Encroachment Score + 0.08 ยท TMI* // Trophic Metadata Integration + 0.07 ยท RRC* // Regenerative Recovery Capacity // Sigmoid correction for non-linear cross-parameter interactions: // IBR_corrected = ฯƒ(ฮฃ wแตข ยท xแตข + ฮฒ) where ฯƒ(z) = 1 / (1 + eโปแถป) // Each Pแตข* normalized to [0,1] relative to biome-type reference thresholds
๐Ÿ“˜ Normalization

All parameters are normalized to [0,1] relative to biome-specific reference distributions, not global minima/maxima. This ensures that a tropical rainforest and an arctic tundra are evaluated against their own reference states, not against each other. See biotica/ibr/normalization.py and data/reference/biome_thresholds.csv.

๐Ÿ“Š IBR Classification Levels

The IBR score is mapped to five operational classification levels that guide conservation prioritization, restoration planning, and carbon accounting decisions.

๐ŸŸข PRISTINE
> 0.88
๐ŸŸก FUNCTIONAL
0.75 โ€“ 0.88
๐ŸŸ  IMPAIRED
0.60 โ€“ 0.75
๐Ÿ”ด DEGRADED
0.45 โ€“ 0.60
โšซ COLLAPSED
< 0.45
ClassIBR RangeEcological StateRecommended Action
PRISTINE> 0.88Reference state, full ecological function, maximum carbon stockPassive protection, long-term monitoring
FUNCTIONAL0.75 โ€“ 0.88Near-reference, minor departures, self-regulating resilience intactStandard monitoring, adaptive management
IMPAIRED0.60 โ€“ 0.75Measurable degradation, recovery feasible under active managementMulti-parameter restoration intervention
DEGRADED0.45 โ€“ 0.60Significant loss, high tipping point riskImmediate intensive intervention
COLLAPSED< 0.45Alternative stable state, standard recovery trajectories not applicableFull consortium characterization

๐Ÿ“ก API Reference

biotica.parameters โ€” Individual Parameter Modules

All nine parameter classes share a common interface:

from biotica.parameters import VCA, MDI, PTS, HFI, BNC, SGH, AES, TMI, RRC # Unified interface โ€” same for all 9 parameters p = MDI(metagenome_path="...") # instantiate p.compute() # โ†’ float [0,1] p.uncertainty() # โ†’ float (ยฑ value) p.report() # โ†’ dict (full diagnostics) p.plot() # โ†’ matplotlib Figure

biotica.ibr โ€” Composite Index Engine

from biotica.ibr import IBRComposite, IBRNormalizer, IBRThresholds # IBRComposite ibr = IBRComposite(plot_id="plot_001", biome="tropical_moist_forest") ibr.load_parameters(param_dict) # dict of 9 raw scores result = ibr.compute() # โ†’ IBRResult namedtuple result.score # float result.classification # str: PRISTINE / FUNCTIONAL / โ€ฆ result.confidence # float [0,1] result.report() # dict # IBRNormalizer โ€” per-biome normalization norm = IBRNormalizer(biome="tropical_moist_forest") norm.normalize(raw_score, parameter="MDI") # โ†’ float [0,1] # IBRThresholds โ€” classification threshold registry thresh = IBRThresholds() thresh.classify(0.834) # โ†’ "FUNCTIONAL" thresh.get_all() # โ†’ dict of all thresholds

biotica.ai โ€” MI-CNN Classifier

from biotica.ai import MICNNClassifier # Load pre-trained model model = MICNNClassifier.from_pretrained("models/mi_cnn_v1/") # Predict biome classification and IBR estimate prediction = model.predict( spectral="path/to/hyperspectral.npy", climate="path/to/worldclim.csv", terrain="path/to/terrain.csv", ) prediction.biome # โ†’ str prediction.ibr_estimate # โ†’ float prediction.confidence # โ†’ float # Grad-CAM interpretability cam = model.gradcam(prediction) cam.plot() # โ†’ Figure showing attention map

biotica.statistics โ€” Statistical Framework

from biotica.statistics import ( TippingPointDetector, CrossValidator, SensitivityAnalyzer, UncertaintyPropagator ) # Tipping point โ€” critical slowing-down detection detector = TippingPointDetector(window=24, lag=1) signals = detector.analyze(ibr_timeseries_df) signals.critical_slowing_down # bool signals.estimated_months # int: estimated lead time signals.ar1_trend # float: AR(1) coefficient trend signals.variance_trend # float: variance trend # Cross-validation cv = CrossValidator(n_folds="leave_one_biome") cv.run(plots_df, ibr_fn) # โ†’ CrossValidationResult

โš™๏ธ Snakemake Workflows

All analyses are reproducible via Snakemake. The master pipeline automatically determines which rules to run based on available inputs.

Run Full Validation Pipeline

# Full pipeline โ€” requires complete dataset (~72h on 32-core HPC) snakemake --cores 32 --use-conda all # Reproduce publication figures only snakemake --cores 8 figures # Single case study snakemake --cores 4 results/case_studies/amazon/ # Dry run โ€” preview without executing snakemake --cores 32 --use-conda --dry-run all

Available Rules

Rule FileDescriptionInputsOutputs
preprocessing.smkSpectral, flux, metagenomic preprocessingdata/raw/data/processed/
parameter_computation.smkCompute all 9 parameter scores per plotdata/processed/data/processed/parameters/
ibr_aggregation.smkNormalize and aggregate IBR indexdata/processed/parameters/data/processed/ibr_scores/
ai_classification.smkTrain and evaluate MI-CNN classifierdata/processed/models/ + results/
validation.smkCross-validation, sensitivity, uncertaintydata/processed/ibr_scores/results/validation/
๐Ÿ“˜ HPC Submission

Use scripts/batch_process.sh for SLURM cluster submission. Configure cores and memory in workflows/config/cluster.yaml. The full pipeline requires approximately 72 CPU-hours on a 32-core node.

๐Ÿ—„๏ธ Data & Formats

Supported Input Formats

ParameterFormatSourceTypical Size
VCA (spectral)ENVI BSQ / GeoTIFF NetCDFDESIS ยท PRISMA ยท Sentinel-20.5โ€“4 GB per scene
MDI (metagenomics)TSV gene abundance tableMGnify ยท EBI10โ€“200 MB per sample
PTS (phenocam)CSV time-series GCCPhenoCam Network~1 MB per site/year
HFI (flux tower)NetCDF4 (FLUXNET 2015)FLUXNET ยท ICOS5โ€“50 MB per site
BNC (soil chemistry)CSV elemental analysisLab measurements<1 MB per plot
SGH (genomics)VCF 4.2 (bgzipped + tabix)NCBI SRA1โ€“50 GB per population
AES (landscape)GeoTIFF raster + SHPESA World Cover ยท GFW50โ€“500 MB per region
TMI (food web)CSV adjacency + metabarcodingField surveys + eDNA<5 MB per plot
RRC (chronosequence)CSV time-series biomassField surveys<1 MB per site

Output Formats

# IBR score output (CSV) plot_id, biome, VCA, MDI, PTS, HFI, BNC, SGH, AES, TMI, RRC, IBR, classification, confidence # GeoJSON output (for spatial analysis) { "type": "Feature", "properties": { "plot_id": "amazon_0042", "IBR": 0.834, "classification": "FUNCTIONAL", "tipping_point_risk": false } }

๐Ÿค– MI-CNN AI Classifier

The Multi-Input Convolutional Neural Network (MI-CNN) processes four parallel data streams and achieves 89.4% agreement with expert field surveys on 682 held-out plots.

Architecture Overview

StreamInputArchitectureOutput
SpectralHyperspectral cube (426 bands)1D-CNN ร— 3 layers128-dim embedding
TemporalVI time-series (24 months)1D-CNN ร— 2 layers64-dim embedding
Climate19 WorldClim bioclimatic varsDense ร— 3 layers32-dim embedding
Terrain8 morphometric derivativesDense ร— 2 layers16-dim embedding

The four streams are concatenated into a 240-dimensional feature vector and passed through a classification head (3 dense layers โ†’ 22-class biome softmax + IBR regression head).

Training and Evaluation

# Train MI-CNN from scratch python scripts/train_classifier.py \ --data data/processed/ \ --model-dir models/mi_cnn_v1/ \ --epochs 150 \ --batch-size 64 \ --lr 0.0005 # Evaluate on held-out validation set python scripts/evaluate_classifier.py \ --model models/mi_cnn_v1/ \ --test-set data/processed/validation/
โœ… Validated Performance

MI-CNN v1.0 achieves 89.4% biome classification accuracy and 0.91 Pearson r for IBR regression on the 682-plot held-out test set. Grad-CAM analysis confirms that spectral features in the red-edge (700โ€“730 nm) and SWIR (1550โ€“1750 nm) ranges contribute most to classification decisions.

โœ… Validation & Reproducibility

Cross-Validation Protocol

BIOTICA uses a leave-one-biome-out cross-validation design: the model is trained on 21 biome types and evaluated on the held-out 22nd. This prevents any within-biome data leakage and tests generalization across entirely unseen ecosystem types.

Leave-One-Biome CV

92.6%

Mean accuracy across 22 folds

Worst-Case Biome

87.1%

Tropical dry forest (n=98 plots)

Best-Case Biome

97.3%

Boreal forest (n=312 plots)

Reproducibility Hash

sha256:b4f2

Ubuntu 22.04 ยท macOS 14.2

Reproducing All Results

# Reproduce all paper results (requires full dataset) snakemake --cores 32 --use-conda all # Reproduce specific table from paper python scripts/generate_figures.py --table 2 # IBR comparison table python scripts/generate_figures.py --fig 3 # MDI-carbon correlation

๐Ÿ• Changelog

v1.0.0
Mar 2026

Initial Release

Full nine-parameter IBR framework, MI-CNN v1.0 classifier, Snakemake pipelines, validated across 3,412 plots from 22 biomes. Paper submitted to Nature Sustainability.

v0.9.0
Jan 2026

Beta Release

Complete parameter suite, Bayesian weight determination, tipping-point detection module. Internal validation across Amazon and Australian datasets.

v0.5.0
Sep 2025

Alpha โ€” Core Framework

VCA, MDI, and PTS modules functional. Proof-of-concept IBR composite. Initial 800-plot dataset assembled from Amazon and Serengeti sites.

๐Ÿ“„ Publications

๐Ÿ“˜ Primary Reference

If you use BIOTICA in your research, please cite the primary paper using the BibTeX entry below.

@article{baladi2026biotica, title = {{BIOTICA}: A Multi-Dimensional Bio-Geochemical Framework for the Systematic Assessment, Predictive Modeling, and Cosmological Contextualization of Ecosystem Resilience}, author = {Baladi, Samir}, journal = {Nature Sustainability}, year = {2026}, doi = {10.14293/BIOTICA.2026.001}, note = {Submitted March 2026} }

๐Ÿ™ Acknowledgments

The BIOTICA framework builds upon the foundational work of the global ecosystem science community. Special thanks to:

  • The FLUXNET and ICOS communities for making eddy covariance data openly accessible
  • The MGnify / EBI team for metagenomics data infrastructure
  • The PhenoCam Network for long-term phenological time series
  • The Global Biodiversity Information Facility (GBIF) for occurrence data
  • Global Forest Watch for forest cover change and fragmentation layers
  • The Arrernte people of the Northern Territory and Aboriginal rangers of SE Australia for sharing traditional ecological knowledge used in TEK validation protocols
  • The Ronin Institute for supporting independent scholarship
โ†‘