🌿 BIOTICA Documentation — Ecosystem Resilience Assessment Framework

🌿 Overview

BIOTICA is an open-source, multi-dimensional framework for the integrated assessment, predictive modeling, and cosmological contextualization of ecosystem resilience. The system integrates nine analytical parameters into a single operational composite — the Integrated Biotic Resilience (IBR) index — validated across 3,412 ecosystem plots from 22 biome types spanning 6 continents.

✅ Production Ready

BIOTICA has been validated across 3,412 plots, achieving 92.6% IBR classification accuracy under leave-one-biome-out cross-validation. Its MI-CNN classifier reaches 89.4% agreement with expert field surveys on 682 held-out plots. The framework is actively used for REDD+ carbon accounting audits and ecosystem tipping-point early warning.

Key Capabilities

High Accuracy: 92.6% IBR classification accuracy under 22-fold leave-one-biome-out cross-validation
AI-Assisted Classification: 89.4% agreement with expert field surveys on 682 held-out plots
Early Warning: 8–14 month tipping point lead time via critical slowing-down detection
Carbon Accounting: ±31 Mg C·ha⁻¹ precision vs. ±180 Mg C·ha⁻¹ for allometric estimates
Phenological Precision: ±6.2 days across 180 eddy covariance flux tower sites
Legacy Audit: 14.7% error rate flagged across 2,100 REDD+ database units

System Statistics

Metric	Value	Context
IBR Classification Accuracy	92.6%	22-biome leave-one-biome-out CV
AI Agreement	89.4%	vs. expert field surveys
Ecosystem Plots	3,412	Across 22 biomes, 6 continents
Tipping Point Warning	8–14 mo	Before observed state transitions

Quick Navigation

💻 Installation: Setup in 5 minutes
🚀 Quick Start: Your first IBR computation
📡 API Reference: Full module documentation
⚙️ Workflows: Snakemake pipelines

💻 Installation

System Requirements

Python: 3.11 or higher
R: 4.3 or higher (for Bayesian weight estimation and tipping-point modules)
RAM: 8 GB minimum (16 GB recommended for MI-CNN classifier)
Storage: ~5 GB for reference database and biome thresholds
CUDA: 11.8+ optional (GPU acceleration for AI classifier)
GDAL: 3.6+ (for geospatial raster processing)

Install from GitLab

# Clone the repository
git clone https://gitlab.com/gitdeeper07/biotica.git
cd biotica

# Create conda environment (recommended)
conda env create -f environment.yml
conda activate biotica

# Install Python package in editable mode
pip install -e .

# Install with all optional extras
pip install -e ".[dev,docs,ai,r-bridge]"

# Install R dependencies
Rscript -e "install.packages(c('brms','igraph','adegenet','poppr','earlywarnings','vegan'))"

# Run tests to verify installation
pytest tests/unit/ -v

Install from PyPI

pip install biotica

Pull Reference Data (DVC)

# Pull reference data only (~2 GB)
dvc remote add -d zenodo https://zenodo.org/record/biotica2026
dvc pull data/reference/

# Pull complete dataset (~48 GB)
dvc pull

⚠️ Note on full dataset

The complete 48 GB dataset requires institutional storage credentials for metagenomics (EBI MGnify) and genomics (NCBI SRA) archives. The 2 GB reference package is fully open and sufficient for most use cases.

🚀 Quick Start


1 · Compute the MDI for a Plot

from biotica.parameters import MDI

# Instantiate with path to shotgun metagenome
mdi = MDI(metagenome_path="data/processed/metagenomes/plot_0042.tsv")

score       = mdi.compute()       # → float in [0, 1]
uncertainty = mdi.uncertainty()   # → ± value
report      = mdi.report()        # → diagnostic dict

print(f"MDI = {score:.3f} ± {uncertainty:.3f}")  # MDI = 0.847 ± 0.031

2 · Compute the IBR Composite Index

from biotica.ibr import IBRComposite

ibr = IBRComposite(plot_id="amazon_plot_0042")
ibr.load_parameters({
    "VCA": 0.831, "MDI": 0.847, "PTS": 0.911,
    "HFI": 0.789, "BNC": 0.802, "SGH": 0.714,
    "AES": 0.923, "TMI": 0.823, "RRC": 0.761,
})

result = ibr.compute()
print(result.score)           # → 0.834
print(result.classification)  # → "FUNCTIONAL"
print(result.report())        # → full diagnostic dict

3 · Run the Full Pipeline

# Full pipeline: raw data → IBR scores for all plots
python scripts/compute_ibr.py \
  --input  data/raw/ \
  --output data/processed/ibr_scores/ \
  --biome-ref data/reference/biome_thresholds.csv \
  --cores 16

4 · Tipping Point Detection

from biotica.statistics import TippingPointDetector
import pandas as pd

ibr_ts = pd.read_csv("data/processed/ibr_timeseries/amazon_plot_0042.csv")

detector = TippingPointDetector(window=24, lag=1)
signals  = detector.analyze(ibr_ts)

if signals.critical_slowing_down:
    print(f"⚠️ Warning: collapse risk in ~{signals.estimated_months} months")
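
Critical slowing-down is commonly read from rising lag-1 autocorrelation and rising variance in rolling windows of a time series. The sketch below illustrates that general idea in plain Python; it is not the `TippingPointDetector` implementation. The helper names, the synthetic series, the window of 60 samples, and the use of Kendall's tau as the trend statistic are all assumptions made for illustration.

```python
import random

def lag1_autocorr(values):
    """Lag-1 autocorrelation of one window (0.0 for a constant window)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(centered, centered[1:])) / denom

def variance(values):
    """Unbiased sample variance of one window."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def rolling(series, window, fn):
    """Apply fn to every full window of the series, in time order."""
    return [fn(series[i:i + window]) for i in range(len(series) - window + 1)]

def kendall_tau(values):
    """Kendall tau-a of the values against their time order (ties count as 0)."""
    n = len(values)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            score += (diff > 0) - (diff < 0)
    return score / (n * (n - 1) / 2)

def synthetic_series(n=300, seed=0):
    """Toy AR(1) series whose memory increases over time (not real IBR data)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n):
        phi = 0.1 + 0.85 * t / (n - 1)  # AR(1) coefficient ramps from 0.10 to 0.95
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

series = synthetic_series()
ar1_trend = kendall_tau(rolling(series, 60, lag1_autocorr))
variance_trend = kendall_tau(rolling(series, 60, variance))
print(f"AR(1) trend tau = {ar1_trend:.2f}, variance trend tau = {variance_trend:.2f}")
```

Positive trends in both indicators are the pattern usually interpreted as an early-warning signal. How such trends would be converted into a lead time in months is not covered by this sketch.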

🔬 The Nine IBR Parameters

Each parameter captures a distinct and statistically independent dimension of ecosystem identity and resilience. Weights were determined through a three-stage Bayesian principal component analysis across all 3,412 validation plots.

#	Symbol	Parameter	Weight	Domain	Key Instrument
1	VCA	Vegetative Carbon Absorption	20%	Remote Sensing · Carbon	DESIS/PRISMA hyperspectral
2	MDI	Microbial Diversity Index	15%	Soil Metagenomics	Illumina NovaSeq shotgun
3	PTS	Phenological Time Shift	12%	Climate Ecology	PhenoCam · Landsat archive
4	HFI	Hydrological Flux Index	11%	Ecohydrology	Eddy covariance · MODIS ET
5	BNC	Biogeochemical Nutrient Cycle	10%	Soil Science	ICP-MS · CNS elemental analysis
6	SGH	Species Genetic Heterogeneity	9%	Population Genomics	RADseq · whole-genome reseq
7	AES	Anthropogenic Encroachment Score	8%	Land Use Science	ESA World Cover · FRAGSTATS
8	TMI	Trophic Metadata Integration	8%	Food Web Ecology	Metabarcoding · camera traps
9	RRC	Regenerative Recovery Capacity	7%	Disturbance Ecology	Chronosequence sampling

Composite Formula

// Integrated Biotic Resilience — Composite Formula

IBR =
  0.20 · VCA*  // Vegetative Carbon Absorption
+ 0.15 · MDI*  // Microbial Diversity Index
+ 0.12 · PTS*  // Phenological Time Shift
+ 0.11 · HFI*  // Hydrological Flux Index
+ 0.10 · BNC*  // Biogeochemical Nutrient Cycle
+ 0.09 · SGH*  // Species Genetic Heterogeneity
+ 0.08 · AES*  // Anthropogenic Encroachment Score
+ 0.08 · TMI*  // Trophic Metadata Integration
+ 0.07 · RRC*  // Regenerative Recovery Capacity

// Sigmoid correction for non-linear cross-parameter interactions:
// IBR_corrected = σ(Σ wᵢ · Pᵢ* + β)  where σ(z) = 1 / (1 + e⁻ᶻ)
// wᵢ is the weight of parameter i; each Pᵢ* (the starred terms above) is normalized to [0,1] relative to biome-type reference thresholds
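
The weighted sum can be checked by hand with a few lines of Python. This is a minimal sketch, not the BIOTICA implementation: `WEIGHTS` and `weighted_ibr` are illustrative names, the inputs are assumed to be already normalized to [0, 1], and the sigmoid correction is left out because β is not given in this document.

```python
# Weights copied from the composite formula; the names here are illustrative.
WEIGHTS = {
    "VCA": 0.20, "MDI": 0.15, "PTS": 0.12,
    "HFI": 0.11, "BNC": 0.10, "SGH": 0.09,
    "AES": 0.08, "TMI": 0.08, "RRC": 0.07,
}

def weighted_ibr(params):
    """Return the uncorrected weighted sum of the nine normalized parameters."""
    missing = set(WEIGHTS) - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= params[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(WEIGHTS[name] * params[name] for name in WEIGHTS)

# Parameter values taken from the Quick Start example.
example = {
    "VCA": 0.831, "MDI": 0.847, "PTS": 0.911,
    "HFI": 0.789, "BNC": 0.802, "SGH": 0.714,
    "AES": 0.923, "TMI": 0.823, "RRC": 0.761,
}

ibr_uncorrected = weighted_ibr(example)
print(round(ibr_uncorrected, 3))  # 0.827
```

The uncorrected sum need not equal the score printed in the Quick Start, which comes from `IBRComposite.compute()` and may include the sigmoid correction.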

📘 Normalization

All parameters are normalized to [0,1] relative to biome-specific reference distributions, not global minima/maxima. This ensures that a tropical rainforest and an arctic tundra are evaluated against their own reference states, not against each other. See biotica/ibr/normalization.py and data/reference/biome_thresholds.csv.
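
One way to picture biome-relative normalization is a clipped linear rescaling between a biome's lower and upper reference values. The sketch below assumes that form; the actual scheme in biotica/ibr/normalization.py may differ. The `REFERENCE` table, its numbers, and `normalize_to_biome` are hypothetical and exist only to show why the same raw value can score differently in two biomes.

```python
# Hypothetical (lower, upper) reference bounds for one raw parameter, per biome.
# These numbers are placeholders for illustration, not BIOTICA thresholds.
REFERENCE = {
    "tropical_moist_forest": (40.0, 200.0),
    "arctic_tundra": (2.0, 20.0),
}

def normalize_to_biome(raw, biome):
    """Rescale a raw value to [0, 1] against its own biome's reference range."""
    lower, upper = REFERENCE[biome]
    if upper <= lower:
        raise ValueError("upper reference bound must exceed the lower bound")
    scaled = (raw - lower) / (upper - lower)
    return min(1.0, max(0.0, scaled))  # clip values outside the reference range

# The same raw value sits near the top of the tundra range
# but below the bottom of the rainforest range.
tundra_score = normalize_to_biome(18.0, "arctic_tundra")
forest_score = normalize_to_biome(18.0, "tropical_moist_forest")
print(round(tundra_score, 3), forest_score)
```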

📊 IBR Classification Levels

The IBR score is mapped to five operational classification levels that guide conservation prioritization, restoration planning, and carbon accounting decisions.

🟢 PRISTINE: > 0.88
🟡 FUNCTIONAL: 0.75 – 0.88
🟠 IMPAIRED: 0.60 – 0.75
🔴 DEGRADED: 0.45 – 0.60
⚫ COLLAPSED: < 0.45

Class	IBR Range	Ecological State	Recommended Action
PRISTINE	> 0.88	Reference state, full ecological function, maximum carbon stock	Passive protection, long-term monitoring
FUNCTIONAL	0.75 – 0.88	Near-reference, minor departures, self-regulating resilience intact	Standard monitoring, adaptive management
IMPAIRED	0.60 – 0.75	Measurable degradation, recovery feasible under active management	Multi-parameter restoration intervention
DEGRADED	0.45 – 0.60	Significant loss, high tipping point risk	Immediate intensive intervention
COLLAPSED	< 0.45	Alternative stable state, standard recovery trajectories not applicable	Full consortium characterization
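
The table maps directly onto a chain of threshold checks. The sketch below is illustrative and is not the `IBRThresholds` implementation. The table does not say which class a score exactly on a shared boundary (0.75 or 0.60) belongs to, so this version assumes lower bounds are inclusive. That assumption is consistent with PRISTINE being strictly above 0.88 and COLLAPSED strictly below 0.45.

```python
def classify_ibr(score):
    """Map an IBR score in [0, 1] to one of the five classification levels.

    Assumption: lower bounds are inclusive, so 0.75 is FUNCTIONAL,
    0.60 is IMPAIRED, and 0.45 is DEGRADED.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("IBR score must lie in [0, 1]")
    if score > 0.88:
        return "PRISTINE"
    if score >= 0.75:
        return "FUNCTIONAL"
    if score >= 0.60:
        return "IMPAIRED"
    if score >= 0.45:
        return "DEGRADED"
    return "COLLAPSED"

for value in (0.92, 0.834, 0.60, 0.45, 0.30):
    print(value, classify_ibr(value))
```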

📡 API Reference

biotica.parameters — Individual Parameter Modules

All nine parameter classes share a common interface:

from biotica.parameters import VCA, MDI, PTS, HFI, BNC, SGH, AES, TMI, RRC

# Unified interface — same for all 9 parameters
p = MDI(metagenome_path="...")   # instantiate
p.compute()                          # → float [0,1]
p.uncertainty()                      # → float (± value)
p.report()                           # → dict (full diagnostics)
p.plot()                             # → matplotlib Figure
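
A shared interface like this is often expressed as an abstract base class. The sketch below shows that pattern under stated assumptions: `ParameterBase` and `ConstantParameter` are hypothetical names and are not part of the documented `biotica.parameters` API. `plot()` is left out to keep the example free of plotting dependencies.

```python
from abc import ABC, abstractmethod

class ParameterBase(ABC):
    """Hypothetical base class mirroring the compute/uncertainty/report trio."""

    @abstractmethod
    def compute(self) -> float:
        """Return the normalized score in [0, 1]."""

    @abstractmethod
    def uncertainty(self) -> float:
        """Return the +/- uncertainty attached to the score."""

    def report(self) -> dict:
        """Assemble a small diagnostic dict from the two core methods."""
        return {
            "parameter": type(self).__name__,
            "score": self.compute(),
            "uncertainty": self.uncertainty(),
        }

class ConstantParameter(ParameterBase):
    """Toy parameter returning fixed values; stands in for a real module."""

    def __init__(self, score, uncertainty):
        self._score = score
        self._uncertainty = uncertainty

    def compute(self):
        return self._score

    def uncertainty(self):
        return self._uncertainty

# Values borrowed from the Quick Start MDI example.
toy = ConstantParameter(0.847, 0.031)
print(toy.report())
```

Because every class exposes the same methods, downstream code can loop over the nine parameters without special-casing any of them.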

biotica.ibr — Composite Index Engine

from biotica.ibr import IBRComposite, IBRNormalizer, IBRThresholds

# IBRComposite
ibr = IBRComposite(plot_id="plot_001", biome="tropical_moist_forest")
ibr.load_parameters(param_dict)      # dict of 9 raw scores
result = ibr.compute()               # → IBRResult namedtuple
result.score                         # float
result.classification                # str: PRISTINE / FUNCTIONAL / …
result.confidence                    # float [0,1]
result.report()                      # dict

# IBRNormalizer — per-biome normalization
norm = IBRNormalizer(biome="tropical_moist_forest")
norm.normalize(raw_score, parameter="MDI")  # → float [0,1]

# IBRThresholds — classification threshold registry
thresh = IBRThresholds()
thresh.classify(0.834)               # → "FUNCTIONAL"
thresh.get_all()                     # → dict of all thresholds

biotica.ai — MI-CNN Classifier

from biotica.ai import MICNNClassifier

# Load pre-trained model
model = MICNNClassifier.from_pretrained("models/mi_cnn_v1/")

# Predict biome classification and IBR estimate
prediction = model.predict(
    spectral="path/to/hyperspectral.npy",
    climate="path/to/worldclim.csv",
    terrain="path/to/terrain.csv",
)
prediction.biome         # → str
prediction.ibr_estimate  # → float
prediction.confidence    # → float

# Grad-CAM interpretability
cam = model.gradcam(prediction)
cam.plot()               # → Figure showing attention map

biotica.statistics — Statistical Framework

from biotica.statistics import (
    TippingPointDetector,
    CrossValidator,
    SensitivityAnalyzer,
    UncertaintyPropagator
)

# Tipping point — critical slowing-down detection
detector = TippingPointDetector(window=24, lag=1)
signals  = detector.analyze(ibr_timeseries_df)
signals.critical_slowing_down   # bool
signals.estimated_months        # int: estimated lead time
signals.ar1_trend               # float: AR(1) coefficient trend
signals.variance_trend          # float: variance trend

# Cross-validation
cv = CrossValidator(n_folds="leave_one_biome")
cv.run(plots_df, ibr_fn)        # → CrossValidationResult
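
`UncertaintyPropagator` is imported above but not demonstrated here. As a rough guide to what propagation through the weighted sum involves, the sketch below applies first-order propagation for independent errors, σ_IBR = sqrt(Σ wᵢ² σᵢ²). This is an assumption for illustration and may not match what `UncertaintyPropagator` does; it also ignores the sigmoid correction. The per-parameter uncertainty of 0.03 is a placeholder.

```python
import math

# Weights from the composite formula.
WEIGHTS = {
    "VCA": 0.20, "MDI": 0.15, "PTS": 0.12,
    "HFI": 0.11, "BNC": 0.10, "SGH": 0.09,
    "AES": 0.08, "TMI": 0.08, "RRC": 0.07,
}

def propagate_independent(sigmas):
    """Combine per-parameter uncertainties, assuming they are independent."""
    return math.sqrt(sum((WEIGHTS[name] * sigmas[name]) ** 2 for name in WEIGHTS))

# Placeholder: the same uncertainty for every parameter (not measured values).
sigmas = {name: 0.03 for name in WEIGHTS}
sigma_ibr = propagate_independent(sigmas)
print(f"{sigma_ibr:.4f}")  # 0.0106
```

With equal inputs, the combined uncertainty is smaller than any single parameter's uncertainty because independent errors partially cancel in a weighted average.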

⚙️ Snakemake Workflows

All analyses are reproducible via Snakemake. The master pipeline automatically determines which rules to run based on available inputs.

Run Full Validation Pipeline

# Full pipeline — requires complete dataset (~72h on 32-core HPC)
snakemake --cores 32 --use-conda all

# Reproduce publication figures only
snakemake --cores 8 figures

# Single case study
snakemake --cores 4 results/case_studies/amazon/

# Dry run — preview without executing
snakemake --cores 32 --use-conda --dry-run all

Available Rules

Rule File	Description	Inputs	Outputs
preprocessing.smk	Spectral, flux, metagenomic preprocessing	data/raw/	data/processed/
parameter_computation.smk	Compute all 9 parameter scores per plot	data/processed/	data/processed/parameters/
ibr_aggregation.smk	Normalize and aggregate IBR index	data/processed/parameters/	data/processed/ibr_scores/
ai_classification.smk	Train and evaluate MI-CNN classifier	data/processed/	models/ + results/
validation.smk	Cross-validation, sensitivity, uncertainty	data/processed/ibr_scores/	results/validation/

📘 HPC Submission

Use scripts/batch_process.sh for SLURM cluster submission. Configure cores and memory in workflows/config/cluster.yaml. The full pipeline takes approximately 72 hours on a 32-core node.

🗄️ Data & Formats

Supported Input Formats

Parameter	Format	Source	Typical Size
VCA (spectral)	ENVI BSQ / GeoTIFF / NetCDF	DESIS · PRISMA · Sentinel-2	0.5–4 GB per scene
MDI (metagenomics)	TSV gene abundance table	MGnify · EBI	10–200 MB per sample
PTS (phenocam)	CSV time-series GCC	PhenoCam Network	~1 MB per site/year
HFI (flux tower)	NetCDF4 (FLUXNET 2015)	FLUXNET · ICOS	5–50 MB per site
BNC (soil chemistry)	CSV elemental analysis	Lab measurements	<1 MB per plot
SGH (genomics)	VCF 4.2 (bgzipped + tabix)	NCBI SRA	1–50 GB per population
AES (landscape)	GeoTIFF raster + SHP	ESA World Cover · GFW	50–500 MB per region
TMI (food web)	CSV adjacency + metabarcoding	Field surveys + eDNA	<5 MB per plot
RRC (chronosequence)	CSV time-series biomass	Field surveys	<1 MB per site

Output Formats

# IBR score output (CSV)
plot_id, biome, VCA, MDI, PTS, HFI, BNC, SGH, AES, TMI, RRC, IBR, classification, confidence

# GeoJSON output (for spatial analysis)
{
  "type": "Feature",
  "properties": {
    "plot_id": "amazon_0042",
    "IBR": 0.834,
    "classification": "FUNCTIONAL",
    "tipping_point_risk": false
  }
}
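
The GeoJSON specification (RFC 7946) requires every Feature to carry a `geometry` member, which may be null when no location is attached. The sketch below builds a feature with the properties shown above using only the standard library. `ibr_feature` and its optional `lon`/`lat` arguments are hypothetical, and whether BIOTICA writes Point geometries is an assumption.

```python
import json

def ibr_feature(plot_id, ibr, classification, tipping_point_risk, lon=None, lat=None):
    """Build a GeoJSON Feature dict; geometry is null when no coordinates are given."""
    geometry = None
    if lon is not None and lat is not None:
        # GeoJSON positions are ordered longitude, latitude.
        geometry = {"type": "Point", "coordinates": [lon, lat]}
    return {
        "type": "Feature",
        "geometry": geometry,
        "properties": {
            "plot_id": plot_id,
            "IBR": ibr,
            "classification": classification,
            "tipping_point_risk": tipping_point_risk,
        },
    }

feature = ibr_feature("amazon_0042", 0.834, "FUNCTIONAL", False)
collection = {"type": "FeatureCollection", "features": [feature]}
text = json.dumps(collection, indent=2)
print(text)
```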

🤖 MI-CNN AI Classifier

The Multi-Input Convolutional Neural Network (MI-CNN) processes four parallel data streams and achieves 89.4% agreement with expert field surveys on 682 held-out plots.

Architecture Overview

Stream	Input	Architecture	Output
Spectral	Hyperspectral cube (426 bands)	1D-CNN × 3 layers	128-dim embedding
Temporal	VI time-series (24 months)	1D-CNN × 2 layers	64-dim embedding
Climate	19 WorldClim bioclimatic vars	Dense × 3 layers	32-dim embedding
Terrain	8 morphometric derivatives	Dense × 2 layers	16-dim embedding

The four streams are concatenated into a 240-dimensional feature vector and passed through a classification head (3 dense layers → 22-class biome softmax + IBR regression head).
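
The 240-dimensional figure follows from the four embedding sizes in the table (128 + 64 + 32 + 16). The sketch below only illustrates that concatenation step with plain Python lists. `fuse` and the zero-filled placeholder embeddings are hypothetical, and no network layers are modeled.

```python
# Embedding sizes taken from the architecture table.
STREAM_EMBEDDING_DIMS = {
    "spectral": 128,
    "temporal": 64,
    "climate": 32,
    "terrain": 16,
}

def fuse(embeddings):
    """Concatenate per-stream embedding vectors in a fixed stream order."""
    fused = []
    for stream, dim in STREAM_EMBEDDING_DIMS.items():
        vector = embeddings[stream]
        if len(vector) != dim:
            raise ValueError(f"{stream} embedding must have {dim} values")
        fused.extend(vector)
    return fused

# Zero-filled placeholders standing in for real stream outputs.
dummy = {stream: [0.0] * dim for stream, dim in STREAM_EMBEDDING_DIMS.items()}
fused_vector = fuse(dummy)
print(len(fused_vector))  # 240
```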

Training and Evaluation

# Train MI-CNN from scratch
python scripts/train_classifier.py \
  --data data/processed/ \
  --model-dir models/mi_cnn_v1/ \
  --epochs 150 \
  --batch-size 64 \
  --lr 0.0005

# Evaluate on held-out validation set
python scripts/evaluate_classifier.py \
  --model models/mi_cnn_v1/ \
  --test-set data/processed/validation/

✅ Validated Performance

MI-CNN v1.0 achieves 89.4% biome classification accuracy and 0.91 Pearson r for IBR regression on the 682-plot held-out test set. Grad-CAM analysis confirms that spectral features in the red-edge (700–730 nm) and SWIR (1550–1750 nm) ranges contribute most to classification decisions.

✅ Validation & Reproducibility

Cross-Validation Protocol

BIOTICA uses a leave-one-biome-out cross-validation design: the model is trained on 21 biome types and evaluated on the held-out 22nd. This prevents any within-biome data leakage and tests generalization across entirely unseen ecosystem types.
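
The fold construction can be sketched as a grouped split in which each biome is held out exactly once. This is an illustration of the protocol, not the `CrossValidator` implementation. `leave_one_biome_out` is a hypothetical helper, and the five-plot list is toy data.

```python
from collections import defaultdict

def leave_one_biome_out(biomes):
    """Yield (held_out_biome, train_indices, test_indices), one fold per biome."""
    by_biome = defaultdict(list)
    for index, biome in enumerate(biomes):
        by_biome[biome].append(index)
    for held_out in sorted(by_biome):
        test = by_biome[held_out]
        train = sorted(
            i for biome, indices in by_biome.items() if biome != held_out for i in indices
        )
        yield held_out, train, test

# Toy example: the biome label of each plot, by plot index.
plot_biomes = [
    "boreal_forest", "tropical_dry_forest", "boreal_forest",
    "arctic_tundra", "tropical_dry_forest",
]

folds = list(leave_one_biome_out(plot_biomes))
for held_out, train, test in folds:
    print(held_out, train, test)
```

No plot index ever appears in both the training and test sets of a fold, and all plots of the held-out biome are excluded from training, which is the property that rules out within-biome leakage.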

Metric	Value	Context
Leave-One-Biome-Out CV	92.6%	Mean accuracy across 22 folds
Worst-Case Biome	87.1%	Tropical dry forest (n=98 plots)
Best-Case Biome	97.3%	Boreal forest (n=312 plots)
Reproducibility Hash	sha256:b4f2	Ubuntu 22.04 · macOS 14.2

Reproducing All Results

# Reproduce all paper results (requires full dataset)
snakemake --cores 32 --use-conda all

# Reproduce specific table from paper
python scripts/generate_figures.py --table 2  # IBR comparison table
python scripts/generate_figures.py --fig 3    # MDI-carbon correlation

🕐 Changelog

v1.0.0 · Mar 2026 · Initial Release

Full nine-parameter IBR framework, MI-CNN v1.0 classifier, Snakemake pipelines, validated across 3,412 plots from 22 biomes. Paper submitted to Nature Sustainability.

v0.9.0 · Jan 2026 · Beta Release

Complete parameter suite, Bayesian weight determination, tipping-point detection module. Internal validation across Amazon and Australian datasets.

v0.5.0 · Sep 2025 · Alpha (Core Framework)

VCA, MDI, and PTS modules functional. Proof-of-concept IBR composite. Initial 800-plot dataset assembled from Amazon and Serengeti sites.

📄 Publications

📘 Primary Reference

If you use BIOTICA in your research, please cite the primary paper using the BibTeX entry below.

@article{baladi2026biotica,
  title   = {{BIOTICA}: A Multi-Dimensional Bio-Geochemical Framework for the
             Systematic Assessment, Predictive Modeling, and Cosmological
             Contextualization of Ecosystem Resilience},
  author  = {Baladi, Samir},
  journal = {Nature Sustainability},
  year    = {2026},
  doi     = {10.14293/BIOTICA.2026.001},
  note    = {Submitted March 2026}
}

🙏 Acknowledgments

The BIOTICA framework builds upon the foundational work of the global ecosystem science community. Special thanks to:

The FLUXNET and ICOS communities for making eddy covariance data openly accessible
The MGnify / EBI team for metagenomics data infrastructure
The PhenoCam Network for long-term phenological time series
The Global Biodiversity Information Facility (GBIF) for occurrence data
Global Forest Watch for forest cover change and fragmentation layers
The Arrernte people of the Northern Territory and Aboriginal rangers of SE Australia for sharing traditional ecological knowledge used in TEK validation protocols
The Ronin Institute for supporting independent scholarship