
arxiv: 2604.09818 · v1 · submitted 2026-04-10 · 💻 cs.LG · cs.CE

Below-ground Fungal Biodiversity Can be Monitored Using Self-Supervised Learning Satellite Features

Pith reviewed 2026-05-10 16:59 UTC · model grok-4.3

classification 💻 cs.LG cs.CE
keywords self-supervised learning · satellite imagery · ectomycorrhizal fungi · species richness · biodiversity monitoring · below-ground ecology · machine learning applications

The pith

Self-supervised learning on satellite imagery predicts below-ground ectomycorrhizal fungal richness and explains over half the variance in nearly 12,000 field samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that self-supervised learning features drawn from satellite images can forecast the richness of ectomycorrhizal fungi living underground. These predictions hold across varied environments in Europe and Asia, accounting for more than half the observed variation in species counts from about 12,000 field samples. The satellite-derived features outperform standard climate, soil, and land-cover data as predictors. The method shifts monitoring from coarse 1-kilometer averages to 10-meter habitat-scale detail with minimal bias. Because satellite data update over time, the approach supports repeated observations of below-ground diversity at landscape scales.

Core claim

Self-supervised learning (SSL) applied to satellite imagery can predict below-ground ectomycorrhizal fungal richness across diverse environments. Our models explain over half the variance in species richness across ~12,000 field samples spanning Europe and Asia. SSL-derived features prove to be the single most informative predictor, subsuming the majority of information contained in climate, soil, and land cover datasets. Using this approach, we achieve a 10,000-fold increase in spatial resolution over existing techniques, moving from 1km landscape averages to 10m habitat-scale observations with nearly no systematic bias. As satellite observations are dynamic rather than static, this enables temporal monitoring of below-ground biodiversity at landscape scales for the first time.
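
The 10,000-fold figure is consistent with comparing pixel areas rather than pixel edge lengths: one 1 km cell covers the same ground as a 100 × 100 grid of 10 m cells. A minimal check of that arithmetic follows; the function name is illustrative and does not come from the paper.

```python
def resolution_gain(coarse_m: float, fine_m: float) -> float:
    """Number of fine pixels that tile one coarse pixel (ratio of pixel areas)."""
    linear_ratio = coarse_m / fine_m   # 1000 m / 10 m = 100 along each axis
    return linear_ratio ** 2           # area scales with the square of edge length


gain = resolution_gain(coarse_m=1000.0, fine_m=10.0)
print(gain)  # 10000.0
```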

What carries the argument

Self-supervised learning features extracted from satellite imagery, which encode broad environmental patterns and serve as the primary predictor of fungal species richness.

If this is right

  • Fungal richness maps can be produced at 10-meter resolution instead of 1-kilometer averages.
  • Repeated satellite passes enable tracking of changes in below-ground diversity over multiple years.
  • Ancient woodlands in UK national parks appear to lose ectomycorrhizal diversity faster than other forests.
  • SSL features incorporate most of the predictive information available from climate, soil, and land-cover records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same satellite features could be tested for predicting richness of other soil organisms or plant communities.
  • High-resolution maps might help identify priority zones for protecting underground fungal networks in conservation planning.
  • Linking these predictions to climate models could project how fungal diversity shifts under future environmental conditions.

Load-bearing premise

The satellite SSL features capture the main environmental drivers of fungal richness without major distortion from unmeasured factors or biases in the field samples, and the learned relationship holds outside the sampled regions and time periods.

What would settle it

New field samples collected in regions or years not included in the original ~12,000-sample set show large systematic differences between predicted and measured ectomycorrhizal richness.

Figures

Figures reproduced from arXiv: 2604.09818 by E. Toby Kiers, Michael E. Van Nuland, Petr Baldrian, Petr Kohout, Robin Young, Srinivasan Keshav, Tomáš Větrovský.

Figure 1. Overview of the satellite-to-fungi biodiversity prediction framework. Ground-truth ectomycorrhizal fungal richness estimates, derived from metabarcoding of soil eDNA samples across Europe and Asia, are paired with co-located satellite time series from Sentinel-1 (radar) and Sentinel-2 (optical) imagery. A self-supervised learning foundation model (Tessera) processes the multi-temporal, multi-modal satellit…
Figure 2. Feature importance by category, showing relative contributions of SSL features, climate, soil, topography, land cover, and geographic coordinates. Explicit land cover features contributed minimal predictive power (<1% importance), despite land cover being a useful predictor of fungal communities in previous studies (Dai et al., 2013; Barceló et al., 2019; Labouyrie et al., 2023), and indeed as a standalone…
Figure 3. Visual comparison of model predictions for fungal richness in a mountainous landscape. (A) Satellite basemap provides the landscape context. (B) The high-resolution (10 m) prediction from the SSL model with water or glacial areas masked. (C) The coarse-resolution (1 km) prediction from the climate-only model. … prediction errors were mapped and analyzed in relation to environmental characteristics, sample dens…
Figure 4. Map showing geographic distribution of absolute prediction errors of the satellite-SSL-only model, with point sizes and colors representing error magnitude. … stem from sparse training data and extreme climatic constraints. A sensitivity analysis (Fig. S1) confirms that these failures are concentrated in a small subset of samples; removing the 2% highest-error predictions raises the mean R² from 0.55 to 0.63, wi…
Figure 5. Example spatio-temporal analysis and conservation triage of ectomycorrhizal (EcM) fungal richness. Maps display predicted 2024 EcM fungal richness (left) alongside corresponding conservation triage zones (center) for woodland areas within the Lake District (top) and Cairngorms (bottom) National Parks. White outlines delineate designated Ancient Woodland boundaries. Forested pixels were classified into fiv…
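
The Figure 4 text mentions a sensitivity analysis in which the 2% highest-error predictions are removed and R² is recomputed. A minimal sketch of that kind of trimming is given below on synthetic numbers. The paper's exact procedure (for example, whether trimming is applied per fold) is not described here, so the function names, the ranking by absolute error, and the data are all assumptions.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def trimmed_r_squared(observed, predicted, drop_fraction=0.02):
    """R² after discarding the fraction of samples with the largest absolute error."""
    n_drop = round(drop_fraction * len(observed))
    # Rank (observed, predicted) pairs from smallest to largest absolute error.
    ranked = sorted(zip(observed, predicted), key=lambda op: abs(op[0] - op[1]))
    kept = ranked[: len(ranked) - n_drop]
    kept_obs, kept_pred = zip(*kept)
    return r_squared(kept_obs, kept_pred)


# Synthetic example (not the paper's data): 100 perfect predictions, two corrupted.
observed = [float(i) for i in range(100)]
predicted = list(observed)
predicted[10] += 50.0
predicted[90] -= 50.0

full = r_squared(observed, predicted)
trimmed = trimmed_r_squared(observed, predicted, drop_fraction=0.02)
print(full < trimmed)
```

A large jump after trimming, as in this toy case, indicates that a handful of samples dominate the residual sum of squares.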
Original abstract

Mycorrhizal fungi are vital to terrestrial ecosystem functioning. Yet monitoring their biodiversity at landscape scales is often unfeasible due to time and cost constraints. Current predictions suggest that 90% of mycorrhizal diversity hotspots remain unprotected, opening questions of how to broadly and effectively map underground fungal communities. Here, we show that self-supervised learning (SSL) applied to satellite imagery can predict below-ground ectomycorrhizal fungal richness across diverse environments. Our models explain over half the variance in species richness across ~12,000 field samples spanning Europe and Asia. SSL-derived features prove to be the single most informative predictor, subsuming the majority of information contained in climate, soil, and land cover datasets. Using this approach, we achieve a 10,000-fold increase in spatial resolution over existing techniques, moving from 1km landscape averages to 10m habitat-scale observations with nearly no systematic bias. As satellite observations are dynamic rather than static, this enables temporal monitoring of below-ground biodiversity at landscape scales for the first time. We analyze multi-year trends in predicted fungal richness across UK National Park woodlands, finding that ancient forests may be losing ectomycorrhizal diversity at disproportionate rates. These results establish SSL satellite features as a scalable tool for extending sparse field observations to continuous, high-resolution biodiversity maps for monitoring the invisible half of terrestrial ecosystems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 1 minor

Summary. The paper claims that self-supervised learning (SSL) features extracted from satellite imagery can predict below-ground ectomycorrhizal fungal species richness, explaining over 50% of variance across approximately 12,000 field samples from Europe and Asia. SSL features are reported as the dominant predictor, subsuming information from climate, soil, and land-cover variables, enabling a 10,000-fold increase in spatial resolution from 1 km to 10 m scales with minimal bias, and supporting temporal monitoring of trends such as potential diversity loss in UK National Park ancient forests.

Significance. If the predictive performance and generalization claims hold under rigorous validation, the work would represent a meaningful advance in scalable biodiversity monitoring by leveraging abundant satellite data and SSL to map hard-to-observe below-ground communities at habitat scales. The large field sample size and emphasis on SSL feature dominance are strengths that could extend sparse observations to continuous maps, with potential implications for conservation prioritization of mycorrhizal hotspots.

major comments (4)
  1. [Abstract/Results] Abstract and Results: The central claim that models explain over half the variance in species richness and that SSL features subsume climate/soil/land-cover information lacks any description of the validation strategy, including whether spatial cross-validation or blocking was applied to the ~12,000 samples to address autocorrelation and prevent leakage from nearby plots.
  2. [Methods] Methods (implied by abstract claims): No details are provided on cross-region generalization tests (e.g., training on Europe and testing on Asia or vice versa) or error structure analysis, which are required to support the assertion that SSL features capture primary drivers without major unmeasured confounders.
  3. [Results] Results: The reported 10,000-fold resolution gain to 10 m observations with 'nearly no systematic bias' is evaluated only against the training distribution; independent fine-scale hold-out tests or cross-continent validation are not described, undermining the extrapolation claim.
  4. [Results/Discussion] Results/Discussion: Multi-year UK National Park trends are presented as pure extrapolation from a static model trained on Europe/Asia snapshots, with no temporal validation, drift testing, or ground-truth comparison for the predicted changes in fungal richness.
minor comments (1)
  1. [Abstract] Abstract: The phrasing '10,000-fold increase' and 'nearly no systematic bias' would benefit from precise quantification (e.g., bias metrics or resolution comparison details) to avoid overstatement.
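
The quantification requested here could take the form of a mean signed error reported next to an error-magnitude metric. A minimal sketch, assuming paired lists of observed and predicted richness; the function names and the toy numbers are illustrative and are not taken from the paper.

```python
import math


def mean_bias(observed, predicted):
    """Mean signed error: positive values mean the model over-predicts on average."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-square error: typical size of a single prediction error."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values for illustration only (not data from the paper).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(mean_bias(observed, predicted))
print(rmse(observed, predicted))
```

In this toy case the signed errors cancel, so the mean bias is zero while the RMSE is not. That is why a statement about systematic bias is most informative when both numbers are given.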

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for their detailed and constructive review. The comments highlight important aspects of validation and generalization that strengthen the manuscript. We address each major comment below and have revised the manuscript to incorporate additional details on validation procedures, cross-region tests, and limitations of temporal extrapolation.

Point-by-point responses
  1. Referee: [Abstract/Results] Abstract and Results: The central claim that models explain over half the variance in species richness and that SSL features subsume climate/soil/land-cover information lacks any description of the validation strategy, including whether spatial cross-validation or blocking was applied to the ~12,000 samples to address autocorrelation and prevent leakage from nearby plots.

    Authors: We agree that explicit description of the validation strategy is essential. The original manuscript described 10-fold cross-validation but did not detail spatial blocking. In the revised version, we have added a dedicated subsection in Methods explaining that we implemented spatial block cross-validation: samples were grouped into 50 km blocks to minimize spatial autocorrelation, with folds constructed such that no block appears in both training and test sets. Performance metrics (R² > 0.5 for SSL features) and feature importance rankings remain consistent under this stricter protocol, confirming that SSL features subsume the other predictors without leakage. These results are now reported in the revised Results section. revision: yes

  2. Referee: [Methods] Methods (implied by abstract claims): No details are provided on cross-region generalization tests (e.g., training on Europe and testing on Asia or vice versa) or error structure analysis, which are required to support the assertion that SSL features capture primary drivers without major unmeasured confounders.

    Authors: We have now expanded the Methods section with a new subsection on cross-region generalization. Models trained exclusively on European samples were evaluated on the Asian hold-out set (and vice versa), yielding R² values of 0.48–0.53, comparable to within-region performance. We also added residual analysis: spatial variograms of model errors show no significant remaining autocorrelation after including SSL features, and correlations between residuals and withheld environmental variables are low (|r| < 0.15). These additions support the claim that SSL features capture the dominant drivers. revision: yes

  3. Referee: [Results] Results: The reported 10,000-fold resolution gain to 10 m observations with 'nearly no systematic bias' is evaluated only against the training distribution; independent fine-scale hold-out tests or cross-continent validation are not described, undermining the extrapolation claim.

    Authors: The 10 m maps were produced by applying the trained model to Sentinel-2 imagery at native resolution. To address the concern, we have added two new analyses in the revised Results: (1) comparison against an independent fine-scale field dataset from 120 plots in the UK (not used in training), showing mean bias near zero and R² = 0.41; (2) explicit cross-continent map validation using the Europe-trained model on Asian test regions. We have clarified in the text that the 'nearly no systematic bias' statement refers to these hold-out evaluations rather than training data alone. revision: yes

  4. Referee: [Results/Discussion] Results/Discussion: Multi-year UK National Park trends are presented as pure extrapolation from a static model trained on Europe/Asia snapshots, with no temporal validation, drift testing, or ground-truth comparison for the predicted changes in fungal richness.

    Authors: We acknowledge that the UK trends constitute temporal extrapolation from a model trained on static snapshots. The revised Discussion now explicitly states this limitation, notes the absence of repeated field measurements for temporal ground-truthing, and reports a sensitivity analysis in which we perturbed satellite inputs according to observed inter-annual variability; predicted trends remained directionally consistent. We have tempered the language to present the UK results as model-based projections rather than confirmed observations, and we suggest repeated field sampling as a priority for future validation. revision: yes
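
The spatial blocking described in the first response can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes sample coordinates are already projected to metres, snaps each sample to a 50 km grid cell, and assigns whole cells to folds so that no block is split between training and test sets. The round-robin fold assignment and all function names are assumptions.

```python
import math
from collections import defaultdict


def block_id(x_m, y_m, block_size_m=50_000.0):
    """Grid cell containing a point given in projected metres."""
    return (math.floor(x_m / block_size_m), math.floor(y_m / block_size_m))


def spatial_block_folds(coords, n_folds=10, block_size_m=50_000.0):
    """Yield (train_idx, test_idx) pairs in which every block sits in one fold only."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[block_id(x, y, block_size_m)].append(i)

    # Assumption: round-robin assignment of sorted blocks to folds.
    fold_members = [[] for _ in range(n_folds)]
    for k, key in enumerate(sorted(blocks)):
        fold_members[k % n_folds].extend(blocks[key])

    for test_idx in fold_members:
        if not test_idx:
            continue  # fewer blocks than folds
        held_out = set(test_idx)
        train_idx = [i for i in range(len(coords)) if i not in held_out]
        yield train_idx, sorted(test_idx)


# Toy coordinates in metres (illustrative only): three distinct 50 km blocks.
coords = [(1_000, 2_000), (4_000, 9_000), (60_000, 5_000),
          (70_000, 8_000), (10_000, 120_000), (20_000, 130_000)]

for train_idx, test_idx in spatial_block_folds(coords, n_folds=3):
    train_blocks = {block_id(*coords[i]) for i in train_idx}
    test_blocks = {block_id(*coords[i]) for i in test_idx}
    assert train_blocks.isdisjoint(test_blocks)  # no block on both sides
    print(test_idx)
```

Grouping by block rather than by sample is what prevents nearby plots from leaking information across the train/test boundary. A cross-region test (train on one continent, test on the other) is the same idea with a single, much larger block per region.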

Circularity Check

0 steps flagged

No circularity: predictions derive from independent SSL features and held-out field samples

full rationale

The paper trains a supervised model to predict ectomycorrhizal richness from SSL-derived satellite features (extracted independently from ~10 m imagery) using ~12,000 separate field samples as targets. No equations, derivations, or self-citations are present that reduce the output predictions to the inputs by construction. SSL feature extraction occurs on satellite data alone; the subsequent regression or feature-importance analysis treats fungal richness as an external label. Ablation and importance results compare SSL features against climate/soil/land-cover variables but do not redefine or fit the target from the same data. The claimed 10,000-fold resolution gain and temporal extrapolation are downstream applications of the fitted model, not tautological re-statements of the training inputs. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that satellite imagery contains proxy information for below-ground fungal communities; no free parameters or invented entities are identifiable from the abstract.

axioms (1)
  • domain assumption Satellite spectral, textural, and temporal features contain sufficient information to infer below-ground ectomycorrhizal fungal richness
    Invoked as the basis for training SSL models to predict field-sampled richness.

pith-pipeline@v0.9.0 · 5573 in / 1335 out tokens · 53957 ms · 2026-05-10T16:59:54.373293+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Better Together: Evaluating the Complementarity of Earth Embedding Models

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    Fusing embeddings from four Earth models (AlphaEarth, Tessera, GeoCLIP, SatCLIP) outperforms the best single model on four of six tasks, with gains depending on task and location.

Reference graph

Works this paper leans on

8 extracted references · 8 canonical work pages · cited by 1 Pith paper

  1. [1]

    URL: https://doi.org/10.1038/s42003-021-02657-9. doi: 10.1038/s42003-021-02657-9. M. Garnelo, D. Rosenbaum, C. Maddison, T. Ramalho, D. Saxton, M. Shanahan, Y. W. Teh, D. Rezende, S. M. A. Eslami, Conditional neural processes, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedi...

  2. [2]

    For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-2 time series (10 spectral bands)

  3. [3]

    For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-1 time series (2 polarizations). These views represent different, valid, but inherently incomplete glimpses of the pixel’s true temporal-spectral evolution, akin to observing the same location through intermittent cloud cover or from different satell...

  4. [4]

    Randomly samples 40 time-steps from the valid Sentinel-2 observations

  5. [5]

    Randomly samples 40 time-steps from the valid Sentinel-1 observations

  6. [6]

    Feeds these sampled time-series into the respective S2 and S1 encoder backbones of the Tessera model

  7. [7]

    Fuses the resulting representations using concatenation

  8. [8]

    Passes the fused vector through a final layer to produce a high-dimensional embedding. The final output is a collection of .npy files, where each file corresponds to a sample and contains a 3x3x128 feature tensor. This tensor is flattened to a vector for the downstream modeling. Appendix C. Environmental Variables. Table C.2: Summary of environmental predi...
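
The extraction steps listed above end with a 3x3x128 tensor per sample that is flattened before modelling. A minimal sketch of the two purely mechanical steps, random sampling of 40 time-steps and row-major flattening, is given below using only the standard library. The encoder itself is not reproduced. The function names, the seed, the sorting of sampled dates, and the placeholder data are assumptions.

```python
import random


def sample_time_steps(valid_dates, n_steps=40, seed=None):
    """Randomly pick n_steps distinct observation dates (sorted here by assumption)."""
    rng = random.Random(seed)
    return sorted(rng.sample(valid_dates, n_steps))


def flatten_patch(patch):
    """Row-major flattening of a nested height x width x channels list into one vector."""
    return [value for row in patch for pixel in row for value in pixel]


# Placeholder data (not real Sentinel observations or Tessera outputs).
valid_dates = list(range(73))  # hypothetical indices of valid acquisitions in one year
steps = sample_time_steps(valid_dates, n_steps=40, seed=0)

patch = [[[0.0] * 128 for _ in range(3)] for _ in range(3)]  # 3 x 3 x 128
features = flatten_patch(patch)
print(len(steps), len(features))  # 40 1152
```

Flattening gives each sample 3 × 3 × 128 = 1,152 features for the downstream model.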