PlantTraitNet: An Uncertainty-Aware Multimodal Framework for Global-Scale Plant Trait Inference from Citizen Science Data

Anja Linstädter; Ayushi Sharma; Christian Rossi; Daniel Lusk; Daniel Mederer; Etienne Laliberté; Jana Eichel; Javier Lopatin; Johanna Trost; Johannes Dollinger

arxiv: 2511.06943 · v3 · pith:DBPFX4IDnew · submitted 2025-11-10 · 💻 cs.CV · cs.AI

Ayushi Sharma, Johanna Trost, Daniel Lusk, Johannes Dollinger, Julian Schrader, Christian Rossi, Javier Lopatin, Etienne Laliberté, Simon Haberstroh, Jana Eichel, Daniel Mederer, Jose Miguel Cerda-Paredes, Shyam S. Phartyal, Lisa-Maricia Schwarz, Anja Linstädter, Maria Conceição Caldeira, Teja Kattenborn

Pith reviewed 2026-05-21 19:41 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords plant traits · citizen science · deep learning · global mapping · computer vision · trait inference · multimodal framework · ecological data

The pith

Citizen science photos combined with deep learning produce more accurate global maps of plant traits than existing products.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PlantTraitNet to address the sparse coverage of traditional field measurements for key plant traits like height, leaf area, specific leaf area, and nitrogen content. It uses a multi-modal framework to infer these traits directly from millions of geotagged citizen science photographs under weak supervision. Individual predictions are then aggregated across geographic space to create global trait distribution maps. These maps are tested against independent sPlotOpen vegetation survey data and shown to outperform leading existing trait products on all four traits. The work establishes citizen imagery plus computer vision as a practical route to scalable, higher-accuracy trait information for ecosystem studies.

Core claim

PlantTraitNet is a multi-modal, multi-task uncertainty-aware deep learning framework that predicts four key plant traits from citizen science photos using weak supervision. By aggregating individual trait predictions across space, it generates global maps of trait distributions. Validation against independent sPlotOpen survey data and benchmarking against leading global trait products shows consistent outperformance across all evaluated traits.
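The aggregation step in the core claim can be illustrated with a minimal sketch. It assumes each prediction is a `(lat, lon, value)` tuple, that cells form a regular latitude/longitude grid, and that the cell statistic is the median. The function name `aggregate_to_grid`, the 1-degree cell size, and the sample values are hypothetical; the abstract does not state the paper's actual grid or statistic.

```python
import math
from collections import defaultdict
from statistics import median


def aggregate_to_grid(records, cell_deg=1.0):
    """Bin (lat, lon, value) predictions into square lat/lon cells.

    Returns {(row, col): median of the values falling in that cell}.
    The regular grid and the median are assumptions for illustration.
    """
    cells = defaultdict(list)
    for lat, lon, value in records:
        # Shift coordinates so that row/col indices start at zero.
        row = math.floor((lat + 90.0) / cell_deg)
        col = math.floor((lon + 180.0) / cell_deg)
        cells[(row, col)].append(value)
    return {cell: median(values) for cell, values in cells.items()}


# Hypothetical per-photo height predictions in metres (not data from the paper).
predictions = [
    (48.1, 11.5, 0.4),
    (48.7, 11.2, 0.6),
    (48.3, 11.9, 2.0),
    (-33.4, -70.6, 12.0),
]
grid = aggregate_to_grid(predictions, cell_deg=1.0)
for cell, value in sorted(grid.items()):
    print(cell, value)
```

A median is used here only because it is less sensitive to a single outlying photo than a mean; an uncertainty-weighted mean would be an equally plausible choice.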

What carries the argument

The uncertainty-aware multimodal deep learning framework that extracts morphological and physiological signals from individual citizen science photographs for trait prediction under weak supervision and then aggregates those predictions spatially.
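One common way to make a regression model uncertainty-aware is to have it predict a mean and a log-variance per trait and train with a Gaussian negative log-likelihood. The sketch below shows that loss under the assumption that this is the formulation used; the Figure 11 caption mentions predicted log-variance values, but the paper's exact loss is not given on this page. The names `gaussian_nll` and `multitask_nll`, and the use of `None` for a missing trait label, are hypothetical.

```python
import math


def gaussian_nll(y_true, mean, log_var):
    """Per-sample Gaussian negative log-likelihood, constant term dropped.

    A large predicted log-variance down-weights the squared residual but
    is itself penalised by the additive log_var term.
    """
    return 0.5 * (log_var + (y_true - mean) ** 2 / math.exp(log_var))


def multitask_nll(targets, means, log_vars):
    """Average the loss over the traits that actually have a label.

    Skipping missing labels (None) is an assumption about how weak,
    incomplete trait labels might be handled.
    """
    losses = [
        gaussian_nll(y, m, s)
        for y, m, s in zip(targets, means, log_vars)
        if y is not None
    ]
    return sum(losses) / len(losses)


# One sample with four traits; the third label is missing.
targets = [2.0, 1.0, None, 0.5]
means = [1.0, 1.0, 0.3, 0.5]
log_vars = [0.0, 0.0, 0.0, 0.0]
print(multitask_nll(targets, means, log_vars))
```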

Load-bearing premise

Citizen science photographs contain sufficient visual information on plant morphology and physiology to support accurate trait inference under weak supervision, and spatial aggregation of individual predictions yields reliable global distributions.

What would settle it

New independent field measurements in regions with sparse citizen science coverage would show whether PlantTraitNet maps lose their accuracy advantage over existing products or fail to match sPlotOpen validation levels.

Figures

Figures reproduced from arXiv: 2511.06943 by Anja Linstädter, Ayushi Sharma, Christian Rossi, Daniel Lusk, Daniel Mederer, Etienne Laliberté, Jana Eichel, Javier Lopatin, Johanna Trost, Johannes Dollinger, Jose Miguel Cerda-Paredes, Julian Schrader, Lisa-Maricia Schwarz, Maria Conceição Caldeira, Shyam S. Phartyal, Simon Haberstroh, Teja Kattenborn.

**Figure 2.** Randomly sampled images showing highest/low...

**Figure 3.** The model integrates image, depth, and geospatial...

**Figure 4.** Overview of the pipeline. We filter weakly labeled citizen science data (Raw data) based on high model uncertainty...

**Figure 5.** Mean relative prediction error (MRPE) computed on validation data at the family level, visualized along the taxonomic...

**Figure 6.** Intraspecific variation in predicted height for four species. Bar plots (left) show model predictions; histograms (right)...

**Figure 7.** Distribution of median trait values at the species level in the weakly labeled citizen science training data.

**Figure 8.** Locations of scientifically curated reference...

**Figure 9.** Visualizations of predictive uncertainty for plant height during residual-aware filtering. Example images with high...

**Figure 10.** Examples of images with high uncertainty and high residual error identified during residual-aware filtering for...

**Figure 11.** Visualization of the predicted log-variance values...

**Figure 12.** Global trait maps derived from spatially aggre...

**Figure 13.** Global trait predictions obtained from PlantTraitNet against globally distributed vegetation survey data (sPlotOpen).

**Figure 14.** Mean relative prediction error (MRPE) computed...

**Figure 15.** Intraspecific variation in predicted plant height compared to TRY-derived trait means.

**Figure 16.** Intraspecific variation in predicted leaf area compared to TRY-derived trait means.

**Figure 17.** Intraspecific variation in predicted specific leaf area compared to TRY-derived trait means.

**Figure 18.** Intraspecific variation in predicted leaf nitrogen compared to TRY-derived trait means.
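The captions of Figures 4, 9, and 10 refer to filtering weakly labeled samples that show high uncertainty and high residual error. The captions are truncated here, so the sketch below is only one plausible reading: it drops a sample when both its predicted log-variance and its absolute residual exceed a quantile cutoff. The function names, the dictionary keys, and the 0.9 quantile are hypothetical and not taken from the paper.

```python
def quantile(values, q):
    """Simple lower-index quantile of a list (no interpolation)."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def residual_aware_filter(samples, q=0.9):
    """Keep samples unless BOTH uncertainty and residual are extreme.

    Each sample is a dict with 'label' (weak label), 'mean' (prediction)
    and 'log_var' (predicted log-variance). The joint-threshold rule is
    an assumption made for illustration.
    """
    residuals = [abs(s["label"] - s["mean"]) for s in samples]
    r_cut = quantile(residuals, q)
    v_cut = quantile([s["log_var"] for s in samples], q)
    return [
        s
        for s, r in zip(samples, residuals)
        if not (r > r_cut and s["log_var"] > v_cut)
    ]


# Nine well-behaved samples plus one with a large residual and high uncertainty.
samples = [
    {"label": 1.0, "mean": 1.0 + 0.01 * i, "log_var": -1.0 + 0.01 * i}
    for i in range(9)
]
samples.append({"label": 1.0, "mean": 11.0, "log_var": 3.0})
kept = residual_aware_filter(samples, q=0.9)
print(len(samples), "->", len(kept))
```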

Original abstract

Global plant maps of plant traits, such as leaf nitrogen or plant height, are essential for understanding ecosystem processes, including the carbon and energy cycles of the Earth system. However, existing trait maps remain limited by the high cost and sparse geographic coverage of field-based measurements. Citizen science initiatives offer a largely untapped resource to overcome these limitations, with over 50 million geotagged plant photographs worldwide capturing valuable visual information on plant morphology and physiology. In this study, we introduce PlantTraitNet, a multi-modal, multi-task uncertainty-aware deep learning framework that predictsfour key plant traits (plant height, leaf area, specific leaf area, and nitrogen content) from citizen science photos using weak supervision. By aggregating individual trait predictions across space, we generate global maps of trait distributions. We validate these maps against independent vegetation survey data (sPlotOpen) and benchmark them against leading global trait products. Our results show that PlantTraitNet consistently outperforms existing trait maps across all evaluated traits, demonstrating that citizen science imagery, when integrated with computer vision and geospatial AI, enables not only scalable but also more accurate global trait mapping. This approach offers a powerful new pathway for ecological research and Earth system modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PlantTraitNet uses citizen photos in a multimodal setup to claim better global trait maps, but the abstract and stress-test point leave the image branch's actual contribution unproven.

PlantTraitNet's main claim is that a multimodal uncertainty-aware model trained on citizen science photos can produce global maps of plant height, leaf area, specific leaf area, and nitrogen that outperform existing products when checked against sPlotOpen. If the photos genuinely add signal beyond location and environmental layers, this would be a practical route to denser trait data for carbon-cycle work. That is the one thing worth knowing up front.

The framework is new in its specific combination of weak supervision on geotagged photos, multi-task uncertainty modeling, and spatial aggregation for these four traits. The scale of available citizen images is a clear practical advantage over field-only approaches, and the motivation to feed better trait layers into Earth-system models is solid. The paper does a reasonable job framing the data gap and proposing an end-to-end pipeline that could be reused.

The soft spots sit in the evidence for the image contribution. The abstract states outperformance without numbers, error bars, or ablation results, so it is impossible to judge whether the visual branch improves predictions or whether geospatial covariates are carrying most of the load under weak supervision. The stress-test concern about spurious correlations or secondary image value therefore lands as a real open question rather than a minor detail. Without those checks the central claim stays hard to assess.

This is for ecologists and Earth-system modelers who need broader trait coverage and for CV researchers looking at citizen-data applications. A reader who wants to adapt similar pipelines could extract useful architecture ideas even if the results need more scrutiny. It deserves a serious referee because the data source and application are timely and the idea is coherent on its own terms. I would recommend sending it out so the authors can supply the missing quantitative details and ablations.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces PlantTraitNet, a multimodal, multi-task, uncertainty-aware deep learning framework that infers four plant traits (plant height, leaf area, specific leaf area, nitrogen content) from citizen-science photographs under weak supervision. Individual predictions are spatially aggregated to produce global trait maps, which are validated against the independent sPlotOpen vegetation survey dataset and benchmarked against existing global trait products; the central claim is that the framework consistently outperforms prior maps across all traits.

Significance. If the quantitative results and implementation details substantiate the claims, the work would demonstrate a scalable route to higher-accuracy global trait maps by exploiting the large volume of geotagged citizen-science imagery together with computer vision and geospatial covariates. The explicit uncertainty modeling and weak-supervision strategy, if shown to be effective, would be a useful technical contribution for ecological remote-sensing applications.

major comments (2)

[Abstract] Abstract: the statement that PlantTraitNet 'consistently outperforms existing trait maps across all evaluated traits' and is validated against sPlotOpen is presented without any numerical performance values, error bars, sample sizes, trait-specific metrics, or exclusion criteria. This absence prevents evaluation of the central empirical claim.
[Abstract] Abstract / Methods (inferred from abstract description): no information is supplied on how the image branch is isolated from geospatial covariates (e.g., via ablation, feature-importance analysis, or geospatial-only baseline). Without such controls it is impossible to confirm that citizen-science imagery supplies non-redundant visual signals rather than the model primarily learning from location-linked climate/soil features.

minor comments (1)

[Abstract] Abstract: typographical error 'predictsfour' should read 'predicts four'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to strengthen the presentation of empirical claims and to more explicitly demonstrate the contribution of the image modality. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract: the statement that PlantTraitNet 'consistently outperforms existing trait maps across all evaluated traits' and is validated against sPlotOpen is presented without any numerical performance values, error bars, sample sizes, trait-specific metrics, or exclusion criteria. This absence prevents evaluation of the central empirical claim.

Authors: We agree that the abstract would be strengthened by the inclusion of quantitative support for the central claim. In the revised manuscript we will update the abstract to report trait-specific validation metrics (e.g., Pearson r or RMSE) against sPlotOpen, the number of independent validation samples per trait, and any exclusion criteria applied (such as outlier removal or minimum sample thresholds). These additions will be kept concise while providing the numerical context needed to evaluate the performance statement.

revision: yes

Referee: [Abstract] Abstract / Methods (inferred from abstract description): no information is supplied on how the image branch is isolated from geospatial covariates (e.g., via ablation, feature-importance analysis, or geospatial-only baseline). Without such controls it is impossible to confirm that citizen-science imagery supplies non-redundant visual signals rather than the model primarily learning from location-linked climate/soil features.

Authors: We acknowledge the value of explicitly isolating the contribution of the citizen-science imagery. The current manuscript describes the multimodal architecture and reports overall improvements relative to existing geospatial-only trait products, but does not contain a dedicated ablation against a geospatial-covariates-only baseline. In the revised version we will add such an ablation (or feature-importance analysis) in the Results or Methods section, reporting performance differences when the image branch is removed. This will directly address whether the visual signals provide non-redundant information beyond location-linked covariates.

revision: yes
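The two requests above (trait-specific metrics and an image-branch ablation) could be reported with something as simple as the following sketch. It computes Pearson r and RMSE for two sets of predictions against the same reference values. All numbers are toy values invented for illustration and are not results from the paper; the variable names `full_model` and `geo_only` are hypothetical stand-ins for the full framework and a geospatial-covariates-only baseline.

```python
import math


def pearson_r(pred, ref):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)


def rmse(pred, ref):
    """Root mean squared error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


# Toy grid-cell values for one trait (invented, not from the paper).
reference = [0.5, 1.2, 3.0, 8.0, 15.0]   # e.g. survey-derived values
full_model = [0.6, 1.0, 3.5, 7.0, 14.0]  # image + depth + geospatial inputs
geo_only = [2.0, 2.5, 3.0, 5.0, 9.0]     # geospatial covariates only

for name, pred in [("full model", full_model), ("geospatial only", geo_only)]:
    print(f"{name}: r={pearson_r(pred, reference):.3f}, "
          f"RMSE={rmse(pred, reference):.3f}")
```

Reporting both metrics per trait, together with the number of validation cells, would let a reader see whether removing the image branch changes the ranking against existing products.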

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper describes a multimodal deep learning model trained under weak supervision on citizen science photographs to infer four plant traits, followed by spatial aggregation to produce global maps that are then validated against the independent sPlotOpen survey dataset and benchmarked against existing trait products. No equations or steps reduce a claimed prediction to a fitted input by construction, no self-citation is invoked as a uniqueness theorem to force the architecture, and the central performance claim rests on external validation rather than internal redefinition. The derivation chain remains self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; ledger is limited to the core domain assumption required by the central claim.

axioms (1)

Domain assumption: Citizen science photographs contain visual cues sufficient for inferring the four target plant traits under weak supervision.
This premise is required for the weak-supervision training and subsequent global mapping to be valid.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "PlantTraitNet, a multi-modal, multi-task uncertainty-aware deep learning framework that predicts four key plant traits... from citizen science photos using weak supervision"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "integrates image, depth, and geospatial embeddings... residual network of 8 residual blocks"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
