
arXiv: 2605.22848 · v1 · pith:E5MPRHYDnew · submitted 2026-05-15 · cs.CE · cs.LG · q-bio.OT

From Simulation to Discovery: AI Enabled Probabilistic Emulation of Mechanistic Crop Systems

Pith reviewed 2026-05-25 00:26 UTC · model grok-4.3

classification cs.CE · cs.LG · q-bio.OT
keywords neural emulator · APSIM · maize · crop modeling · climate resilience · genotype by environment · yield prediction

The pith

A neural emulator of the APSIM crop model identifies 181 maize trait combinations resilient across future climates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Process-based crop models like APSIM are too slow to explore thousands of candidate maize trait combinations across varying soils and climates. The paper builds a neural network emulator, trained on two million APSIM simulations, that closely matches APSIM outputs while running several orders of magnitude faster and providing uncertainty estimates. That speed makes it feasible to test 100,000 trait configurations in six Midwest soil environments under climate projections to 2100. The result is a set of 181 trait combinations that keep yields high across all tested conditions. The work also points to radiation use efficiency and temperature-driven root dynamics as the main drivers of this resilience, and shows that some lower-productivity sites may gain yield under future climates.

Core claim

The probabilistic neural emulator reproduces 13 maize growth outputs from APSIM with an R² of 0.93 after training on two million simulations that cover diverse genetic, soil, and management conditions. Augmented with a synthetic weather generator, it enables large-scale screening of 100,000 trait configurations in six Iowa and Illinois soils under two emissions scenarios to 2100. This identifies 181 maize trait combinations that maintain high yield across all tested conditions, an analysis infeasible with the original model alone. Radiation use efficiency and temperature-driven root dynamics are shown as dominant drivers of resilience, while projected yields vary by location, with some lower-productivity sites showing yield increases under future climate scenarios.
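
The screening step in the core claim can be illustrated with a small sketch. The rule used here, keeping a configuration only if its predicted yield stays at or above a threshold in every soil and scenario, is an assumption; the abstract does not state the paper's selection criterion. The function name `select_resilient`, the threshold, and the toy yields are all hypothetical.

```python
from typing import Dict, List

def select_resilient(yields: Dict[str, List[float]], threshold: float) -> List[str]:
    """Return config ids whose worst-case predicted yield meets the threshold.

    `yields` maps a trait-configuration id to its predicted yields, one value
    per (soil, scenario) condition. The worst-case rule is an assumed criterion.
    """
    return sorted(cid for cid, ys in yields.items() if min(ys) >= threshold)

# Toy predicted yields (t/ha) for three hypothetical configurations across
# four conditions; these numbers are illustrative, not taken from the paper.
toy = {
    "cfg_a": [11.2, 10.8, 11.5, 10.9],
    "cfg_b": [12.4, 9.1, 12.0, 11.8],  # highest mean, but fails one condition
    "cfg_c": [10.6, 10.7, 10.5, 10.9],
}
resilient = select_resilient(toy, threshold=10.5)
print(resilient)  # ['cfg_a', 'cfg_c']
```

A worst-case filter of this kind rejects `cfg_b` even though its mean yield is the highest, which is one way to read "maintain high yield across all conditions".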

What carries the argument

Probabilistic neural emulator of APSIM, which approximates the mechanistic crop model's processes across multiple outputs while estimating predictive uncertainty.
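
One way to read "estimating predictive uncertainty" is that the emulator returns a mean and a standard deviation for each output. Under that assumption (the abstract does not specify the form of the uncertainty estimate), a simple calibration check counts how often the reference value falls inside the nominal Gaussian interval. The helper `interval_coverage` and the toy numbers are hypothetical.

```python
from statistics import NormalDist

def interval_coverage(means, stds, targets, level=0.9):
    """Fraction of targets inside the central `level` Gaussian interval."""
    # Two-sided z-score for the requested level (about 1.645 for 90%).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(1 for m, s, y in zip(means, stds, targets) if abs(y - m) <= z * s)
    return hits / len(targets)

# Illustrative values only: predicted means and standard deviations
# alongside reference values for four cases.
means = [10.0, 11.0, 9.5, 12.0]
stds = [0.5, 0.4, 0.6, 0.3]
targets = [10.3, 11.9, 9.4, 12.2]
coverage = interval_coverage(means, stds, targets, level=0.9)
print(coverage)  # 0.75
```

For a well-calibrated emulator the empirical coverage should approach the nominal level as the number of cases grows; four cases are far too few to judge, and are used here only to keep the sketch short.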

Load-bearing premise

The neural emulator accurately reproduces the behavior of the full APSIM model for trait and environment combinations that were not included in its training data.

What would settle it

Running the original APSIM model on the 181 selected trait combinations under the future climate projections and comparing the resulting yields and growth metrics to those predicted by the emulator.
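
The check proposed above amounts to a paired comparison between emulator predictions and fresh APSIM runs on the same inputs. The following is a minimal sketch, assuming both are available as plain lists of yields for the selected combinations. The function `paired_agreement` and the toy values are hypothetical, and no APSIM call is made here.

```python
import math

def paired_agreement(emulated, reference, threshold):
    """Bias, RMSE, and share of selected cases the reference model confirms."""
    errors = [e - r for e, r in zip(emulated, reference)]
    bias = sum(errors) / len(errors)  # positive means the emulator is optimistic
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    # Share of cases whose reference yield still meets the selection threshold.
    confirmed = sum(1 for r in reference if r >= threshold) / len(reference)
    return bias, rmse, confirmed

# Illustrative yields (t/ha) for four emulator-selected combinations.
emulated = [11.0, 10.9, 11.4, 10.8]
reference = [10.7, 10.9, 10.3, 10.6]
bias, rmse, confirmed = paired_agreement(emulated, reference, threshold=10.5)
print(round(bias, 3), round(rmse, 3), confirmed)
```

A confirmed share well below 1.0, or a clearly positive bias on the selected set, would suggest the selection is exploiting emulator error rather than real APSIM behavior.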

Figures

Figures reproduced from arXiv: 2605.22848 by Baskar Ganapathysubramanian, Carlos Messina, Gustavo Visentini, Juan Panelo, Mojdeh Saadati, Soumik Sarkar.

Figure 1
Original abstract

Global food security depends on predicting crop responses to climate variability, yet process based crop models remain too computationally expensive for large scale exploration of genotype and environment interactions. Here we develop a probabilistic neural emulator of APSIM that reproduces key maize growth processes across 13 outputs with high fidelity (with R^2 of 0.93) while reducing simulation time by several orders of magnitude. Trained on two million simulations spanning diverse genetic, soil, and management conditions, and augmented with a convolutional synthetic weather generator that produces physically consistent climate sequences, the framework enables scalable exploration of crop responses under realistic and diverse environmental inputs while providing calibrated predictive uncertainty without costly Bayesian inference. Applying this framework across 100,000 trait configurations, six soil environments in Iowa and Illinois, and climate projections through the year 2100 under two emissions scenarios, we identify 181 maize trait combinations that consistently maintain high yield across all tested conditionsan analysis infeasible with the mechanistic model alone. We further show that radiation use efficiency and temperature driven root dynamics are dominant drivers of yield resilience. Notably, projected yield distributions vary substantially across locations, with some lower productivity sites exhibiting yield increases under future climate scenarios, indicating that climate change may reshape regional yield potential in nonintuitive ways. These results demonstrate how uncertainty aware emulation transforms mechanistic crop simulation from a computational bottleneck into an on demand discovery engine, one capable of interrogating the full genotype, environment and management space at a scale no process-based model can match.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a probabilistic neural emulator of the APSIM mechanistic maize model, trained on two million simulations spanning genetic, soil, and management conditions to reproduce 13 outputs with aggregate R²=0.93. Augmented by a convolutional synthetic weather generator, the emulator is applied to screen 100,000 trait configurations across six Iowa/Illinois soils and climate projections to 2100 under two emissions scenarios, identifying 181 trait combinations that maintain high yield across all conditions—an analysis stated to be infeasible with direct APSIM runs. The work further identifies radiation use efficiency and temperature-driven root dynamics as dominant resilience drivers and notes non-intuitive regional yield shifts under future climates.

Significance. If the emulator accurately reproduces APSIM behavior for the selected trait combinations and extrapolated climates, the framework would enable genotype-by-environment exploration at a scale that directly addresses computational bottlenecks in process-based crop modeling. The reported training scale (two million simulations) and screening volume (100,000 configurations) constitute a concrete strength in demonstrating feasible large-scale discovery. The probabilistic uncertainty quantification without Bayesian inference is a methodological contribution that could generalize to other mechanistic simulators.

major comments (2)
  1. [Abstract] Abstract: The central claim identifies 181 resilient trait combinations from emulator rankings on 100k configurations, yet the only reported fidelity metric is the aggregate R²=0.93 on the two-million-simulation training set. No held-out validation, per-trait error statistics, or out-of-distribution checks are described for the specific 181 selections or for the 2100 climate projections generated by the synthetic weather model. Because selection is performed precisely on emulator outputs, any systematic bias in those regions directly affects the reported resilient set.
  2. [Abstract] Abstract and implied Results section: The synthetic weather generator is used to produce climate sequences for 2100 under two emissions scenarios, but no quantitative assessment of its fidelity against observed or APSIM-validated future weather statistics is supplied. Error propagation from this generator into the yield rankings for the 181 combinations therefore remains unquantified, which is load-bearing for the resilience claim.
minor comments (2)
  1. [Abstract] Abstract: Typographical error 'conditionsan' should read 'conditions—an'.
  2. [Abstract] Abstract: The parenthetical '(with R^2 of 0.93)' repeats the preceding 'high fidelity' phrase; consider removing the redundancy for conciseness.
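
The first major comment turns on the difference between an aggregate score and per-output scores. The sketch below assumes held-out predictions are available per output and uses the standard coefficient of determination. It shows how a pooled value can mask a weak output. How the paper computed its aggregate R² is not stated in the abstract, so the pooling here is an assumption, and the output names and numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative held-out (reference, predicted) pairs for two outputs.
held_out = {
    "yield": ([8.0, 10.0, 12.0, 14.0], [8.2, 9.9, 12.1, 13.8]),
    "root_depth": ([1.0, 1.2, 1.4, 1.6], [1.3, 1.1, 1.6, 1.3]),
}
per_output = {name: r_squared(t, p) for name, (t, p) in held_out.items()}

# Pooling all outputs into one vector lets the large-scale output dominate.
pooled_true = [v for t, _ in held_out.values() for v in t]
pooled_pred = [v for _, p in held_out.values() for v in p]
pooled = r_squared(pooled_true, pooled_pred)
print(per_output, pooled)
```

In this toy case the pooled score is above 0.99 while the `root_depth` score is negative, which is why per-output statistics on a held-out partition are the more informative report.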

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify the validation requirements for our claims. We address each major comment below and will revise the manuscript accordingly to incorporate additional validation analyses.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim identifies 181 resilient trait combinations from emulator rankings on 100k configurations, yet the only reported fidelity metric is the aggregate R²=0.93 on the two-million-simulation training set. No held-out validation, per-trait error statistics, or out-of-distribution checks are described for the specific 181 selections or for the 2100 climate projections generated by the synthetic weather model. Because selection is performed precisely on emulator outputs, any systematic bias in those regions directly affects the reported resilient set.

    Authors: We agree that the aggregate R² on the training set alone is insufficient to fully support the selection of the 181 combinations. Although the two-million-simulation training corpus was constructed to span the relevant genetic, soil, and management parameter space, we acknowledge the absence of explicit held-out, per-trait, and out-of-distribution metrics for the screened configurations and future-climate projections. In the revised manuscript we will add (i) performance metrics on a held-out test partition, (ii) per-output error statistics and bias analysis, and (iii) targeted validation of emulator predictions for the 181 selected trait combinations under both historical and projected climate inputs. revision: yes

  2. Referee: [Abstract] Abstract and implied Results section: The synthetic weather generator is used to produce climate sequences for 2100 under two emissions scenarios, but no quantitative assessment of its fidelity against observed or APSIM-validated future weather statistics is supplied. Error propagation from this generator into the yield rankings for the 181 combinations therefore remains unquantified, which is load-bearing for the resilience claim.

    Authors: We accept that a quantitative fidelity assessment of the convolutional synthetic weather generator against future-climate statistics and an explicit error-propagation analysis are not reported. The generator was trained to produce physically consistent sequences, yet we agree that direct validation against climate-model outputs and sensitivity of the 181 yield rankings to weather-generator uncertainty are necessary. The revised manuscript will include quantitative fidelity metrics for the generated 2100 sequences and a sensitivity study quantifying how weather-generator variability affects the identification and ranking of the resilient trait set. revision: yes
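
The sensitivity study promised in the second response could take several forms. One simple option, sketched here as an assumption rather than the authors' planned method, is to repeat the selection under several weather-generator realizations and measure the overlap between the selected sets with the Jaccard index. The configuration ids are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Intersection over union of two sets; 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_jaccard(selections):
    """Average Jaccard index over all pairs of selected sets."""
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Selected sets from three hypothetical weather realizations.
runs = [
    {"cfg_a", "cfg_c", "cfg_d"},
    {"cfg_a", "cfg_c"},
    {"cfg_a", "cfg_d"},
]
stability = mean_pairwise_jaccard(runs)
print(round(stability, 3))  # 0.556
```

A value near 1.0 would indicate that the resilient set is insensitive to weather-generator variability; lower values indicate that membership depends on the particular sequences drawn.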

Circularity Check

0 steps flagged

No circularity: emulator trained on external APSIM data then applied forward to new trait configurations

Full rationale

The derivation consists of (1) training a neural emulator on two million APSIM simulations spanning genetic/soil/management inputs, (2) using the trained emulator plus a synthetic weather generator to evaluate 100,000 new trait configurations under future climates, and (3) ranking those outputs to select 181 resilient combinations. None of these steps reduces the final selection to the training inputs by construction; the emulator is an independent fitted model whose outputs on held-out trait space are not forced to match any particular ranking. The reported R^2=0.93 is an in-sample aggregate metric and does not define the downstream selection. No self-citation, uniqueness theorem, or ansatz is invoked to justify the core workflow. The analysis therefore remains a standard train-then-extrapolate procedure whose validity rests on out-of-distribution fidelity rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unverified assumption that the emulator generalizes beyond its two-million-simulation training set to the 100k trait configurations and future climates; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: The convolutional synthetic weather generator produces physically consistent climate sequences that match real variability.
    Abstract states the generator is used to augment training and produce future projections.



