
arxiv: 2507.08977 · v4 · submitted 2025-07-11 · 💻 cs.LG · cs.AI · stat.ML

Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery

Pith reviewed 2026-05-19 04:39 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML
keywords Simulation-Grounded Neural Networks · mechanistic pretraining · scientific forecasting · model misspecification · structural prior · back-to-simulation attribution · epidemiology · ecology

The pith

Neural networks pretrained on diverse mechanistic simulations outperform baselines and provide interpretability for scientific forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Simulation-Grounded Neural Networks that pretrain on synthetic data from multiple mechanistic model structures and noise levels to learn system dynamics as a structural prior. This avoids the bias of rigid functional constraints when equations are partially unknown. A sympathetic reader would care because the method improves forecasting in fields with limited real data while adding a way to trace predictions back to similar simulations for mechanistic insight. The evaluations show gains across epidemiology, ecology, social science, and chemistry, including robustness when the training simulations use incorrect assumptions.

Core claim

SGNNs incorporate scientific theory by using mechanistic simulations as training data for neural networks. By pretraining on diverse synthetic corpora that span multiple model structures and realistic observational noise, SGNNs internalize the underlying dynamics of a system as a structural prior. In forecasting tasks, SGNNs outperformed both standard data-driven baselines and physics-constrained hybrid models. They nearly tripled the forecasting skill of the average CDC models in COVID-19 mortality forecasts, accurately forecasted high-dimensional ecological systems, and remained effective under model misspecification. The framework also introduces back-to-simulation attribution for explaining real-world dynamics by identifying their most similar counterparts within the simulated corpus.
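The "forecasting skill" figures quoted on this page are relative scores, but this excerpt does not define the metric. A minimal sketch of one common convention, assumed here rather than taken from the paper, is skill = 1 - error(model) / error(baseline). The function name `skill_score`, the use of mean absolute error, and all numbers below are illustrative assumptions, not the authors' evaluation code or results.

```python
import numpy as np

def skill_score(y_true, y_model, y_baseline):
    """Relative skill: 1 - MAE(model) / MAE(baseline).

    Assumed definition for illustration; the paper may use a different
    error measure or reference forecast.
    """
    y_true = np.asarray(y_true, dtype=float)
    mae_model = np.mean(np.abs(np.asarray(y_model, dtype=float) - y_true))
    mae_base = np.mean(np.abs(np.asarray(y_baseline, dtype=float) - y_true))
    if mae_base == 0:
        raise ValueError("baseline error is zero; skill is undefined")
    return 1.0 - mae_model / mae_base

# Toy numbers, not results from the paper.
truth = [10.0, 12.0, 15.0, 19.0]
naive = [8.0, 10.0, 12.0, 15.0]   # hypothetical reference forecast
model = [9.0, 11.5, 14.0, 18.0]   # hypothetical model forecast
toy_skill = skill_score(truth, model, naive)
```

Under this convention a skill of 0 means the model is no better than the reference forecast, and a skill of 1 means a perfect forecast.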

What carries the argument

Simulation-Grounded Neural Networks (SGNNs) pretrained on synthetic corpora spanning multiple model structures and observational noise to internalize dynamics as a structural prior.
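To make "synthetic corpora spanning multiple model structures and observational noise" concrete, here is a minimal sketch under stated assumptions. It draws each training series from either a discrete-time SIR or SEIR model and then applies under-reporting plus Poisson count noise. The parameter ranges, population size, noise model, and every function name are hypothetical; the paper's simulator is described as far more general.

```python
import numpy as np

def simulate_sir(beta, gamma, n_days, n_pop=1_000_000, i0=10):
    """Discrete-time SIR; returns daily new infections."""
    s, i = float(n_pop - i0), float(i0)
    incidence = np.empty(n_days)
    for t in range(n_days):
        new_inf = min(beta * s * i / n_pop, s)  # cannot infect more than S
        new_rec = gamma * i
        s -= new_inf
        i += new_inf - new_rec
        incidence[t] = new_inf
    return incidence

def simulate_seir(beta, sigma, gamma, n_days, n_pop=1_000_000, i0=10):
    """Discrete-time SEIR; returns daily new infectious cases (E -> I)."""
    s, e, i = float(n_pop - i0), 0.0, float(i0)
    incidence = np.empty(n_days)
    for t in range(n_days):
        new_exp = min(beta * s * i / n_pop, s)
        new_inf = sigma * e
        new_rec = gamma * i
        s -= new_exp
        e += new_exp - new_inf
        i += new_inf - new_rec
        incidence[t] = new_inf
    return incidence

def observe(incidence, report_rate, rng):
    """Observation model (assumed): under-reporting, then Poisson noise."""
    return rng.poisson(report_rate * incidence)

def build_corpus(n_series, n_days, seed=0):
    """Sample structure, parameters, and noise level for each series."""
    rng = np.random.default_rng(seed)
    series, labels = [], []
    for _ in range(n_series):
        beta = rng.uniform(0.2, 0.8)    # hypothetical range
        gamma = rng.uniform(0.1, 0.3)   # hypothetical range
        if rng.random() < 0.5:
            inc, name = simulate_sir(beta, gamma, n_days), "SIR"
        else:
            sigma = rng.uniform(0.1, 0.5)
            inc, name = simulate_seir(beta, sigma, gamma, n_days), "SEIR"
        series.append(observe(inc, rng.uniform(0.1, 1.0), rng))
        labels.append({"structure": name, "beta": beta, "gamma": gamma})
    return np.stack(series), labels

corpus, labels = build_corpus(n_series=64, n_days=60)
```

Keeping the generating structure and parameters alongside each series is a design choice of this sketch: it is what would later allow a prediction to be traced back to the simulations it most resembles.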

Load-bearing premise

That pretraining on simulations spanning multiple model structures and observational noise levels will produce a structural prior that transfers usefully to real data without introducing new biases from the choice of simulation ensemble.

What would settle it

A direct comparison on held-out real data where SGNNs trained on one ensemble of simulations fail to outperform baselines once the true dynamics or noise profile lies outside the pretraining distribution.

Figures

Figures reproduced from arXiv: 2507.08977 by Carson Dudley, Christopher Harding, Marisa Eisenberg, Reiden Magdaleno.

Figure 1. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Figure 2. SGNNs outperform state-of-the-art baselines in real-world disease forecasting and reveal the importance of mechanistic grounding. (A) SGNNs achieve 35.3% forecasting skill on early COVID-19 mortality, almost tripling the CDC Forecast Hub median and exceeding its best model, despite using no real COVID-19 data. (B) SGNNs produce accurate forecasts with calibrated uncertainty across diverse real-world locatio…
Figure 3. SGNNs generalize across scientific domains and task types. (A) Ecological forecasting: SGNNs outperform task-specific neural networks on both low-dimensional predator-prey systems (hare and lynx) and high-dimensional multispecies forecasting from the UK Butterfly Monitoring Scheme. SGNNs maintain forecasting skill as the number of species increases, while baselines degrade sharply. (B) Chemical yield predi…
Figure 4. SGNNs accurately infer unobservable parameters and provide mechanistic interpretability via back-to-simulation attribution. (A) SGNNs infer a high reproduction number (R0 = 6.14) for New York City from early COVID-19 case data (Feb–Mar 2020), aligning with estimates from more complete datasets. Traditional methods underestimated early transmission due to underreporting and simplifying assumptions. (B) SGN…
Figure 5. SGNNs outperform classical models in forecasting lynx-hare predator-prey dynamics. Rolling forecasts are shown for four evaluation windows, comparing SGNN (red) to mechanistic models (RMG, blue) and VARMA (green) models. True population trajectories are plotted in black. SGNNs consistently maintain accurate phase and amplitude tracking, while baselines deteriorate.
Original abstract

Scientific modeling faces a tradeoff between the interpretability of mechanistic theory and the predictive power of machine learning. While existing hybrid approaches have made progress by incorporating domain knowledge into machine learning methods as functional constraints, they can be limited by a reliance on precise mathematical specifications. When the underlying equations are partially unknown or misspecified, enforcing rigid constraints can introduce bias and hinder a model's ability to learn from data. We introduce Simulation-Grounded Neural Networks (SGNNs), a framework that incorporates scientific theory by using mechanistic simulations as training data for neural networks. By pretraining on diverse synthetic corpora that span multiple model structures and realistic observational noise, SGNNs internalize the underlying dynamics of a system as a structural prior. We evaluated SGNNs across multiple disciplines, including epidemiology, ecology, social science, and chemistry. In forecasting tasks, SGNNs outperformed both standard data-driven baselines and physics-constrained hybrid models. They nearly tripled the forecasting skill of the average CDC models in COVID-19 mortality forecasts and accurately forecasted high-dimensional ecological systems. SGNNs demonstrated robustness to model misspecification, performing well even when trained on data with incorrect assumptions. Our framework also introduces back-to-simulation attribution, a method for mechanistic interpretability that explains real-world dynamics by identifying their most similar counterparts within the simulated corpus. By unifying these techniques into a single framework, we demonstrate that diverse mechanistic simulations can serve as effective training data for robust scientific inference.
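Back-to-simulation attribution, as the abstract describes it, explains an observed series by finding its most similar counterparts in the simulated corpus. The sketch below illustrates that retrieval step only. The similarity measure is an assumption: it uses cosine similarity on centred, unit-norm raw series as a stand-in, whereas the paper presumably compares learned representations. The names `embed` and `attribute` and the toy corpus are hypothetical.

```python
import numpy as np

def embed(series):
    """Stand-in embedding (assumption): centre each series and scale to unit norm.

    A trained network's hidden representation would presumably go here.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean(axis=-1, keepdims=True)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.where(norm == 0, 1.0, norm)

def attribute(query, corpus, metadata, k=3):
    """Return the k simulations most similar to the query, with their metadata."""
    q = embed(query)      # shape (T,)
    c = embed(corpus)     # shape (N, T)
    sims = c @ q          # cosine similarity, since rows are unit norm
    top = np.argsort(-sims)[:k]
    return [(metadata[i], float(sims[i])) for i in top]

# Toy corpus with known generating "mechanisms" (illustrative only).
t = np.arange(20)
toy_corpus = np.stack([np.exp(0.2 * t), np.exp(-0.2 * t), np.sin(t)])
toy_meta = [{"structure": "growth"}, {"structure": "decay"}, {"structure": "oscillation"}]

# A rescaled, shifted growth curve should retrieve the growth simulation first.
matches = attribute(2.0 * np.exp(0.2 * t) + 1.0, toy_corpus, toy_meta, k=2)
```

The metadata returned with each match (model structure, parameters) is what carries the mechanistic interpretation; the similarity score only ranks candidates.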

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Simulation-Grounded Neural Networks (SGNNs), which pretrain neural networks on diverse mechanistic simulations spanning multiple model structures and observational noise levels to internalize system dynamics as a transferable structural prior. It evaluates the approach on forecasting tasks in epidemiology, ecology, social science, and chemistry, claiming that SGNNs outperform standard data-driven baselines and physics-constrained hybrid models, nearly triple the forecasting skill of average CDC models on COVID-19 mortality, remain effective under model misspecification, and enable mechanistic interpretability via back-to-simulation attribution.

Significance. If the empirical results hold under rigorous controls for ensemble construction and evaluation protocols, the framework offers a promising route to incorporate mechanistic knowledge flexibly without rigid equation constraints, potentially improving robustness in scientific ML applications across disciplines. The multi-domain evaluation and interpretability component add value if substantiated.

major comments (2)
  1. [Abstract and Experiments] The abstract and evaluation sections state that SGNNs nearly tripled CDC forecasting skill and outperformed baselines while remaining robust to misspecification, but supply no details on exact baselines, statistical tests, data splits, or how misspecification was implemented. This absence makes the central performance and robustness claims difficult to assess or reproduce.
  2. [Methods (Simulation Generation)] The description of the simulation corpus (spanning multiple model structures and noise levels) does not provide an explicit protocol demonstrating that ensemble selection was fixed independently of the target real data or that performance holds when relevant structures are deliberately omitted. Without this, the transfer of an unbiased structural prior cannot be distinguished from implicit leakage of domain knowledge about the target system.
minor comments (2)
  1. [Methods] Clarify the precise neural architecture, pretraining objective, and fine-tuning procedure for SGNNs, including any hyperparameters shared across domains.
  2. [Results] Add error bars, confidence intervals, or significance tests to all forecasting comparison figures and tables to support claims of outperformance.
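One way to supply the intervals requested in minor comment 2 is a paired bootstrap over evaluation windows. This is a minimal sketch, not the procedure used in the paper: the function name `paired_bootstrap_ci`, the percentile method, and the toy error values are all assumptions for illustration.

```python
import numpy as np

def paired_bootstrap_ci(err_a, err_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(err_a - err_b) over paired evaluation windows.

    err_a and err_b are per-window errors of two models on the same windows.
    """
    diff = np.asarray(err_a, dtype=float) - np.asarray(err_b, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample windows with replacement, keeping the pairing intact.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), lo, hi

# Hypothetical per-window absolute errors, not results from the paper.
baseline_err = [3.0, 2.5, 4.0, 3.5, 2.0, 3.0]
model_err = [2.0, 2.0, 3.0, 2.5, 1.5, 2.5]
mean_gain, ci_lo, ci_hi = paired_bootstrap_ci(baseline_err, model_err)
```

An interval that excludes zero suggests the error reduction is not an artifact of which windows happened to be evaluated. With few or strongly autocorrelated windows, a block bootstrap would be a safer choice.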

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments, which help strengthen the clarity and reproducibility of our work. We address each major comment below and have made revisions to the manuscript to provide the requested details on experimental protocols and simulation generation.

Point-by-point responses
  1. Referee: [Abstract and Experiments] The abstract and evaluation sections state that SGNNs nearly tripled CDC forecasting skill and outperformed baselines while remaining robust to misspecification, but supply no details on exact baselines, statistical tests, data splits, or how misspecification was implemented. This absence makes the central performance and robustness claims difficult to assess or reproduce.

    Authors: We agree that the original manuscript would benefit from greater specificity to support reproducibility. In the revised version, we have expanded the Experiments and Evaluation sections to explicitly list all baselines (including the precise data-driven models such as ARIMA, LSTM, and Transformer variants, as well as the hybrid physics-constrained models), report the statistical tests used (paired t-tests with Bonferroni correction and reported p-values), detail the data splitting protocol (temporal hold-out splits with fixed seed for forecasting horizons), and describe the misspecification implementation (e.g., training on simulations with deliberately omitted compartments or altered transmission rates). These additions are now included in the main text and supplementary materials. revision: yes

  2. Referee: [Methods (Simulation Generation)] The description of the simulation corpus (spanning multiple model structures and noise levels) does not provide an explicit protocol demonstrating that ensemble selection was fixed independently of the target real data or that performance holds when relevant structures are deliberately omitted. Without this, the transfer of an unbiased structural prior cannot be distinguished from implicit leakage of domain knowledge about the target system.

    Authors: We appreciate this point on ensuring independence to rule out leakage. The simulation corpus was generated from a fixed library of mechanistic models (SIR, SEIR, Lotka-Volterra, and others) and noise levels chosen a priori based on literature ranges, prior to accessing any real-world datasets. To address the concern directly, the revised Methods section now includes an explicit protocol subsection documenting the independent ensemble construction process, including the exact model structures, parameter ranges, and noise models used. We have also added ablation experiments demonstrating that performance remains strong even when structures most similar to the target system are deliberately omitted from the pretraining corpus, supporting the claim of a transferable structural prior rather than leakage. revision: yes
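The ablation described in response 2, omitting the structures closest to the target system from the pretraining corpus, reduces to a filter on the simulation metadata. The sketch below shows that filtering step only, assuming each simulated series carries a `structure` label. The function name, the label schema, and the toy corpus are hypothetical, and pretraining and evaluation are out of scope here.

```python
def leave_structure_out(corpus, labels, held_out):
    """Drop every simulated series whose generating structure is in held_out."""
    held_out = set(held_out)
    keep = [i for i, lab in enumerate(labels) if lab["structure"] not in held_out]
    return [corpus[i] for i in keep], [labels[i] for i in keep]

# Toy corpus: placeholder series tagged with the model that produced them.
toy_series = [[1, 2, 4], [1, 1, 2], [5, 3, 4], [2, 4, 8]]
toy_labels = [
    {"structure": "SIR"},
    {"structure": "SEIR"},
    {"structure": "Lotka-Volterra"},
    {"structure": "SIR"},
]

# Ablated corpus for an epidemic target: remove both compartmental structures.
ablated_series, ablated_labels = leave_structure_out(
    toy_series, toy_labels, held_out=["SIR", "SEIR"]
)
```

Comparing a model pretrained on the full corpus with one pretrained on the ablated corpus, on the same held-out real data, is what would separate a transferable prior from leakage of target-specific structure.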

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's core derivation uses externally generated mechanistic simulations (spanning multiple model structures and noise levels) as pretraining data to induce a structural prior in SGNNs, followed by transfer to real-world forecasting tasks. This chain does not reduce to self-definition, fitted inputs renamed as predictions, or self-citation load-bearing steps; the simulations are constructed from theoretical models independent of the target real data, and performance is assessed via direct comparison to baselines on held-out observations. No equations or protocols in the provided text exhibit the target result being presupposed in the input ensemble construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract does not introduce or rely on explicit free parameters, new axioms, or invented entities beyond standard neural network training and existing mechanistic simulators.

pith-pipeline@v0.9.0 · 5796 in / 1079 out tokens · 63288 ms · 2026-05-19T04:39:18.106704+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mantis: A Foundation Model for Mechanistic Disease Forecasting

    cs.AI 2025-08 unverdicted novelty 7.0

    A foundation model trained only on disease simulations achieves top-ranked forecasting accuracy across 16 diseases and beats all CDC COVID-19 hub models on early unseen pandemic data.

  2. In-Context Learning Under Regime Change

    cs.LG 2026-04 unverdicted novelty 6.0

    Transformers can solve in-context change-point detection with model size scaling by knowledge of the shift timing, matching optimal baselines on synthetic data and improving pretrained models on disease and financial ...
