
arXiv: 2606.07865 · v1 · pith:SSGZXGFOnew · submitted 2026-06-05 · 💻 cs.LG · cs.AI · physics.comp-ph · stat.ML

Instrumented data for causal scientific machine learning

Pith reviewed 2026-06-27 22:22 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · physics.comp-ph · stat.ML
keywords instrumented data · causal scientific machine learning · mechanistic models · counterfactuals · uncertainty quantification · image-to-simulation · verification and validation · scientific machine learning

The pith

Every training datum can embed its own mechanistic model, uncertainty estimate, and editable counterfactuals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Scientific machine learning is bottlenecked by data that either records outcomes without causes or uses generic templates instead of case-specific processes. The paper proposes instrumented data as a third option, in which each example carries the mechanistic model that generated it, an explicit uncertainty, and a family of executable counterfactuals. If realized through verification-and-validation image-to-simulation pipelines, this would let models train on a substrate that is case-specific, mechanistically supervised, and open to causal interventions. Near-term uses include improved validation, auditing, and surrogate construction in biology, climate, materials, fluids, and medical imaging, with longer-term reach toward foundation models for scientific reasoning.

Core claim

The paper claims that instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals, is now operationally feasible. Verification-and-validation instrumented image-to-simulation pipelines realize this by turning a sensor observation into a fully specified, solver-backed simulation with explicit, editable parameters and propagated aleatoric and epistemic uncertainty. The resulting data substrate is case-specific and mechanistically supervised, supports causal interventions through the do-operator, and enables near-term advances in validation, auditing, and surrogate training across co

What carries the argument

Instrumented data, the structure that attaches to each datum its generating mechanistic model, explicit uncertainty, and executable family of counterfactuals.
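To make the structure concrete, here is a minimal sketch of what one instrumented datum could look like in code. Nothing here comes from the paper: the class name `InstrumentedDatum`, its fields, the `counterfactual` method, and the toy Newton's-law-of-cooling model are all hypothetical stand-ins chosen for illustration. A real pipeline would wrap a full solver rather than a one-line formula.

```python
from dataclasses import dataclass
from typing import Callable, Dict
import math


@dataclass(frozen=True)
class InstrumentedDatum:
    """Hypothetical container: observation + generating model + uncertainty."""
    observation: float                            # what the sensor recorded
    model: Callable[[Dict[str, float]], float]    # mechanistic model: params -> prediction
    params: Dict[str, float]                      # explicit, editable parameters
    param_std: Dict[str, float]                   # stated uncertainty over some parameters

    def predict(self) -> float:
        """Run the attached model at the fitted parameters."""
        return self.model(self.params)

    def counterfactual(self, **interventions: float) -> float:
        """Re-run the model with some parameters overridden, leaving the datum unchanged."""
        unknown = set(interventions) - set(self.params)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        return self.model({**self.params, **interventions})


def cooling(p: Dict[str, float]) -> float:
    """Toy mechanistic model (assumption): Newton's law of cooling at time t."""
    return p["T_env"] + (p["T0"] - p["T_env"]) * math.exp(-p["k"] * p["t"])


datum = InstrumentedDatum(
    observation=46.1,
    model=cooling,
    params={"T_env": 20.0, "T0": 90.0, "k": 0.1, "t": 10.0},
    param_std={"k": 0.01},
)

factual = datum.predict()                  # model output at the fitted parameters
no_cooling = datum.counterfactual(k=0.0)   # "what if there were no heat loss?"
```

The datum is frozen, and `counterfactual` builds a new parameter dictionary for each call, so edits never overwrite the factual record. That separation is one plausible way to keep the observed case and its counterfactual family distinct.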

If this is right

  • Training data becomes open to direct causal interventions via the do-operator.
  • Validation and auditing gain mechanistic traceability in biology, climate, materials, fluids, and medical imaging.
  • Surrogate models can be trained on case-specific, mechanistically supervised examples rather than generic templates.
  • A longer-term path opens toward foundation models whose scientific reasoning can be falsified against embedded models.
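The first bullet can be stated a little more precisely. As a sketch in notation that is ours rather than the paper's, suppose a datum's simulation induces a conditional distribution $p(y \mid \theta)$ over outputs $y$ given parameters $\theta = (\theta_i, \theta_{-i})$, and assume the remaining parameters $\theta_{-i}$ are not causally downstream of $\theta_i$. Intervening on $\theta_i$ then differs from conditioning on it only in which distribution is used for $\theta_{-i}$:

```latex
\begin{align}
P\bigl(y \mid \mathrm{do}(\theta_i = \theta_i^{*})\bigr)
  &= \int p\bigl(y \mid \theta_i^{*}, \theta_{-i}\bigr)\,
     p(\theta_{-i})\, \mathrm{d}\theta_{-i}, \\
P\bigl(y \mid \theta_i = \theta_i^{*}\bigr)
  &= \int p\bigl(y \mid \theta_i^{*}, \theta_{-i}\bigr)\,
     p(\theta_{-i} \mid \theta_i^{*})\, \mathrm{d}\theta_{-i}.
\end{align}
```

The two expressions coincide when the parameters are independent. They differ when the fitted parameters are correlated, which is the case where an executable model with editable parameters adds something that conditioning on observational data cannot.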

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Existing sensor streams could be retrofitted into instrumented form without requiring entirely new data collection.
  • The approach may reduce the volume of data needed for reliable scientific models by increasing the information density per example.
  • Uncertainty propagation through the pipeline creates a natural test bed for comparing epistemic and aleatoric contributions in downstream tasks.
  • Adoption would require solver interfaces that expose parameters and uncertainties in standardized, machine-readable form.

Load-bearing premise

Verification-and-validation instrumented image-to-simulation pipelines are operationally feasible and can be realized at scale for case-specific mechanistic supervision.

What would settle it

A concrete demonstration that image-to-simulation pipelines cannot propagate uncertainty or support editable parameters at usable scale in a target domain, such as medical imaging, would falsify the operational-feasibility claim.

Figures

Figures reproduced from arXiv: 2606.07865 by Daniel N. Wilke.

Figure 1: The instrumented-data loop. A sensor observation is converted by a V&V instrumen
Figure 2: When can a mature instrumentation pipeline act as a peer reviewer?
Original abstract

Scientific machine learning is limited less by model size than by the data it is trained on. Observational data records what happened but not why; template synthetic data has a known generating process but only for the simulator's template, not the case a user faces. We argue a third option is now operationally feasible: instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&V) instrumented image-to-simulation pipelines are one realisation: a sensor observation becomes a fully specified, solver-backed simulation with explicit, editable parameters and a propagated aleatoric/epistemic uncertainty. The substrate is case-specific, mechanistically supervised, and supports causal interventions through Pearl's do-operator. Near-term consequences for validation, auditing, and surrogate training span computational biology, climate, materials, fluid mechanics, and medical imaging; a longer-term, falsifiable implication concerns foundation models for scientific reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes 'instrumented data' as a third data paradigm for causal scientific machine learning, beyond observational data (which records outcomes without mechanisms) and template synthetic data (tied to a fixed simulator template). Each instrumented datum embeds the mechanistic model that produced it, explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&V) instrumented image-to-simulation pipelines are presented as one realization that converts sensor observations into solver-backed simulations with editable parameters, propagated aleatoric/epistemic uncertainty, and support for Pearl's do-operator interventions. Near-term uses include validation, auditing, and surrogate training across computational biology, climate, materials, fluid mechanics, and medical imaging; a longer-term implication is foundation models for scientific reasoning.

Significance. If the feasibility claim holds, instrumented data could supply mechanistically supervised training signals that enable causal reasoning and case-specific validation at scale, addressing a core bottleneck in scientific ML. The vision of falsifiable implications for scientific foundation models is high-potential. The manuscript, however, is a position paper whose central assertion of operational feasibility rests on an unshown engineering substrate rather than any derivation, pipeline, or demonstration.

major comments (2)
  1. The abstract's claim that 'V&V instrumented image-to-simulation pipelines are one realisation' and are 'now operationally feasible' at scale is load-bearing for the entire argument yet supplies no workflow, solver integration example, uncertainty propagation derivation, runtime/memory scaling data, or domain-specific instance (e.g., biology or fluids).
  2. No concrete mechanism is given for how the 'executable family of counterfactuals' would be realized while preserving editability and propagating both aleatoric and epistemic uncertainty through the image-to-simulation step.
minor comments (1)
  1. The abstract would be strengthened by a brief comparison to related concepts such as digital twins or physics-informed surrogates to clarify novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and for recognizing the potential of instrumented data. The manuscript is a position paper whose goal is to articulate a new data paradigm rather than to deliver an engineering implementation or scaling study. We address the two major comments below and will revise the text to better distinguish conceptual claims from the supporting substrate drawn from existing V&V practice.

Point-by-point responses
  1. Referee: The abstract's claim that 'V&V instrumented image-to-simulation pipelines are one realisation' and are 'now operationally feasible' at scale is load-bearing for the entire argument yet supplies no workflow, solver integration example, uncertainty propagation derivation, runtime/memory scaling data, or domain-specific instance (e.g., biology or fluids).

    Authors: We agree that the manuscript supplies none of the requested technical artifacts. As a position paper the central claim is that the required substrate (case-specific V&V pipelines that already embed solvers, uncertainty quantification, and editable parameters) exists in multiple scientific domains and can be repurposed for causal training. We will revise the abstract and introduction to cite representative published pipelines in medical imaging, computational fluid dynamics, and materials science that already perform image-to-simulation with propagated uncertainty, thereby grounding the feasibility statement without adding new derivations or benchmarks. revision: yes

  2. Referee: No concrete mechanism is given for how the 'executable family of counterfactuals' would be realized while preserving editability and propagating both aleatoric and epistemic uncertainty through the image-to-simulation step.

    Authors: The manuscript does not supply a concrete mechanism or derivation. The intended realization is the standard practice, already routine in V&V, of exposing solver parameters as editable inputs and applying established uncertainty-quantification techniques (Monte-Carlo sampling for aleatoric, Bayesian or ensemble methods for epistemic) to the resulting simulation ensemble. We will add a short clarifying paragraph that points to this existing practice rather than claiming a novel algorithmic contribution. revision: yes
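The practice the simulated authors point to can be illustrated with a small nested-sampling sketch. This is not the paper's method: it swaps in a toy linear "solver" and separates the two uncertainty types with the law of total variance, using plain sampling for both the outer (epistemic, over the parameter `k`) and inner (aleatoric, over measurement noise) loops. The function names and default values are assumptions made for illustration.

```python
import random
import statistics


def simulate(k: float, noise_std: float, rng: random.Random) -> float:
    """Toy stand-in for a solver: linear response plus additive noise."""
    return 2.0 * k + rng.gauss(0.0, noise_std)


def decompose_variance(k_mean, k_std, noise_std, n_outer=500, n_inner=200, seed=0):
    """Estimate aleatoric and epistemic variance by nested Monte Carlo.

    aleatoric ~ E_k[ Var(y | k) ]   (average spread at fixed parameters)
    epistemic ~ Var_k( E[y | k] )   (spread of the mean across parameter draws)
    """
    rng = random.Random(seed)
    inner_means, inner_vars = [], []
    for _ in range(n_outer):
        k = rng.gauss(k_mean, k_std)  # one plausible parameter value
        ys = [simulate(k, noise_std, rng) for _ in range(n_inner)]
        inner_means.append(statistics.fmean(ys))
        inner_vars.append(statistics.variance(ys))
    aleatoric = statistics.fmean(inner_vars)
    epistemic = statistics.variance(inner_means)
    return aleatoric, epistemic


aleatoric, epistemic = decompose_variance(k_mean=1.0, k_std=0.5, noise_std=0.3)
```

For this toy model the exact values are `noise_std**2 = 0.09` for the aleatoric part and `(2.0 * k_std)**2 = 1.0` for the epistemic part. The epistemic estimate carries a small upward bias of about `noise_std**2 / n_inner` because each inner mean is itself noisy. With an expensive solver the inner loop is the costly step, which is where the referee's request for runtime scaling data becomes relevant.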

Circularity Check

0 steps flagged

No circularity: conceptual position paper with no derivations or fitted quantities

full rationale

The manuscript is a conceptual proposal advocating instrumented data and V&V pipelines. It contains no equations, no parameter fitting, no derivation chain, and no self-citations used to justify core claims. The feasibility assertion is stated as a premise rather than derived from any prior result by the same authors. All load-bearing steps are therefore external to any self-referential construction, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The claim rests on the unproven operational feasibility of instrumented pipelines and the utility of attaching mechanistic models to data; no free parameters, standard axioms, or invented physical entities are introduced.

invented entities (1)
  • instrumented data (no independent evidence)
    purpose: A new data substrate that carries mechanistic models, uncertainty, and counterfactuals for causal supervision
    Introduced in the abstract as a third option without reference to prior independent evidence or implementation.

pith-pipeline@v0.9.1-grok · 5700 in / 1159 out tokens · 24567 ms · 2026-06-27T22:22:43.733229+00:00 · methodology

