Instrumented data for causal scientific machine learning

Daniel N. Wilke

arxiv: 2606.07865 · v1 · pith:SSGZXGFOnew · submitted 2026-06-05 · 💻 cs.LG · cs.AI · physics.comp-ph · stat.ML

Pith reviewed 2026-06-27 22:22 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · physics.comp-ph · stat.ML

keywords instrumented data · causal scientific machine learning · mechanistic models · counterfactuals · uncertainty quantification · image-to-simulation · verification and validation · scientific machine learning

The pith

Every training datum can embed its own mechanistic model, uncertainty estimate, and editable counterfactuals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Scientific machine learning is bottlenecked by data that either records outcomes without causes or uses generic templates instead of case-specific processes. The paper proposes instrumented data as a third option where each example carries the mechanistic model that generated it, an explicit uncertainty, and a family of executable counterfactuals. If realized through verification-and-validation image-to-simulation pipelines, this would let models train on substrate that is case-specific, mechanistically supervised, and open to causal interventions. Near-term uses include improved validation, auditing, and surrogate construction in biology, climate, materials, fluids, and medical imaging, with longer-term reach toward foundation models for scientific reasoning.

Core claim

The paper claims that instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals, is now operationally feasible. Verification-and-validation instrumented image-to-simulation pipelines realize this by turning a sensor observation into a fully specified, solver-backed simulation with explicit, editable parameters and propagated aleatoric and epistemic uncertainty. The resulting data substrate is case-specific and mechanistically supervised, supports causal interventions through the do-operator, and enables near-term advances in validation, auditing, and surrogate training across computational biology, climate, materials, fluid mechanics, and medical imaging.

What carries the argument

Instrumented data, the structure that attaches to each datum its generating mechanistic model, explicit uncertainty, and executable family of counterfactuals.
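One way to picture this structure is as a small record type. The sketch below is illustrative only: the class name `InstrumentedDatum`, its fields, the `do` method, and the toy heat-conduction model are all assumptions made for this example, not constructs taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


# Hypothetical structure: none of these names come from the paper.
@dataclass(frozen=True)
class InstrumentedDatum:
    observation: float                          # what the sensor recorded
    model: Callable[[Dict[str, float]], float]  # mechanistic model: parameters -> prediction
    params: Dict[str, float]                    # nominal, editable parameter values
    param_std: Dict[str, float]                 # explicit uncertainty (std. dev.) per parameter

    def predict(self) -> float:
        """Run the attached mechanistic model at the current parameters."""
        return self.model(self.params)

    def do(self, **interventions: float) -> "InstrumentedDatum":
        """Return a counterfactual datum with some parameters set by intervention."""
        unknown = set(interventions) - set(self.params)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        return replace(self, params={**self.params, **interventions})


# Toy mechanistic model (assumed for illustration): steady 1D heat conduction,
# flux = conductivity * temperature_drop / thickness.
def heat_flux(p: Dict[str, float]) -> float:
    return p["conductivity"] * p["temperature_drop"] / p["thickness"]


datum = InstrumentedDatum(
    observation=100.0,
    model=heat_flux,
    params={"conductivity": 0.5, "temperature_drop": 20.0, "thickness": 0.1},
    param_std={"conductivity": 0.05, "temperature_drop": 1.0, "thickness": 0.005},
)

# "What if the wall were twice as thick?" The original datum is left unchanged.
counterfactual = datum.do(thickness=0.2)
print(datum.predict(), counterfactual.predict())
```

Making the record immutable and having `do` return a new record is a design choice of this sketch: the factual datum and each counterfactual stay separately addressable, which is what an "executable family of counterfactuals" would seem to require.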

If this is right

Training data becomes open to direct causal interventions via the do-operator.
Validation and auditing gain mechanistic traceability in biology, climate, materials, fluids, and medical imaging.
Surrogate models can be trained on case-specific, mechanistically supervised examples rather than generic templates.
A longer-term path opens toward foundation models whose scientific reasoning can be falsified against embedded models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Existing sensor streams could be retrofitted into instrumented form without requiring entirely new data collection.
The approach may reduce the volume of data needed for reliable scientific models by increasing the information density per example.
Uncertainty propagation through the pipeline creates a natural test bed for comparing epistemic and aleatoric contributions in downstream tasks.
Adoption would require solver interfaces that expose parameters and uncertainties in standardized, machine-readable form.
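As a rough illustration of what a standardized, machine-readable parameter interface could look like, the sketch below serializes a parameter specification to JSON and validates it on the way back in. The schema, the field names, and the solver label `toy-heat-1d` are hypothetical; no existing standard or solver API is implied.

```python
import json

# Hypothetical schema: field names are illustrative, not a standard.
REQUIRED_FIELDS = {"name", "value", "unit", "std", "editable"}

spec = {
    "solver": "toy-heat-1d",
    "parameters": [
        {"name": "conductivity", "value": 0.5, "unit": "W/(m*K)", "std": 0.05, "editable": True},
        {"name": "thickness", "value": 0.1, "unit": "m", "std": 0.005, "editable": True},
        {"name": "mesh_size", "value": 0.01, "unit": "m", "std": 0.0, "editable": False},
    ],
}


def validate(spec: dict) -> None:
    """Raise ValueError if a parameter entry is incomplete or has a negative std."""
    for entry in spec["parameters"]:
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            raise ValueError(f"{entry.get('name', '?')}: missing {sorted(missing)}")
        if entry["std"] < 0:
            raise ValueError(f"{entry['name']}: std must be non-negative")


def editable_names(spec: dict) -> list:
    """List the parameters a downstream user is allowed to intervene on."""
    return [entry["name"] for entry in spec["parameters"] if entry["editable"]]


# Round-trip through JSON, as a consumer of the datum would.
payload = json.dumps(spec, sort_keys=True)
restored = json.loads(payload)
validate(restored)
print(editable_names(restored))
```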

Load-bearing premise

Verification-and-validation instrumented image-to-simulation pipelines are operationally feasible and can be realized at scale for case-specific mechanistic supervision.

What would settle it

A concrete demonstration that image-to-simulation pipelines cannot propagate uncertainty or support editable parameters at usable scale in a target domain such as medical imaging would falsify the operational-feasibility claim.

Figures

Figures reproduced from arXiv: 2606.07865 by Daniel N. Wilke.

**Figure 2.** When can a mature instrumentation pipeline act as a peer reviewer?

Original abstract

Scientific machine learning is limited less by model size than by the data it is trained on. Observational data records what happened but not why; template synthetic data has a known generating process but only for the simulator's template, not the case a user faces. We argue a third option is now operationally feasible: instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&V) instrumented image-to-simulation pipelines are one realisation: a sensor observation becomes a fully specified, solver-backed simulation with explicit, editable parameters and a propagated aleatoric/epistemic uncertainty. The substrate is case-specific, mechanistically supervised, and supports causal interventions through Pearl's do-operator. Near-term consequences for validation, auditing, and surrogate training span computational biology, climate, materials, fluid mechanics, and medical imaging; a longer-term, falsifiable implication concerns foundation models for scientific reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Position paper proposes 'instrumented data' with mechanistic models and counterfactuals but asserts feasibility without any pipeline, example, or evidence.

The paper's main point is that scientific machine learning needs a third data type called instrumented data. Each point would carry the mechanistic model that generated it, explicit uncertainty, and an executable set of counterfactuals so that causal interventions become possible during training.

It does a clear job naming the limits of the two existing options. Observational data shows outcomes without mechanisms, and template synthetic data only matches the simulator's generic case rather than the specific instance a user faces. The suggestion that V&V image-to-simulation pipelines could supply case-specific, solver-backed data with propagated uncertainty is a reasonable way to frame the gap.

The soft spot is that the central claim of operational feasibility is simply stated. The abstract gives no workflow, no solver integration, no uncertainty propagation steps, no scaling numbers, and no worked example in any domain. Without those, the argument reduces to the assertion that the required engineering already exists at usable scale.

This is a position piece rather than a methods or results paper. It is aimed at people already working on causal methods inside scientific domains who want to think about data infrastructure. The thinking is coherent on its own terms and engages the relevant literature on causal ML and scientific computing.

A journal that publishes perspectives could usefully send it for review so the community can discuss whether the pipelines are close enough to build. As a standard research submission it would need concrete demonstrations before it would hold up.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes 'instrumented data' as a third data paradigm for causal scientific machine learning, beyond observational data (which records outcomes without mechanisms) and template synthetic data (tied to a fixed simulator template). Each instrumented datum embeds the mechanistic model that produced it, explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&V) instrumented image-to-simulation pipelines are presented as one realization that converts sensor observations into solver-backed simulations with editable parameters, propagated aleatoric/epistemic uncertainty, and support for Pearl's do-operator interventions. Near-term uses include validation, auditing, and surrogate training across computational biology, climate, materials, fluid mechanics, and medical imaging; a longer-term implication is foundation models for scientific reasoning.

Significance. If the feasibility claim holds, instrumented data could supply mechanistically supervised training signals that enable causal reasoning and case-specific validation at scale, addressing a core bottleneck in scientific ML. The vision of falsifiable implications for scientific foundation models is high-potential. The manuscript, however, is a position paper whose central assertion of operational feasibility rests on an unshown engineering substrate rather than any derivation, pipeline, or demonstration.

major comments (2)

The abstract's claim that 'V&V instrumented image-to-simulation pipelines are one realisation' and are 'now operationally feasible' at scale is load-bearing for the entire argument yet supplies no workflow, solver integration example, uncertainty propagation derivation, runtime/memory scaling data, or domain-specific instance (e.g., biology or fluids).
No concrete mechanism is given for how the 'executable family of counterfactuals' would be realized while preserving editability and propagating both aleatoric and epistemic uncertainty through the image-to-simulation step.

minor comments (1)

The abstract would be strengthened by a brief comparison to related concepts such as digital twins or physics-informed surrogates to clarify novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and for recognizing the potential of instrumented data. The manuscript is a position paper whose goal is to articulate a new data paradigm rather than to deliver an engineering implementation or scaling study. We address the two major comments below and will revise the text to better distinguish conceptual claims from the supporting substrate drawn from existing V&V practice.

Referee: The abstract's claim that 'V&V instrumented image-to-simulation pipelines are one realisation' and are 'now operationally feasible' at scale is load-bearing for the entire argument yet supplies no workflow, solver integration example, uncertainty propagation derivation, runtime/memory scaling data, or domain-specific instance (e.g., biology or fluids).

Authors: We agree that the manuscript supplies none of the requested technical artifacts. As a position paper the central claim is that the required substrate (case-specific V&V pipelines that already embed solvers, uncertainty quantification, and editable parameters) exists in multiple scientific domains and can be repurposed for causal training. We will revise the abstract and introduction to cite representative published pipelines in medical imaging, computational fluid dynamics, and materials science that already perform image-to-simulation with propagated uncertainty, thereby grounding the feasibility statement without adding new derivations or benchmarks.

revision: yes

Referee: No concrete mechanism is given for how the 'executable family of counterfactuals' would be realized while preserving editability and propagating both aleatoric and epistemic uncertainty through the image-to-simulation step.

Authors: The manuscript does not supply a concrete mechanism or derivation. The intended realization is the standard practice, already routine in V&V, of exposing solver parameters as editable inputs and applying established uncertainty-quantification techniques (Monte-Carlo sampling for aleatoric, Bayesian or ensemble methods for epistemic) to the resulting simulation ensemble. We will add a short clarifying paragraph that points to this existing practice rather than claiming a novel algorithmic contribution.

revision: yes
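The practice the rebuttal describes can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the author's pipeline: the toy spring solver `simulate`, the three ensemble stiffness values, and the Gaussian load noise are all invented for the example. Monte-Carlo sampling of the noisy input stands in for aleatoric uncertainty, an ensemble of candidate models stands in for epistemic uncertainty, and the two are combined with the law of total variance.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(stiffness: float, load: float) -> float:
    """Toy solver (assumed): displacement of a linear spring, u = load / stiffness."""
    return load / stiffness


ensemble_stiffness = [90.0, 100.0, 110.0]  # epistemic: which model is right?
load_mean, load_std = 50.0, 2.0            # aleatoric: noisy load
n_samples = 2000                           # Monte-Carlo draws per ensemble member

member_means, member_vars, all_outputs = [], [], []
for k in ensemble_stiffness:
    outputs = [simulate(k, random.gauss(load_mean, load_std)) for _ in range(n_samples)]
    member_means.append(statistics.fmean(outputs))
    member_vars.append(statistics.pvariance(outputs))
    all_outputs.extend(outputs)

# Law of total variance with equally weighted ensemble members:
# Var(u) = E[Var(u | model)] + Var(E[u | model])
aleatoric = statistics.fmean(member_vars)       # mean of within-member variances
epistemic = statistics.pvariance(member_means)  # variance of the member means
total = aleatoric + epistemic

print(f"aleatoric={aleatoric:.3e} epistemic={epistemic:.3e} total={total:.3e}")
```

Because every member contributes the same number of samples and population variances are used throughout, `total` equals the variance of the pooled outputs up to floating-point error, which gives a simple consistency check on the decomposition.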

Circularity Check

0 steps flagged

No circularity: conceptual position paper with no derivations or fitted quantities

The manuscript is a conceptual proposal advocating instrumented data and V&V pipelines. It contains no equations, no parameter fitting, no derivation chain, and no self-citations used to justify core claims. The feasibility assertion is stated as a premise rather than derived from any prior result by the same authors. All load-bearing steps are therefore external to any self-referential construction, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The claim rests on the unproven operational feasibility of instrumented pipelines and the utility of attaching mechanistic models to data; no free parameters, standard axioms, or invented physical entities are introduced.

invented entities (1)

instrumented data · no independent evidence
purpose: A new data substrate that carries mechanistic models, uncertainty, and counterfactuals for causal supervision
Introduced in the abstract as a third option without reference to prior independent evidence or implementation.

pith-pipeline@v0.9.1-grok · 5700 in / 1159 out tokens · 24567 ms · 2026-06-27T22:22:43.733229+00:00 · methodology

