Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery

Tyler H. McCormick

arxiv: 2606.02632 · v1 · pith:7TXF2ZL7new · submitted 2026-05-30 · stat.ML · cs.AI · cs.CY · cs.LG · econ.EM · stat.AP


Pith reviewed 2026-06-28 17:55 UTC · model grok-4.3


keywords: mechanistic learning · underdetermination · large language models · scientific discovery · machine learning · explanatory equivalence · proxy regimes


The pith

Many incompatible mechanisms produce the same observations, so predictive success alone does not establish mechanism discovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper argues that in the high-dimensional proxy regimes where modern machine learning performs well, many different mechanisms can generate essentially identical observational patterns. As a result, models that predict accurately or generate coherent explanations provide no reliable evidence that they have identified the actual underlying mechanism. Large language models heighten the risk because they tend to select and present one fluent narrative from among many possible explanations. The author proposes concrete standards that any claim of mechanistic learning must satisfy if it is to advance scientific understanding rather than merely imitate it.

Core claim

In the high-dimensional proxy regimes where modern ML excels, mechanistic learning is generically underdetermined: many incompatible mechanisms induce essentially the same observational relationships on the support of the data, so predictive success and coherent explanations are insufficient evidence of mechanism discovery. This underdetermination becomes uniquely hazardous with large language models, which tend to collapse large equivalence classes of explanations into a single fluent narrative. Concrete standards for mechanistic ML are necessary if LLM-centered workflows are to support science rather than merely simulate it.

What carries the argument

Underdetermination of mechanisms by data: the property that many incompatible mechanisms induce essentially the same observational relationships on the support of the data in high-dimensional proxy regimes.
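A minimal sketch of this property, using the Mendelian setting that the paper's figure captions describe: a dihybrid F2 self-cross with dominance and independent assortment, and an alternative "independent traits" story with fixed marginals pR = pY = 3/4. The function names, the phenotype encoding, and the choice of R and Y as the dominant phenotypes are illustrative assumptions, not the paper's code.

```python
from fractions import Fraction
from itertools import product

# Phenotype labels follow the figure captions: R/W for one trait, Y/G for the other.
# Assumption (not stated in the source): R and Y are the dominant phenotypes.

def mendelian_f2():
    """Dihybrid F2 self-cross (AaBb x AaBb): dominance plus independent assortment."""
    gametes = [a + b for a, b in product("Aa", "Bb")]  # AB, Ab, aB, ab, each with prob 1/4
    dist = {}
    for g1, g2 in product(gametes, repeat=2):
        first = "R" if "A" in (g1[0], g2[0]) else "W"
        second = "Y" if "B" in (g1[1], g2[1]) else "G"
        key = first + second
        dist[key] = dist.get(key, Fraction(0)) + Fraction(1, 16)
    return dist

def independent_traits(p_r=Fraction(3, 4), p_y=Fraction(3, 4)):
    """Alternative story: two unrelated traits with fixed marginal probabilities."""
    return {
        "RY": p_r * p_y,
        "RG": p_r * (1 - p_y),
        "WY": (1 - p_r) * p_y,
        "WG": (1 - p_r) * (1 - p_y),
    }

f2_mendel = mendelian_f2()
f2_alt = independent_traits()
# True when the two mechanisms are indistinguishable on F2 phenotype data alone.
same_on_support = f2_mendel == f2_alt
print(f2_mendel, same_on_support)
```

Exact fractions are used so that equality of the two distributions is checked without floating-point tolerance.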

If this is right

Predictive accuracy of an ML model supplies no evidence that the model has recovered a unique mechanism.
Coherent narrative explanations generated by LLMs do not indicate that the model has resolved mechanistic ambiguity.
Any workflow claiming mechanistic discovery must include explicit checks that rule out alternative mechanisms consistent with the data.
Standards for mechanistic ML must require demonstration that the identified mechanism is distinguishable from others on the observed support.
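The last two consequences can be sketched as a check against a designed cross. The example below assumes the true mechanism is Mendelian, so a testcross (AaBb × aabb) yields a 1:1:1:1 phenotype ratio, and it assumes the fixed-marginal alternative ignores cross type and keeps predicting 9:3:3:1. Both assumptions, and the helper names `expected_log_loss` and `rule_out`, are hypothetical illustrations rather than the paper's procedure.

```python
import math

CATEGORIES = ["RY", "RG", "WY", "WG"]

# Assumed ground truth: a Mendelian testcross (AaBb x aabb) gives a 1:1:1:1 ratio.
TRUE_TESTCROSS = {c: 0.25 for c in CATEGORIES}

# Candidate predictions for the testcross outcome.
CANDIDATES = {
    "mendelian": {c: 0.25 for c in CATEGORIES},
    # Assumption: the fixed-marginal story ignores cross type and still predicts 9:3:3:1.
    "independent_traits": {"RY": 9 / 16, "RG": 3 / 16, "WY": 3 / 16, "WG": 1 / 16},
}

def expected_log_loss(truth, predicted):
    """Expected per-sample log loss (cross-entropy, in nats) of `predicted` under `truth`."""
    return -sum(truth[c] * math.log(predicted[c]) for c in truth if truth[c] > 0)

def rule_out(candidates, truth, tol=1e-9):
    """Return all losses and the names of candidates that are worse than the best by more than tol."""
    losses = {name: expected_log_loss(truth, q) for name, q in candidates.items()}
    best = min(losses.values())
    return losses, [name for name, loss in losses.items() if loss - best > tol]

losses, ruled_out = rule_out(CANDIDATES, TRUE_TESTCROSS)
print(losses, ruled_out)
```

Under these assumptions the two candidates agree on F2 self-cross data yet separate on the testcross, which is the kind of explicit distinguishability check the list above calls for.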

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Efforts to discover mechanisms may need to shift emphasis toward direct identification of structural constraints rather than reliance on complex predictive models.
Methods that enumerate or bound the set of mechanisms consistent with given data could reduce the hazard of narrative collapse.
The same underdetermination concern may apply to non-LLM machine learning systems that produce single-point explanations from high-dimensional inputs.

Load-bearing premise

That the settings where modern ML excels are high-dimensional proxy regimes in which underdetermination of mechanisms is generic and that large language models systematically reduce multiple possible explanations to one narrative.

What would settle it

A high-dimensional observational dataset, paired with an explicit list of at least two incompatible mechanisms that produce identical predictions on all observed points, for which an LLM workflow nonetheless outputs only one of them as the discovered mechanism.

Figures

Figures reproduced from arXiv: 2606.02632 by Tyler H. McCormick.

**Figure 1.** Identifying structure collapses mechanistic disagreement. Mean ℓ1 spread of counterfactual predicted probability vectors across MLP runs. “No design” is Regime A (phenotype-only F2 self-cross data); “with design” is Regime B (cross labels plus additional labeled crosses, including testcross support). “All” uses all 60 random-state runs; “near-opt” restricts to runs within ε = 0.01 of the best held-out tes…

**Figure 2.** Regime A (phenotype-only): observational equivalence at the proxy level. Pooled F2 phenotype frequencies over four proxy categories (RG, RY, WG, WY). The Mendelian dihybrid mechanism (dominance + independent assortment; 9:3:3:1) and an alternative “independent traits” mechanism with fixed marginals pR = pY = 3/4 are distinct mechanistic stories but induce essentially the same pooled proxy distribution in t…

**Figure 3.** Regime B (design): cross structure rules out an observationally equivalent alternative. Per-sample log loss under additional labeled crosses (F2 self, testcross, monoA, monoB). The alternative mechanism matches Regime A by construction but fails when asked to simultaneously explain the outcomes of multiple designed crosses; the Mendelian mechanism remains well-calibrated across all crosses. Data-Generating…

**Figure 4.** No design label: counterfactual predictions vary across near-optimal black-box fits. Each point is the predicted probability of a phenotype category under a counterfactual query (RY×WG) from a different near-optimal MLP fit trained without cross-type labels. Even among near-optimal predictors, counterfactual mechanistic predictions exhibit nontrivial spread when identifying structure is absent.

**Figure 5.** With design label: counterfactual predictions concentrate. Same counterfactual parent phenotype pair as …

**Figure 6.** Black-box multiplicity: many near-optimal fits. Distribution of held-out test log loss across MLP random seeds when trained with design labels; near-optimal seeds are highlighted. This justifies evaluating mechanistic disagreement within a set of comparably good predictors, rather than attributing disagreement to poor optimization.

**Figure 7.** In-domain fit. Histogram of test MSE over all seeds and model families. Despite different inductive biases, all models cluster tightly (MSE ≈ 0.003–0.005), showing strong in-domain agreement. Results. The key pattern is consistent with this paper’s claims: tight predictive clustering ( …

**Figure 8.** Gradient disagreement by family. Variance of partial derivatives ∂ŷ/∂x_j across seeds within each model family, log-scaled. Dark indicates agreement; bright indicates disagreement. The colorbar (bottom) is the scale for all panels. RF and GBR show sizable variance on x1 and x2 (mechanism-bearing coordinates) and spurious variance on proxy/noise coordinates; MLPs exhibit lower but nonzero variance. Exampl…

**Figure 9.** Off-manifold interventions. Each gray line is one model’s prediction when intervening on x2 (holding other coordinates fixed); black is the true response. Although in-domain MSEs are nearly identical ( …
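The disagreement measure named in the Figure 1 caption can be approximated as follows. This is a sketch under stated assumptions: "ℓ1 spread" is taken here to mean the average ℓ1 distance between each run's predicted probability vector and the across-run mean, which the captions do not confirm, and the three runs are made-up numbers, not the paper's results. The near-optimal filter uses the ε = 0.01 threshold mentioned in the caption.

```python
def near_optimal(runs, eps=0.01):
    """Keep runs whose held-out loss is within eps of the best run."""
    best = min(r["test_loss"] for r in runs)
    return [r for r in runs if r["test_loss"] - best <= eps]

def mean_l1_spread(prob_vectors):
    """Average l1 distance from each probability vector to the across-run mean vector.

    Assumed definition; the source captions do not spell out the exact formula.
    """
    n = len(prob_vectors)
    dim = len(prob_vectors[0])
    mean = [sum(v[j] for v in prob_vectors) / n for j in range(dim)]
    return sum(sum(abs(v[j] - mean[j]) for j in range(dim)) for v in prob_vectors) / n

# Hypothetical runs: a held-out loss and a counterfactual prediction over (RY, RG, WY, WG).
runs = [
    {"test_loss": 1.000, "cf_probs": [0.25, 0.25, 0.25, 0.25]},
    {"test_loss": 1.005, "cf_probs": [0.55, 0.20, 0.20, 0.05]},
    {"test_loss": 1.200, "cf_probs": [0.10, 0.40, 0.40, 0.10]},
]

kept = near_optimal(runs)
spread_all = mean_l1_spread([r["cf_probs"] for r in runs])
spread_near_opt = mean_l1_spread([r["cf_probs"] for r in kept])
print(len(kept), spread_all, spread_near_opt)
```

Restricting to near-optimal runs before measuring spread separates disagreement among comparably good predictors from disagreement caused by poorly optimized fits, which is the distinction the Figure 6 caption draws.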

Original abstract

Modern Machine Learning (ML) and Artificial Intelligence (AI) models, especially large language models (LLMs), are increasingly used to generate scientific hypotheses and mechanistic explanations from observational data. This position paper argues that in the high-dimensional proxy regimes where modern ML excels, mechanistic learning is generically underdetermined: many incompatible mechanisms induce essentially the same observational relationships on the support of the data, so predictive success and coherent explanations are insufficient evidence of mechanism discovery. This underdetermination becomes uniquely hazardous with large language models (LLMs), which tend to collapse large equivalence classes of explanations into a single fluent narrative. This paper proposes concrete standards for "mechanistic ML," and argues these norms are necessary if LLM-centered workflows are to support science rather than merely simulate it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper flags a genuine risk that LLMs can produce fluent but non-unique mechanistic stories from observational data, yet it asserts genericity of underdetermination without an example or formal characterization.


The core point is that in the high-dimensional regimes where modern ML works best, many different mechanisms can produce the same observational patterns, so good predictions and coherent text do not confirm you have found the right mechanism. LLMs make this worse by defaulting to one smooth narrative. That is a fair warning for anyone using these tools for scientific discovery.

What the paper does well is lay out the issue plainly and sketch some standards for what would count as mechanistic ML. It correctly notes that predictive success alone is not enough when the goal is mechanism.

The main gap is that the central claim about underdetermination being generic rests on assertion rather than demonstration. There is no worked example of two incompatible mechanisms that agree on the support of the data, no definition of the regime, and no comparison showing LLMs are uniquely bad at this compared with other explanatory methods. Without that, it is hard to judge how often the problem actually arises.

This is aimed at people applying ML to scientific questions, especially those already worried about interpretability. It is worth sending to referees because the topic matters and the position is coherent on its own terms, even though the argument would be stronger with concrete illustrations.

Referee Report

2 major / 2 minor

Summary. This position paper claims that in the high-dimensional proxy regimes where modern ML excels, mechanistic learning from observational data is generically underdetermined: many incompatible mechanisms can induce essentially identical observational relationships on the data support, rendering predictive accuracy and coherent narratives insufficient to establish mechanism discovery. It further argues that this underdetermination is uniquely hazardous for LLMs, which collapse equivalence classes of explanations into single fluent outputs, and proposes concrete standards for 'mechanistic ML' to ensure such workflows support rather than merely simulate science.

Significance. If the genericity of underdetermination holds, the paper would offer a timely conceptual caution for AI-assisted scientific discovery, highlighting the priority of structure identification over complex model fitting. The emphasis on standards for mechanistic ML could usefully shape norms in the field, though the absence of any formal characterization or illustrative example restricts the argument to a high-level position statement.

major comments (2)

[Abstract and core argument on underdetermination] The central claim (abstract and opening paragraphs) that underdetermination is 'generic' precisely in high-dimensional proxy regimes lacks both a mathematical characterization of these regimes and a minimal worked example of two distinct mechanisms (e.g., structural causal models or dynamical systems) whose observational distributions coincide on the data support but whose mechanisms differ. This omission is load-bearing for the assertion that predictive success is insufficient evidence of mechanism discovery.
[Discussion of LLM-specific risks] The further claim that LLMs 'tend to collapse large equivalence classes of explanations into a single fluent narrative' (abstract and discussion of LLM hazards) is advanced without any comparison to other explanatory procedures or evidence establishing uniqueness relative to non-LLM methods. This is load-bearing for the position that the hazard is 'uniquely' LLM-specific.

minor comments (2)

The term 'support of the data' is invoked repeatedly without a precise definition in the observational setting, which could be clarified for readers outside causal inference.
The proposed standards for mechanistic ML are referenced but not enumerated in detail; expanding this section with explicit criteria would strengthen the constructive contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which highlight areas where the position paper can be strengthened with greater concreteness. We address each major point below and indicate planned revisions.


Referee: [Abstract and core argument on underdetermination] The central claim (abstract and opening paragraphs) that underdetermination is 'generic' precisely in high-dimensional proxy regimes lacks both a mathematical characterization of these regimes and a minimal worked example of two distinct mechanisms (e.g., structural causal models or dynamical systems) whose observational distributions coincide on the data support but whose mechanisms differ. This omission is load-bearing for the assertion that predictive success is insufficient evidence of mechanism discovery.

Authors: We agree that a minimal worked example would make the underdetermination claim more tangible and will add one in revision (e.g., two distinct linear SCMs or dynamical systems sharing the same marginal on observed variables but differing in latent structure or interventions). We will also expand the description of 'high-dimensional proxy regimes' with additional intuition and references. A full formal mathematical characterization of genericity, however, lies outside the scope of a position paper; the argument remains conceptual and draws on existing results in causal inference and identifiability.

revision: partial

Referee: [Discussion of LLM-specific risks] The further claim that LLMs 'tend to collapse large equivalence classes of explanations into a single fluent narrative' (abstract and discussion of LLM hazards) is advanced without any comparison to other explanatory procedures or evidence establishing uniqueness relative to non-LLM methods. This is load-bearing for the position that the hazard is 'uniquely' LLM-specific.

Authors: We will revise the manuscript to remove the word 'uniquely' and instead argue that the combination of fluency, single-output generation, and lack of explicit uncertainty representation makes LLMs particularly prone to this collapse relative to traditional scientific workflows that retain multiple candidate explanations. We can cite literature on narrative bias in scientific communication and on the difference between generative models and explicit model-selection procedures, but a systematic empirical comparison across all explanatory methods is beyond the scope of this position statement.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in conceptual position paper


The manuscript is a position paper advancing a conceptual argument about underdetermination of mechanisms in high-dimensional regimes and the risks of LLMs. It contains no equations, no fitted parameters, no derivation chain, and no self-citations that serve as load-bearing premises for the central claim. The argument rests on stated premises about observational equivalence rather than on any reduction of a 'prediction' or 'result' to its own inputs by construction. This is the normal case for non-mathematical position papers and warrants a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The position rests on the domain assumption of generic underdetermination in high-dimensional proxy regimes and on the behavioral claim about LLMs collapsing explanation classes.

axioms (2)

domain assumption: In high-dimensional proxy regimes where modern ML excels, many incompatible mechanisms induce essentially the same observational relationships.
This premise is stated directly in the abstract as the basis for the underdetermination argument.

domain assumption: LLMs tend to collapse large equivalence classes of explanations into a single fluent narrative.
This behavioral claim about LLMs is invoked to explain why the underdetermination is especially hazardous.


Reference graph

Works this paper leans on

120 extracted references · 20 canonical work pages · 4 internal anchors

[1]

Journal of Machine Learning Research , year =

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously , author =. Journal of Machine Learning Research , year =
[2]

Journal of Statistics and Data Science Education , year =

D'Agostino McGowan, Lucy and Gerke, Travis and Barrett, Malcolm , title =. Journal of Statistics and Data Science Education , year =
[3]

The American Statistician , year =

Gelman, Andrew and Hullman, Jessica and Kennedy, Lauren , title =. The American Statistician , year =
[4]

Variational Autoencoders and Nonlinear

Khemakhem, Ilyes and Kingma, Diederik and Monti, Ricardo and Hyvarinen, Aapo , booktitle =. Variational Autoencoders and Nonlinear. 2020 , editor =

2020
[5]

Proceedings of the 41st International Conference on Machine Learning , pages =

Causal Representation Learning Made Identifiable by Grouping of Observational Variables , author =. Proceedings of the 41st International Conference on Machine Learning , pages =. 2024 , editor =

2024
[6]

Identification of Joint Interventional Distributions in Recursive

Shpitser, Ilya and Pearl, Judea , booktitle =. Identification of Joint Interventional Distributions in Recursive. 2006 , pages =

2006
[7]

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

Robustly estimating heterogeneity in factorial data using rashomon partitions , author=. arXiv preprint arXiv:2404.02141 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[8]

1996 , edition =

The Structure of Scientific Revolutions , author =. 1996 , edition =

1996
[9]

The Philosophical Review , volume=

The structure of scientific revolutions , author=. The Philosophical Review , volume=. 1964 , publisher=

1964
[10]

What Would It be Like to be

Soler, L. What Would It be Like to be. Social Epistemology , volume=. 2025 , publisher=

2025
[11]

arXiv preprint arXiv:2402.13914 , year=

Position: Explain to question not to justify , author=. arXiv preprint arXiv:2402.13914 , year=

work page arXiv
[12]

American Journal of Epidemiology , volume=

A warning about using predicted values from regression models for epidemiologic inquiry , author=. American Journal of Epidemiology , volume=. 2021 , publisher=

2021
[13]

arXiv preprint arXiv:2508.15162 , year=

A Unified Framework for Inference with General Missingness Patterns and Machine Learning Imputation , author=. arXiv preprint arXiv:2508.15162 , year=

work page arXiv
[14]

Proceedings of the National Academy of Sciences , volume=

Methods for correcting inference based on outcomes predicted by machine learning , author=. Proceedings of the National Academy of Sciences , volume=. 2020 , publisher=

2020
[15]

arXiv preprint arXiv:2405.13926 , year=

Some models are useful, but for how long?: A decision theoretic approach to choosing when to refit large-scale prediction models , author=. arXiv preprint arXiv:2405.13926 , year=

work page arXiv
[16]

Verhandlungen des naturforschenden Vereines in Br

Mendel, Gregor , title =. Verhandlungen des naturforschenden Vereines in Br. 1866 , volume =
[17]

, title =

Fisher, Ronald A. , title =. Annals of Science , year =
[18]

Journal of Genetics , year =

Curtis, David , title =. Journal of Genetics , year =
[19]

Science & Education , year =

Bapty, Hannah , title =. Science & Education , year =
[20]

Mackay, Trudy F. C. and Anholt, Robert R. H. , title =. Nature Reviews Genetics , year =
[21]

1855 , edition =

Snow, John , title =. 1855 , edition =
[22]

1861 , note =

Pasteur, Louis , title =. 1861 , note =
[23]

Deutsche Medizinische Wochenschrift , year =

Koch, Robert , title =. Deutsche Medizinische Wochenschrift , year =
[24]

arXiv preprint arXiv:2512.05456 , year=

Do We Really Even Need Data? A Modern Look at Drawing Inference with Predicted Data , author=. arXiv preprint arXiv:2512.05456 , year=

work page arXiv
[25]

2025 , eprint =

Agarwal, Dhruv and Majumder, Bodhisattwa Prasad and Adamson, Reece and Chakravorty, Megha and Gavireddy, Satvika Reddy and Parashar, Aditya and Surana, Harshit and Mishra, Bhavana Dalvi and McCallum, Andrew and Sabharwal, Ashish and Clark, Peter , booktitle =. 2025 , eprint =

2025
[26]

Advances in Neural Information Processing Systems (NeurIPS) , year =

A Path to Simpler Models Starts With Noise , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. doi:10.48550/arXiv.2310.19726 , url =

work page doi:10.48550/arxiv.2310.19726
[27]

Proceedings of the 40th International Conference on Machine Learning , series =

Transformers Learn In-Context by Gradient Descent , author =. Proceedings of the 40th International Conference on Machine Learning , series =. 2023 , publisher =

2023
[28]

Decomposition of Uncertainty in

Depeweg, Stefan and Hernandez-Lobato, Jose Miguel and Doshi-Velez, Finale and Udluft, Steffen , booktitle =. Decomposition of Uncertainty in. 2018 , publisher =

2018
[29]

Position: The No Free Lunch Theorem,

Goldblum, Micah and Finzi, Marc Anton and Rowan, Keefer and Wilson, Andrew Gordon , booktitle =. Position: The No Free Lunch Theorem,. 2024 , editor =

2024
[30]

Decomposition of Uncertainty in

Depeweg, Stefan and Hernandez-Lobato, Jose-Miguel and Doshi-Velez, Finale and Udluft, Steffen , booktitle =. Decomposition of Uncertainty in. 2018 , editor =

2018
[33]

Proceedings of the 42nd International Conference on Machine Learning , year =

Position: Uncertainty Quantification Needs Reassessment in the Era of Large Language Models , author =. Proceedings of the 42nd International Conference on Machine Learning , year =
[34]

and Rowan, Keefer and Wilson, Andrew Gordon , booktitle =

Goldblum, Micah and Finzi, Marc A. and Rowan, Keefer and Wilson, Andrew Gordon , booktitle =. Position: The No Free Lunch Theorem,. 2024 , publisher =

2024
[35]

, title =

Banerjee, Abhijit V. , title =. The Quarterly Journal of Economics , year =
[36]

Journal of Political Economy , year =

Bikhchandani, Sushil and Hirshleifer, David and Welch, Ivo , title =. Journal of Political Economy , year =
[37]

Pathological Outcomes of Observational Learning , journal =

Smith, Lones and S. Pathological Outcomes of Observational Learning , journal =. 2000 , volume =

2000
[38]

and Golub, Benjamin , title =

Banerjee, Abhijit and Breza, Emily and Chandrasekhar, Arun G. and Golub, Benjamin , title =. The Review of Economic Studies , year =
[39]

and Golub, Benjamin and Yang, He , title =

Chandrasekhar, Arun G. and Golub, Benjamin and Yang, He , title =
[40]

, title =

Machamer, Peter and Darden, Lindley and Craver, Carl F. , title =. Philosophy of Science , year =
[41]

Woodward, James , title =
[42]

Exploring the Whole

Xin, Rui and Zhong, Chudi and Chen, Zhi and Takagi, Takuya and Seltzer, Margo and Rudin, Cynthia , booktitle =. Exploring the Whole
[43]

International Conference on Machine Learning , year =

Transformers Learn In-Context by Gradient Descent , author =. International Conference on Machine Learning , year =
[44]

Nature , year =

Shumailov, Ilia and Shumaylov, Zakhar and Zhao, Yiren and Papernot, Nicolas and Anderson, Ross and Gal, Yarin , title =. Nature , year =
[45]

Proceedings of the 42nd International Conference on Machine Learning , pages =

Position: Uncertainty Quantification Needs Reassessment for Large Language Model Agents , author =. Proceedings of the 42nd International Conference on Machine Learning , pages =. 2025 , editor =

2025
[47]

2025 , journal =

Conformal Prediction = Bayes? , author =. 2025 , journal =. doi:10.48550/arXiv.2512.23308 , url =. 2512.23308 , archivePrefix =

work page doi:10.48550/arxiv.2512.23308 2025
[48]

Proceedings of the National Academy of Sciences , year =

Kleinberg, Jon and Raghavan, Manish , title =. Proceedings of the National Academy of Sciences , year =
[49]

Proceedings of the 42nd International Conference on Machine Learning , series =

Kirchhof, Michael and Kasneci, Gjergji and Kasneci, Enkelejda , title =. Proceedings of the 42nd International Conference on Machine Learning , series =. 2025 , month = jul, publisher =

2025
[50]

, title =

Hodel, Damian and West, Jevin D. , title =. arXiv preprint , year =. doi:10.48550/arXiv.2512.15011 , url =. 2512.15011 , archivePrefix =

work page doi:10.48550/arxiv.2512.15011
[51]

What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?

Kendall, Alex and Gal, Yarin , title =. Advances in Neural Information Processing Systems , year =. 1703.04977 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[52]

Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods , journal =

H. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods , journal =. 2021 , volume =

2021
[53]

Studies in Subjective Probability , editor =

Bruno de Finetti , title =. Studies in Subjective Probability , editor =. 1964 , note =

1964
[54]

Aldous , title =

David J. Aldous , title =. Journal of Multivariate Analysis , year =
[55]

Olav Kallenberg , title =
[56]

The Annals of Probability , year =

Persi Diaconis and David Freedman , title =. The Annals of Probability , year =
[57]

The Annals of Mathematical Statistics , year =

Henry Teicher , title =. The Annals of Mathematical Statistics , year =
[58]

Allman and Catherine Matias and John A

Elizabeth S. Allman and Catherine Matias and John A. Rhodes , title =. The Annals of Statistics , year =
[59]

ICML , year =

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations , author =. ICML , year =
[60]

Kingma and Ricardo Pio Monti and Aapo Hyv

Ilyes Khemakhem and Diederik P. Kingma and Ricardo Pio Monti and Aapo Hyv. Variational Autoencoders and Nonlinear. AISTATS , year =
[61]

ICML , year =

Causal Representation Learning Made Identifiable by Grouping of Observational Variables , author =. ICML , year =
[62]

Journal of Machine Learning Research , year =

Alexander D'Amour and others , title =. Journal of Machine Learning Research , year =
[63]

Calmon and Mario Diaz , title =

Lucas Monteiro Paes and Rodrigo Cruz and Flavio P. Calmon and Mario Diaz , title =. ISIT , year =
[64]

ICDT , year =

Kevin Beyer and Jonathan Goldstein and Raghu Ramakrishnan and Uri Shaft , title =. ICDT , year =
[65]

Aggarwal and Alexander Hinneburg and Daniel A

Charu C. Aggarwal and Alexander Hinneburg and Daniel A. Keim , title =. ICDT , year =
[66]

Advances in Neural Information Processing Systems 15 , year =

Jon Kleinberg , title =. Advances in Neural Information Processing Systems 15 , year =
[67]

NeurIPS , year =

Chulhee Yun and Yin Tat Lee and Samuel Wiseman , title =. NeurIPS , year =
[68]

Nature Machine Intelligence , year =

Robert Geirhos and others , title =. Nature Machine Intelligence , year =
[69]

ICCV , year =

Khalid Mahmood and others , title =. ICCV , year =
[70]

arXiv:2111.10659 , year =

Jindong Gu and Yao Qin and Volker Tresp , title =. arXiv:2111.10659 , year =

work page arXiv
[71]

NeurIPS , year =

Training Compute-Optimal Large Language Models , author =. NeurIPS , year =
[72]

Scaling Laws for Neural Language Models

Jared Kaplan and others , title =. arXiv:2001.08361 , year =

work page internal anchor Pith review Pith/arXiv arXiv 2001
[73]

Causal Inference by Using Invariant Prediction: Identification and Confidence Intervals , journal =

Jonas Peters and Peter B. Causal Inference by Using Invariant Prediction: Identification and Confidence Intervals , journal =. 2016 , volume =

2016
[74]

ICLR , year =

Sanghyuk Park and Sungdong Choe and Yiding Jiang and Victor Veitch , title =. ICLR , year =
[75]

Fuller , title =

Wayne A. Fuller , title =
[76]

Carroll and David Ruppert and Leonard A

Raymond J. Carroll and David Ruppert and Leonard A. Stefanski and Ciprian M. Crainiceanu , title =
[77]

Journal of Machine Learning Research , year=

Underspecification Presents Challenges for Credibility in Modern Machine Learning , author=. Journal of Machine Learning Research , year=
[79]

Hallucination is Inevitable: An Innate Limitation of Large Language Models

Hallucination is inevitable: An innate limitation of large language models , author=. arXiv preprint arXiv:2401.11817 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[80]

2025 , archivePrefix=

Hallucination is Inevitable for LLMs with the Open World Assumption , author=. 2025 , archivePrefix=. 2510.05116 , primaryClass=

work page arXiv 2025
[81]

2019 , archivePrefix=

On the Existence of Simpler Machine Learning Models , author=. 2019 , archivePrefix=. 1908.01755 , primaryClass=

work page arXiv 2019
[82]

Proceedings of the 41st International Conference on Machine Learning , series=

Position: Amazing Things Come From Having Many Good Models , author=. Proceedings of the 41st International Conference on Machine Learning , series=. 2024 , url=

2024
[83]

What Uncertainties Do We Need in

Kendall, Alex and Gal, Yarin , year=. What Uncertainties Do We Need in
[84]

Proceedings of Machine Learning Research (ICML 2025 Position Paper Track) , year=

Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents , author=. Proceedings of Machine Learning Research (ICML 2025 Position Paper Track) , year=

2025

Showing first 80 references.

[1] [1]

Journal of Machine Learning Research , year =

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously , author =. Journal of Machine Learning Research , year =

[2] [2]

Journal of Statistics and Data Science Education , year =

D'Agostino McGowan, Lucy and Gerke, Travis and Barrett, Malcolm , title =. Journal of Statistics and Data Science Education , year =

[3] [3]

The American Statistician , year =

Gelman, Andrew and Hullman, Jessica and Kennedy, Lauren , title =. The American Statistician , year =

[4] [4]

Variational Autoencoders and Nonlinear

Khemakhem, Ilyes and Kingma, Diederik and Monti, Ricardo and Hyvarinen, Aapo , booktitle =. Variational Autoencoders and Nonlinear. 2020 , editor =

2020

[5] [5]

Proceedings of the 41st International Conference on Machine Learning , pages =

Causal Representation Learning Made Identifiable by Grouping of Observational Variables , author =. Proceedings of the 41st International Conference on Machine Learning , pages =. 2024 , editor =

2024

[6] [6]

Identification of Joint Interventional Distributions in Recursive

Shpitser, Ilya and Pearl, Judea , booktitle =. Identification of Joint Interventional Distributions in Recursive. 2006 , pages =

2006

[7] [7]

Robustly estimating heterogeneity in factorial data using Rashomon Partitions. arXiv preprint arXiv:2404.02141.

[8] [8]

1996 , edition =

The Structure of Scientific Revolutions , author =. 1996 , edition =

[9] [9]

The Philosophical Review , volume=

The structure of scientific revolutions , author=. The Philosophical Review , volume=. 1964 , publisher=

[10] [10]

What Would It be Like to be

Soler, L. What Would It be Like to be. Social Epistemology , volume=. 2025 , publisher=

[11] [11]

arXiv preprint arXiv:2402.13914 , year=

Position: Explain to question not to justify , author=. arXiv preprint arXiv:2402.13914 , year=

[12] [12]

American Journal of Epidemiology , volume=

A warning about using predicted values from regression models for epidemiologic inquiry , author=. American Journal of Epidemiology , volume=. 2021 , publisher=

[13] [13]

arXiv preprint arXiv:2508.15162 , year=

A Unified Framework for Inference with General Missingness Patterns and Machine Learning Imputation , author=. arXiv preprint arXiv:2508.15162 , year=

[14] [14]

Proceedings of the National Academy of Sciences , volume=

Methods for correcting inference based on outcomes predicted by machine learning , author=. Proceedings of the National Academy of Sciences , volume=. 2020 , publisher=

[15] [15]

arXiv preprint arXiv:2405.13926 , year=

Some models are useful, but for how long?: A decision theoretic approach to choosing when to refit large-scale prediction models , author=. arXiv preprint arXiv:2405.13926 , year=

[16] [16]

Verhandlungen des naturforschenden Vereines in Br

Mendel, Gregor , title =. Verhandlungen des naturforschenden Vereines in Br. 1866 , volume =

[17] [17]

, title =

Fisher, Ronald A. , title =. Annals of Science , year =

[18] [18]

Journal of Genetics , year =

Curtis, David , title =. Journal of Genetics , year =

[19] [19]

Science & Education , year =

Bapty, Hannah , title =. Science & Education , year =

[20] [20]

Mackay, Trudy F. C. and Anholt, Robert R. H. , title =. Nature Reviews Genetics , year =

[21] [21]

1855 , edition =

Snow, John , title =. 1855 , edition =

[22] [22]

1861 , note =

Pasteur, Louis , title =. 1861 , note =

[23] [23]

Deutsche Medizinische Wochenschrift , year =

Koch, Robert , title =. Deutsche Medizinische Wochenschrift , year =

[24] [24]

arXiv preprint arXiv:2512.05456 , year=

Do We Really Even Need Data? A Modern Look at Drawing Inference with Predicted Data , author=. arXiv preprint arXiv:2512.05456 , year=

[25] [25]

2025 , eprint =

Agarwal, Dhruv and Majumder, Bodhisattwa Prasad and Adamson, Reece and Chakravorty, Megha and Gavireddy, Satvika Reddy and Parashar, Aditya and Surana, Harshit and Mishra, Bhavana Dalvi and McCallum, Andrew and Sabharwal, Ashish and Clark, Peter , booktitle =. 2025 , eprint =

[26] [26]

Advances in Neural Information Processing Systems (NeurIPS) , year =

A Path to Simpler Models Starts With Noise , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. doi:10.48550/arXiv.2310.19726 , url =

[27] [27]

Proceedings of the 40th International Conference on Machine Learning , series =

Transformers Learn In-Context by Gradient Descent , author =. Proceedings of the 40th International Conference on Machine Learning , series =. 2023 , publisher =

[28] [28]

Decomposition of Uncertainty in

Depeweg, Stefan and Hernandez-Lobato, Jose Miguel and Doshi-Velez, Finale and Udluft, Steffen , booktitle =. Decomposition of Uncertainty in. 2018 , publisher =

[29] [29]

Position: The No Free Lunch Theorem,

Goldblum, Micah and Finzi, Marc Anton and Rowan, Keefer and Wilson, Andrew Gordon , booktitle =. Position: The No Free Lunch Theorem,. 2024 , editor =

[30] [30]

Decomposition of Uncertainty in

Depeweg, Stefan and Hernandez-Lobato, Jose-Miguel and Doshi-Velez, Finale and Udluft, Steffen , booktitle =. Decomposition of Uncertainty in. 2018 , editor =

[31] [33]

Proceedings of the 42nd International Conference on Machine Learning , year =

Position: Uncertainty Quantification Needs Reassessment in the Era of Large Language Models , author =. Proceedings of the 42nd International Conference on Machine Learning , year =

[32] [34]

and Rowan, Keefer and Wilson, Andrew Gordon , booktitle =

Goldblum, Micah and Finzi, Marc A. and Rowan, Keefer and Wilson, Andrew Gordon , booktitle =. Position: The No Free Lunch Theorem,. 2024 , publisher =

[33] [35]

, title =

Banerjee, Abhijit V. , title =. The Quarterly Journal of Economics , year =

[34] [36]

Journal of Political Economy , year =

Bikhchandani, Sushil and Hirshleifer, David and Welch, Ivo , title =. Journal of Political Economy , year =

[35] [37]

Pathological Outcomes of Observational Learning , journal =

Smith, Lones and S. Pathological Outcomes of Observational Learning , journal =. 2000 , volume =

[36] [38]

and Golub, Benjamin , title =

Banerjee, Abhijit and Breza, Emily and Chandrasekhar, Arun G. and Golub, Benjamin , title =. The Review of Economic Studies , year =

[37] [39]

and Golub, Benjamin and Yang, He , title =

Chandrasekhar, Arun G. and Golub, Benjamin and Yang, He , title =

[38] [40]

, title =

Machamer, Peter and Darden, Lindley and Craver, Carl F. , title =. Philosophy of Science , year =

[39] [41]

Woodward, James , title =

[40] [42]

Exploring the Whole

Xin, Rui and Zhong, Chudi and Chen, Zhi and Takagi, Takuya and Seltzer, Margo and Rudin, Cynthia , booktitle =. Exploring the Whole

[41] [43]

International Conference on Machine Learning , year =

Transformers Learn In-Context by Gradient Descent , author =. International Conference on Machine Learning , year =

[42] [44]

Nature , year =

Shumailov, Ilia and Shumaylov, Zakhar and Zhao, Yiren and Papernot, Nicolas and Anderson, Ross and Gal, Yarin , title =. Nature , year =

[43] [45]

Proceedings of the 42nd International Conference on Machine Learning , pages =

Position: Uncertainty Quantification Needs Reassessment for Large Language Model Agents , author =. Proceedings of the 42nd International Conference on Machine Learning , pages =. 2025 , editor =

[44] [47]

2025 , journal =

Conformal Prediction = Bayes? , author =. 2025 , journal =. doi:10.48550/arXiv.2512.23308 , url =. 2512.23308 , archivePrefix =

[45] [48]

Proceedings of the National Academy of Sciences , year =

Kleinberg, Jon and Raghavan, Manish , title =. Proceedings of the National Academy of Sciences , year =

[46] [49]

Proceedings of the 42nd International Conference on Machine Learning , series =

Kirchhof, Michael and Kasneci, Gjergji and Kasneci, Enkelejda , title =. Proceedings of the 42nd International Conference on Machine Learning , series =. 2025 , month = jul, publisher =

[47] [50]

, title =

Hodel, Damian and West, Jevin D. , title =. arXiv preprint , year =. doi:10.48550/arXiv.2512.15011 , url =. 2512.15011 , archivePrefix =

[48] [51]

Kendall, Alex and Gal, Yarin. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? Advances in Neural Information Processing Systems. arXiv:1703.04977.

[49] [52]

Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods , journal =

H. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods , journal =. 2021 , volume =

[50] [53]

Studies in Subjective Probability , editor =

Bruno de Finetti , title =. Studies in Subjective Probability , editor =. 1964 , note =

[51] [54]

Aldous , title =

David J. Aldous , title =. Journal of Multivariate Analysis , year =

[52] [55]

Olav Kallenberg , title =

[53] [56]

The Annals of Probability , year =

Persi Diaconis and David Freedman , title =. The Annals of Probability , year =

[54] [57]

The Annals of Mathematical Statistics , year =

Henry Teicher , title =. The Annals of Mathematical Statistics , year =

[55] [58]

Allman and Catherine Matias and John A

Elizabeth S. Allman and Catherine Matias and John A. Rhodes , title =. The Annals of Statistics , year =

[56] [59]

ICML , year =

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations , author =. ICML , year =

[57] [60]

Kingma and Ricardo Pio Monti and Aapo Hyv

Ilyes Khemakhem and Diederik P. Kingma and Ricardo Pio Monti and Aapo Hyv. Variational Autoencoders and Nonlinear. AISTATS , year =

[58] [61]

ICML , year =

Causal Representation Learning Made Identifiable by Grouping of Observational Variables , author =. ICML , year =

[59] [62]

Journal of Machine Learning Research , year =

Alexander D'Amour and others , title =. Journal of Machine Learning Research , year =

[60] [63]

Calmon and Mario Diaz , title =

Lucas Monteiro Paes and Rodrigo Cruz and Flavio P. Calmon and Mario Diaz , title =. ISIT , year =

[61] [64]

ICDT , year =

Kevin Beyer and Jonathan Goldstein and Raghu Ramakrishnan and Uri Shaft , title =. ICDT , year =

[62] [65]

Aggarwal and Alexander Hinneburg and Daniel A

Charu C. Aggarwal and Alexander Hinneburg and Daniel A. Keim , title =. ICDT , year =

[63] [66]

Advances in Neural Information Processing Systems 15 , year =

Jon Kleinberg , title =. Advances in Neural Information Processing Systems 15 , year =

[64] [67]

NeurIPS , year =

Chulhee Yun and Yin Tat Lee and Samuel Wiseman , title =. NeurIPS , year =

[65] [68]

Nature Machine Intelligence , year =

Robert Geirhos and others , title =. Nature Machine Intelligence , year =

[66] [69]

ICCV , year =

Khalid Mahmood and others , title =. ICCV , year =

[67] [70]

arXiv:2111.10659 , year =

Jindong Gu and Yao Qin and Volker Tresp , title =. arXiv:2111.10659 , year =

[68] [71]

NeurIPS , year =

Training Compute-Optimal Large Language Models , author =. NeurIPS , year =

[69] [72]

Jared Kaplan and others. Scaling Laws for Neural Language Models. arXiv:2001.08361.

[70] [73]

Causal Inference by Using Invariant Prediction: Identification and Confidence Intervals , journal =

Jonas Peters and Peter B. Causal Inference by Using Invariant Prediction: Identification and Confidence Intervals , journal =. 2016 , volume =

[71] [74]

ICLR , year =

Sanghyuk Park and Sungdong Choe and Yiding Jiang and Victor Veitch , title =. ICLR , year =

[72] [75]

Fuller , title =

Wayne A. Fuller , title =

[73] [76]

Carroll and David Ruppert and Leonard A

Raymond J. Carroll and David Ruppert and Leonard A. Stefanski and Ciprian M. Crainiceanu , title =

[74] [77]

Journal of Machine Learning Research , year=

Underspecification Presents Challenges for Credibility in Modern Machine Learning , author=. Journal of Machine Learning Research , year=

[75] [79]

Hallucination is Inevitable: An Innate Limitation of Large Language Models. arXiv preprint arXiv:2401.11817.

[76] [80]

2025 , archivePrefix=

Hallucination is Inevitable for LLMs with the Open World Assumption , author=. 2025 , archivePrefix=. 2510.05116 , primaryClass=

[77] [81]

2019 , archivePrefix=

On the Existence of Simpler Machine Learning Models , author=. 2019 , archivePrefix=. 1908.01755 , primaryClass=

[78] [82]

Proceedings of the 41st International Conference on Machine Learning , series=

Position: Amazing Things Come From Having Many Good Models , author=. Proceedings of the 41st International Conference on Machine Learning , series=. 2024 , url=

[79] [83]

What Uncertainties Do We Need in

Kendall, Alex and Gal, Yarin , year=. What Uncertainties Do We Need in

[80] [84]

Proceedings of Machine Learning Research (ICML 2025 Position Paper Track) , year=

Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents , author=. Proceedings of Machine Learning Research (ICML 2025 Position Paper Track) , year=
