A Decision Theoretic Perspective on Artificial Superintelligence: Coping with Missing Data Problems in Prediction and Treatment Choice
Pith reviewed 2026-05-21 22:01 UTC · model grok-4.3
The pith
The paper sees no indication that current machine learning-based AI will outperform humans at handling missing data in prediction and treatment choice.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We see no indication that the current dominant architecture of machine learning-based artificial intelligence systems will outperform humans in coping with missing data problems in prediction and treatment choice. These problems give rise to identification issues that do not diminish with larger samples. A decision-theoretic perspective formalizes the connection between intelligence and identification problems, leading to the conclusion that current AI research paths may not overcome these challenges.
What carries the argument
A decision-theoretic perspective that formalizes the connection between intelligence and identification problems, applied to show why missing data issues persist beyond scaling.
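To make "persist beyond scaling" concrete, here is a minimal Python sketch of worst-case bounds on a mean when some outcomes are missing. It is an illustration, not code from the paper. The function names, the 30% missingness rate, and the normalization of outcomes to [0, 1] are all assumptions chosen for the example.

```python
import random

def worst_case_bounds(outcomes, y_min=0.0, y_max=1.0):
    """Bounds on E(y) when some outcomes are missing (None).

    No assumption is made about why values are missing; the only
    assumption is that every outcome lies in [y_min, y_max].
    """
    n = len(outcomes)
    observed = [y for y in outcomes if y is not None]
    p_obs = len(observed) / n
    mean_obs = sum(observed) / len(observed) if observed else 0.0
    # Missing outcomes could all sit at y_min (lower bound) or y_max (upper bound).
    lower = mean_obs * p_obs + y_min * (1 - p_obs)
    upper = mean_obs * p_obs + y_max * (1 - p_obs)
    return lower, upper

def simulate(n, p_missing=0.3, seed=0):
    """Hypothetical data: binary outcomes, each missing with probability p_missing."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1.0 if rng.random() < 0.5 else 0.0
        data.append(None if rng.random() < p_missing else y)
    return data

# The width of the bound tracks the missingness rate, not the sample size.
for n in (1_000, 100_000):
    lo, hi = worst_case_bounds(simulate(n))
    print(n, round(lo, 3), round(hi, 3), "width:", round(hi - lo, 3))
```

A larger sample pins down the endpoints of the interval more precisely, but the width stays close to the share of missing outcomes. Narrowing it requires assumptions about the missingness process, not more draws from the same process.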
If this is right
- Identification problems in prediction and treatment choice will persist for AI systems.
- Methodological research on missing data remains essential for decision making.
- The pursuit of artificial superintelligence faces structural limits from inferential problems.
- Current ML approaches may not resolve issues that are independent of sample size.
Where Pith is reading between the lines
- If identification problems resist current methods, similar barriers may exist in other decision contexts with incomplete information.
- Hybrid approaches combining AI with human judgment could be necessary for robust treatment choices.
- Future AI research might need to incorporate explicit mechanisms for handling uncertainty from missing data rather than relying solely on pattern recognition.
- Empirical tests could involve comparing AI and human performance on standardized missing data benchmarks in econometrics.
Load-bearing premise
That the structural nature of missing data identification problems makes them resistant to solutions based on increased data volume or computational power in existing machine learning systems.
What would settle it
Demonstration of an ML-based AI system that, when faced with missing data in a treatment choice scenario, consistently makes better decisions than human experts or standard econometric methods on the same problem.
Original abstract
Enormous attention and resources are being devoted to the quest for artificial general intelligence and, even more ambitiously, artificial superintelligence. We wonder about the implications for methodological research that aims to help decision makers cope with what econometricians call identification problems, inferential problems in empirical research that do not diminish as sample size grows. Of particular concern are missing data problems in prediction and treatment choice. Essentially all data collection intended to inform decision making is subject to missing data, which gives rise to identification problems. Thus far, we see no indication that the current dominant architecture of machine learning (ML)-based artificial intelligence (AI) systems will outperform humans in this context. In this paper, we explain why we have reached this conclusion and why we see the missing data problem as a cautionary case study in the quest for superintelligence more generally. We first discuss the concept of intelligence, focusing initially on some work by AI researchers, before presenting a decision-theoretic perspective that formalizes the connection between intelligence and identification problems. We next apply this perspective to two leading cases of missing data problems. Then we explain why we are skeptical that AI research is currently on a path toward machines doing better than humans at solving these identification problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a decision-theoretic formalization of intelligence and applies it to missing-data identification problems in prediction and treatment choice. It argues that these problems are structurally resistant to the scaling and pattern-matching methods of current dominant ML architectures, that non-identifiability persists with larger samples or compute, and that this constitutes a cautionary case study for the prospects of artificial superintelligence.
Significance. If the central argument is correct, the paper would usefully connect decision theory to the limits of purely statistical approaches in econometrics and AI, underscoring that identification requires explicit modeling of what is not recoverable from the observed distribution. The work supplies a conceptual framework rather than derivations, data, or machine-checked results.
Major comments (2)
- [Application to two leading cases] The section applying the decision-theoretic perspective to the two leading cases of missing data does not contain a formal separation result establishing that any architecture relying solely on statistical pattern matching from the observed distribution must fail to recover or deploy valid identifying assumptions.
- [Decision-theoretic perspective] The claim that current ML architectures lack the structure to outperform humans rests on the premise that identification problems cannot be solved through greater data volume or auxiliary training; this premise is asserted but not demonstrated via counter-example or impossibility argument in the decision-theoretic sections.
Minor comments (2)
- [Abstract and Introduction] The abstract and introduction would benefit from explicit examples of the two leading missing-data cases (e.g., selection on unobservables or attrition) to make the scope concrete for readers.
- [Formalization of intelligence] Notation for the decision-theoretic objects (e.g., the mapping from observed distributions to actions) could be introduced earlier and used consistently when contrasting human and machine approaches.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive report. The comments help clarify the scope and limitations of our conceptual framework. We respond to each major comment below and note the revisions we will incorporate.
Point-by-point responses
- Referee: [Application to two leading cases] The section applying the decision-theoretic perspective to the two leading cases of missing data does not contain a formal separation result establishing that any architecture relying solely on statistical pattern matching from the observed distribution must fail to recover or deploy valid identifying assumptions.
  Authors: We agree that the applications to the two leading cases are illustrative rather than a formal separation theorem. The manuscript's goal is to supply a decision-theoretic lens that connects identification problems to the limits of purely statistical pattern matching, without claiming to derive impossibility results for every conceivable architecture. We will revise the relevant section to state this scope explicitly and to sketch the additional modeling assumptions that would be required for a general separation result.
  Revision: partial
- Referee: [Decision-theoretic perspective] The claim that current ML architectures lack the structure to outperform humans rests on the premise that identification problems cannot be solved through greater data volume or auxiliary training; this premise is asserted but not demonstrated via counter-example or impossibility argument in the decision-theoretic sections.
  Authors: The decision-theoretic sections define intelligence in terms of the ability to solve identification problems, which are defined precisely as those that remain unresolved by the observed distribution alone. We will add a brief counter-example in the revision showing that increasing sample size or auxiliary training on the observed distribution cannot recover a non-identified parameter in a standard missing-data setting for treatment choice. This will make the premise more explicit without altering the paper's primarily conceptual character.
  Revision: yes
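The kind of counter-example the authors describe can be sketched in Python. This is not the authors' counter-example, which the page does not reproduce. It is a hypothetical illustration of observational equivalence in a two-treatment setting with outcomes in [0, 1]. The class name, function names, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObservedDistribution:
    """What infinite data would reveal when each person gets one treatment."""
    p_treated_a: float     # P(z = A): share of the population that received A
    mean_y_given_a: float  # E[y | z = A]: mean realized outcome among A recipients
    mean_y_given_b: float  # E[y | z = B]: mean realized outcome among B recipients

def outcome_bounds(obs):
    """Worst-case bounds on (E[y(A)], E[y(B)]) assuming outcomes lie in [0, 1]."""
    p_a, p_b = obs.p_treated_a, 1 - obs.p_treated_a
    bounds_a = (obs.mean_y_given_a * p_a, obs.mean_y_given_a * p_a + p_b)
    bounds_b = (obs.mean_y_given_b * p_b, obs.mean_y_given_b * p_b + p_a)
    return bounds_a, bounds_b

def best_treatment(obs, counterfactual_a, counterfactual_b):
    """Optimal treatment under hypothesized, unobservable counterfactual means.

    counterfactual_a = E[y(A) | z = B], counterfactual_b = E[y(B) | z = A].
    """
    p_a, p_b = obs.p_treated_a, 1 - obs.p_treated_a
    mean_a = obs.mean_y_given_a * p_a + counterfactual_a * p_b
    mean_b = obs.mean_y_given_b * p_b + counterfactual_b * p_a
    return "A" if mean_a >= mean_b else "B"

# One observed distribution, two states of nature that both reproduce it exactly.
obs = ObservedDistribution(p_treated_a=0.5, mean_y_given_a=0.6, mean_y_given_b=0.5)
print(outcome_bounds(obs))
print(best_treatment(obs, counterfactual_a=1.0, counterfactual_b=0.0))
print(best_treatment(obs, counterfactual_a=0.0, counterfactual_b=1.0))
```

Both sets of counterfactual means generate the same observed distribution, yet they imply different optimal treatments. Any procedure that takes only the observed distribution as input must return the same choice in both states, so it is wrong in at least one of them unless it brings in assumptions from outside the data.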
Circularity Check
Conceptual discussion of identification problems contains no circular derivations or self-referential fits
Full rationale
The paper advances a decision-theoretic framing of intelligence and applies it to standard missing-data identification problems in prediction and treatment choice. No equations, fitted parameters, or predictions derived from data appear in the provided abstract or described structure. The central skepticism—that dominant ML architectures will not outperform humans on non-identifiable problems—rests on the established econometric distinction between identification (which does not improve with sample size) and estimation, rather than on any self-definition, post-hoc renaming, or load-bearing self-citation chain. The argument is self-contained against external benchmarks in partial identification theory and does not reduce its conclusions to its own inputs by construction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Missing data problems create identification issues that do not diminish as sample size grows.
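The domain assumption can be illustrated with the textbook worst-case bound from the partial identification literature. The derivation below is a sketch under stated assumptions, not a reproduction of the paper's own equations: outcomes are normalized to $y \in [0,1]$, and $z = 1$ marks a person whose outcome is observed. It decomposes the conditional mean by the law of total expectation and then bounds the one term the data cannot reveal, $E(y \mid x=\xi, z=0)$.

```latex
\begin{align*}
E(y \mid x=\xi) &= E(y \mid x=\xi, z=1)\,P(z=1 \mid x=\xi)
                 + E(y \mid x=\xi, z=0)\,P(z=0 \mid x=\xi), \\
E(y \mid x=\xi) &\in \bigl[\, E(y \mid x=\xi, z=1)\,P(z=1 \mid x=\xi),\;\;
                 E(y \mid x=\xi, z=1)\,P(z=1 \mid x=\xi) + P(z=0 \mid x=\xi) \,\bigr].
\end{align*}
```

The interval has width $P(z=0 \mid x=\xi)$, which is a feature of the population rather than of the sample. That is why the problem does not diminish as sample size grows.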
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We next apply this perspective to two leading cases of missing data problems... agnostic identification region for E(y|x=ξ) is the interval [E(y|x=ξ,z=1)P(z=1|x=ξ), ...]"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "The Law of Decreasing Credibility: The credibility of inference decreases with the strength of the assumptions maintained."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.