A Decision Theoretic Perspective on Artificial Superintelligence: Coping with Missing Data Problems in Prediction and Treatment Choice

Charles F. Manski; Jeff Dominitz

arXiv: 2509.12388 · v2 · pith:TOZHBA3Hnew · submitted 2025-09-15 · econ.EM

Pith reviewed 2026-05-21 22:01 UTC · model grok-4.3

keywords: artificial intelligence · missing data · identification problems · decision theory · prediction · treatment choice · superintelligence · machine learning

The pith

The authors see no indication that current machine-learning-based AI will outperform humans in coping with missing data problems in prediction and treatment choice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that identification problems caused by missing data in empirical research do not go away with larger samples or more computation. A sympathetic reader would care because vast resources are directed at developing artificial superintelligence, yet these basic inferential challenges may limit machine performance relative to humans. The authors present a decision-theoretic framework linking intelligence to the ability to resolve such identification problems. They apply it to missing data in prediction and treatment choice and express skepticism that dominant ML architectures are on track to surpass human capabilities in these areas.

Core claim

We see no indication that the current dominant architecture of machine learning-based artificial intelligence systems will outperform humans in coping with missing data problems in prediction and treatment choice. These problems give rise to identification issues that do not diminish with larger samples. A decision-theoretic perspective formalizes the connection between intelligence and identification problems, leading to the conclusion that current AI research paths may not overcome these challenges.

What carries the argument

A decision-theoretic perspective that formalizes the connection between intelligence and identification problems, applied to show why missing data issues persist beyond scaling.
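
As a rough illustration of what a decision-theoretic framing involves, the sketch below picks an action when the true state of the world is not identified. The page does not say which decision criteria the paper uses; maximin and minimax regret are assumed here as two standard textbook criteria, and the welfare table, action names, and state names are all hypothetical.

```python
def maximin(welfare):
    """Return the action whose worst-case welfare across states is highest."""
    return max(welfare, key=lambda a: min(welfare[a].values()))


def minimax_regret(welfare):
    """Return the action whose worst-case regret across states is lowest."""
    states = list(next(iter(welfare.values())).keys())
    # Best achievable welfare in each state, over all actions.
    best = {s: max(w[s] for w in welfare.values()) for s in states}

    def max_regret(action):
        return max(best[s] - welfare[action][s] for s in states)

    return min(welfare, key=max_regret)


# Hypothetical welfare table: welfare[action][state]. The data cannot tell
# us which state ("low" or "high") is the true one.
welfare = {
    "status_quo": {"low": 0.5, "high": 0.5},
    "innovation": {"low": 0.2, "high": 0.9},
}

if __name__ == "__main__":
    print("maximin choice:       ", maximin(welfare))
    print("minimax-regret choice:", minimax_regret(welfare))
```

The two criteria disagree on this table, which is the point of the sketch: when data leave the state unidentified, the choice depends on the criterion adopted and not only on how well patterns in the data are fitted.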

If this is right

Identification problems in prediction and treatment choice will persist for AI systems.
Methodological research on missing data remains essential for decision making.
The pursuit of artificial superintelligence faces structural limits from inferential problems.
Current ML approaches may not resolve issues that are independent of sample size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If identification problems resist current methods, similar barriers may exist in other decision contexts with incomplete information.
Hybrid approaches combining AI with human judgment could be necessary for robust treatment choices.
Future AI research might need to incorporate explicit mechanisms for handling uncertainty from missing data rather than relying solely on pattern recognition.
Empirical tests could involve comparing AI and human performance on standardized missing data benchmarks in econometrics.

Load-bearing premise

That the structural nature of missing data identification problems makes them resistant to solutions based on increased data volume or computational power in existing machine learning systems.
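
The premise can be checked numerically in a toy setting. The sketch below is not taken from the paper: it assumes a binary outcome, a hypothetical missingness process in which y = 1 is more likely to go unrecorded, and no assumption about why data are missing. Under those assumptions it computes worst-case bounds on the mean of y by setting every missing value to 0 (lower bound) or 1 (upper bound).

```python
import random


def worst_case_bounds(outcomes):
    """Bounds on the mean of y in [0, 1] when some outcomes are missing (None).

    No assumption is made about the missingness process: each missing
    value is set to 0 for the lower bound and to 1 for the upper bound.
    """
    n = len(outcomes)
    observed = [y for y in outcomes if y is not None]
    share_observed = len(observed) / n
    mean_observed = sum(observed) / len(observed) if observed else 0.0
    lower = mean_observed * share_observed
    upper = lower + (1.0 - share_observed)
    return lower, upper


def simulate(n, seed=0):
    """Hypothetical data: binary y with true mean 0.5; y = 1 goes missing more often."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        p_missing = 0.5 if y == 1 else 0.1
        data.append(None if rng.random() < p_missing else y)
    return data


if __name__ == "__main__":
    for n in (100, 10_000, 200_000):
        lo, hi = worst_case_bounds(simulate(n))
        print(f"n={n:>8,}  bounds=[{lo:.3f}, {hi:.3f}]  width={hi - lo:.3f}")
```

In this design the probability of a missing outcome is 0.5 × 0.5 + 0.5 × 0.1 = 0.30, so the width of the bounds settles near 0.30 as n grows instead of shrinking toward zero. More data reduces sampling noise in the endpoints but does not narrow the interval.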

What would settle it

Demonstration of an ML-based AI system that, when faced with missing data in a treatment choice scenario, consistently makes better decisions than human experts or standard econometric methods on the same problem.

Original abstract

Enormous attention and resources are being devoted to the quest for artificial general intelligence and, even more ambitiously, artificial superintelligence. We wonder about the implications for methodological research that aims to help decision makers cope with what econometricians call identification problems, inferential problems in empirical research that do not diminish as sample size grows. Of particular concern are missing data problems in prediction and treatment choice. Essentially all data collection intended to inform decision making is subject to missing data, which gives rise to identification problems. Thus far, we see no indication that the current dominant architecture of machine learning (ML)-based artificial intelligence (AI) systems will outperform humans in this context. In this paper, we explain why we have reached this conclusion and why we see the missing data problem as a cautionary case study in the quest for superintelligence more generally. We first discuss the concept of intelligence, focusing initially on some work by AI researchers, before presenting a decision-theoretic perspective that formalizes the connection between intelligence and identification problems. We next apply this perspective to two leading cases of missing data problems. Then we explain why we are skeptical that AI research is currently on a path toward machines doing better than humans at solving these identification problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper applies decision theory to argue that missing-data identification problems will limit current ML architectures from outperforming humans, but the case stays conceptual without showing why pattern matching cannot approximate the needed bounds.

The main thing to know is that Dominitz and Manski use a decision-theoretic lens to claim that identification problems from missing data in prediction and treatment choice will not be solved by scaling today's ML systems, and they present this as a cautionary example for superintelligence more broadly. They walk through why sample size does not resolve these issues and why they see no sign that dominant AI approaches will do better than humans here.

Referee Report

2 major / 2 minor

Summary. The paper develops a decision-theoretic formalization of intelligence and applies it to missing-data identification problems in prediction and treatment choice. It argues that these problems are structurally resistant to the scaling and pattern-matching methods of current dominant ML architectures, that non-identifiability persists with larger samples or compute, and that this constitutes a cautionary case study for the prospects of artificial superintelligence.

Significance. If the central argument is correct, the paper would usefully connect decision theory to the limits of purely statistical approaches in econometrics and AI, underscoring that identification requires explicit modeling of what is not recoverable from the observed distribution. The work supplies a conceptual framework rather than derivations, data, or machine-checked results.

major comments (2)

[Application to two leading cases] The section applying the decision-theoretic perspective to the two leading cases of missing data does not contain a formal separation result establishing that any architecture relying solely on statistical pattern matching from the observed distribution must fail to recover or deploy valid identifying assumptions.
[Decision-theoretic perspective] The claim that current ML architectures lack the structure to outperform humans rests on the premise that identification problems cannot be solved through greater data volume or auxiliary training; this premise is asserted but not demonstrated via counter-example or impossibility argument in the decision-theoretic sections.

minor comments (2)

[Abstract and Introduction] The abstract and introduction would benefit from explicit examples of the two leading missing-data cases (e.g., selection on unobservables or attrition) to make the scope concrete for readers.
[Formalization of intelligence] Notation for the decision-theoretic objects (e.g., the mapping from observed distributions to actions) could be introduced earlier and used consistently when contrasting human and machine approaches.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive report. The comments help clarify the scope and limitations of our conceptual framework. We respond to each major comment below and note the revisions we will incorporate.

Referee: [Application to two leading cases] The section applying the decision-theoretic perspective to the two leading cases of missing data does not contain a formal separation result establishing that any architecture relying solely on statistical pattern matching from the observed distribution must fail to recover or deploy valid identifying assumptions.

Authors: We agree that the applications to the two leading cases are illustrative rather than a formal separation theorem. The manuscript's goal is to supply a decision-theoretic lens that connects identification problems to the limits of purely statistical pattern matching, without claiming to derive impossibility results for every conceivable architecture. We will revise the relevant section to state this scope explicitly and to sketch the additional modeling assumptions that would be required for a general separation result.
Revision: partial

Referee: [Decision-theoretic perspective] The claim that current ML architectures lack the structure to outperform humans rests on the premise that identification problems cannot be solved through greater data volume or auxiliary training; this premise is asserted but not demonstrated via counter-example or impossibility argument in the decision-theoretic sections.

Authors: The decision-theoretic sections define intelligence in terms of the ability to solve identification problems, which are defined precisely as those that remain unresolved by the observed distribution alone. We will add a brief counter-example in the revision showing that increasing sample size or auxiliary training on the observed distribution cannot recover a non-identified parameter in a standard missing-data setting for treatment choice. This will make the premise more explicit without altering the paper's primarily conceptual character.
Revision: yes
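
The kind of counter-example described in this response can be sketched in a few lines. What follows is not the authors' counter-example, which does not appear on this page; it is a minimal illustration under stated assumptions: a binary treatment, binary outcomes, a hypothetical self-selection rule, and no assumption linking observed and counterfactual outcomes. Each person reveals only the outcome under the treatment actually received, so the other potential outcome is missing by construction.

```python
import random


def ate_bounds(records):
    """Worst-case bounds on E[y(1)] - E[y(0)] for outcomes in [0, 1].

    records: list of (z, y) pairs, where z in {0, 1} is the treatment
    received and y is the realized outcome. The counterfactual outcome
    is never observed.
    """
    n = len(records)
    bounds = {}
    for t in (0, 1):
        ys = [y for z, y in records if z == t]
        share_t = len(ys) / n
        mean_t = sum(ys) / len(ys) if ys else 0.0
        lower = mean_t * share_t          # every unobserved y(t) equals 0
        upper = lower + (1.0 - share_t)   # every unobserved y(t) equals 1
        bounds[t] = (lower, upper)
    return bounds[1][0] - bounds[0][1], bounds[1][1] - bounds[0][0]


def simulate(n, seed=0):
    """Hypothetical observational data with self-selection into treatment."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        y0 = 1 if rng.random() < 0.4 else 0
        y1 = 1 if rng.random() < 0.6 else 0
        # Assumed selection rule: those who would benefit are treated more often.
        p_treat = 0.8 if y1 > y0 else 0.3
        z = 1 if rng.random() < p_treat else 0
        records.append((z, y1 if z == 1 else y0))
    return records


if __name__ == "__main__":
    for n in (100, 10_000, 200_000):
        lo, hi = ate_bounds(simulate(n))
        print(f"n={n:>8,}  ATE bounds=[{lo:.3f}, {hi:.3f}]  width={hi - lo:.3f}")
```

Without further assumptions the bounds have width 1 at every sample size and always contain zero, so the data alone never reveal which treatment is better on average. Narrowing the interval requires identifying assumptions, not more observations drawn from the same process.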

Circularity Check

0 steps flagged

Conceptual discussion of identification problems contains no circular derivations or self-referential fits

The paper advances a decision-theoretic framing of intelligence and applies it to standard missing-data identification problems in prediction and treatment choice. No equations, fitted parameters, or predictions derived from data appear in the provided abstract or described structure. The central skepticism—that dominant ML architectures will not outperform humans on non-identifiable problems—rests on the established econometric distinction between identification (which does not improve with sample size) and estimation, rather than on any self-definition, post-hoc renaming, or load-bearing self-citation chain. The argument is self-contained against external benchmarks in partial identification theory and does not reduce its conclusions to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The argument rests on standard econometric definitions of identification problems and decision theory without introducing new free parameters, axioms beyond domain assumptions, or invented entities.

axioms (1)

Domain assumption: Missing data problems create identification issues that do not diminish as sample size grows.
Explicitly stated in the abstract as the core concern for decision making.
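
A one-line decomposition shows why this assumption is a structural statement and not an empirical conjecture. The notation is assumed here to match the passage quoted later on this page: z = 1 when the outcome y is observed and z = 0 when it is missing. By the law of total expectation:

```latex
E(y \mid x) \;=\; E(y \mid x, z=1)\,P(z=1 \mid x) \;+\; E(y \mid x, z=0)\,P(z=0 \mid x)
```

Sampling reveals E(y | x, z=1), P(z=1 | x), and P(z=0 | x) with increasing precision as the sample grows, but it reveals nothing about E(y | x, z=0), because those outcomes are never recorded. Absent assumptions about the missingness process, that term can lie anywhere in the logical range of y, so the uncertainty it contributes is scaled by P(z=0 | x) and does not depend on the sample size.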

pith-pipeline@v0.9.0 · 5752 in / 1096 out tokens · 54273 ms · 2026-05-21T22:01:44.367662+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We next apply this perspective to two leading cases of missing data problems... agnostic identification region for E(y|x=ξ) is the interval [E(y|x=ξ,z=1)P(z=1|x=ξ), ...]

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: The Law of Decreasing Credibility: The credibility of inference decreases with the strength of the assumptions maintained.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[1]

doi: 10.1162/99608f92.9898eede

Angrist, J. and J. Pischke (2010), “The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics,” Journal of Economic Perspectives, 24, 3-30.
Bailey, M. (2023), “A New Paradigm for Polling,” Harvard Data Science Review, 5, DOI: 10.1162/99608f92.9898eede.
Berger, J. (1985), Statistical Decision Theor...

work page doi:10.1162/99608f92.9898eede 2010
[2]

A Survey on Missing Data in Machine Learning,

Emmanuel, T., T. Maupong, D. Mpoeleng et al. (2021), “A Survey on Missing Data in Machine Learning,” Journal of Big Data, 8:140.
Ferguson, T. (1967), Mathematical Statistics: A Decision Theoretic Approach, San Diego: Academic Press.
Fitzgerald, J., P. Gottschalk, and R. Moffitt (1998), “An Analysis of Sample Attrition in Panel Data,” Journal of Human Res...

work page 2021
[3]

Universal Intelligence: A Definition of Machine Intelligence,

Legg, S. and M. Hutter (2007), “Universal Intelligence: A Definition of Machine Intelligence,” Minds & Machines, 17, 391-444.
Li, S., V. Litvin, and C. Manski (2023), “Partial Identification of Personalized Treatment Response with Trial-reported Analyses of Binary Subgroups,” Epidemiology, 34, 319-324.
Little, R. (2021), “Missing Data Assumptions,” Annu...

work page 2007
[4]

Policy Analysis with Incredible Certitude,

Manski, C. (2007), Identification for Prediction and Decision, Cambridge, MA: Harvard University Press.
Manski, C. (2011), “Policy Analysis with Incredible Certitude,” The Economic Journal, 121, F261-F289.
Manski, C. (2013), Public Policy in an Uncertain World, Cambridge, MA: Harvard University Press.
Manski, C. (2016), “Credible Interval Estimates for Of...

work page doi:10.1073/pnas.2022886118 2007
[5]

Learning from Data with Structured Missingness,

Mitra, R., S. McGough, T. Chakraborti et al. (2023), “Learning from Data with Structured Missingness,” Nature Machine Intelligence, 5, 13–23.
Molinari, F. (2020), “Microeconometrics with Partial Identification,” Handbook of Econometrics, Vol. 7A, S. Durlauf, L. Hansen, J. Heckman, and R. Matzkin editors, Amsterdam: Elsevier, 355-486.
Prosser, C. and J...

work page doi:10.1016/j.ejrad.2023.111276 2023
