Limitations on Accurate, Trusted, Human-level Reasoning
We identify a fundamental incompatibility between the goals of accuracy, trust, and human-level reasoning in artificial intelligence (AI) systems, for strict mathematical definitions of these notions. We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or exceeding human capability. Our core finding is that, for our formal definitions of these notions, an accurate and trusted AI system cannot be a human-level reasoning system: for such an accurate, trusted system there are task instances which are easily and provably solvable by a human but not by the system. Our proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem, and can be regarded as interpretations of Gödel's and Turing's results. Key to our proof is the formalization of the notion of trust, which allows us to separate the intrinsic property of a system (being accurate) from its epistemic status (being trusted).
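The core claim can be illustrated with a standard Gödel-style self-referential sentence. What follows is only a sketch under stated assumptions, not the paper's own formalization: the symbols S, Says_S, and G_S are notation introduced here for illustration, and the sketch assumes the system's language is expressive enough to form a sentence that refers to the system's own assertions.

```latex
% Illustrative sketch only. S, \mathrm{Says}_S, and G_S are hypothetical
% notation introduced here; they are not taken from the paper.
% Requires amsmath for \text.

% Accuracy: S never asserts a false statement (it may abstain instead).
\[
  \text{Accurate}(S) \;:\iff\; \forall \varphi\,
  \bigl(\mathrm{Says}_S(\varphi) \rightarrow \varphi\bigr)
\]

% Assumed self-referential sentence: "S does not assert this sentence."
\[
  G_S \;\leftrightarrow\; \neg\,\mathrm{Says}_S(G_S)
\]

% If S asserted G_S, accuracy would make G_S true, and G_S says that
% S does not assert it: a contradiction. Hence S must abstain on G_S,
% and by the definition of G_S the sentence is then true.
\[
  \text{Accurate}(S) \;\rightarrow\; \neg\,\mathrm{Says}_S(G_S),
  \qquad
  \neg\,\mathrm{Says}_S(G_S) \;\rightarrow\; G_S
\]
```

Under these assumptions, a reasoner who trusts S (that is, who assumes Accurate(S)) can derive G_S in two steps, while S itself, if accurate, can never assert it. This mirrors the abstract's separation between a system being accurate and a system being trusted.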
Forward citations
Cited by 1 Pith paper
- The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives
The Accountability Incompleteness Theorem demonstrates that human-AI collectives above the Accountability Horizon with feedback cycles cannot simultaneously meet attributability, foreseeability, non-vacuity, and compl...