An AGI with Time-Inconsistent Preferences
Pith reviewed 2026-05-25 17:29 UTC · model grok-4.3
The pith
Using standard economic discounting in AGI models implicitly assumes the AGI has time-consistent preferences.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This paper reveals a trap for artificial general intelligence (AGI) theorists who use economists' standard method of discounting. This trap is implicitly and falsely assuming that a rational AGI would have time-consistent preferences. An agent with time-inconsistent preferences knows that its future self will disagree with its current self concerning intertemporal decision making. Such an agent cannot automatically trust its future self to carry out plans that its current self considers optimal.
What carries the argument
Time-inconsistent preferences, where the agent anticipates disagreement with its future self on intertemporal choices and thus cannot trust plan execution.
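A minimal sketch of the disagreement between current and future selves, under assumptions that are not taken from the paper: the reward amounts, the dates, the discount parameters, and the function names `exponential`, `hyperbolic`, and `prefers_later` are all hypothetical. It compares a smaller-sooner reward with a larger-later one, evaluated once from far away and once at the moment the sooner reward becomes available.

```python
def exponential(delay, delta=0.9):
    """Exponential discount factor delta**delay (illustrative delta)."""
    return delta ** delay


def hyperbolic(delay, k=1.0):
    """Hyperbolic discount factor 1 / (1 + k*delay) (illustrative k)."""
    return 1.0 / (1.0 + k * delay)


def prefers_later(discount, now, small=(10, 100.0), large=(11, 110.0)):
    """Return True if the self at time `now` prefers the larger-later reward.

    `small` and `large` are (delivery_time, amount) pairs; both are
    hypothetical values chosen only to make the effect visible.
    """
    (t_small, r_small), (t_large, r_large) = small, large
    value_small = r_small * discount(t_small - now)
    value_large = r_large * discount(t_large - now)
    return value_large > value_small


if __name__ == "__main__":
    for name, discount in (("exponential", exponential), ("hyperbolic", hyperbolic)):
        early = prefers_later(discount, now=0)   # the planning self
        late = prefers_later(discount, now=10)   # the self that must carry out the plan
        print(f"{name}: self at t=0 waits? {early}; self at t=10 waits? {late}")
```

With these illustrative numbers the exponential discounter gives the same answer at both dates, while the hyperbolic discounter plans at t=0 to wait and then declines to wait at t=10. That reversal is the sense in which the earlier self cannot simply trust the later self to carry out its plan.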
Load-bearing premise
The standard economic method of discounting future rewards necessarily encodes or requires time-consistent preferences in the modeled agent.
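One conventional way to state the link between discounting and time consistency is sketched below. The notation ($\delta$, $k$, $\tau$, $t_1$, $t_2$, $r_1$, $r_2$) is an assumption introduced here for illustration and is not necessarily the paper's own formulation.

```latex
% A self at time \tau compares reward r_1 at t_1 with reward r_2 at t_2,
% where \tau \le t_1 < t_2 and r_1, r_2 > 0.

% Exponential discounting, factor \delta \in (0,1):
\[
  \delta^{\,t_2-\tau}\, r_2 > \delta^{\,t_1-\tau}\, r_1
  \iff
  \delta^{\,t_2-t_1}\, r_2 > r_1 .
\]
% The evaluation date \tau cancels, so every self ranks the two options identically.

% Hyperbolic discounting, D(d) = 1/(1+kd) with k > 0:
\[
  \frac{r_2}{1+k(t_2-\tau)} > \frac{r_1}{1+k(t_1-\tau)}
  \iff
  \frac{r_2}{r_1} > 1 + \frac{k\,(t_2-t_1)}{1+k\,(t_1-\tau)} .
\]
% The right-hand side grows as \tau approaches t_1, so a later self can
% reverse the ranking that an earlier self chose.
```

Under these assumptions, the exponential case is the one in which the comparison does not depend on when it is made, which is why the standard method and time-consistent preferences are usually treated together.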
What would settle it
Constructing an AGI model that uses standard discounting yet has time-inconsistent preferences and still has its future selves reliably execute the current self's optimal plans without extra mechanisms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that AGI theorists using the standard economic method of discounting future rewards implicitly and falsely assume that a rational AGI has time-consistent preferences. It observes that an agent with time-inconsistent preferences knows its future self will disagree on intertemporal choices and therefore cannot automatically trust that future self to execute plans the current self deems optimal.
Significance. If the central observation holds, the paper would identify a methodological pitfall in preference modeling for AGI systems. However, the manuscript supplies no definition of rationality, no derivation linking rationality to time-inconsistency, and no counter-example or reference establishing why the time-consistency assumption is false, which substantially limits the result's significance even if the descriptive part about distrust of future selves is accurate.
major comments (1)
- [Abstract] Abstract, paragraph 1: the claim that standard discounting 'implicitly and falsely' assumes time-consistent preferences for a rational AGI is asserted without a definition of rationality or an argument showing why time-consistency is not required for rationality; this unsupported premise is load-bearing for the 'trap' identified in the title and abstract.
Simulated Author's Rebuttal
We thank the referee for the detailed review. We respond to the single major comment below.
- Referee: [Abstract] Abstract, paragraph 1: the claim that standard discounting 'implicitly and falsely' assumes time-consistent preferences for a rational AGI is asserted without a definition of rationality or an argument showing why time-consistency is not required for rationality; this unsupported premise is load-bearing for the 'trap' identified in the title and abstract.
  Authors: The manuscript's primary aim is to identify a practical difficulty that arises for AGI design once time-inconsistent preferences are admitted, rather than to derive a new definition of rationality. The term 'falsely' is intended to signal that the standard discounting procedure rests on an assumption that may fail to hold for certain AGI preference structures; the body of the paper then examines the resulting distrust between successive selves. We nevertheless accept that the abstract would be clearer with explicit grounding. In revision we will (i) insert a concise definition of time-consistency drawn from the economics literature, (ii) note that exponential discounting is conventionally linked to time-consistent rational choice, and (iii) add a short introductory paragraph with standard references. These additions will support the premise while leaving the central observation about future-self distrust unchanged.
  revision: yes
Circularity Check
No circularity; conceptual distinction with no self-referential derivation or fitted quantities
The manuscript presents a conceptual warning about an implicit assumption in standard discounting methods for AGI, resting on the definitional difference between time-consistent and time-inconsistent preferences. No equations, parameters, predictions, or self-citations are invoked to derive results that reduce to the paper's own inputs. The claim that the assumption is 'false' is asserted without derivation or external support, but this is a matter of evidential weakness rather than circular reduction. The argument is self-contained as a definitional observation and does not loop back on itself.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Rational agents can be modeled using standard economic discounting of future rewards.