An AGI with Time-Inconsistent Preferences

James D. Miller; Roman Yampolskiy

arxiv: 1906.10536 · v1 · pith:BNSHKDTXnew · submitted 2019-06-23 · 💻 cs.AI


Pith reviewed 2026-05-25 17:29 UTC · model grok-4.3

classification 💻 cs.AI

keywords AGI · time-inconsistent preferences · discounting · intertemporal decision making · rational agents · future selves · planning


The pith

Using standard economic discounting in AGI models implicitly assumes the AGI has time-consistent preferences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that AGI theorists applying the economists' standard method of discounting future rewards are making an implicit assumption that the AGI agent will have time-consistent preferences. If an AGI has time-inconsistent preferences, it will know that its future self will disagree with current decisions on when to take actions. As a result, the current self cannot count on the future self to implement plans that the current self views as best. This matters because many models of rational AGI behavior rely on discounting without addressing this potential disagreement between selves.

Core claim

This paper reveals a trap for artificial general intelligence (AGI) theorists who use economists' standard method of discounting. This trap is implicitly and falsely assuming that a rational AGI would have time-consistent preferences. An agent with time-inconsistent preferences knows that its future self will disagree with its current self concerning intertemporal decision making. Such an agent cannot automatically trust its future self to carry out plans that its current self considers optimal.
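The disagreement between selves can be made concrete with a minimal Python sketch. It is not taken from the paper: the hyperbolic form 1/(1 + k·delay) as the time-inconsistent case, the reward sizes, the dates, and all function names are illustrative assumptions. The sketch compares a smaller reward at date 10 with a larger reward at date 11, once from the viewpoint of the self at date 0 and once from the self at date 10.

```python
def exponential(delay, delta=0.9):
    """Standard exponential discount weight (assumed delta)."""
    return delta ** delay


def hyperbolic(delay, k=1.0):
    """Hyperbolic discount weight, used here as the time-inconsistent case (assumed k)."""
    return 1.0 / (1.0 + k * delay)


def prefers_larger_later(discount, now,
                         t_small=10, r_small=100.0,
                         t_large=11, r_large=120.0):
    """True if the self at date `now` values the larger, later reward more."""
    v_small = r_small * discount(t_small - now)
    v_large = r_large * discount(t_large - now)
    return v_large > v_small


for name, discount in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    early = prefers_larger_later(discount, now=0)
    late = prefers_larger_later(discount, now=10)
    # A change between `early` and `late` means the two selves disagree.
    print(name, "self at 0 waits:", early, "| self at 10 waits:", late)
```

Under these assumed numbers the exponential agent gives the same answer at both dates, while the hyperbolic agent plans to wait at date 0 and then takes the smaller reward at date 10. That reversal is the kind of disagreement the core claim describes.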

What carries the argument

Time-inconsistent preferences, where the agent anticipates disagreement with its future self on intertemporal choices and thus cannot trust plan execution.

Load-bearing premise

The standard economic method of discounting future rewards necessarily encodes or requires time-consistent preferences in the modeled agent.
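The link between exponential discounting and time consistency can be written out in a few lines. The notation below is an assumption introduced for illustration, not the paper's: D(τ) > 0 is the weight a self places on a reward arriving τ periods after its own date, x > 0 is a reward at date t, and y > 0 is a reward at a later date t'.

```latex
% Self s (with s <= t < t') compares reward x at date t with reward y at date t'.
\[
  \text{self } s \text{ prefers } y \text{ at } t'
  \iff y\,D(t'-s) > x\,D(t-s)
  \iff \frac{D(t'-s)}{D(t-s)} > \frac{x}{y}.
\]
% Exponential discounting: the ratio does not depend on s,
% so every self ranks the two rewards the same way.
\[
  D(\tau) = \delta^{\tau}
  \;\Longrightarrow\;
  \frac{D(t'-s)}{D(t-s)} = \delta^{\,t'-t}.
\]
% Hyperbolic discounting: the ratio depends on s and falls as s approaches t,
% so a later self may reverse the ranking chosen by an earlier self.
\[
  D(\tau) = \frac{1}{1+k\tau}
  \;\Longrightarrow\;
  \frac{D(t'-s)}{D(t-s)} = \frac{1+k(t-s)}{1+k(t'-s)}.
\]
```

Read this way, the premise amounts to the observation that the exponential form makes the ranking independent of the evaluating self. Whether a rational AGI must have that form is the question the editorial analysis presses on.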

What would settle it

Constructing an AGI model that uses standard discounting yet has time-inconsistent preferences and still has its future selves reliably execute the current self's optimal plans without extra mechanisms.
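One way to approach such a test is to simulate successive selves and check whether the plan favoured at period 0 is the one that gets executed. The sketch below is hypothetical and not from the paper: it assumes a quasi-hyperbolic (beta-delta) agent with delta = 1, a one-off task whose cost is paid in the period it is done, and illustrative costs. `planned_period` is what the period-0 self would pick if it could bind later selves; the other two functions let each self choose for itself, either naively (assuming later selves will comply) or with correct foresight.

```python
def planned_period(costs, beta):
    """Period the period-0 self would choose if it could commit all later selves."""
    # Acting now costs costs[0]; any later cost is scaled by beta (present bias).
    candidates = [(costs[0], 0)] + [(beta * costs[s], s) for s in range(1, len(costs))]
    return min(candidates)[1]


def naive_completion_period(costs, beta):
    """Each self wrongly assumes later selves will follow its own plan."""
    last = len(costs) - 1
    for t in range(last):
        best_later = min(beta * c for c in costs[t + 1:])
        if costs[t] <= best_later:
            return t
    return last  # deadline: the final self has to act


def sophisticated_completion_period(costs, beta):
    """Each self correctly predicts later selves (backward induction)."""
    last = len(costs) - 1
    done_at = last  # the final self has no choice but to act
    for t in range(last - 1, -1, -1):
        # If self t waits, the task is done at `done_at`, as already solved.
        if costs[t] <= beta * costs[done_at]:
            done_at = t
    return done_at


costs = [3, 5, 8, 13]  # illustrative costs, rising with delay
for beta in (1.0, 0.5):  # beta = 1.0 is the time-consistent benchmark
    print(beta,
          planned_period(costs, beta),
          naive_completion_period(costs, beta),
          sophisticated_completion_period(costs, beta))
```

With beta = 1.0 all three functions return period 0. With beta = 0.5 the period-0 self plans for period 1, the naive agent drifts to the deadline at period 3, and the foresighted agent acts at period 1 only because its period-1 self expects its period-2 self to delay. A model that settles the question would need the planned and executed periods to coincide in general without relying on that kind of strategic anticipation.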


Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper correctly flags that time-inconsistent preferences would make an AGI's future selves unreliable executors of current plans, but it asserts without support that the time-consistency assumption built into standard discounting is false for a rational AGI.


The paper's main takeaway is straightforward: an AGI with time-inconsistent preferences will know its future self may not execute the current self's optimal plans, so standard discounting methods that assume consistency create a modeling trap. That observation follows from basic definitions in the dynamic inconsistency literature and applies the point to AGI theorists who borrow economists' discounting tools. The write-up spells out the trust issue in plain terms without unnecessary jargon.

Beyond that, there is little new. Time-inconsistency has been standard in behavioral economics since at least Strotz and Pollak; the paper simply imports the distinction into AGI safety discussions rather than deriving a new result or resolving a technical open question. No formal model, counter-example, or derivation appears.

The soft spot is the unsupported claim that assuming a rational AGI has time-consistent preferences is false. The text does not define rationality in this context, show why inconsistency is compatible with rationality, or cite evidence that real or idealized AGI systems would exhibit inconsistency. It treats the falsity as given rather than argued. That makes the central assertion rest on an unexamined premise.

The work is a short conceptual note rather than a technical paper. Readers already working on AGI preference modeling or long-term planning might find the reminder useful as a cautionary point. It does not contain enough substance to shift modeling practice on its own. I would send this to peer review as a brief communication or letter, since the observation is clear enough to merit discussion even if the rationality claim needs tightening or removal.

Referee Report

1 major / 0 minor

Summary. The paper claims that AGI theorists using the standard economic method of discounting future rewards implicitly and falsely assume that a rational AGI has time-consistent preferences. It observes that an agent with time-inconsistent preferences knows its future self will disagree on intertemporal choices and therefore cannot automatically trust that future self to execute plans the current self deems optimal.

Significance. If the central observation holds, the paper would identify a methodological pitfall in preference modeling for AGI systems. However, the manuscript supplies no definition of rationality, no derivation linking rationality to time-inconsistency, and no counter-example or reference establishing why the time-consistency assumption is false, which substantially limits the result's significance even if the descriptive part about distrust of future selves is accurate.

major comments (1)

[Abstract] Abstract, paragraph 1: the claim that standard discounting 'implicitly and falsely' assumes time-consistent preferences for a rational AGI is asserted without a definition of rationality or an argument showing why time-consistency is not required for rationality; this unsupported premise is load-bearing for the 'trap' identified in the title and abstract.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review. We respond to the single major comment below.


Referee: [Abstract] Abstract, paragraph 1: the claim that standard discounting 'implicitly and falsely' assumes time-consistent preferences for a rational AGI is asserted without a definition of rationality or an argument showing why time-consistency is not required for rationality; this unsupported premise is load-bearing for the 'trap' identified in the title and abstract.

Authors: The manuscript's primary aim is to identify a practical difficulty that arises for AGI design once time-inconsistent preferences are admitted, rather than to derive a new definition of rationality. The term 'falsely' is intended to signal that the standard discounting procedure rests on an assumption that may fail to hold for certain AGI preference structures; the body of the paper then examines the resulting distrust between successive selves. We nevertheless accept that the abstract would be clearer with explicit grounding. In revision we will (i) insert a concise definition of time-consistency drawn from the economics literature, (ii) note that exponential discounting is conventionally linked to time-consistent rational choice, and (iii) add a short introductory paragraph with standard references. These additions will support the premise while leaving the central observation about future-self distrust unchanged.

revision: yes

Circularity Check

0 steps flagged

No circularity; conceptual distinction with no self-referential derivation or fitted quantities


The manuscript presents a conceptual warning about an implicit assumption in standard discounting methods for AGI, resting on the definitional difference between time-consistent and time-inconsistent preferences. No equations, parameters, predictions, or self-citations are invoked to derive results that reduce to the paper's own inputs. The claim that the assumption is 'false' is asserted without derivation or external support, but this is a matter of evidential weakness rather than circular reduction. The argument is self-contained as a definitional observation and does not loop back on itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The argument rests on the background economic modeling practice of discounting and the definitional properties of time consistency; no free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption: Rational agents can be modeled using standard economic discounting of future rewards.
Invoked in the first sentence of the abstract as the method whose implicit assumption is being critiqued.



Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages · 1 internal anchor

[1]

Eternity in six hours: Intergalactic spreading of intelligent life and sharpening the Fermi paradox

Armstrong, Stuart, and Anders Sandberg. "Eternity in six hours: Intergalactic spreading of intelligent life and sharpening the Fermi paradox." Acta Astronautica 89 (2013): 1-13. Bostrom, Nick. Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press,

[2]

Hyperbolic Discounting and Learning over Multiple Horizons

Fedus, William, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, and Hugo Larochelle. "Hyperbolic Discounting and Learning over Multiple Horizons." arXiv preprint arXiv:1902.06865 (2019). Frederick, Shane, George Loewenstein, and Ted O'donoghue. "Time discounting and time preference: A critical review." Journal of economic literature 40, no. 2 (2002): 351...

[3]

Consistent planning

Pollak, Robert A. "Consistent planning." The Review of Economic Studies 35, no. 2 (1968): 201-208. Samuelson, Paul A. "A note on measurement of utility." The review of economic studies 4, no. 2 (1937): 155-161. Soares, Nate, Benja Fallenstein, Stuart Armstrong, and Eliezer Yudkowsky. "Corrigibility." In Workshops at the Twenty-Ninth AAAI Conference on Art...

[4]

On hyperbolic discounting and uncertain hazard rates

Sozou, Peter D. "On hyperbolic discounting and uncertain hazard rates." Proceedings of the Royal Society of London. Series B: Biological Sciences 265, no. 1409 (1998): 2015-2020. Yampolskiy, Roman V. Artificial superintelligence: a futuristic approach. Chapman and Hall/CRC,

