Against Proxy Optimization

Sven Neth

arxiv: 2606.23597 · v1 · pith:7RMTDPVZnew · submitted 2026-06-22 · 💻 cs.AI

Pith reviewed 2026-06-26 08:24 UTC · model grok-4.3

keywords: proxy optimization, decision theory, utility functions, harmful maximization, rational agents, approximate utilities

The pith

Maximizing a proxy utility function is harmful under certain conditions and poses problems for applying decision theory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies conditions where maximizing an approximate or proxy utility function produces harmful outcomes instead of better ones. It argues that these conditions create difficulties for using decision theory to guide choices. A sympathetic reader would care because decision theory underpins models of rational agents and AI systems that often rely on proxies for the true utility. If the conditions are real, then standard ways of applying decision theory need adjustment to avoid predictable failures.

Core claim

Maximizing a proxy utility function is harmful under certain conditions and this poses problems for applying decision theory.

What carries the argument

Proxy utility function, an approximation substituted for the true utility, whose maximization turns out to be harmful under the identified conditions.

If this is right

Decision theory must incorporate safeguards against proxy maximization in the relevant conditions.
Agent designs that optimize proxies can produce systematically bad results.
Practical applications of expected utility theory require checking whether the utility used is a safe proxy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The argument may extend to reward design in machine learning, where learned rewards often serve as proxies.
It could motivate work on methods that optimize the true utility without relying on approximations.
Similar issues might appear in other optimization settings where the objective is an imperfect stand-in for the desired goal.

Load-bearing premise

There exist identifiable conditions under which proxy maximization is systematically harmful in a manner that undermines decision theory applications.

What would settle it

A concrete case or formal proof showing either that no such harmful conditions exist or that they do not create problems for decision theory.

Figures

Figures reproduced from arXiv: 2606.23597 by Sven Neth.

**Figure 1.** Proxy failure. We have modeled states as points in n-dimensional Euclidean space. Zhuang and Hadfield-Menell (2020) assume that not every state is feasible. There is a cost function c : R^n → R which measures how costly a state is to realize, and state s is feasible only if c(s) ≤ 0. This captures the idea that we have finite resources and can’t maximize all features at the same time. Furthermore, features …

**Figure 2.** Two-dimensional value. … higher true utility, visualizing gradients of bliss. The states below the arc going from approximately ⟨0, 4.25⟩ to ⟨4.25, 0⟩ are feasible. The feasible state with maximum utility is found by following the diagonal line where x = y to the boundary of the feasible region at x = y ≈ 3.32. Consider the proxy û(⟨x, y⟩) = x. Maybe beauty can’t be measured easily so we just optimize for h…

**Figure 3.** No compactness. … sense that c(s) ≤ 0 but ruled out by the lower bounds, a detail which matters for the theorem below. The example shown in …

**Figure 4.** Monotonic transformation. … does not. An example illustrating the basic idea is shown in …
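The two-dimensional example in the Figure 2 caption can be made concrete with a small numerical sketch. The caption gives only the axis intercepts of the feasible arc (about 4.25), the diagonal boundary point (x = y ≈ 3.32), and the proxy û(⟨x, y⟩) = x. Everything else below is an assumption made for illustration and is not taken from the paper: the boundary shape x^P + y^P = R^P (with P fitted so the arc passes through the diagonal point), the true utility min(x, y), and the brute-force grid search.

```python
import math

R = 4.25     # axis intercepts of the feasible arc (from the caption)
DIAG = 3.32  # approximate diagonal boundary point (from the caption)

# ASSUMPTION: the boundary is x**P + y**P = R**P, with P chosen so that
# the arc crosses the diagonal at x = y = DIAG. The paper may use another curve.
P = math.log(2) / math.log(R / DIAG)


def cost(x, y):
    """Hypothetical cost function; a state is feasible iff cost <= 0."""
    return x**P + y**P - R**P


def true_utility(x, y):
    """ASSUMPTION: both features matter equally, so the worse one decides."""
    return min(x, y)


def proxy_utility(x, y):
    """Proxy from the caption: only the first feature is counted."""
    return x


def argmax_feasible(utility, steps=425, tol=1e-9):
    """Brute-force search for the best feasible grid point in [0, R]^2."""
    grid = [R * i / steps for i in range(steps + 1)]
    feasible = [(x, y) for x in grid for y in grid if cost(x, y) <= tol]
    return max(feasible, key=lambda s: utility(*s))


true_opt = argmax_feasible(true_utility)
proxy_opt = argmax_feasible(proxy_utility)
print("true optimum: ", true_opt, "true utility:", true_utility(*true_opt))
print("proxy optimum:", proxy_opt, "true utility:", true_utility(*proxy_opt))
```

Under these assumptions the proxy optimum lands in the corner ⟨4.25, 0⟩, where the assumed true utility is 0, while the true optimum sits on the diagonal near ⟨3.32, 3.32⟩. A different choice of true utility or boundary would change the numbers but not the qualitative gap between the two optima.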

Original abstract

I discuss conditions under which maximizing a proxy utility function is harmful and suggest this poses problems for applying decision theory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a conceptual discussion of when proxy maximization harms decision theory use, but it offers no new formal conditions, derivations, or evidence beyond what's already in the alignment literature.

The main takeaway is that Neth flags conditions where optimizing a proxy utility function leads to harm and claims this undercuts straightforward applications of decision theory.

The paper does a clean job of stating the basic worry in accessible terms and connecting it to decision theory without unnecessary jargon.

The soft spot is that proxy harms are a standard point in the literature, starting with Goodhart's law and running through multiple AI alignment papers. The abstract gives no indication of what the conditions actually are, how general they might be, or any derivation that would distinguish this from prior work. Without equations, specific counterexamples, or checks against existing results, the central claim stays at the level of a reminder rather than a new argument.

The reasoning holds up internally as far as it goes, but the lack of technical content means there's nothing load-bearing to verify or build on. The reader's note about the weakest assumption is accurate: we have no way to assess whether the conditions are identifiable or systematic from the given material.

This is for readers already steeped in decision theory and AI alignment who might want a short conceptual recap. It does not rise to the level that would justify referee time, since there is no result, proof, or dataset to evaluate.

Recommendation: desk reject rather than send for peer review.

Referee Report

1 major / 1 minor

Summary. The paper discusses conditions under which maximizing a proxy utility function is harmful and suggests this poses problems for applying decision theory.

Significance. If the conditions are made precise and the harm demonstrated, the discussion could inform limitations of decision theory in AI systems and alignment research. The manuscript's conceptual nature without formal theorems, counterexamples, or evidence limits its immediate impact.

major comments (1)

[Abstract] The central claim that proxy maximization is harmful under certain conditions cannot be evaluated because no specific conditions, derivations, or supporting examples are provided; this is load-bearing for the paper's suggestion that it poses problems for decision theory.

minor comments (1)

Add citations to relevant literature on proxy objectives in reinforcement learning and decision theory to contextualize the discussion.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. The manuscript is a short conceptual discussion note, and we address the concern about evaluability of the central claim below.

Referee: [Abstract] The central claim that proxy maximization is harmful under certain conditions cannot be evaluated because no specific conditions, derivations, or supporting examples are provided; this is load-bearing for the paper's suggestion that it poses problems for decision theory.

Authors: The current manuscript is intentionally brief and conceptual; the provided text summarizes the conditions without formal derivations or concrete examples. We agree that this makes the central claim difficult to evaluate as presented and that it is load-bearing for the implications regarding decision theory. To address this, we will revise the manuscript by expanding the abstract and body to include at least one specific illustrative condition (e.g., a simple scenario where proxy optimization leads to misalignment with true utility) along with a qualitative derivation of the harm. This will make the claim more concrete while preserving the paper's discussion-oriented nature.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper consists of a conceptual discussion identifying conditions under which maximizing a proxy utility function may be harmful and suggesting implications for decision theory. No equations, formal derivations, predictions, fitted parameters, or load-bearing self-citations are present. The central claim is a modest philosophical observation without any reduction of results to inputs by construction or self-referential justification. This is a self-contained discussion with no derivation chain to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, parameters, or explicit assumptions to audit.

pith-pipeline@v0.9.1-grok · 5507 in / 794 out tokens · 15247 ms · 2026-06-26T08:24:48.739389+00:00 · methodology

Reference graph

Works this paper leans on

53 extracted references · 2 canonical work pages

[1] Adam Bales. Philosophical Studies, 2025.
[2] Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation. arXiv preprint arXiv:2503.11926.
[3] Campbell Brown. Consequentialize This. Ethics.
[4] Rudolf Carnap. On the Application of Inductive Logic. Philosophy and Phenomenological Research.
[5] Reasoning Models Don’t Always Say What They Think. arXiv preprint arXiv:2505.05410.
[6] Reward Model Ensembles Help Mitigate Overoptimization. arXiv preprint arXiv:2310.02743.
[7] Kathleen Creel and Deborah Hellman. The Algorithmic Leviathan: Arbitrariness, Fairness, and Opportunity in Algorithmic Decision-Making Systems. Canadian Journal of Philosophy.
[8] Leonard Dung. Current Cases of AI Misalignment and Their Implications for Future Risks. Synthese.
[9] Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective. Synthese, 2021.
[10] Scaling Laws for Reward Model Overoptimization. Proceedings of the 40th International Conference on Machine Learning, 2023.
[11] Problems of Monetary Management: The UK Experience. Papers in Monetary Economics, 1975.
[12] Nelson Goodman. A Query on Confirmation. 1946.
[13] Fact, Fiction, and Forecast. 1955.
[14] The Off-Switch Game. Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017.
[15] Brian Hedden and Mu… Dimensions of Value. doi:10.1111/nous.12454.
[16] Carl Gustav Hempel. A Purely Syntactical Definition of Confirmation. Journal of Symbolic Logic.
[17] Risks From Learned Optimization in Advanced Machine Learning Systems. arXiv preprint arXiv:1906.01820.
[18] The Logic of Decision. 1983.
[19] Dead Rats, Dopamine, Performance Metrics, and Peacock Tails: Proxy Failure Is an Inherent Risk in Goal-Oriented Systems. Behavioral and Brain Sciences, 2024.
[20] Goodhart's Law in Reinforcement Learning. The Twelfth International Conference on Learning Representations.
[21] On the Folly of Rewarding A, While Hoping for B. Academy of Management Journal.
[22] A Tract on Monetary Reform. 1923.
[23] Niko Kolodny. Why Be Rational? Mind.
[24] Strength of Preference and Cardinal Utility. Economic Theory, 2006.
[25] Foundations of Measurement, Vol. I: Additive and Polynomial Representations.
[26] Notes on the Theory of Choice. 1988.
[27] Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking. ICLR 2025.
[28] Christian List. Are Interpersonal Comparisons of Utility Indeterminate? Erkenntnis.
[29] Agentic Misalignment: How LLMs Could be an Insider Threat. 2025.
[30] Categorizing Variants of Goodhart's Law. arXiv preprint arXiv:1803.04585.
[31] Confronting Reward Model Overoptimization With Constrained RLHF. arXiv preprint arXiv:2310.04373.
[32] Melanie Mitchell. Science, 2025.
[33] The Turing Test and Our Shifting Conceptions of Intelligence. Science. doi:10.1126/science.adq9356.
[34] The Bargaining Problem. Econometrica, 1950.
[35] Jacob M. Nebel. The Sum of Well-Being. Mind.
[36] Sven Neth. Off-Switching Not Guaranteed. Philosophical Studies.
[37] Sven Neth. A Dilemma for Solomonoff Prediction. Philosophy of Science.
[38] C. Thi Nguyen. Value Capture. Journal of Ethics and Social Philosophy.
[39] C. Thi Nguyen. Hostile Epistemology. Social Philosophy Today.
[40] Michael Nielsen and Rush T. Stewart. Persistent Disagreement and Polarization in a Bayesian Setting. British Journal for the Philosophy of Science.
[41] Feedback Loops With Language Models Drive In-Context Reward Hacking. Proceedings of the 41st International Conference on Machine Learning, 2024.
[42] A Brief History of Equality.
[43] Frank P. Ramsey. Truth and Probability. 1926.
[44] Principles of Mathematical Analysis.
[45] Jeffrey Sanford Russell and Yoaav Isaacs. Infinite Prospects. Philosophy and Phenomenological Research.
[46] The Foundations of Statistics. 1972.
[47] The Possibility of Social Choice. American Economic Review, 1999.
[48] Defining and Characterizing Reward Hacking. Advances in Neural Information Processing Systems.
[49] Theory of Games and Economic Behavior.
[50] On Targeted Manipulation and Deception When Optimizing LLMs for User Feedback. arXiv preprint arXiv:2411.02306.
[51] Mark Wilson. Wandering Significance: An Essay on Conceptual Behavior.
[52] Reliance on Metrics Is a Fundamental Challenge for AI. Patterns, 2022.
[53] Consequences of Misaligned AI. Advances in Neural Information Processing Systems.
