
arxiv: 2511.17714 · v5 · submitted 2025-11-21 · 💻 cs.AI · cs.GT

Learning the Value of Value Learning

Pith reviewed 2026-05-17 20:07 UTC · model grok-4.3

keywords: Jeffrey-Bolker framework · value of information · axiological refinement · zero-sum games · Nash bargaining · rational choice theory · ethical deliberation · multi-agent decision making

The pith

Extending the Jeffrey-Bolker framework to model value refinements proves a value-of-information theorem and shows mutual refinement converts zero-sum games into positive-sum interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the Jeffrey-Bolker framework to model refinements in values and proves a value-of-information theorem for axiological refinement. In multi-agent settings, it establishes that mutual refinement will characteristically transform zero-sum games into positive-sum interactions and yield Pareto-improvements in Nash bargaining. This unifies epistemic and axiological refinement under a single formalism. A sympathetic reader would care because it supplies a formal way to treat value change as part of rational choice rather than external to it.

Core claim

We extend the Jeffrey-Bolker framework to model refinements in values and prove a value-of-information theorem for axiological refinement. In multi-agent settings, we establish that mutual refinement will characteristically transform zero-sum games into positive-sum interactions and yield Pareto-improvements in Nash bargaining. These results show that a framework of rational choice can be extended to model value refinement. By unifying epistemic and axiological refinement under a single formalism, we broaden the conceptual foundations of rational choice and illuminate the normative status of ethical deliberation.

What carries the argument

The Jeffrey-Bolker probability-and-utility structure extended to represent axiological refinements and update them analogously to factual uncertainty, enabling value-of-information calculations and game-theoretic results.

If this is right

  • Value-of-information calculations apply directly to axiological refinement.
  • Mutual axiological refinement transforms zero-sum games into positive-sum interactions.
  • Mutual refinement produces Pareto improvements in Nash bargaining.
  • Ethical deliberation acquires normative status inside a unified rational-choice formalism.
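The first bullet can be made concrete with a small numerical sketch. It is not the paper's construction: it assumes two hypothetical fine-grained acts whose utilities `u1` and `u2` are drawn independently from a standard normal, and compares an agent committed to the bundle average `q*u1 + (1-q)*u2` with a refined agent who can pick the better component. The names `value_of_refinement`, `q`, `n_samples`, and `seed` are illustrative only.

```python
import random


def value_of_refinement(q=0.5, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[max(u1, u2)] and E[q*u1 + (1-q)*u2].

    Assumption (not from the paper): u1 and u2 are i.i.d. standard normal.
    """
    rng = random.Random(seed)
    total_refined = 0.0
    total_coarse = 0.0
    for _ in range(n_samples):
        u1 = rng.gauss(0.0, 1.0)
        u2 = rng.gauss(0.0, 1.0)
        # Refined agent: chooses the better fine-grained act.
        total_refined += max(u1, u2)
        # Coarse agent: committed to the q-weighted bundle average.
        total_coarse += q * u1 + (1 - q) * u2
    return total_refined / n_samples, total_coarse / n_samples


refined, coarse = value_of_refinement()
print(f"expected utility, refined: {refined:.3f}")
print(f"expected utility, coarse:  {coarse:.3f}")
print(f"estimated value of refinement: {refined - coarse:.3f}")
```

Because `max(u1, u2)` is at least any weighted average of `u1` and `u2` for `q` in [0, 1], the estimated gap is never negative. Under this particular normal assumption the exact gap is 1/√π ≈ 0.564, so the estimate should land near that value.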

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same structure might be used to quantify the expected benefit of value clarification exercises in negotiations.
  • It opens a route to treat moral uncertainty as a form of uncertainty that agents can rationally reduce.
  • Experimental tests could check whether subjects who refine stated values during play shift from competitive to cooperative equilibria more often than controls.

Load-bearing premise

Axiological refinements can be represented and updated inside the Jeffrey-Bolker probability-and-utility structure in a manner sufficiently analogous to factual uncertainty to support the same value-of-information calculations and game-theoretic conclusions.

What would settle it

A concrete multi-agent model or game in which parties update their values yet the interaction remains zero-sum without producing the predicted positive-sum transformation or Pareto improvement in bargaining outcomes.
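A toy version of such a test case can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: the base game is matching pennies, the row player's first action is split into two refined actions, and the payoff perturbations `eps1` and `eps2` are hand-picked numbers. The helper names `refine_row`, `surplus`, and `is_zero_sum` are hypothetical, and "surplus" here simply means the sum of both players' payoffs in a cell.

```python
def refine_row(u1, u2, eps1, eps2):
    """Split the row player's first action into two refined actions.

    u1, u2: 2x2 payoff matrices for the row and column player.
    eps1, eps2: 2x2 perturbations, indexed [refined copy][column],
    added to the payoffs of the original first row.
    Returns the two 3x2 payoff matrices of the refined game.
    """
    new1 = [[u1[0][k] + eps1[j][k] for k in range(2)] for j in range(2)]
    new2 = [[u2[0][k] + eps2[j][k] for k in range(2)] for j in range(2)]
    # The unrefined second row is carried over unchanged.
    return new1 + [list(u1[1])], new2 + [list(u2[1])]


def surplus(u1, u2):
    """Cell-by-cell sum of both players' payoffs."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]


def is_zero_sum(u1, u2, tol=1e-12):
    return all(abs(s) <= tol for row in surplus(u1, u2) for s in row)


# Matching pennies as the base zero-sum game.
U1 = [[1.0, -1.0], [-1.0, 1.0]]
U2 = [[-1.0, 1.0], [1.0, -1.0]]
eps1 = [[0.3, -0.2], [-0.1, 0.4]]

# Case A: the column player's perturbation exactly opposes the row player's.
opposed = [[-e for e in row] for row in eps1]
a1, a2 = refine_row(U1, U2, eps1, opposed)
print("opposed refinement stays zero-sum:", is_zero_sum(a1, a2))

# Case B: the column player's perturbation is unrelated to the row player's.
unrelated = [[0.2, 0.1], [0.0, -0.3]]
b1, b2 = refine_row(U1, U2, eps1, unrelated)
print("unrelated refinement stays zero-sum:", is_zero_sum(b1, b2))
print("largest cell surplus after refinement:", max(max(r) for r in surplus(b1, b2)))
```

Case A shows that a refinement whose payoff changes are perfectly opposed leaves every cell summing to zero, which is the shape a counterexample would need to take. Case B shows that unrelated perturbations generally create cells with positive joint surplus.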

Figures

Figures reproduced from arXiv: 2511.17714 by Alex John London, Aydin Mohseni.

Figure 1: A binary refinement of act A ∈ A0. The initial act partition A0 consists of acts A and ¬A. Refinement produces A1 by splitting A into the more fine-grained acts A ∧ B1 and A ∧ B2.
Adjacent text from the paper (truncated): 3.3.2. The Bounded Agent. The agents we study have not undergone arbitrary refinement. Their initial representation is D0 = ⟨A0, A0, P0, U0⟩ where A0 ⊂ A is a coarse-grained subalgebra; A0 is the maximally fine-grained coarsening…
Figure 2: Refinement transforms commitment to an average of a coarse-grained bundle into the ability to select the best component among its fine-grained elements. Under RRP and refinement uncertainty, E_{μ_A}[max{u1, u2}] > E_{μ_A}[q·u1 + (1 − q)·u2]. The expected maximum of a non-uniform bundle exceeds its expected mean.
Adjacent text from the paper (truncated): Theorem 4 (Value of Value Refinement). Consider an agent with decision problem D0 = ⟨A0, A0, P0, U0⟩ a…
Figure 3: Visualization of how refinement can dissolve a value conflict with two value dimensions, V1 and V2, and two acts, A and ¬A. The left vertical axis denotes the degree of realization of the first value V1; the right vertical axis denotes the degree of realization of the second value V2. Figure 3a shows the dilemmas: A is favored by V1 and ¬A is favored by V2. Figures 3b and 3c show two possible results of refining a…
Figure 4: Transformation of a 2 × 2 zero-sum game into a 3 × 2 game through refinement. The row player refines A1 into {A1 ∧ B1, A1 ∧ B2}; payoffs are perturbed by noise terms ϵ^i_{jk}.
Adjacent text from the paper (truncated): … is zero-sum, W*_0 = 0. We show that refinement yields strict expected improvement in welfare. Theorem 10 (Zero-Sum Escape from Unilateral Value Refinement). Consider a 2 × 2 zero-sum, normal form game G0 = (N, S, U) with players N = {…
Figure 5: Refinement expands the feasible set from a line (left) to a rectangle (right) when preferences are orthogonal. The dashed line shows bundled allocations; the full shaded region includes allocations where dimensions are allocated independently. When θ = π/2, the Nash solution moves from (1/2, 1/2) to (1, 1), doubling both agents' payoffs.
Adjacent text from the paper (truncated): Theorem 11 (Value of Refinement in Nash Bargaining). Consider a s…
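The orthogonal case in the Figure 5 caption can be reproduced with a short grid search. This is a sketch under assumptions that go beyond the caption: utilities are linear in each agent's share, agent 1 values only the first dimension and agent 2 only the second (the weights `w1` and `w2`), the disagreement point is (0, 0), and shares are searched on a grid of 101 points. The function names `utilities` and `nash_solution` are hypothetical.

```python
def utilities(x1, x2, w1=(1.0, 0.0), w2=(0.0, 1.0)):
    """Payoffs when agent 1 receives share x1 of dimension 1 and x2 of dimension 2.

    Assumption: linear utilities with orthogonal weight vectors w1 and w2;
    agent 2 receives the remaining shares (1 - x1, 1 - x2).
    """
    u1 = w1[0] * x1 + w1[1] * x2
    u2 = w2[0] * (1 - x1) + w2[1] * (1 - x2)
    return u1, u2


def nash_solution(allocations):
    """Payoff pair maximizing the Nash product u1 * u2 (disagreement point at zero)."""
    return max((utilities(x1, x2) for x1, x2 in allocations),
               key=lambda u: u[0] * u[1])


grid = [i / 100 for i in range(101)]

# Before refinement: the two dimensions are bundled, so one share applies to both.
bundled = [(x, x) for x in grid]
# After refinement: each dimension can be allocated independently.
unbundled = [(a, b) for a in grid for b in grid]

print("bundled Nash solution:  ", nash_solution(bundled))    # (0.5, 0.5)
print("unbundled Nash solution:", nash_solution(unbundled))  # (1.0, 1.0)
```

With bundled allocations the feasible payoffs lie on the line u1 + u2 = 1, and the Nash product peaks at the midpoint. Once dimensions are allocated independently, each agent can receive all of the dimension it cares about, which matches the move from (1/2, 1/2) to (1, 1) stated in the caption.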
Original abstract

Standard decision frameworks address uncertainty about facts but assume fixed options and values. We extend the Jeffrey-Bolker framework to model refinements in values and prove a value-of-information theorem for axiological refinement. In multi-agent settings, we establish that mutual refinement will characteristically transform zero-sum games into positive-sum interactions and yield Pareto-improvements in Nash bargaining. These results show that a framework of rational choice can be extended to model value refinement. By unifying epistemic and axiological refinement under a single formalism, we broaden the conceptual foundations of rational choice and illuminate the normative status of ethical deliberation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper extends the Jeffrey-Bolker framework to incorporate axiological refinement (updates to an agent's values) alongside factual uncertainty. It proves a value-of-information theorem for such refinements in the single-agent setting and claims that, in multi-agent interactions, mutual refinement characteristically converts zero-sum games into positive-sum ones while producing Pareto improvements in Nash bargaining outcomes. The work positions this as a unification of epistemic and axiological uncertainty within rational choice theory.

Significance. If the formal results hold, the paper would offer a technically grounded way to model value learning inside an existing decision-theoretic structure, with implications for understanding ethical deliberation and multi-agent coordination. The single-agent value-of-information result is a natural extension that could be useful for AI systems that update preferences; the multi-agent claims, if substantiated without hidden correlation assumptions, would strengthen the normative case for value alignment processes.

major comments (1)
  1. [Multi-agent section] Multi-agent analysis (around the claims on zero-sum to positive-sum transformation and Nash bargaining): the result that mutual axiological refinement 'characteristically' yields positive-sum interactions and Pareto improvements appears to rest on an unstated assumption that independent refinements are sufficiently correlated to expand or realign the joint feasible set. If two agents refine toward mutually incompatible terminal values, the post-refinement payoff matrix can remain zero-sum or become negative-sum. The manuscript should either introduce an explicit correlation mechanism or restrict the 'characteristic' claim to a precisely defined class of refinement processes; without this, the game-theoretic conclusions do not follow from the single-agent extension alone.
minor comments (2)
  1. [Abstract and Introduction] The abstract and introduction could more explicitly distinguish the technical extension of the Jeffrey-Bolker algebra (e.g., how value-laden atoms are added and how conditioning is defined on them) from the interpretive claims about ethical deliberation.
  2. [Formal framework] Notation for the extended probability-utility pair should be introduced once and used consistently; occasional shifts between 'refinement' and 'conditioning' language can be clarified in a dedicated definitions subsection.

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for their constructive feedback on our paper. The comment on the multi-agent analysis raises a valid point about the need for explicit assumptions regarding the correlation of value refinements. We address this below and will make corresponding revisions to the manuscript.

Point-by-point responses
  1. Referee: [Multi-agent section] Multi-agent analysis (around the claims on zero-sum to positive-sum transformation and Nash bargaining): the result that mutual axiological refinement 'characteristically' yields positive-sum interactions and Pareto improvements appears to rest on an unstated assumption that independent refinements are sufficiently correlated to expand or realign the joint feasible set. If two agents refine toward mutually incompatible terminal values, the post-refinement payoff matrix can remain zero-sum or become negative-sum. The manuscript should either introduce an explicit correlation mechanism or restrict the 'characteristic' claim to a precisely defined class of refinement processes; without this, the game-theoretic conclusions do not follow from the single-agent extension alone.

    Authors: We thank the referee for this insightful comment. The manuscript uses 'characteristically' to refer to refinements that occur in a shared informational environment, where agents' value updates are modeled as converging towards a common underlying axiology, thereby inducing positive correlation. This is implicit in the extension from the single-agent value-of-information result, where refinements improve accuracy. However, we agree that this should be made explicit to avoid any ambiguity. In the revised version, we will add a section clarifying the correlation structure of refinements, introduce a formal parameter for the degree of correlation between agents' refinement processes, and state the theorem under the condition that the correlation is positive. We will also discuss the case of incompatible refinements as a boundary condition where the positive-sum transformation may not hold. This ensures the game-theoretic conclusions are properly qualified and follow from the single-agent framework with the added correlation assumption.

    Revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained with no reductions to inputs or self-citations

Full rationale

The paper extends the Jeffrey-Bolker framework by enriching the algebra to represent axiological refinements as a form of conditioning, then derives the value-of-information theorem and multi-agent game-theoretic results directly from this formal extension. The single-agent theorem follows from the updated probability-utility structure, and the multi-agent claims about zero-sum to positive-sum transformations and Pareto improvements in Nash bargaining are obtained by applying the same conditioning operation to joint payoff structures under the stated assumptions. No equations or steps reduce by construction to fitted parameters, prior self-citations, or definitional equivalences; the central results are obtained by standard application of the extended formalism rather than presupposing the target conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. The work rests on the background assumptions of the Jeffrey-Bolker model and on the unstated premise that value refinement can be formalized inside that model.

axioms (1)
  • [standard math] Jeffrey-Bolker framework for decision under uncertainty.
    The paper states that it extends this existing framework.

pith-pipeline@v0.9.0 · 5379 in / 1243 out tokens · 95265 ms · 2026-05-17T20:07:17.854801+00:00 · methodology


