
arxiv: 2510.24872 · v2 · submitted 2025-10-28 · 💻 cs.GT

What Are People's Actual Utility Functions in Budget Aggregation?

Pith reviewed 2026-05-18 03:28 UTC · model grok-4.3

classification 💻 cs.GT
keywords: budget aggregation, utility functions, pairwise comparisons, star-shaped preferences, single-peaked preferences, empirical validation, human preferences, asymmetric utilities

The pith

Standard utility models like L1, L2 and Leontief fail to match how people evaluate budget allocations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors built an online polling tool that first records each participant's ideal budget split and then presents carefully generated pairs of non-ideal allocations to check which properties the participant's choices satisfy. Hundreds of participants took part, and very few of them made choices consistent with any single classical model such as an L1, L2 or Leontief utility function. The same choices line up well with more general properties: star-shaped, multi-dimensional single-peaked, and peak-linear preferences. The results also reveal systematic asymmetries, both between gains and losses and across spending categories, which contradicts any utility model based on an Lp metric. These patterns indicate that budget-aggregation rules will need to be built on more flexible descriptions of voter preferences.

Core claim

The central claim is that empirical pairwise choices reject the hypothesis that voters' preferences over budget vectors are captured by any one of the standard parametric families (L1, L2, Leontief), while supporting the weaker but still restrictive properties of star-shaped, multi-dimensional single-peaked, and peak-linear preferences; the data further show that most voters treat positive and negative deviations asymmetrically and treat different budget lines asymmetrically, contradicting every Lp-metric model.
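
To make the three parametric families concrete, here is a minimal Python sketch. The functional forms are assumptions based on common usage in the budget-aggregation literature, not definitions quoted from the paper: L1 and L2 utilities are taken as negative distances to the ideal allocation, and Leontief utility as the smallest ratio of allocated share to ideal share. The function names and the allocations are illustrative, not data from the study.

```python
import math

def l1_utility(ideal, alloc):
    """Negative l1 (sum of absolute differences) distance to the ideal."""
    return -sum(abs(a - p) for a, p in zip(alloc, ideal))

def l2_utility(ideal, alloc):
    """Negative l2 (Euclidean) distance to the ideal."""
    return -math.sqrt(sum((a - p) ** 2 for a, p in zip(alloc, ideal)))

def leontief_utility(ideal, alloc):
    """Smallest allocated/ideal ratio over issues with a positive ideal share
    (assumed form of the Leontief model)."""
    return min(a / p for a, p in zip(alloc, ideal) if p > 0)

def preferred(utility, ideal, x, y):
    """Return 'x', 'y', or 'tie' according to the given utility function."""
    ux, uy = utility(ideal, x), utility(ideal, y)
    if ux > uy:
        return "x"
    if uy > ux:
        return "y"
    return "tie"

# Illustrative values only: three issues, each budget sums to 100.
ideal = (40, 30, 30)
x = (60, 10, 30)
y = (62, 19, 19)

for name, u in [("l1", l1_utility), ("l2", l2_utility), ("Leontief", leontief_utility)]:
    print(name, "prefers", preferred(u, ideal, x, y))
```

On this made-up pair the L1 model prefers x while the L2 and Leontief models prefer y, which is the kind of disagreement that lets a single pairwise question separate the models.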

What carries the argument

A modular polling system that elicits an ideal allocation and then generates targeted pairs of non-ideal allocations to test consistency with specific preference properties.
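
As a rough illustration of what one pair-generation module might look like, the sketch below builds a pair that targets the star-shaped property. It assumes the usual reading of star-shapedness: moving from any allocation in a straight line toward the ideal point makes the voter better off. The sampling scheme, the function names, and the parameter `lam` are hypothetical and are not taken from the authors' open-source system.

```python
import random

def random_allocation(n, rng, total=100.0):
    """Sample a non-negative allocation over n issues that sums to total."""
    cuts = sorted(rng.uniform(0, total) for _ in range(n - 1))
    bounds = [0.0] + cuts + [total]
    return [bounds[i + 1] - bounds[i] for i in range(n)]

def star_shaped_pair(ideal, rng, lam=0.5):
    """Return (near, far): 'far' is a random allocation and 'near' lies on the
    segment between the ideal and 'far', a fraction lam of the way out."""
    far = random_allocation(len(ideal), rng, total=sum(ideal))
    near = [p + lam * (f - p) for p, f in zip(ideal, far)]
    return near, far

def consistent_with_star_shape(answers):
    """A participant passes if they picked 'near' in every such pair."""
    return all(a == "near" for a in answers)

rng = random.Random(0)
near, far = star_shaped_pair([50.0, 30.0, 20.0], rng)
print("near:", [round(v, 1) for v in near])
print("far: ", [round(v, 1) for v in far])
```

Because `near` is a convex combination of the ideal and `far`, it stays non-negative and keeps the same total, so both options are valid budgets.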

If this is right

  • Aggregation rules designed for L1 or Euclidean distance will produce outcomes that most voters do not regard as closest to their ideal.
  • Mechanisms that assume symmetric treatment of over- and under-spending on each line will mismatch the observed sign asymmetries.
  • Practical budget-aggregation systems must accommodate star-shaped and peak-linear preferences rather than relying on a single norm.
  • Issue-specific asymmetries imply that a uniform distance metric across budget lines cannot represent most voters.
  • Designers should test new rules against the broader class of star-shaped and single-peaked preferences instead of classical parametric families.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same testing framework could be applied to other collective-choice settings such as multi-issue voting or resource division in teams.
  • Models that combine peak-linear shapes with sign-dependent weights may yield better predictive accuracy than purely geometric descriptions.
  • Field experiments that replace the lab pairs with real budget proposals could check whether the lab-detected properties survive strategic or contextual effects.
  • If the asymmetries persist, aggregation rules might need separate gain and loss parameters for each spending category rather than a single distance function.
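
The last point can be illustrated with a minimal sketch of a disutility that carries separate gain and loss weights for each spending category. This is a hypothetical functional form chosen for illustration, not a model proposed or fitted in the paper, and the weights below are made up.

```python
def asymmetric_disutility(ideal, alloc, gain_w, loss_w):
    """Weighted sum of deviations from the ideal, where over-spending on issue j
    is scaled by gain_w[j] and under-spending by loss_w[j]."""
    total = 0.0
    for p, a, wg, wl in zip(ideal, alloc, gain_w, loss_w):
        d = a - p
        total += wg * d if d >= 0 else wl * (-d)
    return total

# Issues: education, health, defense (illustrative numbers).
ideal = (50, 30, 20)
x = (60, 20, 20)  # +10 education, -10 health
y = (40, 40, 20)  # -10 education, +10 health

symmetric = ((1, 1, 1), (1, 1, 1))
# Hypothetical voter who dislikes education cuts three times as much.
asymmetric = ((1, 1, 1), (3, 1, 1))

for label, (gw, lw) in [("symmetric", symmetric), ("asymmetric", asymmetric)]:
    print(label,
          asymmetric_disutility(ideal, x, gw, lw),
          asymmetric_disutility(ideal, y, gw, lw))
```

With equal weights the function reduces to the L1 distance and the two mirror-image allocations tie at 20. With the asymmetric weights, x costs 20 and y costs 40, so the voter strictly prefers x, a pattern no single symmetric distance can reproduce.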

Load-bearing premise

The pair-generation algorithms and consistency checks correctly isolate the targeted properties without being confounded by noise, inattention, or strategic answering.

What would settle it

A replication in which a majority of participants produce choice patterns that fit one of the standard models (L1, L2 or Leontief) across the tested pairs would falsify the claim that those models fail to capture human preferences.
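
A minimal sketch of that criterion, assuming each answer is stored as a triple (option x, option y, chosen label) and that a participant counts as consistent with a model only if every strict prediction of the model matches their choice. The data layout, the treatment of ties, and the two toy participants are assumptions made for illustration, not the paper's classification procedure or data.

```python
def l1_utility(ideal, alloc):
    """Negative l1 distance to the ideal (assumed form of the L1 model)."""
    return -sum(abs(a - p) for a, p in zip(alloc, ideal))

def model_consistent(utility, ideal, answers):
    """True if no answer contradicts a strict preference of the model.
    Pairs on which the model is indifferent are ignored (an assumption)."""
    for x, y, chosen in answers:
        ux, uy = utility(ideal, x), utility(ideal, y)
        if ux == uy:
            continue
        predicted = "x" if ux > uy else "y"
        if predicted != chosen:
            return False
    return True

def share_consistent(utility, participants):
    """Fraction of participants whose answers are all consistent with the model."""
    hits = sum(model_consistent(utility, p["ideal"], p["answers"]) for p in participants)
    return hits / len(participants)

# Toy data, not from the study: both participants saw the same pair.
pair_x, pair_y = (60, 10, 30), (62, 19, 19)
participants = [
    {"ideal": (40, 30, 30), "answers": [(pair_x, pair_y, "x")]},
    {"ideal": (40, 30, 30), "answers": [(pair_x, pair_y, "y")]},
]

share = share_consistent(l1_utility, participants)
print("share consistent with L1:", share)
print("majority fits L1:", share > 0.5)
```

In this toy sample exactly half of the participants fit the L1 model, so the majority condition is not met.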

Figures

Figures reproduced from arXiv: 2510.24872 by Ayelet Amster, Erel Segal-Halevi, Lioz Akirav, Rica Gonen.

Figure 1: Initial screen where participants enter their ideal budget allocation across education, health, and defense.
Figure 2: Example of a pairwise comparison question, where participants are asked to choose between two alternative allocations.
Figure 3: A question where participants are required to rank the three options.
Figure 4: An example of the automatic budget rescaling feature. (Top) A participant's initial allocation that does not sum to 100.
Figure 5: Preference matrix for a participant. Rows correspond to topics, columns correspond to magnitude levels. Each cell is colored ...
Original abstract

Budget aggregation is a process in which citizens vote by declaring their individual ideal budget allocation, and a pre-determined rule aggregates all votes into a single outcome. Recent theoretical work has proposed various aggregation rules, along with impossibility results for satisfying desirable axioms simultaneously. These analyses rely on assumptions about how voters evaluate non-ideal allocations, yet such assumptions have not been empirically validated on human subjects. We present a framework for empirically testing hypotheses about human utility functions using simple pairwise comparisons. We introduce a modular, open-source polling system that, after eliciting a subject's ideal allocation, presents carefully generated pairs of non-ideal alternatives. Different pair-generation algorithms allow testing various properties of utility functions. Using this framework, we conduct polls with hundreds of participants. The results show that standard utility models, including $\ell_1$, $\ell_2$, and Leontief, fail to capture human preferences, as very few participants behave consistently with any single model. In contrast, we find strong empirical support for more general properties, such as star-shaped, multi-dimensional single-peaked, and peak-linear preferences. We also find that most participants exhibit asymmetries both with respect to sign (gains vs. losses) and issue, contradicting any utility model based on an $\ell_p$ metric. These findings suggest that developing practical budget-aggregation mechanisms requires more flexible models of human utility functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a modular framework and open-source polling system for empirically testing hypotheses about voters' utility functions over budget allocations via pairwise comparisons of non-ideal alternatives, after first eliciting each subject's ideal point. Experiments with hundreds of participants reject standard models (ℓ1, ℓ2, Leontief) as consistent with almost no individuals, while reporting substantial support for weaker, more general properties such as star-shaped, multi-dimensional single-peaked, and peak-linear preferences; the data also reveal systematic asymmetries with respect to gains versus losses and across issues.

Significance. If the empirical classifications hold, the results are significant for budget-aggregation theory: they supply the first large-scale human-subject evidence against the utility assumptions underlying recent impossibility theorems and mechanism-design proposals, and they point toward the need for more flexible models. The open-source polling infrastructure and the modular pair-generation approach constitute reusable methodological contributions that can support follow-up work.

major comments (2)
  1. [§3.2 and §4.3] Section 3.2 (Pair-generation algorithms) and Section 4.3 (Consistency classification): the binary consistency tests for general properties such as star-shaped or multi-dimensional single-peaked preferences lack any reported simulation or analytic bound on the false-positive rate under noisy or random responding; without this, the reported high support rates for the permissive properties cannot be distinguished from what would arise from inattentive participants, undermining the central contrast with the standard models.
  2. [§4.1] Section 4.1 (Participant screening and attention checks): the manuscript does not describe the exact filtering criteria, inconsistency thresholds, or any post-hoc exclusion rules applied to the hundreds of responses; because the main claims rest on the fraction of participants classified as consistent with each property, the absence of these details leaves open whether the reported asymmetries and property-support statistics survive standard robustness checks.
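
Major comment 1 can be made concrete with a small sketch. It assumes that a property test consists of a fixed number of independent binary questions, each with exactly one property-consistent answer, and that an inattentive participant answers each question by a fair coin flip. Under those assumptions the chance of passing with at most m violations out of k questions is a binomial tail, which the code computes exactly and also estimates by simulation. The number of questions and the pass threshold are hypothetical; the paper's actual values are not given here.

```python
import math
import random

def exact_false_positive_rate(num_questions, max_violations=0):
    """P(at most max_violations wrong answers) under fair coin-flip responding."""
    favourable = sum(math.comb(num_questions, v) for v in range(max_violations + 1))
    return favourable / 2 ** num_questions

def simulated_false_positive_rate(num_questions, max_violations=0,
                                  trials=20000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        violations = sum(rng.random() < 0.5 for _ in range(num_questions))
        if violations <= max_violations:
            passed += 1
    return passed / trials

for k in (3, 5, 8):
    print(k, exact_false_positive_rate(k), simulated_false_positive_rate(k))
```

The strict version of the test falls as 2^-k, so a property checked with only a few questions can be passed by chance fairly often, while the same random responder would almost never match a model that is tested on many questions.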
minor comments (2)
  1. [Abstract and §4] The abstract states 'hundreds of participants' but the precise sample size, recruitment method, and demographic breakdown appear only in an appendix; moving the core numbers into the main experimental section would improve readability.
  2. [§2.3] Notation for the peak-linear property is introduced without an explicit equation or diagram in the main text; a short formal definition would clarify how it differs from the other tested properties.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our empirical results. We address each major comment below and have revised the manuscript accordingly to improve transparency and robustness.

Point-by-point responses
  1. Referee: [§3.2 and §4.3] Section 3.2 (Pair-generation algorithms) and Section 4.3 (Consistency classification): the binary consistency tests for general properties such as star-shaped or multi-dimensional single-peaked preferences lack any reported simulation or analytic bound on the false-positive rate under noisy or random responding; without this, the reported high support rates for the permissive properties cannot be distinguished from what would arise from inattentive participants, undermining the central contrast with the standard models.

    Authors: We agree that explicit quantification of false-positive rates under random or noisy responding would strengthen the interpretation of the consistency classifications. The original manuscript emphasized the near-zero consistency with the strict standard models (ℓ1, ℓ2, Leontief) as the primary contrast. To directly address the concern, we have added Monte Carlo simulations of random responders in a new subsection of §4.3. These simulations show that the observed consistency rates for star-shaped, multi-dimensional single-peaked, and peak-linear properties substantially exceed those expected from uniform random choice or moderate noise, while the standard models remain near zero. We also report analytic bounds for the simplest case of star-shaped preferences. revision: yes

  2. Referee: [§4.1] Section 4.1 (Participant screening and attention checks): the manuscript does not describe the exact filtering criteria, inconsistency thresholds, or any post-hoc exclusion rules applied to the hundreds of responses; because the main claims rest on the fraction of participants classified as consistent with each property, the absence of these details leaves open whether the reported asymmetries and property-support statistics survive standard robustness checks.

    Authors: We acknowledge that the precise screening and exclusion rules should have been reported in full. The original data collection used two attention checks (correct identification of the elicited ideal point and a consistency check on a repeated pair) and excluded participants who failed either check or completed the survey in under five minutes. We have now expanded §4.1 with the exact criteria, thresholds, and the number of participants excluded at each stage (approximately 18% of the initial sample). We also added a robustness appendix repeating the main analyses after applying stricter inconsistency thresholds; the reported asymmetries and relative support levels for the general properties remain qualitatively unchanged. revision: yes
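
The screening rules listed in response 2 can be sketched as a simple filter. The three criteria (ideal-point check, repeated-pair check, minimum completion time of five minutes) come from the response above; the record layout, field names, and toy responses are hypothetical and only illustrate how exclusions could be counted at each stage.

```python
from dataclasses import dataclass

MIN_DURATION_SECONDS = 5 * 60  # "under five minutes" is excluded

@dataclass
class Response:
    participant_id: str
    identified_ideal: bool          # attention check 1
    repeated_pair_consistent: bool  # attention check 2
    duration_seconds: float

def exclusion_reason(r):
    """Return the first failed criterion, or None if the response is kept."""
    if not r.identified_ideal:
        return "failed ideal-point check"
    if not r.repeated_pair_consistent:
        return "failed repeated-pair check"
    if r.duration_seconds < MIN_DURATION_SECONDS:
        return "completed too quickly"
    return None

def screen(responses):
    """Split responses into kept ones and a count of exclusions per reason."""
    kept, counts = [], {}
    for r in responses:
        reason = exclusion_reason(r)
        if reason is None:
            kept.append(r)
        else:
            counts[reason] = counts.get(reason, 0) + 1
    return kept, counts

# Toy records, not from the study.
responses = [
    Response("a", True, True, 640),
    Response("b", False, True, 700),
    Response("c", True, False, 500),
    Response("d", True, True, 120),
]
kept, counts = screen(responses)
print([r.participant_id for r in kept], counts)
```

Reporting the per-reason counts alongside the retained sample is what lets a reader check how sensitive the headline fractions are to each exclusion rule.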

Circularity Check

0 steps flagged

No circularity: empirical claims rest on new participant data, not derivations or self-referential fits.

Full rationale

The paper presents an empirical framework that elicits ideal allocations from participants and then uses generated pairwise comparisons to test consistency with various utility properties. All central findings—low consistency with ℓ1/ℓ2/Leontief models and higher support for star-shaped, multi-dimensional single-peaked, and peak-linear preferences—are reported as direct observations from the collected responses. No equations, fitted parameters, or predictions are derived from prior results in a way that reduces to the inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked; the analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper's claims rest on the validity of the experimental design and the assumption that pairwise responses reveal underlying utility properties; no free parameters, new entities, or mathematical axioms are introduced.

axioms (1)
  • domain assumption: Participants answer pairwise comparisons consistently and truthfully in a way that reflects their true preferences over budget allocations.
    The entire testing framework and the interpretation of consistency with models depend on this behavioral assumption.

