pith. sign in

arxiv: 2606.07629 · v1 · pith:MBOBM73Xnew · submitted 2026-05-30 · 💻 cs.LG · cs.AI· cs.CL· cs.CY· cs.HC

Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences

Pith reviewed 2026-06-28 19:02 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CLcs.CYcs.HC
keywords preference aggregationLLM alignmentpersonalizationreward modelingsocial choice theoryuser autonomysafety constraints
0
0 comments X

The pith

Large language models should learn individualized preferences rather than preferences aggregated across users into one reward signal.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that current LLM alignment methods combine diverse human preferences into a single averaged reward model, which corresponds to no actual person and therefore loses important information about how preferences vary by individual, group, and context. This loss is both theoretically expected from results in social choice theory and observable in practice when preferences differ systematically across demographics. The authors examine the internal structure of human preferences, review existing technical routes to personalization, and rebut objections about scalability, shared norms, and manipulation. They maintain that safety problems such as filter bubbles can be contained inside bounded personalization systems that keep universal constraints in place while still allowing genuine individual differences. The position concludes by outlining a combined research and policy program for building models that respect personal autonomy alongside collective safety requirements.

Core claim

Aggregating human preferences into one reward signal produces an average-user model that represents no real individual and erases critical diversity in values and contextual dependencies; therefore LLMs should instead learn personalized preferences, subject to bounded frameworks that preserve universal safety constraints while permitting legitimate individual variation.

What carries the argument

The contrast between an aggregated reward signal (which collapses all users into one hypothetical average) and individualized preference models (which retain user-specific structure), positioned as the central mechanism for capturing preference diversity.

If this is right

  • Aggregation necessarily masks both inter-group and intra-group differences in preferences, reducing alignment quality for every actual user.
  • Technical methods already exist that can encode the richer structure of individual preferences without requiring full per-user retraining.
  • Universal safety constraints can be retained as hard boundaries while softer preference dimensions are allowed to vary per user.
  • A concrete research agenda exists for developing and evaluating preference-aware models that balance autonomy and collective safety.
  • Policy attention should shift from purely aggregate alignment metrics toward metrics that also track fidelity to individual preference distributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the argument holds, evaluation benchmarks that currently rely on aggregate human judgments would need to be replaced or augmented by per-user or per-demographic metrics.
  • The same aggregation problem appears in other domains such as recommendation systems and content moderation, suggesting the proposed shift could apply beyond LLMs.
  • One testable extension is whether bounded personalization reduces user disengagement rates compared with average-user models in longitudinal deployment studies.

Load-bearing premise

Bounded personalization frameworks can preserve universal safety constraints while still accommodating legitimate individual variation without creating unmanageable risks such as filter bubbles or value lock-in.

What would settle it

A controlled experiment that trains otherwise identical models on aggregated versus personalized preference data and shows that the personalized versions systematically violate the same safety constraints that the aggregated versions satisfy would falsify the claim that risks remain manageable.

Figures

Figures reproduced from arXiv: 2606.07629 by Cristina Garbacea.

Figure 1
Figure 1. Figure 1: Diverse user preferences cluster in distinct regions of preference space. The aggregated reward model (gray) falls in a sparse region between clusters, representing no actual user group and systematically failing minority populations. 2. Limitations of Aggregated Preferences 2.1. The Impossibility of Universal Preferences Aggregating diverse human preferences into a single re￾ward signal is not merely impr… view at source ↗
Figure 2
Figure 2. Figure 2: Aggregated alignment (A) collapses diverse user preferences into a single reward signal, losing individual variation. Personalized and adaptive alignment (B) preserves individual preference structure via user-specific models. sonalization particularly attractive for deployment settings where per-user parameter storage is infeasible. Prompt-based and training-based personalization are ul￾timately complement… view at source ↗
read the original abstract

Current approaches to aligning large language models (LLMs) aggregate diverse human preferences into a single reward signal, effectively optimizing for a hypothetical ``average user'' who represents no real person particularly well. This position paper argues that LLMs should learn personalized, individual preferences rather than aggregated ones. We show that aggregation masks critical information about preference diversity, individual values, and contextual dependencies, which is a limitation both theoretically grounded in social choice theory and empirically evident across demographic groups. We analyze the rich structure that human preferences encode, survey technical approaches to personalization, and systematically address counterarguments on scalability, shared standards, and manipulation risk. While personalization introduces genuine safety challenges including filter bubbles, value lock-in, and psychological manipulation, we argue these are manageable through bounded personalization frameworks that preserve universal safety constraints while accommodating legitimate individual variation. We conclude with a concrete research and policy agenda for developing preference-aware models that respect both individual autonomy and collective safety.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript is a position paper arguing that LLM alignment currently aggregates diverse human preferences into a single reward signal, optimizing for a hypothetical 'average user' that represents no real individual well. It claims aggregation masks critical information on preference diversity, individual values, and contextual dependencies—a limitation both theoretically grounded in social choice theory and empirically evident across demographic groups. The paper surveys technical approaches to personalization, systematically addresses counterarguments on scalability, shared standards, and manipulation risk, and advocates bounded personalization frameworks that preserve universal safety constraints while accommodating individual variation. It concludes with a research and policy agenda for preference-aware models respecting both autonomy and collective safety.

Significance. If the central normative argument holds, the paper could meaningfully influence AI alignment research by reframing aggregated preference optimization as a core limitation and promoting personalized alternatives. This has potential significance for ethical and effective LLM development. As a conceptual position paper without new empirical measurements, formal derivations, or algorithmic constructions, its value lies primarily in synthesizing existing ideas and addressing counterarguments rather than advancing technical methods. The explicit treatment of safety challenges and proposal of bounded frameworks is a constructive element.

major comments (3)
  1. [Abstract] Abstract: The claim that aggregation's limitations are 'empirically evident across demographic groups' is asserted without any data, citations, or analysis provided in the manuscript. This empirical grounding is load-bearing for the argument that aggregation is harmful in practice.
  2. [Abstract] Abstract: The invocation of social choice theory to ground the harm of aggregation is stated at a high level but does not cite or derive any specific results (e.g., Arrow's theorem or related impossibility results) to show why aggregation necessarily masks individual values in the LLM setting.
  3. [Abstract] Abstract: The bounded personalization framework is proposed to manage risks such as filter bubbles and value lock-in while preserving safety, but no concrete mechanisms, constraints, or enforcement methods are specified. This detail is load-bearing for the claim that the safety challenges are manageable.
minor comments (1)
  1. [Abstract] Abstract: The double-backtick formatting in ``average user'' is a typographical artifact and should use standard quotation marks.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our position paper. Below we respond point-by-point to the major comments on the abstract, indicating where we will revise the manuscript for clarity and precision while preserving its conceptual focus.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that aggregation's limitations are 'empirically evident across demographic groups' is asserted without any data, citations, or analysis provided in the manuscript. This empirical grounding is load-bearing for the argument that aggregation is harmful in practice.

    Authors: We agree the abstract states the claim at a high level. The body of the manuscript synthesizes existing empirical literature on demographic variation in preferences (including cross-cultural value surveys and alignment studies). To directly address the concern, we will revise the abstract to include a brief qualifying phrase and pointer to these references. revision: yes

  2. Referee: [Abstract] Abstract: The invocation of social choice theory to ground the harm of aggregation is stated at a high level but does not cite or derive any specific results (e.g., Arrow's theorem or related impossibility results) to show why aggregation necessarily masks individual values in the LLM setting.

    Authors: We accept that an explicit link would strengthen the argument. The manuscript already draws on social choice concepts to illustrate how aggregation can produce outcomes misaligned with any individual's preferences. We will add citations to Arrow's impossibility theorem and related results, together with a short explanation of their applicability to LLM reward modeling. revision: yes

  3. Referee: [Abstract] Abstract: The bounded personalization framework is proposed to manage risks such as filter bubbles and value lock-in while preserving safety, but no concrete mechanisms, constraints, or enforcement methods are specified. This detail is load-bearing for the claim that the safety challenges are manageable.

    Authors: The framework is presented conceptually to outline the normative balance between autonomy and safety. We acknowledge that illustrative mechanisms would make the claim more robust. In revision we will add high-level examples (e.g., immutable safety layers that cannot be overridden by personalization) while keeping the paper's position-paper character. revision: partial

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

This is a normative position paper with no formal derivations, equations, fitted parameters, or algorithmic constructions. Its central claim is a policy recommendation grounded in external appeals to social choice theory and demographic observations. No load-bearing steps reduce by construction to the paper's own inputs, self-citations, or ansatzes. The argument is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The position rests on the domain assumption that preference aggregation necessarily masks critical individual and contextual information, drawn from social choice theory, with no free parameters or invented entities introduced.

axioms (1)
  • domain assumption Aggregating diverse human preferences into a single reward signal necessarily masks critical information about preference diversity, individual values, and contextual dependencies
    This premise is invoked to argue that current alignment methods are fundamentally limited.

pith-pipeline@v0.9.1-grok · 5691 in / 1209 out tokens · 30253 ms · 2026-06-28T19:02:00.594928+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 14 canonical work pages · 6 internal anchors

  1. [1]

    arXiv:2603.18161 , year=

    Abdulhai, M., White, I., Wan, Y ., Qureshi, I., Leibo, J., Kleiman-Weiner, M., and Jaques, N. How llms distort our written language.arXiv preprint arXiv:2603.18161,

  2. [2]

    A General Language Assistant as a Laboratory for Alignment

    Askell, A., Bai, Y ., Chen, A., Drain, D., Ganguli, D., Henighan, T., Jones, A., Joseph, N., Mann, B., DasSarma, N., et al. A general language assistant as a laboratory for alignment.arXiv preprint arXiv:2112.00861,

  3. [3]

    Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    Bai, Y ., Jones, A., Ndousse, K., Askell, A., Chen, A., Das- Sarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., et al. Training a helpful and harmless assistant with rein- forcement learning from human feedback.arXiv preprint arXiv:2204.05862,

  4. [4]

    Pal: Sample-efficient personalized reward modeling for plural- istic alignment

    Chen, D., Chen, Y ., Rege, A., Wang, Z., and Vinayak, R. Pal: Sample-efficient personalized reward modeling for plural- istic alignment. InInternational Conference on Learning Representations, volume 2025, pp. 15755–15799,

  5. [5]

    User embedding model for personalized language prompting

    Doddapaneni, S., Sayana, K., Jash, A., Sodhi, S., and Kuzmin, D. User embedding model for personalized language prompting. InProceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSON- ALIZE 2024), pp. 124–131,

  6. [6]

    When the majority is wrong: Modeling annotator disagreement for subjec- tive tasks

    Fleisig, E., Abebe, R., and Klein, D. When the majority is wrong: Modeling annotator disagreement for subjec- tive tasks. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 6715–6726,

  7. [7]

    and Tan, C

    Garbacea, C. and Tan, C. Hyperalign: Interpretable person- alized llm alignment via hypothesis generation.arXiv preprint arXiv:2505.00038,

  8. [8]

    Personalized Benchmarking: Evaluating LLMs by Individual Preferences

    Garbacea, C., Wang, H., and Tan, C. Personalized bench- marking: Evaluating llms by individual preferences. arXiv preprint arXiv:2604.18943,

  9. [9]

    L., Lam, M

    10 Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences Gordon, M. L., Lam, M. S., Park, J. S., Patel, K., Hancock, J., Hashimoto, T., and Bernstein, M. S. Jury learning: In- tegrating dissenting voices into machine learning models. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems, pp. 1–19,

  10. [10]

    Beyond Fixed Psychological Personas: State Beats Trait, but Language Models are State-Blind

    Harry, T., Ngong, I., Nweke, C., Feng, Y ., and Near, J. Beyond fixed psychological personas: State beats trait, but language models are state-blind.arXiv preprint arXiv:2601.15395,

  11. [11]

    Context steering: Controllable personalization at inference time

    He, Z., Pandey, S., Schrum, M., and Dragan, A. Context steering: Controllable personalization at inference time. InInternational Conference on Learning Representations, volume 2025, pp. 87863–87895,

  12. [12]

    and Yang, D

    Huang, J. and Yang, D. Culturally aware natural language in- ference. InFindings of the Association for Computational Linguistics: EMNLP 2023, pp. 7591–7609,

  13. [13]

    Co-writing with opinionated language models affects users’ views

    Jakesch, M., Hancock, J., and Naaman, M. Co-writing with opinionated language models affects users’ views. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1–15,

  14. [14]

    Adaptation of large language models

    Ke, Z., Ming, Y ., and Joty, S. Adaptation of large language models. InProceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Associa- tion for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts), pp. 30–37,

  15. [15]

    and Kim, J

    Kim, S. and Kim, J. Spring: Continual llm per- sonalization via selective parametric adaptation and retrieval-interpolated generation.arXiv preprint arXiv:2601.09974,

  16. [16]

    R., Vidgen, B., R¨ottger, P., and Hale, S

    Kirk, H. R., Vidgen, B., R¨ottger, P., and Hale, S. A. Personal- isation within bounds: A risk taxonomy and policy frame- work for the alignment of large language models with personalised feedback.arXiv preprint arXiv:2303.05453,

  17. [17]

    Lau, A., Choi, Y ., Balazadeh, V ., Chidambaram, K., Syrgka- nis, V ., and Krishnan, R. G. Personalized adapta- tion via in-context preference learning.arXiv preprint arXiv:2410.14001,

  18. [18]

    The power of scale for parameter-efficient prompt tuning

    Lester, B., Al-Rfou, R., and Constant, N. The power of scale for parameter-efficient prompt tuning. InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3045–3059,

  19. [19]

    Eliciting human preferences with language models

    Li, B., Tamkin, A., Goodman, N., and Andreas, J. Eliciting human preferences with language models. InInterna- tional Conference on Learning Representations, volume 2025, pp. 80984–81013, 2025a. Li, Y .-C., Zhang, F., Qiu, W., Yuan, L., Jia, C., Zhang, Z., Yu, Y ., and An, B. Q-adapter: Customizing pre-trained llms to new preferences with forgetting mitiga...

  20. [20]

    WebGPT: Browser-assisted question-answering with human feedback

    Nakano, R., Hilton, J., Balaji, S., Wu, J., Ouyang, L., Kim, C., Hesse, C., Jain, S., Kosaraju, V ., Saunders, W., et al. Webgpt: Browser-assisted question-answering with hu- man feedback.arXiv preprint arXiv:2112.09332,

  21. [21]

    The alignment prob- lem from a deep learning perspective

    Ngo, R., Chan, L., and Mindermann, S. The alignment prob- lem from a deep learning perspective. InInternational Conference on Learning Representations, volume 2024, pp. 7474–7501,

  22. [22]

    Disentan- gling length from quality in direct preference optimiza- tion

    Park, R., Rafailov, R., Ermon, S., and Finn, C. Disentan- gling length from quality in direct preference optimiza- tion. InFindings of the Association for Computational Linguistics: ACL 2024, pp. 4998–5017,

  23. [23]

    Towards understanding syco- phancy in language models

    Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S., Durmus, E., Hatfield-Dodds, Z., John- ston, S., Kravec, S., et al. Towards understanding syco- phancy in language models. InInternational Conference on Learning Representations, volume 2024, pp. 110–144,

  24. [24]

    Large language models are not yet human-level evaluators for abstractive summarization

    Shen, C., Cheng, L., Nguyen, X.-P., You, Y ., and Bing, L. Large language models are not yet human-level evaluators for abstractive summarization. InThe 2023 Conference on Empirical Methods in Natural Language Processing,

  25. [25]

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao

    Sheng, Y ., Cao, S., Li, D., Hooper, C., Lee, N., Yang, S., Chou, C., Zhu, B., Zheng, L., Keutzer, K., et al. S-lora: Serving thousands of concurrent lora adapters.arXiv preprint arXiv:2311.03285,

  26. [26]

    Beyond memorization: Violating privacy via inference with large language models

    Staab, R., Vero, M., Balunovic, M., and Vechev, M. Beyond memorization: Violating privacy via inference with large language models. InInternational Conference on Learn- ing Representations, volume 2024, pp. 33832–33878,

  27. [27]

    Rethinking bradley- terry models in preference-based reward modeling: Foundations, theory, and alternatives.arXiv preprint arXiv:2411.04991,

    Sun, H., Shen, Y ., and Ton, J.-F. Rethinking bradley- terry models in preference-based reward modeling: Foundations, theory, and alternatives.arXiv preprint arXiv:2411.04991,

  28. [28]

    Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

    12 Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences Wang, X., Tian, M., Zeng, Y ., Huang, Z., Yuan, J., Chen, B., Xu, J., Zhou, M., Liu, W., Wu, M., et al. Reward hacking in the era of large models: Mechanisms, emergent mis- alignment, challenges.arXiv preprint arXiv:2604.13602,

  29. [29]

    B., and Choi, E

    Zhang, M., Knox, W. B., and Choi, E. Modeling future con- versation turns to teach llms to ask clarifying questions. InInternational Conference on Learning Representations, volume 2025, pp. 60722–60742,

  30. [30]

    Per- sonalllm: Tailoring llms to individual preferences

    Zollo, T., Siah, A., Ye, N., Li, L., and Namkoong, H. Per- sonalllm: Tailoring llms to individual preferences. In International Conference on Learning Representations, volume 2025, pp. 66949–66971,