AI Alignment From Social Choice Perspectives

Ariel D. Procaccia; Benjamin Schiffer; Daniel Halpern; Evi Micha; Itai Shapira; Shirley Zhang

arxiv: 2606.21550 · v1 · pith:EKC5JA4Bnew · submitted 2026-06-19 · 💻 cs.AI

Pith reviewed 2026-06-26 14:11 UTC · model grok-4.3

classification 💻 cs.AI

keywords ai alignment · social choice theory · reinforcement learning from human feedback · preference aggregation · human feedback · disagreement handling · voting mechanisms

The pith

Social choice theory identifies failure modes in how human feedback is aggregated for AI alignment and opens a wider space of explicit design options.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys recent applications of social choice theory to the aggregation step in reinforcement learning from human feedback. When human judgments conflict, the learned model objective becomes some collective determination of desired behavior. The survey shows that this perspective surfaces concrete problems in standard aggregation practices and maps a larger set of principled mechanisms for resolving or accommodating disagreement. A reader would care because the choice of aggregation directly shapes what values the final model encodes.

Core claim

Alignment from human feedback uses human judgments about model outputs to steer the behavior of language models after pretraining. When those judgments reflect conflicting views of desirable behavior, the learned objective becomes an aggregate determination of what the model should prefer. The survey illustrates how the social choice perspective helps identify failure modes in the feedback aggregation layer and reveals a broader design space for handling disagreement in explicit and principled ways.

What carries the argument

The feedback aggregation layer that combines multiple human judgments into a single training objective, analyzed through social choice mechanisms for combining preferences.
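
As a rough illustration of what an aggregation layer does (a sketch under stated assumptions, not code from the paper), the Python snippet below pools pairwise judgments from several annotators into a single training target. The function names, the response labels, and the 6-to-4 split are all hypothetical.

```python
from collections import defaultdict


def pooled_win_rates(comparisons):
    """Pool (winner, loser) judgments into per-pair win fractions.

    Each pair is keyed in sorted order; the value is the fraction of
    judgments that preferred the first element of the key.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for winner, loser in comparisons:
        pair = tuple(sorted((winner, loser)))
        totals[pair] += 1
        if winner == pair[0]:
            wins[pair] += 1
    return {pair: wins[pair] / totals[pair] for pair in totals}


def majority_labels(rates):
    """Collapse each pooled rate to a hard label.

    True means the first element of the pair wins; an exact tie maps to False.
    """
    return {pair: rate > 0.5 for pair, rate in rates.items()}


# Hypothetical data: 6 annotators prefer "cautious", 4 prefer "direct".
comparisons = [("cautious", "direct")] * 6 + [("direct", "cautious")] * 4

rates = pooled_win_rates(comparisons)
labels = majority_labels(rates)

print(rates)   # pooled fraction preferring "cautious" over "direct"
print(labels)  # hard majority label for the same pair
```

The pooled rate keeps the size of the split, while the hard majority label discards it. Which of these (or which other rule) becomes the training objective is the kind of design choice the survey examines.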

If this is right

Standard aggregation methods such as averaging or majority voting can produce specific failures when preferences contain cycles or strong minorities.
A larger menu of aggregation rules becomes available, each satisfying different formal properties such as fairness or strategy-proofness.
Disagreement can be handled by mechanisms that preserve minority views or produce rankings rather than single winners.
The aggregation step can be made more transparent and justifiable by explicit appeal to social choice axioms.
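
The first point can be made concrete with the classic Condorcet cycle. The following Python sketch is illustrative only and is not taken from the paper; the three annotator rankings and all function names are assumptions chosen for the example. It checks pairwise majorities, looks for a Condorcet winner, and computes Borda scores as one alternative rule.

```python
from itertools import combinations


def pairwise_majority(rankings):
    """Return {(a, b): True} for every pair where a strict majority ranks a above b."""
    candidates = rankings[0]
    beats = {}
    for a, b in combinations(candidates, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in rankings)
        if 2 * a_over_b > len(rankings):
            beats[(a, b)] = True
        elif 2 * a_over_b < len(rankings):
            beats[(b, a)] = True
    return beats


def condorcet_winner(rankings):
    """Return the candidate that beats every other by majority, or None."""
    candidates = rankings[0]
    beats = pairwise_majority(rankings)
    for c in candidates:
        if all(beats.get((c, other), False) for other in candidates if other != c):
            return c
    return None


def borda_scores(rankings):
    """Give m-1 points for first place, m-2 for second, ..., 0 for last."""
    m = len(rankings[0])
    scores = {c: 0 for c in rankings[0]}
    for r in rankings:
        for position, c in enumerate(r):
            scores[c] += m - 1 - position
    return scores


# Hypothetical annotators, each ranking three responses from best to worst.
cycle = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

print(pairwise_majority(cycle))  # A beats B, B beats C, C beats A
print(condorcet_winner(cycle))   # None: the majority relation is cyclic
print(borda_scores(cycle))       # every response scores 3
```

On this profile, pairwise majority gives no consistent ordering and Borda returns a three-way tie, so any single "winner" would depend on a tie-breaking choice that the aggregation rule itself does not justify.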

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Feedback collection interfaces could be redesigned to elicit preferences in forms that social choice rules can process more cleanly.
The same lens may apply to multi-model or multi-stakeholder alignment settings where several groups must jointly decide model behavior.
Empirical tests of social choice rules inside actual RLHF pipelines would reveal whether the identified failure modes appear at scale.

Load-bearing premise

Social choice theory supplies applicable and insightful tools for modeling and improving the aggregation of human judgments in reinforcement learning from human feedback.

What would settle it

An experiment that applies standard RLHF aggregation to conflicting preference data, finds no previously hidden failure modes, and shows that social-choice-inspired alternatives produce no measurable improvement in alignment metrics.

Original abstract

Alignment from human feedback uses human judgments about model outputs to steer the behavior of language models after pretraining. When those judgments reflect conflicting views of desirable behavior, the learned objective becomes an aggregate determination of what the model should prefer. We survey recent work that has studied this aggregation problem through the lens of social choice theory. We illustrate how the social choice perspective helps identify failure modes in the feedback aggregation layer and reveals a broader design space for handling disagreement in explicit and principled ways.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a survey organizing social choice ideas for aggregating conflicting human feedback in alignment, with no new results.

This paper is a survey that maps social choice theory onto the aggregation step in human feedback for AI alignment. It shows how disagreement among human judgments creates problems for the learned objective and how social choice concepts can make those problems more visible while opening up other ways to handle them.

It does a reasonable job of pulling together existing literature to illustrate failure modes, such as when simple averaging or majority rules produce objectives that ignore important minorities or create inconsistencies. The design space it describes comes from applying known mechanisms like different voting rules or fairness criteria to the RLHF setting.

The soft spots are straightforward: there are no new theorems, experiments, or derivations. The value is entirely in the synthesis and the clarity of the illustrations, which rest on the cited papers. If the connections are accurate and the coverage is broad, the survey works; if not, it adds little. The abstract suggests the synthesis is careful rather than forced.

This is for alignment researchers who want a clear introduction to the social choice angle on feedback aggregation. Readers already familiar with both fields or looking for original technical advances will not get much from it.

It deserves a serious referee because a well-executed survey that bridges these areas can save others time and point to open questions without needing to be groundbreaking.

Referee Report

0 major / 2 minor

Summary. The manuscript surveys recent literature applying social choice theory to the aggregation of conflicting human judgments in reinforcement learning from human feedback (RLHF) for AI alignment. Its central claim is that this lens identifies failure modes in the feedback aggregation layer and reveals a broader design space for handling disagreement explicitly and in principled ways, with the survey drawing on and organizing existing cited work rather than presenting new derivations or experiments.

Significance. If the survey accurately and comprehensively covers the referenced literature, it offers a useful organizational framework that connects social choice concepts to alignment challenges, potentially aiding researchers in designing aggregation mechanisms that better accommodate preference diversity. The manuscript appropriately attributes insights to the cited prior work rather than claiming internal novelty in theorems or empirical results.

minor comments (2)

[Abstract] The phrase 'recent work' is vague; adding a sentence on the temporal scope or number of papers surveyed would improve clarity on the review's coverage.
The manuscript would benefit from an explicit table or structured list mapping specific social choice axioms or mechanisms (e.g., Condorcet consistency) to the alignment failure modes discussed, to make the connections more immediately usable for readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its value as an organizational framework, and recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; survey of external literature

The manuscript is explicitly a literature survey of prior work applying social choice theory to RLHF feedback aggregation. Its claims (identifying failure modes and expanding design space) are supported by citations to external papers rather than any internal derivation, theorem, or fitted prediction. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided abstract or described structure. The central premise does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper with no new mathematical derivations, empirical claims, or modeling assumptions that introduce free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5604 in / 836 out tokens · 25609 ms · 2026-06-26T14:11:01.167491+00:00 · methodology


2018

[11] [11]

The k-Armed Dueling Bandits Problem.Journal of Computer and System Sciences, 78(5):1538–1556, 2012

Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. The k-Armed Dueling Bandits Problem.Journal of Computer and System Sciences, 78(5):1538–1556, 2012. 1

2012

[12] [12]

Christiano, Jan Leike, Tom B

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep Reinforcement Learning from Human Preferences. In AI Alignment From Social Choice Perspectives·19 The Thirty-first Annual Conference on Neural Information Processing Systems,

[13] [13]

Training Language Models to Follow Instructions with Human Feedback

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training Language Models to Follow Instructions with Human Feedback....

2022

[14] [14]

Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B

Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. Fine-Tuning Language Models from Human Preferences, 2020. arXiv:1909.08593. 1, 4, 5

Pith/arXiv arXiv 2020

[15] [15]

Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul Christiano

Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul Christiano. Learning to Sum- marize with Human Feedback. InThe Thirty-fourth Annual Conference on Neural Information Processing Systems, 2020. 1, 2, 4, 7

2020

[16] [16]

Manning, Stefano Ermon, and Chelsea Finn

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. Direct Preference Optimization: Your Language Model Is Secretly a Reward Model. InThe Thirty-seventh Annual Conference on Neural Information Processing Systems, 2023. 1, 2

2023

[17] [17]

Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, and Jared Kaplan

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernan- dez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse,...

Pith/arXiv arXiv 2022

[18] [18]

Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli

Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli. Collective Constitutional AI: Aligning a Language Model with Public Input. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024. 2

2024

[19] [19]

AI Alignment at Your Discretion

Maarten Buyl, Hadi Khalaf, Claudio Mayrink Verdun, Lucas Monteiro Paes, Caio Cesar Vieira Machado, and Flavio du Pin Calmon. AI Alignment at Your Discretion. InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, 2025. 2

2025

[20] [20]

Jacobs and Hanna Wallach

Abigail Z. Jacobs and Hanna Wallach. Measurement and Fairness. InPro- ceedings of the 2021 ACM Conference on Fairness, Accountability, and Trans- parency, 2021. 2 20·Daniel Halpern et al

2021

[21] [21]

Position: A Roadmap to Pluralistic Alignment

Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, and Yejin Choi. Position: A Roadmap to Pluralistic Alignment. InProceedings of the 41st International Conference on Machine Learning, 2024. 2, 17

2024

[22] [22]

Procaccia, and Itai Shapira

Daniel Halpern, Evi Micha, Ariel D. Procaccia, and Itai Shapira. Pairwise Calibrated Rewards for Pluralistic Alignment. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 2, 17

2025

[23] [23]

Beyond Preferences in AI Alignment.Philosophical Studies, 182(7):1813–1863, 2025

Tan Zhi-Xuan, Micah Carroll, Matija Franklin, and Hal Ashton. Beyond Preferences in AI Alignment.Philosophical Studies, 182(7):1813–1863, 2025. 2

2025

[24] [24]

Hwang, Sydney Levine, Valentina Py- atkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, and Yejin Choi

Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Py- atkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, and Yejin Choi. Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties. InProceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence an...

2024

[25] [25]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, ...

Pith/arXiv arXiv 2022

[26] [26]

Cultural Palette: Pluralising Culture Alignment via Multi-Agent Palette, 2024

Jiahao Yuan, Zixiang Di, Shangzixin Zhao, Zhiqing Cui, Hanqing Wang, Guisong Yang, and Usman Naseem. Cultural Palette: Pluralising Culture Alignment via Multi-Agent Palette, 2024. arXiv:2412.11167. 2

arXiv 2024

[27] [27]

Cultural Incon- gruencies in Artificial Intelligence

Vinodkumar Prabhakaran, Rida Qadri, and Ben Hutchinson. Cultural Incon- gruencies in Artificial Intelligence. InFirst Workshop on Cultures in AI/AI in Culture, NeurIPS 2022, 2022. 2

2022

[28] [28]

Edelman, Zhaowei Zhang, Mario G¨ unther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric J

Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario G¨ unther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric J. Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Chenyu Zhang, Ruiqi Zhong, Se´...

2024

[29] [29]

Hannah Rose Kirk, Alexander Whitefield, Paul R¨ ottger, Andrew Michael AI Alignment From Social Choice Perspectives·21 Bean, Katerina Margatina, Rafael Mosquera, Juan Manuel Ciro, Max Bartolo, Adina Williams, He He, Bertie Vidgen, and Scott A. Hale. The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals A...

2024

[30] [30]

Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration

Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, and Yulia Tsvetkov. Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024. 2, 17

2024

[31] [31]

Zhang, Zhilin Wang, Jena D

Michael J.Q. Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau, Yejin Choi, Eunsol Choi, Xiang Ren, and Valentina Pyatkin. Diverging Preferences: When Do Annotators Disagree and Do Models Know? In Proceedings of the 42nd International Conference on Machine Learning, 2025. 2

2025

[32] [32]

Procaccia, and Aarti Singh

Kanad Shrikar Pardeshi, Itai Shapira, Ariel D. Procaccia, and Aarti Singh. Learning Social Welfare Functions. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 2

2024

[33] [33]

The Possibility of Social Choice.American Economic Review, 89(3):349–378, 1999

Amartya Sen. The Possibility of Social Choice.American Economic Review, 89(3):349–378, 1999. 2

1999

[34] [34]

Social Choice Theory

Amartya Sen. Social Choice Theory. In Kenneth J. Arrow and Michael D. Intriligator, editors,Handbook of Mathematical Economics, volume 3, pages 1073–1181. Elsevier, 1986. 2

1986

[35] [35]

Holliday, Bob M

Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Moss´ e, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, and William S. Zwicker. Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback. In Proceedings of the 41st International Conference on Machine...

2024

[36] [36]

Mapping Social Choice Theory to RLHF

Jessica Dai and Eve Fleisig. Mapping Social Choice Theory to RLHF. In ICLR 2024 Workshop on Reliable and Responsible Foundation Models, 2024. 2

2024

[37] [37]

AI Alignment and Social Choice: Fundamental Limitations and Policy Implications, 2023

Abhilash Mishra. AI Alignment and Social Choice: Fundamental Limitations and Policy Implications, 2023. arXiv:2310.16048. 2

arXiv 2023

[38] [38]

Procaccia, Itai Shapira, Yev- geniy Vorobeychik, and Junlin Wu

Luise Ge, Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira, Yev- geniy Vorobeychik, and Junlin Wu. Axioms for AI Alignment from Human Feedback. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 2, 10

2024

[39] [39]

Alamdari, Soroush Ebadian, and Ariel D

Parand A. Alamdari, Soroush Ebadian, and Ariel D. Procaccia. Policy Ag- gregation. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 2

2024

[40] [40]

Jackpot! Alignment as a Maximal Lottery, 2025

Roberto-Rafael Maura-Rivero, Marc Lanctot, Francesco Visin, and Kate Larson. Jackpot! Alignment as a Maximal Lottery, 2025. arXiv:2501.19266. 2, 15 22·Daniel Halpern et al

arXiv 2025

[41] [41]

Moral Decision Making Frameworks for Artificial Intelligence

Vincent Conitzer, Walter Sinnott-Armstrong, Jana Schaich Borg, Yuan Deng, and Max Kramer. Moral Decision Making Frameworks for Artificial Intelligence. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), 2017. 2

2017

[42] [42]

Social Choice and the Value Alignment Problem

Mahendra Prasad. Social Choice and the Value Alignment Problem. In Artificial Intelligence Safety and Security. Chapman and Hall/CRC, 2018. 2

2018

[43] [43]

Scaling Laws for Reward Model Overoptimization

Leo Gao, John Schulman, and Jacob Hilton. Scaling Laws for Reward Model Overoptimization. InProceedings of the 40th International Conference on Machine Learning, 2023. 3

2023

[44] [44]

Rethinking Reward Model Evalua- tion: Are We Barking Up the Wrong Tree? InThe Thirteenth International Conference on Learning Representations, 2025

Xueru Wen, Jie Lou, Yaojie Lu, Hongyu Lin, Xing Yu, Xinyu Lu, Ben He, Xianpei Han, Debing Zhang, and Le Sun. Rethinking Reward Model Evalua- tion: Are We Barking Up the Wrong Tree? InThe Thirteenth International Conference on Learning Representations, 2025. 3

2025

[45] [45]

Gonzalez, and Ion Stoica

Evan Frick, Tianle Li, Connor Chen, Wei-Lin Chiang, Anastasios Nikolas Angelopoulos, Jiantao Jiao, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica. How to Evaluate Reward Models for RLHF. InThe Thirteenth International Conference on Learning Representations, 2025. 3

2025

[46] [46]

The Challenge of Value Alignment: From Fairer Algorithms to AI Safety

Iason Gabriel and Vafa Ghazavi. The Challenge of Value Alignment: From Fairer Algorithms to AI Safety. In Carissa V´ eliz, editor,Oxford Handbook of Digital Ethics, pages 336–355. Oxford University Press, 2023. 3

2023

[47] [47]

Seth D. Baum. Social Choice Ethics in Artificial Intelligence.AI & Society, 35(1):165–176, 2020. 3

2020

[48] [48]

A Survey of Reinforcement Learning from Human Feedback.Transactions on Machine Learning Research, 2025

Timo Kaufmann, Paul Weng, Viktor Bengs, and Eyke H¨ ullermeier. A Survey of Reinforcement Learning from Human Feedback.Transactions on Machine Learning Research, 2025. 4

2025

[49] [49]

Principled Reinforcement Learning with Human Feedback from Pairwise or K-Wise Comparisons

Banghua Zhu, Michael Jordan, and Jiantao Jiao. Principled Reinforcement Learning with Human Feedback from Pairwise or K-Wise Comparisons. In Proceedings of the 40th International Conference on Machine Learning, 2023. 4

2023

[50] [50]

Fishburn

Peter C. Fishburn. Condorcet Social Choice Functions.SIAM Journal on Applied Mathematics, 33(3):469–489, 1977. 4

1977

[51] [51]

Thurstone

Louis L. Thurstone. A Law of Comparative Judgment.Psychological Review, 34(4):273–286, 1927. 4

1927

[52] [52]

Duncan Luce.Individual Choice Behavior

R. Duncan Luce.Individual Choice Behavior. John Wiley, Oxford, England,

[53] [53]

Conditional Logit Analysis of Qualitative Choice Behavior

Daniel McFadden. Conditional Logit Analysis of Qualitative Choice Behavior. InFrontiers in Econometrics, pages 105–142. Academic Press, New York,

[54] [54]

Procaccia

Ritesh Noothigattu, Dominik Peters, and Ariel D. Procaccia. Axioms for Learning from Pairwise Comparisons. InThe Thirty-fourth Annual Conference on Neural Information Processing Systems, 2020. 4

2020

[55] [55]

Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, and Alessandro G

W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, and Alessandro G. Allievi. Models of Human Preference for Learning Reward Functions.Transactions on Machine Learning Research,

[56] [56]

4 AI Alignment From Social Choice Perspectives·23

[57] [57]

Ralph Allan Bradley and Milton E. Terry. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.Biometrika, 39(3/4):324–345,

[58] [58]

Truong, Andreas Haupt, and Sanmi Koyejo.Machine Learning from Human Preferences

Sang T. Truong, Andreas Haupt, and Sanmi Koyejo.Machine Learning from Human Preferences. Stanford University, 2025. 4, 7

2025

[59] [59]

John I. Yellott. The Relationship Between Luce’s Choice Axiom, Thurstone’s Theory of Comparative Judgment, and the Double Exponential Distribution. Journal of Mathematical Psychology, 15(2):109–144, 1977. 4

1977

[60] [60]

M´ emoire sur les ´ elections au scrutin

Jean-Charles de Borda. M´ emoire sur les ´ elections au scrutin. InM´ emoires de l’Acad´ emie Royale des Sciences ann´ ee 1781, pages 657–665. l’Imprimerie Royale, Paris, 1784. 5

[61] [61]

Procaccia.Handbook of Computational Social Choice

Felix Brandt, Vincent Conitzer, Ulle Endriss, J´ erˆ ome Lang, and Ariel D. Procaccia.Handbook of Computational Social Choice. Cambridge University Press, 2016. 5, 7, 14

2016

[62] [62]

Falk, and Lana Yeganova

Lowell Bruce Anderson, Helena Dandurova, James E. Falk, and Lana Yeganova. Relationships Between Borda Voting and Zermelo Ranking.Social Choice and Welfare, 32(3):355–365, 2009. 6, 7

2009

[63] [63]

Die Berechnung der Turnier-Ergebnisse als ein Maximumprob- lem der Wahrscheinlichkeitsrechnung.Mathematische Zeitschrift, 29:436–460,

Ernst Zermelo. Die Berechnung der Turnier-Ergebnisse als ein Maximumprob- lem der Wahrscheinlichkeitsrechnung.Mathematische Zeitschrift, 29:436–460,

[64] [64]

Lester R. Ford. Solution of a Ranking Problem from Binary Comparisons. The American Mathematical Monthly, 64(8):28–33, 1957. 6

1957

[65] [65]

Henry E. Daniels. Round-Robin Tournament Scores.Biometrika, 56(2): 295–299, 1969. 6

1969

[66] [66]

The Ranking of Incomplete Tournaments: A Mathematician’s Guide to Popular Sports.The American Mathematical Monthly, 90(4):246–266,

Thomas Jech. The Ranking of Incomplete Tournaments: A Mathematician’s Guide to Popular Sports.The American Mathematical Monthly, 90(4):246–266,

[67] [67]

Distri- butional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

Anand Siththaranjan, Cassidy Laidlaw, and Dylan Hadfield-Menell. Distri- butional Preference Learning: Understanding and Accounting for Hidden Context in RLHF. InThe Twelfth International Conference on Learning Representations, 2024. 7

2024

[68] [68]

Borda’s Rule and Arrow’s Independence Condition.Journal of Political Economy, 133(2):385–420, 2025

Eric Maskin. Borda’s Rule and Arrow’s Independence Condition.Journal of Political Economy, 133(2):385–420, 2025. 7

2025

[69] [69]

Social Choice in the South Seas: Electoral Innovation and the Borda Count in the Pacific Island Countries.International Political Science Review, 23(4):355–372, 2002

Benjamin Reilly. Social Choice in the South Seas: Electoral Innovation and the Borda Count in the Pacific Island Countries.International Political Science Review, 23(4):355–372, 2002. 7

2002

[70] [70]

Balinski and Rida Laraki.Majority Judgment: Measuring, Ranking, and Electing

Michel L. Balinski and Rida Laraki.Majority Judgment: Measuring, Ranking, and Electing. MIT Press, 2010. 7

2010

[71] [71]

Linearly-Solvable Markov Decision Problems

Emanuel Todorov. Linearly-Solvable Markov Decision Problems. InThe Twentieth Annual Conference on Neural Information Processing Systems,

[72] [72]

Relative Entropy Policy Search.Proceedings of the AAAI Conference on Artificial Intelligence, 24(1): 1607–1612, 2010

Jan Peters, Katharina M¨ ulling, and Yasemin Altun. Relative Entropy Policy Search.Proceedings of the AAAI Conference on Artificial Intelligence, 24(1): 1607–1612, 2010. 7 24·Daniel Halpern et al

2010

[73] [73]

Procaccia

Ali Shirali, Arash Nasr-Esfahany, Abdullah Omar Alomar, Parsa Mirtaheri, Rediet Abebe, and Ariel D. Procaccia. Direct Alignment with Heterogeneous Preferences. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 7

2025

[74] [74]

Bean, Bertie Vidgen, Paul R¨ ottger, and Scott A

Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul R¨ ottger, and Scott A. Hale. The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023. 7

2023

[75] [75]

Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, and Kyunghyun Cho

Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, and Kyunghyun Cho. Preference Learning Algorithms Do Not Learn Preference Rankings. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 7

2024

[76] [76]

Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback

Kyuyoung Kim, Ah Jeong Seo, Hao Liu, Jinwoo Shin, and Kimin Lee. Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback. InFindings of the Association for Computational Linguistics: EMNLP 2024, 2024. 7

2024

[77] [77]

The History and Risks of Reinforcement Learning and Human Feedback, 2023

Nathan Lambert, Thomas Krendl Gilbert, and Tom Zick. The History and Risks of Reinforcement Learning and Human Feedback, 2023. arXiv:2310.13595. 7

arXiv 2023

[78] [78]

On Releasing Annotator-Level Labels and Information in Datasets

Vinodkumar Prabhakaran, Aida Mostafazadeh Davani, and Mark Diaz. On Releasing Annotator-Level Labels and Information in Datasets. InProceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop, 2021. 7

2021

[79] [79]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned, 2022

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield- Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Jo...

Pith/arXiv arXiv 2022

[80] [80]

Copeland

Arthur H. Copeland. A Reasonable Social Welfare Function. Mimeographed notes from the Seminar on Applications of Mathematics to the Social Sciences, University of Michigan, Ann Arbor, 1951. 7

1951