Improving the Distributional Alignment of LLMs using Supervision

Alex Liu; Angela Zhang; Gauri Kambhatla; Junyi Jessy Li; Matthew Lease; Ravi Srinivasan; Sanjana Gautam

arxiv: 2507.00439 · v4 · submitted 2025-07-01 · 💻 cs.CL

Improving the Distributional Alignment of LLMs using Supervision

Gauri Kambhatla , Sanjana Gautam , Angela Zhang , Alex Liu , Ravi Srinivasan , Junyi Jessy Li , Matthew Lease This is my paper

Pith reviewed 2026-05-19 07:12 UTC · model grok-4.3

classification 💻 cs.CL

keywords distributional alignmentLLM supervisionpopulation groupspublic healthpublic opinionvalues and beliefsmodel alignment

0 comments

The pith

Adding simple supervision improves how consistently LLMs match response distributions of diverse population groups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates ways to make large language models better reflect the varied answers that different groups of people give to subjective questions. It shows that straightforward supervision during generation leads to more reliable alignment between model outputs and real population data. The evaluation covers three distinct domains and examines both overall performance and differences across individual groups. The results across many models and prompting approaches create a reference point for continued work on this problem.

Core claim

Adding simple supervision can more consistently improve the alignment of LLM-generated distributions with diverse population groups, as measured across three datasets spanning public health, public opinion, and values and beliefs. Beyond evaluating average alignment, the work reports how alignment varies across specific groups. Broad findings from evaluations over many LLMs and prompting strategies provide insights into distributional alignment and establish a benchmark for future research.

What carries the argument

Supervision signals added to LLM generation that steer output distributions toward empirical human population data, assessed via alignment metrics on three domain-specific datasets.

Load-bearing premise

The chosen supervision signals and evaluation metrics on the three datasets capture genuine distributional alignment rather than superficial pattern matching that may not generalize beyond the tested LLMs and prompts.

What would settle it

Repeating the experiments with a fresh collection of LLMs and prompts that were not part of the original study and finding no consistent gains in alignment scores would undermine the main result.

Figures

Figures reproduced from arXiv: 2507.00439 by Alex Liu, Angela Zhang, Gauri Kambhatla, Junyi Jessy Li, Matthew Lease, Ravi Srinivasan, Sanjana Gautam.

**Figure 2.** Figure 2: Standard deviation vs. opinion alignment. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Opinion alignment for OLMo-2-7B models with different post-training methods using the verbalized [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Mean Squared Error (MSE) of regression models on various training data sizes, using SD prompted and [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

read the original abstract

The ability to accurately align LLMs with diverse population groups on subjective questions would have great value. In this work, we show that adding simple supervision can more consistently improve the alignment of LLM-generated distributions with diverse population groups, as measured across three datasets spanning public health, public opinion, and values and beliefs. Beyond evaluating average alignment, we also report how alignment varies across specific groups. Our broad findings provide insights into the distributional alignment of LLM generations with diverse populations. By conducting evaluation over many LLMs and prompting strategies, we provide a benchmark to stimulate future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Supervision gives measurable gains on the three tested datasets but the setup leaves open whether it learns transferable alignment or just fits those specific distributions.

read the letter

The main thing to know is that the paper finds simple supervision improves how well LLM output distributions line up with human responses across groups on public health, opinion, and values questions. They run this over multiple models and prompting approaches and also report per-group breakdowns instead of just averages. That multi-model benchmark and the group-level view are the parts that could actually be useful to others working on the same problem. The work applies established supervision ideas to distributional alignment in a straightforward way and shows consistent lifts in their experiments, which is a solid empirical step even if the core technique is not brand new. They deserve credit for the breadth of the evaluation and for making the per-group variation visible rather than hiding it in overall scores. The soft spots sit in the details that are still thin. The abstract does not spell out the exact supervision format or whether the signals were drawn from the same data pools used for evaluation, so the risk that gains reflect dataset-specific pattern matching rather than a general mechanism is real and worth checking. There is also no mention of statistical tests or controls that would rule out prompt sensitivity as the driver of the reported improvements. If those checks are missing or weak in the full paper, the central claim stays plausible but not yet tight. This paper is for researchers who care about making LLMs reflect diverse human views in applied settings like health or policy. A reader who wants empirical baselines and group breakdowns will get something concrete to build on or cite. It is coherent enough and grounded enough in actual experiments to deserve a serious referee rather than a desk reject, mainly so the generalization questions can be pressed in review.

Referee Report

2 major / 2 minor

Summary. The paper claims that adding simple supervision can more consistently improve the alignment of LLM-generated distributions with diverse population groups, as measured across three datasets spanning public health, public opinion, and values and beliefs. Beyond average alignment, it reports variation across specific groups and provides a benchmark by evaluating over many LLMs and prompting strategies.

Significance. If the central empirical findings hold after addressing implementation details, this would be a useful contribution to LLM alignment research by demonstrating a practical, low-overhead method for better matching model output distributions to human subpopulations on subjective tasks. The broad evaluation across LLMs and strategies is a clear strength that could serve as a reusable benchmark for future work.

major comments (2)

[§4 (Experiments)] §4 (Experiments) and the abstract: the description of the supervision signals does not specify whether the few-shot examples, labels, or other signals are drawn from held-out portions of the three evaluation datasets or overlap with the test questions/populations. Without this, the reported gains on public health, opinion, and values datasets risk reflecting dataset-specific pattern matching rather than a transferable alignment mechanism.
[§4 and §5 (Results)] §4 and §5 (Results): no mention of statistical significance tests, variance across random seeds, or explicit controls for prompt sensitivity in the alignment improvements. This weakens the claim that supervision produces 'more consistent' gains across LLMs and strategies.

minor comments (2)

[Methods] Clarify the precise definition and computation of the distributional alignment metric (e.g., is it KL divergence, Wasserstein distance, or another measure) in the methods section to ensure reproducibility.
[§3 (Method)] Add a short table or paragraph summarizing the exact supervision format used for each dataset to improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments identify valuable opportunities to improve experimental transparency and statistical reporting, which we will address through targeted revisions. We respond to each major comment below.

read point-by-point responses

Referee: [§4 (Experiments)] §4 (Experiments) and the abstract: the description of the supervision signals does not specify whether the few-shot examples, labels, or other signals are drawn from held-out portions of the three evaluation datasets or overlap with the test questions/populations. Without this, the reported gains on public health, opinion, and values datasets risk reflecting dataset-specific pattern matching rather than a transferable alignment mechanism.

Authors: We thank the referee for highlighting this clarity issue. In the original experiments, supervision signals (few-shot examples and associated labels) were sampled exclusively from held-out portions of each dataset, with no overlap to the test questions or population groups used for evaluation. This split was implemented to ensure the observed alignment improvements reflect a transferable mechanism rather than dataset-specific leakage. We will revise §4 and the abstract to explicitly document the data partitioning procedure, including the sizes of the held-out supervision sets and confirmation of zero overlap with evaluation instances. revision: yes
Referee: [§4 and §5 (Results)] §4 and §5 (Results): no mention of statistical significance tests, variance across random seeds, or explicit controls for prompt sensitivity in the alignment improvements. This weakens the claim that supervision produces 'more consistent' gains across LLMs and strategies.

Authors: We agree that the absence of these elements limits the strength of the consistency claims. In the revised version we will add: (i) results aggregated over multiple random seeds with reported means and standard deviations; (ii) statistical significance tests (paired t-tests with p-values) comparing supervised versus baseline conditions; and (iii) an explicit analysis of prompt sensitivity, including variance across the prompting strategies already evaluated and additional controls that isolate the contribution of supervision from prompt variation. These additions will be placed in §4 and §5. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation against external benchmarks

full rationale

The paper is an empirical study that measures the effect of adding simple supervision on LLM-generated distributions versus human population responses on three independent datasets (public health, opinion, values). No mathematical derivations, equations, or first-principles predictions appear in the work. All reported improvements are obtained by direct comparison to external human survey data rather than by fitting parameters that are then renamed as predictions or by self-referential definitions. The central claims therefore rest on observable experimental outcomes rather than any reduction to inputs defined inside the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard machine learning assumptions about the effectiveness of supervision for improving output distributions; no free parameters, invented entities, or non-standard axioms are introduced in the abstract.

axioms (1)

domain assumption Supervision signals can be applied to LLMs to improve distributional match with human groups without introducing new systematic biases
Implicit in the claim that simple supervision improves alignment across the tested datasets

pith-pipeline@v0.9.0 · 5631 in / 1213 out tokens · 31002 ms · 2026-05-19T07:12:21.630711+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We apply supervised regression to transform LLM-generated distributions... learn a regression such that each value is transformed using supervision from ground truth values for each answer choice
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We use the opinion alignment metric from Santurkar et al. (2023) to measure similarity between elicited distributions and ground truth distributions... 1−Wasserstein distance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

[1]

URL: " 'urlintro :=

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRINGS urlintro eprinturl eprintpr...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

Arriaga, and Adam Tauman Kalai

Gati V Aher, Rosa I. Arriaga, and Adam Tauman Kalai. 2023. https://proceedings.mlr.press/v202/aher23a.html Using large language models to simulate multiple humans and replicate human subject studies . In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 337--371. PMLR

work page 2023
[4]

Shayan Alipour, Indira Sen, Mattia Samory, and Tanushree Mitra. 2024. Robustness and confounders in the demographic alignment of llms with human perceptions of offensiveness. arXiv preprint arXiv:2411.08977

work page arXiv 2024
[5]

Tilman Beck, Hendrik Schuff, Anne Lauscher, and Iryna Gurevych. 2024. https://aclanthology.org/2024.eacl-long.159 Sensitivity, performance, robustness: Deconstructing the effect of sociodemographic prompting . In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2589-...

work page 2024
[6]

Laura Biester, Vanita Sharma, Ashkan Kazemi, Naihao Deng, Steven Wilson, and Rada Mihalcea. 2022. https://aclanthology.org/2022.nlperspectives-1.2 Analyzing the effects of annotator gender across NLP tasks . In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 10--19, Marseille, France. European Language Resources Association

work page 2022
[7]

Myra Cheng, Esin Durmus, and Dan Jurafsky. 2023 a . https://doi.org/10.18653/v1/2023.acl-long.84 Marked personas: Using natural language prompts to measure stereotypes in language models . In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1504--1532, Toronto, Canada. Association for C...

work page doi:10.18653/v1/2023.acl-long.84 2023
[8]

Myra Cheng, Tiziano Piccardi, and Diyi Yang. 2023 b . https://doi.org/10.18653/v1/2023.emnlp-main.669 C o MP os T : Characterizing and evaluating caricature in LLM simulations . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10853--10875, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.emnlp-main.669 2023
[9]

Xuan Long Do, Kenji Kawaguchi, Min-Yen Kan, and Nancy Chen. 2025. https://aclanthology.org/2025.coling-main.172/ Aligning large language models with human opinions through persona selection and value -- belief -- norm reasoning . In Proceedings of the 31st International Conference on Computational Linguistics, pages 2526--2547, Abu Dhabi, UAE. Association...

work page 2025
[10]

Esin Durmus, Karina Nguyen, Thomas Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, and Deep Ganguli. 2024. https://openreview.net/forum?id=zl16jLb91v Towards measuring the representation...

work page 2024
[11]

Julia R Falconer, Eibe Frank, Devon LL Polaschek, and Chaitanya Joshi. 2022. Methods for eliciting informative prior distributions: A critical review. Decision Analysis, 19(3):189--204

work page 2022
[12]

Isabel O Gallegos, Ryan A Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, and Nesreen K Ahmed. 2024. Bias and fairness in large language models: A survey. Computational Linguistics, pages 1--79

work page 2024
[13]

Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, and Iryna Gurevych. 2024 a . A survey of confidence estimation and calibration in large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6577--6595

work page 2024
[14]

Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, and Iryna Gurevych. 2024 b . https://doi.org/10.18653/v1/2024.naacl-long.366 A survey of confidence estimation and calibration in large language models . In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Tech...

work page doi:10.18653/v1/2024.naacl-long.366 2024
[15]

Mitchell L Gordon, Michelle S Lam, Joon Sung Park, Kayur Patel, Jeff Hancock, Tatsunori Hashimoto, and Michael S Bernstein. 2022. Jury learning: Integrating dissenting voices into machine learning models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, pages 1--19

work page 2022
[16]

Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, and Tushar Khot. 2024. Bias runs deep: Implicit reasoning biases in persona-assigned llms. In The Twelfth International Conference on Learning Representations (ICLR)

work page 2024
[17]

Soumyajit Gupta, Sooyong Lee, Maria De-Arteaga, and Matthew Lease. 2023. Same same, but different: Conditional multi-task learning for demographic-specific toxicity detection. In Proceedings of the ACM Web Conference 2023, pages 3689--3700

work page 2023
[18]

Shirley Hayati, Minhwa Lee, Dheeraj Rajagopal, and Dongyeop Kang. 2024. How far can we extract diverse perspectives from large language models? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5336--5366

work page 2024
[19]

Tiancheng Hu and Nigel Collier. 2024. https://aclanthology.org/2024.acl-long.554 Quantifying the persona effect in LLM simulations . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10289--10307, Bangkok, Thailand. Association for Computational Linguistics

work page 2024
[20]

EunJeong Hwang, Bodhisattwa Majumder, and Niket Tandon. 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.393 Aligning language models to user opinions . In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5906--5919, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.findings-emnlp.393 2023
[21]

Brihi Joshi, Xiang Ren, Swabha Swayamdipta, Rik Koncel-Kedziorski, and Tim Paek. 2025. Improving llm personas via rationalization with psychological scaffolds. arXiv preprint arXiv:2504.17993

work page arXiv 2025
[22]

Zhen Lin, Shubhendu Trivedi, and Jimeng Sun. 2024. https://openreview.net/forum?id=DWkJCSxKU5 Generating with confidence: Uncertainty quantification for black-box large language models . Transactions on Machine Learning Research

work page 2024
[23]

Rajiv Movva, Pang Wei Koh, and Emma Pierson. 2024. Annotation alignment: Comparing llm and human annotations of conversational safety. arXiv preprint arXiv:2406.06369

work page arXiv 2024
[24]

Sagnik Mukherjee, Muhammad Farid Adilazuarda, Sunayana Sitaram, Kalika Bali, Alham Fikri Aji, and Monojit Choudhury. 2024. Cultural conditioning or placebo? on the effectiveness of socio-demographic prompting. arXiv preprint arXiv:2406.11661

work page arXiv 2024
[25]

Moin Nadeem, Anna Bethke, and Siva Reddy. 2020. Stereoset: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456

work page arXiv 2020
[26]

Arbi Haza Nasution and Aytug Onan. 2024. Chatgpt label: Comparing the quality of human-generated and llm-generated annotations in low-resource language nlp tasks. IEEE Access

work page 2024
[27]

Matthias Orlikowski, Jiaxin Pei, Paul Röttger, Philipp Cimiano, David Jurgens, and Dirk Hovy. 2025. Beyond demographics: Fine-tuning large language models to predict individuals' subjective text perceptions. arXiv [cs.CL]

work page 2025
[28]

Jiaxin Pei and David Jurgens. 2023. https://doi.org/10.18653/v1/2023.law-1.25 When do annotator demographics matter? measuring the influence of annotator demographics with the POPQUORN dataset . In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), pages 252--265, Toronto, Canada. Association for Computational Linguistics

work page doi:10.18653/v1/2023.law-1.25 2023
[29]

David M Rothschild, James Brand, Hope Schroeder, and Jenny Wang. 2024. Opportunities and risks of llms in survey research. Available at SSRN

work page 2024
[30]

Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? In International Conference on Machine Learning, pages 29971--30004. PMLR

work page 2023
[31]

Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, and Noah A. Smith. 2022. https://doi.org/10.18653/v1/2022.naacl-main.431 Annotators with attitudes: How annotator beliefs and identities bias toxic language detection . In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics:...

work page doi:10.18653/v1/2022.naacl-main.431 2022
[32]

Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, and Yejin Choi

Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, and Yejin Choi. 2024. https://openreview.net/forum?id=gQpBnRHwxM Position: A roadmap to pluralistic alignment . In ICML

work page 2024
[33]

Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2025. https://arxiv.org/abs/2311.09730 Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks . In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Association for Computational Linguisti...

work page arXiv 2025
[34]

Ran- dom silicon sampling: Simulating human sub- population opinion using a large language model based on group-level demographic information

Seungjong Sun, Eungu Lee, Dongyan Nan, Xiangying Zhao, Wonbyung Lee, Bernard J. Jansen, and Jang Hyun Kim. 2024. http://arxiv.org/abs/2402.18144 Random silicon sampling: Simulating human sub-population opinion using a large language model based on group-level demographic information

work page arXiv 2024
[35]

Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, and Christopher Manning. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.330 Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback . In Proceedings of the 2023 Conference on Em...

work page doi:10.18653/v1/2023.emnlp-main.330 2023
[36]

Dickerson

Angelina Wang, Jamie Morgenstern, and John P. Dickerson. 2024. http://arxiv.org/abs/2402.01908 Large language models should not replace human participants because they can misportray and flatten identity groups

work page arXiv 2024
[37]

Miao Xiong, Zhiyuan Hu, Xinyang Lu, YIFEI LI, Jie Fu, Junxian He, and Bryan Hooi. 2024. https://openreview.net/forum?id=gjeQKFxFpZ Can LLM s express their uncertainty? an empirical evaluation of confidence elicitation in LLM s . In The Twelfth International Conference on Learning Representations

work page 2024
[38]

Elle Michelle Yang, Matthias Gall \'e , and Seraphina Goldfarb-Tarrant. 2024. ``There are no solutions, only trade-offs.'' Taking A Closer Look At Safety Data Annotations. In Pluralistic Alignment Workshop at NeurIPS

work page 2024

[1] [1]

URL: " 'urlintro :=

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRINGS urlintro eprinturl eprintpr...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

Arriaga, and Adam Tauman Kalai

Gati V Aher, Rosa I. Arriaga, and Adam Tauman Kalai. 2023. https://proceedings.mlr.press/v202/aher23a.html Using large language models to simulate multiple humans and replicate human subject studies . In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 337--371. PMLR

work page 2023

[4] [4]

Shayan Alipour, Indira Sen, Mattia Samory, and Tanushree Mitra. 2024. Robustness and confounders in the demographic alignment of llms with human perceptions of offensiveness. arXiv preprint arXiv:2411.08977

work page arXiv 2024

[5] [5]

Tilman Beck, Hendrik Schuff, Anne Lauscher, and Iryna Gurevych. 2024. https://aclanthology.org/2024.eacl-long.159 Sensitivity, performance, robustness: Deconstructing the effect of sociodemographic prompting . In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2589-...

work page 2024

[6] [6]

Laura Biester, Vanita Sharma, Ashkan Kazemi, Naihao Deng, Steven Wilson, and Rada Mihalcea. 2022. https://aclanthology.org/2022.nlperspectives-1.2 Analyzing the effects of annotator gender across NLP tasks . In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 10--19, Marseille, France. European Language Resources Association

work page 2022

[7] [7]

Myra Cheng, Esin Durmus, and Dan Jurafsky. 2023 a . https://doi.org/10.18653/v1/2023.acl-long.84 Marked personas: Using natural language prompts to measure stereotypes in language models . In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1504--1532, Toronto, Canada. Association for C...

work page doi:10.18653/v1/2023.acl-long.84 2023

[8] [8]

Myra Cheng, Tiziano Piccardi, and Diyi Yang. 2023 b . https://doi.org/10.18653/v1/2023.emnlp-main.669 C o MP os T : Characterizing and evaluating caricature in LLM simulations . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10853--10875, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.emnlp-main.669 2023

[9] [9]

Xuan Long Do, Kenji Kawaguchi, Min-Yen Kan, and Nancy Chen. 2025. https://aclanthology.org/2025.coling-main.172/ Aligning large language models with human opinions through persona selection and value -- belief -- norm reasoning . In Proceedings of the 31st International Conference on Computational Linguistics, pages 2526--2547, Abu Dhabi, UAE. Association...

work page 2025

[10] [10]

Esin Durmus, Karina Nguyen, Thomas Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, and Deep Ganguli. 2024. https://openreview.net/forum?id=zl16jLb91v Towards measuring the representation...

work page 2024

[11] [11]

Julia R Falconer, Eibe Frank, Devon LL Polaschek, and Chaitanya Joshi. 2022. Methods for eliciting informative prior distributions: A critical review. Decision Analysis, 19(3):189--204

work page 2022

[12] [12]

Isabel O Gallegos, Ryan A Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, and Nesreen K Ahmed. 2024. Bias and fairness in large language models: A survey. Computational Linguistics, pages 1--79

work page 2024

[13] [13]

Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, and Iryna Gurevych. 2024 a . A survey of confidence estimation and calibration in large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6577--6595

work page 2024

[14] [14]

Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, and Iryna Gurevych. 2024 b . https://doi.org/10.18653/v1/2024.naacl-long.366 A survey of confidence estimation and calibration in large language models . In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Tech...

work page doi:10.18653/v1/2024.naacl-long.366 2024

[15] [15]

Mitchell L Gordon, Michelle S Lam, Joon Sung Park, Kayur Patel, Jeff Hancock, Tatsunori Hashimoto, and Michael S Bernstein. 2022. Jury learning: Integrating dissenting voices into machine learning models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, pages 1--19

work page 2022

[16] [16]

Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, and Tushar Khot. 2024. Bias runs deep: Implicit reasoning biases in persona-assigned llms. In The Twelfth International Conference on Learning Representations (ICLR)

work page 2024

[17] [17]

Soumyajit Gupta, Sooyong Lee, Maria De-Arteaga, and Matthew Lease. 2023. Same same, but different: Conditional multi-task learning for demographic-specific toxicity detection. In Proceedings of the ACM Web Conference 2023, pages 3689--3700

work page 2023

[18] [18]

Shirley Hayati, Minhwa Lee, Dheeraj Rajagopal, and Dongyeop Kang. 2024. How far can we extract diverse perspectives from large language models? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5336--5366

work page 2024

[19] [19]

Tiancheng Hu and Nigel Collier. 2024. https://aclanthology.org/2024.acl-long.554 Quantifying the persona effect in LLM simulations . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10289--10307, Bangkok, Thailand. Association for Computational Linguistics

work page 2024

[20] [20]

EunJeong Hwang, Bodhisattwa Majumder, and Niket Tandon. 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.393 Aligning language models to user opinions . In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5906--5919, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.findings-emnlp.393 2023

[21] [21]

Brihi Joshi, Xiang Ren, Swabha Swayamdipta, Rik Koncel-Kedziorski, and Tim Paek. 2025. Improving llm personas via rationalization with psychological scaffolds. arXiv preprint arXiv:2504.17993

work page arXiv 2025

[22] [22]

Zhen Lin, Shubhendu Trivedi, and Jimeng Sun. 2024. https://openreview.net/forum?id=DWkJCSxKU5 Generating with confidence: Uncertainty quantification for black-box large language models . Transactions on Machine Learning Research

work page 2024

[23] [23]

Rajiv Movva, Pang Wei Koh, and Emma Pierson. 2024. Annotation alignment: Comparing llm and human annotations of conversational safety. arXiv preprint arXiv:2406.06369

work page arXiv 2024

[24] [24]

Sagnik Mukherjee, Muhammad Farid Adilazuarda, Sunayana Sitaram, Kalika Bali, Alham Fikri Aji, and Monojit Choudhury. 2024. Cultural conditioning or placebo? on the effectiveness of socio-demographic prompting. arXiv preprint arXiv:2406.11661

work page arXiv 2024

[25] [25]

Moin Nadeem, Anna Bethke, and Siva Reddy. 2020. Stereoset: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456

work page arXiv 2020

[26] [26]

Arbi Haza Nasution and Aytug Onan. 2024. Chatgpt label: Comparing the quality of human-generated and llm-generated annotations in low-resource language nlp tasks. IEEE Access

work page 2024

[27] [27]

Matthias Orlikowski, Jiaxin Pei, Paul Röttger, Philipp Cimiano, David Jurgens, and Dirk Hovy. 2025. Beyond demographics: Fine-tuning large language models to predict individuals' subjective text perceptions. arXiv [cs.CL]

work page 2025

[28] [28]

Jiaxin Pei and David Jurgens. 2023. https://doi.org/10.18653/v1/2023.law-1.25 When do annotator demographics matter? measuring the influence of annotator demographics with the POPQUORN dataset . In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), pages 252--265, Toronto, Canada. Association for Computational Linguistics

work page doi:10.18653/v1/2023.law-1.25 2023

[29] [29]

David M Rothschild, James Brand, Hope Schroeder, and Jenny Wang. 2024. Opportunities and risks of llms in survey research. Available at SSRN

work page 2024

[30] [30]

Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? In International Conference on Machine Learning, pages 29971--30004. PMLR

work page 2023

[31] [31]

Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, and Noah A. Smith. 2022. https://doi.org/10.18653/v1/2022.naacl-main.431 Annotators with attitudes: How annotator beliefs and identities bias toxic language detection . In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics:...

work page doi:10.18653/v1/2022.naacl-main.431 2022

[32] [32]

Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, and Yejin Choi

Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, and Yejin Choi. 2024. https://openreview.net/forum?id=gQpBnRHwxM Position: A roadmap to pluralistic alignment . In ICML

work page 2024

[33] [33]

Huaman Sun, Jiaxin Pei, Minje Choi, and David Jurgens. 2025. https://arxiv.org/abs/2311.09730 Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks . In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Association for Computational Linguisti...

work page arXiv 2025

[34] [34]

Ran- dom silicon sampling: Simulating human sub- population opinion using a large language model based on group-level demographic information

Seungjong Sun, Eungu Lee, Dongyan Nan, Xiangying Zhao, Wonbyung Lee, Bernard J. Jansen, and Jang Hyun Kim. 2024. http://arxiv.org/abs/2402.18144 Random silicon sampling: Simulating human sub-population opinion using a large language model based on group-level demographic information

work page arXiv 2024

[35] [35]

Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, and Christopher Manning. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.330 Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback . In Proceedings of the 2023 Conference on Em...

work page doi:10.18653/v1/2023.emnlp-main.330 2023

[36] [36]

Dickerson

Angelina Wang, Jamie Morgenstern, and John P. Dickerson. 2024. http://arxiv.org/abs/2402.01908 Large language models should not replace human participants because they can misportray and flatten identity groups

work page arXiv 2024

[37] [37]

Miao Xiong, Zhiyuan Hu, Xinyang Lu, YIFEI LI, Jie Fu, Junxian He, and Bryan Hooi. 2024. https://openreview.net/forum?id=gjeQKFxFpZ Can LLM s express their uncertainty? an empirical evaluation of confidence elicitation in LLM s . In The Twelfth International Conference on Learning Representations

work page 2024

[38] [38]

Elle Michelle Yang, Matthias Gall \'e , and Seraphina Goldfarb-Tarrant. 2024. ``There are no solutions, only trade-offs.'' Taking A Closer Look At Safety Data Annotations. In Pluralistic Alignment Workshop at NeurIPS

work page 2024