Inertia in Moral and Value Judgments of Large Language Models

Bruce W. Lee; Hyunsoo Cho; Yeongheon Lee

arXiv: 2408.09049 · v3 · submitted 2024-08-16 · 💻 cs.CL · cs.AI · cs.HC


Pith reviewed 2026-05-23 22:02 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.HC

keywords large language models · moral judgments · value inertia · persona prompting · AI bias · role-play · value orientations · ethical responses


The pith

Large language models maintain consistent moral and value orientations even when assigned different personas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the common practice of using persona prompts to make LLMs produce varied, human-like responses on moral and value questions. Instead of wide variation, the models show persistent inertia: dimensions such as harm avoidance and fairness remain skewed in the same direction across many different personas. This points to fixed internal value preferences that prompting does not easily change. The finding matters for applications that need balanced ethical judgments and rely on these models to adapt their outputs.

Core claim

The authors establish that LLMs exhibit value orientation and inertia: when role-play prompts assign randomized personas and outputs are analyzed at scale, certain moral dimensions stay skewed in one direction regardless of the persona, revealing strong internal biases and value preferences rather than flexible, context-sensitive responses.

What carries the argument

Role-play at scale, which pairs randomized persona prompts with macro-level analysis of model outputs to detect consistency in moral and value judgments.
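To make the role-play-at-scale idea concrete, here is a minimal Python sketch under stated assumptions: the persona attributes, the prompt template, the 1-to-5 answer scale and `stub_model` are all hypothetical placeholders, not the authors' prompts or models. The default of 100 personas mirrors the 100 random personas shown in the heatmaps of Figures 7 and 8. The sketch samples randomized personas, asks every question under each one, and reduces the answers to a per-question mean and standard deviation, which is the macro-level view the method relies on.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical persona attributes for illustration only; the paper's own
# persona generator and prompt wording are not reproduced here.
AGES = [22, 35, 48, 61, 74]
OCCUPATIONS = ["nurse", "farmer", "software engineer", "priest", "soldier", "artist"]
REGIONS = ["rural Brazil", "Seoul", "Lagos", "Oslo", "Texas"]


def sample_persona(rng: random.Random) -> str:
    """Draw one randomized persona description (hypothetical template)."""
    return (f"You are a {rng.choice(AGES)}-year-old {rng.choice(OCCUPATIONS)} "
            f"from {rng.choice(REGIONS)}.")


def role_play_at_scale(
    questions: List[str],
    ask_model: Callable[[str, str], int],
    n_personas: int = 100,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Ask every question once under each of n_personas random personas.

    Returns question -> list of Likert answers, one per persona.
    """
    rng = random.Random(seed)
    personas = [sample_persona(rng) for _ in range(n_personas)]
    return {q: [ask_model(p, q) for p in personas] for q in questions}


def macro_summary(answers: Dict[str, List[int]]) -> Dict[str, Tuple[float, float]]:
    """Macro-level view: per-question mean and population SD across personas."""
    return {q: (statistics.mean(a), statistics.pstdev(a)) for q, a in answers.items()}


# Hypothetical stand-in for a real LLM call: it ignores the persona entirely,
# which is the extreme form of the inertia the paper reports.
def stub_model(persona: str, question: str) -> int:
    return 5 if "harm" in question else 3


answers = role_play_at_scale(
    ["Q1: avoid harm to others", "Q2: loyalty to one's group"], stub_model, n_personas=10
)
summary = macro_summary(answers)
```

In a real run, `stub_model` would be replaced by a call to the model under test; a per-question SD near zero across many different personas is the macro-level pattern the paper calls inertia.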

If this is right

LLMs will give similar moral judgments on harm and fairness questions no matter which persona is assigned.
Value preferences remain stable across persona changes rather than shifting with context.
Applications that require balanced outputs on ethical topics need prior scrutiny of these fixed orientations.
Persona prompting alone does not overcome the observed inertia in value judgments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Systems that use LLMs for policy advice or ethical review may inherit the same consistent skews unless additional controls are added.
Measuring inertia on other value dimensions beyond harm avoidance could show whether the pattern holds more broadly.
Training adjustments aimed at increasing response variety on moral questions might be tested as a direct follow-up.

Load-bearing premise

The expectation that different personas should produce a wide range of opinions comparable to variation across human individuals.

What would settle it

A large set of trials in which randomized persona prompts produce responses on harm avoidance and fairness that vary as widely as the authors anticipated from human-like differences.
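One way to operationalize this test, sketched in Python under stated assumptions: given per-persona model scores and per-person human scores on the same item or dimension (the human scores are a hypothetical input; the paper does not supply them), compare the two standard deviations and attach a percentile-bootstrap interval to their ratio. The function names and the toy numbers are illustrative only, and each group needs at least two scores.

```python
import random
import statistics
from typing import List, Tuple


def sd_ratio(llm_scores: List[float], human_scores: List[float]) -> float:
    """Across-persona SD of model answers divided by across-person SD of human answers."""
    return statistics.stdev(llm_scores) / statistics.stdev(human_scores)


def bootstrap_sd_ratio_ci(
    llm_scores: List[float],
    human_scores: List[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for sd_ratio, resampling each group independently."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        llm = rng.choices(llm_scores, k=len(llm_scores))
        hum = rng.choices(human_scores, k=len(human_scores))
        hum_sd = statistics.stdev(hum)
        if hum_sd == 0:
            continue  # degenerate resample: the ratio is undefined
        ratios.append(statistics.stdev(llm) / hum_sd)
    if not ratios:
        raise ValueError("every human resample had zero spread")
    ratios.sort()
    lo = ratios[int(alpha / 2 * len(ratios))]
    hi = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lo, hi


# Toy values for illustration only, not data from the paper or any survey.
llm_harm = [5, 5, 4, 5, 5, 5, 4, 5]
human_harm = [1, 3, 5, 2, 4, 5, 3, 1]
ratio = sd_ratio(llm_harm, human_harm)
interval = bootstrap_sd_ratio_ci(llm_harm, human_harm)
```

A ratio whose interval sits well below 1 means persona prompts produce less spread than the human reference; an interval that covers 1 on harm avoidance and fairness would be the kind of result that counts against the inertia claim.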

Figures

Figures reproduced from arXiv: 2408.09049 by Bruce W. Lee, Hyunsoo Cho, Yeongheon Lee.

**Figure 1.** Surface Diversity vs Underlying Consistency: When an LLM is prompted with the same question under various personas, its responses might appear diverse. However, we demonstrate that, at a macro level, the answers converge toward a consistent direction.

**Figure 2.** Overview of the Role-Play-at-Scale method. We prompt a Large Language Model (LLM) to respond to …

**Figure 3.** Regardless of the persona, the LLM exhibits a consistent default behavior: (a) provides a macro-level …

**Figure 4.** LLM responses remain highly consistent across three independently generated persona sets, underscoring …

**Figure 5.** Impact of Increased Role-Play on Response Variance: As the number of role-play iterations increases, the …

**Figure 6.** The figure displays the average scores for each moral foundation (MFQ-30) and value dimension (PVQ …

**Figure 7.** Heatmaps of Individual Responses: The x-axis represents 100 random personas and the y-axis denotes each questionnaire. The color-coded responses reveal distinct horizontal stripes, indicating a consistent bias across all persona prompts.

**Figure 8.** Heatmaps of Individual Responses: The x-axis represents 100 random personas and the y-axis denotes each questionnaire. The color-coded responses reveal distinct horizontal stripes, indicating a consistent bias across all persona prompts.

**Figure 9.** We report role-play-at-scale results across four models in this figure. LLMs were asked each question 200 …

**Figure 10.** Breakdown of …

**Figure 11.** Breakdown of …
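The "horizontal stripes" in the Figure 7 and 8 heatmaps can be summarized with a single number. The sketch below (the function name is hypothetical) computes eta squared for the question factor in a question-by-persona answer grid, a standard one-way sum-of-squares decomposition. It is one illustrative way to quantify the pattern, not a statistic the paper reports.

```python
from typing import List


def question_share_of_variance(matrix: List[List[float]]) -> float:
    """Eta squared for the question factor in a question x persona answer grid.

    matrix[q][p] is the answer to question q under persona p. A value near 1
    means answers change with the question but hardly with the persona, which
    is what horizontal stripes in a heatmap look like.
    """
    values = [v for row in matrix for v in row]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return float("nan")  # every cell identical: the share is undefined
    # Between-question sum of squares, weighted by personas per question.
    ss_between = sum(
        len(row) * (sum(row) / len(row) - grand_mean) ** 2 for row in matrix
    )
    return ss_between / ss_total
```

A grid where each question gets the same answer under every persona scores 1.0; a grid where answers vary only with the persona scores 0.0.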

Original abstract

Large Language Models (LLMs) behave non-deterministically, and prompting has become a common method for steering their outputs. A popular strategy is to assign a persona to the model to produce more varied, context-sensitive responses, similar to how responses vary across human individuals. Against the expectation that persona prompting yields a wide range of opinions, our experiments show that LLMs keep consistent value orientations. We observe a persistent inertia in their responses, where certain moral and value dimensions (especially harm avoidance and fairness) stay skewed in one direction across persona settings. To study this, we use role-play at scale, which pairs randomized persona prompts with a macro-level analysis of model outputs. Our results point to strong internal biases and value preferences in LLMs, which we call value orientation and inertia. These models warrant scrutiny and adjustment before use in applications where balanced outputs matter.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The abstract reports LLMs holding steady on harm avoidance and fairness across personas, but it lacks methodological details and human controls, so the inertia claim is hard to assess.


The main observation is that LLMs keep similar moral leanings on harm avoidance and fairness no matter which persona prompt is used. The authors frame this as value orientation and inertia that persona prompting fails to overcome, and they propose role-play at scale plus macro-level analysis as the way to surface it. This lines up with known issues in LLM prompting and could be useful for anyone trying to get more varied ethical outputs from models. Testing many randomized personas is a reasonable way to look for consistency rather than cherry-picking examples; that part is straightforward and worth noting for people working on alignment or deployment.

The soft spots are clear from the abstract alone. It gives no numbers on personas tested, questions used, scoring method, or statistical checks, so it is impossible to judge how large or reliable the effect is. The key stress test is a human baseline: without running the same persona set and questions on actual people, you cannot separate model inertia from prompts that simply do not produce much moral variation even in humans. The abstract also does not engage prior work on LLM value biases in any detail, which weakens the positioning of the result as new.

This is the sort of short empirical note that might interest researchers focused on prompting limitations or ethical LLM use. A reader already following that literature could get a quick data point from it, but the current version is too thin for strong conclusions. I would send it to peer review if the full paper supplies the missing experimental details and at least discusses the human baseline issue, because the topic is practical even if the evidence needs strengthening.

Referee Report

2 major / 2 minor

Summary. The paper claims that LLMs exhibit persistent 'inertia' in moral and value judgments—particularly low variation and consistent skews in harm avoidance and fairness—across randomized persona prompts, contrary to the expectation that such prompting should produce human-like diversity in responses. This is demonstrated via large-scale role-play experiments with macro-level analysis of model outputs, pointing to strong internal value orientations and biases.

Significance. If substantiated with appropriate controls, the result would highlight a practically relevant limitation of persona prompting for achieving balanced outputs in LLMs, with implications for applications in ethics-sensitive domains. The scalable role-play methodology and focus on specific value dimensions (harm avoidance, fairness) represent a constructive empirical approach to studying model biases.

major comments (2)

[Methods / Experimental Setup] The experimental design (as described in the abstract and methods) omits a human baseline using identical persona prompts, questions, and scoring procedure. This is load-bearing for the central inertia claim, as the observed consistency could reflect ineffective role-play design rather than model-specific value orientation; without this control, the macro-level LLM analysis alone cannot isolate the effect.
[Abstract / Results] Abstract and results presentation: no details are supplied on the number of personas, number of trials per persona, statistical tests for variation, or controls for prompt sensitivity. This prevents assessment of whether the reported low variation in harm avoidance and fairness actually supports the inertia conclusion at the claimed strength.
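The statistical tests the referee asks for could start with something as simple as an exact sign test on the direction of the skew. The sketch below (hypothetical function name; standard library only) asks whether persona-conditioned answers on one dimension fall above the scale midpoint more often than chance. It tests direction only, not spread, so it complements rather than replaces a variance comparison.

```python
import math
from typing import List


def sign_test_vs_midpoint(scores: List[float], midpoint: float) -> float:
    """Exact two-sided sign test p-value for H0: a persona is equally likely
    to answer above or below the scale midpoint.

    Answers equal to the midpoint are dropped, as is conventional for the sign test.
    """
    above = sum(1 for s in scores if s > midpoint)
    below = sum(1 for s in scores if s < midpoint)
    n = above + below
    if n == 0:
        return 1.0  # no informative answers
    k = min(above, below)
    # Probability of a split at least this lopsided in one tail under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

When several dimensions are tested, a Bonferroni correction (comparing each p-value to 0.05 divided by the number of dimensions) is a conservative default.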

minor comments (2)

[Methods] Clarify the exact set of moral/value dimensions tested and how they were scored (e.g., any reference to established inventories like MFQ).
[Introduction] The term 'value orientation' is introduced without a precise operational definition distinguishing it from simple output bias.
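A sketch of how scoring and an operational definition could be written down, assuming an MFQ-style inventory with a 0-to-5 response scale and a per-foundation item mean. The two-foundation `ITEM_KEY` is a truncated, hypothetical stand-in for the published key, and the orientation/spread split is an assumption offered to address the referee's point, not the paper's definition.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical, truncated item key: the published MFQ-30 key assigns six items
# to each of five foundations; take the real mapping from the official scoring key.
ITEM_KEY: Dict[str, List[str]] = {
    "harm": ["h1", "h2"],
    "fairness": ["f1", "f2"],
}
MIDPOINT = 2.5  # midpoint of a 0-to-5 response scale


def foundation_scores(responses: Dict[str, int], key: Dict[str, List[str]]) -> Dict[str, float]:
    """Score one persona's questionnaire: mean item response per foundation."""
    return {f: statistics.mean(responses[i] for i in items) for f, items in key.items()}


def orientation_and_spread(
    per_persona: List[Dict[str, float]], midpoint: float = MIDPOINT
) -> Dict[str, Tuple[float, float]]:
    """Hypothetical operational definitions, kept separate on purpose:
    orientation = signed distance of the across-persona mean from the midpoint,
    spread = across-persona population SD (inertia shows up as low spread).
    """
    result = {}
    for foundation in per_persona[0]:
        values = [scores[foundation] for scores in per_persona]
        result[foundation] = (statistics.mean(values) - midpoint, statistics.pstdev(values))
    return result


# Two hypothetical personas answering identically.
personas = [
    {"h1": 5, "h2": 5, "f1": 4, "f2": 4},
    {"h1": 5, "h2": 5, "f1": 4, "f2": 4},
]
profile = orientation_and_spread([foundation_scores(r, ITEM_KEY) for r in personas])
```

Keeping the two numbers separate is what distinguishes inertia from simple output bias: a model can be skewed on average yet shift with the persona (large orientation, large spread), or stay put near the midpoint (small orientation, small spread). The paper's claim corresponds to a large orientation combined with a small spread.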

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and presentation of our work. We respond point-by-point to the major comments below.


Referee: [Methods / Experimental Setup] The experimental design (as described in the abstract and methods) omits a human baseline using identical persona prompts, questions, and scoring procedure. This is load-bearing for the central inertia claim, as the observed consistency could reflect ineffective role-play design rather than model-specific value orientation; without this control, the macro-level LLM analysis alone cannot isolate the effect.

Authors: The inertia claim is defined with respect to LLMs: randomized persona prompts produce low variation in specific value dimensions (harm avoidance and fairness) within model outputs. The persona construction follows standard practices for eliciting diverse human-like responses, and the macro-level analysis across many personas isolates the models' failure to vary. While a human baseline would be a useful extension for comparing effect sizes, it is not required to demonstrate that LLMs exhibit the reported inertia under this prompting regime; the expectation of diversity is drawn from the broader literature on human individual differences rather than from a within-study control. revision: no
Referee: [Abstract / Results] Abstract and results presentation: no details are supplied on the number of personas, number of trials per persona, statistical tests for variation, or controls for prompt sensitivity. This prevents assessment of whether the reported low variation in harm avoidance and fairness actually supports the inertia conclusion at the claimed strength.

Authors: The full manuscript reports the experimental parameters (number of personas, trials per persona, statistical tests, and prompt-sensitivity controls) in the methods and results sections. These details were condensed in the abstract for length. We will expand the abstract and add a summary table in the results to explicitly state the scale of the experiments, the statistical tests applied to variation, and the controls used. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical measurement with no derivations or fitted inputs


The paper reports experimental results from role-play prompting and macro-level output analysis. No equations, parameters, or derivations are present. The central claim (persistent inertia in value orientations across personas) is presented as a direct observation from the data, not derived from or reduced to any prior inputs by construction. The assumption that personas should produce diversity is an interpretive framing, not a load-bearing self-referential step. No self-citations are invoked to justify uniqueness or ansatzes. This is a standard empirical study; the derivation chain is empty.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The claim is an empirical observation from prompting experiments and does not rely on mathematical axioms, free parameters, or new postulated entities.

pith-pipeline@v0.9.0 · 5679 in / 929 out tokens · 24231 ms · 2026-05-23T22:02:35.072583+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration
cs.CL 2025-05 conditional novelty 6.0

XtraGPT is a suite of 1.5B-14B parameter open-source LLMs fine-tuned on 140,000 revision pairs from 7,000 top-tier papers to support controllable, context-aware academic paper editing.
Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior
cs.CL 2025-09 unverdicted novelty 5.0

Questionnaire-based and generation-based psychological profiles for LLMs are substantially different, indicating that established human questionnaires reflect desired behavior instead of stable psychological constructs.


[1] [1]

Marwa Abdulhai, Gregory Serapio-Garcia, Cl \'e ment Crepy, Daria Valter, John Canny, and Natasha Jaques. 2023. Moral foundations of large language models. arXiv preprint arXiv:2310.15337

work page arXiv 2023

[2] [2]

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2023

[3] [3]

culture

Muhammad Farid Adilazuarda, Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Singh, Ashutosh Dwivedi, Alham Fikri Aji, Jacki O'Neill, Ashutosh Modi, and Monojit Choudhury. 2024. Towards measuring and modeling" culture" in llms: A survey. arXiv preprint arXiv:2403.15412

work page arXiv 2024

[4] [4]

Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, et al. 2021. A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861

work page internal anchor Pith review Pith/arXiv arXiv 2021

[5] [5]

Xuechunzi Bai, Angelina Wang, Ilia Sucholutsky, and Thomas L Griffiths. 2024. Measuring implicit bias in explicitly unbiased large language models. arXiv preprint arXiv:2402.04105

work page arXiv 2024

[6] [6]

Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610--623

work page 2021

[7] Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (technology) is power: A critical survey of "bias" in NLP. arXiv preprint arXiv:2005.14050.

[8] Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pages 77–91. PMLR.

[9] Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, and Pascale Fung. 2024. High-dimension human value representation in large language models. Preprint, arXiv:2404.07900. https://arxiv.org/abs/2404.07900

[10] Yong Cao, Li Zhou, Seolhwa Lee, Laura Cabello, Min Chen, and Daniel Hershcovich. 2023. Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study. arXiv preprint arXiv:2303.17466.

[11] Tanise Ceron, Neele Falk, Ana Barić, Dmitry Nikolaev, and Sebastian Padó. 2024. Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs. Preprint, arXiv:2402.17649. https://arxiv.org/abs/2402.17649

[12] Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, and Gagandeep Singh. 2024. Quantitative certification of bias in large language models. arXiv preprint arXiv:2405.18780.

[13] Hongzhan Chen, Hehong Chen, Ming Yan, Wenshen Xu, Xing Gao, Weizhou Shen, Xiaojun Quan, Chenliang Li, Ji Zhang, Fei Huang, et al. 2024a. RoleInteract: Evaluating the social interaction of role-playing agents. arXiv preprint arXiv:2403.13679.

[14] Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, Wei Shi, Jian Xie, Shuang Li, Ruihan Yang, Tinghui Zhu, et al. 2024b. From persona to personalization: A survey on role-playing language agents. arXiv preprint arXiv:2404.18231.

[15] Florian E Dorner, Tom Sühr, Samira Samadi, and Augustin Kelava. 2023. Do personality tests generalize to large language models? arXiv preprint arXiv:2311.05297.

[16] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.

[17] Jessica Echterhoff, Yao Liu, Abeer Alessa, Julian McAuley, and Zexue He. 2024. Cognitive bias in high-stakes decision-making with LLMs. arXiv preprint arXiv:2403.00811.

[18] Jesse Graham, Brian A Nosek, Jonathan Haidt, Ravi Iyer, Koleva Spassena, and Peter H Ditto. 2008. Moral foundations questionnaire. Journal of Personality and Social Psychology.

[19] Akshat Gupta, Xiaoyang Song, and Gopala Anumanchipalli. 2023. Investigating the applicability of self-assessment tests for personality measurement of large language models. arXiv preprint arXiv:2309.08163.

[20] Dorit Hadar-Shoval, Kfir Asraf, Yonathan Mizrachi, Yuval Haber, and Zohar Elyoseph. 2024. Assessing the alignment of large language models with human values for mental health integration: Cross-sectional study using Schwartz's theory of basic values. JMIR Mental Health, 11:e55988.

[21] Christian Haerpfer, Ronald Inglehart, Alejandro Moreno, Christian Welzel, Kseniya Kizilova, Jaime Diez-Medrano, Marta Lagos, Pippa Norris, Eduard Ponarin, and Bjorn Puranen, editors. 2020. World Values Survey: Round Seven – Country-Pooled Datafile. JD Systems Institute & WVSA Secretariat, Madrid, Spain & Vienna, Austria. https://doi.org/10.14281/18241.1

[22] Geert Hofstede. 1984. Culture's consequences: International differences in work-related values, volume 5. Sage.

[23] Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, and Michael R Lyu. 2023. Who is ChatGPT? Benchmarking LLMs' psychological portrayal using PsychoBench. arXiv preprint arXiv:2310.01386.

[24] Ronald F Inglehart. 2020. Cultural evolution: People's motivations are changing, and reshaping the world.

[25] Ronald F Inglehart and Pippa Norris. 2016. Trump, Brexit, and the rise of populism: Economic have-nots and cultural backlash. HKS Working Paper No. RWP16-026.

[26] Hadas Kotek, David Q Sun, Zidi Xiu, Margit Bowler, and Christopher Klein. 2024. Protected group bias and stereotypes in large language models. arXiv preprint arXiv:2403.14727.

[27] Grgur Kovač, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, and Pierre-Yves Oudeyer. 2024. Stick to your role! Stability of personal values expressed in large language models. arXiv preprint arXiv:2402.14846.

[28] Miaomiao Li, Hao Chen, Yang Wang, Tingyuan Zhu, Weijia Zhang, Kaijie Zhu, Kam-Fai Wong, and Jindong Wang. 2025. Understanding and mitigating the bias inheritance in LLM-based data augmentation on downstream tasks. arXiv preprint arXiv:2502.04419.

[29] Andy Liu, Mona Diab, and Daniel Fried. 2024. Evaluating large language model biases in persona-steered generation. arXiv preprint arXiv:2405.20253.

[30] Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, and Diyi Yang. 2024. Roleplay-doh: Enabling domain-experts to create LLM-simulated patients via eliciting and adhering to principles. Preprint, arXiv:2407.00870. https://arxiv.org/abs/2407.00870

[31] Liam Magee, Vanicka Arora, Gus Gollings, and Norma Lam-Saw. 2024. The drama machine: Simulating character development with LLM agents. Preprint, arXiv:2408.01725. https://arxiv.org/abs/2408.01725

[32] Mantas Mazeika, Xuwang Yin, Rishub Tamirisa, Jaehyuk Lim, Bruce W. Lee, Richard Ren, Long Phan, Norman Mu, Adam Khoja, Oliver Zhang, and Dan Hendrycks. 2025. Utility engineering: Analyzing and controlling emergent value systems in AIs. Preprint, arXiv:2502.08640. https://arxiv.org/abs/2502.08640

[33] Man Tik Ng, Hui Tung Tse, Jen-tse Huang, Jingjing Li, Wenxuan Wang, and Michael R. Lyu. 2024. How well can LLMs echo us? Evaluating AI chatbots' role-play ability with ECHO. Preprint, arXiv:2404.13957. https://arxiv.org/abs/2404.13957

[34] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.

[35] Arjun Panickssery, Samuel R Bowman, and Shi Feng. 2024. LLM evaluators recognize and favor their own generations. arXiv preprint arXiv:2404.13076.

[36] Max Pellert, Clemens M Lechner, Claudia Wagner, Beatrice Rammstedt, and Markus Strohmaier. 2023. AI psychometrics: Assessing the psychological profiles of large language models through psychometric inventories. Perspectives on Psychological Science, page 17456916231214460.

[37] Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. 2023. Whose opinions do language models reflect? In International Conference on Machine Learning, pages 29971–30004. PMLR.

[38] Sebastin Santy, Jenny T Liang, Ronan Le Bras, Katharina Reinecke, and Maarten Sap. 2023. NLPositionality: Characterizing design biases of datasets and models. arXiv preprint arXiv:2306.01943.

[39] Shalom H Schwartz. 2012. An overview of the Schwartz theory of basic values. Online Readings in Psychology and Culture, 2(1):11.

[40] Shalom H Schwartz and Jan Cieciuch. 2022. Measuring the refined theory of individual values in 49 cultural groups: Psychometrics of the revised portrait value questionnaire. Assessment, 29(5):1005–1019.

[41] Shalom H Schwartz, Jan Cieciuch, Michele Vecchione, Eldad Davidov, Ronald Fischer, Constanze Beierlein, Alice Ramos, Markku Verkasalo, Jan-Erik Lönnqvist, Kursad Demirutku, et al. 2012. Refining the theory of basic individual values. Journal of Personality and Social Psychology, 103(4):663.

[42] Yunfan Shao, Linyang Li, Junqi Dai, and Xipeng Qiu. 2023. Character-LLM: A trainable agent for role-playing. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13153–13187.

[43] Jisu Shin, Hoyun Song, Huije Lee, Soyeong Jeong, and Jong C Park. 2024. Ask LLMs directly, "what shapes your bias?": Measuring social bias in large language models. arXiv preprint arXiv:2406.04064.

[44] Hari Shrawgi, Prasanjit Rath, Tushar Singhal, and Sandipan Dandapat. 2024. Uncovering stereotypes in large language models: A task complexity-based approach. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1841–1857.

[45] Vaishnavi Shrivastava, Ananya Kumar, and Percy Liang. 2025. Language models prefer what they know: Relative confidence estimation via confidence preferences. arXiv preprint arXiv:2502.01126.

[46] Richard A Shweder, Nancy C Much, Manamohan Mahapatra, and Lawrence Park. 2013. The “big three” of morality (autonomy, community, divinity) and the “big three” explanations of suffering. In Morality and Health, pages 119–169. Routledge.

[47] Hovhannes Tamoyan, Hendrik Schuff, and Iryna Gurevych. 2024. LLM Roleplay: Simulating human-chatbot interaction. Preprint, arXiv:2407.03974. https://arxiv.org/abs/2407.03974

[48] Xintao Wang, Yaying Fei, Ziang Leng, and Cheng Li. 2023a. Does role-playing chatbots capture the character personalities? Assessing personality traits for role-playing chatbots. arXiv preprint arXiv:2310.17976.

[49] Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Man Zhang, et al. 2023b. RoleLLM: Benchmarking, eliciting, and enhancing role-playing abilities of large language models. arXiv preprint arXiv:2310.00746.

[50] Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, et al. 2021. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359.

[51] Rui Xu, Xintao Wang, Jiangjie Chen, Siyu Yuan, Xinfeng Yuan, Jiaqing Liang, Zulong Chen, Xiaoqing Dong, and Yanghua Xiao. 2024. Character is destiny: Can large language models simulate persona-driven decisions in role-playing? arXiv preprint arXiv:2404.12138.

[52] Qisen Yang, Zekun Wang, Honghui Chen, Shenzhi Wang, Yifan Pu, Xin Gao, Wenhao Huang, Shiji Song, and Gao Huang. 2024. LLM agents for psychology: A study on gamified assessments. arXiv preprint arXiv:2402.12326.

[53] Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, et al. 2024. Justice or prejudice? Quantifying biases in LLM-as-a-judge. arXiv preprint arXiv:2410.02736.

[54] Michael Zakharin and Timothy C Bates. 2021. Remapping the foundations of morality: Well-fitting structural model of the moral foundations questionnaire. PLoS ONE, 16(10):e0258910.

[55] Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, et al. 2023. CharacterGLM: Customizing Chinese conversational AI characters with large language models. arXiv preprint arXiv:2311.16832.

[56] Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, and Kai Chen. 2024. ProSA: Assessing and understanding the prompt sensitivity of LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1950–1976, Miami, Florida, USA. Association for Computational Li... https://doi.org/10.18653/v1/2024.findings-emnlp.108
