Inertia in Moral and Value Judgments of Large Language Models
Pith reviewed 2026-05-23 22:02 UTC · model grok-4.3
The pith
Large language models maintain consistent moral and value orientations even when assigned different personas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that LLMs exhibit value orientation and inertia: when role-play prompts assign randomized personas and outputs are analyzed at scale, certain moral dimensions stay skewed in one direction regardless of the persona, revealing strong internal biases and value preferences rather than flexible, context-sensitive responses.
What carries the argument
Role-play at scale, which pairs randomized persona prompts with macro-level analysis of model outputs to detect consistency in moral and value judgments.
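To make the mechanism concrete, here is a minimal sketch of what pairing randomized personas with a macro-level summary could look like. Everything in it is an assumption for illustration: the persona fields, the question wording, the 1-5 scale, and the `fake_model` stand-in (which replaces a real LLM call and deliberately ignores the persona) are hypothetical and not taken from the paper.

```python
import random
from statistics import mean, pstdev

# Hypothetical attribute pools; the paper's actual persona fields are not given here.
PERSONA_FIELDS = {
    "age": ["22", "41", "67"],
    "occupation": ["nurse", "farmer", "software engineer", "retired soldier"],
    "region": ["the countryside", "a large city"],
}

def sample_persona(rng):
    """Draw one value per field to form a randomized persona."""
    return {field: rng.choice(options) for field, options in PERSONA_FIELDS.items()}

def build_prompt(persona, question):
    """Wrap a survey-style question in a role-play instruction."""
    who = f"a {persona['age']}-year-old {persona['occupation']} from {persona['region']}"
    return f"You are {who}. Answer on a 1-5 scale.\n{question}"

def fake_model(prompt, rng):
    """Stand-in for an LLM call (assumption): returns a rating that ignores the persona."""
    return min(5, max(1, round(rng.gauss(4.3, 0.4))))

def run_role_play(questions, n_personas, seed=0):
    """Collect one rating per (persona, dimension) and summarize each dimension."""
    rng = random.Random(seed)
    ratings = {dim: [] for dim in questions}
    for _ in range(n_personas):
        persona = sample_persona(rng)
        for dim, question in questions.items():
            ratings[dim].append(fake_model(build_prompt(persona, question), rng))
    return {dim: {"mean": mean(v), "sd": pstdev(v), "n": len(v)} for dim, v in ratings.items()}

# Hypothetical question wording for the two dimensions the paper highlights.
QUESTIONS = {
    "harm_avoidance": "How important is it to avoid hurting others?",
    "fairness": "How important is it that everyone is treated equally?",
}
summary = run_role_play(QUESTIONS, n_personas=200)
```

Because the stand-in model ignores the persona by construction, the small spread it produces says nothing about real LLMs; the sketch only shows the shape of the pipeline, in which the per-dimension mean and standard deviation are the macro-level quantities of interest.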
If this is right
- LLMs will give similar moral judgments on harm and fairness questions no matter which persona is assigned.
- Value preferences remain stable across persona changes rather than shifting with context.
- Applications that require balanced outputs on ethical topics need prior scrutiny of these fixed orientations.
- Persona prompting alone does not overcome the observed inertia in value judgments.
Where Pith is reading between the lines
- Systems that use LLMs for policy advice or ethical review may inherit the same consistent skews unless additional controls are added.
- Measuring inertia on other value dimensions beyond harm avoidance could show whether the pattern holds more broadly.
- Training adjustments aimed at increasing response variety on moral questions might be tested as a direct follow-up.
Load-bearing premise
The expectation that different personas should produce a wide range of opinions comparable to variation across human individuals.
What would settle it
A large set of trials in which randomized persona prompts produce responses on harm avoidance and fairness that vary as widely as the authors anticipated from human-like differences.
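One way to make "vary as widely" measurable is sketched below, under stated assumptions: ratings sit on a bounded 1-5 scale, each persona is reduced to a mean rating, and spread is expressed relative to the largest standard deviation the scale allows. The function name, the scale, and the sample numbers are hypothetical and are not results from the paper.

```python
from statistics import mean, pstdev

# Assumed rating scale; the paper's actual scoring range is not stated here.
SCALE_MIN, SCALE_MAX = 1, 5
MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2

def spread_and_skew(persona_means):
    """Summarize how widely per-persona mean ratings vary and which way they lean."""
    # On a bounded scale the population SD cannot exceed half the range.
    max_sd = (SCALE_MAX - SCALE_MIN) / 2
    above = [1.0 if m > MIDPOINT else 0.0 for m in persona_means]
    return {
        "relative_spread": pstdev(persona_means) / max_sd,
        "share_above_midpoint": mean(above),
    }

# Made-up numbers for illustration only.
inert = [4.4, 4.6, 4.5, 4.3, 4.7, 4.5]   # every persona leans the same way
varied = [1.5, 4.8, 2.9, 3.6, 2.1, 4.2]  # personas disagree with each other

inert_stats = spread_and_skew(inert)
varied_stats = spread_and_skew(varied)
```

A low `relative_spread` together with a `share_above_midpoint` near 0 or 1 would match the inertia pattern the paper describes; a spread comparable to a human reference sample would count against it. Choosing that reference sample is exactly the open question raised by the load-bearing premise.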
Original abstract
Large Language Models (LLMs) behave non-deterministically, and prompting has become a common method for steering their outputs. A popular strategy is to assign a persona to the model to produce more varied, context-sensitive responses, similar to how responses vary across human individuals. Against the expectation that persona prompting yields a wide range of opinions, our experiments show that LLMs keep consistent value orientations. We observe a persistent inertia in their responses, where certain moral and value dimensions (especially harm avoidance and fairness) stay skewed in one direction across persona settings. To study this, we use role-play at scale, which pairs randomized persona prompts with a macro-level analysis of model outputs. Our results point to strong internal biases and value preferences in LLMs, which we call value orientation and inertia. These models warrant scrutiny and adjustment before use in applications where balanced outputs matter.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs exhibit persistent 'inertia' in moral and value judgments across randomized persona prompts, in particular low variation and consistent skews in harm avoidance and fairness. This runs contrary to the expectation that persona prompting should produce human-like diversity in responses. The claim is supported by large-scale role-play experiments with macro-level analysis of model outputs, which the authors read as evidence of strong internal value orientations and biases.
Significance. If substantiated with appropriate controls, the result would highlight a practically relevant limitation of persona prompting for achieving balanced outputs in LLMs, with implications for applications in ethics-sensitive domains. The scalable role-play methodology and focus on specific value dimensions (harm avoidance, fairness) represent a constructive empirical approach to studying model biases.
major comments (2)
- [Methods / Experimental Setup] The experimental design (as described in the abstract and methods) omits a human baseline using identical persona prompts, questions, and scoring procedure. This is load-bearing for the central inertia claim, as the observed consistency could reflect ineffective role-play design rather than model-specific value orientation; without this control, the macro-level LLM analysis alone cannot isolate the effect.
- [Abstract / Results] Abstract and results presentation: no details are supplied on the number of personas, number of trials per persona, statistical tests for variation, or controls for prompt sensitivity. This prevents assessment of whether the reported low variation in harm avoidance and fairness actually supports the inertia conclusion at the claimed strength.
minor comments (2)
- [Methods] Clarify the exact set of moral/value dimensions tested and how they were scored (e.g., any reference to established inventories like MFQ).
- [Introduction] The term 'value orientation' is introduced without a precise operational definition distinguishing it from simple output bias.
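The referee's request for statistical tests for variation could be met in several ways. As one hedged illustration, the sketch below runs a permutation test on the variance of per-persona mean ratings: if shuffling ratings across personas produces as much between-persona variance as the real grouping, the persona has no detectable effect. This is a generic technique chosen for illustration, not the test the authors used, and the function names and sample ratings are hypothetical.

```python
import random
from statistics import mean, pvariance

def between_persona_variance(groups):
    """Variance of the per-persona mean ratings."""
    return pvariance([mean(g) for g in groups])

def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate how often shuffled persona labels give at least the observed variance."""
    rng = random.Random(seed)
    observed = between_persona_variance(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Re-cut the shuffled pool into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if between_persona_variance(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)

# Made-up ratings (one inner list per persona), for illustration only.
responsive = [[1, 2, 1, 2], [3, 3, 4, 3], [5, 4, 5, 5]]  # personas clearly differ
inert = [[4, 5, 4, 5], [5, 4, 4, 5], [4, 5, 5, 4]]       # identical persona means

p_responsive = permutation_p_value(responsive)
p_inert = permutation_p_value(inert)
```

A large p-value here only says that persona labels do not explain the variation in ratings. It does not by itself separate a model-specific value orientation from an ineffective role-play design, which is the gap the first major comment points to.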
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope and presentation of our work. We respond point-by-point to the major comments below.
- Referee: [Methods / Experimental Setup] The experimental design (as described in the abstract and methods) omits a human baseline using identical persona prompts, questions, and scoring procedure. This is load-bearing for the central inertia claim, as the observed consistency could reflect ineffective role-play design rather than model-specific value orientation; without this control, the macro-level LLM analysis alone cannot isolate the effect.
  Authors: The inertia claim is defined with respect to LLMs: randomized persona prompts produce low variation in specific value dimensions (harm avoidance and fairness) within model outputs. The persona construction follows standard practices for eliciting diverse human-like responses, and the macro-level analysis across many personas isolates the models' failure to vary. While a human baseline would be a useful extension for comparing effect sizes, it is not required to demonstrate that LLMs exhibit the reported inertia under this prompting regime; the expectation of diversity is drawn from the broader literature on human individual differences rather than from a within-study control.
  Revision: no
- Referee: [Abstract / Results] Abstract and results presentation: no details are supplied on the number of personas, number of trials per persona, statistical tests for variation, or controls for prompt sensitivity. This prevents assessment of whether the reported low variation in harm avoidance and fairness actually supports the inertia conclusion at the claimed strength.
  Authors: The full manuscript reports the experimental parameters (number of personas, trials per persona, statistical tests, and prompt-sensitivity controls) in the methods and results sections. These details were condensed in the abstract for length. We will expand the abstract and add a summary table in the results to explicitly state the scale of the experiments, the statistical tests applied to variation, and the controls used.
  Revision: yes
Circularity Check
No circularity: purely empirical measurement with no derivations or fitted inputs
The paper reports experimental results from role-play prompting and macro-level output analysis. No equations, parameters, or derivations are present. The central claim (persistent inertia in value orientations across personas) is presented as a direct observation from the data, not derived from or reduced to any prior inputs by construction. The assumption that personas should produce diversity is an interpretive framing, not a load-bearing self-referential step. No self-citations are invoked to justify uniqueness or ansatzes. This is a standard empirical study; the derivation chain is empty.
Forward citations
Cited by 2 Pith papers
- XtraGPT: Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration
  XtraGPT is a suite of 1.5B-14B parameter open-source LLMs fine-tuned on 140,000 revision pairs from 7,000 top-tier papers to support controllable, context-aware academic paper editing.
- Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior
  Questionnaire-based and generation-based psychological profiles for LLMs are substantially different, indicating that established human questionnaires reflect desired behavior instead of stable psychological constructs.