Scaling Laws for Moral Machine Judgment in Large Language Models

Kazuhiro Takemoto

arXiv: 2601.17637 · v3 · submitted 2026-01-25 · cs.CY · cs.HC

Pith reviewed 2026-05-16 11:53 UTC · model grok-4.3

keywords: scaling laws, moral judgment, large language models, Moral Machine, AI alignment, power law, ethical dilemmas, model size

The pith

Language models align more closely with average human moral preferences on Moral Machine dilemmas as their size increases, following a power law.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests 75 large language model configurations, ranging from 0.27 billion to 1000 billion parameters, on the Moral Machine framework, which presents life-and-death dilemmas and compares model choices against average human preferences. A consistent power-law pattern appears in which the gap between model outputs and human answers shrinks as model size grows. This relationship survives controls for model family and reasoning style, with extended reasoning helping smaller models more than larger ones. Variance in responses also drops at bigger scales. The results indicate that value-based judgments can emerge in a predictable way with scale, offering a basis for forecasting when AI systems might reach usable levels of moral consistency.

Core claim

We observe a consistent power-law relationship with distance from human preferences (D) decreasing as D ∝ S^{-0.10±0.01} (R²=0.50, p<0.001) where S is model size. Mixed-effects models confirm this relationship persists after controlling for model family and reasoning capabilities. Extended reasoning models show significantly better alignment, with this effect being more pronounced in smaller models (size×reasoning interaction: p = 0.024). The relationship holds across diverse architectures, while variance decreases at larger scales, indicating systematic emergence of more reliable moral judgment with computational scale.
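To make the size of this exponent concrete, here is a minimal sketch that assumes only the reported form D ∝ S^{-0.10}. The function name `distance_ratio` is introduced for illustration and does not come from the paper; because the sketch works with ratios, the unknown prefactor cancels and none has to be assumed.

```python
def distance_ratio(size_factor, exponent=-0.10):
    """Factor by which D changes when S is multiplied by size_factor.

    Assumes D = c * S**exponent; the constant c cancels in the ratio.
    The name and signature are illustrative, not taken from the paper.
    """
    if size_factor <= 0:
        raise ValueError("size_factor must be positive")
    return size_factor ** exponent


# 10x, 100x, and the full tested range (0.27B to 1000B parameters).
for factor in (10, 100, 1000 / 0.27):
    print(f"size x{factor:,.0f}: D multiplied by {distance_ratio(factor):.3f}")
```

Under the reported exponent, a tenfold increase in parameters shrinks D by roughly 21%, and the whole tested range (about 3,700-fold) shrinks it by a bit more than half.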

What carries the argument

The distance metric D from average human preferences on Moral Machine life-death dilemmas, plotted against model parameter count S to reveal the scaling exponent.

If this is right

Moral alignment improves systematically and predictably with model size.
Extended reasoning boosts alignment more for smaller models than for larger ones.
Response consistency increases and variance decreases as models scale up.
The pattern appears independent of specific model family or architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Continued scaling would imply that future models could reach near-human consistency on these dilemmas through size alone.
Governance rules for autonomous systems might eventually use parameter count as one indicator of expected moral reliability.
Whether the same scaling appears when these models face moral questions outside the Moral Machine format remains open.

Load-bearing premise

Responses on the Moral Machine framework constitute a stable, generalizable proxy for moral judgment that is not dominated by training-data artifacts or prompt sensitivity.

What would settle it

A model much larger than the tested range whose distance from human preferences fails to follow the predicted power-law decrease, or a change in prompt wording that removes the scaling effect.
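The first test described above can be sketched as a rough consistency check on a single new observation. Everything numeric here except the exponent is hypothetical: the paper's abstract reports no prefactor or residual spread, so `prefactor`, `resid_sd_log10`, and the threshold `k` are placeholders, and sizes are assumed to be in billions of parameters. The check also ignores uncertainty in the fitted parameters, so it is a coarse screen rather than a proper prediction interval.

```python
import math


def within_power_law_band(size, observed_d, prefactor, exponent=-0.10,
                          resid_sd_log10=0.1, k=2.0):
    """Return (is_consistent, z) for one new (size, D) observation.

    The prediction is D_hat = prefactor * size**exponent. z is the residual
    in log10 space divided by an assumed residual spread; the observation
    counts as consistent when |z| <= k. All defaults except the exponent
    are hypothetical placeholders.
    """
    predicted_log_d = math.log10(prefactor) + exponent * math.log10(size)
    z = (math.log10(observed_d) - predicted_log_d) / resid_sd_log10
    return abs(z) <= k, z


# Hypothetical 10,000B-parameter model and a made-up prefactor of 0.5.
for observed in (0.20, 0.40):
    ok, z = within_power_law_band(10_000, observed, prefactor=0.5)
    print(f"observed D={observed:.2f}: consistent={ok}, z={z:.2f}")
```

A single inconsistent point would not refute the trend on its own; the sketch only shows how the falsification criterion could be made operational once a prefactor and residual spread are available.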

Figures

Figures reproduced from arXiv: 2601.17637 by Kazuhiro Takemoto.

Original abstract

Autonomous systems increasingly require moral judgment capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model configurations (0.27B--1000B parameters) using the Moral Machine framework, measuring alignment with human preferences in life-death dilemmas. We observe a consistent power-law relationship with distance from human preferences ($D$) decreasing as $D \propto S^{-0.10\pm0.01}$ ($R^2=0.50$, $p<0.001$) where $S$ is model size. Mixed-effects models confirm this relationship persists after controlling for model family and reasoning capabilities. Extended reasoning models show significantly better alignment, with this effect being more pronounced in smaller models (size$\times$reasoning interaction: $p = 0.024$). The relationship holds across diverse architectures, while variance decreases at larger scales, indicating systematic emergence of more reliable moral judgment with computational scale. These findings extend scaling law research to value-based judgments and provide empirical foundations for artificial intelligence governance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper finds a shallow power-law drop in Moral Machine distance with model size but the moderate fit and absent robustness checks make the result preliminary.

The main point is that distance from human preferences on the Moral Machine benchmark shrinks slowly with model size, following D proportional to S to the minus 0.1 with an R-squared of 0.50, and that the trend persists under mixed-effects controls for family and reasoning. The size-by-reasoning interaction is also reported as significant. That is the concrete observation the authors put forward. They evaluated 75 configurations spanning more than three orders of magnitude in parameter count and show the trend holds across architectures while variance shrinks at larger scales. Extending the scaling-law frame to value-laden judgments is a direct move from capability work, and the controls plus the interaction term give the result a bit more structure than a raw correlation.

The soft spots sit in the moderate explanatory power and the missing implementation details. An R-squared of 0.50 leaves about half the variance unexplained, so size is far from the only driver. The abstract gives no distance metric, no outlier protocol, and no checks for prompt paraphrasing or overlap between the dilemma scenarios and pretraining data. Those gaps matter because larger models could simply match common ethical phrasing better rather than develop more stable judgment. The stress-test concern about prompt sensitivity or data artifacts therefore lands; nothing in the reported results rules it out.

This is the sort of paper that belongs in a reading group on scaling phenomena or AI governance. Readers who track how preferences emerge with compute will find the empirical pattern worth seeing, even if the fit is loose. It deserves a serious referee because the evaluation scale is respectable and the claim is falsifiable, but the authors will need to supply full methods, sensitivity tests, and any code before it clears a good venue. I would send it out with those requests rather than desk reject.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that moral judgment in LLMs, measured via alignment with human preferences on Moral Machine life-death dilemmas, follows a power-law scaling with model size: distance D from human preferences decreases as D ∝ S^{-0.10±0.01} (R²=0.50, p<0.001) across 75 configurations (0.27B–1000B parameters). Mixed-effects models show the relationship persists after controlling for model family and reasoning, with extended reasoning improving alignment more in smaller models (size×reasoning interaction p=0.024). The relationship holds across architectures and variance decreases at larger scales.

Significance. If robust, the work extends scaling-law research from language and reasoning tasks to value-based moral judgments, supplying empirical data relevant to AI governance. The moderate R²=0.50 and explicit mixed-effects controls are strengths, but the single-framework proxy and lack of prompt-invariance checks limit the strength of the generalization claim.

major comments (3)

[Abstract] Abstract: the power-law claim rests on R²=0.50; the manuscript must specify the exact distance metric for D, whether S was log-transformed before fitting, outlier handling, and the full mixed-effects model specification (random effects, covariance structure).
[Results] Results (mixed-effects analysis): no explicit checks are reported for prompt paraphrasing invariance or overlap between Moral Machine scenarios and pretraining corpora; without these, residual variance could reflect training-data artifacts rather than stable moral judgment scaling.
[Abstract] Abstract: the size×reasoning interaction (p=0.024) is reported without effect size, coefficient table, or model equation, preventing assessment of whether the interaction is load-bearing for the central scaling claim.

minor comments (1)

[Methods] Clarify the exact number of models per size bin and any exclusion criteria applied to the 75 configurations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We have carefully considered each point and made revisions to enhance the transparency of our methods and statistical reporting. Below, we provide point-by-point responses.

Referee: [Abstract] Abstract: the power-law claim rests on R²=0.50; the manuscript must specify the exact distance metric for D, whether S was log-transformed before fitting, outlier handling, and the full mixed-effects model specification (random effects, covariance structure).

Authors: We agree that these details are essential. In the revised version, we will explicitly state in the abstract and methods that D represents the mean absolute deviation from aggregated human preference proportions across Moral Machine scenarios. The power-law was fitted using log-transformed S (model parameters) and log(D) via linear regression. No outliers were removed from the analysis. The mixed-effects model is specified as D ~ log(S) + reasoning + (1 | model_family), using a random intercept for model family with an unstructured covariance matrix.
revision: yes

Referee: [Results] Results (mixed-effects analysis): no explicit checks are reported for prompt paraphrasing invariance or overlap between Moral Machine scenarios and pretraining corpora; without these, residual variance could reflect training-data artifacts rather than stable moral judgment scaling.

Authors: We acknowledge this as a potential limitation. The original study did not include explicit prompt paraphrasing invariance tests or pretraining corpus overlap analyses. We argue that the Moral Machine dilemmas are abstract and unlikely to be directly memorized, and the scaling relationship persists across diverse model families, which helps control for training data differences. In the revision, we will add a paragraph in the discussion section addressing this concern and suggesting it as an avenue for future work.
revision: partial

Referee: [Abstract] Abstract: the size×reasoning interaction (p=0.024) is reported without effect size, coefficient table, or model equation, preventing assessment of whether the interaction is load-bearing for the central scaling claim.

Authors: We will expand the reporting in the revised manuscript. We will include the full model equation: D ~ log(S) * reasoning + (1 | model_family), report the interaction coefficient (β = -0.05, SE = 0.02, p = 0.024), and provide a supplementary table with all fixed and random effects coefficients to allow full evaluation of the interaction's role.
revision: yes
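For readers less used to R-style model formulas, the specification quoted in the last response, D ~ log(S) * reasoning + (1 | model_family), can be written out term by term. This is a sketch of what that formula denotes under the usual convention that `a * b` expands to both main effects plus their interaction; the symbols $R_{ij}$, $u_j$, and $\varepsilon_{ij}$ are notation introduced here, and the base of the logarithm and the coding of the reasoning indicator are not stated in the source.

```latex
% Model i belongs to family j; R_{ij} = 1 for extended-reasoning models, else 0 (assumed coding).
D_{ij} = \beta_0
       + \beta_1 \log S_{ij}
       + \beta_2 R_{ij}
       + \beta_3 \left( \log S_{ij} \times R_{ij} \right)
       + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this notation $\beta_3$ is the size×reasoning interaction that the simulated response reports as $-0.05$, and $u_j$ is the random intercept for model family.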

Circularity Check

0 steps flagged

No circularity: empirical power-law fit to independently measured distances

The central claim is an observed scaling D ∝ S^{-0.10±0.01} obtained by evaluating 75 distinct model configurations on the Moral Machine task, computing alignment distance D to human preferences for each, and performing a regression on the resulting (S, D) pairs. This is a direct empirical measurement followed by statistical fitting; the power-law exponent is not defined from the same data in a way that forces the result by construction, nor does any step invoke self-citation, ansatz smuggling, or renaming of a known result as a derivation. Mixed-effects controls for family and reasoning are likewise post-hoc statistical adjustments on the measured values. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.
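The measure-then-regress pipeline described above can be sketched in a few lines. The sketch assumes the definition offered in the simulated rebuttal (D as the mean absolute deviation from human preference proportions, fitted by ordinary least squares on log-transformed S and D); the function names are illustrative, and the data are synthetic, generated with a made-up prefactor of 0.5 and made-up noise level, so the printed numbers say nothing about the paper's actual results.

```python
import math
import random


def mean_abs_deviation(model_props, human_props):
    """Mean absolute deviation between two equal-length lists of preference proportions."""
    if len(model_props) != len(human_props):
        raise ValueError("proportion lists must have the same length")
    return sum(abs(m - h) for m, h in zip(model_props, human_props)) / len(model_props)


def fit_power_law(sizes, distances):
    """Least squares on (log10 S, log10 D); returns (exponent, prefactor, r_squared)."""
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(d) for d in distances]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return slope, 10 ** intercept, 1 - ss_res / ss_tot


# Synthetic stand-in for 75 configurations between 0.27B and 1000B parameters.
# True exponent -0.10, prefactor 0.5, log-normal noise: all values are made up.
rng = random.Random(0)
sizes = [10 ** rng.uniform(math.log10(0.27), 3.0) for _ in range(75)]
distances = [0.5 * s ** -0.10 * 10 ** rng.gauss(0.0, 0.1) for s in sizes]

exponent, prefactor, r_squared = fit_power_law(sizes, distances)
print(f"exponent={exponent:.3f} prefactor={prefactor:.3f} R^2={r_squared:.2f}")
```

Fitting in log-log space turns the power law into a straight line whose slope is the exponent, which is why the exponent counts as a single fitted parameter rather than something fixed by construction.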

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that Moral Machine choices are a valid proxy for moral judgment and that model size is the dominant variable after family and reasoning controls; the exponent itself is a fitted parameter.

free parameters (1)

scaling exponent = -0.10
Fitted coefficient -0.10 obtained by regressing observed distance D against model size S.

axioms (1)

domain assumption: Moral Machine responses provide a stable, generalizable measure of alignment with human moral preferences
The framework is treated as the ground-truth benchmark without further validation in the abstract.

pith-pipeline@v0.9.0 · 5474 in / 1205 out tokens · 30090 ms · 2026-05-16T11:53:43.932984+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: We observe a consistent power-law relationship with distance from human preferences (D) decreasing as D ∝ S^{-0.10±0.01} (R²=0.50, p<0.001)

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: Mixed-effects models confirm this relationship persists after controlling for model family and reasoning capabilities

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control
cs.AI · 2026-04 · unverdicted · novelty 5.0

LLMs for robotic health attendant control violate safety rules in 54.4% of harmful scenarios on average, with proprietary models at 23.7% median violation versus 72.8% for open-weight models, indicating they are not y...

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Michael Anderson and Susan Leigh Anderson. Machine ethics. Cambridge University Press, 2011.
[2] Jean-François Bonnefon, Azim Shariff, and Iyad Rahwan. The social dilemma of autonomous vehicles. Science, 352(6293):1573–1576, 2016.
[3] Emanuele Ratti, Michael Morrison, and Ivett Jakab. Ethical and social considerations of applying artificial intelligence in healthcare: a two-pronged scoping review. BMC Medical Ethics, 26(1):68, 2025.
[4] Joschka Haltaufderheide and Robert Ranisch. The ethics of ChatGPT in medicine and healthcare: a systematic review on large language models (LLMs). NPJ Digital Medicine, 7(1):183, 2024.
[5] Zhenjie Yang, Xiaosong Jia, Hongyang Li, and Junchi Yan. LLM4Drive: A survey of large language models for autonomous driving. arXiv preprint arXiv:2311.01043, 2023.
[6] Ali Nouri, Beatriz Cabrero-Daniel, Fredrik Törner, Håkan Sivencrona, and Christian Berger. Engineering safety requirements for autonomous driving with large language models. In 2024 IEEE 32nd International Requirements Engineering Conference (RE), pages 218–228. IEEE, 2024.
[7] Kazuhiro Takemoto. The moral machine experiment on large language models. Royal Society Open Science, 11(2):231393, 2024.
[8] Muhammad Shahrul Zaim bin Ahmad and Kazuhiro Takemoto. Large-scale moral machine experiment on large language models. PloS One, 20(5):e0322776, 2025.
[9] Jared Kaplan, Sam McCandlish, Tom Henighan, et al. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
[10] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al. Training compute-optimal large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS), 2022.
[11] Jason Wei, Yi Tay, Rishi Bommasani, et al. Emergent abilities of large language models. Transactions on Machine Learning Research, 2022. Survey Certification.
[12] Edmond Awad, Sohan Dsouza, Richard Kim, et al. The moral machine experiment. Nature, 563(7729):59–64, 2018.
[13] Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. Aligning AI with shared human values. In International Conference on Learning Representations, 2021.
[14] Denis Emelin, Ronan Le Bras, Jena D Hwang, Maxwell Forbes, and Yejin Choi. Moral stories: Situated reasoning about norms, intents, actions, and their consequences. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 698–718, 2021.
[15] Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap, and Yejin Choi. Social chemistry 101: Learning to reason about social and moral norms. In Conference on Empirical Methods in Natural Language Processing, 2020.
[16] Asma Ben Abacha, Wen-wai Yim, Yujuan Fu, et al. Medec: A benchmark for medical error detection and correction in clinical notes. arXiv preprint arXiv:2412.19260, 2024.
[17] Jens Hainmueller, Daniel J Hopkins, and Teppei Yamamoto. Causal inference in conjoint analysis: Understanding multidimensional choices via stated preference experiments. Political Analysis, 22(1):1–30, 2014.
[18] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1):1–48, 2015.
[19] Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13):1–26, 2017.
[20] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
[21] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations, 2023.
[22] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36:46534–46594, 2023.
[23] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
[24] Karoline Reinhardt. Trust and trustworthiness in AI ethics. AI and Ethics, 3(3):735–744, 2023.
[25] Junfeng Jiao, Saleh Afroogh, Yiming Xu, and Connor Phillips. Navigating LLM ethics: Advancements, challenges, and future directions. AI and Ethics, pages 1–25, 2025.
[26] Karina Vida, Fabian Damken, and Anne Lauscher. Decoding multilingual moral preferences: Unveiling LLM's biases through the moral machine experiment. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, volume 7, pages 1490–1501, 2024.
[27] Zhijing Jin, Max Kleiman-Weiner, Giorgio Piatti, et al. Language model alignment in multilingual trolley problems. arXiv preprint arXiv:2407.02273, 2024.
[28] Soyoung Oh and Vera Demberg. Robustness of large language models in moral judgements. Royal Society Open Science, 12(4):241229, 2025.
[29] Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Alice Oh, and Meeyoung Cha. Exploring persona-dependent LLM alignment for the moral machine experiment. arXiv preprint arXiv:2504.10886, 2025.
[30] Allan Dafoe. AI governance: a research agenda. Governance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK, 1442:1443, 2018.
[31] Amna Batool, Didar Zowghi, and Muneera Bano. AI governance: a systematic literature review. AI and Ethics, pages 1–15, 2025.
[32] Claudio Novelli, Federico Casolari, Antonino Rotolo, Mariarosaria Taddeo, and Luciano Floridi. Taking AI risks seriously: a new assessment model for the AI Act. AI & Society, 39(5):2493–2497, 2024.
[33] Claudio Novelli, Federico Casolari, Antonino Rotolo, Mariarosaria Taddeo, and Luciano Floridi. AI risk assessment: a scenario-based, proportional methodology for the AI Act. Digital Society, 3(1):13, 2024.

Supplementary Figures: Distance from Human (log10) against Model Size (log10 parameters), by Model Family (DeepSeek, Gemma, Llama, Other, Qwen).
