pith. machine review for the scientific record.

arxiv: 2604.24155 · v2 · submitted 2026-04-27 · 💻 cs.CY · cs.AI · cs.HC

Recognition: no theorem link

The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 07:15 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.HC
keywords alignment target problem · moral judgments · AI ethics · deontological reasoning · value alignment · trolley problem · human agency · normative targets

The pith

People judge programmed AI systems and their designers more deontologically than human actors in the same situation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether moral judgments of AI match those of humans or shift when human design becomes visible. Using a runaway mine train scenario with 1,002 participants across four conditions, it finds similar evaluations of a repairman and an unprogrammed repair robot, but markedly more rule-based reasoning for both the programmed robot and the engineers who created it. This pattern indicates that highlighting human agency activates stricter moral constraints on both the machine and its creators. A sympathetic reader would care because the result challenges the assumption that AI alignment can simply benchmark against how humans would act or how humans judge AI.

Core claim

Evaluations of a repairman and a repair robot show no significant difference, yet judgments shift substantially toward deontological, rule-based reasoning when the robot is described as the product of human programming. The same stricter standards apply when evaluating the company engineers who programmed the robot. These findings indicate that people evaluate humans, AI systems acting in the same situation, and the humans who design them in meaningfully different ways, which gives rise to the alignment target problem of selecting which normative target should guide artificial moral agents.

What carries the argument

The four-condition moral judgment experiment that varies the subject of evaluation (repairman, repair robot, programmed repair robot, company engineers) while holding the runaway mine train scenario fixed and measures shifts between consequentialist and deontological reasoning.
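What a between-condition comparison of this design could look like, as a minimal sketch: a chi-square test over a condition-by-judgment table plus the key pairwise contrast. The condition labels follow the paper's four conditions, but the counts are hypothetical placeholders (not the paper's data), and the paper's actual analysis may use different tests and outcome measures.

```python
import numpy as np
from scipy.stats import chi2_contingency

conditions = ["repairman", "repair robot", "programmed repair robot", "company engineers"]

# Rows: condition; columns: [judged permissible, judged not permissible].
# HYPOTHETICAL counts, inserted only to make the sketch runnable.
observed = np.array([
    [180, 70],   # repairman
    [175, 76],   # repair robot
    [140, 110],  # programmed repair robot
    [135, 115],  # company engineers
])

for name, (yes, no) in zip(conditions, observed):
    print(f"{name}: {yes / (yes + no):.0%} judged redirection permissible (hypothetical)")

# Omnibus test: do permissibility judgments depend on condition?
chi2, p, dof, expected = chi2_contingency(observed)
print(f"omnibus chi2({dof}) = {chi2:.2f}, p = {p:.4f}")

# Pairwise contrast mirroring the key comparison (repairman vs. unprogrammed robot).
chi2_pair, p_pair, _, _ = chi2_contingency(observed[[0, 1], :])
print(f"repairman vs. robot: chi2 = {chi2_pair:.2f}, p = {p_pair:.4f}")
```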

Load-bearing premise

Moral judgments elicited by this specific hypothetical runaway mine train scenario represent the normative targets that should guide AI behavior in real high-stakes deployment contexts.

What would settle it

A replication experiment using a different high-stakes dilemma, such as medical resource allocation, that finds no significant differences in moral judgments across the four conditions would falsify the claim of divergent evaluations.
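A hedged sketch of how such a replication could be sized, assuming the outcome is again a binary permissibility judgment analyzed as a two-proportion comparison. The 0.70 vs. 0.55 rates are illustrative assumptions about the smallest between-condition difference worth detecting, not figures from the paper.

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Assumed smallest difference of interest (hypothetical): 70% vs. 55% permissibility.
h = proportion_effectsize(0.70, 0.55)  # Cohen's h for the assumed difference

# Per-condition sample size for 80% power at alpha = 0.05, two-sided.
n_per_condition = NormalIndPower().solve_power(
    effect_size=h, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided"
)
print(f"Cohen's h = {h:.2f}; ~{n_per_condition:.0f} participants per condition")
```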

Figures

Figures reproduced from arXiv:2604.24155 by Benjamin Minhao Chen and Xinyu Xie.

Figure 1. Experimental Materials and Conditions. Participants in all four conditions read a scenario description accompanied by an illustration.
Figure 2. Percentage of Participants Who Judged It Permissible to Redirect the Train onto a Siderail (bar graph).
Figure 3. Percentage of Participants Who Judged that the Train Should be Redirected onto a Siderail (bar graph).
Figure 4. Mean General Moral Foundation Scores by Condition (line graph of average scores on six Moral Foundations Theory subscales).
Figure 5. Mean AI Moral Foundation Scores by Condition (line graph of average scores on six Moral Foundations Theory subscales).
Figure 6. Mean Purity Gap by Condition (bar chart of the mean difference between participants' AI purity score and their general purity score).
Original abstract

The project of aligning machine behavior with human values raises a basic problem: whose moral expectations should guide AI decision-making? Much alignment research assumes that the appropriate benchmark is how humans themselves would act in a given situation. Studies of agent-type value forks challenge this assumption by showing that people do not always judge humans and AI systems identically. This paper extends that challenge by examining two further possibilities: first, that evaluations of AI behavior change when its human origins are made visible; and second, that people judge the humans who program AI systems differently from either the machines or the human actors they are compared against. An experiment with 1,002 U.S. adults measured moral judgments in a runaway mine train scenario, varying the subject of evaluation across four conditions: a repairman, a repair robot, a repair robot programmed by company engineers, and company engineers programming a repair robot. We find no significant difference in evaluations of the repairman and the robot. However, judgments shifted substantially when the robot's actions were described as the product of human design. Participants exhibited markedly more deontological, rule-based reasoning when evaluating either the programmed robot or the engineers who programmed it, suggesting that rendering human agency visible activates heightened moral constraints. These findings indicate that people may evaluate humans, AI systems acting in the same situation, and the humans who design them in meaningfully different ways. The fact that these evaluations do not necessarily converge gives rise to the alignment target problem: which normative target should guide the development of artificial moral agents in high-stakes domains, and whether these plural judgments can be reconciled within a coherent account of value alignment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper investigates the alignment target problem by experimentally comparing moral judgments of 1,002 U.S. adults in a runaway mine train scenario across four conditions: a human repairman, an autonomous repair robot, a repair robot programmed by engineers, and the engineers themselves. It reports no significant difference between judgments of the repairman and the autonomous robot, but a substantial shift toward deontological, rule-based reasoning when human design or designers are made visible. The authors conclude that humans, AI systems, and their designers elicit meaningfully different moral evaluations, creating a pluralistic challenge for determining the appropriate normative target for AI alignment.

Significance. If the directional findings hold under scrutiny, the work provides empirical grounding for questioning the common assumption that human behavior in a situation is the direct benchmark for AI alignment. By showing that rendering human agency salient activates stricter moral constraints, it suggests alignment research may need to address multiple or context-sensitive targets rather than a single human-like standard, with potential relevance for high-stakes domains where designer intent becomes visible.

major comments (3)
  1. [Results] The abstract states 'no significant difference' between the repairman and robot conditions and 'markedly more deontological' reasoning when human design is visible, yet supplies no p-values, effect sizes, confidence intervals, per-condition sample sizes, or details of the statistical tests; without these, the magnitude and reliability of the central empirical claim cannot be evaluated. (A sketch of the requested effect-size and confidence-interval reporting appears after the minor comments below.)
  2. [Methods] The exact vignette wording, the specific questions used to elicit moral judgments, and the coding or measurement scheme distinguishing deontological from consequentialist responses are not described, which is load-bearing for assessing whether the design isolates the effect of visible human agency as claimed.
  3. [Discussion] The inference that divergent judgments in this single hypothetical create a general 'alignment target problem' for real high-stakes AI deployment assumes the pattern generalizes beyond the mine-train vignette and predicts policy-relevant preferences; no additional scenarios, robustness checks, or external validation are reported to support this step.
minor comments (1)
  1. [Abstract] Adding one or two quantitative anchors (e.g., mean ratings or percentage shifts) would make the directional claims more informative without lengthening the summary.
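As flagged in major comment 1, the following is a sketch of the kind of effect-size and confidence-interval reporting a reader would need, assuming a binary permissibility outcome. The counts are hypothetical placeholders for two conditions (roughly 250 participants each, per the rebuttal), not the paper's reported data.

```python
import numpy as np

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for a difference between two proportions."""
    return 2 * np.arcsin(np.sqrt(p1)) - 2 * np.arcsin(np.sqrt(p2))

def wald_ci_diff(k1: int, n1: int, k2: int, n2: int, z: float = 1.96):
    """Wald 95% confidence interval for the difference p1 - p2."""
    p1, p2 = k1 / n1, k2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se

# HYPOTHETICAL counts: "permissible" judgments out of ~250 per condition.
k_repairman, n_repairman = 180, 250
k_programmed, n_programmed = 140, 250

h = cohens_h(k_repairman / n_repairman, k_programmed / n_programmed)
ci_low, ci_high = wald_ci_diff(k_repairman, n_repairman, k_programmed, n_programmed)
print(f"Cohen's h = {h:.2f}, 95% CI for the difference: [{ci_low:.3f}, {ci_high:.3f}]")
```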

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive review. The comments identify key areas where additional detail and caution will strengthen the manuscript. We address each major comment below and specify the planned revisions.

Point-by-point responses
  1. Referee: [Results] The abstract states 'no significant difference' between the repairman and robot conditions and 'markedly more deontological' reasoning when human design is visible, yet supplies no p-values, effect sizes, confidence intervals, per-condition sample sizes, or details of the statistical tests; without these, the magnitude and reliability of the central empirical claim cannot be evaluated.

    Authors: We agree that the abstract should include these statistical details for self-contained evaluation. The full Results section reports the relevant tests, p-values, effect sizes, confidence intervals, and per-condition sample sizes (approximately 250–251 participants per condition). We will revise the abstract to incorporate summaries of these statistics, such as the non-significant comparison between the repairman and autonomous robot conditions and the significant shift in the human-design conditions. revision: yes

  2. Referee: [Methods] The exact vignette wording, the specific questions used to elicit moral judgments, and the coding or measurement scheme distinguishing deontological from consequentialist responses are not described, which is load-bearing for assessing whether the design isolates the effect of visible human agency as claimed.

    Authors: We acknowledge that the submitted manuscript does not provide sufficient detail on these elements. We will add verbatim vignette text for all four conditions, the exact wording of the moral judgment questions, and a clear description of the coding scheme used to classify responses as deontological versus consequentialist. This will make the isolation of visible human agency transparent and support replicability. revision: yes

  3. Referee: [Discussion] The inference that divergent judgments in this single hypothetical create a general 'alignment target problem' for real high-stakes AI deployment assumes the pattern generalizes beyond the mine-train vignette and predicts policy-relevant preferences; no additional scenarios, robustness checks, or external validation are reported to support this step.

    Authors: We agree that the study uses only one vignette and provides no additional scenarios or external validation. The manuscript presents the findings as suggestive evidence for the alignment target problem rather than a universal claim. In revision we will expand the Discussion to explicitly note this limitation, discuss boundary conditions of the vignette, and outline future work on robustness checks and real-world validation. We will also moderate language regarding policy implications. revision: partial

Circularity Check

0 steps flagged

Empirical survey study with no derivations or self-referential constructions

Full rationale

The paper reports results from a direct experiment with 1,002 U.S. adults responding to a single hypothetical runaway mine train vignette across four conditions. All central claims (differences in deontological vs. consequentialist judgments when human agency is made visible) rest on observed participant data and standard statistical comparisons. No equations, fitted parameters, predictions derived from inputs, self-citations used as load-bearing uniqueness theorems, or ansatzes appear in the abstract or described methods. The inference chain is self-contained empirical observation rather than any reduction to prior self-referential content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that responses to a single hypothetical trolley-style scenario can be interpreted as evidence about appropriate normative targets for AI alignment in real domains.

axioms (1)
  • domain assumption Moral judgments elicited by hypothetical scenarios reliably indicate the normative expectations that should guide AI behavior in high-stakes real-world settings.
    Invoked when the authors interpret differences in deontological vs. consequentialist reasoning as directly relevant to the alignment target problem.

pith-pipeline@v0.9.0 · 5598 in / 1445 out tokens · 121379 ms · 2026-05-13T07:15:54.517853+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 1 internal anchor

  1. [1]

    Michael Anderson. 2006. MedEthEx: A prototype medical ethics advisor. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI '06). AAAI Press

  2. [2]

    Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan. 2018. The Moral Machine experiment. Nature 563, 7729 (November 2018), 59–64. https://doi.org/10.1038/s41586-018-0637-6

  3. [3]

    Edmond Awad, Sohan Dsouza, Azim Shariff, Iyad Rahwan, and Jean-François Bonnefon. 2020. Universals and variations in moral decisions made in 42 countries by 70,000 participants. Proceedings of the National Academy of Sciences 117, 5 (February 2020), 2332–2337. https://doi.org/10.1073/pnas.1911517117

  4. [4]

    Wilma A. Bainbridge, Justin W. Hart, Elizabeth S. Kim, and Brian Scassellati. 2011. The benefits of interactions with physically present robots over video-displayed agents. International Journal of Social Robotics 3, 1 (January 2011), 41–52. https://doi.org/10.1007/s12369-010-0082-7

  5. [5]

    William A. Bauer. 2020. Virtuous vs. utilitarian artificial moral agents. AI & Society 35, 1 (March 2020), 263–271. https://doi.org/10.1007/s00146-018-0871-3

  6. [6]

    Yochanan E. Bigman and Kurt Gray. 2018. People are averse to machines making moral decisions. Cognition 181 (December 2018), 21–34. https://doi.org/10.1016/j.cognition.2018.08.003

  7. [7]

    Joe Brailsford, Frank Vetere, and Eduardo Velloso. 2024. Exploring the association between moral foundations and judgements of AI behaviour. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24, May 11–16, 2024, Honolulu, HI, USA). ACM, New York, NY, USA, 1–15. https://doi.org/10.1145/3613904.3642712

  8. [8]

    Selmer Bringsjord and Joshua Taylor. 2012. Introducing divine-command robot ethics. In Robot Ethics: The Ethical and Social Implications of Robotics, Patrick Lin, Keith Abney, and George A. Bekey (Eds.). MIT Press, Cambridge, MA, 85–108

  9. [9]

    Dario Cecchini, Michael Pflanzer, and Veljko Dubljević. 2024. Aligning artificial intelligence with moral intuitions: An intuitionist approach to the alignment problem. AI and Ethics (May 2024). https://doi.org/10.1007/s43681-024-00496-5

  10. [10]

    Arunima Chakraborty and Nisigandha Bhuyan. 2024. Can artificial intelligence be a Kantian moral agent? On moral autonomy of AI system. AI and Ethics 4, 2 (May 2024), 325–331. https://doi.org/10.1007/s43681-023-00269-6

  11. [11]

    Hongyan Chang and Reza Shokri. 2023. Bias propagation in federated learning. arXiv:2309.02160. Retrieved from https://arxiv.org/abs/2309.02160

  12. [12]

    Xiaocong Chen, Chaoran Huang, Lina Yao, Xianzhi Wang, Wei Liu, and Wenjie Zhang. 2020. Knowledge-guided deep reinforcement learning for interactive recommendation. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN '20, July 2020). IEEE, 1–8. https://doi.org/10.1109/IJCNN48605.2020.9207010

  13. [13]

    Yueying Chu and Peng Liu. 2023. Machines and humans in sacrificial moral dilemmas: Required similarly but judged differently? Cognition 239 (October 2023), 105575. https://doi.org/10.1016/j.cognition.2023.105575

  14. [14]

    Martin Cunneen, Martin Mullins, Finbarr Murphy, and Seán Gaines. 2019. Artificial driving intelligence and moral agency: Examining the decision ontology of unavoidable road traffic accidents through the prism of the trolley dilemma. Applied Artificial Intelligence 33, 3 (February 2019), 267–293. https://doi.org/10.1080/08839514.2018.1560124

  15. [15]

    Fiery Cushman. 2013. Action, outcome, and value: A dual-system framework for morality. Personality and Social Psychology Review 17, 3 (August 2013), 273–292.

  16. [16]

    https://doi.org/10.1177/1088868313495594

  17. [17]

    John Danaher and Henrik Skaug Sætra. 2022. Technology and moral change: The transformation of truth and trust. Ethics and Information Technology 24, 3 (September 2022), 35. https://doi.org/10.1007/s10676-022-09661-y

  18. [18]

    Zackary Okun Dunivin. 2024. Scalable qualitative coding with LLMs: Chain-of-thought reasoning matches human performance in some hermeneutic tasks. arXiv:2401.15170. Retrieved from https://arxiv.org/abs/2401.15170

  19. [19]

    Shuaishuai Fang. 2024. Moral relevance approach for AI ethics. Philosophies 9, 2 (March 2024), 42. https://doi.org/10.3390/philosophies9020042

  20. [20]

    Paul Formosa and Malcolm Ryan. 2021. Making moral machines: why we need artificial moral agents. AI Soc. 36, 3 (September 2021), 839–851. https://doi.org/10.1007/s00146-020-01089-6

  21. [21]

    Johannes Fürnkranz, Eyke Hüllermeier, Weiwei Cheng, and Sang-Hyeun Park. 2012. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 1–2 (October 2012), 123–156. https://doi.org/10.1007/s10994-012-5313-8

  22. [22]

    Iason Gabriel. 2020. Artificial intelligence, values, and alignment. Minds Mach. 30, 3 (September 2020), 411–437. https://doi.org/10.1007/s11023-020-09539-2

  23. [23]

    Fabrizio Gilardi, Meysam Alizadeh, and Maël Kubli. 2023. ChatGPT outperforms crowd workers for text-annotation tasks. Proc. Natl. Acad. Sci. 120, 30 (July 2023), e2305016120. https://doi.org/10.1073/pnas.2305016120

  24. [24]

    Ella Glikson and Anita Williams Woolley. 2020. Human trust in artificial intelligence: Review of empirical research. Acad. Manag. Ann. 14, 2 (July 2020), 627–660.

  25. [25]

    https://doi.org/10.5465/annals.2018.0057

  26. [26]

    Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J. Russell, and Anca Dragan. 2017. Inverse reward design. Adv. Neural Inf. Process. Syst. 30 (2017). arXiv:1711.02827. Retrieved from https://arxiv.org/abs/1711.02827

  27. [27]

    Jonathan Haidt, Fredrik Bjorklund, and Scott Murphy. 2000. Moral dumbfounding: When intuition finds no reason. Unpublished manuscript, University of Virginia

  28. [28]

    Tae Wan Kim, John Hooker, and Thomas Donaldson. 2021. Taking principles seriously: A hybrid approach to value alignment in artificial intelligence. J. Artif. Intell. Res. 70 (February 2021), 871–890. https://doi.org/10.1613/jair.1.12481

  29. [29]

    Markus Kneer and Juri Viehoff. 2025. The hard problem of AI alignment: Value forks in moral judgment. In Proceedings of the 2025 ACM CHI Conference on Human Factors in Computing Systems. ACM, 2671–2681. https://doi.org/10.1145/3715275.3732174

  30. [30]

    Michael Laakasuo. 2023. Moral Uncanny Valley revisited – how human expectations of robot morality based on robot appearance moderate the perceived morality of robot decisions in high conflict moral dilemmas. Front. Psychol. 14, (November 2023), 1270371. https://doi.org/10.3389/fpsyg.2023.1270371

  31. [31]

    Travis LaCroix and Alexandra Sasha Luccioni. 2025. Metaethical perspectives on ‘benchmarking’ AI ethics. AI Ethics 5, 4 (August 2025), 4029–4047. https://doi.org/10.1007/s43681-025-00703-x

  32. [32]

    Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. 2018. Scalable agent alignment via reward modeling: a research direction. arXiv:1811.07871. Retrieved from https://arxiv.org/abs/1811.07871

  33. [33]

    Robert James M. Boyles. 2024. Can’t Bottom-up Artificial Moral Agents Make Moral Judgements? Filos. Sociol. 35, 1 (February 2024). https://doi.org/10.6001/fil-soc.2024.35.1.3

  34. [34]

    Bertram F. Malle, Matthias Scheutz, Thomas Arnold, John Voiklis, and Corey Cusimano. 2015. Sacrifice One For the Good of Many?: People Apply Different Moral Norms to Human and Robot Agents. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '15), March 2015.

  35. [35]

    ACM, Portland, Oregon, USA, 117–124. https://doi.org/10.1145/2696454.2696458

  36. [36]

    Bertram F. Malle, Matthias Scheutz, Corey Cusimano, John Voiklis, Takanori Komatsu, Stuti Thapa, and Salomi Aladia. 2025. People’s judgments of humans and robots in a classic moral dilemma. Cognition 254, (January 2025), 105958. https://doi.org/10.1016/j.cognition.2024.105958

  37. [37]

    Bertram F. Malle, Matthias Scheutz, Jodi Forlizzi, and John Voiklis. 2016. Which robot am I thinking about? The impact of action and appearance on people’s evaluations of a moral robot. In 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2016. IEEE, Christchurch, New Zealand, 125–132. https://doi.org/10.1109/HRI.2016.7451743

  38. [38]

    Davy Tsz Kit Ng, Wenjie Wu, Jac Ka Lok Leung, Thomas Kin Fung Chiu, and Samuel Kai Wah Chu. 2024. Design and validation of the AI literacy questionnaire: The affective, behavioural, cognitive and ethical approach. Br. J. Educ. Technol. 55, 3 (May 2024), 1082–1104. https://doi.org/10.1111/bjet.13411

  39. [39]

    Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, and Zhengzhong Tu. 2025. DecAlign: Hierarchical cross-modal alignment for decoupled multimodal representation learning. arXiv:2503.11892. Retrieved from https://arxiv.org/abs/2503.11892

  40. [40]

    Victor Kenji M. Shiramizu, Anthony J. Lee, Daria Altenburg, David R. Feinberg, and Benedict C. Jones. 2022. The role of valence, dominance, and pitch in perceptions of artificial intelligence (AI) conversational agents’ voices. Sci. Rep. 12, 1 (December 2022), 22479. https://doi.org/10.1038/s41598-022-27124-8

  41. [41]

    Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, and David Krueger. 2025. Defining and characterizing reward hacking. arXiv:2209.13085. Retrieved from https://arxiv.org/abs/2209.13085

  42. [42]

    Robert H. Tai, Lillian R. Bentley, Xin Xia, Jason M. Sitt, Sarah C. Fankhauser, Ana M. Chicas-Mosier, and Barnas G. Monteith. 2024. An Examination of the Use of Large Language Models to Aid Analysis of Textual Data. Int. J. Qual. Methods 23, (January 2024), 16094069241231168. https://doi.org/10.1177/16094069241231168

  43. [43]

    Nga Than, Leanne Fan, Tina Law, Laura K. Nelson, and Leslie McCall. 2025. Updating “The Future of Coding”: Qualitative Coding with Generative Large Language Models. Sociol. Methods Res. 54, 3 (August 2025), 849–888. https://doi.org/10.1177/00491241251339188

  44. [44]

    Suzanne Tolmeijer, Markus Kneer, Cristina Sarasua, Markus Christen, and Abraham Bernstein. 2021. Implementations in machine ethics: A survey. ACM Comput. Surv. 53, 6 (November 2021), 1–38. https://doi.org/10.1145/3419633

  45. [45]

    Alexey Turchin. 2019. AI Alignment Problem: “Human Values” Idea is Built Upon Many Assumptions. PhilPapers (2019)

  46. [46]

    Dallas Card and Noah A. Smith. 2020. On Consequentialism and Fairness. Frontiers in Artificial Intelligence 3 (2020), 34. DOI: https://doi.org/10.3389/frai.2020.00034

  47. [47]

    Peter Vamplew, Richard Dazeley, Cameron Foale, Sally Firmin, and Jane Mummery. 2018. Human-aligned artificial intelligence is a multiobjective problem. Ethics Inf. Technol. 20, 1 (March 2018), 27–40. https://doi.org/10.1007/s10676-017-9440-6

  48. [48]

    Wendell Wallach and Colin Allen. 2009. Moral machines: teaching robots right from wrong. Oxford University Press, Oxford. https://doi.org/10.1093/acprof:oso/9780195374049.001.0001

  49. [49]

    Norbert Wiener. 1960. Some moral and technical consequences of automation. Science 131, 3410 (May 1960), 1355–1358

  50. [50]

    Bernard Williams. 2006. Problems of the self: philosophical papers 1956 - 1972 (Transferred to digital print ed.). Cambridge Univ. Press, Cambridge

  51. [51]

    Michael Walzer. 1973. Political action: The problem of dirty hands. Philos. Public Aff. 2, 2 (1973), 160–180

  52. [52]

    Yueh-Hua Wu and Shou-De Lin. 2018. A low-cost ethics shaping approach for designing reinforcement learning agents. arXiv:1712.04172. Retrieved from https://arxiv.org/abs/1712.04172

  53. [53]

    Jingling Zhang, Jane Conway, and César A. Hidalgo. 2023. Why people judge humans differently from machines: The role of perceived agency and experience. arXiv:2210.10081. Retrieved from https://arxiv.org/abs/2210.10081

  54. [54]

    Yuyan Zhang, Jiahua Wu, Feng Yu, and Liying Xu. 2023. Moral Judgments of Human vs. AI Agents in Moral Dilemmas. Behav. Sci. 13, 2 (February 2023), 181. https://doi.org/10.3390/bs13020181

  55. [55]

    Bertram F. Malle, S. T. Magar, and Matthias Scheutz. 2019. AI in the sky: How people morally evaluate human and machine decisions in a lethal strike dilemma. In Robotics and Well-Being, Markus Coeckelbergh, Janina Loh, Michael Funk, Johanna Seibt, and Marco Nørskov (Eds.). Springer International Publishing, Cham, 111–133

  56. [56]

    Yixiao Zhang and Jiang Lan. 2025. The Practical Problems of Value Alignment and the Chinese Approach. Ideological and Theoretical Education 5, 29–36. https://doi.org/10.16075/j.cnki.cn31-1220/g4.2025.05.003

  57. [57]

    Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, Cambridge

  58. [58]

    Mohammad Atari, Jonathan Haidt, Jesse Graham, Sena Koleva, Sean T. Stevens, and Morteza Dehghani. 2023. Morality Beyond the WEIRD: How the Nomological Network of Morality Varies Across Cultures. Journal of Personality and Social Psychology 125, 5 (2023), 1179–1224. https://doi.org/10.1037/pspp0000470

  59. [59]

    Jonathan Haidt and Jesse Graham. 2007. When Morality Opposes Justice: Conservatives Have Moral Intuitions that Liberals may not Recognize. Social Justice Research 20, 1 (2007), 98–116. https://doi.org/10.1007/s11211-007-0034-z

  60. [60]

    Jesse Graham, Brian A. Nosek, Jonathan Haidt, Ravi Iyer, Spassena Koleva, and Peter H. Ditto. 2011. Mapping the Moral Domain. Journal of Personality and Social Psychology 101, 2 (2011), 366–385. https://doi.org/10.1037/a0021847