The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers
Pith reviewed 2026-05-13 07:15 UTC · model grok-4.3
The pith
People judge AI systems and their designers more deontologically than human actors in the same situation, but only once the systems' human origins are made visible.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Evaluations of a repairman and a repair robot show no significant difference, yet judgments shift substantially toward deontological, rule-based reasoning when the robot is described as the product of human programming. The same stricter standards apply when evaluating the company engineers who programmed the robot. These findings indicate that people evaluate humans, AI systems acting in the same situation, and the humans who design them in meaningfully different ways, which gives rise to the alignment target problem of selecting which normative target should guide artificial moral agents.
What carries the argument
The four-condition moral judgment experiment that varies the subject of evaluation (repairman, repair robot, programmed repair robot, company engineers) while holding the runaway mine train scenario fixed and measures shifts between consequentialist and deontological reasoning.
Load-bearing premise
Moral judgments elicited by this specific hypothetical runaway mine train scenario represent the normative targets that should guide AI behavior in real high-stakes deployment contexts.
What would settle it
A replication experiment using a different high-stakes dilemma, such as medical resource allocation, that finds no significant differences in moral judgments across the four conditions would falsify the claim of divergent evaluations.
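A convincing replication would also need enough statistical power to detect a shift of the original size if it is real. As a rough illustration, a normal-approximation sample size calculation for comparing the proportion of deontological responses between two conditions (the proportions below are hypothetical, not taken from the paper):

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-proportion z-test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # quantile for desired power
    pbar = (p1 + p2) / 2                        # pooled proportion under H0
    num = (z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return num / (p1 - p2) ** 2

# Hypothetical: 35% deontological responses for the human actor vs. 55%
# for the programmed robot.
print(round(n_per_group(0.35, 0.55)))  # roughly 96 per group
```

Under these assumed rates, about a hundred participants per condition would suffice, so the roughly 250 per condition the authors report would be comfortably powered for effects of this size.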
read the original abstract
The project of aligning machine behavior with human values raises a basic problem: whose moral expectations should guide AI decision-making? Much alignment research assumes that the appropriate benchmark is how humans themselves would act in a given situation. Studies of agent-type value forks challenge this assumption by showing that people do not always judge humans and AI systems identically. This paper extends that challenge by examining two further possibilities: first, that evaluations of AI behavior change when its human origins are made visible; and second, that people judge the humans who program AI systems differently from either the machines or the human actors they are compared against. An experiment with 1,002 U.S. adults measured moral judgments in a runaway mine train scenario, varying the subject of evaluation across four conditions: a repairman, a repair robot, a repair robot programmed by company engineers, and company engineers programming a repair robot. We find no significant difference in evaluations of the repairman and the robot. However, judgments shifted substantially when the robot's actions were described as the product of human design. Participants exhibited markedly more deontological, rule-based reasoning when evaluating either the programmed robot or the engineers who programmed it, suggesting that rendering human agency visible activates heightened moral constraints. These findings indicate that people may evaluate humans, AI systems acting in the same situation, and the humans who design them in meaningfully different ways. The fact that these evaluations do not necessarily converge gives rise to the alignment target problem: which normative target should guide the development of artificial moral agents in high-stakes domains, and whether these plural judgments can be reconciled within a coherent account of value alignment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates the alignment target problem by experimentally comparing moral judgments of 1,002 U.S. adults in a runaway mine train scenario across four conditions: a human repairman, an autonomous repair robot, a repair robot programmed by engineers, and the engineers themselves. It reports no significant difference between judgments of the repairman and the autonomous robot, but a substantial shift toward deontological, rule-based reasoning when human design or designers are made visible. The authors conclude that humans, AI systems, and their designers elicit meaningfully different moral evaluations, creating a pluralistic challenge for determining the appropriate normative target for AI alignment.
Significance. If the directional findings hold under scrutiny, the work provides empirical grounding for questioning the common assumption that human behavior in a situation is the direct benchmark for AI alignment. By showing that rendering human agency salient activates stricter moral constraints, it suggests alignment research may need to address multiple or context-sensitive targets rather than a single human-like standard, with potential relevance for high-stakes domains where designer intent becomes visible.
major comments (3)
- [Results] The abstract states 'no significant difference' between repairman and robot conditions and 'markedly more deontological' reasoning when human design is visible, yet supplies no p-values, effect sizes, confidence intervals, per-condition sample sizes, or details of the statistical tests; without these, the magnitude and reliability of the central empirical claim cannot be evaluated.
- [Methods] The exact vignette wording, the specific questions used to elicit moral judgments, and the coding or measurement scheme distinguishing deontological from consequentialist responses are not described, which is load-bearing for assessing whether the design isolates the effect of visible human agency as claimed.
- [Discussion] The inference that divergent judgments in this single hypothetical create a general 'alignment target problem' for real high-stakes AI deployment assumes the pattern generalizes beyond the mine-train vignette and predicts policy-relevant preferences; no additional scenarios, robustness checks, or external validation are reported to support this step.
minor comments (1)
- [Abstract] Adding one or two quantitative anchors (e.g., mean ratings or percentage shifts) would make the directional claims more informative without lengthening the summary.
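The first major comment can be made concrete. With the reported design (four conditions, roughly 250 participants each), the headline comparison could be run, for example, as a Pearson chi-square test of independence on deontological-versus-consequentialist response counts; the counts below are invented for illustration and are not the paper's data:

```python
# Hypothetical response counts (deontological, consequentialist) per
# condition; ~250 participants each, as the authors' rebuttal describes.
counts = {
    "repairman":        (80, 170),
    "robot":            (85, 165),
    "programmed robot": (140, 110),
    "engineers":        (150, 100),
}

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    conditions-by-outcomes contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = chi_square(counts)
print(f"chi2({df}) = {stat:.1f}")  # critical value at alpha = .05, df = 3 is 7.81
```

Reporting the statistic, degrees of freedom, p-value, and an effect size such as Cramer's V alongside per-condition ns is exactly the detail the comment asks the abstract and Results to supply.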
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review. The comments identify key areas where additional detail and caution will strengthen the manuscript. We address each major comment below and specify the planned revisions.
read point-by-point responses
- Referee: [Results] The abstract states 'no significant difference' between repairman and robot conditions and 'markedly more deontological' reasoning when human design is visible, yet supplies no p-values, effect sizes, confidence intervals, per-condition sample sizes, or details of the statistical tests; without these, the magnitude and reliability of the central empirical claim cannot be evaluated.
  Authors: We agree that the abstract should include these statistical details for self-contained evaluation. The full Results section reports the relevant tests, p-values, effect sizes, confidence intervals, and per-condition sample sizes (approximately 250–251 participants per condition). We will revise the abstract to incorporate summaries of these statistics, such as the non-significant comparison between the repairman and autonomous robot conditions and the significant shift in the human-design conditions. revision: yes
- Referee: [Methods] The exact vignette wording, the specific questions used to elicit moral judgments, and the coding or measurement scheme distinguishing deontological from consequentialist responses are not described, which is load-bearing for assessing whether the design isolates the effect of visible human agency as claimed.
  Authors: We acknowledge that the submitted manuscript does not provide sufficient detail on these elements. We will add the verbatim vignette text for all four conditions, the exact wording of the moral judgment questions, and a clear description of the coding scheme used to classify responses as deontological versus consequentialist. This will make the isolation of visible human agency transparent and support replicability. revision: yes
- Referee: [Discussion] The inference that divergent judgments in this single hypothetical create a general 'alignment target problem' for real high-stakes AI deployment assumes the pattern generalizes beyond the mine-train vignette and predicts policy-relevant preferences; no additional scenarios, robustness checks, or external validation are reported to support this step.
  Authors: We agree that the study uses only one vignette and provides no additional scenarios or external validation. The manuscript presents the findings as suggestive evidence for the alignment target problem rather than a universal claim. In revision we will expand the Discussion to explicitly note this limitation, discuss boundary conditions of the vignette, and outline future work on robustness checks and real-world validation. We will also moderate language regarding policy implications. revision: partial
Circularity Check
Empirical survey study with no derivations or self-referential constructions
full rationale
The paper reports results from a direct experiment with 1,002 U.S. adults responding to a single hypothetical runaway mine train vignette across four conditions. All central claims (differences in deontological vs. consequentialist judgments when human agency is made visible) rest on observed participant data and standard statistical comparisons. No equations, fitted parameters, predictions derived from inputs, self-citations used as load-bearing uniqueness theorems, or ansatzes appear in the abstract or described methods. The inference chain is self-contained empirical observation rather than any reduction to prior self-referential content.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Moral judgments elicited by hypothetical scenarios reliably indicate the normative expectations that should guide AI behavior in high-stakes real-world settings.
Reference graph
Works this paper leans on
-
[1]
Michael Anderson. 2006. MedEthEx: A prototype medical ethics advisor. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI '06). AAAI Press
-
[2]
Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan. 2018. The Moral Machine experiment. Nature 563, 7729 (November 2018), 59–64. https://doi.org/10.1038/s41586-018-0637-6
-
[3]
Edmond Awad, Sohan Dsouza, Azim Shariff, Iyad Rahwan, and Jean-François Bonnefon. 2020. Universals and variations in moral decisions made in 42 countries by 70,000 participants. Proceedings of the National Academy of Sciences 117, 5 (February 2020), 2332–2337. https://doi.org/10.1073/pnas.1911517117
-
[4]
Wilma A. Bainbridge, Justin W. Hart, Elizabeth S. Kim, and Brian Scassellati. 2011. The benefits of interactions with physically present robots over video-displayed agents. International Journal of Social Robotics 3, 1 (January 2011), 41–52. https://doi.org/10.1007/s12369-010-0082-7
-
[5]
William A. Bauer. 2020. Virtuous vs. utilitarian artificial moral agents. AI & Society 35, 1 (March 2020), 263–271. https://doi.org/10.1007/s00146-018-0871-3
-
[6]
Yochanan E. Bigman and Kurt Gray. 2018. People are averse to machines making moral decisions. Cognition 181 (December 2018), 21–34. https://doi.org/10.1016/j.cognition.2018.08.003
-
[7]
Joe Brailsford, Frank Vetere, and Eduardo Velloso. 2024. Exploring the association between moral foundations and judgements of AI behaviour. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24, May 11–16, 2024, Honolulu, HI, USA). ACM, New York, NY, USA, 1–15. https://doi.org/10.1145/3613904.3642712
-
[8]
Selmer Bringsjord and Joshua Taylor. 2012. Introducing divine-command robot ethics. In Robot Ethics: The Ethical and Social Implications of Robotics, Patrick Lin, Keith Abney, and George A. Bekey (Eds.). MIT Press, Cambridge, MA, 85–108
-
[9]
Dario Cecchini, Michael Pflanzer, and Veljko Dubljević. 2024. Aligning artificial intelligence with moral intuitions: An intuitionist approach to the alignment problem. AI and Ethics (May 2024). https://doi.org/10.1007/s43681-024-00496-5
-
[10]
Arunima Chakraborty and Nisigandha Bhuyan. 2024. Can artificial intelligence be a Kantian moral agent? On moral autonomy of AI system. AI and Ethics 4, 2 (May 2024), 325–331. https://doi.org/10.1007/s43681-023-00269-6
-
[12]
Xiaocong Chen, Chaoran Huang, Lina Yao, Xianzhi Wang, Wei Liu, and Wenjie Zhang. 2020. Knowledge-guided deep reinforcement learning for interactive recommendation. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN '20, July 2020). IEEE, 1–8. https://doi.org/10.1109/IJCNN48605.2020.9207010
-
[13]
Yueying Chu and Peng Liu. 2023. Machines and humans in sacrificial moral dilemmas: Required similarly but judged differently? Cognition 239 (October 2023), 105575. https://doi.org/10.1016/j.cognition.2023.105575
-
[14]
Martin Cunneen, Martin Mullins, Finbarr Murphy, and Seán Gaines. 2019. Artificial driving intelligence and moral agency: Examining the decision ontology of unavoidable road traffic accidents through the prism of the trolley dilemma. Applied Artificial Intelligence 33, 3 (February 2019), 267–293. https://doi.org/10.1080/08839514.2018.1560124
-
[15]
Fiery Cushman. 2013. Action, outcome, and value: A dual-system framework for morality. Personality and Social Psychology Review 17, 3 (August 2013), 273–292. https://doi.org/10.1177/1088868313495594
-
[17]
John Danaher and Henrik Skaug Sætra. 2022. Technology and moral change: The transformation of truth and trust. Ethics and Information Technology 24, 3 (September 2022), 35. https://doi.org/10.1007/s10676-022-09661-y
-
[19]
Shuaishuai Fang. 2024. Moral relevance approach for AI ethics. Philosophies 9, 2 (March 2024), 42. https://doi.org/10.3390/philosophies9020042
-
[20]
Paul Formosa and Malcolm Ryan. 2021. Making moral machines: why we need artificial moral agents. AI Soc. 36, 3 (September 2021), 839–851. https://doi.org/10.1007/s00146-020-01089-6
-
[21]
Johannes Fürnkranz, Eyke Hüllermeier, Weiwei Cheng, and Sang-Hyeun Park. 2012. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 1–2 (October 2012), 123–156. https://doi.org/10.1007/s10994-012-5313-8
-
[22]
Iason Gabriel. 2020. Artificial intelligence, values, and alignment. Minds Mach. 30, 3 (September 2020), 411–437. https://doi.org/10.1007/s11023-020-09539-2
-
[23]
Fabrizio Gilardi, Meysam Alizadeh, and Maël Kubli. 2023. ChatGPT outperforms crowd workers for text-annotation tasks. Proc. Natl. Acad. Sci. 120, 30 (July 2023), e2305016120. https://doi.org/10.1073/pnas.2305016120
-
[24]
Ella Glikson and Anita Williams Woolley. 2020. Human trust in artificial intelligence: Review of empirical research. Acad. Manag. Ann. 14, 2 (July 2020), 627–660. https://doi.org/10.5465/annals.2018.0057
-
[26]
Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J. Russell, and Anca Dragan. 2017. Inverse reward design. Adv. Neural Inf. Process. Syst. 30 (2017). arXiv:1711.02827. Retrieved from https://arxiv.org/abs/1711.02827
-
[27]
Jonathan Haidt, Fredrik Bjorklund, and Scott Murphy. 2000. Moral dumbfounding: When intuition finds no reason. Unpublished manuscript, University of Virginia
-
[28]
Tae Wan Kim, John Hooker, and Thomas Donaldson. 2021. Taking principles seriously: A hybrid approach to value alignment in artificial intelligence. J. Artif. Intell. Res. 70 (February 2021), 871–890. https://doi.org/10.1613/jair.1.12481
-
[29]
Markus Kneer and Juri Viehoff. 2025. The hard problem of AI alignment: Value forks in moral judgment. In Proceedings of the 2025 ACM CHI Conference on Human Factors in Computing Systems. ACM, 2671–2681. https://doi.org/10.1145/3715275.3732174
-
[30]
Michael Laakasuo. 2023. Moral Uncanny Valley revisited – how human expectations of robot morality based on robot appearance moderate the perceived morality of robot decisions in high conflict moral dilemmas. Front. Psychol. 14, (November 2023), 1270371. https://doi.org/10.3389/fpsyg.2023.1270371
-
[31]
Travis LaCroix and Alexandra Sasha Luccioni. 2025. Metaethical perspectives on ‘benchmarking’ AI ethics. AI Ethics 5, 4 (August 2025), 4029–4047. https://doi.org/10.1007/s43681-025-00703-x
-
[32]
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. 2018. Scalable agent alignment via reward modeling: a research direction. arXiv:1811.07871. Retrieved from https://arxiv.org/abs/1811.07871
-
[33]
Robert James M. Boyles. 2024. Can’t Bottom-up Artificial Moral Agents Make Moral Judgements? Filos. Sociol. 35, 1 (February 2024). https://doi.org/10.6001/fil-soc.2024.35.1.3
-
[34]
Bertram F. Malle, Matthias Scheutz, Thomas Arnold, John Voiklis, and Corey Cusimano. 2015. Sacrifice One For the Good of Many?: People Apply Different Moral Norms to Human and Robot Agents. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '15). ACM, Portland, Oregon, USA, 117–124. https://doi.org/10.1145/2696454.2696458
-
[36]
Bertram F. Malle, Matthias Scheutz, Corey Cusimano, John Voiklis, Takanori Komatsu, Stuti Thapa, and Salomi Aladia. 2025. People’s judgments of humans and robots in a classic moral dilemma. Cognition 254, (January 2025), 105958. https://doi.org/10.1016/j.cognition.2024.105958
-
[37]
Bertram F. Malle, Matthias Scheutz, Jodi Forlizzi, and John Voiklis. 2016. Which robot am I thinking about? The impact of action and appearance on people’s evaluations of a moral robot. In 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2016. IEEE, Christchurch, New Zealand, 125–132. https://doi.org/10.1109/HRI.2016.7451743
-
[38]
Davy Tsz Kit Ng, Wenjie Wu, Jac Ka Lok Leung, Thomas Kin Fung Chiu, and Samuel Kai Wah Chu. 2024. Design and validation of the AI literacy questionnaire: The affective, behavioural, cognitive and ethical approach. Br. J. Educ. Technol. 55, 3 (May 2024), 1082–1104. https://doi.org/10.1111/bjet.13411
-
[39]
Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, and Zhengzhong Tu. 2025. DecAlign: Hierarchical cross-modal alignment for decoupled multimodal representation learning. arXiv:2503.11892. Retrieved from https://arxiv.org/abs/2503.11892
-
[40]
Victor Kenji M. Shiramizu, Anthony J. Lee, Daria Altenburg, David R. Feinberg, and Benedict C. Jones. 2022. The role of valence, dominance, and pitch in perceptions of artificial intelligence (AI) conversational agents’ voices. Sci. Rep. 12, 1 (December 2022), 22479. https://doi.org/10.1038/s41598-022-27124-8
-
[42]
Robert H. Tai, Lillian R. Bentley, Xin Xia, Jason M. Sitt, Sarah C. Fankhauser, Ana M. Chicas-Mosier, and Barnas G. Monteith. 2024. An Examination of the Use of Large Language Models to Aid Analysis of Textual Data. Int. J. Qual. Methods 23, (January 2024), 16094069241231168. https://doi.org/10.1177/16094069241231168
-
[43]
Nga Than, Leanne Fan, Tina Law, Laura K. Nelson, and Leslie McCall. 2025. Updating “The Future of Coding”: Qualitative Coding with Generative Large Language Models. Sociol. Methods Res. 54, 3 (August 2025), 849–888. https://doi.org/10.1177/00491241251339188
-
[44]
Suzanne Tolmeijer, Markus Kneer, Cristina Sarasua, Markus Christen, and Abraham Bernstein. 2021. Implementations in machine ethics: A survey. ACM Comput. Surv. 53, 6 (November 2021), 1–38. https://doi.org/10.1145/3419633
-
[45]
Alexey Turchin. 2019. AI Alignment Problem: “Human Values” Idea is Built Upon Many Assumptions. PhilPapers (2019)
-
[46]
Dan Card and Noah A. Smith. 2020. On Consequentialism and Fairness. Frontiers in Artificial Intelligence 3 (2020), 34. DOI: https://doi.org/10.3389/frai.2020.00034
-
[47]
Peter Vamplew, Richard Dazeley, Cameron Foale, Sally Firmin, and Jane Mummery. 2018. Human-aligned artificial intelligence is a multiobjective problem. Ethics Inf. Technol. 20, 1 (March 2018), 27–40. https://doi.org/10.1007/s10676-017-9440-6
-
[48]
Wendell Wallach and Colin Allen. 2009. Moral machines: teaching robots right from wrong. Oxford University Press, Oxford. https://doi.org/10.1093/acprof:oso/9780195374049.001.0001
-
[49]
Norbert Wiener. 1960. Some moral and technical consequences of automation. Science 131, 3410 (May 1960), 1355–1358
-
[50]
Bernard Williams. 2006. Problems of the self: philosophical papers 1956 - 1972 (Transferred to digital print ed.). Cambridge Univ. Press, Cambridge
-
[51]
Michael Walzer. 1973. Political action: The problem of dirty hands. Philos. Public Aff. 2, 2 (1973), 160–180
-
[54]
Yuyan Zhang, Jiahua Wu, Feng Yu, and Liying Xu. 2023. Moral Judgments of Human vs. AI Agents in Moral Dilemmas. Behav. Sci. 13, 2 (February 2023), 181. https://doi.org/10.3390/bs13020181
-
[55]
Bertram F. Malle, S. T. Magar, and Matthias Scheutz. 2019. AI in the sky: How people morally evaluate human and machine decisions in a lethal strike dilemma. In Robotics and Well-Being, Markus Coeckelbergh, Janina Loh, Michael Funk, Johanna Seibt, and Marco Nørskov (Eds.). Springer International Publishing, Cham, 111–133
-
[56]
Yixiao Zhang and Jiang Lan. 2025. The Practical Problems of Value Alignment and the Chinese Approach. Ideological and Theoretical Education 5, 29–36. https://doi.org/10.16075/j.cnki.cn31-1220/g4.2025.05.003
-
[57]
Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, Cambridge
-
[58]
Mohammad Atari, Jonathan Haidt, Jesse Graham, Sena Koleva, Sean T. Stevens, and Morteza Dehghani. 2023. Morality Beyond the WEIRD: How the Nomological Network of Morality Varies Across Cultures. Journal of Personality and Social Psychology 125, 5 (2023), 1179–1224. https://doi.org/10.1037/pspp0000470
-
[59]
Jonathan Haidt and Jesse Graham. 2007. When Morality Opposes Justice: Conservatives Have Moral Intuitions that Liberals may not Recognize. Social Justice Research 20, 1 (2007), 98–116. https://doi.org/10.1007/s11211-007-0034-z
-
[60]
Jesse Graham, Brian A. Nosek, Jonathan Haidt, Ravi Iyer, Spassena Koleva, and Peter H. Ditto. 2011. Mapping the Moral Domain. Journal of Personality and Social Psychology 101, 2 (2011), 366–385. https://doi.org/10.1037/a0021847