The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers
Pith reviewed 2026-05-13 07:15 UTC · model grok-4.3
The pith
People judge AI systems and their designers more deontologically than human actors in the same situation, but only once the systems' human origins are made visible.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Evaluations of a repairman and a repair robot show no significant difference, yet judgments shift substantially toward deontological, rule-based reasoning when the robot is described as the product of human programming. The same stricter standards apply when evaluating the company engineers who programmed the robot. These findings indicate that people evaluate humans, AI systems acting in the same situation, and the humans who design them in meaningfully different ways, which gives rise to the alignment target problem of selecting which normative target should guide artificial moral agents.
What carries the argument
The four-condition moral judgment experiment that varies the subject of evaluation (repairman, repair robot, programmed repair robot, company engineers) while holding the runaway mine train scenario fixed and measures shifts between consequentialist and deontological reasoning.
Load-bearing premise
Moral judgments elicited by this specific hypothetical runaway mine train scenario represent the normative targets that should guide AI behavior in real high-stakes deployment contexts.
What would settle it
A replication experiment using a different high-stakes dilemma, such as medical resource allocation, that finds no significant differences in moral judgments across the four conditions would falsify the claim of divergent evaluations.
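A convincing replication would also need enough statistical power to detect a shift of the original size if it is real. As a rough illustration, a normal-approximation sample size calculation for comparing the proportion of deontological responses between two conditions (the proportions below are hypothetical, not taken from the paper):

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-proportion z-test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # quantile for desired power
    pbar = (p1 + p2) / 2                        # pooled proportion under H0
    num = (z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return num / (p1 - p2) ** 2

# Hypothetical: 35% deontological responses for the human actor vs. 55%
# for the programmed robot.
print(round(n_per_group(0.35, 0.55)))  # roughly 96 per group
```

Under these assumed rates, about a hundred participants per condition would suffice, so the roughly 250 per condition the authors report would be comfortably powered for effects of this size.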
read the original abstract
The project of aligning machine behavior with human values raises a basic problem: whose moral expectations should guide AI decision-making? Much alignment research assumes that the appropriate benchmark is how humans themselves would act in a given situation. Studies of agent-type value forks challenge this assumption by showing that people do not always judge humans and AI systems identically. This paper extends that challenge by examining two further possibilities: first, that evaluations of AI behavior change when its human origins are made visible; and second, that people judge the humans who program AI systems differently from either the machines or the human actors they are compared against. An experiment with 1,002 U.S. adults measured moral judgments in a runaway mine train scenario, varying the subject of evaluation across four conditions: a repairman, a repair robot, a repair robot programmed by company engineers, and company engineers programming a repair robot. We find no significant difference in evaluations of the repairman and the robot. However, judgments shifted substantially when the robot's actions were described as the product of human design. Participants exhibited markedly more deontological, rule-based reasoning when evaluating either the programmed robot or the engineers who programmed it, suggesting that rendering human agency visible activates heightened moral constraints. These findings indicate that people may evaluate humans, AI systems acting in the same situation, and the humans who design them in meaningfully different ways. The fact that these evaluations do not necessarily converge gives rise to the alignment target problem: which normative target should guide the development of artificial moral agents in high-stakes domains, and whether these plural judgments can be reconciled within a coherent account of value alignment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates the alignment target problem by experimentally comparing moral judgments of 1,002 U.S. adults in a runaway mine train scenario across four conditions: a human repairman, an autonomous repair robot, a repair robot programmed by engineers, and the engineers themselves. It reports no significant difference between judgments of the repairman and the autonomous robot, but a substantial shift toward deontological, rule-based reasoning when human design or designers are made visible. The authors conclude that humans, AI systems, and their designers elicit meaningfully different moral evaluations, creating a pluralistic challenge for determining the appropriate normative target for AI alignment.
Significance. If the directional findings hold under scrutiny, the work provides empirical grounding for questioning the common assumption that human behavior in a situation is the direct benchmark for AI alignment. By showing that rendering human agency salient activates stricter moral constraints, it suggests alignment research may need to address multiple or context-sensitive targets rather than a single human-like standard, with potential relevance for high-stakes domains where designer intent becomes visible.
major comments (3)
- [Results] The abstract states 'no significant difference' between repairman and robot conditions and 'markedly more deontological' reasoning when human design is visible, yet supplies no p-values, effect sizes, confidence intervals, per-condition sample sizes, or details of the statistical tests; without these, the magnitude and reliability of the central empirical claim cannot be evaluated.
- [Methods] The exact vignette wording, the specific questions used to elicit moral judgments, and the coding or measurement scheme distinguishing deontological from consequentialist responses are not described, which is load-bearing for assessing whether the design isolates the effect of visible human agency as claimed.
- [Discussion] The inference that divergent judgments in this single hypothetical create a general 'alignment target problem' for real high-stakes AI deployment assumes the pattern generalizes beyond the mine-train vignette and predicts policy-relevant preferences; no additional scenarios, robustness checks, or external validation are reported to support this step.
minor comments (1)
- [Abstract] Adding one or two quantitative anchors (e.g., mean ratings or percentage shifts) would make the directional claims more informative without lengthening the summary.
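The first major comment can be made concrete. With the reported design (four conditions, roughly 250 participants each), the headline comparison could be run, for example, as a Pearson chi-square test of independence on deontological-versus-consequentialist response counts; the counts below are invented for illustration and are not the paper's data:

```python
# Hypothetical response counts (deontological, consequentialist) per
# condition; ~250 participants each, as the authors' rebuttal describes.
counts = {
    "repairman":        (80, 170),
    "robot":            (85, 165),
    "programmed robot": (140, 110),
    "engineers":        (150, 100),
}

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    conditions-by-outcomes contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = chi_square(counts)
print(f"chi2({df}) = {stat:.1f}")  # critical value at alpha = .05, df = 3 is 7.81
```

Reporting the statistic, degrees of freedom, p-value, and an effect size such as Cramer's V alongside per-condition ns is exactly the detail the comment asks the abstract and Results to supply.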
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review. The comments identify key areas where additional detail and caution will strengthen the manuscript. We address each major comment below and specify the planned revisions.
read point-by-point responses
- Referee: [Results] The abstract states 'no significant difference' between repairman and robot conditions and 'markedly more deontological' reasoning when human design is visible, yet supplies no p-values, effect sizes, confidence intervals, per-condition sample sizes, or details of the statistical tests; without these, the magnitude and reliability of the central empirical claim cannot be evaluated.
  Authors: We agree that the abstract should include these statistical details for self-contained evaluation. The full Results section reports the relevant tests, p-values, effect sizes, confidence intervals, and per-condition sample sizes (approximately 250–251 participants per condition). We will revise the abstract to incorporate summaries of these statistics, such as the non-significant comparison between the repairman and autonomous robot conditions and the significant shift in the human-design conditions. revision: yes
- Referee: [Methods] The exact vignette wording, the specific questions used to elicit moral judgments, and the coding or measurement scheme distinguishing deontological from consequentialist responses are not described, which is load-bearing for assessing whether the design isolates the effect of visible human agency as claimed.
  Authors: We acknowledge that the submitted manuscript does not provide sufficient detail on these elements. We will add the verbatim vignette text for all four conditions, the exact wording of the moral judgment questions, and a clear description of the coding scheme used to classify responses as deontological versus consequentialist. This will make the isolation of visible human agency transparent and support replicability. revision: yes
- Referee: [Discussion] The inference that divergent judgments in this single hypothetical create a general 'alignment target problem' for real high-stakes AI deployment assumes the pattern generalizes beyond the mine-train vignette and predicts policy-relevant preferences; no additional scenarios, robustness checks, or external validation are reported to support this step.
  Authors: We agree that the study uses only one vignette and provides no additional scenarios or external validation. The manuscript presents the findings as suggestive evidence for the alignment target problem rather than a universal claim. In revision we will expand the Discussion to explicitly note this limitation, discuss boundary conditions of the vignette, and outline future work on robustness checks and real-world validation. We will also moderate language regarding policy implications. revision: partial
Circularity Check
Empirical survey study with no derivations or self-referential constructions
full rationale
The paper reports results from a direct experiment with 1,002 U.S. adults responding to a single hypothetical runaway mine train vignette across four conditions. All central claims (differences in deontological vs. consequentialist judgments when human agency is made visible) rest on observed participant data and standard statistical comparisons. No equations, fitted parameters, predictions derived from inputs, self-citations used as load-bearing uniqueness theorems, or ansatzes appear in the abstract or described methods. The inference chain is self-contained empirical observation rather than any reduction to prior self-referential content.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Moral judgments elicited by hypothetical scenarios reliably indicate the normative expectations that should guide AI behavior in high-stakes real-world settings.
Reference graph
Works this paper leans on
-
[1]
Michael Anderson. 2006. MedEthEx: A prototype medical ethics advisor. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI '06). AAAI Press
-
[2]
Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan. 2018. The Moral Machine experiment. Nature 563, 7729 (November 2018), 59–64. https://doi.org/10.1038/s41586-018-0637-6
-
[3]
Edmond Awad, Sohan Dsouza, Azim Shariff, Iyad Rahwan, and Jean-François Bonnefon. 2020. Universals and variations in moral decisions made in 42 countries by 70,000 participants. Proceedings of the National Academy of Sciences 117, 5 (February 2020), 2332–2337. https://doi.org/10.1073/pnas.1911517117
-
[4]
Wilma A. Bainbridge, Justin W. Hart, Elizabeth S. Kim, and Brian Scassellati. 2011. The benefits of interactions with physically present robots over video-displayed agents. International Journal of Social Robotics 3, 1 (January 2011), 41–52. https://doi.org/10.1007/s12369-010-0082-7
-
[5]
William A. Bauer. 2020. Virtuous vs. utilitarian artificial moral agents. AI & Society 35, 1 (March 2020), 263–271. https://doi.org/10.1007/s00146-018-0871-3
-
[6]
Yochanan E. Bigman and Kurt Gray. 2018. People are averse to machines making moral decisions. Cognition 181 (December 2018), 21–34. https://doi.org/10.1016/j.cognition.2018.08.003
-
[7]
Joe Brailsford, Frank Vetere, and Eduardo Velloso. 2024. Exploring the association between moral foundations and judgements of AI behaviour. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24, May 11–16, 2024, Honolulu, HI, USA). ACM, New York, NY, USA, 1–15. https://doi.org/10.1145/3613904.3642712
-
[8]
Selmer Bringsjord and Joshua Taylor. 2012. Introducing divine-command robot ethics. In Robot Ethics: The Ethical and Social Implications of Robotics, Patrick Lin, Keith Abney, and George A. Bekey (Eds.). MIT Press, Cambridge, MA, 85–108
-
[9]
Dario Cecchini, Michael Pflanzer, and Veljko Dubljević. 2024. Aligning artificial intelligence with moral intuitions: An intuitionist approach to the alignment problem. AI and Ethics (May 2024). https://doi.org/10.1007/s43681-024-00496-5
-
[10]
Arunima Chakraborty and Nisigandha Bhuyan. 2024. Can artificial intelligence be a Kantian moral agent? On moral autonomy of AI system. AI and Ethics 4, 2 (May 2024), 325–331. https://doi.org/10.1007/s43681-023-00269-6
-
[12]
Xiaocong Chen, Chaoran Huang, Lina Yao, Xianzhi Wang, Wei Liu, and Wenjie Zhang. 2020. Knowledge-guided deep reinforcement learning for interactive recommendation. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN '20, July 2020). IEEE, 1–8. https://doi.org/10.1109/IJCNN48605.2020.9207010
-
[13]
Yueying Chu and Peng Liu. 2023. Machines and humans in sacrificial moral dilemmas: Required similarly but judged differently? Cognition 239 (October 2023), 105575. https://doi.org/10.1016/j.cognition.2023.105575
-
[14]
Martin Cunneen, Martin Mullins, Finbarr Murphy, and Seán Gaines. 2019. Artificial driving intelligence and moral agency: Examining the decision ontology of unavoidable road traffic accidents through the prism of the trolley dilemma. Applied Artificial Intelligence 33, 3 (February 2019), 267–293. https://doi.org/10.1080/08839514.2018.1560124
-
[15]
Fiery Cushman. 2013. Action, outcome, and value: A dual-system framework for morality. Personality and Social Psychology Review 17, 3 (August 2013), 273–292. https://doi.org/10.1177/1088868313495594
-
[17]
John Danaher and Henrik Skaug Sætra. 2022. Technology and moral change: The transformation of truth and trust. Ethics and Information Technology 24, 3 (September 2022), 35. https://doi.org/10.1007/s10676-022-09661-y
-
[19]
Shuaishuai Fang. 2024. Moral relevance approach for AI ethics. Philosophies 9, 2 (March 2024), 42. https://doi.org/10.3390/philosophies9020042
-
[20]
Paul Formosa and Malcolm Ryan. 2021. Making moral machines: why we need artificial moral agents. AI Soc. 36, 3 (September 2021), 839–851. https://doi.org/10.1007/s00146-020-01089-6
-
[21]
Johannes Fürnkranz, Eyke Hüllermeier, Weiwei Cheng, and Sang-Hyeun Park. 2012. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Mach. Learn. 89, 1–2 (October 2012), 123–156. https://doi.org/10.1007/s10994-012-5313-8
-
[22]
Iason Gabriel. 2020. Artificial intelligence, values, and alignment. Minds Mach. 30, 3 (September 2020), 411–437. https://doi.org/10.1007/s11023-020-09539-2
-
[23]
Fabrizio Gilardi, Meysam Alizadeh, and Maël Kubli. 2023. ChatGPT outperforms crowd workers for text-annotation tasks. Proc. Natl. Acad. Sci. 120, 30 (July 2023), e2305016120. https://doi.org/10.1073/pnas.2305016120
-
[24]
Ella Glikson and Anita Williams Woolley. 2020. Human trust in artificial intelligence: Review of empirical research. Acad. Manag. Ann. 14, 2 (July 2020), 627–660. https://doi.org/10.5465/annals.2018.0057
-
[26]
Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J. Russell, and Anca Dragan. 2017. Inverse reward design. Adv. Neural Inf. Process. Syst. 30 (2017). arXiv:1711.02827. Retrieved from https://arxiv.org/abs/1711.02827
-
[27]
Jonathan Haidt, Fredrik Bjorklund, and Scott Murphy. 2000. Moral dumbfounding: When intuition finds no reason. Unpublished manuscript, University of Virginia
-
[28]
Tae Wan Kim, John Hooker, and Thomas Donaldson. 2021. Taking principles seriously: A hybrid approach to value alignment in artificial intelligence. J. Artif. Intell. Res. 70 (February 2021), 871–890. https://doi.org/10.1613/jair.1.12481
-
[29]
Markus Kneer and Juri Viehoff. 2025. The hard problem of AI alignment: Value forks in moral judgment. In Proceedings of the 2025 ACM CHI Conference on Human Factors in Computing Systems. ACM, 2671–2681. https://doi.org/10.1145/3715275.3732174
-
[30]
Michael Laakasuo. 2023. Moral Uncanny Valley revisited – how human expectations of robot morality based on robot appearance moderate the perceived morality of robot decisions in high conflict moral dilemmas. Front. Psychol. 14, (November 2023), 1270371. https://doi.org/10.3389/fpsyg.2023.1270371
-
[31]
Travis LaCroix and Alexandra Sasha Luccioni. 2025. Metaethical perspectives on ‘benchmarking’ AI ethics. AI Ethics 5, 4 (August 2025), 4029–4047. https://doi.org/10.1007/s43681-025-00703-x
-
[32]
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. 2018. Scalable agent alignment via reward modeling: a research direction. arXiv:1811.07871. Retrieved from https://arxiv.org/abs/1811.07871
-
[33]
Robert James M. Boyles. 2024. Can’t Bottom-up Artificial Moral Agents Make Moral Judgements? Filos. Sociol. 35, 1 (February 2024). https://doi.org/10.6001/fil-soc.2024.35.1.3
-
[34]
Bertram F. Malle, Matthias Scheutz, Thomas Arnold, John Voiklis, and Corey Cusimano. 2015. Sacrifice One For the Good of Many?: People Apply Different Moral Norms to Human and Robot Agents. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '15). ACM, Portland, Oregon, USA, 117–124. https://doi.org/10.1145/2696454.2696458
-
[36]
Bertram F. Malle, Matthias Scheutz, Corey Cusimano, John Voiklis, Takanori Komatsu, Stuti Thapa, and Salomi Aladia. 2025. People’s judgments of humans and robots in a classic moral dilemma. Cognition 254, (January 2025), 105958. https://doi.org/10.1016/j.cognition.2024.105958
-
[37]
Bertram F. Malle, Matthias Scheutz, Jodi Forlizzi, and John Voiklis. 2016. Which robot am I thinking about? The impact of action and appearance on people’s evaluations of a moral robot. In 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2016. IEEE, Christchurch, New Zealand, 125–132. https://doi.org/10.1109/HRI.2016.7451743
-
[38]
Davy Tsz Kit Ng, Wenjie Wu, Jac Ka Lok Leung, Thomas Kin Fung Chiu, and Samuel Kai Wah Chu. 2024. Design and validation of the AI literacy questionnaire: The affective, behavioural, cognitive and ethical approach. Br. J. Educ. Technol. 55, 3 (May 2024), 1082–1104. https://doi.org/10.1111/bjet.13411
-
[39]
Chengxuan Qian, Shuo Xing, Shawn Li, Yue Zhao, and Zhengzhong Tu. 2025. DecAlign: Hierarchical cross-modal alignment for decoupled multimodal representation learning. arXiv:2503.11892. Retrieved from https://arxiv.org/abs/2503.11892
-
[40]
Victor Kenji M. Shiramizu, Anthony J. Lee, Daria Altenburg, David R. Feinberg, and Benedict C. Jones. 2022. The role of valence, dominance, and pitch in perceptions of artificial intelligence (AI) conversational agents’ voices. Sci. Rep. 12, 1 (December 2022), 22479. https://doi.org/10.1038/s41598-022-27124-8
-
[42]
Robert H. Tai, Lillian R. Bentley, Xin Xia, Jason M. Sitt, Sarah C. Fankhauser, Ana M. Chicas-Mosier, and Barnas G. Monteith. 2024. An Examination of the Use of Large Language Models to Aid Analysis of Textual Data. Int. J. Qual. Methods 23, (January 2024), 16094069241231168. https://doi.org/10.1177/16094069241231168
-
[43]
Nga Than, Leanne Fan, Tina Law, Laura K. Nelson, and Leslie McCall. 2025. Updating “The Future of Coding”: Qualitative Coding with Generative Large Language Models. Sociol. Methods Res. 54, 3 (August 2025), 849–888. https://doi.org/10.1177/00491241251339188
-
[44]
Suzanne Tolmeijer, Markus Kneer, Cristina Sarasua, Markus Christen, and Abraham Bernstein. 2021. Implementations in machine ethics: A survey. ACM Comput. Surv. 53, 6 (November 2021), 1–38. https://doi.org/10.1145/3419633
-
[45]
Alexey Turchin. 2019. AI Alignment Problem: “Human Values” Idea is Built Upon Many Assumptions. PhilPapers (2019)
-
[46]
Dan Card and Noah A. Smith. 2020. On Consequentialism and Fairness. Frontiers in Artificial Intelligence 3 (2020), 34. DOI: https://doi.org/10.3389/frai.2020.00034
-
[47]
Peter Vamplew, Richard Dazeley, Cameron Foale, Sally Firmin, and Jane Mummery. 2018. Human-aligned artificial intelligence is a multiobjective problem. Ethics Inf. Technol. 20, 1 (March 2018), 27–40. https://doi.org/10.1007/s10676-017-9440-6
-
[48]
Wendell Wallach and Colin Allen. 2009. Moral machines: teaching robots right from wrong. Oxford University Press, Oxford. https://doi.org/10.1093/acprof:oso/9780195374049.001.0001
-
[49]
Norbert Wiener. 1960. Some moral and technical consequences of automation. Science 131, 3410 (May 1960), 1355–1358
-
[50]
Bernard Williams. 2006. Problems of the self: philosophical papers 1956 - 1972 (Transferred to digital print ed.). Cambridge Univ. Press, Cambridge
-
[51]
Michael Walzer. 1973. Political action: The problem of dirty hands. Philos. Public Aff. 2, 2 (1973), 160–180
-
[54]
Yuyan Zhang, Jiahua Wu, Feng Yu, and Liying Xu. 2023. Moral Judgments of Human vs. AI Agents in Moral Dilemmas. Behav. Sci. 13, 2 (February 2023), 181. https://doi.org/10.3390/bs13020181
-
[55]
Bertram F. Malle, S. T. Magar, and Matthias Scheutz. 2019. AI in the sky: How people morally evaluate human and machine decisions in a lethal strike dilemma. In Robotics and Well-Being, Markus Coeckelbergh, Janina Loh, Michael Funk, Johanna Seibt, and Marco Nørskov (Eds.). Springer International Publishing, Cham, 111–133
-
[56]
Yixiao Zhang and Jiang Lan. 2025. The Practical Problems of Value Alignment and the Chinese Approach. Ideological and Theoretical Education 5, 29–36. https://doi.org/10.16075/j.cnki.cn31-1220/g4.2025.05.003
-
[57]
Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, Cambridge
-
[58]
Mohammad Atari, Jonathan Haidt, Jesse Graham, Sena Koleva, Sean T. Stevens, and Morteza Dehghani. 2023. Morality Beyond the WEIRD: How the Nomological Network of Morality Varies Across Cultures. Journal of Personality and Social Psychology 125, 5 (2023), 1179–1224. https://doi.org/10.1037/pspp0000470
-
[59]
Jonathan Haidt and Jesse Graham. 2007. When Morality Opposes Justice: Conservatives Have Moral Intuitions that Liberals may not Recognize. Social Justice Research 20, 1 (2007), 98–116. https://doi.org/10.1007/s11211-007-0034-z
-
[60]
Jesse Graham, Brian A. Nosek, Jonathan Haidt, Ravi Iyer, Spassena Koleva, and Peter H. Ditto. 2011. Mapping the Moral Domain. Journal of Personality and Social Psychology 101, 2 (2011), 366–385. https://doi.org/10.1037/a0021847