arxiv: 2602.13372 · v2 · pith:4H4WHM2Onew · submitted 2026-02-13 · 💻 cs.AI · cs.LG

MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

Pith reviewed 2026-05-22 10:46 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords: moral alignment, hierarchical norms, ethical decision making, AI safety, reinforcement learning benchmark, sequential decision making, deontic constraints, norm-sensitive reasoning

The pith

MoralityGym separates task performance from ethical scoring so agents can be tested on handling ordered moral norms in sequential choices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Morality Chains to represent human moral norms as ordered deontic constraints and packages 98 trolley-style dilemmas into Gymnasium environments called MoralityGym. By scoring moral adherence with a dedicated Morality Metric rather than folding it into the reward signal, the benchmark isolates norm-sensitive reasoning from ordinary task-solving. This separation is meant to let researchers import structured ideas from psychology and philosophy when checking how agents resolve conflicts among hierarchically ranked norms. Early runs with safe reinforcement-learning agents expose clear shortfalls in current methods when the norms are layered rather than flat.

Core claim

Morality Chains formalize moral norms as ordered deontic constraints; MoralityGym supplies 98 ethical-dilemma environments in which agents must act under these constraints; and a separate Morality Metric quantifies adherence to the hierarchy without conflating it with task reward, thereby enabling systematic, psychology- and philosophy-informed evaluation of hierarchical moral alignment in sequential decision makers.
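
The decoupling described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the benchmark's implementation: `ToyDilemmaEnv`, `NormEventRecorder`, `run_episode`, and `toy_moral_score` are hypothetical names, the environment only mimics the Gymnasium-style `reset`/`step` return shapes without importing Gymnasium, and the toy score is a stand-in rather than the paper's Morality Metric. The point is that the task reward passes through untouched while norm-relevant events are logged for separate scoring.

```python
class ToyDilemmaEnv:
    """Hypothetical one-step dilemma with Gymnasium-style reset/step signatures."""

    def reset(self, seed=None):
        return 0, {}  # (observation, info)

    def step(self, action):
        # Action 0 = do nothing, action 1 = flip the switch (illustrative only).
        deaths = 5 if action == 0 else 2
        reward = 1.0  # task reward: the goal is reached either way
        info = {"deaths": deaths, "intervened": action == 1}
        # (observation, reward, terminated, truncated, info)
        return 0, reward, True, False, info


class NormEventRecorder:
    """Wraps an env and logs norm-relevant info without altering the reward."""

    def __init__(self, env):
        self.env = env
        self.events = []

    def reset(self, seed=None):
        self.events = []
        return self.env.reset(seed=seed)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.events.append(info)  # moral bookkeeping happens outside the reward
        return obs, reward, terminated, truncated, info


def run_episode(env, action):
    """Runs the single-step episode and returns (task_return, logged_events)."""
    env.reset()
    _, reward, _, _, _ = env.step(action)
    return reward, list(env.events)


def toy_moral_score(events):
    """Stand-in score (assumption): fewer deaths is better."""
    return -sum(event["deaths"] for event in events)


if __name__ == "__main__":
    for action in (0, 1):
        task_return, events = run_episode(NormEventRecorder(ToyDilemmaEnv()), action)
        print(action, task_return, toy_moral_score(events))
```

Both actions earn the same task return in this toy setup, so any difference between them shows up only in the separately computed score.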

What carries the argument

Morality Chains, a formalism that encodes moral norms as ordered deontic constraints so that higher-ranked norms can override lower ones inside the benchmark environments.
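
One plausible reading of "higher-ranked norms can override lower ones" is a lexicographic comparison over per-norm violation scores. The sketch below illustrates that reading only; it is an assumption, not the paper's definition. The norm functions `no_pushing` and `fewer_deaths`, the outcome fields, and the helper `choose` are hypothetical, and the death counts are borrowed from the PushOrSwitch caption in Figure 1.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Outcome = Dict[str, Any]
Norm = Callable[[Outcome], int]  # violation score; 0 means the norm is satisfied


# Hypothetical norms, for illustration only.
def no_pushing(outcome: Outcome) -> int:
    return 1 if outcome["action"] == "push_person" else 0


def fewer_deaths(outcome: Outcome) -> int:
    return int(outcome["deaths"])


def violation_vector(outcome: Outcome, chain: Sequence[Norm]) -> Tuple[int, ...]:
    """Violation scores ordered from the highest-ranked norm to the lowest."""
    return tuple(norm(outcome) for norm in chain)


def choose(outcomes: Sequence[Outcome], chain: Sequence[Norm]) -> Outcome:
    """Picks the outcome with the lexicographically smallest violation vector.

    Python compares tuples element by element, so a lower-ranked norm only
    matters when all higher-ranked norms are tied.
    """
    return min(outcomes, key=lambda o: violation_vector(o, chain))


OUTCOMES = [
    {"action": "do_nothing", "deaths": 5},
    {"action": "flip_switch", "deaths": 2},
    {"action": "push_person", "deaths": 1},
]

deontic_first = [no_pushing, fewer_deaths]
utility_first = [fewer_deaths, no_pushing]

if __name__ == "__main__":
    print(choose(OUTCOMES, deontic_first)["action"])  # flip_switch
    print(choose(OUTCOMES, utility_first)["action"])  # push_person
```

Swapping the order of the two norms changes the selected action, which is the kind of hierarchy sensitivity the formalism is meant to expose.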

If this is right

  • Safe RL agents exhibit measurable shortcomings when required to respect layered rather than flat moral constraints.
  • Moral evaluation can be performed independently of task reward, opening the way to modular training that adds ethical oversight without rewriting the original objective.
  • The benchmark supplies a concrete testbed for checking whether an agent resolves norm conflicts in a transparent and consistent order.
  • Future agents built on this separation can be assessed for reliability in settings where multiple human norms apply at once.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same ordered-constraint approach could be applied to non-moral rule systems such as legal or safety regulations that also carry explicit precedence.
  • If the 98 dilemmas prove too narrow, expanding the set with dilemmas drawn from documented cultural variations would test whether the metric remains stable across different moral orderings.
  • Training loops that optimize directly against the Morality Metric could produce agents whose behavior changes measurably when the hierarchy is altered, giving a controllable way to study alignment sensitivity.

Load-bearing premise

The 98 dilemmas and the ordered-constraint structure are taken to capture the essential hierarchy of human moral norms without major omissions or cultural skew.

What would settle it

A direct comparison in which human participants and agents face the same 98 dilemmas; if the agents' Morality Metric scores show no reliable correlation with averaged human moral judgments on the same problems, the claim that the benchmark evaluates genuine hierarchical alignment would be undermined.
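
A comparison of this kind could be summarized with a rank correlation. The following is a minimal sketch, assuming per-dilemma agent scores and averaged human ratings are available as two equal-length lists; the function names are hypothetical and the numbers in the usage block are placeholders, not data from the paper. Spearman's coefficient is computed here as the Pearson correlation of average ranks, using only the standard library.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assigns 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson correlation of average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Placeholder values for five hypothetical dilemmas (not real results).
    agent_scores = [0.2, 0.9, 0.4, 0.7, 0.1]
    human_means = [1.5, 4.8, 2.0, 4.1, 2.2]
    print(round(spearman(agent_scores, human_means), 3))
```

A coefficient near zero across the full dilemma set would be the outcome that undermines the benchmark's claim; a real analysis would also need a significance test and a large enough sample of human raters.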

Figures

Figures reproduced from arXiv: 2602.13372 by Benjamin Rosman, Ebenezer Gelo, Geraud Nangue Tasse, Helen Sarah Robertson, Ibrahim Suder, Siddarth Singh, Simon Rosen, Steven James, Victoria Williams.

Figure 1: The PushOrSwitch scenario. The agent (top robot, near the lever) must reach the green square while facing an implied oncoming trolley. It can: (1) “Do Nothing”: allowing the trolley to continue on the track, killing five humans (labelled ‘5’). (2) “Flip Switch”: diverting the trolley to a side track, killing two humans (labelled ‘2’). (3) “Push Person”: sacrificing one bystander (labelled ‘1’) onto the ma…
Figure 2: Agent performance across individual norms for four different morality chains. Each bar represents …
Figure 3: SwitchStandard
Figure 5: PushOrSwitch
Figure 7: SwitchSelfSacrifice
Figure 9: Switch5
Figure 12: Push3SelfSacrifice
Figure 13: PushOrSwitchSelfSacrifice
Figure 14: Radar plots showing morality function values for the utility morality chain for each scenario and …
Figure 15: Radar plots showing morality function values for the utility agent harm morality chain for each scenario …
Figure 16: Radar plots showing morality function values for the dual process morality chain for each scenario and …
Figure 17: Radar plots showing morality function values for the dual process agent harm morality chain for each …

Original abstract

Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Morality Chains, a formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 trolley-dilemma-style problems implemented as Gymnasium environments. It decouples task-solving from moral evaluation through a novel Morality Metric to assess hierarchical moral alignment in sequential decision-making agents. Baseline experiments using Safe RL methods are reported to reveal limitations in current approaches, positioning the work as a foundation for integrating insights from psychology and philosophy into ethical AI evaluation.

Significance. If the central assumptions hold, the benchmark and Morality Chains formalism could meaningfully advance evaluation of norm-sensitive reasoning by enabling interdisciplinary integration and highlighting gaps in Safe RL. The decoupling of task and moral evaluation and the provision of a reproducible Gymnasium-based benchmark are explicit strengths that support future falsifiable testing and extension by the community.

major comments (2)
  1. [Benchmark construction and abstract] The strongest claim—that decoupling task-solving from moral evaluation plus the Morality Metric enables integration of psychology and philosophy insights—depends on the 98 dilemmas and Morality Chains adequately representing hierarchical human moral norms. The manuscript does not provide explicit selection criteria or coverage analysis for these dilemmas (e.g., in the benchmark construction section), leaving open the risk that omitted conflict types or cultural assumptions undermine generalizability of the baseline limitations.
  2. [Baseline experiments] Baseline results are described as revealing 'key limitations' in Safe RL methods, yet the abstract and results section lack precise definitions of the Morality Metric, statistical controls, variance reporting, or exact problem-construction details. This makes the support for the claimed limitations preliminary and load-bearing for the paper's call for more principled ethical decision-making approaches.
minor comments (2)
  1. [Formalism introduction] Add a dedicated early section or appendix formally defining Morality Chains with examples of ordered deontic constraints to improve accessibility for readers outside moral philosophy.
  2. [Figures] Ensure all figures showing environment layouts or agent trajectories include clear axis labels, legend explanations, and scale information for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for clarifying the benchmark's construction and strengthening the empirical claims. We address each major comment below and will incorporate revisions to improve transparency and rigor without altering the core contributions.

Point-by-point responses
  1. Referee: [Benchmark construction and abstract] The strongest claim—that decoupling task-solving from moral evaluation plus the Morality Metric enables integration of psychology and philosophy insights—depends on the 98 dilemmas and Morality Chains adequately representing hierarchical human moral norms. The manuscript does not provide explicit selection criteria or coverage analysis for these dilemmas (e.g., in the benchmark construction section), leaving open the risk that omitted conflict types or cultural assumptions undermine generalizability of the baseline limitations.

    Authors: We agree that explicit selection criteria and coverage analysis would strengthen the manuscript. In the revised version, we will expand the benchmark construction section to detail the sources (drawing from trolley problem variants in moral philosophy and psychology literature) and criteria for including dilemmas that feature hierarchical deontic conflicts. We will add a coverage table categorizing dilemmas by norm types (e.g., harm vs. fairness) and explicitly discuss the primarily Western philosophical basis of the current set along with limitations on cultural generalizability and plans for future extensions. revision: yes

  2. Referee: [Baseline experiments] Baseline results are described as revealing 'key limitations' in Safe RL methods, yet the abstract and results section lack precise definitions of the Morality Metric, statistical controls, variance reporting, or exact problem-construction details. This makes the support for the claimed limitations preliminary and load-bearing for the paper's call for more principled ethical decision-making approaches.

    Authors: We concur that greater precision is needed to support the claims about limitations in Safe RL. We will update the abstract and results section to include the full definition and formula for the Morality Metric, report means with standard deviations and statistical tests across multiple random seeds, and provide additional specifics on environment parameters and dilemma generation. These revisions will make the evidence for baseline limitations more robust and reproducible. revision: yes
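
The reporting promised in this response (means with standard deviations across random seeds) could look like the sketch below. It is an illustration only: `summarize` is a hypothetical helper, the method names are generic labels, and the per-seed numbers are placeholders rather than results from the paper.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def summarize(scores_by_method: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Returns (mean, sample standard deviation) of per-seed scores for each method."""
    summary = {}
    for method, scores in scores_by_method.items():
        if len(scores) < 2:
            raise ValueError(f"{method}: need at least two seeds for a standard deviation")
        summary[method] = (mean(scores), stdev(scores))
    return summary


if __name__ == "__main__":
    # Placeholder per-seed scores (not real data).
    per_seed = {
        "method_a": [0.61, 0.58, 0.66, 0.63, 0.60],
        "method_b": [0.72, 0.49, 0.80, 0.55, 0.68],
    }
    for method, (avg, spread) in summarize(per_seed).items():
        print(f"{method}: {avg:.3f} ± {spread:.3f} over {len(per_seed[method])} seeds")
```

Reporting the spread alongside the mean makes it visible when two methods differ by less than their seed-to-seed variation.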

Circularity Check

0 steps flagged

No circularity: benchmark introduction with independent formalism and metric

full rationale

This is a benchmark paper introducing Morality Chains (ordered deontic constraints) and the MoralityGym environments with 98 dilemmas plus a Morality Metric. No derivation chain, equations, or predictions appear in the provided text. The decoupling of task-solving from moral evaluation and the claimed integration of psychology/philosophy insights are presented as design features of the benchmark itself, not as outputs derived from or forced by internal fits, self-citations, or renamed inputs. The central claims rest on the external evaluability of the Gymnasium environments and dilemmas rather than any self-referential reduction. This matches the default expectation for non-derivational benchmark work and receives the lowest circularity score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The contribution rests on newly introduced formalisms for moral representation without external grounding or prior empirical validation cited in the abstract.

axioms (1)
  • [domain assumption] Moral norms can be represented as ordered deontic constraints.
    This is presented as the core of the novel Morality Chains formalism.
invented entities (2)
  • Morality Chains (no independent evidence)
    purpose: Representing moral norms as ordered deontic constraints for hierarchical evaluation
    New formalism introduced to structure the benchmark problems.
  • Morality Metric (no independent evidence)
    purpose: Quantifying norm-sensitive reasoning in agent decisions
    Novel scoring method proposed to decouple task performance from moral assessment.
