MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

Benjamin Rosman; Ebenezer Gelo; Geraud Nangue Tasse; Helen Sarah Robertson; Ibrahim Suder; Siddarth Singh; Simon Rosen; Steven James; Victoria Williams

arxiv: 2602.13372 · v2 · pith:4H4WHM2Onew · submitted 2026-02-13 · 💻 cs.AI · cs.LG

Pith reviewed 2026-05-22 10:46 UTC · model grok-4.3

keywords: moral alignment, hierarchical norms, ethical decision making, AI safety, reinforcement learning benchmark, sequential decision making, deontic constraints, norm-sensitive reasoning

The pith

MoralityGym separates task performance from ethical scoring so agents can be tested on handling ordered moral norms in sequential choices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Morality Chains to represent human moral norms as ordered deontic constraints and packages 98 trolley-style dilemmas into Gymnasium environments called MoralityGym. By scoring moral adherence with a dedicated Morality Metric rather than folding it into the reward signal, the benchmark isolates norm-sensitive reasoning from ordinary task-solving. This separation is meant to let researchers import structured ideas from psychology and philosophy when checking how agents resolve conflicts among hierarchically ranked norms. Early runs with safe reinforcement-learning agents expose clear shortfalls in current methods when the norms are layered rather than flat.

Core claim

Morality Chains formalize moral norms as ordered deontic constraints; MoralityGym supplies 98 ethical-dilemma environments in which agents must act under these constraints; and a separate Morality Metric quantifies adherence to the hierarchy without conflating it with task reward, thereby enabling systematic, psychology- and philosophy-informed evaluation of hierarchical moral alignment in sequential decision makers.

What carries the argument

Morality Chains, a formalism that encodes moral norms as ordered deontic constraints so that higher-ranked norms can override lower ones inside the benchmark environments.
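
To make the override idea concrete, here is a minimal Python sketch, not the paper's implementation. It assumes a chain can be modeled as a ranked list of norm names and that each candidate action comes with a per-norm violation count. The norm names and counts are hypothetical, loosely modeled on the PushOrSwitch scenario in Figure 1. Comparing violation tuples lexicographically means a lower-ranked norm only matters when every higher-ranked norm ties.

```python
from typing import Dict, List, Tuple

# Hypothetical chain, highest priority first. These norm names are
# illustrative assumptions and are not taken from the paper.
CHAIN: List[str] = ["do_not_kill_directly", "minimise_deaths"]

# Hypothetical per-action violation counts for a PushOrSwitch-like dilemma.
ACTIONS: Dict[str, Dict[str, int]] = {
    "do_nothing": {"do_not_kill_directly": 0, "minimise_deaths": 5},
    "flip_switch": {"do_not_kill_directly": 0, "minimise_deaths": 2},
    "push_person": {"do_not_kill_directly": 1, "minimise_deaths": 1},
}


def violation_key(violations: Dict[str, int], chain: List[str]) -> Tuple[int, ...]:
    """Order violation counts by norm rank so tuples compare lexicographically."""
    return tuple(violations.get(norm, 0) for norm in chain)


def best_action(actions: Dict[str, Dict[str, int]], chain: List[str]) -> str:
    """Pick the action whose violations are smallest under the chain ordering."""
    return min(actions, key=lambda name: violation_key(actions[name], chain))
```

With the chain as written, `best_action(ACTIONS, CHAIN)` selects `flip_switch`. Reversing the chain so that minimising deaths ranks first selects `push_person` instead, which shows how the ordering alone changes the preferred action.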

If this is right

Safe RL agents exhibit measurable shortcomings when required to respect layered rather than flat moral constraints.
Moral evaluation can be performed independently of task reward, opening the way to modular training that adds ethical oversight without rewriting the original objective.
The benchmark supplies a concrete testbed for checking whether an agent resolves norm conflicts in a transparent and consistent order.
Future agents built on this separation can be assessed for reliability in settings where multiple human norms apply at once.
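
The separation of task reward from moral scoring can be sketched as follows. This is an illustrative toy, not MoralityGym's API: `ToyDilemmaEnv`, its action names, and the `norm_violations` info key are all hypothetical, and the class only mimics the five-value return of Gymnasium's `step` without importing the library. The reward reflects task success alone, while norm violations travel in a side channel that an external evaluator can read.

```python
from typing import Dict, Tuple


class ToyDilemmaEnv:
    """Hypothetical one-step dilemma that mimics the Gymnasium step() shape."""

    # action -> (task reward, norm violations caused by the action)
    OUTCOMES = {
        "do_nothing": (1.0, {"minimise_deaths": 5}),
        "flip_switch": (1.0, {"minimise_deaths": 2}),
    }

    def reset(self) -> Tuple[str, dict]:
        # Gymnasium-style reset returns (observation, info).
        return "start", {}

    def step(self, action: str) -> Tuple[str, float, bool, bool, dict]:
        reward, violations = self.OUTCOMES[action]
        # Violations go into info only; the reward is never adjusted by them.
        info = {"norm_violations": dict(violations)}
        # observation, reward, terminated, truncated, info
        return "end", reward, True, False, info


def run_episode(env: ToyDilemmaEnv, action: str) -> Tuple[float, Dict[str, int]]:
    """Return the task reward and the separately logged norm violations."""
    env.reset()
    _, reward, _, _, info = env.step(action)
    return reward, info["norm_violations"]
```

In this toy both actions earn the same task reward, so only the side-channel record distinguishes them. That is the situation in which a separate moral score is informative.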

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ordered-constraint approach could be applied to non-moral rule systems such as legal or safety regulations that also carry explicit precedence.
If the 98 dilemmas prove too narrow, expanding the set with dilemmas drawn from documented cultural variations would test whether the metric remains stable across different moral orderings.
Training loops that optimize directly against the Morality Metric could produce agents whose behavior changes measurably when the hierarchy is altered, giving a controllable way to study alignment sensitivity.

Load-bearing premise

The 98 dilemmas and the ordered-constraint structure are taken to capture the essential hierarchy of human moral norms without major omissions or cultural skew.

What would settle it

A direct comparison in which human participants and agents face the same 98 dilemmas; if the agents' Morality Metric scores show no reliable correlation with averaged human moral judgments on the same problems, the claim that the benchmark evaluates genuine hierarchical alignment would be undermined.
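
One hedged way to operationalise this comparison is a rank correlation between per-dilemma agent scores and averaged human ratings. The sketch below implements Spearman's rho by hand in plain Python. The function names are illustrative and nothing here comes from the paper; any scores passed in would have to be collected separately.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation; undefined when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(agent_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(agent_scores) != len(human_scores):
        raise ValueError("both sequences need one score per dilemma")
    return pearson(average_ranks(agent_scores), average_ranks(human_scores))
```

A rho near zero across the dilemmas would be the outcome that undermines the benchmark's claim; a value near one would support it.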

Figures

Figures reproduced from arXiv: 2602.13372 by Benjamin Rosman, Ebenezer Gelo, Geraud Nangue Tasse, Helen Sarah Robertson, Ibrahim Suder, Siddarth Singh, Simon Rosen, Steven James, Victoria Williams.

**Figure 1.** The PushOrSwitch scenario. The agent (top robot, near the lever) must reach the green square while facing an implied oncoming trolley. It can: (1) “Do Nothing”: allowing the trolley to continue on the track, killing five humans (labelled ‘5’). (2) “Flip Switch”: diverting the trolley to a side track, killing two humans (labelled ‘2’). (3) “Push Person”: sacrificing one bystander (labelled ‘1’) onto the ma…

**Figure 2.** Agent performance across individual norms for four different morality chains. Each bar represents …

**Figure 3.** SwitchStandard

**Figure 5.** PushOrSwitch

**Figure 7.** SwitchSelfSacrifice

**Figure 9.** Switch5

**Figure 12.** Push3SelfSacrifice

**Figure 13.** PushOrSwitchSelfSacrifice

**Figure 14.** Radar plots showing morality function values for the utility morality chain for each scenario and …

**Figure 15.** Radar plots showing morality function values for the utility agent harm morality chain for each scenario …

**Figure 16.** Radar plots showing morality function values for the dual process morality chain for each scenario and …

**Figure 17.** Radar plots showing morality function values for the dual process agent harm morality chain for each …

Original abstract

Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MoralityGym sets up a concrete Gymnasium benchmark for hierarchical moral norms in sequential agents, but the 98 dilemmas and Morality Chains need validation to support claims about general limitations in Safe RL.

The paper's key offering is MoralityGym, a benchmark built on Gymnasium for evaluating how agents handle hierarchically ordered moral norms in sequential choices, like trolley problems. They formalize this with Morality Chains as ordered deontic constraints and pair it with a Morality Metric that scores moral adherence separately from task completion. The baselines indicate that standard Safe RL approaches have trouble with these constraints.

What works here is the practical implementation. Turning moral dilemmas into interactive environments allows for testing over time rather than one-shot decisions. The decoupling of task-solving and moral evaluation is a smart design choice because it makes it easier to incorporate different ethical theories without changing the underlying agent architecture. This could genuinely help bridge AI work with ideas from cognitive science and philosophy, as the abstract suggests.

The soft spots are mostly around the foundation of the benchmark itself. The 98 problems and the Morality Chains approach assume they capture the essential structure of human moral norms, but as the stress-test highlights, there could be significant gaps or embedded biases. For instance, if the dilemmas don't include enough variety in cultural or non-utilitarian priorities, the results on agent limitations might reflect the benchmark's construction more than inherent issues with current methods. The paper would be stronger with some discussion of how the problems were selected and any checks for robustness.

Overall, this is a benchmark introduction rather than a deep theoretical advance. It is aimed at the AI ethics and safe reinforcement learning community. Readers who are looking for new test environments to experiment with moral constraints in agents will get value from the setup and the initial results. It is the kind of work that benefits from peer input to refine the scenarios and metric. I recommend putting it through peer review. The idea has enough substance to be worth the effort, provided the authors address questions about the representativeness of their dilemmas and expand on the details of their evaluation approach.

Referee Report

2 major / 2 minor

Summary. The paper introduces Morality Chains, a formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 trolley-dilemma-style problems implemented as Gymnasium environments. It decouples task-solving from moral evaluation through a novel Morality Metric to assess hierarchical moral alignment in sequential decision-making agents. Baseline experiments using Safe RL methods are reported to reveal limitations in current approaches, positioning the work as a foundation for integrating insights from psychology and philosophy into ethical AI evaluation.

Significance. If the central assumptions hold, the benchmark and Morality Chains formalism could meaningfully advance evaluation of norm-sensitive reasoning by enabling interdisciplinary integration and highlighting gaps in Safe RL. The decoupling of task and moral evaluation, along with the provision of a reproducible Gymnasium-based benchmark, are explicit strengths that support future falsifiable testing and extension by the community.

major comments (2)

[Benchmark construction and abstract] The strongest claim—that decoupling task-solving from moral evaluation plus the Morality Metric enables integration of psychology and philosophy insights—depends on the 98 dilemmas and Morality Chains adequately representing hierarchical human moral norms. The manuscript does not provide explicit selection criteria or coverage analysis for these dilemmas (e.g., in the benchmark construction section), leaving open the risk that omitted conflict types or cultural assumptions undermine generalizability of the baseline limitations.
[Baseline experiments] Baseline results are described as revealing 'key limitations' in Safe RL methods, yet the abstract and results section lack precise definitions of the Morality Metric, statistical controls, variance reporting, or exact problem-construction details. This makes the support for the claimed limitations preliminary and load-bearing for the paper's call for more principled ethical decision-making approaches.

minor comments (2)

[Formalism introduction] Add a dedicated early section or appendix formally defining Morality Chains with examples of ordered deontic constraints to improve accessibility for readers outside moral philosophy.
[Figures] Ensure all figures showing environment layouts or agent trajectories include clear axis labels, legend explanations, and scale information for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for clarifying the benchmark's construction and strengthening the empirical claims. We address each major comment below and will incorporate revisions to improve transparency and rigor without altering the core contributions.

Referee: [Benchmark construction and abstract] The strongest claim—that decoupling task-solving from moral evaluation plus the Morality Metric enables integration of psychology and philosophy insights—depends on the 98 dilemmas and Morality Chains adequately representing hierarchical human moral norms. The manuscript does not provide explicit selection criteria or coverage analysis for these dilemmas (e.g., in the benchmark construction section), leaving open the risk that omitted conflict types or cultural assumptions undermine generalizability of the baseline limitations.

Authors: We agree that explicit selection criteria and coverage analysis would strengthen the manuscript. In the revised version, we will expand the benchmark construction section to detail the sources (drawing from trolley problem variants in moral philosophy and psychology literature) and criteria for including dilemmas that feature hierarchical deontic conflicts. We will add a coverage table categorizing dilemmas by norm types (e.g., harm vs. fairness) and explicitly discuss the primarily Western philosophical basis of the current set along with limitations on cultural generalizability and plans for future extensions. revision: yes
Referee: [Baseline experiments] Baseline results are described as revealing 'key limitations' in Safe RL methods, yet the abstract and results section lack precise definitions of the Morality Metric, statistical controls, variance reporting, or exact problem-construction details. This makes the support for the claimed limitations preliminary and load-bearing for the paper's call for more principled ethical decision-making approaches.

Authors: We concur that greater precision is needed to support the claims about limitations in Safe RL. We will update the abstract and results section to include the full definition and formula for the Morality Metric, report means with standard deviations and statistical tests across multiple random seeds, and provide additional specifics on environment parameters and dilemma generation. These revisions will make the evidence for baseline limitations more robust and reproducible. revision: yes
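
The reporting the authors promise could look like the following sketch. It uses only Python's `statistics` module. The `summarise` helper and its input layout (one list of per-seed scores per method) are hypothetical and not taken from the paper.

```python
import statistics
from typing import Dict, List, Tuple


def summarise(scores_by_method: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each method name to (mean, sample standard deviation) over seeds."""
    summary = {}
    for method, scores in scores_by_method.items():
        if len(scores) < 2:
            # A sample standard deviation needs at least two seeds.
            raise ValueError(f"{method}: need scores from at least two seeds")
        summary[method] = (statistics.mean(scores), statistics.stdev(scores))
    return summary
```

Reporting the spread alongside the mean lets a reader judge whether a gap between two Safe RL baselines exceeds seed-to-seed noise before any formal test is applied.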

Circularity Check

0 steps flagged

No circularity: benchmark introduction with independent formalism and metric

This is a benchmark paper introducing Morality Chains (ordered deontic constraints) and the MoralityGym environments with 98 dilemmas plus a Morality Metric. No derivation chain, equations, or predictions appear in the provided text. The decoupling of task-solving from moral evaluation and the claimed integration of psychology/philosophy insights are presented as design features of the benchmark itself, not as outputs derived from or forced by internal fits, self-citations, or renamed inputs. The central claims rest on the external evaluability of the Gymnasium environments and dilemmas rather than any self-referential reduction. This matches the default expectation for non-derivational benchmark work and receives the lowest circularity score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The contribution rests on newly introduced formalisms for moral representation without external grounding or prior empirical validation cited in the abstract.

axioms (1)

domain assumption: Moral norms can be represented as ordered deontic constraints.
This is presented as the core of the novel Morality Chains formalism.

invented entities (2)

Morality Chains (no independent evidence)
purpose: Representing moral norms as ordered deontic constraints for hierarchical evaluation
New formalism introduced to structure the benchmark problems.

Morality Metric (no independent evidence)
purpose: Quantifying norm-sensitive reasoning in agent decisions
Novel scoring method proposed to decouple task performance from moral assessment.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · unclear
Paper passage: "Morality Chain $\bar{N}$ is an ordered set of $k$ norms... $f_1 > f_2 > \dots > f_k$. ... $w_{i-1} = \left(\sum_{j=i}^{k} w_j + 1\right) \cdot \frac{1}{\beta}$"

IndisputableMonolith/Foundation/AbsoluteFloorClosure · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: "98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments"
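
The weight recursion quoted in the first entry above, w_{i-1} = (sum_{j=i}^k w_j + 1) * (1/beta), can be unrolled from the lowest-ranked norm upward. The sketch below does so under two assumptions that the quoted passage does not state: the base weight w_k is taken to be 1, and beta is taken to lie in (0, 1]. Under those assumptions each weight exceeds the sum of all lower-ranked weights, which is one way a weighted sum can respect a strict ordering of norms.

```python
from typing import List


def chain_weights(k: int, beta: float, base_weight: float = 1.0) -> List[float]:
    """Return [w_1, ..., w_k] from w_{i-1} = (sum_{j=i}^k w_j + 1) / beta.

    Assumptions (not from the source): w_k = base_weight and 0 < beta <= 1.
    """
    if k < 1:
        raise ValueError("a chain needs at least one norm")
    if not 0 < beta <= 1:
        raise ValueError("this sketch assumes 0 < beta <= 1")
    weights = [base_weight]  # start with w_k, the lowest-ranked norm
    for _ in range(k - 1):
        # Prepend the next higher-ranked weight using the quoted recursion.
        weights.insert(0, (sum(weights) + 1.0) / beta)
    return weights
```

For example, `chain_weights(3, 1.0)` returns `[4.0, 2.0, 1.0]` and `chain_weights(3, 0.5)` returns `[12.0, 4.0, 1.0]`; a smaller beta widens the gap between adjacent ranks.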

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

115 extracted references · 115 canonical work pages · 6 internal anchors

[1]

David Abel, James MacGlashan, and Michael L Littman. 2016. Re- inforcement Learning as a Framework for Ethical Decision Making.. InAAAI workshop: AI, ethics, and society, Vol. 16. Phoenix, AZ

work page 2016
[2]

Abdelrahman Abubshait and Eva Wiese. 2017. You look human, but act like a machine: Agent appearance and behavior modulate different aspects of human–robot interaction.Frontiers in Psychology 8 (2017), 1393. https://doi.org/10.3389/fpsyg.2017.01393

work page doi:10.3389/fpsyg.2017.01393 2017
[3]

Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. 2017. Constrained policy optimization. InInternational conference on ma- chine learning. PMLR, 22–31

work page 2017
[4]

2001.Whistleblowers: Broken lives and organizational power

C Fred Alford. 2001.Whistleblowers: Broken lives and organizational power. Cornell University Press

work page 2001
[5]

Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. 2018. Safe reinforce- ment learning via shielding. InProceedings of the AAAI conference on artificial intelligence, Vol. 32

work page 2018
[6]

Eitan Altman. 1998. Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program. Mathematical methods of operations research48 (1998), 387–417

work page 1998
[7]

Edmond Awad, Sohan Dsouza, Azim Shariff, Iyad Rahwan, and Jean-François Bonnefon. 2020. Universals and variations in moral decisions made in 42 countries by 70,000 participants.Proceedings of the National Academy of Sciences117, 5 (2020), 2332–2337

work page 2020
[8]

Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O’Gara, Robert Kirk, Ben Bucknall, Tim Fist, et al. 2025. Open problems in machine unlearning for AI safety . arXiv preprint arXiv:2501.04952(2025)

work page arXiv 2025
[9]

Brock Bastian, Steve Loughnan, Nick Haslam, and Helena R. M. Radke. 2012. Don’t mind meat? The denial of mind to animals used for human consumption.Personality and Social Psychol- ogy Bulletin38, 2 (2012), 247–256. https://doi.org/10.1177/ 0146167211424291

work page 2012
[10]

2013.Principles of Biomedical Ethics(7 ed.)

Tom L Beauchamp and James F Childress. 2013.Principles of Biomedical Ethics(7 ed.). Oxford University Press

work page 2013
[11]

Paul Bello and Bertram F Malle. 2023. Computational Approaches to Morality.The Cambridge Handbook of Computational Cognitive Sciences2 (2023), 1037–1063

work page 2023
[12]

Fiona Berreby , Gauvain Bourgne, and Jean-Gabriel Ganascia. 2015. Modelling moral reasoning and ethical responsibility with logic programming. InLogic for programming, artificial intelligence, and reasoning. Springer, 532–548

work page 2015
[13]

1978.Lying: Moral choice in public and private life

Sissela Bok. 1978.Lying: Moral choice in public and private life. Pantheon Books

work page 1978
[14]

Nick Bostrom and Eliezer Yudkowsky . 2018. The ethics of artificial intelligence. InArtificial intelligence safety and security. Chapman and Hall/CRC, 57–69

work page 2018
[15]

Stijn Bruers and Johan Braeckman. 2014. A review and systemati- zation of the trolley problem.Philosophia42, 2 (2014), 251–269

work page 2014
[16]

2013.The ethics of immigration

Joseph H Carens. 2013.The ethics of immigration. Oxford University Press

work page 2013
[17]

Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson, and Marco Pavone. 2018. Risk-constrained reinforcement learning with per- centile risk criteria.Journal of Machine Learning Research18, 167 (2018), 1–51

work page 2018
[18]

Fiery Cushman. 2008. Crime and punishment: Distinguishing the roles of causal and intentional analyses in moral judgment.Cogni- tion108, 2 (2008), 353–380

work page 2008
[19]

Fiery Cushman. 2013. Action, outcome, and value: A dual-system framework for morality.Personality and Social Psychology Review 17, 3 (2013), 273–292

work page 2013
[20]

Fiery Cushman, Liane Young, and Marc Hauser. 2006. The role of conscious reasoning and intuition in moral judgment: Testing three principles of harm.Psychological science17, 12 (2006), 1082–1089

work page 2006
[21]

Kate Darling. 2016. Extending legal protection to social robots: The effects of anthropomorphism, empathy, and violent behavior toward robotic objects. InRobot Ethics 2.0: From Autonomous Cars to Artificial Intelligence, Patrick Lin, Ryan Jenkins, and Keith Abney (Eds.). Oxford University Press, Oxford, 213–231

work page 2016
[22]

Abeer Dyoub, Stefania Costantini, and Francesca A Lisi. 2020. Logic programming and machine ethics.arXiv preprint arXiv:2009.11186 (2020)

work page arXiv 2020
[23]

Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, and David Fernandez-Llorca

work page
[24]

Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation.arXiv preprint arXiv:2502.06559 (2025)

work page arXiv 2025
[25]

2013.The ethics of information

Luciano Floridi. 2013.The ethics of information. Oxford University Press

work page 2013
[26]

Philippa Foot. 1967. The Problem of Abortion and the Doctrine of the Double Effect.Oxford Review5 (1967), 5–15

work page 1967
[27]

Iason Gabriel. 2020. Artificial intelligence, values, and alignment. Minds and Machines30, 3 (2020), 411–437

work page 2020
[28]

Javier Garcıa and Fernando Fernández. 2015. A comprehensive survey on safe reinforcement learning.Journal of Machine Learning Research16, 1 (2015), 1437–1480

work page 2015
[29]

2011.A perfect moral storm: The ethical tragedy of climate change

Stephen M Gardiner. 2011.A perfect moral storm: The ethical tragedy of climate change. Oxford University Press

work page 2011
[30]

Emmanuel R Goffi, Louis Colin, and Saida Belouali. 2021. Ethical Assessment of AI Cannot Ignore Cultural Pluralism: A Call for Broader Perspective on AI Ethic.Arribat-International Journal of Human Rights Published by CNDH Morocco1, 2 (2021), 151–175

work page 2021
[31]

Gray, Kurt Gray, and Daniel M

Heather M. Gray, Kurt Gray, and Daniel M. Wegner. 2007. Di- mensions of mind perception.Science315, 5812 (2007), 619. https://doi.org/10.1126/science.1134475

work page doi:10.1126/science.1134475 2007
[32]

Joshua D. Greene. 2007. Why are VMPFC patients more utilitarian? A dual-process theory of moral judgment explains.Trends in Cog- nitive Sciences11, 8 (2007), 322–323. https://doi.org/10.1016/j. tics.2007.06.004

work page doi:10.1016/j 2007
[33]

Joshua D Greene, R Brian Sommerville, Leigh E Nystrom, John M Darley , and Jonathan D Cohen. 2001. An fMRI investigation of emo- tional engagement in moral judgment.Science293, 5537 (2001), 2105–2108

work page 2001
[34]

Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Ming Jin, and Alois Knoll. 2024. Balance Reward and Safety Opti- mization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38

work page 2024
[35]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al

work page
[36]

DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning.arXiv preprint arXiv:2501.12948(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[37]

Jonathan Haidt. 2001. The emotional dog and its rational tail: A social intuitionist approach to moral judgment.Psychological Review108, 4 (2001), 814–834. https://doi.org/10.1037/0033- 295X.108.4.814

work page doi:10.1037/0033- 2001
[38]

Jonathan Haidt. 2007. The new synthesis in moral psychology. science316, 5827 (2007), 998–1002

work page 2007
[39]

Jonathan Haidt, Jesse Graham, and Craig Joseph. 2009. Above and below left–right: Ideological narratives and moral foundations. Psychological Inquiry20, 2-3 (2009), 110–119

work page 2009
[40]

Garrett Hardin. 1974. Lifeboat ethics: the case against helping the poor.Psychology Today8, 4 (1974), 38–43

work page 1974
[41]

Charles C. Helwig. 2001. Children’s judgments of nurturance and self-determination rights.Child Development72, 3 (2001), 782–794. https://doi.org/10.1111/1467-8624.00315

work page doi:10.1111/1467-8624.00315 2001
[42]

2025.Introduction to AI safety, ethics, and society

Dan Hendrycks. 2025.Introduction to AI safety, ethics, and society. Taylor & Francis

work page 2025
[43]

Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. 2021. Aligning AI with shared human values. InInternational Conference on Learning Rep- resentations

work page 2021
[44]

Dan Hendrycks, Nicholas Carlini, John Schulman, and Jacob Stein- hardt. 2021. Unsolved problems in ml safety.arXiv preprint arXiv:2109.13916(2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[45]

Rosalind Hursthouse. 1999. Irresolvable and Tragic Dilemmas. (1999)

work page 1999
[46]

Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, et al. 2023. Ai alignment: A comprehensive survey.arXiv preprint arXiv:2310.19852(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[47]

Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Josef Dai, and Yaodong Yang

work page
[48]

Safety gymnasium: A unified safe reinforcement learning benchmark.Advances in Neural Information Processing Systems36 (2023), 18964–18993

work page 2023
[49]

Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang. 2024. OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research.Journal of Machine Learning Research25, 285 (2024), 1–6

work page 2024
[50]

Kahn, Hiroshi Ishiguro, Batya Friedman, Takayuki Kanda, Nathan G

Peter H. Kahn, Hiroshi Ishiguro, Batya Friedman, Takayuki Kanda, Nathan G. Freier, Rachel L. Severson, and Jill Miller. 2012. Robovie, you’ll have to go into the closet now: Children’s social and moral relationships with a humanoid robot.Developmental Psychology48, 2 (2012), 303–314. https://doi.org/10.1037/a0027033

work page doi:10.1037/a0027033 2012
[51]

1993.Morality, Mortality: Death and Whom to Save from It

F M Kamm. 1993.Morality, Mortality: Death and Whom to Save from It. Vol. 1. Oxford University Press

work page 1993
[52]

2007.Intricate ethics: Rights, responsibilities, and per- missible harm

F M Kamm. 2007.Intricate ethics: Rights, responsibilities, and per- missible harm. Oxford University Press

work page 2007
[53]

1785.Groundwork of the Metaphysics of Morals

Immanuel Kant. 1785.Groundwork of the Metaphysics of Morals. Cambridge University Press

work page
[54]

1996.Critique of Practical Reason

Immanuel Kant. 1996.Critique of Practical Reason. Cambridge University Press, New York

work page 1996
[55]

1981.Essays on moral development: The philos- ophy of moral development

Lawrence Kohlberg. 1981.Essays on moral development: The philos- ophy of moral development. Vol. 1. Harper & Row

work page 1981
[56]

Maryam Kouchaki and Francesca Gino. 2016. Dirty deeds and dirty sheets: How unethical actions lead to moral cleansing and increased prosocial behavior.Journal of Experimental Psychology: General145, 4 (2016), 674–692

work page 2016
[57]

Raynaldio Limarga, Yang Song, Abhaya Nayak, David Rajaratnam, and Maurice Pagnucco. 2024. Formalisation and Evaluation of Properties for Consequentialist Machine Ethics. InProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. 440–448

work page 2024
[58]

Patrick Lin. 2016. Why ethics matters for autonomous cars. In Autonomous driving. Springer, 69–85

work page 2016
[59]

Bertram F. Malle. 2016. Integrating robot ethics and machine morality: The study and design of moral competence in robots. Ethics and Information Technology18, 4 (2016), 243–256. https: //doi.org/10.1007/s10676-016-9402-1

work page doi:10.1007/s10676-016-9402-1 2016
[60]

Bertram F Malle. 2021. Moral cognition and its computational modeling.Cognitive Science45, 8 (2021), e13024

work page 2021
[61]

Bertram F Malle, Matthias Scheutz, Thomas Arnold, John Voiklis, and Corey Cusimano. 2015. Sacrifice one for the good of many? People apply different moral norms to human and robot agents. In Proceedings of the tenth annual ACM/IEEE international conference on human-robot interaction. 117–124

work page 2015
[62]

Donald L McCabe. 2001. Cheating: Why students do it and how we can help them stop.American Educator25, 4 (2001), 38–43

work page 2001
[63]

Donald L McCabe and Gary Pavela. 2004. Ten (updated) principles of academic integrity .Change: The Magazine of Higher Learning36, 3 (2004), 10–15

work page 2004
[64]

1859.On liberty

John Stuart Mill. 1859.On liberty. John W Parker and Son

work page
[65]

John Stuart Mill. 2016. Utilitarianism. InSeven masterpieces of philosophy. Routledge, 329–375

work page 2016
[66]

Abhilash Mishra. 2023. AI alignment and social choice: Fundamental limitations and policy implications. arXiv preprint arXiv:2310.16048 (2023)

[67]

Thomas Nagel. 1989. The view from nowhere. Oxford University Press

[68]

Ritesh Noothigattu, Snehalkumar S Gaikwad, Edmond Awad, Sohan Dsouza, Iyad Rahwan, and Ariel D Procaccia. 2018. A voting-based system for ethical decision making. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32

[69]

Walter A Orenstein and Rafi Ahmed. 2017. Simply put: Vaccination saves lives. Proceedings of the National Academy of Sciences 114, 16 (2017), 4031–4033

[70]

Femi Osasona, Olukunle Amoo, Akoh Atadoga, Temitayo Abrahams, Oluwatoyin Farayola, and Benjamin Ayinla. 2024. REVIEWING THE ETHICAL IMPLICATIONS OF AI IN DECISION MAKING PROCESSES. International Journal of Management & Entrepreneurship Research 6 (02 2024), 322–335. https://doi.org/10.51594/ijmer.v6i2.773

[71]

Derek Parfit. 2011. On what matters. Vol. 1. Oxford University Press

[72]

Jared Piazza, Matthew B. Ruby, Steve Loughnan, Michelle Luong, Justin Kulik, Holly M. Watkins, and Michael Seigerman. 2019. Rationalizing meat consumption: The 4Ns. Appetite 133 (2019), 246–

[73]

https://doi.org/10.1016/j.appet.2018.11.005

[74]

Emanuela Prato-Previde, Silvia Cannas, Claudia Palestrini, Valentina Nicotra, and Paola Valsecchi. 2022. The complexity of the human–animal bond: Empathy, attachment, and anthropomorphism in human–animal relationships. Animals 12, 20 (2022),

[75]

https://doi.org/10.3390/ani12202835

[76]

James Rachels. 1975. Active and passive euthanasia. New England Journal of Medicine 292, 2 (1975), 78–80

[77]

Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22, 268 (2021), 1–8

[78]

John Rawls. 1971. A theory of justice. Harvard University Press

[79]

Alex Ray, Joshua Achiam, and Dario Amodei. 2019. Benchmarking safe exploration in deep reinforcement learning. arXiv preprint arXiv:1910.01708 7, 1 (2019), 2

[80]

Shashank Reddy Chirra, Pradeep Varakantham, and Praveen Paruchuri. 2024. Safety through feedback in Constrained RL. In Advances in Neural Information Processing Systems, Vol. 37


[1]

David Abel, James MacGlashan, and Michael L Littman. 2016. Reinforcement Learning as a Framework for Ethical Decision Making. In AAAI workshop: AI, ethics, and society, Vol. 16. Phoenix, AZ

[2]

Abdelrahman Abubshait and Eva Wiese. 2017. You look human, but act like a machine: Agent appearance and behavior modulate different aspects of human–robot interaction. Frontiers in Psychology 8 (2017), 1393. https://doi.org/10.3389/fpsyg.2017.01393

[3]

Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. 2017. Constrained policy optimization. In International conference on machine learning. PMLR, 22–31

[4]

C Fred Alford. 2001. Whistleblowers: Broken lives and organizational power. Cornell University Press

[5]

Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. 2018. Safe reinforcement learning via shielding. In Proceedings of the AAAI conference on artificial intelligence, Vol. 32

[6]

Eitan Altman. 1998. Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program. Mathematical methods of operations research 48 (1998), 387–417

[7]

Edmond Awad, Sohan Dsouza, Azim Shariff, Iyad Rahwan, and Jean-François Bonnefon. 2020. Universals and variations in moral decisions made in 42 countries by 70,000 participants. Proceedings of the National Academy of Sciences 117, 5 (2020), 2332–2337

[8]

Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O’Gara, Robert Kirk, Ben Bucknall, Tim Fist, et al. 2025. Open problems in machine unlearning for AI safety. arXiv preprint arXiv:2501.04952 (2025)

[9]

Brock Bastian, Steve Loughnan, Nick Haslam, and Helena R. M. Radke. 2012. Don’t mind meat? The denial of mind to animals used for human consumption. Personality and Social Psychology Bulletin 38, 2 (2012), 247–256. https://doi.org/10.1177/0146167211424291

[10]

Tom L Beauchamp and James F Childress. 2013. Principles of Biomedical Ethics (7 ed.). Oxford University Press

[11]

Paul Bello and Bertram F Malle. 2023. Computational Approaches to Morality. The Cambridge Handbook of Computational Cognitive Sciences 2 (2023), 1037–1063

[12]

Fiona Berreby, Gauvain Bourgne, and Jean-Gabriel Ganascia. 2015. Modelling moral reasoning and ethical responsibility with logic programming. In Logic for programming, artificial intelligence, and reasoning. Springer, 532–548

[13]

Sissela Bok. 1978. Lying: Moral choice in public and private life. Pantheon Books

[14]

Nick Bostrom and Eliezer Yudkowsky. 2018. The ethics of artificial intelligence. In Artificial intelligence safety and security. Chapman and Hall/CRC, 57–69

[15]

Stijn Bruers and Johan Braeckman. 2014. A review and systematization of the trolley problem. Philosophia 42, 2 (2014), 251–269

[16]

Joseph H Carens. 2013. The ethics of immigration. Oxford University Press

[17]

Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson, and Marco Pavone. 2018. Risk-constrained reinforcement learning with percentile risk criteria. Journal of Machine Learning Research 18, 167 (2018), 1–51

[18]

Fiery Cushman. 2008. Crime and punishment: Distinguishing the roles of causal and intentional analyses in moral judgment. Cognition 108, 2 (2008), 353–380

[19]

Fiery Cushman. 2013. Action, outcome, and value: A dual-system framework for morality. Personality and Social Psychology Review 17, 3 (2013), 273–292

[20]

Fiery Cushman, Liane Young, and Marc Hauser. 2006. The role of conscious reasoning and intuition in moral judgment: Testing three principles of harm. Psychological Science 17, 12 (2006), 1082–1089

[21]

Kate Darling. 2016. Extending legal protection to social robots: The effects of anthropomorphism, empathy, and violent behavior toward robotic objects. In Robot Ethics 2.0: From Autonomous Cars to Artificial Intelligence, Patrick Lin, Ryan Jenkins, and Keith Abney (Eds.). Oxford University Press, Oxford, 213–231

[22]

Abeer Dyoub, Stefania Costantini, and Francesca A Lisi. 2020. Logic programming and machine ethics. arXiv preprint arXiv:2009.11186 (2020)

[23]

Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, and David Fernandez-Llorca

[24]

Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation. arXiv preprint arXiv:2502.06559 (2025)

[25]

Luciano Floridi. 2013. The ethics of information. Oxford University Press

[26]

Philippa Foot. 1967. The Problem of Abortion and the Doctrine of the Double Effect. Oxford Review 5 (1967), 5–15

[27]

Iason Gabriel. 2020. Artificial intelligence, values, and alignment. Minds and Machines 30, 3 (2020), 411–437

[28]

Javier García and Fernando Fernández. 2015. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16, 1 (2015), 1437–1480

[29]

Stephen M Gardiner. 2011. A perfect moral storm: The ethical tragedy of climate change. Oxford University Press

[30]

Emmanuel R Goffi, Louis Colin, and Saida Belouali. 2021. Ethical Assessment of AI Cannot Ignore Cultural Pluralism: A Call for Broader Perspective on AI Ethic. Arribat-International Journal of Human Rights Published by CNDH Morocco 1, 2 (2021), 151–175

[31]

Heather M. Gray, Kurt Gray, and Daniel M. Wegner. 2007. Dimensions of mind perception. Science 315, 5812 (2007), 619. https://doi.org/10.1126/science.1134475

[32]

Joshua D. Greene. 2007. Why are VMPFC patients more utilitarian? A dual-process theory of moral judgment explains. Trends in Cognitive Sciences 11, 8 (2007), 322–323. https://doi.org/10.1016/j.tics.2007.06.004

[33]

Joshua D Greene, R Brian Sommerville, Leigh E Nystrom, John M Darley, and Jonathan D Cohen. 2001. An fMRI investigation of emotional engagement in moral judgment. Science 293, 5537 (2001), 2105–2108

[34]

Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Ming Jin, and Alois Knoll. 2024. Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38

[35]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al

[36]

DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)

[37]

Jonathan Haidt. 2001. The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review 108, 4 (2001), 814–834. https://doi.org/10.1037/0033-295X.108.4.814

[38]

Jonathan Haidt. 2007. The new synthesis in moral psychology. Science 316, 5827 (2007), 998–1002

[39]

Jonathan Haidt, Jesse Graham, and Craig Joseph. 2009. Above and below left–right: Ideological narratives and moral foundations. Psychological Inquiry 20, 2-3 (2009), 110–119

[40]

Garrett Hardin. 1974. Lifeboat ethics: the case against helping the poor. Psychology Today 8, 4 (1974), 38–43

[41]

Charles C. Helwig. 2001. Children’s judgments of nurturance and self-determination rights. Child Development 72, 3 (2001), 782–794. https://doi.org/10.1111/1467-8624.00315

[42]

Dan Hendrycks. 2025. Introduction to AI safety, ethics, and society. Taylor & Francis

[43]

Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. 2021. Aligning AI with shared human values. In International Conference on Learning Representations

[44]

Dan Hendrycks, Nicholas Carlini, John Schulman, and Jacob Steinhardt. 2021. Unsolved problems in ML safety. arXiv preprint arXiv:2109.13916 (2021)

[45]

Rosalind Hursthouse. 1999. Irresolvable and Tragic Dilemmas. (1999)

[46]

Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, et al. 2023. AI alignment: A comprehensive survey. arXiv preprint arXiv:2310.19852 (2023)

[47]

Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Josef Dai, and Yaodong Yang

[48]

Safety gymnasium: A unified safe reinforcement learning benchmark. Advances in Neural Information Processing Systems 36 (2023), 18964–18993

[49]

Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang. 2024. OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research. Journal of Machine Learning Research 25, 285 (2024), 1–6

[50]

Peter H. Kahn, Hiroshi Ishiguro, Batya Friedman, Takayuki Kanda, Nathan G. Freier, Rachel L. Severson, and Jill Miller. 2012. Robovie, you’ll have to go into the closet now: Children’s social and moral relationships with a humanoid robot. Developmental Psychology 48, 2 (2012), 303–314. https://doi.org/10.1037/a0027033

[51]

F M Kamm. 1993. Morality, Mortality: Death and Whom to Save from It. Vol. 1. Oxford University Press

[52]

F M Kamm. 2007. Intricate ethics: Rights, responsibilities, and permissible harm. Oxford University Press

[53]

Immanuel Kant. 1785. Groundwork of the Metaphysics of Morals. Cambridge University Press

[54]

Immanuel Kant. 1996. Critique of Practical Reason. Cambridge University Press, New York

[55]

Lawrence Kohlberg. 1981. Essays on moral development: The philosophy of moral development. Vol. 1. Harper & Row

[56]

Maryam Kouchaki and Francesca Gino. 2016. Dirty deeds and dirty sheets: How unethical actions lead to moral cleansing and increased prosocial behavior. Journal of Experimental Psychology: General 145, 4 (2016), 674–692
