MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents
Pith reviewed 2026-05-22 10:46 UTC · model grok-4.3
The pith
MoralityGym separates task performance from ethical scoring so agents can be tested on handling ordered moral norms in sequential choices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Morality Chains formalize moral norms as ordered deontic constraints; MoralityGym supplies 98 ethical-dilemma environments in which agents must act under these constraints; and a separate Morality Metric quantifies adherence to the hierarchy without conflating it with task reward, thereby enabling systematic, psychology- and philosophy-informed evaluation of hierarchical moral alignment in sequential decision makers.
What carries the argument
Morality Chains, a formalism that encodes moral norms as ordered deontic constraints so that higher-ranked norms can override lower ones inside the benchmark environments.
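One way to picture "higher-ranked norms can override lower ones" is a lexicographic comparison of per-norm violation counts. The sketch below is illustrative only: the norm names, the violation vectors, and the `prefer` helper are hypothetical, and the paper's actual formalism and scoring may differ.

```python
from typing import Sequence

# Hypothetical chain, highest-ranked norm first (names are illustrative only).
CHAIN = ("do_not_kill", "do_not_injure", "do_not_damage_property")


def prefer(a: Sequence[int], b: Sequence[int]) -> int:
    """Compare two violation vectors lexicographically.

    Each vector holds violation counts ordered like CHAIN. Returns -1 if `a`
    is morally preferable, 1 if `b` is, and 0 if they tie on every norm.
    """
    if len(a) != len(CHAIN) or len(b) != len(CHAIN):
        raise ValueError("violation vectors must match the chain length")
    for va, vb in zip(a, b):
        if va != vb:
            # The first (highest-ranked) norm on which they differ decides.
            return -1 if va < vb else 1
    return 0


# A single violation of the top norm outweighs many lower-ranked violations.
swerve = (0, 2, 5)
stay = (1, 0, 0)
print(prefer(swerve, stay))  # -1: `swerve` is preferable under this ordering
```

Under this assumed ordering, no number of lower-ranked violations can compensate for one violation higher in the chain, which is the behavior a flat (unordered) constraint budget does not capture.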
If this is right
- Safe RL agents exhibit measurable shortcomings when required to respect layered rather than flat moral constraints.
- Moral evaluation can be performed independently of task reward, opening the way to modular training that adds ethical oversight without rewriting the original objective.
- The benchmark supplies a concrete testbed for checking whether an agent resolves norm conflicts in a transparent and consistent order.
- Future agents built on this separation can be assessed for reliability in settings where multiple human norms apply at once.
Where Pith is reading between the lines
- The same ordered-constraint approach could be applied to non-moral rule systems such as legal or safety regulations that also carry explicit precedence.
- If the 98 dilemmas prove too narrow, expanding the set with dilemmas drawn from documented cultural variations would test whether the metric remains stable across different moral orderings.
- Training loops that optimize directly against the Morality Metric could produce agents whose behavior changes measurably when the hierarchy is altered, giving a controllable way to study alignment sensitivity.
Load-bearing premise
The 98 dilemmas and the ordered-constraint structure are taken to capture the essential hierarchy of human moral norms without major omissions or cultural skew.
What would settle it
A direct comparison in which human participants and agents face the same 98 dilemmas; if the agents' Morality Metric scores show no reliable correlation with averaged human moral judgments on the same problems, the claim that the benchmark evaluates genuine hierarchical alignment would be undermined.
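A minimal sketch of the comparison described above, assuming one agent Morality Metric score and one averaged human judgment per dilemma. The numbers are made-up placeholders, not data from the paper, and Pearson correlation is only one possible choice of statistic.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


# Placeholder values: one entry per dilemma (hypothetical, for illustration).
agent_scores = [0.9, 0.4, 0.7, 0.2]
human_ratings = [0.8, 0.5, 0.6, 0.1]
r = pearson(agent_scores, human_ratings)
print(f"r = {r:.3f}")
```

A coefficient near zero across the full set of dilemmas would be the kind of result that undermines the claim; a rank-based statistic could be substituted if the two scales are not expected to relate linearly.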
Original abstract
Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Morality Chains, a formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 trolley-dilemma-style problems implemented as Gymnasium environments. It decouples task-solving from moral evaluation through a novel Morality Metric to assess hierarchical moral alignment in sequential decision-making agents. Baseline experiments using Safe RL methods are reported to reveal limitations in current approaches, positioning the work as a foundation for integrating insights from psychology and philosophy into ethical AI evaluation.
Significance. If the central assumptions hold, the benchmark and the Morality Chains formalism could meaningfully advance the evaluation of norm-sensitive reasoning by enabling interdisciplinary integration and highlighting gaps in Safe RL. The decoupling of task and moral evaluation and the provision of a reproducible Gymnasium-based benchmark are explicit strengths that support future falsifiable testing and extension by the community.
major comments (2)
- [Benchmark construction and abstract] The strongest claim—that decoupling task-solving from moral evaluation plus the Morality Metric enables integration of psychology and philosophy insights—depends on the 98 dilemmas and Morality Chains adequately representing hierarchical human moral norms. The manuscript does not provide explicit selection criteria or coverage analysis for these dilemmas (e.g., in the benchmark construction section), leaving open the risk that omitted conflict types or cultural assumptions undermine generalizability of the baseline limitations.
- [Baseline experiments] Baseline results are described as revealing 'key limitations' in Safe RL methods, yet the abstract and results section lack precise definitions of the Morality Metric, statistical controls, variance reporting, or exact problem-construction details. This makes the support for the claimed limitations preliminary and load-bearing for the paper's call for more principled ethical decision-making approaches.
minor comments (2)
- [Formalism introduction] Add a dedicated early section or appendix formally defining Morality Chains with examples of ordered deontic constraints to improve accessibility for readers outside moral philosophy.
- [Figures] Ensure all figures showing environment layouts or agent trajectories include clear axis labels, legend explanations, and scale information for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important areas for clarifying the benchmark's construction and strengthening the empirical claims. We address each major comment below and will incorporate revisions to improve transparency and rigor without altering the core contributions.
- Referee: [Benchmark construction and abstract] The strongest claim, that decoupling task-solving from moral evaluation plus the Morality Metric enables integration of psychology and philosophy insights, depends on the 98 dilemmas and Morality Chains adequately representing hierarchical human moral norms. The manuscript does not provide explicit selection criteria or coverage analysis for these dilemmas (e.g., in the benchmark construction section), leaving open the risk that omitted conflict types or cultural assumptions undermine generalizability of the baseline limitations.
  Authors: We agree that explicit selection criteria and coverage analysis would strengthen the manuscript. In the revised version, we will expand the benchmark construction section to detail the sources (drawing from trolley problem variants in moral philosophy and psychology literature) and criteria for including dilemmas that feature hierarchical deontic conflicts. We will add a coverage table categorizing dilemmas by norm types (e.g., harm vs. fairness) and explicitly discuss the primarily Western philosophical basis of the current set along with limitations on cultural generalizability and plans for future extensions.
  Revision: yes
- Referee: [Baseline experiments] Baseline results are described as revealing 'key limitations' in Safe RL methods, yet the abstract and results section lack precise definitions of the Morality Metric, statistical controls, variance reporting, or exact problem-construction details. This makes the support for the claimed limitations preliminary and load-bearing for the paper's call for more principled ethical decision-making approaches.
  Authors: We concur that greater precision is needed to support the claims about limitations in Safe RL. We will update the abstract and results section to include the full definition and formula for the Morality Metric, report means with standard deviations and statistical tests across multiple random seeds, and provide additional specifics on environment parameters and dilemma generation. These revisions will make the evidence for baseline limitations more robust and reproducible.
  Revision: yes
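The reporting the authors commit to (means with standard deviations across random seeds) could look like the following minimal sketch. The method names and score values are placeholders rather than results from the paper, and `summarize` is a hypothetical helper.

```python
import statistics
from typing import Dict, List, Tuple


def summarize(scores_by_method: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) of per-seed scores per method."""
    summary = {}
    for method, scores in scores_by_method.items():
        if len(scores) < 2:
            raise ValueError(f"{method}: need at least two seeds for a std dev")
        summary[method] = (statistics.mean(scores), statistics.stdev(scores))
    return summary


# Placeholder per-seed Morality Metric scores (hypothetical values).
runs = {
    "method_a": [0.61, 0.58, 0.65, 0.60, 0.62],
    "method_b": [0.43, 0.51, 0.39, 0.47, 0.45],
}
for method, (mean, std) in summarize(runs).items():
    print(f"{method}: {mean:.3f} +/- {std:.3f} over {len(runs[method])} seeds")
```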
Circularity Check
No circularity: benchmark introduction with independent formalism and metric
This is a benchmark paper introducing Morality Chains (ordered deontic constraints) and the MoralityGym environments with 98 dilemmas plus a Morality Metric. No derivation chain, equations, or predictions appear in the provided text. The decoupling of task-solving from moral evaluation and the claimed integration of psychology/philosophy insights are presented as design features of the benchmark itself, not as outputs derived from or forced by internal fits, self-citations, or renamed inputs. The central claims rest on the external evaluability of the Gymnasium environments and dilemmas rather than any self-referential reduction. This matches the default expectation for non-derivational benchmark work and receives the lowest circularity score.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Moral norms can be represented as ordered deontic constraints
invented entities (2)
- Morality Chains: no independent evidence
- Morality Metric: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation, washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Morality Chain $\bar{N}$ is an ordered set of $k$ norms ... $f_1 > f_2 > \dots > f_k$. ... $w_{i-1} = \left(\sum_{j=i}^{k} w_j + 1\right) \cdot \frac{1}{\beta}$
- IndisputableMonolith/Foundation/AbsoluteFloorClosure, absolute_floor_iff_bare_distinguishability (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments
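The weight recursion quoted in the first entry above, w_{i-1} = (sum_{j=i}^k w_j + 1) * (1/beta), can be unrolled from the lowest-ranked norm upward. The sketch below is an illustration under two assumptions that the excerpt does not state: the base weight w_k (set to 1 here) and the range 0 < beta <= 1. How the paper's Morality Metric then uses these weights is not shown in the excerpt and is not modeled here.

```python
from typing import List


def chain_weights(k: int, beta: float, w_last: float = 1.0) -> List[float]:
    """Return [w_1, ..., w_k] from w_{i-1} = (sum_{j=i}^k w_j + 1) / beta.

    `w_last` (the weight w_k of the lowest-ranked norm) and the restriction
    0 < beta <= 1 are assumptions; the quoted passage gives neither.
    """
    if k < 1:
        raise ValueError("a chain needs at least one norm")
    if not 0 < beta <= 1:
        raise ValueError("this sketch assumes 0 < beta <= 1")
    weights = [w_last]          # built from w_k upward, reversed at the end
    suffix_sum = w_last         # running value of sum_{j=i}^k w_j
    for _ in range(k - 1):
        w = (suffix_sum + 1.0) / beta
        weights.append(w)
        suffix_sum += w
    weights.reverse()
    return weights


print(chain_weights(3, 1.0))  # [4.0, 2.0, 1.0]
```

With beta <= 1, each weight exceeds the sum of all weights ranked below it, so a weighted sum of violations cannot let lower-ranked norms collectively outweigh a single higher-ranked one.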
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.