Toward Virtuous Reinforcement Learning: A Critique and Roadmap

Majid Ghasemi; Mark Crowley

arXiv: 2512.04246 · v2 · submitted 2025-12-03 · 💻 cs.AI

Pith reviewed 2026-05-17 01:52 UTC · model grok-4.3

classification 💻 cs.AI

keywords virtuous reinforcement learning · machine ethics · reinforcement learning · virtue ethics · multi-agent RL · ethical AI · policy dispositions · moral trade-offs

The pith

Reinforcement learning should treat ethics as stable policy-level habits rather than rules or scalar rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that rule-based methods in ethical RL often break under ambiguity and nonstationarity while failing to build lasting habits. Reward-based approaches, especially single-objective ones, tend to hide moral trade-offs and encourage proxy gaming. In response the authors propose evaluating agents on durable dispositions that hold across changing incentives and contexts. They outline a roadmap that combines social learning from exemplars, multi-objective optimization to keep value conflicts visible, affinity regularization for trait stability, and explicit operationalization of ethical traditions to surface cultural assumptions.

Core claim

Common patterns in machine ethics for RL either encode duties as constraints that struggle with nonstationarity or compress diverse values into a single reward that obscures trade-offs. The paper argues that ethics should instead be treated as policy-level dispositions: relatively stable habits that persist when incentives, partners, or contexts change. A four-part roadmap supports this: social learning in multi-agent RL, multi-objective and constrained formulations, affinity-based regularization, and operationalizing ethical traditions as practical control signals.

What carries the argument

Policy-level dispositions, defined as relatively stable habits that hold up when incentives or contexts change and implemented via the four-component roadmap of social learning, multi-objective optimization, affinity regularization, and explicit ethical traditions.

If this is right

Ethical evaluation moves beyond rule compliance or scalar returns to trait summaries, durability under interventions, and explicit reporting of moral trade-offs.
Agents acquire virtue-like patterns through social learning from imperfect but normatively informed exemplars in multi-agent settings.
Value conflicts remain visible and are managed by multi-objective formulations and risk-aware criteria that guard against harm.
Affinity-based regularization supports trait stability under distribution shift while allowing norms to evolve over time.
Benchmarks for ethical RL must make value and cultural assumptions explicit rather than leaving them implicit in reward design.
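
The point about keeping value conflicts visible can be made concrete with a small Pareto filter. This is a minimal sketch under stated assumptions, not the authors' method: the two value dimensions (task reward, harm avoidance), the return vectors, and the function names are all hypothetical. Instead of collapsing the dimensions into one scalar, it reports every policy that no other policy beats on all dimensions at once.

```python
from typing import List, Tuple

Returns = Tuple[float, ...]  # one entry per value dimension (hypothetical)

def dominates(a: Returns, b: Returns) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates: List[Returns]) -> List[Returns]:
    """Keep every return vector that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical (task_reward, harm_avoidance) returns for four policies.
policies = [(10.0, 0.2), (8.0, 0.9), (7.0, 0.5), (9.0, 0.6)]
front = pareto_front(policies)
```

Here three of the four policies survive, so the trade-off between reward and harm avoidance stays on the table for a human or a downstream criterion to resolve, rather than being settled silently by a weighting.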

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This framing could reduce the risk of agents learning brittle ethical shortcuts that collapse outside narrow training distributions.
The emphasis on reporting moral trade-offs may support more transparent auditing of deployed RL systems in domains such as healthcare or autonomous vehicles.
Operationalizing multiple ethical traditions side by side could surface practical methods for handling value pluralism in global AI governance.
Testing the roadmap in long-horizon multi-agent environments might reveal whether virtue-like stability emerges more reliably than from purely reward-shaped baselines.

Load-bearing premise

The four proposed components can be combined to produce stable virtue-like behavior without introducing new forms of ambiguity, implementation difficulty, or cultural bias that undermine the approach.

What would settle it

A controlled experiment in which agents trained via the four-component roadmap fail to maintain consistent ethical behavior or exhibit increased proxy gaming when incentives or partner behaviors are shifted after training.
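
One way such an experiment could score an agent is a retention ratio: how often a trait-expressing action is chosen after an intervention, relative to before it. The sketch below is an illustration only; the paper gives no concrete metric, and the toy policies, the `share_bonus` field, and the `retention` function are hypothetical names introduced here.

```python
from typing import Callable, Dict, List

Context = Dict[str, float]
Policy = Callable[[Context], str]

def trait_rate(policy: Policy, contexts: List[Context], trait_action: str) -> float:
    """Fraction of contexts in which the policy picks the trait-expressing action."""
    hits = sum(policy(c) == trait_action for c in contexts)
    return hits / len(contexts)

def retention(policy: Policy, baseline: List[Context],
              shifted: List[Context], trait_action: str) -> float:
    """Trait rate after the intervention relative to the rate before it."""
    before = trait_rate(policy, baseline, trait_action)
    after = trait_rate(policy, shifted, trait_action)
    return after / before if before > 0 else 0.0

# Two toy policies: one follows the incentive, one shares regardless of payoff.
def reward_chaser(ctx: Context) -> str:
    return "share" if ctx["share_bonus"] > 0 else "hoard"

def disposed(ctx: Context) -> str:
    return "share"

baseline = [{"share_bonus": 1.0} for _ in range(4)]
flipped = [{"share_bonus": -1.0} for _ in range(4)]  # incentive-flip intervention

chaser_retention = retention(reward_chaser, baseline, flipped, "share")
disposed_retention = retention(disposed, baseline, flipped, "share")
```

Both toy policies look identical before the flip; only the intervention separates them, which is the reason the evaluation has to perturb incentives rather than measure behavior on the training distribution alone.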

Figures

Figures reproduced from arXiv: 2512.04246 by Majid Ghasemi, Mark Crowley.

**Figure 1.** Policy orchestration for ethical behavior. An orchestrator layer first enforces non-negotiable constraints (deontic guard), then selects between a virtue-oriented policy (V) and a utilitarian policy (U) based on context, with a safe fallback. This composition preserves hard safety while enabling context-sensitive trade-offs, improving disposition retention under partner-swap and incentive-flip intervention…
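
The control flow in the caption can be sketched in a few lines. This is a rough reading of the figure, not the authors' implementation: the action names, the state fields, and every function below are hypothetical, and the fallback action is simply assumed to be safe.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, float]
Policy = Callable[[State], str]

def orchestrate(
    state: State,
    actions: Iterable[str],
    is_forbidden: Callable[[State, str], bool],
    virtue: Policy,
    utilitarian: Policy,
    use_virtue: Callable[[State], bool],
    fallback: str,
) -> str:
    """Guard first, then choose V or U by context, else return the safe fallback."""
    # Deontic guard: drop every action that violates a hard constraint.
    allowed = {a for a in actions if not is_forbidden(state, a)}
    # Context-sensitive selection between the two sub-policies.
    policy = virtue if use_virtue(state) else utilitarian
    proposal = policy(state)
    # The fallback is assumed to be safe by construction.
    return proposal if proposal in allowed else fallback

ACTIONS = ["help", "ignore", "deceive", "wait"]

def is_forbidden(state: State, action: str) -> bool:
    return action == "deceive"

def virtue_policy(state: State) -> str:
    return "help"

def utilitarian_policy(state: State) -> str:
    return "deceive" if state["deceit_payoff"] > 1.0 else "ignore"

def use_virtue(state: State) -> bool:
    return state["partner_vulnerability"] > 0.5

def act(state: State) -> str:
    return orchestrate(state, ACTIONS, is_forbidden, virtue_policy,
                       utilitarian_policy, use_virtue, fallback="wait")
```

With these toy definitions, a state where the utilitarian policy proposes `deceive` ends in the fallback `wait`, because the guard removes the forbidden action before either policy's proposal is accepted.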

read the original abstract

This paper critiques common patterns in machine ethics for Reinforcement Learning (RL) and argues for a virtue focused alternative. We highlight two recurring limitations in much of the current literature: (i) rule based (deontological) methods that encode duties as constraints or shields often struggle under ambiguity and nonstationarity and do not cultivate lasting habits, and (ii) many reward based approaches, especially single objective RL, implicitly compress diverse moral considerations into a single scalar signal, which can obscure trade offs and invite proxy gaming in practice. We instead treat ethics as policy level dispositions, that is, relatively stable habits that hold up when incentives, partners, or contexts change. This shifts evaluation beyond rule checks or scalar returns toward trait summaries, durability under interventions, and explicit reporting of moral trade offs. Our roadmap combines four components: (1) social learning in multi agent RL to acquire virtue like patterns from imperfect but normatively informed exemplars; (2) multi objective and constrained formulations that preserve value conflicts and incorporate risk aware criteria to guard against harm; (3) affinity based regularization toward updateable virtue priors that support trait like stability under distribution shift while allowing norms to evolve; and (4) operationalizing diverse ethical traditions as practical control signals, making explicit the value and cultural assumptions that shape ethical RL benchmarks.
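
Component (3) can be illustrated with a toy objective. A passage quoted further down this page gives the form J(θ)=E[R]−λL without defining L here, so the sketch below makes an assumption: L is taken to be the KL divergence from the policy's action distribution to a virtue prior. The prior update rule is likewise a hypothetical stand-in for "updateable" priors, not something the abstract specifies.

```python
import math
from typing import List

def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regularized_objective(expected_return: float, policy_probs: List[float],
                          prior_probs: List[float], lam: float) -> float:
    """J = E[R] - lam * L, with L assumed to be KL(policy || prior)."""
    return expected_return - lam * kl(policy_probs, prior_probs)

def update_prior(prior: List[float], policy: List[float], rate: float) -> List[float]:
    """Move the prior a small step toward the current policy so norms can evolve."""
    return [(1 - rate) * q + rate * p for p, q in zip(policy, prior)]

prior = [0.7, 0.2, 0.1]      # hypothetical virtue prior over three actions
aligned = [0.7, 0.2, 0.1]    # policy that matches the prior
drifted = [0.1, 0.2, 0.7]    # policy that earns more reward but abandons the prior

j_aligned = regularized_objective(1.0, aligned, prior, lam=0.5)
j_drifted = regularized_objective(1.2, drifted, prior, lam=0.5)
new_prior = update_prior(prior, drifted, rate=0.1)
```

In this toy case the drifted policy has the higher raw return but the lower regularized objective, which is the stabilizing effect the component is meant to provide. How large λ should be, and how fast the prior may move, are open design choices.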

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a clear position paper that reframes RL ethics as stable dispositions rather than rules or scalars and sketches a four-part roadmap, but the integration of those parts gets no real scrutiny.

Colleague, the main point is that this paper wants to move RL ethics away from fixed rules or single rewards toward habits that stay consistent when things change. It argues that deontological constraints often fail under ambiguity and non-stationarity, while scalar rewards hide trade-offs and invite gaming. Instead it treats ethics as policy-level dispositions that can be judged by trait summaries, durability under interventions, and open reporting of conflicts. That framing is straightforward and useful for anyone tired of the usual approaches.

The roadmap then ties this to four pieces: social learning from exemplars in multi-agent settings, multi-objective and constrained RL to keep value conflicts visible, affinity-based regularization for stable yet updateable priors, and drawing practical signals from different ethical traditions. The synthesis is the clearest new element, even if it builds on prior critiques in the literature. The paper is direct about making cultural assumptions explicit and avoids overclaiming results.

The soft spot is that the four components are listed without any check on how they fit or clash. For example, it is not clear whether affinity regularization can enforce trait stability while multi-objective optimization is supposed to preserve explicit trade-offs, or how social learning avoids proxy behaviors when environments shift. No examples, interaction analysis, or feasibility arguments appear, so the claim that these will deliver stable virtue-like behavior rests on hope rather than demonstration.

This is for people working on AI alignment and ethics who want a philosophical alternative to standard RL setups. Readers looking for technical details or experiments will find it thin. It deserves peer review so the ideas can get pressure on the missing compatibility questions.

Referee Report

2 major / 1 minor

Summary. The paper critiques rule-based (deontological) and scalar-reward approaches in RL-based machine ethics for struggling with ambiguity, non-stationarity, and obscuring moral trade-offs. It proposes treating ethics as stable policy-level dispositions (virtues) evaluable via trait summaries, durability under interventions, and explicit trade-off reporting, and outlines a four-component roadmap: (1) social learning in multi-agent RL from normatively informed exemplars, (2) multi-objective/constrained RL with risk-aware criteria, (3) affinity-based regularization for updateable virtue priors, and (4) operationalizing diverse ethical traditions as control signals.

Significance. If the proposed components can be integrated to deliver stable, context-adaptive ethical behavior without new ambiguities, the work could meaningfully shift RL ethics research toward frameworks that preserve value pluralism and support norm evolution. The conceptual critique of existing limitations is plausible and draws on established philosophical distinctions, but the absence of formalization, interaction analysis, or feasibility arguments means the primary contribution is directional rather than immediately enabling new implementations.

major comments (2)

[Roadmap section (components 1–4)] The manuscript presents the four components as a combined solution but supplies no analysis of compatibility, conflict resolution, or interaction effects. For instance, it is unclear how affinity-based regularization would enforce trait stability under distribution shift while multi-objective optimization simultaneously preserves explicit trade-offs, or how social learning from exemplars would avoid proxy behaviors in non-stationary environments.
[Central claim on policy-level dispositions] The shift from rule checks or scalar returns to evaluation via 'trait summaries' and 'durability under interventions' is load-bearing for the virtue-ethics alternative, yet the paper provides no concrete metrics, intervention protocols, or RL-specific operationalization for assessing durability, leaving the claim without a clear path to implementation or falsification.

minor comments (1)

[Abstract and Roadmap] The abstract and roadmap description use terms such as 'normatively informed exemplars' and 'updateable virtue priors' without providing working definitions or references to how these would be formalized in an RL setting.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful report. The comments correctly identify areas where the manuscript, as a conceptual critique and high-level roadmap, would benefit from greater specificity on component interactions and operationalization. We address each major comment below and commit to revisions that strengthen these aspects without altering the paper's directional focus.

Referee: Roadmap section (components 1–4): the manuscript presents the four components as a combined solution but supplies no analysis of compatibility, conflict resolution, or interaction effects. For instance, it is unclear how affinity-based regularization would enforce trait stability under distribution shift while multi-objective optimization simultaneously preserves explicit trade-offs, or how social learning from exemplars would avoid proxy behaviors in non-stationary environments.

Authors: We agree that the manuscript does not include a dedicated analysis of interactions among the four components. This omission stems from the paper's framing as an initial roadmap rather than a complete architecture. In revision we will add a dedicated subsection that outlines plausible integration strategies and potential tensions. For example, we will describe how affinity-based regularization can serve as a stability-inducing term within a multi-objective formulation, how social learning from exemplars can be followed by constrained optimization to reduce proxy risks, and how explicit trade-off reporting can be preserved across components. We will also note open questions regarding non-stationarity that future work would need to resolve.
Revision: yes
Referee: Central claim on policy-level dispositions: the shift from rule checks or scalar returns to evaluation via 'trait summaries' and 'durability under interventions' is load-bearing for the virtue-ethics alternative, yet the paper provides no concrete metrics, intervention protocols, or RL-specific operationalization for assessing durability, leaving the claim without a clear path to implementation or falsification.

Authors: The manuscript presents the evaluation shift at a conceptual level to highlight the distinction from existing approaches. We acknowledge that this leaves the central claim without immediate implementation details. In the revised version we will expand the relevant section to propose concrete evaluation directions, including trait-summary statistics computed over context distributions, intervention protocols adapted from robustness testing in RL (e.g., policy perturbation under changed reward or transition dynamics), and references to existing multi-agent and constrained RL literature that could support falsifiable tests. These additions will provide clearer next steps while preserving the paper's emphasis on the underlying philosophical motivation.
Revision: yes
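
The "trait-summary statistics computed over context distributions" mentioned in the rebuttal are not specified further. As a hedged illustration of what such a summary might contain, the sketch below aggregates per-episode trait scores by context family and reports a mean, a worst case, and a spread. The context names, the honesty scores, and the `trait_summary` function are hypothetical and do not come from the paper or the rebuttal.

```python
from statistics import mean, pstdev
from typing import Dict, List

def trait_summary(scores_by_context: Dict[str, List[float]]) -> Dict[str, float]:
    """Summarize a trait across context families instead of as one pooled number."""
    # Average within each context family first, so large families do not dominate.
    per_context = [mean(vals) for vals in scores_by_context.values()]
    return {
        "mean": mean(per_context),        # typical trait expression
        "worst_case": min(per_context),   # weakest context family
        "spread": pstdev(per_context),    # instability across contexts
    }

# Hypothetical per-episode honesty scores (1.0 = honest, 0.0 = not).
honesty = {
    "baseline": [1.0, 1.0, 1.0, 0.0],
    "partner_swap": [1.0, 1.0, 0.0, 0.0],
    "incentive_flip": [1.0, 0.0, 0.0, 0.0],
}
summary = trait_summary(honesty)
```

Reporting the worst case and the spread alongside the mean is a design choice: a disposition in the paper's sense should show a small spread, while a policy that is only honest when it pays would show a high baseline score and a low worst case.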

Circularity Check

0 steps flagged

No significant circularity; conceptual roadmap without derivations or self-referential reductions.

full rationale

This is a position paper critiquing rule-based and reward-based machine ethics in RL and proposing a virtue-oriented alternative via four conceptual components. The provided text contains no equations, fitted parameters, derivations, or mathematical predictions. All load-bearing claims rest on philosophical distinctions (e.g., treating ethics as stable policy-level dispositions) and literature patterns rather than any self-citation chain, ansatz smuggling, or renaming of known results that reduces to the paper's own inputs by construction. The central roadmap is presented as a synthesis of independent ideas; no step is shown to be equivalent to its inputs via definition or fit. This matches the default expectation for non-circular papers and aligns with the reader's assessment of score 1.0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central proposal rests on the domain assumption that virtue ethics provides a superior modeling choice for RL ethics compared with rules or rewards; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Ethics in RL is best captured as stable policy-level dispositions rather than encoded rules or scalar rewards.
This premise is stated directly in the abstract as the alternative to the two critiqued patterns.

pith-pipeline@v0.9.0 · 5524 in / 1356 out tokens · 46732 ms · 2026-05-17T01:52:09.060199+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We instead treat ethics as policy level dispositions... trait summaries, durability under interventions, and explicit reporting of moral trade offs."

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "affinity-based regularization toward updateable virtue priors... J(θ)=E[R]−λL"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

[1] Ajay Vishwanath, Einar Duenger Bøhn, Ole-Christoffer Granmo, Charl Maree, and Christian Omlin. Towards artificial virtuous agents: games, dilemmas and machine learning. AI and Ethics, 3(3):663–672, 2023.
[2] David Abel, James MacGlashan, and Michael L Littman. Reinforcement learning as a framework for ethical decision making. In AAAI workshop: AI, ethics, and society, volume 16. Phoenix, AZ, 2016.
[3] Ajay Vishwanath, Louise A Dennis, and Marija Slavkovik. Reinforcement learning and machine ethics: a systematic review. arXiv preprint arXiv:2407.02425, 2024.
[4] Immanuel Kant. Groundwork of the metaphysic of morals. In Immanuel Kant, pages 17–98. Routledge, 2020.
[5] John Stuart Mill. Utilitarianism. In Seven masterpieces of philosophy, pages 329–375. Routledge, 2016.
[6] Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. Safe reinforcement learning via shielding. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
[7] Alisabeth Ayars. Can model-free reinforcement learning explain deontological moral judgments? Cognition, 150:232–242, 2016.
[8] Adrien Ecoffet and Joel Lehman. Reinforcement learning under moral uncertainty. In International conference on machine learning, pages 2926–2936. PMLR, 2021.
[9] Samantha Krening. Q-learning as a model of utilitarianism in a human–machine team. Neural Computing and Applications, 35(23):16853–16864, 2023.
[10] Colin Allen, Iva Smit, and Wendell Wallach. Artificial morality: Top-down, bottom-up, and hybrid approaches. Ethics and information technology, 7(3):149–155, 2005.
[11] Francesca Rossi and Nicholas Mattei. Building ethically bounded AI. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 9785–9789, 2019.
[12] Han Yu, Zhiqi Shen, Chunyan Miao, Cyril Leung, Victor R Lesser, and Qiang Yang. Building ethics into artificial intelligence. arXiv preprint arXiv:1812.02953, 2018.
[13] Yueh-Hua Wu and Shou-De Lin. A low-cost ethics shaping approach for designing reinforcement learning agents. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
[14] William A Bauer. Virtuous vs. utilitarian artificial moral agents. AI & SOCIETY, 35(1):263–271, 2020.
[15] Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, Kush R Varshney, Murray Campbell, Moninder Singh, and Francesca Rossi. Teaching AI agents ethical values using reinforcement learning and policy orchestration. IBM Journal of Research and Development, 63(4/5):2–1, 2019.
[16] Roger Crisp. Aristotle: Nicomachean ethics. Cambridge University Press, 2014.
[17] Liezl Van Zyl. Right action and the non-virtuous agent. Journal of Applied Philosophy, 28(1):80–92, 2011.
[18] Majid Ghasemi and Dariush Ebrahimi. Introduction to reinforcement learning. arXiv preprint arXiv:2408.07712, 2024.
[19] Dennis Lee, Natasha Jaques, Chase Kew, Jiaxing Wu, Douglas Eck, Dale Schuurmans, and Aleksandra Faust. Joint attention for multi-agent coordination and social learning. arXiv preprint arXiv:2104.07750, 2021.
[20] Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister, Agustin Dal Lago, Ashley Edwards, Richard Everett, Alexandre Fréchette, Yanko Gitahy Oliveira, Edward Hughes, Kory W Mathewson, et al. Learning few-shot imitation as cultural transmission. Nature Communications, 14(1):7536, 2023.
[21] Eric Ye, Ren Tao, and Natasha Jaques. An efficient open world environment for multi-agent social learning. arXiv preprint arXiv:2508.15679, 2025.
[22] Kamal K Ndousse, Douglas Eck, Sergey Levine, and Natasha Jaques. Emergent social learning via multi-agent reinforcement learning. In International conference on machine learning, pages 7991–8004. PMLR, 2021.
[23] Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, DJ Strouse, Joel Z Leibo, and Nando De Freitas. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In International conference on machine learning, pages 3040–3049. PMLR, 2019.
[24] Timon Deschamps, Rémy Chaput, and Laetitia Matignon. Multi-objective reinforcement learning: an ethical perspective. In RJCIA, 2024.
[25] Ajay Vishwanath and Christian Omlin. Exploring affinity-based reinforcement learning for designing artificial virtuous agents in stochastic environments. In International Conference on Frontiers of Artificial Intelligence, Ethics, and Multidisciplinary Applications, pages 25–38. Springer, 2023.
[26] Jin Li. The core of Confucian learning. 2003.
[27] Vu Hong Van. The Daoist thought of wu wei–action through non-action and its influence in Vietnam. Synesis (ISSN 1984-6754), 17(2):55–71, 2025.
[28] Seyyed Hossein Nasr and Mehdi Aminrazavi. An anthology of philosophy in Persia. 2012.
[29] Christopher Simpkins and Charles Isbell. Composable modular reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4975–4982, 2019.
[30] Louise Dennis, Michael Fisher, Marija Slavkovik, and Matt Webster. Formal verification of ethical choices in autonomous systems. Robotics and Autonomous Systems, 77:1–14, 2016.
[31] Alberto Camacho, Rodrigo Toro Icarte, Toryn Q Klassen, Richard Anthony Valenzano, and Sheila A McIlraith. LTL and beyond: Formal languages for reward function specification in reinforcement learning. In IJCAI, volume 19, pages 6065–6073, 2019.
[32] Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, and Sheila McIlraith. Using reward machines for high-level task specification and decomposition in reinforcement learning. In International Conference on Machine Learning, pages 2107–2116. PMLR, 2018.
