Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment

Ajay Vishwanath; Christian Omlin

arXiv: 2606.04750 · v1 · pith:YPUI2FA7new · submitted 2026-06-03 · 💻 cs.AI · cs.CY · cs.LG

Pith reviewed 2026-06-28 06:09 UTC · model grok-4.3

keywords: affinity-based reinforcement learning · multi-agent reinforcement learning · virtuous agent behavior · Fog of Love environment · policy regularization · competitive and cooperative objectives · interpretable agent teleology

The pith

Localized affinities enable agents to achieve higher scores in both individual virtues and cooperative relationship goals in the Fog of Love environment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends affinity-based reinforcement learning to a two-player game environment modeled on the board game Fog of Love, where agents must pursue separate virtues while cooperating on a shared relationship. Standard multi-agent deep deterministic policy gradient methods fail to succeed at either competition or cooperation in this setting. The authors show that localized affinities, applied via policy regularization, produce superior overall scores across both domains. This yields virtuous actions and makes the agents' goals and decisions more interpretable to humans. The work matters to a sympathetic reader because it offers a method for guiding AI behavior toward virtue without complete dependence on reward-function engineering.

Core claim

Localized affinities enhance agent performance in achieving both competitive and cooperative objectives in the Fog of Love environment, resulting from superior overall scores in both domains; this produces virtuous choices, clarifies an agent's teleology, and renders its behavior human-level interpretable.

What carries the argument

Affinity-based reinforcement learning that applies policy regularization to the objective function to incentivize virtuous actions through localized affinities.
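As a rough illustration of the mechanism named above, the sketch below shows one way a policy-regularized objective with a state-dependent (localized) prior could be written. It assumes a discrete action distribution and a mean-squared penalty toward the prior. The function names, the penalty form, the weight `lam`, and the region lookup are all hypothetical and are not taken from the paper.

```python
from typing import Mapping, Sequence


def affinity_penalty(action_probs: Sequence[float], prior: Sequence[float]) -> float:
    """Mean squared gap between the policy's action distribution and a prior."""
    if len(action_probs) != len(prior):
        raise ValueError("action_probs and prior must have the same length")
    return sum((p - q) ** 2 for p, q in zip(action_probs, prior)) / len(prior)


def localized_prior(region: str,
                    priors: Mapping[str, Sequence[float]],
                    default: Sequence[float]) -> Sequence[float]:
    """Pick the prior for a state region (hypothetical lookup), else a default."""
    return priors.get(region, default)


def regularized_objective(expected_return: float,
                          action_probs: Sequence[float],
                          prior: Sequence[float],
                          lam: float) -> float:
    """Expected return minus a weighted affinity penalty (assumed form)."""
    return expected_return - lam * affinity_penalty(action_probs, prior)
```

With `lam = 0` the objective reduces to the plain expected return, so in this sketch the weight controls how strongly the prior, rather than the reward, shapes the policy.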

If this is right

Agents succeed at both competition and cooperation where standard MADDPG methods fail.
Agent behavior becomes human-level interpretable through clarified teleology.
Virtuous choices emerge without full reliance on reward-function design.
Superior scores are obtained simultaneously in individual virtue and shared relationship domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The regularization technique could be adapted to other multi-objective environments where reward engineering is difficult.
Interpretable teleology may help identify when agents align with or deviate from intended ethical goals.
The method might reduce the need for exhaustive reward shaping in settings that mix self-interest and collaboration.

Load-bearing premise

That superior scores on the chosen metrics in the Fog of Love environment correspond to genuinely virtuous behavior rather than exploitation of the affinity regularization or game rules.

What would settle it

Observation of agents reaching high scores via affinity regularization yet consistently choosing actions that exploit game mechanics without satisfying the intended virtues or relationship requirements.

Original abstract

Instilling virtuous behavior in artificial intelligence has seen increasing interest. One of the techniques proposed is known as affinity-based reinforcement learning, which uses policy regularization on the objective function to incentivize virtuous actions without being fully dependent on the reward function design. Thus far, this technique has been demonstrated to be effective in grid worlds and toy-problem environments with minimal state and action spaces. To expand this research to more sophisticated environments, we introduce a two-player multi-agent environment based on the role-playing board game known as Fog of Love. In this environment, two agents compete to fulfill their individual virtues, while also cooperating to satisfy their relationship. Given the multi-agent nature, this is a complex problem where multi-agent deep deterministic policy gradient agents neither compete nor cooperate successfully. We present evidence that localized affinities enhance agent performance in achieving both competitive and cooperative objectives, resulting from superior overall scores in both domains. This not only results in virtuous choices but also clarifies an agent's teleology and makes its behavior human-level interpretable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper moves affinity regularization into a mixed competitive-cooperative game but the results do not yet separate performance gains from simple optimization effects.

The main new element is taking affinity-based policy regularization out of grid worlds and into Fog of Love, a two-player role-playing game where each agent has private virtues while also needing to satisfy a shared relationship goal. The setup creates genuine tension between individual and joint objectives, and the authors show that plain MADDPG fails to make progress on either. That environment choice and the baseline failure are the clearest contributions.

The work is straightforward in its execution: they add localized affinity terms to the objective and report better overall scores on both competitive and cooperative metrics. If the quantitative details in the full paper hold up, this gives a concrete data point on how the method behaves once state and action spaces grow and agents must trade off against each other.

The soft spot is the leap from “better scores” to “virtuous choices and clearer teleology.” The stress-test concern lands here. Without an ablation that removes the affinity term after training and checks whether performance holds, it remains possible that the regularization simply eases joint optimization rather than changing what the agents value. The abstract gives no numbers, no statistical tests, and no post-training removal results, so the evidence for internalized virtue is still thin.

This is for people already working on regularization techniques for multi-agent alignment. A reader who wants to see the method tried in a non-toy mixed-motive setting will find the environment description useful; someone looking for general methods or scaling arguments will not.

It is worth sending to peer review. The environment is a reasonable next step and the basic comparison is worth referee scrutiny, even if the current write-up needs more ablations and clearer quantitative reporting before the virtue claim can be taken at face value.

Referee Report

2 major / 2 minor

Summary. The paper introduces a two-player multi-agent environment based on the Fog of Love board game, where agents pursue individual virtues while cooperating on relationship goals. It claims that standard MADDPG fails to produce competitive or cooperative behavior, but affinity-based RL with localized affinities yields superior overall scores in both domains, resulting in virtuous choices and more interpretable agent teleology. The work extends prior affinity-based RL demonstrations from simple grid worlds to this more complex setting.

Significance. If the empirical results hold with proper controls, the approach could provide a practical method for engineering value-aligned behavior in multi-agent systems without sole reliance on reward shaping, while improving interpretability. The environment itself may serve as a useful benchmark for testing virtue-related objectives in RL.

major comments (2)

[Experiments] Experiments section (results on Fog of Love): The central claim that localized affinities produce virtuous behavior via superior scores requires evidence that the performance advantage persists after removing the affinity regularization term post-training. Without such an ablation, the gains may reflect direct policy bias from the regularization rather than internalized virtue, undermining the interpretation that the method engineers virtuous teleology rather than easing optimization.
[Experiments] Method and Experiments: No quantitative results, baselines, number of runs, or statistical tests are referenced in support of the 'superior overall scores' claim, making it impossible to evaluate effect sizes or reliability of the reported improvements over MADDPG.

minor comments (2)

[Abstract] Abstract: The phrase 'resulting from superior overall scores in both domains' is ambiguous; clarify whether this refers to individual virtue scores, relationship scores, or a combined metric.
[Environment] Environment description: Provide the exact state/action space sizes and reward structure details to allow replication and comparison with prior affinity-based RL work.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional experiments and quantitative details where appropriate.

Referee: [Experiments] Experiments section (results on Fog of Love): The central claim that localized affinities produce virtuous behavior via superior scores requires evidence that the performance advantage persists after removing the affinity regularization term post-training. Without such an ablation, the gains may reflect direct policy bias from the regularization rather than internalized virtue, undermining the interpretation that the method engineers virtuous teleology rather than easing optimization.

Authors: We agree that an ablation study removing the affinity regularization term after training would strengthen the evidence that the agents have internalized virtuous behavior rather than relying on ongoing regularization. The current results show that localized affinities lead to superior scores and more interpretable teleology during training. To address this, we will add a post-training ablation experiment in the revised manuscript, continuing evaluation without the regularization term and reporting whether performance advantages persist.

revision: yes

Referee: [Experiments] Method and Experiments: No quantitative results, baselines, number of runs, or statistical tests are referenced in support of the 'superior overall scores' claim, making it impossible to evaluate effect sizes or reliability of the reported improvements over MADDPG.

Authors: The manuscript presents baseline comparisons against MADDPG along with overall scores demonstrating improvements from localized affinities. We acknowledge, however, that details on the number of runs, variance, effect sizes, and statistical tests were not explicitly reported. In the revision we will add these quantitative elements to the Experiments section, including the number of independent runs, mean scores with standard deviations, and any statistical tests performed.

revision: yes
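The reporting promised in this response (number of runs, means with standard deviations, a statistical test) could be computed along the following lines. This is a minimal sketch using only the Python standard library. The choice of Welch's t-test is an assumption, since the rebuttal does not name a test, and the function names are hypothetical. No scores from the paper are used.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[int, float, float]:
    """Return (number of runs, mean, sample standard deviation)."""
    return len(scores), statistics.mean(scores), statistics.stdev(scores)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumes each sample has at least two runs and non-zero variance overall.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sa, sb = va / na, vb / nb  # squared standard errors of the two means
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df
```

In use, `a` and `b` would hold one final score per independent seed for the affinity-based agents and the MADDPG baseline respectively. Turning `t` and `df` into a p-value needs a t-distribution CDF, which the standard library does not provide.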

Circularity Check

0 steps flagged

No derivation chain present; empirical comparison only

The manuscript presents an empirical evaluation of affinity-based RL agents in the Fog of Love environment. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim rests on observed score improvements under the affinity regularization, which is an external technique applied to the environment rather than a result derived from the paper's own inputs. Because no mathematical reduction or self-referential construction exists, the work contains no circular steps and is self-contained as an experimental demonstration.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim depends on the assumption that affinity regularization can be localized without introducing new free parameters that must be tuned per agent or per virtue; no explicit free parameters or invented entities are named in the abstract.

Reference graph

Works this paper leans on

52 extracted references · 17 canonical work pages

[1] Anderson, M., Anderson, S.L.: Machine ethics: Creating an ethical intelligent agent. AI Magazine 28(4), 15 (2007). https://doi.org/10.1609/aimag.v28i4.2065

[2] Wallach, W., Allen, C.: Moral Machines: Teaching Robots Right from Wrong, first issued as an Oxford University Press paperback edn. Oxford University Press, New York, NY (2010)

[3] Zhong, T., Song, Y., Limarga, R., Pagnucco, M.: Computational Machine Ethics: A Survey. Journal of Artificial Intelligence Research 82, 1581–1628 (2025). https://doi.org/10.1613/jair.1.16836. Accessed 2025-03-20

[4] Alexander, L., Moore, M.: Deontological Ethics. The Stanford Encyclopedia of Philosophy (2021). Accessed 2022-04-05

[5] Sinnott-Armstrong, W.: Consequentialism (2003). Accessed 2022-11-17

[6] Ross, W.D.: Aristotle: The Nicomachean Ethics (Revised Edition). Oxford World’s Classics, Oxford, U.K. (1980). https://doi.org/10.1093/actrade/9780199213610.book.1

[7] Hursthouse, R.: On Virtue Ethics (2001). https://doi.org/10.1093/0199247994.001.0001

[8] Crisp, R., Slote, M., Slote, M.A.: Virtue Ethics, vol. 10. Oxford University Press, Oxford, U.K. (1997)

[9] Stenseke, J.: Artificial virtuous agents: from theory to machine implementation. AI & Society (2021). https://doi.org/10.1007/s00146-021-01325-7

[10] Vishwanath, A., Bøhn, E.D., Granmo, O.-C., Maree, C., Omlin, C.: Towards artificial virtuous agents: games, dilemmas and machine learning. AI and Ethics, 43681–022002518 (2022)
[11] Ribeiro, B.A., Da Silva, M.B.: Machine Ethics and the Architecture of Virtue. In: Guarda, T., Portela, F., Diaz-Nafria, J.M. (eds.) Advanced Research in Technologies, Information, Innovation and Sustainability, vol. 1936, pp. 384–401. Springer, Cham (2024). https://doi.org/10.1007/978-3-031-48855-9_29. Series Title: Communications in Computer and Infor...

[12] Vishwanath, A., Omlin, C.: Exploring Affinity-Based Reinforcement Learning for Designing Artificial Virtuous Agents in Stochastic Environments. In: Farmanbar, M., Tzamtzi, M., Verma, A.K., Chakravorty, A. (eds.) Frontiers of Artificial Intelligence, Ethics, and Multidisciplinary Applications, pp. 25–38. Springer, Singapore (2024)

[13] Red, C.P.: The Witcher 3: Wild Hunt - Official Website (2015). https://www.thewitcher.com/en Accessed 2022-04-11

[14] ZA/UM: Disco Elysium (2019). https://discowp.zaumstudio.com Accessed 2024-12-05

[15] 11 bit studios: This War of Mine. This War Of Mine provides an experience of war seen from an entirely new angle. For the very first time you do not play as an elite soldier, rather a group of civilians trying to survive in a besieged city. (2014). https://www.thiswarofmine.com/ Accessed 2024-12-05

[16] Jacob Jaskov: Fog of Love (2017). https://boardgamegeek.com/boardgame/175324/fog-of-love Accessed 2024-12-05

[17] Jonathan Gilmour, Isaac Vega: Dead of Winter: A Crossroads Game (2014). https://boardgamegeek.com/boardgame/150376/dead-of-winter-a-crossroads-game Accessed 2024-12-05

[18] Nay, J.L., Zagal, J.P.: Meaning without consequence: virtue ethics and inconsequential choices in games. In: Proceedings of the 12th International Conference on the Foundations of Digital Games, pp. 1–8. ACM, Hyannis Massachusetts (2017). https://doi.org/10.1145/3102071.3102073

[19] Formosa, P., Ryan, M., Staines, D.: Papers, Please and the systemic approach to engaging ethical expertise in videogames. Ethics and Information Technology 18(3), 211–225 (2016). https://doi.org/10.1007/s10676-016-9407-z

[20] Mnih, V.: Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
[21] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al.: A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362(6419), 1140–1144 (2018)

[22] Dennis, L., Fisher, M., Slavkovik, M., Webster, M.: Formal verification of ethical choices in autonomous systems. Robotics and Autonomous Systems 77, 1–14 (2016). https://doi.org/10.1016/j.robot.2015.11.012

[23] Moor, J.H.: The Nature, Importance, and Difficulty of Machine Ethics. IEEE Intelligent Systems 21(4), 18–21 (2006). https://doi.org/10.1109/MIS.2006.80

[24] Cervantes, J.-A., López, S., Rodríguez, L.-F., Cervantes, S., Cervantes, F., Ramos, F.: Artificial Moral Agents: A Survey of the Current Status. Science and Engineering Ethics 26(2), 501–532 (2020). https://doi.org/10.1007/s11948-019-00151-x

[25] Tolmeijer, S., Kneer, M., Sarasua, C., Christen, M., Bernstein, A.: Implementations in Machine Ethics: A Survey. ACM Computing Surveys 53(6), 1–38 (2021). https://doi.org/10.1145/3419633

[26] Peschl, M., Zgonnikov, A., Oliehoek, F.A., Siebert, L.C.: Moral: Aligning ai with human norms through multi-objective reinforced active learning. In: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. AAMAS ’22, pp. 1038–1046. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2022)

[27] Rodriguez-Soto, M., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: Instilling moral value alignment by means of multi-objective reinforcement learning. Ethics and Information Technology 24(1), 9 (2022). Accessed 2022-03-29

[28] Ozaki, A., Rehman, A., Slavkovik, M.: Finding middle grounds for incoherent horn expressions: the moral machine case. Autonomous Agents and Multi-Agent Systems 38(2), 50 (2024). https://doi.org/10.1007/s10458-024-09681-6. Accessed 2025-02-05

[29] Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J.-F., Rahwan, I.: The Moral Machine experiment. Nature 563(7729), 59–64 (2018). https://doi.org/10.1038/s41586-018-0637-6. Accessed 2025-02-05

[30] Lang, J., Van Der Torre, L., Weydert, E.: Utilitarian Desires. Autonomous Agents and Multi-Agent Systems 5(3), 329–363 (2002). https://doi.org/10.1023/A:1015508524218. Accessed 2025-02-05
[31] Govindarajulu, N.S., Bringsjord, S., Ghosh, R., Sarathy, V.: Toward the Engineering of Virtuous Machines. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 29–35. ACM, Honolulu HI USA (2019). https://doi.org/10.1145/3306618.3314256. Accessed 2022-03-02

[32] Stenseke, J.: Artificial virtuous agents in a multi-agent tragedy of the commons. AI & Society (2022). https://doi.org/10.1007/s00146-022-01569-x. Accessed 2022-11-28

[33] Berberich, N., Diepold, K.: The Virtuous Machine - Old Ethics for New Technology? arXiv:1806.10322 (2018). http://arxiv.org/abs/1806.10322

[34] Adams, S., Cody, T., Beling, P.A.: A survey of inverse reinforcement learning. Artificial Intelligence Review 55(6), 4307–4346 (2022). https://doi.org/10.1007/s10462-021-10108-x

[35] Ng, A.Y., Harada, D., Russell, S.: Policy invariance under reward transformations: Theory and application to reward shaping. In: ICML, vol. 99, pp. 278–287 (1999). Citeseer

[36] Skalse, J., Howe, N.H.R., Krasheninnikov, D., Krueger, D.: Defining and characterizing reward hacking. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22. Curran Associates Inc., Red Hook, NY, USA (2024)

[37] Wirth, C., Akrour, R., Neumann, G., Fürnkranz, J., et al.: A survey of preference-based reinforcement learning methods. Journal of Machine Learning Research 18(136), 1–46 (2017)

[38] García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16(1), 1437–1480 (2015)

[39] Achiam, J., Held, D., Tamar, A., Abbeel, P.: Constrained policy optimization. In: International Conference on Machine Learning, pp. 22–31 (2017). PMLR

[40] Ran, Y., Li, Y.-C., Zhang, F., Zhang, Z., Yu, Y.: Policy regularization with dataset constraint for offline reinforcement learning. In: International Conference on Machine Learning, pp. 28701–28717 (2023). PMLR
[41] Persiani, M., Hellström, T.: Policy regularization for legible behavior. Neural Computing and Applications, 1–10 (2022)

[42] Tirumala, D., Galashov, A., Noh, H., Hasenclever, L., Pascanu, R., Schwarz, J., Desjardins, G., Czarnecki, W.M., Ahuja, A., Teh, Y.W., Heess, N.: Behavior priors for efficient reinforcement learning. Journal of Machine Learning Research 23(221), 1–68 (2022)

[43] Maree, C., Omlin, C.: Reinforcement Learning Your Way: Agent Characterization through Policy Regularization. AI 3(2), 250–259 (2022)

[44] Maree, C., Omlin, C.W.: Can Interpretable Reinforcement Learning Manage Prosperity Your Way? AI 3(2), 526–537 (2022)

[45] Vishwanath, A., Omlin, C.: Localized Affinity-Based Reinforcement Learning for Interpretable State-Specific Decision-Making. In: Bramer, M., Stahl, F. (eds.) Artificial Intelligence XLI, pp. 221–234. Springer, Cham (2025)

[46] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (2020). https://arxiv.org/abs/1706.02275

[47] MacIntyre, A.C.: After Virtue: A Study in Moral Theory (2007)

[48] Anscombe, G.E.M.: Modern Moral Philosophy. Philosophy 33(124), 1–19 (1958). Publisher: Cambridge University Press. Accessed 2024-04-17

[49] Howard, D., Muntean, I.: Artificial Moral Cognition: Moral Functionalism and Autonomous Moral Agency. In: Powers, T.M. (ed.) Philosophy and Computing, vol. 128, pp. 121–159. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61043-6_7

[50] Christianos, F., Papoudakis, G., Albrecht, S.V.: Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning (2023). https://arxiv.org/abs/2209.14344
[51] Zhao, Z., Cao, L., Chen, X., Lai, J., Zhang, L.: Improvement of MADRL Equilibrium Based on Pareto Optimization. The Computer Journal 66(7), 1573–1585 (2023). https://doi.org/10.1093/comjnl/bxac027. Accessed 2025-01-31

Appendix A Observation Space

In this section, we outline the structure of the state space. The virtues are encoded as numbers between 1...

... sincerity

In Table A1, we summarize the observation space of an agent in Fog of Love.

Table A1: Agent Observation Space Structure

Agent      Key Prefix          Description                    Range      Type
player_1   goal_*              Goal values (1-7)              [-50,50]   int32
player_1   player_1_virtue_*   Player 1 virtue values (1-7)   [-50,50]   int32
player_1   player_2_virtue_*   Player 2 virtue values (1-7)   [-50,50]   int32
player_1   op...
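The table fragment above can be read as a flat dictionary of bounded integers. The sketch below builds and validates such a structure. It assumes that the key prefixes use underscores and that the `*` wildcard stands for an index from 1 to 7. Both readings are guesses from the flattened extraction, and the truncated fourth prefix is deliberately left out. All names in the code are hypothetical.

```python
from typing import Dict, List

# Assumption: prefixes reconstructed from the extracted table, underscores restored.
KEY_PREFIXES: List[str] = ["goal", "player_1_virtue", "player_2_virtue"]
LOW, HIGH = -50, 50  # value range listed in the table


def observation_keys(n_slots: int = 7) -> List[str]:
    """Expand each prefix into indexed keys, e.g. goal_1 ... goal_7 (assumed indexing)."""
    return [f"{prefix}_{i}" for prefix in KEY_PREFIXES for i in range(1, n_slots + 1)]


def validate_observation(obs: Dict[str, int]) -> bool:
    """True if every expected key is present with an integer value in [LOW, HIGH]."""
    return all(
        key in obs and isinstance(obs[key], int) and LOW <= obs[key] <= HIGH
        for key in observation_keys()
    )
```

A check of this kind would cover only the three prefixes visible in the fragment. The full observation space in the paper has further entries that the extraction cut off.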

[1] [1]

AI Magazine28(4), 15 (2007) https://doi.org/10.1609/aimag.v28i4.2065

Anderson, M., Anderson, S.L.: Machine ethics: Creating an ethical intelligent agent. AI Magazine28(4), 15 (2007) https://doi.org/10.1609/aimag.v28i4.2065

work page doi:10.1609/aimag.v28i4.2065 2007

[2] [2]

Oxford University Press, New York, NY (2010)

Wallach, W., Allen, C.: Moral Machines: Teaching Robots Right from Wrong, First issued as an oxford university press paperback edn. Oxford University Press, New York, NY (2010)

2010

[3] [3]

Journal of Artificial Intelligence Research82, 1581–1628 (2025) https: //doi.org/10.1613/jair.1.16836

Zhong, T., Song, Y., Limarga, R., Pagnucco, M.: Computational Machine Ethics: A Survey. Journal of Artificial Intelligence Research82, 1581–1628 (2025) https: //doi.org/10.1613/jair.1.16836 . Accessed 2025-03-20

work page doi:10.1613/jair.1.16836 2025

[4] [4]

The Stanford Encyclopedia of Philosophy (2021)

Alexander, L., Moore, M.: Deontological Ethics. The Stanford Encyclopedia of Philosophy (2021). Accessed 2022-04-05

2021

[5] [5]

Accessed 2022-11-17

Sinnott-Armstrong, W.: Consequentialism (2003). Accessed 2022-11-17

2003

[6] [6]

Oxford World’s Classics, Oxford, U.K

Ross, W.D.: Aristotle: The Nicomachean Ethics (Revised Edition). Oxford World’s Classics, Oxford, U.K. (1980). https://doi.org/10.1093/actrade/ 9780199213610.book.1

work page doi:10.1093/actrade/ 1980

[7] [7]

001.0001

Hursthouse, R.: On Virtue Ethics (2001) https://doi.org/10.1093/0199247994. 001.0001

work page doi:10.1093/0199247994 2001

[8] [8]

Crisp, R., Slote, M., Slote, M.A.: Virtue Ethics vol. 10. Oxford University Press, Oxford, U.K. (1997)

1997

[9] [9]

AI & Society (2021) https://doi.org/10.1007/s00146-021-01325-7

Stenseke, J.: Artificial virtuous agents: from theory to machine implementation. AI & Society (2021) https://doi.org/10.1007/s00146-021-01325-7

work page doi:10.1007/s00146-021-01325-7 2021

[10] [10]

AI and Ethics, 43681–022002518 (2022)

Vishwanath, A., Bøhn, E.D., Granmo, O.-C., Maree, C., Omlin, C.: Towards artificial virtuous agents: games, dilemmas and machine learning. AI and Ethics, 43681–022002518 (2022)

2022

[11] [11]

In: Guarda, T., Portela, F., Diaz-Nafria, J.M

Ribeiro, B.A., Da Silva, M.B.: Machine Ethics and the Architecture of Virtue. In: Guarda, T., Portela, F., Diaz-Nafria, J.M. (eds.) Advanced Research in Technologies, Information, Innovation and Sustainability vol. 1936, pp. 22 384–401. Springer, Cham (2024). https://doi.org/10.1007/978-3-031-48855-9 29 . Series Title: Communications in Computer and Infor...

work page doi:10.1007/978-3-031-48855-9 1936

[12] [12]

In: Farman- bar, M., Tzamtzi, M., Verma, A.K., Chakravorty, A

Vishwanath, A., Omlin, C.: Exploring Affinity-Based Reinforcement Learning for Designing Artificial Virtuous Agents in Stochastic Environments. In: Farman- bar, M., Tzamtzi, M., Verma, A.K., Chakravorty, A. (eds.) Frontiers of Artificial Intelligence, Ethics, and Multidisciplinary Applications, pp. 25–38. Springer, Singapore (2024)

2024

[13] [13]

https://www

Red, C.P.: The Witcher 3: Wild Hunt - Official Website (2015). https://www. thewitcher.com/en Accessed 2022-04-11

2015

[14] [14]

https://discowp.zaumstudio.com Accessed 2024- 12-05

ZA/UM: Disco Elysium (2019). https://discowp.zaumstudio.com Accessed 2024- 12-05

2019

[15] [15]

This War Of Mine provides an experience of war seen from an entirely new angle

11 bit studios: This War of Mine. This War Of Mine provides an experience of war seen from an entirely new angle. For the very first time you do not play as an elite soldier, rather a group of civilians trying to survive in a besieged city. (2014). https://www.thiswarofmine.com/ Accessed 2024-12-05

2014

[16] [16]

https://boardgamegeek.com/boardgame/ 175324/fog-of-love Accessed 2024-12-05

Jacob Jaskov: Fog of Love (2017). https://boardgamegeek.com/boardgame/ 175324/fog-of-love Accessed 2024-12-05

2017

[17] [17]

https://boardgamegeek.com/boardgame/150376/ dead-of-winter-a-crossroads-game Accessed 2024-12-05

Jonathan Gilmour, Isaac Vega: Dead of Winter: A Crossroads Game (2014). https://boardgamegeek.com/boardgame/150376/ dead-of-winter-a-crossroads-game Accessed 2024-12-05

2014

[18] [18]

In: Proceedings of the 12th Interna- tional Conference on the Foundations of Digital Games, pp

Nay, J.L., Zagal, J.P.: Meaning without consequence: virtue ethics and inconsequential choices in games. In: Proceedings of the 12th Interna- tional Conference on the Foundations of Digital Games, pp. 1–8. ACM, Hyannis Massachusetts (2017). https://doi.org/10.1145/3102071.3102073 . https://dl.acm.org/doi/10.1145/3102071.3102073

work page doi:10.1145/3102071.3102073 2017

[19] Formosa, P., Ryan, M., Staines, D.: Papers, Please and the systemic approach to engaging ethical expertise in videogames. Ethics and Information Technology 18(3), 211–225 (2016) https://doi.org/10.1007/s10676-016-9407-z

[20] Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)

[21] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419), 1140–1144 (2018)

[22] Dennis, L., Fisher, M., Slavkovik, M., Webster, M.: Formal verification of ethical choices in autonomous systems. Robotics and Autonomous Systems 77, 1–14 (2016) https://doi.org/10.1016/j.robot.2015.11.012

[23] Moor, J.H.: The Nature, Importance, and Difficulty of Machine Ethics. IEEE Intelligent Systems 21(4), 18–21 (2006) https://doi.org/10.1109/MIS.2006.80

[24] Cervantes, J.-A., López, S., Rodríguez, L.-F., Cervantes, S., Cervantes, F., Ramos, F.: Artificial Moral Agents: A Survey of the Current Status. Science and Engineering Ethics 26(2), 501–532 (2020) https://doi.org/10.1007/s11948-019-00151-x

[25] Tolmeijer, S., Kneer, M., Sarasua, C., Christen, M., Bernstein, A.: Implementations in Machine Ethics: A Survey. ACM Computing Surveys 53(6), 1–38 (2021) https://doi.org/10.1145/3419633

[26] Peschl, M., Zgonnikov, A., Oliehoek, F.A., Siebert, L.C.: MORAL: Aligning AI with human norms through multi-objective reinforced active learning. In: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. AAMAS ’22, pp. 1038–1046. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2022)

[27] Rodriguez-Soto, M., Serramia, M., Lopez-Sanchez, M., Rodriguez-Aguilar, J.A.: Instilling moral value alignment by means of multi-objective reinforcement learning. Ethics and Information Technology 24(1), 9 (2022). Accessed 2022-03-29

[28] Ozaki, A., Rehman, A., Slavkovik, M.: Finding middle grounds for incoherent Horn expressions: the Moral Machine case. Autonomous Agents and Multi-Agent Systems 38(2), 50 (2024) https://doi.org/10.1007/s10458-024-09681-6. Accessed 2025-02-05

[29] Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J.-F., Rahwan, I.: The Moral Machine experiment. Nature 563(7729), 59–64 (2018) https://doi.org/10.1038/s41586-018-0637-6. Accessed 2025-02-05

[30] Lang, J., Van Der Torre, L., Weydert, E.: Utilitarian Desires. Autonomous Agents and Multi-Agent Systems 5(3), 329–363 (2002) https://doi.org/10.1023/A:1015508524218. Accessed 2025-02-05

[31] Govindarajulu, N.S., Bringsjord, S., Ghosh, R., Sarathy, V.: Toward the Engineering of Virtuous Machines. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 29–35. ACM, Honolulu, HI, USA (2019). https://doi.org/10.1145/3306618.3314256. https://dl.acm.org/doi/10.1145/3306618.3314256. Accessed 2022-03-02

[32] Stenseke, J.: Artificial virtuous agents in a multi-agent tragedy of the commons. AI & Society (2022) https://doi.org/10.1007/s00146-022-01569-x. Accessed 2022-11-28

[33] Berberich, N., Diepold, K.: The Virtuous Machine - Old Ethics for New Technology? arXiv:1806.10322 (2018). http://arxiv.org/abs/1806.10322

[34] Adams, S., Cody, T., Beling, P.A.: A survey of inverse reinforcement learning. Artificial Intelligence Review 55(6), 4307–4346 (2022) https://doi.org/10.1007/s10462-021-10108-x

[35] Ng, A.Y., Harada, D., Russell, S.: Policy invariance under reward transformations: Theory and application to reward shaping. In: ICML, vol. 99, pp. 278–287 (1999). Citeseer

[36] Skalse, J., Howe, N.H.R., Krasheninnikov, D., Krueger, D.: Defining and characterizing reward hacking. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22. Curran Associates Inc., Red Hook, NY, USA (2024)

[37] Wirth, C., Akrour, R., Neumann, G., Fürnkranz, J., et al.: A survey of preference-based reinforcement learning methods. Journal of Machine Learning Research 18(136), 1–46 (2017)

[38] García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16(1), 1437–1480 (2015)

[39] Achiam, J., Held, D., Tamar, A., Abbeel, P.: Constrained policy optimization. In: International Conference on Machine Learning, pp. 22–31 (2017). PMLR

[40] Ran, Y., Li, Y.-C., Zhang, F., Zhang, Z., Yu, Y.: Policy regularization with dataset constraint for offline reinforcement learning. In: International Conference on Machine Learning, pp. 28701–28717 (2023). PMLR

[41] Persiani, M., Hellström, T.: Policy regularization for legible behavior. Neural Computing and Applications, 1–10 (2022)

[42] Tirumala, D., Galashov, A., Noh, H., Hasenclever, L., Pascanu, R., Schwarz, J., Desjardins, G., Czarnecki, W.M., Ahuja, A., Teh, Y.W., Heess, N.: Behavior priors for efficient reinforcement learning. Journal of Machine Learning Research 23(221), 1–68 (2022)

[43] Maree, C., Omlin, C.: Reinforcement Learning Your Way: Agent Characterization through Policy Regularization. AI 3(2), 250–259 (2022)

[44] Maree, C., Omlin, C.W.: Can Interpretable Reinforcement Learning Manage Prosperity Your Way? AI 3(2), 526–537 (2022)

[45] Vishwanath, A., Omlin, C.: Localized Affinity-Based Reinforcement Learning for Interpretable State-Specific Decision-Making. In: Bramer, M., Stahl, F. (eds.) Artificial Intelligence XLI, pp. 221–234. Springer, Cham (2025)

[46] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (2020). https://arxiv.org/abs/1706.02275

[47] MacIntyre, A.C.: After Virtue: A Study in Moral Theory (2007)

[48] Anscombe, G.E.M.: Modern Moral Philosophy. Philosophy 33(124), 1–19 (1958). Publisher: Cambridge University Press. Accessed 2024-04-17

[49] Howard, D., Muntean, I.: Artificial Moral Cognition: Moral Functionalism and Autonomous Moral Agency. In: Powers, T.M. (ed.) Philosophy and Computing, vol. 128, pp. 121–159. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61043-6_7. http://link.springer.com/10.1007/978-3-319-61043-6_7

[50] Christianos, F., Papoudakis, G., Albrecht, S.V.: Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning (2023). https://arxiv.org/abs/2209.14344

[51] Zhao, Z., Cao, L., Chen, X., Lai, J., Zhang, L.: Improvement of MADRL Equilibrium Based on Pareto Optimization. The Computer Journal 66(7), 1573–1585 (2023) https://doi.org/10.1093/comjnl/bxac027. Accessed 2025-01-31

Appendix A Observation Space

In this section, we outline the structure of the state space. The virtues are encoded as numbers between 1...


sincerity

In Table A1, we summarize the observation space of an agent in Fog of Love.

Table A1: Agent Observation Space Structure

Agent | Key Prefix | Description | Range | Type
player_1 | goal_* | Goal values (1-7) | [-50,50] | int32
player_1 | player_1_virtue_* | Player 1 virtue values (1-7) | [-50,50] | int32
player_1 | player_2_virtue_* | Player 2 virtue values (1-7) | [-50,50] | int32
player_1 | op...