Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment
Pith reviewed 2026-06-28 06:09 UTC · model grok-4.3
The pith
Localized affinities enable agents to achieve higher scores in both individual virtues and cooperative relationship goals in the Fog of Love environment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Localized affinities enhance agent performance in achieving both competitive and cooperative objectives in the Fog of Love environment, resulting from superior overall scores in both domains; this produces virtuous choices, clarifies an agent's teleology, and renders its behavior human-level interpretable.
What carries the argument
Affinity-based reinforcement learning that applies policy regularization to the objective function to incentivize virtuous actions through localized affinities.
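As a rough illustration of this mechanism, the sketch below subtracts a penalty from the expected return whenever the policy's action distribution drifts from a prior attached to the current region of the state space. The mean-squared form of the penalty, the region keys, and every name used here (affinity_penalty, regularized_objective, strength) are assumptions made for illustration; the paper's exact regularizer is not stated on this page.

```python
from typing import Dict, Sequence


def affinity_penalty(action_probs: Sequence[float], prior: Sequence[float]) -> float:
    """Mean squared gap between the policy's action distribution and a prior."""
    if len(action_probs) != len(prior):
        raise ValueError("action_probs and prior must have the same length")
    return sum((p - q) ** 2 for p, q in zip(action_probs, prior)) / len(prior)


def regularized_objective(
    expected_return: float,
    action_probs: Sequence[float],
    region: str,
    priors: Dict[str, Sequence[float]],
    strength: float,
) -> float:
    """Expected return minus a penalty tied to the prior of the current region."""
    return expected_return - strength * affinity_penalty(action_probs, priors[region])


# Hypothetical priors: one preferred action distribution per region of the state space.
priors = {"romantic_scene": [0.8, 0.2], "conflict_scene": [0.3, 0.7]}

# A policy that matches the local prior pays no penalty; one that drifts pays some.
aligned = regularized_objective(10.0, [0.8, 0.2], "romantic_scene", priors, strength=2.0)
drifted = regularized_objective(10.0, [0.5, 0.5], "romantic_scene", priors, strength=2.0)
```

Keying the prior on a region rather than using one global prior is what "localized" is taken to mean in this sketch: the same action can be encouraged in one part of the game and discouraged in another.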
If this is right
- Agents succeed at both competition and cooperation where standard MADDPG methods fail.
- Agent behavior becomes human-level interpretable through clarified teleology.
- Virtuous choices emerge without full reliance on reward-function design.
- Superior scores are obtained simultaneously in individual virtue and shared relationship domains.
Where Pith is reading between the lines
- The regularization technique could be adapted to other multi-objective environments where reward engineering is difficult.
- Interpretable teleology may help identify when agents align with or deviate from intended ethical goals.
- The method might reduce the need for exhaustive reward shaping in settings that mix self-interest and collaboration.
Load-bearing premise
That superior scores on the chosen metrics in the Fog of Love environment correspond to genuinely virtuous behavior rather than exploitation of the affinity regularization or game rules.
What would settle it
Observation of agents reaching high scores via affinity regularization yet consistently choosing actions that exploit game mechanics without satisfying the intended virtues or relationship requirements.
Original abstract
Instilling virtuous behavior in artificial intelligence has seen increasing interest. One of the techniques proposed is known as affinity-based reinforcement learning, which uses policy regularization on the objective function to incentivize virtuous actions without being fully dependent on the reward function design. Thus far, this technique has been demonstrated to be effective in grid worlds and toy-problem environments with minimal state and action spaces. To expand this research to more sophisticated environments, we introduce a two-player multi-agent environment based on the role-playing board game known as Fog of Love. In this environment, two agents compete to fulfill their individual virtues, while also cooperating to satisfy their relationship. Given the multi-agent nature, this is a complex problem where multi-agent deep deterministic policy gradient agents neither compete nor cooperate successfully. We present evidence that localized affinities enhance agent performance in achieving both competitive and cooperative objectives, resulting from superior overall scores in both domains. This not only results in virtuous choices but also clarifies an agent's teleology and makes its behavior human-level interpretable.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a two-player multi-agent environment based on the Fog of Love board game, where agents pursue individual virtues while cooperating on relationship goals. It claims that standard MADDPG fails to produce competitive or cooperative behavior, but affinity-based RL with localized affinities yields superior overall scores in both domains, resulting in virtuous choices and more interpretable agent teleology. The work extends prior affinity-based RL demonstrations from simple grid worlds to this more complex setting.
Significance. If the empirical results hold with proper controls, the approach could provide a practical method for engineering value-aligned behavior in multi-agent systems without sole reliance on reward shaping, while improving interpretability. The environment itself may serve as a useful benchmark for testing virtue-related objectives in RL.
major comments (2)
- [Experiments] Experiments section (results on Fog of Love): The central claim that localized affinities produce virtuous behavior via superior scores requires evidence that the performance advantage persists after removing the affinity regularization term post-training. Without such an ablation, the gains may reflect direct policy bias from the regularization rather than internalized virtue, undermining the interpretation that the method engineers virtuous teleology rather than easing optimization.
- [Experiments] Method and Experiments: No quantitative results, baselines, number of runs, or statistical tests are referenced in support of the 'superior overall scores' claim, making it impossible to evaluate effect sizes or reliability of the reported improvements over MADDPG.
minor comments (2)
- [Abstract] Abstract: The phrase 'resulting from superior overall scores in both domains' is ambiguous; clarify whether this refers to individual virtue scores, relationship scores, or a combined metric.
- [Environment] Environment description: Provide the exact state/action space sizes and reward structure details to allow replication and comparison with prior affinity-based RL work.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional experiments and quantitative details where appropriate.
- Referee: [Experiments] Experiments section (results on Fog of Love): The central claim that localized affinities produce virtuous behavior via superior scores requires evidence that the performance advantage persists after removing the affinity regularization term post-training. Without such an ablation, the gains may reflect direct policy bias from the regularization rather than internalized virtue, undermining the interpretation that the method engineers virtuous teleology rather than easing optimization.
  Authors: We agree that an ablation study removing the affinity regularization term after training would strengthen the evidence that the agents have internalized virtuous behavior rather than relying on ongoing regularization. The current results show that localized affinities lead to superior scores and more interpretable teleology during training. To address this, we will add a post-training ablation experiment in the revised manuscript, continuing evaluation without the regularization term and reporting whether performance advantages persist.
  Revision: yes
- Referee: [Experiments] Method and Experiments: No quantitative results, baselines, number of runs, or statistical tests are referenced in support of the 'superior overall scores' claim, making it impossible to evaluate effect sizes or reliability of the reported improvements over MADDPG.
  Authors: The manuscript presents baseline comparisons against MADDPG along with overall scores demonstrating improvements from localized affinities. We acknowledge, however, that details on the number of runs, variance, effect sizes, and statistical tests were not explicitly reported. In the revision we will add these quantitative elements to the Experiments section, including the number of independent runs, mean scores with standard deviations, and any statistical tests performed.
  Revision: yes
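The quantities named in the second response (number of runs, mean scores with standard deviations, an effect size, and a test statistic) can be computed with the Python standard library alone. The sketch below is one possible way to report them, using Welch's t statistic and Cohen's d as example choices; the function names are hypothetical and the score lists are placeholders, not results from the paper.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def summarize(scores: Sequence[float]) -> Tuple[int, float, float]:
    """Number of runs, mean score, and sample standard deviation."""
    return len(scores), mean(scores), stdev(scores)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    term_a = variance(a) / len(a)
    term_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(term_a + term_b)
    df = (term_a + term_b) ** 2 / (
        term_a ** 2 / (len(a) - 1) + term_b ** 2 / (len(b) - 1)
    )
    return t, df


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Effect size using the pooled sample standard deviation."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Placeholder per-run scores, for illustration only (not data from the paper).
baseline_runs = [1.0, 2.0, 3.0, 4.0, 5.0]
affinity_runs = [2.0, 3.0, 4.0, 5.0, 6.0]

n_runs, avg, sd = summarize(affinity_runs)
t_stat, dof = welch_t(baseline_runs, affinity_runs)
effect = cohens_d(baseline_runs, affinity_runs)
```

Welch's test is used here because it does not assume equal variances across the two agent types; converting the statistic to a p-value requires a t-distribution routine, which the standard library does not provide.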
Circularity Check
No derivation chain present; empirical comparison only
The manuscript presents an empirical evaluation of affinity-based RL agents in the Fog of Love environment. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim rests on observed score improvements under the affinity regularization, which is an external technique applied to the environment rather than a result derived from the paper's own inputs. Because no mathematical reduction or self-referential construction exists, the work contains no circular steps and is self-contained as an experimental demonstration.
Appendix A Observation Space
In this section, we outline the structure of the state space. The virtues are encoded as numbers between 1...
... sincerity
In Table A1, we summarize the observation space of an agent in Fog of Love.
Table A1: Agent Observation Space Structure
Agent | Key Prefix | Description | Range | Type
player 1 | goal * | Goal values (1-7) | [-50,50] | int32
player 1 | player 1 virtue * | Player 1 virtue values (1-7) | [-50,50] | int32
player 1 | player 2 virtue * | Player 2 virtue values (1-7) | [-50,50] | int32
player 1 | op...
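To make the visible rows of Table A1 concrete, the sketch below builds a specification for one agent's observation and checks a sample observation against it. It rests on several assumptions: that "(1-7)" means seven indexed entries per key prefix, that the keys use underscores (the extracted text shows spaces), and that only the three fully visible prefixes are covered, since the table is truncated on this page. The names build_observation_spec and is_valid are hypothetical.

```python
from typing import Dict, Tuple

# Range and type as listed in the visible rows of Table A1.
LOW, HIGH, DTYPE = -50, 50, "int32"

# Assumed key prefixes; only the three fully visible rows are represented.
PREFIXES = ("goal", "player_1_virtue", "player_2_virtue")

Spec = Dict[str, Tuple[int, int, str]]


def build_observation_spec(entries_per_prefix: int = 7) -> Spec:
    """Map each assumed key, e.g. 'goal_1', to its (low, high, dtype) bounds."""
    spec: Spec = {}
    for prefix in PREFIXES:
        for index in range(1, entries_per_prefix + 1):
            spec[f"{prefix}_{index}"] = (LOW, HIGH, DTYPE)
    return spec


def is_valid(observation: Dict[str, int], spec: Spec) -> bool:
    """Check that the observation has exactly the expected keys, all within bounds."""
    if set(observation) != set(spec):
        return False
    return all(
        isinstance(value, int) and spec[key][0] <= value <= spec[key][1]
        for key, value in observation.items()
    )


spec = build_observation_spec()
neutral_observation = {key: 0 for key in spec}
out_of_range = dict(neutral_observation, goal_1=51)
```

A flat dictionary keeps the sketch dependency-free; in an actual environment the same bounds would more likely be declared through the RL library's own space types.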