Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making

Damian Dailisan; Rohit K. Dubey; Sachit Mahajan

arxiv: 2503.05724 · v2 · submitted 2025-02-17 · 💻 cs.CY · cs.AI

Rohit K. Dubey, Damian Dailisan, Sachit Mahajan

Pith reviewed 2026-05-23 02:57 UTC · model grok-4.3

classification 💻 cs.CY cs.AI

keywords ethical decision-making · reinforcement learning · large language models · moral uncertainty · belief aggregation · Dempster-Shafer Theory · shaping rewards · multi-principle ethics

The pith

Reinforcement learning agents can be ethically refined by replacing human feedback with belief values from a large language model embodying five moral principles, aggregated into shaping rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that adds a task-agnostic ethical layer to a pre-trained reinforcement learning model. In this layer an LLM assigns belief values to actions by embodying consequentialist, deontological, virtue, social justice, and care ethics; these values are then combined with Belief Jensen-Shannon Divergence and Dempster-Shafer Theory to produce probability scores that double as shaping rewards. The resulting fine-tuning steers the agent toward balanced ethical choices in environments where moral requirements are uncertain or change. A reader would care because the method removes the need for scenario-specific handcrafted rewards while still producing decisions that reflect multiple ethical perspectives simultaneously.

Core claim

The central claim is that an ethical layer which uses an LLM to generate belief values from five moral principles, aggregates those values via Belief Jensen-Shannon Divergence and Dempster-Shafer Theory into probability scores, and feeds the scores back as shaping rewards, enables a reinforcement learning agent to navigate moral uncertainty and produce morally sound decisions across diverse tasks while lowering dependence on manually designed ethical rewards.

What carries the argument

The ethical layer that aggregates LLM-generated belief scores from five moral perspectives using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory to form probability scores that serve as shaping rewards.
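
To make the aggregation step concrete, here is a minimal Python sketch of Dempster's rule of combination applied to belief masses from several moral perspectives. It is an illustration under stated assumptions, not the authors' implementation: the action names, the mass values, the use of the full action set to hold undecided belief, and the pignistic transform used to turn combined masses into probability scores are all hypothetical choices. The Belief Jensen-Shannon Divergence weighting that the paper applies before combination is omitted here.

```python
from functools import reduce
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + wa * wb
        else:
            conflict += wa * wb  # product mass that lands on the empty set
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise so the surviving masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def pignistic(m):
    """Spread each focal set's mass evenly over its elements (assumed step)."""
    scores = {}
    for focal, w in m.items():
        for action in focal:
            scores[action] = scores.get(action, 0.0) + w / len(focal)
    return scores


# Hypothetical two-action frame; all mass values below are made up.
BRAKE, SWERVE = frozenset({"brake"}), frozenset({"swerve"})
EITHER = BRAKE | SWERVE  # mass on the whole frame means "undecided"

beliefs = {
    "consequentialist": {BRAKE: 0.6, SWERVE: 0.3, EITHER: 0.1},
    "deontological": {BRAKE: 0.7, SWERVE: 0.2, EITHER: 0.1},
    "virtue": {BRAKE: 0.5, SWERVE: 0.3, EITHER: 0.2},
    "social_justice": {BRAKE: 0.4, SWERVE: 0.5, EITHER: 0.1},
    "care": {BRAKE: 0.6, SWERVE: 0.2, EITHER: 0.2},
}

# Fold the five perspectives into one mass function, then into action scores.
combined = reduce(dempster_combine, beliefs.values())
scores = pignistic(combined)
print(scores)
```

Representing focal elements as frozensets keeps the rule general: a perspective can leave part of its belief uncommitted instead of being forced to split it between actions.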

If this is right

The framework produces improved consistency and adaptability compared with other belief aggregation methods.
It reduces reliance on handcrafted ethical rewards for each new task.
It remains effective when ethical challenges appear unexpectedly in dynamic environments.
It works across multiple LLM variants and yields decisions suited to real-world applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same aggregation step could be inserted into other learning loops, such as fine-tuning language models directly on ethical preferences.
If the underlying LLM systematically under-represents one of the five principles, the resulting reward signal would embed that imbalance across all tasks.
Deployment in high-stakes domains such as autonomous vehicles would require an additional verification layer that checks whether the aggregate score actually matches human expert judgments on edge cases.

Load-bearing premise

The large language model can accurately and unbiasedly embody and apply the five specified moral principles to assign belief values to actions.

What would settle it

A controlled test in which the agent faces an action that one moral principle strongly endorses while the aggregated score strongly opposes it; if the agent reliably follows the aggregate score even when individual principles conflict, the claim holds; systematic deviation would falsify it.

Original abstract

We present an ethical decision-making framework that refines a pre-trained reinforcement learning (RL) model using a task-agnostic ethical layer. Following initial training, the RL model undergoes ethical fine-tuning, where human feedback is replaced by feedback generated from a large language model (LLM). The LLM embodies consequentialist, deontological, virtue, social justice, and care ethics as moral principles to assign belief values to recommended actions during ethical decision-making. An ethical layer aggregates belief scores from multiple LLM-derived moral perspectives using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory into probability scores that also serve as the shaping reward, steering the agent toward choices that align with a balanced ethical framework. This integrated learning framework helps the RL agent navigate moral uncertainty in complex environments and enables it to make morally sound decisions across diverse tasks. Our approach, tested across different LLM variants and compared with other belief aggregation techniques, demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards. This method is especially effective in dynamic scenarios where ethical challenges arise unexpectedly, making it well-suited for real-world applications.
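
One way to read "probability scores that also serve as the shaping reward" is as an additive bonus on top of the environment reward during ethical fine-tuning. The sketch below shows that reading only. The additive form, the `weight` parameter, the function name, and the action names are assumptions made for illustration and are not taken from the paper.

```python
def shaped_reward(env_reward, ethical_scores, action, weight=1.0):
    """Return the task reward plus a weighted ethical score for the chosen action.

    ethical_scores maps each candidate action to a probability-like score in
    [0, 1], for example the output of a belief aggregation step.
    """
    if action not in ethical_scores:
        raise KeyError(f"no ethical score for action {action!r}")
    return env_reward + weight * ethical_scores[action]


# Hypothetical scores and task rewards for a single decision point.
scores = {"brake": 0.9, "swerve": 0.1}
r_brake = shaped_reward(env_reward=-1.0, ethical_scores=scores,
                        action="brake", weight=2.0)
r_swerve = shaped_reward(env_reward=-0.5, ethical_scores=scores,
                         action="swerve", weight=2.0)
print(r_brake, r_swerve)
```

With these made-up numbers the task reward alone prefers swerving, while the shaped reward prefers braking. In this sketch `weight` sets how strongly the ethical signal can override the task objective.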

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper tries to replace human feedback in RL ethical tuning with an LLM that role-plays five moral theories and aggregates via Belief JSD plus Dempster-Shafer, but the moral soundness claim has no external check against human judgments.

The core idea is straightforward: train an RL agent normally, then add a task-agnostic layer where one LLM is prompted to score actions from five fixed ethical standpoints, aggregate the resulting belief masses with Jensen-Shannon divergence and Dempster-Shafer, and use the output as a shaping reward. That specific pipeline and the choice of those two aggregation tools together look new relative to prior LLM-for-ethics work I have seen. The write-up also shows they ran the same setup on several LLM back-ends and compared the aggregation step against simpler baselines, which is useful housekeeping. Those are the concrete contributions. The rest of the abstract is mostly description of the setup.

The obvious gap is the one flagged in the stress test. They claim the method produces “morally sound decisions” and improved consistency, yet the only evidence offered is that different LLMs agree more after aggregation. There is no reported comparison to human ethical ratings on the same action sets or standard dilemma benchmarks. Without that, consistency among models does not tell us whether the outputs track independent moral criteria or just shared LLM biases. The central modeling assumption (that an LLM can reliably and unbiasedly instantiate consequentialism, deontology, virtue ethics, social justice, and care ethics) therefore carries the whole argument and is not tested.

The paper is aimed at people already working on value alignment for RL agents in robotics or autonomous systems. A reader in that niche might pick up the aggregation trick or the multi-theory prompting pattern. On current evidence the work is not yet ready for a serious referee; the empirical claims need at least one human-validation experiment and clearer baselines before it justifies review time. If the full manuscript contains those checks, then yes; otherwise it should stay in the workshop pile.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a framework for ethical decision-making in reinforcement learning agents. It uses a large language model to generate feedback based on five moral principles—consequentialist, deontological, virtue, social justice, and care ethics—to assign belief values to actions. These are aggregated using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory to create probability scores that serve as shaping rewards for the RL agent, aiming to handle moral uncertainty without human feedback or handcrafted rewards. The approach is tested across LLM variants and compared to other aggregation techniques, claiming improved consistency and adaptability.

Significance. If the empirical claims hold and external validation is added, the work could contribute to scalable methods for incorporating multiple ethical perspectives into AI systems, addressing moral uncertainty in dynamic environments. The use of established aggregation techniques like DST is a positive element. However, without human benchmarks, the significance for producing 'morally sound decisions' remains limited.

major comments (3)

[Abstract] The assertion that the method 'demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards' after testing across LLM variants is unsupported by any reported data, baselines, quantitative metrics, or statistical comparisons, which is load-bearing for the central claim.
[Experiments / Evaluation section] No comparisons of LLM-assigned belief values to human ethical judgments on the same action sets or established moral dilemma benchmarks (e.g., trolley problems or standard ethics datasets) are provided. This directly undermines the claim that the framework enables 'morally sound decisions,' as consistency among LLMs may capture shared model biases rather than independent moral validity.
[Framework section] The core assumption that the LLM can accurately and unbiasedly 'embody' the five moral principles to assign belief values is not validated against external human standards or inter-rater reliability measures, which is load-bearing for the claim that the aggregated rewards steer the agent toward ethical choices.
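
The human-validation check requested in these comments could start from a chance-corrected agreement statistic between LLM-derived and human labels for the same actions. Below is a minimal sketch using Cohen's kappa. The label names and both rating lists are invented for illustration; no such data is reported in the paper, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both raters use one identical label
        # throughout; this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels: LLM-derived verdicts versus human verdicts on four actions.
llm_labels = ["acceptable", "acceptable", "unacceptable", "unacceptable"]
human_labels = ["acceptable", "unacceptable", "unacceptable", "unacceptable"]
kappa = cohens_kappa(llm_labels, human_labels)
print(kappa)
```

Kappa near 1 would indicate agreement well beyond chance and kappa near 0 would indicate agreement no better than chance, which is the distinction the referee's point about shared model biases turns on.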

minor comments (2)

[Framework section] The description of how Belief Jensen-Shannon Divergence is computed from the LLM outputs could include an explicit equation or pseudocode for reproducibility.
[Framework section] Clarify whether the 'ethical fine-tuning layer' modifies the RL policy directly or only the reward function, as this affects interpretation of the results.
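
For reference, the Belief Jensen-Shannon divergence between two mass functions m_1 and m_2 over focal elements A_i is commonly defined as in the first equation below. This is the usual definition from the evidence-fusion literature, not an equation quoted from the manuscript. The credibility weights in the second line are one common way to use the divergence before Dempster-Shafer combination; that the paper follows this exact scheme is an assumption, and the symbols k and w_i are introduced here for illustration.

```latex
\mathrm{BJS}(m_1, m_2) = \frac{1}{2}\left[
  \sum_i m_1(A_i)\,\log_2\frac{2\,m_1(A_i)}{m_1(A_i)+m_2(A_i)}
  + \sum_i m_2(A_i)\,\log_2\frac{2\,m_2(A_i)}{m_1(A_i)+m_2(A_i)}
\right]

% Assumed weighting for k perspectives:
% lower average divergence from the others gives higher credibility.
\overline{\mathrm{BJS}}_i = \frac{1}{k-1}\sum_{j \neq i} \mathrm{BJS}(m_i, m_j),
\qquad
w_i = \frac{1/\overline{\mathrm{BJS}}_i}{\sum_{s=1}^{k} 1/\overline{\mathrm{BJS}}_s}
```

Terms with m(A_i) = 0 contribute zero. With base-2 logarithms the divergence is symmetric and lies in [0, 1], so a perspective that disagrees with the others receives a smaller weight w_i in the assumed scheme.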

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for clarification and improvement in our work on ethical decision-making for RL agents. We address each major comment below.

Referee: [Abstract] The assertion that the method 'demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards' after testing across LLM variants is unsupported by any reported data, baselines, quantitative metrics, or statistical comparisons, which is load-bearing for the central claim.

Authors: We agree with this observation. The experiments section describes testing across LLM variants and comparisons to other aggregation techniques, but does not include specific quantitative metrics or statistical comparisons to support claims of 'improved consistency' and 'adaptability'. We will revise the abstract to remove these unsupported assertions and instead describe the experimental setup more accurately.
revision: yes

Referee: [Experiments / Evaluation section] No comparisons of LLM-assigned belief values to human ethical judgments on the same action sets or established moral dilemma benchmarks (e.g., trolley problems or standard ethics datasets) are provided. This directly undermines the claim that the framework enables 'morally sound decisions,' as consistency among LLMs may capture shared model biases rather than independent moral validity.

Authors: This is a valid criticism. Our framework is intended to address moral uncertainty by aggregating diverse LLM perspectives without relying on human feedback, which is a key contribution. However, we recognize that without external human validation, claims of moral soundness are limited. We will add a dedicated limitations paragraph in the discussion section to explicitly address this point, noting that LLM agreement may reflect training data biases and recommending human studies on standard benchmarks as future work.
revision: partial

Referee: [Framework section] The core assumption that the LLM can accurately and unbiasedly 'embody' the five moral principles to assign belief values is not validated against external human standards or inter-rater reliability measures, which is load-bearing for the claim that the aggregated rewards steer the agent toward ethical choices.

Authors: We acknowledge that the manuscript does not provide validation of the LLM's embodiment of moral principles against human standards. The approach relies on the capability of LLMs to simulate different ethical frameworks through prompting, as described in the framework section. To address this, we will revise the framework section to more clearly articulate this assumption and discuss potential biases, while emphasizing that the aggregation via Belief JSD and DST is meant to mitigate individual perspective biases.
revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on external LLM assumptions and standard aggregation methods

The paper proposes replacing human feedback with LLM-generated belief values from five fixed moral principles, then aggregates them via Belief Jensen-Shannon Divergence and Dempster-Shafer Theory to produce shaping rewards. No equations or steps define a quantity in terms of itself, rename a fitted parameter as a prediction, or reduce the central claim to a self-citation chain. The framework is self-contained against its stated inputs; any weakness lies in unvalidated external assumptions about LLM moral accuracy rather than internal circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The approach depends on how well the LLM reasons about ethics and on whether the aggregation techniques are valid for producing balanced ethical guidance.

free parameters (1)

Belief values from LLM
Generated by the LLM for each moral perspective; specific computation details not provided in abstract.

axioms (1)

domain assumption: LLMs can reliably simulate multiple ethical frameworks without introducing significant bias
Central to replacing human feedback with LLM feedback.

invented entities (1)

Ethical fine-tuning layer (no independent evidence)
purpose: Aggregates moral beliefs into shaping rewards for RL
New component introduced in the framework.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 3 internal anchors

[1]

Nature computational science4(9), 629–632 (2024) https://doi.org/10.1038/s43588-024-00695-4

Chen, S.: The lost data: how AI systems censor LGBTQ+ content in the name of safety. Nature computational science4(9), 629–632 (2024) https://doi.org/10.1038/s43588-024-00695-4

work page doi:10.1038/s43588-024-00695-4 2024
[2]

Journal of Experimental & Theoretical Artificial Intelligence12(3), 251–261 (2000)

Allen, C., Varner, G., Zinser, J.: Prolegomena to any future artificial moral agent. Journal of Experimental & Theoretical Artificial Intelligence12(3), 251–261 (2000)

work page 2000
[3]

Wallach, W., Allen, C.: Moral Machines: Teaching Robots Right from Wrong, (2008)

work page 2008
[4]

IEEE Intelligent Systems21(4), 38–44 (2006) https://doi.org/10.1109/mis.2006.82

Bringsjord, S., Arkoudas, K., Bello, P.: Toward a general logicist methodology for engineering ethically correct robots. IEEE Intelligent Systems21(4), 38–44 (2006) https://doi.org/10.1109/mis.2006.82

work page doi:10.1109/mis.2006.82 2006
[5]

AI magazine36(4), 105–114 (2015)

Russell, S., Dewey, D., Tegmark, M.: Research priorities for robust and beneficial artificial intelligence. AI magazine36(4), 105–114 (2015)

work page 2015
[6]

arXiv (2020)

Ecoffet, A., Lehman, J.: Reinforcement Learning Under Moral Uncertainty. arXiv (2020). https://doi. org/10.48550/arxiv.2006.04734

work page doi:10.48550/arxiv.2006.04734 2020
[7]

Oxford Univer- sity Press (2018).https://doi.org/10.1093/oso/9780198814788.001.0001

MacAskill, M., Bykvist, K., Ord, T.: Moral Uncertainty. Oxford University Press, ??? (2020). https: //doi.org/10.1093/oso/9780198722274.001.0001

work page doi:10.1093/oso/9780198722274.001.0001 2020
[8]

Edward Elgar Publishing, ??? (2019)

L¨ utge, C.: The Ethics of Competition: How a Competitive Society Is Good for All. Edward Elgar Publishing, ??? (2019)

work page 2019
[9]

A Formalization of Kant's Second Formulation of the Categorical Imperative

Lindner, F., Bentzen, M.M.: A formalization of kant’s second formulation of the categorical imperative. arXiv preprint arXiv:1801.03160 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

In: Workshops at the Thirtieth AAAI Conference on Artificial Intelligence (2016)

Abel, D., MacGlashan, J., Littman, M.L.: Reinforcement learning as a framework for ethical decision making. In: Workshops at the Thirtieth AAAI Conference on Artificial Intelligence (2016)

work page 2016
[11]

Dragan, Pieter Abbeel, and Stuart Russell

Hadfield-Menell, D., Dragan, A., Abbeel, P., Russell, S.: The off-switch game. In: Workshops at the Thirty-First AAAI Conference on Artificial Intelligence, pp. 220–227. International Joint Conferences on Artificial Intelligence Organization, ??? (2017). https://doi.org/10.24963/ijcai.2017/32

work page doi:10.24963/ijcai.2017/32 2017
[12]

In: ICML, vol

Ng, A.Y., Russell, S.,et al.: Algorithms for inverse reinforcement learning. In: ICML, vol. 1, p. 2 (2000) 14

work page 2000
[13]

In: AAAI, vol

Ziebart, B.D., Maas, A.L., Bagnell, J.A., Dey, A.K.,et al.: Maximum entropy inverse reinforcement learning. In: AAAI, vol. 8, pp. 1433–1438 (2008). Chicago, IL, USA

work page 2008
[14]

Ethics and Information Technology18, 243–256 (2016) https://doi.org/10.1007/ s10676-015-9367-8

Malle, B.F.: Integrating robot ethics and machine morality: the study and design of moral com- petence in robots. Ethics and Information Technology18, 243–256 (2016) https://doi.org/10.1007/ s10676-015-9367-8

work page 2016
[15]

Robotics and Autonomous Systems77, 1–14 (2016) https://doi.org/10.1016/j.robot.2015.11

Dennis, L., Fisher, M., Slavkovik, M., Webster, M.: Formal verification of ethical choices in autonomous systems. Robotics and Autonomous Systems77, 1–14 (2016) https://doi.org/10.1016/j.robot.2015.11. 012

work page doi:10.1016/j.robot.2015.11 2016
[16]

Privacy-preserving and unforgeable searchable encrypted audit logs for cloud storage,

Sio, F., Hoven, J.: Meaningful human control over autonomous systems: A philosophical account. Frontiers in Robotics and AI5, 15 (2018) https://doi.org/10.3389/frobt.2018.00015

work page doi:10.3389/frobt.2018.00015 2018
[17]

Cummings, M.M.: Man versus machine or man+ machine? IEEE Intelligent Systems29(5), 62–69 (2014) https://doi.org/10.1109/mis.2014.87

work page doi:10.1109/mis.2014.87 2014
[18]

Nature Computational Science4(9), 641–643 (2024) https://doi.org/10.1038/s43588-024-00694-5

Suri, S.: Defining our future with generative AI. Nature Computational Science4(9), 641–643 (2024) https://doi.org/10.1038/s43588-024-00694-5

work page doi:10.1038/s43588-024-00694-5 2024
[19]

Nature621(7980), 672–675 (2023) https://doi.org/10.1038/d41586-023-02980-0

Van Noorden, R., Perkel, J.M.: Ai and science: what 1,600 researchers think. Nature621(7980), 672–675 (2023) https://doi.org/10.1038/d41586-023-02980-0

work page doi:10.1038/d41586-023-02980-0 2023
[20]

Generative Agents: Interactive Simulacra of Human Behavior

Park, J.S., O’Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., Bernstein, M.S.: Generative Agents: Interactive Simulacra of Human Behavior. arXiv (2023). https://doi.org/10.48550/arxiv.2304.03442

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2304.03442 2023
[21]

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7, 1696–1708 (2024) https://doi.org/10.1609/aies.v7i1.31758

Yang, J.C., Dailisan, D., Korecki, M., Hausladen, C.I., Helbing, D.: LLM voting: Human choices and AI collective decision making. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7, 1696–1708 (2024) https://doi.org/10.1609/aies.v7i1.31758

work page doi:10.1609/aies.v7i1.31758 2024
[22]

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences382(2285) (2024) https://doi.org/10.1098/rsta.2024.0100

Gudi˜ no-Rosero, J., Grandi, U., Hidalgo, C.A.: Large language models (LLMs) as agents for augmented democracy. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences382(2285) (2024) https://doi.org/10.1098/rsta.2024.0100

work page doi:10.1098/rsta.2024.0100 2024
[23]

Frontiers Comput

Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W.X., Wei, Z., Wen, J.: A survey on large language model based autonomous agents. Frontiers of Computer Science18(6) (2024) https://doi.org/10.1007/s11704-024-40231-1

work page doi:10.1007/s11704-024-40231-1 2024
[24]

In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S

Scherrer, N., Shi, C., Feder, A., Blei, D.: Evaluating the moral beliefs encoded in LLMs. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems, vol. 36, pp. 51778–51809 (2023)

work page 2023
[25]

arXiv (2024)

Garcia, B., Qian, C., Palminteri, S.: The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making. arXiv (2024). https://doi.org/10.48550/arxiv.2410.07304

work page doi:10.48550/arxiv.2410.07304 2024
[26]

Penguin UK, ??? (2017)

Sen, A.: Collective Choice and Social Welfare: Expanded Edition. Penguin UK, ??? (2017)

work page 2017
[27]

Nature Computational Science3(10), 833–838 (2023) https://doi.org/10.1038/s43588-023-00527-x

Hagendorff, T., Fabi, S., Kosinski, M.: Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nature Computational Science3(10), 833–838 (2023) https://doi.org/10.1038/s43588-023-00527-x

work page doi:10.1038/s43588-023-00527-x 2023
[28]

Machine ethics1, 244–253 (2011) https://doi.org/10.1017/ cbo9780511978036.019

Gips, J.: Towards the ethical robot. Machine ethics1, 244–253 (2011) https://doi.org/10.1017/ cbo9780511978036.019

work page 2011
[29]

arXiv preprint arXiv:2406.18841 (2024)

Jiao, J., Afroogh, S., Xu, Y., Phillips, C.: Navigating LLM ethics: Advancements, challenges, and future directions. arXiv preprint arXiv:2406.18841 (2024)

work page arXiv 2024
[30]

Science and Engineering Ethics 31(1), 1–13 (2025)

Coeckelbergh, M.: LLMs, truth, and democracy: An overview of risks. Science and Engineering Ethics 31(1), 1–13 (2025)

work page 2025
[31]

arXiv preprint arXiv:2411.00784 (2024) 15

Xie, Z., Xing, R., Wang, Y., Geng, J., Iqbal, H., Sahnan, D., Gurevych, I., Nakov, P.: Fire: Fact-checking with iterative retrieval and verification. arXiv preprint arXiv:2411.00784 (2024) 15

work page arXiv 2024
[32]

In: Proceedings of the 34th International Conference on Machine Learning, pp

Achiam, J., Held, D., Tamar, A., Abbeel, P.: Constrained policy optimization. In: Proceedings of the 34th International Conference on Machine Learning, pp. 22–31 (2017). PMLR

work page 2017
[33]

In: Advances in Neural Information Processing Systems, pp

Tamar, A., Chow, Y., Ghavamzadeh, M., Mannor, S.: Policy gradients with variance related risk criteria. In: Advances in Neural Information Processing Systems, pp. 167–175 (2015)

work page 2015
[34]

In: Proceedings of the 35th International Conference on Machine Learning, pp

Chow, Y., Nachum, O., Du´ enez-Guzm´ an, E., Ghavamzadeh, M.: Lyapunov-based safe policy optimiza- tion for continuous control. In: Proceedings of the 35th International Conference on Machine Learning, pp. 809–818 (2018). PMLR

work page 2018
[35]

In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

Alshiekh, M., Bloem, R., Ehlers, R., Koch, M., K¨ onighofer, B., Niekum, S., Topcu, U.: Safe reinforce- ment learning via shielding. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

work page 2018
[36]

In: Advances in Neural Information Processing Systems, pp

Berkenkamp, F., Turchetta, M., Schoellig, A.P., Krause, A.: Safe model-based reinforcement learning with stability guarantees. In: Advances in Neural Information Processing Systems, pp. 908–918 (2017)

work page 2017
[37]

Neural Computation17(2), 335–359 (2005)

Morimoto, J., Doya, K.: Robust reinforcement learning. Neural Computation17(2), 335–359 (2005)

work page 2005
[38]

Journal of Machine Learning Research16(1), 1437–1480 (2015)

Garc´ ıa, J., Fern´ andez, F.: A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research16(1), 1437–1480 (2015)

work page 2015
[39]

arXiv preprint arXiv:2205.10330 (2022)

Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J., Knoll, A.: A review of safe reinforcement learning: Methods, theory and applications. arXiv preprint arXiv:2205.10330 (2022)

work page arXiv 2022
[40]

International journal of information management48, 63–71 (2019) https://doi.org/10.1016/j.ijinfomgt.2019.01.021

Duan, Y., Edwards, J.S., Dwivedi, Y.K.: Artificial intelligence for decision making in the era of big data–evolution, challenges and research agenda. International journal of information management48, 63–71 (2019) https://doi.org/10.1016/j.ijinfomgt.2019.01.021

work page doi:10.1016/j.ijinfomgt.2019.01.021 2019
[41]

AI & SOCIETY, 1–8 (2024) https://doi.org/10.1007/s00146-024-01968-2

Mahajan, S.: The executioner paradox: understanding self-referential dilemma in computational systems. AI & SOCIETY, 1–8 (2024) https://doi.org/10.1007/s00146-024-01968-2

work page doi:10.1007/s00146-024-01968-2 2024
[42]

ACM Computing Surveys (CSUR)53(6), 1–38 (2020) https://doi.org/10.1145/3419633

Tolmeijer, S., Kneer, M., Sarasua, C., Christen, M., Bernstein, A.: Implementations in machine ethics: A survey. ACM Computing Surveys (CSUR)53(6), 1–38 (2020) https://doi.org/10.1145/3419633

work page doi:10.1145/3419633 2020
[43]

Nature Machine Intelligence4(3), 258–268 (2022)

Schramowski, P., Turan, C., Andersen, N., Rothkopf, C.A., Kersting, K.: Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence4(3), 258–268 (2022)

work page 2022
[44]

Cambridge University Press, ??? (2011)

Anderson, M., Anderson, S.L.: Machine Ethics. Cambridge University Press, ??? (2011). https://doi. org/10.1017/cbo9780511978036

work page doi:10.1017/cbo9780511978036 2011
[45]

IEEE Intelligent Systems21(4), 46–51 (2006)

Powers, T.M.: Prospects for a kantian machine. IEEE Intelligent Systems21(4), 46–51 (2006)

work page 2006
[46]

In: Zalta, E.N., Nodelman, U

Alexander, L., Moore, M.: Deontological Ethics. In: Zalta, E.N., Nodelman, U. (eds.) The Stanford Encyclopedia of Philosophy, Winter 2024 edn. Metaphysics Research Lab, Stanford University, ??? (2024)

work page 2024
[47]

Journal of social philosophy45(1), 89–106 (2014)

Gilligan, C.: Moral injury and the ethic of care: Reframing the conversation about differences. Journal of social philosophy45(1), 89–106 (2014)

work page 2014
[48]

Oxford University Press (2007)

Pogge, T.: John Rawls: His Life and Theory of Justice. Oxford University Press (2007). https://doi. org/10.5860/choice.45-1128

work page doi:10.5860/choice.45-1128 2007
[49]

Ethics and Information Technology20(1), 1–3 (2018) https://doi.org/10.1007/s10676-018-9450-z

Dignum, V.: Ethics in artificial intelligence: introduction to the special issue. Ethics and Information Technology20(1), 1–3 (2018) https://doi.org/10.1007/s10676-018-9450-z

work page doi:10.1007/s10676-018-9450-z 2018
[50]

In: Proceedings of the AAAI Conference on Artificial Intelli- gence, vol

Conitzer, V., Sinnott-Armstrong, W., Borg, J.S., Deng, Y., Kramer, M.: Moral decision making frameworks for artificial intelligence. In: Proceedings of the AAAI Conference on Artificial Intelli- gence, vol. 31. Association for the Advancement of Artificial Intelligence (AAAI), ??? (2017). https: //doi.org/10.1609/aaai.v31i1.11140

work page doi:10.1609/aaai.v31i1.11140 2017
[51]

Artificial Intelligence281, 103239 16 (2020)

Bench-Capon, T.J.: Ethical approaches and autonomous systems. Artificial Intelligence281, 103239 16 (2020)

work page 2020
[52]

Soft Computing 27(22), 16483–16492 (2023) https://doi.org/10.1007/s00500-023-09112-w

Chen, X., Deng, Y.: A novel combination rule for conflict management in data fusion. Soft Computing 27(22), 16483–16492 (2023) https://doi.org/10.1007/s00500-023-09112-w

work page doi:10.1007/s00500-023-09112-w 2023
[53]

Applied Soft Computing124, 109075 (2022) https://doi.org/10.1016/j.asoc.2022.109075

Zhao, K., Li, L., Chen, Z., Sun, R., Yuan, G., Li, J.: A survey: Optimization and applications of evidence fusion algorithm based on Dempster–Shafer theory. Applied Soft Computing124, 109075 (2022) https://doi.org/10.1016/j.asoc.2022.109075

work page doi:10.1016/j.asoc.2022.109075 2022
[54]

Information Fusion46, 23–32 (2019) https://doi.org/10.1016/j.inffus.2018.04.003

Xiao, F.: Multi-sensor data fusion based on the belief divergence measure of evidences and the belief entropy. Information Fusion46, 23–32 (2019) https://doi.org/10.1016/j.inffus.2018.04.003

work page doi:10.1016/j.inffus.2018.04.003 2019
[55]

In: Classic Works of the Dempster-Shafer Theory of Belief Functions, pp

Dempster, A.P.: Upper and lower probabilities induced by a multivalued mapping. In: Classic Works of the Dempster-Shafer Theory of Belief Functions, pp. 57–72. Springer, ??? (2008). https://doi.org/ 10.1007/978-3-540-44792-4 3

work page doi:10.1007/978-3-540-44792-4 2008
[56]

PhD thesis, University of Oxford (2014)

MacAskill, W.: Normative uncertainty. PhD thesis, University of Oxford (2014)

work page 2014
[57]

Social Cognitive and Affective Neuroscience17(3), 253–265 (2021) https://doi.org/10.1093/scan/nsab100

Ugazio, G., Grueschow, M., Polania, R., Lamm, C., Tobler, P., Ruff, C.: Neuro-computational foun- dations of moral preferences. Social Cognitive and Affective Neuroscience17(3), 253–265 (2021) https://doi.org/10.1093/scan/nsab100

work page doi:10.1093/scan/nsab100 2021
[58]

In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A

Wei, J., Wang, X., Schuurmans, D., Bosma, M., ichter, b., Xia, F., Chi, E., Le, Q.V., Zhou, D.: Chain- of-thought prompting elicits reasoning in large language models. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837 (2022)

work page 2022
[59]

Bellman, R.: A markovian decision process. J. Math. Mech.6, 679–684 (1957)

work page 1957
[60]

MIT press, ??? (2018)

Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT press, ??? (2018). https: //doi.org/10.1109/tnn.1998.712192

work page doi:10.1109/tnn.1998.712192 2018
[61]

Proximal Policy Optimization Algorithms

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms (2017). https://doi.org/10.48550/arxiv.1707.06347

[62]

Huang, S., Dossa, R.F.J., Ye, C., Braga, J., Chakraborty, D., Mehta, K., Araújo, J.G.M.: CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms. Journal of Machine Learning Research 23(274), 1–18 (2022)

[63]

Butlin, P.: AI alignment and human reward. In: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. AIES ’21, pp. 437–445. ACM (2021). https://doi.org/10.1145/3461702.3462570

[64]

Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 30 (2017)

[65]

Lambert, N., Castricato, L., Werra, L., Havrilla, A.: Illustrating reinforcement learning from human feedback (RLHF). Hugging Face Blog (2022). https://huggingface.co/blog/rlhf

[66]

Wu, Y.-H., Lin, S.-D.: A low-cost ethics shaping approach for designing reinforcement learning agents. Proceedings of the AAAI Conference on Artificial Intelligence 32(1) (2018). https://doi.org/10.1609/aaai.v32i1.11498

[67]

Frank, D.-A., Chrysochou, P., Mitkidis, P., Ariely, D.: Human decision-making biases in the moral dilemmas of autonomous vehicles. Scientific Reports 9(1) (2019). https://doi.org/10.1038/s41598-019-49411-7

[68]

White, C., Dooley, S., Roberts, M., Pal, A., Feuer, B., Jain, S., Shwartz-Ziv, R., Jain, N., Saifullah, K., Naidu, S., Hegde, C., LeCun, Y., Goldstein, T., Neiswanger, W., Goldblum, M.: LiveBench: A Challenging, Contamination-Free LLM Benchmark. arXiv (2024). https://doi.org/10.48550/arxiv.2406.19314

[69]

Helbing, D.: Summary: What’s wrong with AI? Humanistic technology needed! Next Civilization: Digital Democracy and Socio-Ecological Finance-How to Avoid Dystopia and Upgrade Society by Digital Means, 285–313 (2021)

[70]

Helbing, D., Ienca, M.: Why converging technologies need converging international regulation. Ethics and Information Technology 26(1), 15 (2024). https://doi.org/10.1007/s10676-024-09756-8

Appendix A LLM Prompts

Throughout our simulations, the moral agent is embodied by a large language model (LLM) interacting with the simulation environment. These inter...

$$
\begin{bmatrix}
0 & \cdots & BJS_{1j} & \cdots & BJS_{1k} \\
\vdots & & \vdots & & \vdots \\
BJS_{i1} & \cdots & 0 & \cdots & BJS_{ik} \\
\vdots & & \vdots & & \vdots \\
BJS_{k1} & \cdots & BJS_{kj} & \cdots & 0
\end{bmatrix}
\tag{C2}
$$

The belief divergence as a distance measure quantifies the level of consistency across evidence from different sources. This measure of consistency allows for the identification of sources that are in alignment versus those that are...
