
arxiv: 2503.05724 · v2 · submitted 2025-02-17 · 💻 cs.CY · cs.AI

Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making

Pith reviewed 2026-05-23 02:57 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords ethical decision-making · reinforcement learning · large language models · moral uncertainty · belief aggregation · Dempster-Shafer Theory · shaping rewards · multi-principle ethics

The pith

Reinforcement learning agents can be ethically refined by replacing human feedback with belief values from a large language model embodying five moral principles, aggregated into shaping rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that adds a task-agnostic ethical layer to a pre-trained reinforcement learning model. In this layer an LLM assigns belief values to actions by embodying consequentialist, deontological, virtue, social justice, and care ethics; these values are then combined with Belief Jensen-Shannon Divergence and Dempster-Shafer Theory to produce probability scores that double as shaping rewards. The resulting fine-tuning steers the agent toward balanced ethical choices in environments where moral requirements are uncertain or change. A reader would care because the method removes the need for scenario-specific handcrafted rewards while still producing decisions that reflect multiple ethical perspectives simultaneously.

Core claim

The central claim is that an ethical layer which uses an LLM to generate belief values from five moral principles, aggregates those values via Belief Jensen-Shannon Divergence and Dempster-Shafer Theory into probability scores, and feeds the scores back as shaping rewards, enables a reinforcement learning agent to navigate moral uncertainty and produce morally sound decisions across diverse tasks while lowering dependence on manually designed ethical rewards.
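
As a rough illustration of the final step in that claim, the sketch below adds the ethical layer's probability score for the chosen action to the task reward. The additive form, the `weight` parameter, and the name `shaped_reward` are assumptions made for this example; this page does not state how the paper combines the two signals.

```python
def shaped_reward(env_reward: float, ethical_scores: list[float],
                  action: int, weight: float = 1.0) -> float:
    """Return the task reward plus a weighted ethical score for `action`.

    Assumption: simple additive shaping with a hypothetical `weight`.
    `ethical_scores` holds one aggregated probability per candidate action.
    """
    return env_reward + weight * ethical_scores[action]


# Hypothetical scores for three candidate actions.
scores = [0.7, 0.2, 0.1]
print(shaped_reward(env_reward=1.0, ethical_scores=scores, action=0, weight=0.5))
```

A larger `weight` would let the ethical signal dominate the task reward, so under this assumed form it acts as the knob that trades task performance against ethical alignment.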

What carries the argument

The ethical layer that aggregates LLM-generated belief scores from five moral perspectives using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory to form probability scores that serve as shaping rewards.
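
The aggregation described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes each moral perspective assigns mass only to single actions (no compound hypotheses), weights each perspective by the inverse of its average Belief Jensen-Shannon divergence from the others, and then applies Dempster's rule to the weighted average. All function names are hypothetical, and any further adjustments the paper may use are omitted.

```python
import math


def bjs_divergence(m1, m2):
    """Belief Jensen-Shannon divergence between two mass vectors (log base 2)."""
    total = 0.0
    for p, q in zip(m1, m2):
        mid = (p + q) / 2.0
        if p > 0:
            total += 0.5 * p * math.log2(p / mid)
        if q > 0:
            total += 0.5 * q * math.log2(q / mid)
    return total


def credibility_weights(masses, eps=1e-12):
    """Weight each source by the inverse of its mean divergence from the rest.

    Requires at least two sources; `eps` avoids division by zero when
    sources agree exactly.
    """
    k = len(masses)
    support = []
    for i in range(k):
        others = [bjs_divergence(masses[i], masses[j]) for j in range(k) if j != i]
        support.append(1.0 / (sum(others) / (k - 1) + eps))
    total = sum(support)
    return [s / total for s in support]


def dempster_combine(m1, m2):
    """Dempster's rule restricted to singleton hypotheses."""
    joint = [p * q for p, q in zip(m1, m2)]
    norm = sum(joint)  # equals 1 - K, where K is the conflict mass
    if norm == 0:
        raise ValueError("sources are in total conflict")
    return [x / norm for x in joint]


def aggregate(masses):
    """Fuse k mass vectors into one probability score per action."""
    weights = credibility_weights(masses)
    n_actions = len(masses[0])
    avg = [sum(w * m[a] for w, m in zip(weights, masses)) for a in range(n_actions)]
    fused = avg
    for _ in range(len(masses) - 1):  # combine the weighted average k-1 times
        fused = dempster_combine(fused, avg)
    return fused


# Five hypothetical perspectives over three actions; the last one dissents.
beliefs = [[0.7, 0.2, 0.1]] * 4 + [[0.1, 0.8, 0.1]]
print(aggregate(beliefs))
```

In this sketch a perspective that disagrees with the others receives a low credibility weight, so the fused scores lean toward the majority view instead of being vetoed by one conflicting source.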

If this is right

  • The framework produces improved consistency and adaptability compared with other belief aggregation methods.
  • It reduces reliance on handcrafted ethical rewards for each new task.
  • It remains effective when ethical challenges appear unexpectedly in dynamic environments.
  • It works across multiple LLM variants and yields decisions suited to real-world applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same aggregation step could be inserted into other learning loops, such as fine-tuning language models directly on ethical preferences.
  • If the underlying LLM systematically under-represents one of the five principles, the resulting reward signal would embed that imbalance across all tasks.
  • Deployment in high-stakes domains such as autonomous vehicles would require an additional verification layer that checks whether the aggregate score actually matches human expert judgments on edge cases.

Load-bearing premise

The large language model can accurately and unbiasedly embody and apply the five specified moral principles to assign belief values to actions.

What would settle it

A controlled test in which the agent faces an action that one moral principle strongly endorses while the aggregated score strongly opposes it; if the agent reliably follows the aggregate score even when individual principles conflict, the claim holds; systematic deviation would falsify it.

Original abstract

We present an ethical decision-making framework that refines a pre-trained reinforcement learning (RL) model using a task-agnostic ethical layer. Following initial training, the RL model undergoes ethical fine-tuning, where human feedback is replaced by feedback generated from a large language model (LLM). The LLM embodies consequentialist, deontological, virtue, social justice, and care ethics as moral principles to assign belief values to recommended actions during ethical decision-making. An ethical layer aggregates belief scores from multiple LLM-derived moral perspectives using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory into probability scores that also serve as the shaping reward, steering the agent toward choices that align with a balanced ethical framework. This integrated learning framework helps the RL agent navigate moral uncertainty in complex environments and enables it to make morally sound decisions across diverse tasks. Our approach, tested across different LLM variants and compared with other belief aggregation techniques, demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards. This method is especially effective in dynamic scenarios where ethical challenges arise unexpectedly, making it well-suited for real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a framework for ethical decision-making in reinforcement learning agents. It uses a large language model to generate feedback based on five moral principles—consequentialist, deontological, virtue, social justice, and care ethics—to assign belief values to actions. These are aggregated using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory to create probability scores that serve as shaping rewards for the RL agent, aiming to handle moral uncertainty without human feedback or handcrafted rewards. The approach is tested across LLM variants and compared to other aggregation techniques, claiming improved consistency and adaptability.

Significance. If the empirical claims hold and external validation is added, the work could contribute to scalable methods for incorporating multiple ethical perspectives into AI systems, addressing moral uncertainty in dynamic environments. The use of established aggregation techniques like DST is a positive element. However, without human benchmarks, the significance for producing 'morally sound decisions' remains limited.

major comments (3)
  1. [Abstract] The assertion that the method 'demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards' after testing across LLM variants is unsupported by any reported data, baselines, quantitative metrics, or statistical comparisons, which is load-bearing for the central claim.
  2. [Experiments / Evaluation section] No comparisons of LLM-assigned belief values to human ethical judgments on the same action sets or established moral dilemma benchmarks (e.g., trolley problems or standard ethics datasets) are provided. This directly undermines the claim that the framework enables 'morally sound decisions,' as consistency among LLMs may capture shared model biases rather than independent moral validity.
  3. [Framework section] The core assumption that the LLM can accurately and unbiasedly 'embody' the five moral principles to assign belief values is not validated against external human standards or inter-rater reliability measures, which is load-bearing for the claim that the aggregated rewards steer the agent toward ethical choices.
minor comments (2)
  1. [Framework section] The description of how Belief Jensen-Shannon Divergence is computed from the LLM outputs could include an explicit equation or pseudocode for reproducibility.
  2. [Framework section] Clarify whether the 'ethical fine-tuning layer' modifies the RL policy directly or only the reward function, as this affects interpretation of the results.
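
For reference on minor comment 1, the standard definitions from the evidence-fusion literature are written out below. They are offered as the likely starting point, not as the paper's exact formulation, which is not reproduced on this page; the symbols m_1, m_2 (mass functions), A_i (hypotheses), and K (conflict mass) follow common usage and are assumptions here. The base of the logarithm is a convention; base 2 bounds the divergence between 0 and 1.

```latex
% Belief Jensen-Shannon divergence between mass functions m_1 and m_2
\mathrm{BJS}(m_1, m_2) = \frac{1}{2}\left[
  \sum_i m_1(A_i)\,\log\frac{2\,m_1(A_i)}{m_1(A_i) + m_2(A_i)}
  + \sum_i m_2(A_i)\,\log\frac{2\,m_2(A_i)}{m_1(A_i) + m_2(A_i)}
\right]

% Dempster's rule of combination, with conflict mass K
(m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\,m_2(C),
\quad A \neq \emptyset,
\qquad
K = \sum_{B \cap C = \emptyset} m_1(B)\,m_2(C)
```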

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for clarification and improvement in our work on ethical decision-making for RL agents. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] The assertion that the method 'demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards' after testing across LLM variants is unsupported by any reported data, baselines, quantitative metrics, or statistical comparisons, which is load-bearing for the central claim.

    Authors: We agree with this observation. The experiments section describes testing across LLM variants and comparisons to other aggregation techniques, but does not include specific quantitative metrics or statistical comparisons to support claims of 'improved consistency' and 'adaptability'. We will revise the abstract to remove these unsupported assertions and instead describe the experimental setup more accurately.

    revision: yes

  2. Referee: [Experiments / Evaluation section] No comparisons of LLM-assigned belief values to human ethical judgments on the same action sets or established moral dilemma benchmarks (e.g., trolley problems or standard ethics datasets) are provided. This directly undermines the claim that the framework enables 'morally sound decisions,' as consistency among LLMs may capture shared model biases rather than independent moral validity.

    Authors: This is a valid criticism. Our framework is intended to address moral uncertainty by aggregating diverse LLM perspectives without relying on human feedback, which is a key contribution. However, we recognize that without external human validation, claims of moral soundness are limited. We will add a dedicated limitations paragraph in the discussion section to explicitly address this point, noting that LLM agreement may reflect training data biases and recommending human studies on standard benchmarks as future work.

    revision: partial

  3. Referee: [Framework section] The core assumption that the LLM can accurately and unbiasedly 'embody' the five moral principles to assign belief values is not validated against external human standards or inter-rater reliability measures, which is load-bearing for the claim that the aggregated rewards steer the agent toward ethical choices.

    Authors: We acknowledge that the manuscript does not provide validation of the LLM's embodiment of moral principles against human standards. The approach relies on the capability of LLMs to simulate different ethical frameworks through prompting, as described in the framework section. To address this, we will revise the framework section to more clearly articulate this assumption and discuss potential biases, while emphasizing that the aggregation via Belief JSD and DST is meant to mitigate individual perspective biases.

    revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on external LLM assumptions and standard aggregation methods

Full rationale

The paper proposes replacing human feedback with LLM-generated belief values from five fixed moral principles, then aggregates them via Belief Jensen-Shannon Divergence and Dempster-Shafer Theory to produce shaping rewards. No equations or steps define a quantity in terms of itself, rename a fitted parameter as a prediction, or reduce the central claim to a self-citation chain. The framework is self-contained against its stated inputs; any weakness lies in unvalidated external assumptions about LLM moral accuracy rather than internal circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The approach depends on the effectiveness of the LLM in ethical reasoning and on the validity of the aggregation techniques for producing balanced ethical guidance.

free parameters (1)
  • Belief values from LLM
    Generated by the LLM for each moral perspective; specific computation details not provided in abstract.
axioms (1)
  • domain assumption: LLMs can reliably simulate multiple ethical frameworks without introducing significant bias
    Central to replacing human feedback with LLM feedback.
invented entities (1)
  • Ethical fine-tuning layer · no independent evidence
    purpose: Aggregates moral beliefs into shaping rewards for RL
    New component introduced in the framework.

pith-pipeline@v0.9.0 · 5726 in / 1337 out tokens · 48170 ms · 2026-05-23T02:57:53.872429+00:00 · methodology

