
arxiv: 1907.00430 · v1 · pith:HYSUT4ZLnew · submitted 2019-06-30 · 💻 cs.AI

Requisite Variety in Ethical Utility Functions for AI Value Alignment

Pith reviewed 2026-05-25 12:13 UTC · model grok-4.3

classification 💻 cs.AI
keywords AI value alignment · ethical utility functions · requisite variety · neuroscience · psychology · augmented utilitarianism · AI safety · moral judgment modeling

The pith

AI utility functions must model the full variety of human moral judgments, drawing on scientific insights from neuroscience and psychology.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that an AI utility function will violate human ethical intuitions unless it accurately models those intuitions and preserves their variety. Because moral judgments arise from brain processes in biological humans, the most faithful models are scientific ones drawn from neuroscience and psychology. The authors perform a transdisciplinary analysis under a security mindset to summarize variety-relevant knowledge from those fields, connect it to augmented utilitarianism, and derive initial practical guidelines for approximate ethical goal functions. This is presented as a necessary step toward value alignment that respects the complexity of actual human morality rather than simplified abstractions.

Core claim

For the utility function of an AI not to violate human ethical intuitions, it trivially has to be a model of these intuitions and reflect their variety, whereby the most accurate models pertaining to human entities being biological organisms equipped with a brain constructing concepts like moral judgements, are scientific models.

What carries the argument

Requisite variety requirement for ethical utility functions, implemented by grounding them in scientific models of human brain-based moral judgment construction.
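As a rough illustration of the counting form of Ashby's Law of Requisite Variety, the sketch below brute-forces a toy regulation game. The outcome table `(d - r) % n_disturbances`, the function name `min_outcome_variety`, and the sizes used are assumptions chosen for illustration; none of them comes from the paper. Under this table, a regulator with `n_responses` available responses cannot reduce the number of distinct outcomes below `ceil(n_disturbances / n_responses)`.

```python
from itertools import product
from math import ceil


def min_outcome_variety(n_disturbances: int, n_responses: int) -> int:
    """Smallest number of distinct outcomes any regulator strategy can achieve.

    Toy assumption: disturbance d met by response r yields outcome
    (d - r) mod n_disturbances, so for a fixed response distinct
    disturbances always give distinct outcomes.
    """
    best = n_disturbances
    # A strategy assigns one response to every possible disturbance.
    for strategy in product(range(n_responses), repeat=n_disturbances):
        outcomes = {(d - r) % n_disturbances for d, r in enumerate(strategy)}
        best = min(best, len(outcomes))
    return best


# Six possible disturbances, regulators with 1, 2, 3 or 6 responses.
variety_by_responses = {n_r: min_outcome_variety(6, n_r) for n_r in (1, 2, 3, 6)}

for n_r, achieved in variety_by_responses.items():
    # Achieved minimum next to the counting bound ceil(|D| / |R|).
    print(n_r, achieved, ceil(6 / n_r))
```

In this toy setting only the regulator with as many responses as there are disturbances can hold the outcome to a single value, which is the intuition the paper transfers to utility functions and moral judgments.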

If this is right

  • Approximate ethical goal functions can be designed using insights from neuroscience and psychology to better capture the variety of human moral judgements.
  • Augmented utilitarianism provides a suitable ethical framework for linking these scientific models to AI design.
  • A security-mindset transdisciplinary analysis helps identify the variety-relevant features of human morality that utility functions must include.
  • Future work must address challenges in implementing these models without loss of requisite variety.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Purely philosophical value-alignment methods would need empirical supplementation from brain sciences to meet the variety criterion.
  • Utility functions built this way would likely require periodic revision as neuroscience refines its models of moral cognition.
  • Empirical tests could compare how well different utility functions predict or avoid violating human judgments in real-world dilemmas.
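One way such an empirical test might be scored is sketched below: count how often a candidate utility function fails to rank the humanly preferred outcome strictly higher. The `Judgment` record, the four hand-written judgments, and both candidate utilities are hypothetical placeholders, not data or methods from the paper.

```python
from typing import Callable, List, NamedTuple


class Judgment(NamedTuple):
    context: str
    preferred: str  # outcome a human judged morally better
    rejected: str   # outcome a human judged morally worse


# Hypothetical, hand-written judgments for illustration only.
JUDGMENTS: List[Judgment] = [
    Judgment("emergency", "break_promise", "keep_promise"),
    Judgment("everyday", "keep_promise", "break_promise"),
    Judgment("everyday", "tell_truth", "lie"),
    Judgment("emergency", "tell_truth", "lie"),
]


def violation_rate(utility: Callable[[str, str], float],
                   judgments: List[Judgment]) -> float:
    """Fraction of judgments where the preferred outcome is not ranked strictly higher."""
    violations = sum(
        1 for j in judgments
        if utility(j.context, j.preferred) <= utility(j.context, j.rejected)
    )
    return violations / len(judgments)


def context_blind(context: str, outcome: str) -> float:
    """Candidate utility that ignores the context entirely."""
    scores = {"keep_promise": 1.0, "break_promise": 0.0,
              "tell_truth": 1.0, "lie": 0.0}
    return scores[outcome]


def context_sensitive(context: str, outcome: str) -> float:
    """Candidate utility with one context-dependent adjustment."""
    if context == "emergency" and outcome == "break_promise":
        return 2.0
    return context_blind(context, outcome)


print(violation_rate(context_blind, JUDGMENTS))      # 0.25
print(violation_rate(context_sensitive, JUDGMENTS))  # 0.0
```

A real comparison would need empirically collected judgments across many perceivers and scenarios; the score here only shows the shape of the measurement.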

Load-bearing premise

Scientific models from neuroscience and psychology can be translated into practical AI utility functions that preserve the full variety of human moral judgments without introducing new violations or oversimplifications.

What would settle it

A controlled comparison in which a utility function built from abstract philosophical ethics matches or exceeds the alignment performance of one derived from neuroscience and psychology models when tested against human moral intuitions across varied scenarios.

Figures

Figures reproduced from arXiv: 1907.00430 by Leon Kester, Nadisha-Marie Aliman.

Figure 1: Intuitive illustration for the Law of Requisite Variety.

Original abstract

Being a complex subject of major importance in AI Safety research, value alignment has been studied from various perspectives in the last years. However, no final consensus on the design of ethical utility functions facilitating AI value alignment has been achieved yet. Given the urgency to identify systematic solutions, we postulate that it might be useful to start with the simple fact that for the utility function of an AI not to violate human ethical intuitions, it trivially has to be a model of these intuitions and reflect their variety, whereby the most accurate models pertaining to human entities being biological organisms equipped with a brain constructing concepts like moral judgements, are scientific models. Thus, in order to better assess the variety of human morality, we perform a transdisciplinary analysis applying a security mindset to the issue and summarizing variety-relevant background knowledge from neuroscience and psychology. We complement this information by linking it to augmented utilitarianism as a suitable ethical framework. Based on that, we propose first practical guidelines for the design of approximate ethical goal functions that might better capture the variety of human moral judgements. Finally, we conclude and address future possible challenges.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that AI ethical utility functions must model the variety of human moral intuitions to avoid violations, and that the most accurate such models come from neuroscience and psychology. It performs a transdisciplinary analysis (with a security mindset) of variety-relevant background knowledge, links the findings to augmented utilitarianism, and derives high-level practical guidelines for approximate ethical goal functions that better capture moral variety.

Significance. If the translation step from neuro/psych models to utility functions can be made explicit and shown to preserve variety, the work would supply a biologically grounded route to value alignment that explicitly addresses moral heterogeneity; the security-mindset framing and call for requisite variety are useful conceptual contributions even if the mapping remains schematic.

major comments (2)
  1. [Abstract / transdisciplinary analysis paragraph] Abstract and the paragraph on transdisciplinary analysis: the central claim that scientific models of moral-judgement construction can be turned into utility functions while preserving requisite variety (without new violations or lossy simplifications) is asserted but never accompanied by an explicit mapping, example derivation, or check that any model element survives translation into a mathematical term.
  2. [Section on practical guidelines] Section proposing practical guidelines for approximate goal functions: the guidelines remain high-level summaries; no concrete illustration is given showing how a specific neuroscience or psychology finding (e.g., context-dependent concept construction) is encoded as a utility term that matches the original variety across individuals and situations.
minor comments (2)
  1. Define 'requisite variety' and 'augmented utilitarianism' at first use rather than assuming reader familiarity.
  2. Add citations to specific empirical studies whose models are invoked in the background-knowledge summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments correctly identify that the manuscript remains at a high-level conceptual stage without explicit mappings or concrete illustrations. We address each point below and will make revisions to strengthen these aspects.

Point-by-point responses
  1. Referee: [Abstract / transdisciplinary analysis paragraph] Abstract and the paragraph on transdisciplinary analysis: the central claim that scientific models of moral-judgement construction can be turned into utility functions while preserving requisite variety (without new violations or lossy simplifications) is asserted but never accompanied by an explicit mapping, example derivation, or check that any model element survives translation into a mathematical term.

    Authors: We agree that the central claim is asserted without an accompanying explicit mapping or example derivation. The manuscript's scope is limited to summarizing variety-relevant findings from neuroscience and psychology, linking them to augmented utilitarianism, and outlining high-level guidelines; it does not attempt the full translation step. This limitation is genuine and the comment is accurate. In revision we will add a dedicated subsection with one illustrative example (using a neuroscience finding on concept construction) that sketches a possible utility-function approximation, while explicitly noting that the mapping is schematic, may introduce simplifications, and does not guarantee preservation of all variety. revision: yes

  2. Referee: [Section on practical guidelines] Section proposing practical guidelines for approximate goal functions: the guidelines remain high-level summaries; no concrete illustration is given showing how a specific neuroscience or psychology finding (e.g., context-dependent concept construction) is encoded as a utility term that matches the original variety across individuals and situations.

    Authors: The observation is correct: the guidelines are high-level summaries without concrete encodings. The paper presents them as initial practical implications rather than worked examples. We will revise the section to include a specific illustration of how context-dependent concept construction could be approximated (e.g., via context-sensitive modular terms or adjustable parameters in the utility function) and will discuss the extent to which such an approximation can or cannot match the original variety across individuals and situations, including acknowledged limitations. revision: yes
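The "context-sensitive modular terms or adjustable parameters" mentioned in the rebuttal could take many forms. The following is one minimal sketch, assuming a weighted sum of feature modules whose weights are looked up per (perceiver, context) pair. The class `ContextualUtility`, the feature names, the perceiver names, and all numeric weights are hypothetical; this is not the authors' encoding and it makes no claim to preserve the full variety of human judgments.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical feature modules used to score an outcome.
FEATURES = ("harm_avoided", "promise_kept")

Weights = Dict[str, float]


@dataclass
class ContextualUtility:
    """Weighted sum of feature modules with per-(perceiver, context) weights."""
    default_weights: Weights
    # Adjustable parameters: (perceiver, context) -> weights for that case.
    overrides: Dict[Tuple[str, str], Weights] = field(default_factory=dict)

    def weights_for(self, perceiver: str, context: str) -> Weights:
        # Fall back to the default weights when no override is stored.
        return self.overrides.get((perceiver, context), self.default_weights)

    def __call__(self, outcome: Dict[str, float], perceiver: str, context: str) -> float:
        w = self.weights_for(perceiver, context)
        return sum(w[name] * outcome.get(name, 0.0) for name in FEATURES)


utility = ContextualUtility(
    default_weights={"harm_avoided": 1.0, "promise_kept": 1.0},
    overrides={("alice", "emergency"): {"harm_avoided": 3.0, "promise_kept": 0.5}},
)

rescue = {"harm_avoided": 1.0, "promise_kept": 0.0}
keep_word = {"harm_avoided": 0.0, "promise_kept": 1.0}

# Same two outcomes, scored for different perceivers and contexts.
print(utility(rescue, "alice", "emergency"), utility(keep_word, "alice", "emergency"))
print(utility(rescue, "bob", "everyday"), utility(keep_word, "bob", "everyday"))
```

Keying the weights by perceiver and context keeps the variety explicit in the parameters rather than averaging it away, at the cost of a parameter table that grows with every distinction the model is asked to preserve.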

Circularity Check

0 steps flagged

No circularity: argument is postulational summary without self-referential derivations or fitted predictions

Full rationale

The paper opens with a postulate that an ethical utility function must model human intuitions to avoid violating them and that scientific models are the most accurate such models. It then summarizes neuroscience/psychology background, links it to augmented utilitarianism, and offers high-level guidelines. No equations, parameter fits, predictions, or self-citations appear in the provided text that would reduce any claimed result to its inputs by construction. The translation feasibility is treated as an assumption rather than derived, so the chain does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on the domain assumption that scientific models of human morality exist and can be used to construct utility functions that reflect requisite variety; no free parameters or invented entities are identifiable from the abstract.

axioms (2)
  • domain assumption: The most accurate models of human moral judgements are scientific models from neuroscience and psychology.
    Stated directly in the abstract as the basis for assessing the variety of human morality.
  • domain assumption: Augmented utilitarianism is a suitable ethical framework for linking neuroscience and psychology insights to AI utility functions.
    The abstract states that it is used to complement the background knowledge and to propose guidelines.


