Requisite Variety in Ethical Utility Functions for AI Value Alignment

Leon Kester; Nadisha-Marie Aliman

arxiv: 1907.00430 · v1 · pith:HYSUT4ZLnew · submitted 2019-06-30 · 💻 cs.AI


Pith reviewed 2026-05-25 12:13 UTC · model grok-4.3

classification 💻 cs.AI

keywords AI value alignment, ethical utility functions, requisite variety, neuroscience, psychology, augmented utilitarianism, AI safety, moral judgment modeling

The pith

AI utility functions must model the full variety of human moral judgments, drawing on scientific insights from neuroscience and psychology.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that an AI utility function will violate human ethical intuitions unless it accurately models those intuitions and preserves their variety. Because moral judgments arise from brain processes in biological humans, the most faithful models are scientific ones drawn from neuroscience and psychology. The authors perform a transdisciplinary analysis under a security mindset to summarize relevant variety knowledge from those fields, connect it to augmented utilitarianism, and derive initial practical guidelines for approximate ethical goal functions. This is presented as a necessary step toward value alignment that respects the complexity of actual human morality rather than simplified abstractions.

Core claim

For the utility function of an AI not to violate human ethical intuitions, it trivially has to be a model of these intuitions and reflect their variety, whereby the most accurate models pertaining to human entities being biological organisms equipped with a brain constructing concepts like moral judgements, are scientific models.

What carries the argument

A requisite-variety requirement on ethical utility functions, to be met by grounding them in scientific models of how the human brain constructs moral judgments.
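For readers unfamiliar with the term, requisite variety comes from W. Ross Ashby's cybernetics. A common textbook statement of the law is given here as background; it is not a formula taken from the paper. It bounds the variety of outcomes $V_O$ from below by the variety of disturbances $V_D$ divided by the variety of responses $V_R$ available to the regulator:

```latex
V_O \;\ge\; \frac{V_D}{V_R}
\qquad\Longleftrightarrow\qquad
\log V_O \;\ge\; \log V_D - \log V_R
```

On one possible reading of the paper's argument (an interpretive assumption, not the authors' notation), the disturbances are the morally relevant situations and the regulator is the utility function. A utility function that distinguishes fewer cases than human moral judgment does would then be unable to keep outcomes inside the set that humans find acceptable.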

If this is right

Approximate ethical goal functions can be designed using insights from neuroscience and psychology to better capture the variety of human moral judgements.
Augmented utilitarianism provides a suitable ethical framework for linking these scientific models to AI design.
A security-mindset transdisciplinary analysis helps identify the variety-relevant features of human morality that utility functions must include.
Future work must address challenges in implementing these models without loss of requisite variety.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Purely philosophical value-alignment methods would need empirical supplementation from brain sciences to meet the variety criterion.
Utility functions built this way would likely require periodic revision as neuroscience refines its models of moral cognition.
Empirical tests could compare how well different utility functions predict or avoid violating human judgments in real-world dilemmas.

Load-bearing premise

Scientific models from neuroscience and psychology can be translated into practical AI utility functions that preserve the full variety of human moral judgments without introducing new violations or oversimplifications.

What would settle it

A controlled comparison in which a utility function built from abstract philosophical ethics matches or exceeds the alignment performance of one derived from neuroscience and psychology models when tested against human moral intuitions across varied scenarios.
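A minimal sketch of how such a comparison could be scored, assuming alignment is measured as the share of scenario pairs that a utility function ranks differently from human raters. Everything here is hypothetical: the function `violation_rate`, the features `harm` and `intent`, both toy utility functions, and the ratings, which are invented placeholders and not empirical data.

```python
from itertools import combinations
from typing import Callable, Dict, List

Scenario = Dict[str, float]


def violation_rate(utility: Callable[[Scenario], float],
                   scenarios: List[Scenario],
                   human_ratings: List[float]) -> float:
    """Fraction of scenario pairs that the utility fails to rank as humans do.

    A tie in the utility where humans see a difference counts as a violation.
    Pairs that humans rate equally are skipped.
    """
    violations = 0
    comparable = 0
    for i, j in combinations(range(len(scenarios)), 2):
        human_diff = human_ratings[i] - human_ratings[j]
        if human_diff == 0:
            continue
        comparable += 1
        model_diff = utility(scenarios[i]) - utility(scenarios[j])
        if human_diff * model_diff <= 0:
            violations += 1
    return violations / comparable if comparable else 0.0


# Invented placeholder scenarios and acceptability ratings (not real data).
scenarios = [
    {"harm": 0.0, "intent": 0.0},  # no harm
    {"harm": 0.5, "intent": 0.0},  # accidental moderate harm
    {"harm": 0.5, "intent": 1.0},  # intentional moderate harm
    {"harm": 0.9, "intent": 0.0},  # accidental severe harm
]
toy_ratings = [1.0, 0.6, 0.1, 0.3]


def harm_only_utility(s: Scenario) -> float:
    # Hypothetical "abstract" utility: ignores intent entirely.
    return -s["harm"]


def intent_sensitive_utility(s: Scenario) -> float:
    # Hypothetical richer utility: intentional harm weighs three times as much.
    return -s["harm"] * (1.0 + 2.0 * s["intent"])


print(violation_rate(harm_only_utility, scenarios, toy_ratings))
print(violation_rate(intent_sensitive_utility, scenarios, toy_ratings))
```

On these invented numbers the harm-only function misorders two of the six pairs and the intent-sensitive one misorders none. That outcome is built into the toy data and says nothing about which approach would win in a real test; the sketch only shows what the scoring step could look like.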

Figures

Figures reproduced from arXiv: 1907.00430 by Leon Kester, Nadisha-Marie Aliman.

Original abstract

Being a complex subject of major importance in AI Safety research, value alignment has been studied from various perspectives in the last years. However, no final consensus on the design of ethical utility functions facilitating AI value alignment has been achieved yet. Given the urgency to identify systematic solutions, we postulate that it might be useful to start with the simple fact that for the utility function of an AI not to violate human ethical intuitions, it trivially has to be a model of these intuitions and reflect their variety, whereby the most accurate models pertaining to human entities being biological organisms equipped with a brain constructing concepts like moral judgements, are scientific models. Thus, in order to better assess the variety of human morality, we perform a transdisciplinary analysis applying a security mindset to the issue and summarizing variety-relevant background knowledge from neuroscience and psychology. We complement this information by linking it to augmented utilitarianism as a suitable ethical framework. Based on that, we propose first practical guidelines for the design of approximate ethical goal functions that might better capture the variety of human moral judgements. Finally, we conclude and address future possible challenges.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

This is a synthesis paper that restates the need to model moral variety in AI utilities via neuro/psych insights but provides no mapping or validation for the translation step.

The paper's main point is straightforward: for an AI utility function to avoid clashing with human ethics, it has to reflect the actual variety in how people form moral judgments, and the best models for that come from neuroscience and psychology rather than pure philosophy. It pulls in the idea of requisite variety, runs a quick transdisciplinary scan of relevant background, ties it to augmented utilitarianism, and ends with some practical guidelines for approximate goal functions. That framing is clear and the security-mindset angle is a reasonable way to approach alignment risks. The summary of variety-relevant findings from the cited fields is competent and gives readers a compact entry point to the literature. What stands out as new is the explicit linkage of those models to utility design under a variety lens, though it stays at the level of proposal.

The soft spot is exactly where the stress test flagged it: the argument assumes the scientific models can be turned into utility terms that keep the original moral distinctions intact across people and situations, but there is no example derivation, no check for information loss, and no discussion of how context-dependent judgments survive the formalization. The guidelines remain high-level and untested on that point.

This is for people already working in AI value alignment who want a structured reminder of why variety matters and where to look for human data. It does not deliver results that would change practice or generate new experiments. A serious editor should send it to referees because the underlying claim is coherent on its own terms and the literature engagement is honest, even if the central feasibility step needs more work.

Referee Report

2 major / 2 minor

Summary. The paper claims that AI ethical utility functions must model the variety of human moral intuitions to avoid violations, and that the most accurate such models come from neuroscience and psychology. It performs a transdisciplinary analysis (with a security mindset) of variety-relevant background knowledge, links the findings to augmented utilitarianism, and derives high-level practical guidelines for approximate ethical goal functions that better capture moral variety.

Significance. If the translation step from neuro/psych models to utility functions can be made explicit and shown to preserve variety, the work would supply a biologically grounded route to value alignment that explicitly addresses moral heterogeneity; the security-mindset framing and call for requisite variety are useful conceptual contributions even if the mapping remains schematic.

major comments (2)

[Abstract / transdisciplinary analysis paragraph] Abstract and the paragraph on transdisciplinary analysis: the central claim that scientific models of moral-judgement construction can be turned into utility functions while preserving requisite variety (without new violations or lossy simplifications) is asserted but never accompanied by an explicit mapping, example derivation, or check that any model element survives translation into a mathematical term.
[Section on practical guidelines] Section proposing practical guidelines for approximate goal functions: the guidelines remain high-level summaries; no concrete illustration is given showing how a specific neuroscience or psychology finding (e.g., context-dependent concept construction) is encoded as a utility term that matches the original variety across individuals and situations.

minor comments (2)

Define 'requisite variety' and 'augmented utilitarianism' at first use rather than assuming reader familiarity.
Add citations to specific empirical studies whose models are invoked in the background-knowledge summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments correctly identify that the manuscript remains at a high-level conceptual stage without explicit mappings or concrete illustrations. We address each point below and will make revisions to strengthen these aspects.

Referee: [Abstract / transdisciplinary analysis paragraph] Abstract and the paragraph on transdisciplinary analysis: the central claim that scientific models of moral-judgement construction can be turned into utility functions while preserving requisite variety (without new violations or lossy simplifications) is asserted but never accompanied by an explicit mapping, example derivation, or check that any model element survives translation into a mathematical term.

Authors: We agree that the central claim is asserted without an accompanying explicit mapping or example derivation. The manuscript's scope is limited to summarizing variety-relevant findings from neuroscience and psychology, linking them to augmented utilitarianism, and outlining high-level guidelines; it does not attempt the full translation step. This limitation is genuine and the comment is accurate. In revision we will add a dedicated subsection with one illustrative example (using a neuroscience finding on concept construction) that sketches a possible utility-function approximation, while explicitly noting that the mapping is schematic, may introduce simplifications, and does not guarantee preservation of all variety.
revision: yes

Referee: [Section on practical guidelines] Section proposing practical guidelines for approximate goal functions: the guidelines remain high-level summaries; no concrete illustration is given showing how a specific neuroscience or psychology finding (e.g., context-dependent concept construction) is encoded as a utility term that matches the original variety across individuals and situations.

Authors: The observation is correct: the guidelines are high-level summaries without concrete encodings. The paper presents them as initial practical implications rather than worked examples. We will revise the section to include a specific illustration of how context-dependent concept construction could be approximated (e.g., via context-sensitive modular terms or adjustable parameters in the utility function) and will discuss the extent to which such an approximation can or cannot match the original variety across individuals and situations, including acknowledged limitations.
revision: yes
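One way to picture the "context-sensitive modular terms or adjustable parameters" mentioned in the response is a utility whose feature weights are looked up per context. The sketch below is illustrative only: the class name `ContextSensitiveUtility`, the feature names, the context labels, and all weights are hypothetical and do not come from the paper or the rebuttal.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ContextSensitiveUtility:
    """Weighted sum of outcome features, with one weight table per context."""

    # Hypothetical adjustable parameters: context label -> feature -> weight.
    weights: Dict[str, Dict[str, float]]
    default_context: str = "default"

    def __call__(self, features: Dict[str, float], context: str) -> float:
        # Unknown contexts fall back to the default weight table.
        table = self.weights.get(context, self.weights[self.default_context])
        # Features without a weight in the table contribute nothing.
        return sum(table.get(name, 0.0) * value
                   for name, value in features.items())


# Invented weights: harm counts half as much in the "medical" context.
utility = ContextSensitiveUtility(weights={
    "default": {"harm": -1.0, "benefit": 1.0},
    "medical": {"harm": -0.5, "benefit": 1.0},
})

outcome = {"harm": 0.4, "benefit": 0.3}
print(utility(outcome, "default"))  # negative: the harm outweighs the benefit
print(utility(outcome, "medical"))  # positive: the same outcome, reweighted
```

The same outcome is scored differently depending on context, which is the behaviour the response describes. The sketch also makes the limitation concrete: it can represent only as many distinctions as it has contexts and features, so any judgment that depends on something outside those tables is lost in translation.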

Circularity Check

0 steps flagged

No circularity: argument is postulational summary without self-referential derivations or fitted predictions

The paper opens with a postulate that an ethical utility function must model human intuitions to avoid violating them and that scientific models are the most accurate such models. It then summarizes neuroscience/psychology background, links it to augmented utilitarianism, and offers high-level guidelines. No equations, parameter fits, predictions, or self-citations appear in the provided text that would reduce any claimed result to its inputs by construction. The translation feasibility is treated as an assumption rather than derived, so the chain does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on the domain assumption that scientific models of human morality exist and can be used to construct utility functions that reflect requisite variety; no free parameters or invented entities are identifiable from the abstract.

axioms (2)

domain assumption: The most accurate models of human moral judgements are scientific models from neuroscience and psychology.
Stated directly in the abstract as the basis for assessing the variety of human morality.

domain assumption: Augmented utilitarianism is a suitable ethical framework for linking neuroscience/psychology insights to AI utility functions.
The abstract states that it is used to complement the background knowledge and to propose guidelines.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 3 internal anchors

[1]

Apprenticeship learning via inverse reinforcement learning

[Abbeel and Ng, 2004] Pieter Abbeel and Andrew Y Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning, page

work page 2004
[2]

Reinforcement learning as a framework for ethical decision making

[Abel et al., 2016] David Abel, James MacGlashan, and Michael L Littman. Reinforcement learning as a framework for ethical decision making. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence,

work page 2016
[3]

Are episodic memories special? On the sameness of remembered and imagined event simulation

[Addis, 2018] Donna Rose Addis. Are episodic memories special? On the sameness of remembered and imagined event simulation. Journal of the Royal Society of New Zealand, 48(2-3):64–88,

work page 2018
[4]

An impossibility theorem for welfarist axiologies

[Arrhenius, 2000] Gustaf Arrhenius. An impossibility theorem for welfarist axiologies. Economics & Philosophy, 16(2):247–266,

work page 2000
[5]

An introduction to cybernetics

[Ashby, 1961] W Ross Ashby. An introduction to cybernetics. Chapman & Hall Ltd,

work page 1961
[6]

Principles.(2017)

[Asilomar, 2018] AI Asilomar. Principles.(2017). In Principles developed in conjunction with the 2017 Asilomar conference [Benevolent AI 2017],

work page 2018
[7]

Interoceptive predictions in the brain

[Barrett and Simmons, 2015] Lisa Feldman Barrett and W Kyle Simmons. Interoceptive predictions in the brain. Nature Reviews Neuroscience, 16(7):419,

work page 2015
[8]

The conceptual act theory: A roadmap

[Barrett et al., 2015] Lisa Feldman Barrett, Christine D Wilson-Mendenhall, and Lawrence W Barsalou. The conceptual act theory: A roadmap. pages 83–110,

work page 2015
[9]

What is emotion?

[Cabanac, 2002] Michel Cabanac. What is emotion? Behavioural processes, 60(2):69–83,

work page 2002
[10]

A constructionist review of morality and emotions: No evidence for specific links between moral content and discrete emotions

[Cameron et al., 2015] C Daryl Cameron, Kristen A Lindquist, and Kurt Gray. A constructionist review of morality and emotions: No evidence for specific links between moral content and discrete emotions. Personality and Social Psychology Review, 19(4):371–394,

work page 2015
[11]

Deep reinforcement learning from human preferences

[Christiano et al., 2017] Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems, pages 4299–4307,

work page 2017
[12]

Every good regulator of a system must be a model of that system

[Conant and Ross Ashby, 1970] Roger C Conant and W Ross Ashby. Every good regulator of a system must be a model of that system. International journal of systems science, 1(2):89–97,

work page 1970
[13]

Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function)

[Eckersley, 2018] Peter Eckersley. Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064,

work page internal anchor Pith review Pith/arXiv arXiv 2018
[14]

Prospection: Experiencing the future

[Gilbert and Wilson, 2007] Daniel T Gilbert and Timothy D Wilson. Prospection: Experiencing the future. Science, 317(5843):1351–1354,

work page 2007
[15]

Defense Against the Dark Arts: An overview of adversarial example security research and future research directions

[Goodfellow, 2018] Ian Goodfellow. Defense Against the Dark Arts: An overview of adversarial example security research and future research directions. arXiv preprint arXiv:1806.04169,

work page internal anchor Pith review Pith/arXiv arXiv 2018
[16]

Adversarial Robustness for AI Safety

[Goodfellow, 2019] Ian Goodfellow. Adversarial Robustness for AI Safety. https://safeai.webs.upv.es/wp-content/uploads/2019/02/2019-01-27-goodfellow.pdf,

work page 2019
[17]

How to think about emotion and morality: circles, not arrows

[Gray et al., 2017] Kurt Gray, Chelsea Schein, and C Daryl Cameron. How to think about emotion and morality: circles, not arrows. Current opinion in psychology, 17:41–46,

work page 2017
[18]

Cooperative inverse reinforcement learning

[Hadfield-Menell et al., 2016] Dylan Hadfield-Menell, Stuart J Russell, Pieter Abbeel, and Anca Dragan. Cooperative inverse reinforcement learning. In Advances in neural information processing systems, pages 3909–3917,

work page 2016
[19]

Beyond dual-processes: the interplay of reason and emotion in moral judgment

[Helion and Pizarro, 2015] Chelsea Helion and David A Pizarro. Beyond dual-processes: the interplay of reason and emotion in moral judgment. Handbook of neuroethics, pages 109–125,

work page 2015
[20]

Concepts dissolve artificial boundaries in the study of emotion and cognition, uniting body, brain, and mind

[Hoemann and Barrett, 2019] Katie Hoemann and Lisa Feldman Barrett. Concepts dissolve artificial boundaries in the study of emotion and cognition, uniting body, brain, and mind. Cognition and Emotion, 33(1):67–76,

work page 2019
[21]

Back to Bentham? Explorations of experienced utility

[Kahneman et al., 1997] Daniel Kahneman, Peter P Wakker, and Rakesh Sarin. Back to Bentham? Explorations of experienced utility. The quarterly journal of economics, 112(2):375–406,

work page 1997
[22]

Evidence for a large-scale brain system supporting allostasis and interoception in humans

[Kleckner et al., 2017] Ian R Kleckner, Jiahe Zhang, Alexandra Touroutoglou, Lorena Chanes, Chenjie Xia, W Kyle Simmons, Karen S Quigley, Bradford C Dickerson, and Lisa Feldman Barrett. Evidence for a large-scale brain system supporting allostasis and interoception in humans. Nature human behaviour, 1(5):0069,

work page 2017
[23]

Scalable agent alignment via reward modeling: a research direction

[Leike et al., 2018] Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871,

work page internal anchor Pith review Pith/arXiv arXiv 2018
[24]

The triune brain in evolution: Role in paleocerebral functions

[MacLean, 1990] Paul D MacLean. The triune brain in evolution: Role in paleocerebral functions. Springer Science & Business Media,

work page 1990
[25]

Happily entangled: prediction, emotion, and the embodied mind

[Miller and Clark, 2018] Mark Miller and Andy Clark. Happily entangled: prediction, emotion, and the embodied mind. Synthese, 195(6):2559–2575,

work page 2018
[26]

Special Operations Forces: A Global Immune System?

[Norman and Bar-Yam, 2018] Joseph Norman and Yaneer Bar-Yam. Special Operations Forces: A Global Immune System? In International Conference on Complex Systems, pages 486–498. Springer,

work page 2018
[27]

States of mind: Emotions, body feelings, and thoughts share distributed neural networks

[Oosterwijk et al., 2012] Suzanne Oosterwijk, Kristen A Lindquist, Eric Anderson, Rebecca Dautoff, Yoshiya Moriguchi, and Lisa Feldman Barrett. States of mind: Emotions, body feelings, and thoughts share distributed neural networks. NeuroImage, 62(3):2110–2128,

work page 2012
[28]

The unifying moral dyad: Liberals and conservatives share the same harm-based moral template

[Schein and Gray, 2015] Chelsea Schein and Kurt Gray. The unifying moral dyad: Liberals and conservatives share the same harm-based moral template. Personality and Social Psychology Bulletin, 41(8):1147–1163,

work page 2015
[29]

The theory of dyadic morality: Reinventing moral judgment by redefining harm

[Schein and Gray, 2018] Chelsea Schein and Kurt Gray. The theory of dyadic morality: Reinventing moral judgment by redefining harm. Personality and Social Psychology Review, 22(1):32–70,

work page 2018
[30]

The visual guide to morality: Vision as an integrative analogy for moral experience, variability and mechanism

[Schein et al., 2016] Chelsea Schein, Neil Hester, and Kurt Gray. The visual guide to morality: Vision as an integrative analogy for moral experience, variability and mechanism. Social and Personality Psychology Compass, 10(4):231–251,

work page 2016
[31]

Flourish: A visionary new understanding of happiness and well-being

[Seligman, 2012] Martin EP Seligman. Flourish: A visionary new understanding of happiness and well-being. Simon and Schuster,

work page 2012
[32]

Agent foundations for aligning machine intelligence with human interests: a technical research agenda

[Soares and Fallenstein, 2017] Nate Soares and Benya Fallenstein. Agent foundations for aligning machine intelligence with human interests: a technical research agenda. In The Technological Singularity, pages 103–125. Springer,

work page 2017
[33]

Alignment for advanced machine learning systems

[Taylor et al., 2016] Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch. Alignment for advanced machine learning systems. Machine Intelligence Research Institute,

work page 2016
[34]

Consequentialism, rationality and the relevant description of outcomes

[Verbeek, 2001] Bruno Verbeek. Consequentialism, rationality and the relevant description of outcomes. Economics & Philosophy, 17(2):181–205,

work page 2001
[35]

Telling autonomous systems what to do

[Werkhoven et al., 2018] Peter Werkhoven, Leon Kester, and Mark Neerincx. Telling autonomous systems what to do. In Proceedings of the 36th European Conference on Cognitive Ergonomics, page

work page 2018
[36]

The AI Alignment Problem: Why it is Hard, and Where to Start

[Yudkowsky, 2016] Eliezer Yudkowsky. The AI Alignment Problem: Why it is Hard, and Where to Start. Symbolic Systems Distinguished Speaker, 2016

work page 2016
