An Algebraic Exposition of the Theory of Dyadic Morality

Kush R. Varshney

arxiv: 2605.16153 · v1 · pith:RG5FZ4RInew · submitted 2026-05-15 · 💻 cs.AI

Pith reviewed 2026-05-20 17:26 UTC · model grok-4.3

classification 💻 cs.AI

keywords: theory of dyadic morality · structural causal modeling · moral judgment · neurosymbolic AI · AI policy · causal inference · psychological operators

The pith

Moral judgments reduce to a simple agent-patient harm template that structural causal models can capture with three added operators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how the theory of dyadic morality, built on one intentional agent harming a vulnerable patient, can be written in the language of structural causal models. It adds three operators that let the model typecast roles, complete incomplete scenarios, and shift inferences according to positive or negative valence. These extensions explain how people simplify multi-party moral problems by collapsing nodes and handling them one at a time. The resulting algebra supplies concrete methods for AI to spot clashing duties, shape helpfulness rules that leave users in control, and treat failure messages as deliberate causal interventions. If the formalization holds, it supplies a mathematically exact route for embedding human-style moral reasoning inside neurosymbolic systems.

Core claim

The theory of dyadic morality is formalized by expressing its basic template—an intentional agent causing harm to a vulnerable patient—in structural causal model notation and extending the notation with a typecasting operator that assigns moral roles, a completion operator that supplies missing causal links, and a valence-dependent inference mechanism that modulates conclusions according to the sign of the outcome. The same framework accounts for scalability by demonstrating that moral cognition reduces larger graphs through node collapse and sequential processing rather than exhaustive enumeration.
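To make the starting point concrete, here is a minimal sketch of the two-node template as a structural causal model with an intervention on the agent node. It is an illustration only, not the paper's formalization: the function names `sample_dyad` and `harm_rate`, the Boolean variables, and the probabilities 0.5, 0.9, and 0.1 are all assumptions chosen for the example, and none of the three operators is implemented here.

```python
import random

def sample_dyad(rng, do_intent=None):
    """Sample one world from a toy two-node SCM: agent intent -> patient harm."""
    # Exogenous noise terms, assumed independent.
    u_intent = rng.random()
    u_harm = rng.random()
    # Structural equation for the agent node. Passing do_intent replaces
    # the equation with a constant, which is what an intervention does.
    intent = (u_intent < 0.5) if do_intent is None else do_intent
    # Structural equation for the patient node. The 0.9 / 0.1 values are
    # arbitrary placeholders, not estimates from any study.
    harm = u_harm < (0.9 if intent else 0.1)
    return intent, harm

def harm_rate(do_intent, n=10_000, seed=0):
    """Estimate P(harm | do(intent = do_intent)) by simulation."""
    rng = random.Random(seed)
    return sum(sample_dyad(rng, do_intent)[1] for _ in range(n)) / n

# Average causal effect of the agent's intent on harm in the toy model.
effect = harm_rate(True) - harm_rate(False)
```

With these placeholder probabilities the estimated effect sits near 0.8. A typecasting or completion operator would act on the graph before this sampling step, which is the part the paper is asked to spell out.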

What carries the argument

The dyadic template of intentional agent harming vulnerable patient, extended inside structural causal models by the typecasting operator, completion operator, and valence-dependent inference mechanism.

If this is right

AI systems can use the model to detect and resolve conflicting moral obligations before acting.
Helpfulness policies can be written to preserve user agency by keeping the dyadic structure intact.
Post-failure messages can be crafted as targeted causal interventions that restore the intended moral framing.
Mind perception should be measured in narrow, context-specific ways rather than through broad averages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The compression rules could let AI handle moral dilemmas with many agents without exploding computational cost.
The same operators might be tested directly in behavioral experiments that present scaled-up versions of the basic template.
If the operators prove stable across cultures, they offer a route to align AI moral outputs with shared human patterns.

Load-bearing premise

The three psychological operators correctly describe the shortcuts people actually use to compute moral judgments from the basic agent-patient template.

What would settle it

A controlled experiment in which participants judge multi-agent moral dilemmas and fail to show the predicted pattern of node collapse or sequential processing would falsify the scalability claim.
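For readers who want a picture of what node collapse and sequential processing would operate on, the following sketch merges several agent nodes of a causal graph into one composite agent, and separately lists the agent-patient pairs for one-at-a-time evaluation. The edge-set representation, the helper names `collapse_nodes` and `sequential_dyads`, and the three-agent scenario are hypothetical; the paper's own transformation rules are not given in the material summarized here.

```python
def collapse_nodes(edges, group, merged):
    """Merge every node in `group` into a single node named `merged`.

    `edges` is a set of (cause, effect) pairs. Edges that lie entirely
    inside the group disappear, and duplicates are dropped by the set.
    """
    def rename(node):
        return merged if node in group else node

    collapsed = set()
    for cause, effect in edges:
        c, e = rename(cause), rename(effect)
        if c != e:  # skip self-loops created by the merge
            collapsed.add((c, e))
    return collapsed

def sequential_dyads(edges, patient):
    """List each direct (cause, patient) pair for one-at-a-time judgment."""
    return sorted((c, e) for c, e in edges if e == patient)

# Hypothetical scenario: a planner directs two actors who harm a victim.
edges = {
    ("planner", "actor_1"),
    ("planner", "actor_2"),
    ("actor_1", "victim"),
    ("actor_2", "victim"),
}
dyad = collapse_nodes(edges, {"planner", "actor_1", "actor_2"}, "agent")
pairs = sequential_dyads(edges, "victim")
```

Here `dyad` is the single edge from `agent` to `victim`, and `pairs` holds the two actor-victim dyads. An experiment of the kind described above would check whether human judgments track one of these reduced structures rather than the full graph.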

Figures

Figures reproduced from arXiv: 2605.16153 by Kush R. Varshney.

Original abstract

This paper provides an algebraic exposition of the theory of dyadic morality (TDM), a psychological model of moral judgment grounded in a simple two-node template: an intentional agent causing harm to a vulnerable patient. We formalize TDM using structural causal modeling (SCM) notation and identify three psychological operators (typecasting operator, completion operator, and valence-dependent inference mechanism) that extend standard SCM to capture how people compute moral judgments under constraints. We address scalability challenges arising from TDM's dyadic limitation, showing how moral cognition compresses multi-node scenarios through node collapse and sequential processing. Drawing on this algebraic framework, we demonstrate concrete applications to AI policy design: detecting conflicting obligations, structuring helpfulness policies to preserve user agency, and designing post-failure communication as causal interventions. Finally, we recommend scoped, contextual measurement of mind perception over universal averaging to operationalize the theory empirically. This algebraic formalization enables neurosymbolic AI systems to compute morality in a way that is both mathematically rigorous and faithful to human moral cognition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper maps the theory of dyadic morality onto structural causal models, names three operators that extend the notation, and sketches AI policy uses, but the operators stay at the level of definitions without derivations or data checks.

This paper takes the existing theory of dyadic morality and writes it out in structural causal model notation. It identifies three operators (typecasting, completion, and valence-dependent inference) that extend the basic two-node template, and it describes how larger scenarios are handled through node collapse and sequential steps. The main addition is the link to concrete AI policy questions, such as spotting conflicting obligations, keeping user agency in helpfulness rules, and treating post-failure messages as interventions. That part gives the work a practical angle that pure expositions often lack. The recommendation for scoped rather than averaged measurement of mind perception is also a clear, usable suggestion for anyone who wants to test the ideas later.

The formalization itself is straightforward and stays within standard SCM language, which makes the template easy to follow for readers already comfortable with causal graphs. The concern about missing empirical mapping holds up, at least judging from the abstract: the operators are introduced by definition to extend SCM, yet no algebraic steps or comparisons to human judgment data on intentional harm or patient vulnerability are supplied. This leaves the claim that the setup is both rigorous and faithful to actual moral cognition resting on the mapping alone.

The paper is aimed at people building neurosymbolic systems or designing AI policies that try to track human-style moral constraints. A reader who wants a structured starting template for embedding dyadic reasoning could pull useful pieces from it. It has enough shape and citations to warrant sending out for peer review, where the main questions would be whether the operators can be derived or tested in a follow-up.

Referee Report

3 major / 2 minor

Summary. The paper provides an algebraic exposition of the Theory of Dyadic Morality (TDM) by recasting its two-node template (intentional agent causing harm to a vulnerable patient) in structural causal modeling (SCM) notation. It introduces three psychological operators—the typecasting operator, completion operator, and valence-dependent inference mechanism—to extend standard SCM for computing moral judgments under constraints, addresses scalability challenges via node collapse and sequential processing, and applies the framework to AI policy tasks such as detecting conflicting obligations, structuring helpfulness policies to preserve agency, and post-failure communication as interventions. The work ends with a recommendation for scoped, contextual measurement of mind perception rather than universal averaging.

Significance. If the operator definitions prove internally consistent and receive empirical grounding against moral judgment data, the framework could supply a mathematically rigorous bridge between psychological models of morality and neurosymbolic AI systems. The concrete policy applications and emphasis on falsifiable measurement recommendations are strengths that would support more human-aligned AI design if the central faithfulness claim holds.

major comments (3)

§3 (Algebraic Formalization): The manuscript states that the typecasting operator, completion operator, and valence-dependent inference mechanism extend standard SCM to capture constraints on human moral computation, yet supplies no explicit algebraic definitions, graph transformations, or equations showing how these operators modify causal structures or probability distributions. This is load-bearing for the central claim that the formalization is faithful to human moral cognition.
§4 (Scalability via Node Collapse): The discussion of compressing multi-node scenarios through node collapse and sequential processing lacks any concrete example or equation demonstrating the resulting SCM after collapse, undermining the claim that this resolves TDM's dyadic limitations in a computationally tractable way.
§6 (Empirical Recommendations): The suggestion for scoped measurement of mind perception is presented without reference to specific existing datasets on intentional harm or patient vulnerability, or any proposed test that could falsify the operators' predictions against observed human judgments.

minor comments (2)

[Introduction] The SCM notation is introduced without a brief recap of standard do-calculus or intervention semantics, which would aid readers from AI backgrounds who may not be familiar with the psychological extensions.
[References] A small number of citations to foundational TDM papers appear to be missing from the reference list, which would strengthen the grounding of the psychological operators.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment point by point below, indicating where revisions will be made to improve the manuscript's rigor and clarity.

Referee: §3 (Algebraic Formalization): The manuscript states that the typecasting operator, completion operator, and valence-dependent inference mechanism extend standard SCM to capture constraints on human moral computation, yet supplies no explicit algebraic definitions, graph transformations, or equations showing how these operators modify causal structures or probability distributions. This is load-bearing for the central claim that the formalization is faithful to human moral cognition.

Authors: We agree that the absence of explicit algebraic definitions weakens the central faithfulness claim. The current manuscript introduces the operators at a conceptual level within the SCM framework but does not supply the required equations or graph transformations. In the revised manuscript we will expand §3 with formal definitions: the typecasting operator as a graph augmentation function that introduces typed nodes with associated priors; the completion operator as a probabilistic inference rule that fills in missing causal edges under valence constraints; and the valence-dependent inference mechanism as a conditional update P(judgment | evidence, valence). We will also include explicit graph transformation rules and worked probability calculations. This addresses the load-bearing concern directly.

revision: yes

Referee: §4 (Scalability via Node Collapse): The discussion of compressing multi-node scenarios through node collapse and sequential processing lacks any concrete example or equation demonstrating the resulting SCM after collapse, undermining the claim that this resolves TDM's dyadic limitations in a computationally tractable way.

Authors: We concur that a concrete example is necessary to substantiate the scalability claim. The manuscript describes node collapse and sequential processing at a high level but provides no worked illustration. In the revision we will add to §4 a specific multi-node example (e.g., a three-agent harm scenario), showing the original SCM, the collapsed dyadic graph, the transformation equations, and the resulting probability distributions before and after collapse. This will demonstrate computational tractability explicitly.

revision: yes

Referee: §6 (Empirical Recommendations): The suggestion for scoped measurement of mind perception is presented without reference to specific existing datasets on intentional harm or patient vulnerability, or any proposed test that could falsify the operators' predictions against observed human judgments.

Authors: This observation is correct; the recommendation remains at a general level without concrete empirical anchors. In the revised manuscript we will cite relevant existing datasets from moral psychology (e.g., studies on intentionality and harm perception) and propose a specific falsification test: generate operator predictions for a set of controlled vignettes, compare them statistically to human judgment data from a selected dataset, and define clear criteria (e.g., deviation thresholds) under which the operators would be falsified.

revision: yes
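The falsification test described in the last response can be sketched in a few lines. This is only one possible reading of "deviation thresholds": the mean absolute deviation metric, the function names `mean_abs_deviation` and `is_falsified`, and the numbers below are assumptions for illustration, and the ratings are made-up placeholders rather than data from any study.

```python
from statistics import mean

def mean_abs_deviation(predicted, observed):
    """Average absolute gap between operator predictions and human ratings."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    return mean(abs(p - o) for p, o in zip(predicted, observed))

def is_falsified(predicted, observed, threshold):
    """True when the deviation exceeds a threshold fixed before the test."""
    return mean_abs_deviation(predicted, observed) > threshold

# Placeholder values, one per vignette, used only to show the call pattern.
predicted = [0.9, 0.2, 0.6]
observed = [0.8, 0.3, 0.5]
result = is_falsified(predicted, observed, threshold=0.25)
```

The point of fixing `threshold` in advance is that the criterion cannot be tuned after seeing the human data. A real test would also need a choice of dataset and a statistical comparison, both of which the response leaves open.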

Circularity Check

0 steps flagged

Formalization of existing TDM rests on external SCM with operators introduced by definition

The paper provides an algebraic exposition that maps the pre-existing Theory of Dyadic Morality onto standard structural causal modeling notation. The three operators are explicitly identified and defined as extensions to capture psychological constraints, rather than being derived from prior equations or data within the manuscript. No predictions or results are shown to reduce by construction to fitted parameters or self-citations; the central formalization therefore remains self-contained against external benchmarks and does not exhibit load-bearing circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that standard SCM can be extended by the three named operators without loss of fidelity to human moral cognition; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Structural causal modeling notation can be extended with typecasting, completion, and valence-dependent inference operators to capture moral judgment under constraints.
Invoked when the paper states that these operators extend standard SCM to model TDM.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Paper passage: We formalize TDM using structural causal modeling (SCM) notation and identify three psychological operators (typecasting operator, completion operator, and valence-dependent inference mechanism) that extend standard SCM

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Paper passage: T(A, P) ⟹ A ∝ 1/P

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

[1] Abdulhai, M.; Serapio-Garcia, G.; Crepy, C.; Valter, D.; Canny, J.; and Jaques, N. 2024. Moral Foundations of Large Language Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 17737–17752.
[2] Alvarez, J. M.; and Ruggieri, S. 2025. Toward A Causal Framework for Modeling Perception. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 166–178.
[3] Ashktorab, Z.; Buccella, A.; D’Cruz, J.; Fowler, Z.; Gill, A.; Leung, K. Y.; Magnus, P.; Richards, J.; and Varshney, K. R. 2025. Who’s Sorry Now: User Preferences Among Rote, Empathic, and Explanatory Apologies from LLM Chatbots. ACM Transactions on Computer-Human Interaction.
[4] Chatila, R.; Firth-Butterfield, K.; and Havens, J. C. 2018. Ethically Aligned Design: A Vision for Prioritizing Human Well-Being With Autonomous and Intelligent Systems Version 2.
[5] Gray, H. M.; Gray, K.; and Wegner, D. M. 2007. Dimensions of Mind Perception. Science, 315(5812): 619.
[6] Gray, K. 2025. Outraged: Why We Fight About Morality and Politics and How to Find Common Ground. Random House.
[7] Gray, K.; Waytz, A.; and Young, L. 2012. The Moral Dyad: A Fundamental Template Unifying Moral Judgment. Psychological Inquiry, 23(2): 206–215.
[8] Gray, K.; and Wegner, D. M. 2009. Moral Typecasting: Divergent Perceptions of Moral Agents and Moral Patients. Journal of Personality and Social Psychology, 96(3): 505.
[9] Gray, K.; Young, L.; and Waytz, A. 2012. Mind Perception Is the Essence of Morality. Psychological Inquiry, 23(2): 101–124.
[10] Hu, T.; Kyrychenko, Y.; Rathje, S.; Collier, N.; Van Der Linden, S.; and Roozenbeek, J. 2025. Generative Language Models Exhibit Social Identity Biases. Nature Computational Science, 5(1): 65–75.
[11] Hullman, J.; Broska, D.; Sun, H.; and Shaw, A. 2026. This Human Study Did Not Involve Human Subjects: Validating LLM Simulations as Behavioral Evidence. arXiv:2602.15785.
[12] Kang, M.; Moon, S.; Lee, S. H.; Raj, A.; Suh, J.; Chan, D. M.; and Canny, J. 2025. Deep Binding of Language Model Virtual Personas: A Study on Approximating Political Partisan Misperceptions. arXiv:2504.11673.
[13] Kegel, M.; and Ghanem, L. 2024. You Did That On Purpose! An Investigation of the Knobe Effect in Human-Robot Interactions. In Proceedings of the Hawaii International Conference on System Sciences.
[14] Knowles, B.; Fledderjohann, J.; Richards, J. T.; and Varshney, K. R. 2023. Trustworthy AI and the Logics of Intersectional Resistance. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 172–182.
[15] Levine, S.; Kleiman-Weiner, M.; Schulz, L.; Tenenbaum, J.; and Cushman, F. 2020. The Logic of Universalization Guides Moral Judgment. Proceedings of the National Academy of Sciences, 117(42): 26158–26169.
[16] Malone, E.; Afroogh, S.; D’Cruz, J.; and Varshney, K. R. 2025. When Trust Is Zero Sum: Automation’s Threat to Epistemic Agency. Ethics and Information Technology, 27(2): 29.
[17] Pan, L.; Albalak, A.; Wang, X.; and Wang, W. 2023. Logic-LM: Empowering Large Language Models With Symbolic Solvers for Faithful Logical Reasoning. In Findings of the Association for Computational Linguistics: EMNLP, 3806–3824.
[18] Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press.
[19] Richards, J. T.; Martino, J.; Bellamy, R. K.; and Muller, M. 2025. Musings on AI Muses: Support for Human Creativity. In Advances in Neural Information Processing Systems: Creative AI Track.
[20] Schein, C.; and Gray, K. 2018. The Theory of Dyadic Morality: Reinventing Moral Judgment by Redefining Harm. Personality and Social Psychology Review, 22(1): 32–70.
[21] Varshney, K. R.; Ashktorab, Z.; Bouneffouf, D.; Riemer, M.; and Weisz, J. D. 2025. Scopes of Alignment. arXiv:2501.12405.
[22] Wegner, D. M.; and Gray, K. 2017. The Mind Club: Who Thinks, What Feels, and Why It Matters. Penguin.
[23] Wong, L.; Grand, G.; Lew, A. K.; Goodman, N. D.; Mansinghka, V. K.; Andreas, J.; and Tenenbaum, J. B. 2023. From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought. arXiv:2306.12672.
[24] Zewail, A.; Figueroa, A.; Graham, J.; and Atari, M. 2026. Moral Stereotyping in Large Language Models. Proceedings of the National Academy of Sciences, 123(10): e2519941123.
[25] Zhou, J.; Hu, M.; Li, J.; Zhang, X.; Wu, X.; King, I.; and Meng, H. 2024. Rethinking Machine Ethics: Can LLMs Perform Moral Reasoning Through the Lens of Moral Theories? In Findings of the Association for Computational Linguistics: NAACL, 2227–2242.
