Smart But Not Moral? Moral Alignment In Human-AI Decision-Making
Pith reviewed 2026-05-10 12:09 UTC · model grok-4.3
The pith
Moral alignment may be a more fundamental dimension of human-AI decision-making than functional or behavioral alignment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Moral alignment, defined as the perceived congruence between the values embedded in an AI system's decision logic and the moral intuitions of stakeholders, may be a more fundamental dimension of human-AI decision-making than functional or behavioral alignment, and moral (mis)alignment therefore shapes whether AI can be integrated meaningfully in contexts that involve fairness, responsibility, and harm.
What carries the argument
Moral alignment, defined as the perceived congruence between the values embedded in an AI system's decision logic and the moral intuitions of stakeholders.
If this is right
- AI design in sensitive domains must address embedded values to achieve stakeholder congruence beyond task accuracy.
- Misalignment on moral intuitions can prevent adoption even when an AI system is functionally effective.
- A multi-stakeholder view identifies differing moral foundations that an AI decision logic must accommodate.
- Moral Foundations Theory supplies specific categories for diagnosing where value mismatches occur.
Where Pith is reading between the lines
- Design processes could incorporate explicit checks for how stakeholders perceive the moral values in an AI's rules.
- Evaluation of deployed systems might include measures of moral congruence alongside accuracy and usability metrics.
- Training data selection and model objectives could be adjusted to increase the chance of producing decisions that feel morally aligned to users.
- Organizations using AI in high-stakes settings may need to map stakeholder moral intuitions before choosing or tuning decision logic.
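One way to make the idea of a moral congruence measure concrete is sketched below in Python. It is an illustrative assumption, not something the paper proposes. A stakeholder and an AI system are each described by hypothetical non-negative weights over the five Moral Foundations Theory categories, and congruence is taken to be the cosine similarity of the two weight profiles. The names `FOUNDATIONS` and `moral_congruence` are placeholders introduced here.

```python
import math

# The five Moral Foundations Theory categories (short labels).
FOUNDATIONS = ("care", "fairness", "loyalty", "authority", "sanctity")


def moral_congruence(stakeholder, system):
    """Cosine similarity between two foundation-weight profiles.

    Both arguments are dicts mapping every name in FOUNDATIONS to a
    non-negative weight. The result lies in [0, 1]; 1 means the two
    profiles weight the foundations in the same proportions.
    Hypothetical measure, for illustration only.
    """
    a = [stakeholder[f] for f in FOUNDATIONS]
    b = [system[f] for f in FOUNDATIONS]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("each profile needs at least one non-zero weight")
    # Clamp to guard against floating-point overshoot just above 1.
    return min(1.0, dot / norm)
```

Cosine similarity is chosen here only because it ignores overall scale and compares the relative emphasis placed on each foundation. How the weights would actually be elicited from stakeholders or inferred from a system's decision logic is left open, as it is in the paper.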
Load-bearing premise
Moral alignment can be treated as a distinct and more fundamental dimension than functional or behavioral alignment without empirical validation of its priority.
What would settle it
An experiment that measures acceptance and trust in a high-stakes AI decision system while varying only functional performance versus moral congruence and finds that functional performance alone accounts for the outcomes.
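The experiment described above amounts to a two-factor design. A minimal Python sketch of how its main effects could be read off is given below, assuming a 2x2 layout in which functional performance and moral congruence are each set to "low" or "high" and acceptance is recorded as a numeric rating. The function name `main_effects` and the cell-key layout are hypothetical, and a real study would also need an inferential test.

```python
from statistics import mean


def main_effects(cells):
    """Main effects of each factor in a 2x2 design.

    `cells` maps (functional_level, moral_level) to a list of acceptance
    ratings, with each level being "low" or "high". Effects are computed
    from unweighted cell means, so unequal cell sizes do not bias them.
    """
    m = {key: mean(ratings) for key, ratings in cells.items()}
    functional = (m[("high", "low")] + m[("high", "high")]) / 2 \
        - (m[("low", "low")] + m[("low", "high")]) / 2
    moral = (m[("low", "high")] + m[("high", "high")]) / 2 \
        - (m[("low", "low")] + m[("high", "low")]) / 2
    return {"functional": functional, "moral": moral}
```

If acceptance moved only with functional performance, the `moral` effect returned here would be close to zero. That is the pattern that would count against the paper's priority claim.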
Original abstract
In high-stakes AI-supported decisions, considerations are not purely technical but involve moral judgments about fairness, responsibility, and harm. While prior research has focused mainly on functional or behavioral alignment, this paper argues that moral alignment may be a more fundamental dimension of human-AI decision-making. Moral alignment is defined as the perceived congruence between the values embedded in an AI system's decision logic and the moral intuitions of stakeholders. Building on Moral Foundations Theory, the paper adopts a multi-stakeholder perspective and highlights why moral (mis)alignment matters for the meaningful integration of AI in sensitive contexts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that in high-stakes AI-supported decisions, moral alignment—defined as the perceived congruence between the values embedded in an AI system's decision logic and the moral intuitions of stakeholders, drawing on Moral Foundations Theory—represents a more fundamental dimension of human-AI decision-making than functional or behavioral alignment. It adopts a multi-stakeholder perspective to explain the importance of moral (mis)alignment for meaningful AI integration in sensitive contexts.
Significance. If the priority of moral alignment over other forms of alignment could be derived or validated, the argument would reorient HCI and AI ethics research toward value congruence as a core design criterion, with potential implications for guidelines in domains like healthcare, justice, and autonomous systems. The multi-stakeholder framing offers a useful lens, but the lack of supporting analysis limits the paper's current contribution to theory-building.
major comments (3)
- [Abstract] The claim that moral alignment 'may be a more fundamental dimension' is presented as an argument without any logical derivation, comparative analysis, or counter-example showing why it takes precedence over functional/behavioral alignment rather than coexisting as one factor among several.
- [Abstract] The manuscript relies on Moral Foundations Theory to define moral alignment but provides no explicit mapping of its foundations (e.g., care, fairness) to AI decision logics, nor any justification for why this establishes priority in high-stakes decisions.
- [Abstract] No empirical evidence, illustrative scenarios, or formal steps are offered to demonstrate the distinction between moral misalignment and other alignment failures or to show differential impacts on integration.
minor comments (2)
- The title 'Smart But Not Moral?' is somewhat misleading, since the paper's focus is on alignment rather than on whether AI possesses or lacks morality.
- Terms such as 'meaningful integration' and 'sensitive contexts' would benefit from more precise operational definitions to clarify the scope of the argument.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. The comments highlight important opportunities to strengthen the clarity and rigor of our conceptual argument. We address each major comment point by point below, indicating where revisions will be made to the next version of the paper.
- Referee: [Abstract] The claim that moral alignment 'may be a more fundamental dimension' is presented as an argument without any logical derivation, comparative analysis, or counter-example showing why it takes precedence over functional/behavioral alignment rather than coexisting as one factor among several.
  Authors: We acknowledge that the abstract presents the core claim concisely as a conceptual proposition rather than a formally derived result. The full manuscript develops the argument through a multi-stakeholder lens grounded in Moral Foundations Theory, contrasting moral alignment with functional and behavioral approaches by showing how value congruence underpins stakeholder acceptance in high-stakes contexts even when technical performance is adequate. To address the request for greater explicitness, we will revise the abstract to briefly signal the reasoning and add a dedicated comparative analysis subsection in the introduction that includes counter-examples from domains such as healthcare and justice.
  Revision: partial
- Referee: [Abstract] The manuscript relies on Moral Foundations Theory to define moral alignment but provides no explicit mapping of its foundations (e.g., care, fairness) to AI decision logics, nor any justification for why this establishes priority in high-stakes decisions.
  Authors: This observation is fair and points to a useful clarification. The manuscript invokes Moral Foundations Theory to ground the definition of moral alignment from a multi-stakeholder perspective, but we agree that an explicit mapping would make the link to AI decision logics more transparent. In the revised version we will insert a new subsection (or table) that maps each foundation (care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, sanctity/degradation) to concrete AI decision scenarios, together with a short justification of why these mappings confer priority in high-stakes settings where moral intuitions directly affect integration.
  Revision: yes
- Referee: [Abstract] No empirical evidence, illustrative scenarios, or formal steps are offered to demonstrate the distinction between moral misalignment and other alignment failures or to show differential impacts on integration.
  Authors: As a theoretical contribution focused on theory-building rather than empirical validation, the manuscript does not present new data. It does, however, employ illustrative scenarios drawn from sensitive decision contexts to differentiate moral misalignment (e.g., an AI that is functionally aligned yet violates fairness or care intuitions, eroding trust) from purely technical or behavioral failures. We will expand these scenarios, add a short formal outline of the distinction, and include a discussion of differential impacts on meaningful integration to make the argument more accessible.
  Revision: partial
Circularity Check
No circularity; conceptual argument relies on external Moral Foundations Theory without self-referential reduction
The paper offers a conceptual argument that moral alignment (defined as perceived value congruence with stakeholder intuitions) may be more fundamental than functional or behavioral alignment in human-AI decisions. It adopts a multi-stakeholder lens and builds explicitly on Moral Foundations Theory from external literature. The paper contains no equations, fitted parameters, predictions, or derivations that reduce to its own inputs by construction. No self-citation is load-bearing in a circular way, and the priority claim is asserted as an analytical perspective rather than derived from a self-referential loop. The argument is therefore self-contained and rests on external literature.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Moral Foundations Theory provides an appropriate framework for stakeholder moral intuitions in AI decision contexts.
Reference graph
Works this paper leans on
- [1] Atari, M., Haidt, J., Graham, J., Koleva, S., Stevens, S. T., & Dehghani, M. (2023). Morality beyond the WEIRD: How the nomological network of morality varies across cultures. Journal of Personality and Social Psychology, 125(5), 1157-1188
- [2] Bhat, S., Lyons, J. B., Shi, C., & Yang, X. J. (2024). Value Alignment and Trust in Human-robot Interaction: Insights From Simulation and User Study. In Discovering the Frontiers of Human-Robot Interaction: Insights and Innovations in Collaboration, Communication, and Control (pp. 39-63). Springer
- [3] Bonnefon, J.-F., Rahwan, I., & Shariff, A. (2024). The moral psychology of Artificial Intelligence. Annual Review of Psychology, 75(1), 653-675
- [4] Byrne, D. E. (1972). The attraction paradigm. Behavior Therapy, 3(2), 337-338
- [5] Dargnies, M.-P., Hakimov, R., & Kübler, D. (2024). Aversion to hiring algorithms: Transparency, gender profiling, and self-confidence. Management Science
- [6] Evans, J. (2025). Calling for Fair Visibility for All on LinkedIn. Change.org. Retrieved March 2026 from https://www.change.org/p/calling-for-fair-visibility-for-all-on-linkedin
- [7] Graham, J., Haidt, J., & Nosek, B. A. (2009). Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology, 96(5), 1029-1046
- [8] Haidt, J., & Graham, J. (2007). When morality opposes justice: Conservatives have moral intuitions that liberals may not recognize. Social Justice Research, 20(1), 98-116
- [9] Kirshner, S. N. (2024). Psychological Distance and Algorithm Aversion: Congruency and Advisor Confidence. Service Science
- [10] McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1), 415-444
- [11] Zipperling, D., Deck, L., Lanzl, J., & Kühl, N. (2025). It's Only Fair When I Think It's Fair: How Gender Bias Alignment Undermines Distributive Fairness in Human-AI Collaboration. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, Athens, Greece