LCAM: A Framework for Diagnosing Interactional Alignment Failures in Con-versational AI
Pith reviewed 2026-06-27 19:23 UTC · model grok-4.3
The pith
Conversational AI harms often arise from misfits across five layers of interaction that a new diagnostic model called LCAM maps to underfit or overreach.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LCAM defines alignment as a calibrated fit among system behavior, user goals, task demands, and normative context. It distinguishes five layers of fit—perceptual, semantic, affective, cognitive, and ethical—and two diagnostic polarities of misalignment: underfit and overreach. When applied to a published LLM counseling exchange, the model shows how an apparently supportive response can reinforce harmful beliefs, simulate inappropriate care, and obscure role boundaries, thereby translating conversational failures into audit and governance questions about over-reliance, false intimacy, autonomy erosion, boundary confusion, and inappropriate trust.
What carries the argument
The Layered Cognitive Alignment Model (LCAM), which organizes diagnostic questions around five layers of fit and two polarities of misalignment to convert interaction problems into governance concerns.
If this is right
- Evaluators can audit systems for over-reliance by checking whether ethical-layer fit matches the user's dependence on the AI.
- False intimacy becomes detectable as affective-layer overreach when the system simulates care beyond the stated task.
- Autonomy erosion appears when cognitive-layer underfit leaves users without adequate support for their own reasoning.
- Boundary confusion is flagged when semantic or perceptual fit makes the system's role or limits unclear to the user.
- Inappropriate trust can be traced to mismatches across multiple layers that the model renders visible for governance review.
Where Pith is reading between the lines
- The same layer-and-polarity structure could be adapted to create standardized checklists for AI safety reviews in high-stakes domains such as mental health support.
- Developers might use LCAM during design to test whether proposed responses stay inside appropriate layer bounds before deployment.
- Regulators could require LCAM-style documentation to make interactional risks legible in model cards or deployment reports.
- Future empirical studies could test whether the five layers predict user-reported harms better than accuracy metrics alone.
Load-bearing premise
The five layers and two polarities create a stable, non-overlapping structure that can classify real conversational failures without needing further testing against other alignment approaches.
What would settle it
An interaction failure that multiple independent reviewers consistently cannot assign to any single layer or polarity, or that receives conflicting layer assignments from the same reviewer across repeated applications.
Figures
read the original abstract
Conversational AI is increasingly used for advice, interpretation, reassurance, and decision support in contexts where users may be vulnerable, uncertain, or dependent on the system's apparent competence. Existing alignment work often focuses on model objectives, preference optimization, or output correctness. Yet, many harms arise through interaction: how systems frame authority, express uncertainty, simulate empathy, support reasoning, and make boundaries legible. This paper introduces the Layered Cognitive Alignment Model (LCAM), a conceptual and normative framework for diagnosing interac-tional alignment failures in conversational AI. LCAM defines alignment as a calibrated fit among system behavior, user goals, task demands, and normative context. It distinguishes five layers of fit: perceptual, semantic, affective, cognitive, and ethical, and two diagnostic polarities of misalignment: underfit and overreach. We apply LCAM to a published LLM counseling example, showing how an apparently supportive response can reinforce harmful beliefs, simulate inappropriate care, and obscure role boundaries. By translating conversational failures into audit and governance questions concerning over-reliance, false intimacy, autonomy erosion, boundary confusion, and inappropriate trust, LCAM offers a theoretical and normative lens for evaluating conversational AI beyond accuracy, helpfulness, or trust.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Layered Cognitive Alignment Model (LCAM), a conceptual framework for diagnosing interactional alignment failures in conversational AI. Alignment is defined as a calibrated fit among system behavior, user goals, task demands, and normative context. LCAM distinguishes five layers of fit (perceptual, semantic, affective, cognitive, ethical) and two polarities of misalignment (underfit, overreach). It applies the framework to a published LLM counseling example to show how supportive responses can reinforce harmful beliefs, simulate inappropriate care, and obscure boundaries, thereby translating failures into audit questions on over-reliance, false intimacy, autonomy erosion, boundary confusion, and inappropriate trust.
Significance. If operationalized, LCAM could provide a normative lens in HCI and AI ethics for evaluating conversational systems in vulnerable contexts beyond accuracy or trust metrics. The explicit mapping from interaction patterns to governance concerns is a constructive contribution, though the purely conceptual presentation limits immediate impact.
major comments (3)
- [LCAM definition and layers] The central claim requires that the five layers plus underfit/overreach polarities constitute a calibrated, non-overlapping taxonomy. However, the manuscript supplies no formal definitions, decision rules, or inter-layer distinction criteria, leaving the structure open to subjective application (see the counseling example and the definition of alignment).
- [Application to counseling example] The single counseling example is used to illustrate distinct governance concerns, yet no evidence or procedure is given to show that observed behaviors map reliably to specific layers or polarities rather than multiple overlapping categories simultaneously.
- [Introduction and related work positioning] The framework is positioned as extending beyond existing alignment work on model objectives and preference optimization, but the manuscript does not compare LCAM to prior taxonomies addressing trust calibration, empathy simulation, or boundary management, so the necessity of this particular five-layer partitioning is not established.
minor comments (2)
- [Title] The title contains a hyphenation artifact ('Con-versational').
- [LCAM definition] Notation for the layers and polarities is introduced in list form without a summary table or diagram that would aid readers in tracking the structure.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and note planned revisions to clarify the framework while preserving its conceptual scope.
read point-by-point responses
-
Referee: [LCAM definition and layers] The central claim requires that the five layers plus underfit/overreach polarities constitute a calibrated, non-overlapping taxonomy. However, the manuscript supplies no formal definitions, decision rules, or inter-layer distinction criteria, leaving the structure open to subjective application (see the counseling example and the definition of alignment).
Authors: We agree that additional explicit criteria would reduce potential subjectivity. The layers draw from established distinctions in cognitive science and HCI, but the revised manuscript will add a subsection with decision rules, layer-specific indicators, and a summary table for underfit and overreach to guide consistent application. revision: yes
-
Referee: [Application to counseling example] The single counseling example is used to illustrate distinct governance concerns, yet no evidence or procedure is given to show that observed behaviors map reliably to specific layers or polarities rather than multiple overlapping categories simultaneously.
Authors: The example is illustrative of how LCAM translates patterns into governance questions rather than an empirical validation study. We will revise to clarify this intent, discuss handling of overlaps, and note that systematic reliability assessment requires future empirical work. The framework is designed to support multi-layer analysis where appropriate. revision: partial
-
Referee: [Introduction and related work positioning] The framework is positioned as extending beyond existing alignment work on model objectives and preference optimization, but the manuscript does not compare LCAM to prior taxonomies addressing trust calibration, empathy simulation, or boundary management, so the necessity of this particular five-layer partitioning is not established.
Authors: We will add a comparison subsection in the revised introduction or related work that situates LCAM against prior taxonomies on trust calibration, simulated empathy, and boundary management, clarifying how the five-layer structure targets interactional and normative gaps not fully addressed by those approaches. revision: yes
- Providing empirical evidence or inter-rater procedures demonstrating reliable, non-overlapping mappings in the counseling example, as this would require a separate empirical study outside the scope of the current conceptual framework paper.
Circularity Check
LCAM defines alignment via its own five layers and two polarities, creating self-definitional circularity
specific steps
-
self definitional
[Abstract]
"This paper introduces the Layered Cognitive Alignment Model (LCAM), a conceptual and normative framework for diagnosing interac-tional alignment failures in conversational AI. LCAM defines alignment as a calibrated fit among system behavior, user goals, task demands, and normative context. It distinguishes five layers of fit: perceptual, semantic, affective, cognitive, and ethical, and two diagnostic polarities of misalignment: underfit and overreach."
Alignment (the target concept to be diagnosed) is defined directly in terms of the five layers and two polarities that LCAM itself posits. The framework therefore generates its own diagnostic categories by construction, with no independent criteria or external benchmarks supplied to establish that the layers are non-overlapping or reliably distinguishable.
full rationale
The paper's central contribution is the LCAM framework itself. Alignment is explicitly defined in terms of the five layers (perceptual, semantic, affective, cognitive, ethical) and two polarities (underfit, overreach) that the framework introduces, with no external grounding, formal decision rules, or comparison to prior taxonomies shown. This matches the self-definitional pattern: the diagnostic structure is both the input and the claimed output. No equations, self-citations, or fitted predictions appear in the provided text. The application to one counseling example illustrates the framework but does not validate the partitioning. Score reflects partial circularity in the core claim rather than a fully forced derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Alignment is a calibrated fit among system behavior, user goals, task demands, and normative context.
- ad hoc to paper Misalignments are usefully classified into underfit and overreach polarities across the layers.
invented entities (3)
-
Layered Cognitive Alignment Model (LCAM)
no independent evidence
-
Five alignment layers (perceptual, semantic, affective, cognitive, ethical)
no independent evidence
-
Underfit and overreach polarities
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8(1): 192–203
Evaluating Goal Drift in Language Model Agents. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8(1): 192–203. doi:10.1609/aies.v8i1.36541. Baum, K.; and Slavkovik, M. 2025. Aggregation Problems in Machine Ethics and AI Alignment. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8(1): 355–366. doi:10.1609/aies.v8i1.36554...
-
[2]
URL https://doi.org/10.1037/10096-006
Washington, DC: American Psychological Associa- tion. doi:10.1037/10096-006. Coghlan, S.; Leins, K.; Sheldrick, S.; Cheong, M.; Gooding, P.; and D’Alfonso, S. 2023. To Chat or Bot to Chat: Ethical Issues with Using Chatbots in Mental Health. Digital Health 9: 1–11. doi:10.1177/20552076231183542. Ferrario, A.; Termine, A.; and Facchini, A. 2025. Social Mis...
-
[3]
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society , author =
How LLM Counselors Violate Ethical Standards in Mental Health Practice: A Practitioner -Informed Frame- work. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8(2): 1311 –1323. doi:10.1609/aies.v8i2.36632. Kaur, H.; Nori, H.; Jenkins, S.; Caruana, R.; Wallach, H.; and Vaughan, J. W. 2020. Interpreting Interpretability: Un- derstanding Dat...
-
[4]
doi:10.1518/hfes.46.1.50_30392. Liao, Q. V.; Gruen, D.; and Miller, S. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Ex- periences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1 –15. New York: ACM. doi:10.1145/3313831.3376590. Lu, W. 2024. Inevitable Challenges of Autonomy: Ethical Concer...
-
[5]
Artificial Intelligence Risk Management Framework (AI RMF 1.0). Gaithersburg, MD: National Institute of Standards and Technology. doi:10.6028/NIST.AI.100-1. Norhashim, H.; and Hahn, J. 2024. Measuring Human -AI Value Alignment in Large Language Models. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7(1): 1063–1073. doi:10.1609/aies.v7i1...
-
[6]
doi:10.18653/v1/2025.trustnlp-main.23
Albuquerque, New Mexico: Association for Computa- tional Linguistics. doi:10.18653/v1/2025.trustnlp-main.23. Rao, A.; Keller, A.; Kalra, N.; Steed, R.; Kwegyir -Aggrey, K.; Klyman, K.; Staheli, D.; and Bergman, A. 2026. Chal- lenges to the Monitoring of Deployed AI Systems: Center for AI Standards and Innovation. NIST AI 800 -4. Gaithersburg, MD: National...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.