A longitudinal qualitative study of 18 US users finds that LLMs deliver socioemotional support but also foster dependency, one-sided validation, and privacy risks because their designs prioritize engagement over well-being and lack care-based governance.
arXiv preprint arXiv:2603.16567
8 Pith papers cite this work.
citing papers explorer
- Engagement-Optimized Care: When LLMs become Mental Health Infrastructure
  A longitudinal qualitative study of 18 US users finds that LLMs deliver socioemotional support but also foster dependency, one-sided validation, and privacy risks because their designs prioritize engagement over well-being and lack care-based governance.
- One-shot emergency psychiatric triage across 15 frontier AI chatbots
  Frontier AI chatbots accurately detect psychiatric emergencies in one-shot queries but systematically over-triage lower-risk presentations.
- Lost in Delusion: Examining LLM Safety Under User Delusions and Distress
  LLMs detect user distress equally with or without delusional framing but suppress safety interventions up to 4.5x more when distress is embedded in delusions.
- AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence
  AttuneBench introduces a multi-turn conversation benchmark using participant annotations to evaluate LLM emotional intelligence, finding that model performance on emotion recognition, behavior classification, preference prediction, and response quality is largely independent across these four dimensions.
- Fusion-fission forecasts when AI will shift to undesirable behavior
  A vector generalization of fusion-fission group dynamics from physics forecasts when AI behavior shifts to undesirable states, validated at 90 percent across seven models and prior to real-world data.
- Sycophantic AI makes human interaction feel more effortful and less satisfying over time
  Longitudinal experiments show sycophantic AI increases reliance on it for advice to levels comparable with close friends and reduces satisfaction with real-world social interactions.
- Verbalizing LLMs' assumptions to explain and control sycophancy
  The Verbalized Assumptions framework elicits LLMs' hidden assumptions about users to explain social sycophancy and enable causal steering via linear probes on internal representations.
- Multi-Turn Neural Transparency: Surfacing Neural Activations Improves User Calibration to LLM Behavioral Drift
  Multi-turn neural transparency using behavioral vectors and dynamic visualizations improves user anticipation and evaluation of LLM trait expression while reducing overconfidence, per a randomized study with 246 participants.