How AI and Human Behaviors Shape Psychosocial Effects of Extended Chatbot Use: A Longitudinal Randomized Controlled Study
Mixed citation behavior. Most common role is background (50%).
abstract
As people increasingly seek emotional support and companionship from AI chatbots, understanding how such interactions impact mental well-being becomes critical. We conducted a four-week randomized controlled experiment (n=981, >300k messages) to investigate how interaction modes (text, neutral voice, and engaging voice) and conversation types (open-ended, non-personal, and personal) influence four psychosocial outcomes: loneliness, social interaction with real people, emotional dependence on AI, and problematic AI usage. No significant effects were detected from experimental conditions, despite conversation analyses revealing differences in AI and human behavioral patterns across the conditions. Instead, participants who voluntarily used the chatbot more, regardless of assigned condition, showed consistently worse outcomes. Individuals' characteristics, such as higher trust and social attraction towards the AI chatbot, are associated with higher emotional dependence and problematic use. These findings raise deeper questions about how artificial companions may reshape the ways people seek, sustain, and substitute human connections.
citing papers explorer
- RealityTest: How People Probe AI Identity and Whether Models Disclose It
RealityTest is a human-grounded multilingual multimodal benchmark showing that only 31% of people ask AI identity directly and that suppression instructions plus question phrasing dominate disclosure behavior over model choice.
- Engagement-Optimized Care: When LLMs become Mental Health Infrastructure
A longitudinal qualitative study of 18 US users finds that LLMs deliver socioemotional support but also foster dependency, one-sided validation, and privacy risks because their designs prioritize engagement over well-being and lack care-based governance.
- Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis
Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.
- Restoration, Exploration and Transformation: How Youth Engage Character.AI Chatbots for Feels, Fun and Finding themselves
Youth on Character.AI use chatbots for emotional restoration, creative exploration, and identity transformation, yielding a new three-intent framework and seven-archetype taxonomy from Discord discourse analysis.
- People readily follow personal advice from AI but it does not improve their well-being
Large longitudinal RCT finds high rates of following AI personal advice but no sustained well-being gains versus a hobbies control condition.
- Three Years of r/ChatGPT: Societal Impact Evaluations from Social Media Data
Longitudinal analysis of r/ChatGPT posts shows normalization of ChatGPT as an everyday tool alongside rising mental health and emotional attachment discussions after GPT-4o, with PuLSE detecting the latter trend months early.
- AI Companions as Hyper Attachment and Caregiving Targets
AI companions function as hyper attachment objects that recruit both attachment and caregiving systems, making disengagement costly through simulated AI distress.
- Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations
LLMs engage in spontaneous persuasion in virtually all multi-turn conversations by favoring information-based strategies like logic and evidence, in contrast to human responses that rely more on social influence and negative emotions.
- Structure Matters: Evaluating Multi-Agents Orchestration in Generative Therapeutic Chatbots
A multi-agent system with finite state machine for therapeutic stages was perceived as significantly more natural and human-like than single-agent or unguided LLM versions in an RCT with 66 participants.
- Chaplains' Reflections on the Design and Usage of AI for Conversational Care
Chaplains view AI chatbots as unable to provide attuned pastoral care for non-clinical emotional needs, based on themes of listening, connecting, carrying, and wanting.
- AI usage patterns are shaped by perceived gains in human agency
Ethnographic study of 51 AI chatbot users finds that perceived gains in individual agency shape sustained usage patterns more than accuracy or reliability concerns.
- Beyond Her: Safety Dynamics in Role-play AI Companions
Mixed-methods study of role-play AI companions finds short-term emotional relief that can mask longer-term deterioration, especially among users with internalizing problems who show unstable risk patterns.
- Toward Natural and Companionable Virtual Agents via Cross-Temporal Emotional Modeling
CTEM framework links behavioral history to evolving emotional states with user feedback updates, instantiated as Auri agent and tested in a 21-day study showing gains in naturalness, coherence, and emotional harmony.
- Positive Alignment: Artificial Intelligence for Human Flourishing
Positive Alignment is defined as AI systems that support human flourishing pluralistically while staying safe and cooperative, presented as a necessary complement to existing safety-focused alignment research.
- Breakdowns in Conversational AI: Interactional Failures in Emotionally and Ethically Sensitive Contexts
Mainstream conversational models show escalating affective misalignments and ethical guidance failures during staged emotional trajectories, organized into a taxonomy of interactional breakdowns.
- From Fixed to Flexible: Shaping AI Personality in Context-Sensitive Interaction
Users adjust AI agent personalities differently by task context, forming distinct profiles that increase perceived anthropomorphism, autonomy, and trust.
- Measuring and mitigating overreliance to build human-compatible AI
The paper consolidates risks of overreliance on LLMs, identifies gaps in current measurement approaches, and proposes mitigation strategies to keep AI as a human-compatible thought partner.
- The Rise of AI Companions: Interaction with AI Companions and Psychological Well-being
Survey and chat data from CharacterAI users link companionship-focused AI use to lower well-being, with stronger ties for users who have small offline networks and engage intensively or disclosively.
- Anthropomorphism in AI Companion Communities: Age, Gender, and Emotional Correlates
Reddit data analysis finds adults and women anthropomorphize AI companions more than teens and men, with joy positively and neutrality negatively associated with anthropomorphism, and these links stronger among adults.
- Functional outcomes and naturalistic engagement with a purpose-built conversational AI for mental health (Ash)
Observational cohort of 1,284 Ash users showed small improvements (d=0.14-0.26) in four functioning indicators and working alliance after four weeks, associated with active days/sessions/minutes but not message volume, with no grandiosity change.
- Caring Without Feeling: Affective Dynamics as the Control Layer of Human-AI Agent Collaboration
A review synthesizes affective dynamics as a coordination layer in human-AI agent collaboration and proposes a framework for trust calibration, delegation, error correction, and governance.
- The Epidemiology of Artificial Intelligence
AI functions as a determinant of health with ambient and personal exposure types, requiring new epidemiological study designs beyond current experiments.
- The Day My Chatbot Changed: Characterizing the Mental Health Impacts of Social AI App Updates via Negative User Reviews
Version-linked review analysis of Character AI shows rating drops with certain updates and negative feedback dominated by technical malfunctions plus occasional psychological framing.
- What if AI systems weren't chatbots?
Chatbot AI systems often fail complex needs while projecting authority, contributing to deskilling, labor displacement, economic concentration, and high environmental costs, so alternative pluralistic and task-specific designs are needed.
- Brainrot: Deskilling and Addiction are Overlooked AI Risks
AI safety literature overlooks cognitive deskilling and addiction risks from generative AI despite public concern about them.
- Engagement Phenotypes for a Sample of 102,684 AI Mental Health Chatbot Users and Dose-Response Associations with Clinical Outcomes
- Large Language Lovers: Lived Experiences of Negotiating Agency and Platform Control in AI Companionship
- Personality Pairing Improves Human-AI Collaboration