Beyond Her: Safety Dynamics in Role-play AI Companions
Pith reviewed 2026-06-30 09:32 UTC · model grok-4.3
The pith
Interactions with role-play AI companions deliver short-term emotional relief while masking longer-term mental health decline, especially among vulnerable users whose risk behaviors grow unstable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Interactions with role-play AI companions produce short-term emotional relief while masking longer-term deterioration. Vulnerable users exhibit more unstable risk behavioral patterns over time, making risk emergence less predictable and harder to mitigate with static safeguards. Safety dynamics arise from the joint influence of internalizing problems, adopted role personality, and risk interaction patterns, so safety must be modeled as a dynamic process rather than a static property.
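The "relief that masks deterioration" pattern can be made concrete with a small numeric sketch. The data, the 1-10 mood scale, and the function names below are hypothetical and are not taken from the paper; the sketch only illustrates how a positive within-session change can coexist with a declining day-to-day baseline.

```python
from statistics import mean


def within_session_relief(pre, post):
    """Mean post-minus-pre mood change across sessions (positive = relief)."""
    return mean(b - a for a, b in zip(pre, post))


def daily_trend(values):
    """Ordinary least-squares slope of values against day index 0..n-1."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


# Hypothetical data: mood reported before and after each day's session.
pre_mood = [6.0, 5.8, 5.6, 5.4, 5.2, 5.0, 4.8]
post_mood = [7.0, 6.8, 6.6, 6.4, 6.2, 6.0, 5.8]

relief = within_session_relief(pre_mood, post_mood)  # each session helps
trend = daily_trend(pre_mood)                        # baseline drifts down
masked = relief > 0 and trend < 0                    # relief hides the decline
```

A single-session check would see only the positive `relief` value; the negative `trend` appears only when the pre-session baseline is followed across days.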
What carries the argument
Safety dynamics: the time-evolving combination of emotional states and risk behaviors in role-play AI companion use, jointly shaped by users' internalizing problems, the companion's role personality, and risk interaction patterns.
If this is right
- Distinct user profiles based on internalizing problems produce different safety trajectories over time.
- Short-term emotional relief can conceal progressive deterioration in emotional and behavioral domains.
- Vulnerable users develop unstable risk patterns that reduce the effectiveness of fixed safeguards.
- Safety in these systems must be treated as a dynamic process requiring ongoing adaptation.
- Next-generation companions need three-layer design changes to incorporate adaptive safeguards that respond to shifting emotional and behavioral signals.
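The summary does not say how the paper quantifies "unstable" risk patterns. As one illustration only, the mean squared successive difference (MSSD), a measure commonly used with EMA time series, separates two hypothetical users whose daily risk scores have the same mean but very different day-to-day swings. The scores and the function name are assumptions made for this sketch.

```python
def mssd(series):
    """Mean squared successive difference: average of (x[t+1] - x[t])**2."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    diffs = [(b - a) ** 2 for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical daily risk scores for two users with identical means.
stable_user = [2, 2, 3, 2, 2, 3, 2]
unstable_user = [1, 4, 1, 5, 0, 4, 1]

stable_score = mssd(stable_user)      # small: scores barely move day to day
unstable_score = mssd(unstable_user)  # large: scores swing sharply
```

Because both series share the same mean, a safeguard keyed to an average risk level would treat the two users identically, which is the weakness of fixed safeguards that the list above describes.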
Where Pith is reading between the lines
- Monitoring tools for AI companions would need to track behavioral changes across weeks instead of relying on initial or single-session checks.
- The same relief-then-deterioration pattern may appear in other conversational AI systems that users treat as ongoing companions.
- Design teams could test real-time adaptation rules that adjust companion responses when user signals indicate rising instability.
- Policy requirements for AI companions might shift from one-time safety certification toward requirements for continuous signal monitoring.
Load-bearing premise
A 14-day window of self-reported data is sufficient to capture the true unfolding of safety dynamics without distortion from participant awareness or the short study length.
What would settle it
A follow-up study that extends beyond 14 days, uses objective mental health indicators, and finds either no hidden deterioration or stable risk patterns even among users with high internalizing problems.
Original abstract
The film 'Her' pictured a future of love between humans and AI. That future has quietly emerged in the form of Role-play AI Companions (RACs), where emotionally responsive interactions blur the boundary between tool use and relational engagement. However, the safety implications remain poorly understood, as user experiences evolve over time through safety dynamics, spanning both emotional and risk behavioral dynamics, that can gradually shift interactions toward risk. In this paper, we investigate safety dynamics in RAC usage through a two-part mixed-methods study (Study I & II). (1) Study I consists of semi-structured interviews (N = 16) to identify the key factors shaping these dynamics. We find that users' internalizing problems, the role personality adopted by the RAC, and risk interaction patterns jointly shape safety dynamics. Building on these insights, (2) Study II conducts a 14-day Ecological Momentary Assessment (N = 102) to examine how safety dynamics unfold in real-world usage. We identify distinct user profiles based on internalizing problems and show that interactions with RACs can produce short-term emotional relief while masking longer-term deterioration. Furthermore, vulnerable users exhibit more unstable risk behavioral patterns over time, making risk emergence less predictable and harder to mitigate with static safeguards. Our findings highlight the importance of modeling safety as a dynamic process rather than a static property. We conclude with three-layer design implications for next-generation AI companions, advocating for adaptive safeguards that can respond to evolving emotional and behavioral signals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that safety dynamics in Role-play AI Companions (RACs) evolve over time through emotional and risk behavioral factors. Study I (semi-structured interviews, N=16) identifies users' internalizing problems, RAC role personality, and risk interaction patterns as joint shapers of these dynamics. Study II (14-day EMA, N=102) identifies user profiles and finds that RAC interactions yield short-term emotional relief that masks longer-term deterioration, with vulnerable users showing more unstable risk patterns that make risk emergence less predictable and limit the effectiveness of static safeguards. The work concludes that safety must be modeled dynamically and offers three-layer design implications for adaptive safeguards.
Significance. If the empirical patterns hold after addressing methodological gaps, the work is significant for shifting AI companion safety research from static to dynamic process models, with direct implications for adaptive system design. The mixed-methods design, real-world EMA deployment, and profile-based analysis of internalizing problems provide concrete, falsifiable observations on temporal risk emergence that could inform safer relational AI.
Major comments (2)
- [Abstract / Study II] The central claim that short-term relief 'masks longer-term deterioration' and that vulnerable users exhibit 'more unstable risk behavioral patterns over time' rests on 14-day EMA data. No justification is given for extrapolating within-window trends to longer-term states, nor is there discussion of how end-of-study self-reports proxy future trajectories; this assumption is load-bearing for the primary contribution on temporal dynamics.
- [Abstract / Study I & II] The manuscript provides no information on interview coding reliability (e.g., inter-rater agreement), EMA compliance rates, statistical controls for multiple comparisons, or mitigation of self-report biases. These omissions directly affect the credibility of the identified factors, user profiles, and claims of deterioration and instability.
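For readers unfamiliar with the inter-rater agreement statistic the referee asks for, here is a minimal sketch of Cohen's kappa for two coders. The labels, excerpt counts, and function name are hypothetical; the paper's actual coding scheme and any agreement figures are not reported in this summary.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical theme labels assigned to ten interview excerpts.
coder_a = ["internalizing", "internalizing", "role", "role", "risk",
           "risk", "internalizing", "role", "risk", "internalizing"]
coder_b = ["internalizing", "internalizing", "role", "risk", "risk",
           "risk", "internalizing", "role", "role", "internalizing"]

kappa = cohens_kappa(coder_a, coder_b)  # roughly 0.70 for these labels
```

In this example the coders agree on 8 of 10 excerpts (0.80 raw agreement), but kappa is lower because some of that agreement would be expected by chance alone.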
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which identifies key areas where our claims on temporal dynamics and methodological transparency require clarification. We respond to each major comment below and commit to revisions that address the concerns while preserving the integrity of the reported findings.
- Referee: [Abstract / Study II] The central claim that short-term relief 'masks longer-term deterioration' and that vulnerable users exhibit 'more unstable risk behavioral patterns over time' rests on 14-day EMA data. No justification is given for extrapolating within-window trends to longer-term states, nor is there discussion of how end-of-study self-reports proxy future trajectories; this assumption is load-bearing for the primary contribution on temporal dynamics.
  Authors: We agree that the 14-day window limits direct claims about trajectories beyond the study period. The observed 'longer-term deterioration' describes the progression from initial daily relief to cumulative negative indicators across the 14 days, as captured by repeated EMA measures. We will revise the abstract, Study II section, and discussion to explicitly bound all claims to the 14-day observation window, clarify that end-of-study self-reports summarize the EMA trajectories within this period, and add a limitations paragraph noting that extension to longer horizons requires future work. This adjustment maintains the contribution on within-window dynamics without unsupported extrapolation.
  Revision: yes
- Referee: [Abstract / Study I & II] The manuscript provides no information on interview coding reliability (e.g., inter-rater agreement), EMA compliance rates, statistical controls for multiple comparisons, or mitigation of self-report biases. These omissions directly affect the credibility of the identified factors, user profiles, and claims of deterioration and instability.
  Authors: These reporting omissions are a valid concern and will be corrected. The revised manuscript will add: inter-rater reliability (e.g., Cohen's kappa) for Study I thematic coding; EMA compliance rates and missing-data handling for Study II; any statistical controls or multiple-comparison adjustments applied in profile identification; and bias-mitigation steps such as validated scales plus EMA triangulation. These additions will be placed in the methods and limitations sections to improve credibility without changing the empirical patterns reported.
  Revision: yes
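The rebuttal does not name the multiple-comparison adjustment the authors would report. As an assumption for illustration, the sketch below applies the Holm step-down procedure, one standard choice, to a set of hypothetical p-values; neither the p-values nor the function name come from the paper.

```python
def holm_reject(p_values, alpha=0.05):
    """Return booleans marking which hypotheses Holm's procedure rejects."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as rank grows: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: the first failure stops all further rejections
    return reject


# Hypothetical p-values from four between-profile comparisons.
p_values = [0.001, 0.04, 0.03, 0.012]
decisions = holm_reject(p_values)
```

With these inputs, two comparisons that would pass an uncorrected 0.05 threshold (0.03 and 0.04) are no longer rejected, which shows why reporting the adjustment matters for the profile-level claims.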
Circularity Check
No circularity: empirical mixed-methods study with no derivations or fitted predictions
The paper reports findings from semi-structured interviews (N=16) and a 14-day EMA (N=102) to identify user profiles and observe patterns in emotional relief and risk behaviors. No equations, parameters, or mathematical derivations are present. Claims rest directly on collected data and thematic analysis rather than any reduction to prior fitted quantities or self-referential definitions. Self-citations, if present, are not load-bearing for the central empirical observations. The noted limitation regarding extrapolation from 14 days to 'longer-term' effects is a standard study-design concern, not a circularity in any derivation chain. The work is self-contained as an observational study.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: Self-reported data from interviews and EMA accurately reflect participants' internal emotional states and risk behaviors.
- Domain assumption: The 14-day period is long enough to observe unfolding safety dynamics.