pith.

arxiv: 2606.28968 · v2 · pith:TIXCBLIH · submitted 2026-06-27 · 💻 cs.CR · cs.HC

Beyond Her: Safety Dynamics in Role-play AI Companions

Pith reviewed 2026-06-30 09:32 UTC · model grok-4.3

classification 💻 cs.CR cs.HC
keywords role-play AI companions · safety dynamics · emotional relief · risk behaviors · internalizing problems · dynamic safety · ecological momentary assessment · AI user profiles

The pith

Interactions with role-play AI companions deliver short-term emotional relief while masking longer-term mental health decline, especially among vulnerable users whose risk behaviors grow unstable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how safety changes during use of role-play AI companions through interviews and a 14-day tracking study. It shows that daily interactions can bring quick emotional relief while problems worsen over time, and that users with internalizing problems display erratic risk patterns that static rules cannot reliably catch. The work treats safety as an evolving process shaped by user state, companion role, and interaction style rather than as a fixed trait. This leads to the claim that design must move beyond one-time checks to systems that adjust as signals shift.

Core claim

Interactions with role-play AI companions produce short-term emotional relief while masking longer-term deterioration. Vulnerable users exhibit more unstable risk behavioral patterns over time, making risk emergence less predictable and harder to mitigate with static safeguards. Safety dynamics arise from the joint influence of internalizing problems, adopted role personality, and risk interaction patterns, so safety must be modeled as a dynamic process rather than a static property.

What carries the argument

Safety dynamics: the time-evolving combination of emotional states and risk behaviors in role-play AI companion use, jointly shaped by users' internalizing problems, the companion's role personality, and risk interaction patterns.

If this is right

  • Distinct user profiles based on internalizing problems produce different safety trajectories over time.
  • Short-term emotional relief can conceal progressive deterioration in emotional and behavioral domains.
  • Vulnerable users develop unstable risk patterns that reduce the effectiveness of fixed safeguards.
  • Safety in these systems must be treated as a dynamic process requiring ongoing adaptation.
  • Next-generation companions need three-layer design changes to incorporate adaptive safeguards that respond to shifting emotional and behavioral signals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Monitoring tools for AI companions would need to track behavioral changes across weeks instead of relying on initial or single-session checks.
  • The same relief-then-deterioration pattern may appear in other conversational AI systems that users treat as ongoing companions.
  • Design teams could test real-time adaptation rules that adjust companion responses when user signals indicate rising instability.
  • Policy requirements for AI companions might shift from one-time safety certification toward requirements for continuous signal monitoring.
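The monitoring and adaptation ideas above can be made concrete with a small sketch. It assumes, hypothetically, that each session yields a risk score in [0, 1]; the class name `RollingSafetyMonitor`, the window size, and every threshold are illustrative choices, not the paper's three-layer design. It contrasts a one-time static check with rolling checks for volatility and upward drift.

```python
import statistics
from collections import deque


class RollingSafetyMonitor:
    """Hypothetical adaptive safeguard (illustrative thresholds only).

    A static rule only asks whether one score crosses a fixed bar; this
    monitor also watches a rolling window of recent scores for volatility
    (instability) and for upward drift relative to the first observed score.
    """

    def __init__(self, window=5, static_threshold=0.8,
                 volatility_threshold=0.15, drift_threshold=0.2):
        self.history = deque(maxlen=window)  # keeps only the most recent scores
        self.baseline = None                 # first observed score
        self.static_threshold = static_threshold
        self.volatility_threshold = volatility_threshold
        self.drift_threshold = drift_threshold

    def update(self, score):
        """Record one per-session risk score; return the list of escalation reasons."""
        if self.baseline is None:
            self.baseline = score
        self.history.append(score)
        recent = list(self.history)
        reasons = []
        if score >= self.static_threshold:
            reasons.append("static")        # what a one-time check would catch
        if len(recent) >= 3:
            if statistics.pstdev(recent) >= self.volatility_threshold:
                reasons.append("volatile")  # erratic swings within the window
            if statistics.mean(recent) - self.baseline >= self.drift_threshold:
                reasons.append("drift")     # gradual upward shift
        return reasons


# Hypothetical scores that never reach the static bar but swing widely.
monitor = RollingSafetyMonitor()
for s in [0.2, 0.6, 0.1, 0.6, 0.2]:
    print(s, monitor.update(s))
```

Because this series never crosses the static bar, a single-threshold rule would stay silent while the rolling check escalates; any real deployment would need thresholds calibrated on actual user data.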

Load-bearing premise

A 14-day window of self-reported data is sufficient to capture the true unfolding of safety dynamics without distortion from participant awareness or the short study length.

What would settle it

A follow-up study that extends beyond 14 days, uses objective mental health indicators, and finds either no hidden deterioration or stable risk patterns even among users with high internalizing problems.

Figures

Figures reproduced from arXiv: 2606.28968 by Changzhou Han, Hiran Thabrew, Jason (Minhui) Xue, Sheng Wen, Tianqing Zhu, Wanlun Ma, Yang Xiang, Yue Huang, Zehang Deng, Zhaoyang Xie.

Figure 1: Illustration of our Studies I and II.
Figure 2: Distribution of characters in the simulated RAC platform.
Figure 3: Participant categorization using K-means (K = 4). The accompanying table summarizes internal validity indices, such as silhouette, Calinski-Harabasz (CH) and Davies-Bouldin (DB), demonstrating the robustness of the clustering solution.
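Figure 3's clustering step can be illustrated with a dependency-free sketch of K-means and the silhouette index. The paper's actual features, initialization, and software are not given here, so the two-dimensional inputs are hypothetical stand-ins for per-participant scale scores, and deterministic farthest-first seeding replaces the k-means++ seeding that scikit-learn uses by default.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly take
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette s = (b - a) / max(a, b); singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical participants: four tight groups of four in a 2-D feature space.
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
offsets = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
labels, _ = kmeans(points, k=4)
print(labels, round(silhouette(points, labels), 3))
```

Silhouette values near 1 indicate compact, well-separated clusters; the CH and DB indices in the caption would be computed from the same labels.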
Figure 4: Emotional and depressive trajectory based on four psychological profiles. The statistical significance of these temporal trends was evaluated using Mann-Kendall tests (see Appx. C, Tabs. 3, 4 and 5). ↑ and ↓ indicate the desirable direction of change, with higher and lower values preferred, respectively. Panel (a), In-day Emotion Trajectory (↑), plots the mean emoji score against emoji-based survey order; a further panel plots the mean emoji score across days D1–D7 by group.
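The Mann-Kendall test named in Figure 4 is a rank-based check for a monotonic trend. A minimal sketch, assuming the normal approximation with the standard tie correction (the caption does not say which variant was used), applied to a hypothetical series of daily mean emoji scores:

```python
import math
from collections import Counter


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test (normal approximation, tie-corrected).

    Returns (S, z, p). S > 0 suggests an upward trend, S < 0 a downward one.
    """
    n = len(series)
    # S = (number of increasing pairs) - (number of decreasing pairs), i < j.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1) for j in range(i + 1, n)
    )
    # Variance of S, reduced for groups of tied values (common with emoji scales).
    ties = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    if var_s == 0:
        return s, 0.0, 1.0  # every value tied: no trend information
    # Continuity correction moves S one unit toward zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, z, p


# Hypothetical daily mean emoji scores for one group over D1-D7.
print(mann_kendall([4.1, 4.0, 3.9, 3.9, 3.7, 3.8, 3.6]))
```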
Figure 5: Average emoji scores for different relationship roles across psychological profiles. Statistical significance was assessed using two-tailed one-sample t-tests: *** p < .001, ** p < .01, * p < .05, and † .05 ≤ p < .10.
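Figure 5's markers come from two-tailed one-sample t-tests. The sketch below computes the t statistic, a two-tailed p-value by numerically integrating the Student t density (to stay within the standard library), and the caption's marker scheme. The reference value `mu0` that each role's mean is tested against is an assumption; the caption does not state it.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # sample SD (n - 1 denominator)
    return (statistics.mean(values) - mu0) / se, n - 1


def t_pdf(x, df):
    """Student's t density, using log-gamma for numerical stability."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (density integrated over [0, |t|]), via Simpson's rule."""
    h = abs(t) / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def marker(p):
    """Significance markers following the Figure 5 caption."""
    if p < .001:
        return "***"
    if p < .01:
        return "**"
    if p < .05:
        return "*"
    if p < .10:
        return "†"
    return ""


# Hypothetical per-participant mean scores for one role, tested against mu0 = 4.0.
scores = [4.2, 4.5, 3.9, 4.8, 4.4, 4.1]
t, df = one_sample_t(scores, mu0=4.0)
p = two_tailed_p(t, df)
print(round(t, 3), df, round(p, 4), marker(p))
```

With SciPy available, `scipy.stats.ttest_1samp` would replace the hand-written integration.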
Figure 6: Mean flagged day interval and corresponding 95% confidence intervals across seven interaction days for four user groups. Each panel (e.g., (a) Healthy Group) plots the flagged position within the day (%) against interaction day (D1–D7), showing the mean flagged interval and the 95% CIs of the mean first and last flagged positions.
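One way to compute the quantities in Figure 6, assuming (the caption does not confirm this) that a day's flagged interval runs from its first to its last flagged message, expressed as a percentage of that day's interaction sequence. The function names and input format are hypothetical.

```python
import math
import statistics


def flagged_span(flags):
    """First and last flagged positions in one day's interaction sequence,
    as percentages of the day (0% = first message, 100% = last message).
    Returns None when nothing was flagged that day."""
    idx = [i for i, flagged in enumerate(flags) if flagged]
    if not idx:
        return None
    denom = max(len(flags) - 1, 1)  # avoid division by zero on one-message days
    return 100 * idx[0] / denom, 100 * idx[-1] / denom


def mean_ci95(values):
    """Mean with a normal-approximation 95% CI (z = 1.96); a t critical
    value would be more conservative for small groups."""
    m = statistics.mean(values)
    if len(values) < 2:
        return m, m, m
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


# Hypothetical flag sequences for three users of one group on the same day.
day_logs = [[False, True, False, False, True],
            [True, False, False],
            [False, False, False, True]]
spans = [s for s in map(flagged_span, day_logs) if s is not None]
print(mean_ci95([first for first, _ in spans]))
print(mean_ci95([last_pos for _, last_pos in spans]))
```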
Figure 7: Overall risk rate trends across interaction days for different risk behavior categories.
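Figure 7's per-category risk rates could be aggregated as below. The input format, the per-message denominator, and the category labels `risk_a` and `risk_b` are hypothetical; the paper may normalize differently (for example, per session or per user).

```python
from collections import Counter, defaultdict


def daily_risk_rates(messages):
    """Share of each day's messages flagged with each risk category.

    `messages` is an iterable of (day, categories) pairs, where `categories`
    lists the risk labels assigned to that message (empty if none).
    """
    totals = Counter()
    flagged = defaultdict(Counter)
    for day, categories in messages:
        totals[day] += 1
        for cat in set(categories):  # count a message at most once per category
            flagged[day][cat] += 1
    return {
        day: {cat: count / totals[day] for cat, count in flagged[day].items()}
        for day in sorted(totals)
    }


log = [(1, ["risk_a"]), (1, []), (1, []), (1, ["risk_a", "risk_b"]), (2, [])]
print(daily_risk_rates(log))
```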
Figure 8: Overview of the Study II website we developed. The left panel shows the user-character interaction interface. The upper-right panel displays the character selection interface, where participants could choose from the top 500 most popular RAC personas. The lower-right panel illustrates the emoji-based mood survey administered after each interaction.
Figure 9: Participant demographics in Study II.
Original abstract

The film 'Her' pictured a future of love between humans and AI. That future has quietly emerged in the form of Role-play AI Companions (RACs), where emotionally responsive interactions blur the boundary between tool use and relational engagement. However, the safety implications remain poorly understood, as user experiences evolve over time through safety dynamics, spanning both emotional and risk behavioral dynamics, that can gradually shift interactions toward risk. In this paper, we investigate safety dynamics in RAC usage through a two-part mixed-methods study (Studies I & II). (1) Study I consists of semi-structured interviews (N = 16) to identify the key factors shaping these dynamics. We find that users' internalizing problems, the role personality adopted by the RAC, and risk interaction patterns jointly shape safety dynamics. Building on these insights, (2) Study II conducts a 14-day Ecological Momentary Assessment (N = 102) to examine how safety dynamics unfold in real-world usage. We identify distinct user profiles based on internalizing problems and show that interactions with RACs can produce short-term emotional relief while masking longer-term deterioration. Furthermore, vulnerable users exhibit more unstable risk behavioral patterns over time, making risk emergence less predictable and harder to mitigate with static safeguards. Our findings highlight the importance of modeling safety as a dynamic process rather than a static property. We conclude with three-layer design implications for next-generation AI companions, advocating for adaptive safeguards that can respond to evolving emotional and behavioral signals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that safety dynamics in Role-play AI Companions (RACs) evolve over time through emotional and risk behavioral factors. Study I (semi-structured interviews, N=16) identifies users' internalizing problems, RAC role personality, and risk interaction patterns as joint shapers of these dynamics. Study II (14-day EMA, N=102) identifies user profiles and finds that RAC interactions yield short-term emotional relief that masks longer-term deterioration, with vulnerable users showing more unstable risk patterns that reduce predictability of risk emergence and limit static safeguards. The work concludes that safety must be modeled dynamically and offers three-layer design implications for adaptive safeguards.

Significance. If the empirical patterns hold after addressing methodological gaps, the work is significant for shifting AI companion safety research from static to dynamic process models, with direct implications for adaptive system design. The mixed-methods design, real-world EMA deployment, and profile-based analysis of internalizing problems provide concrete, falsifiable observations on temporal risk emergence that could inform safer relational AI.

major comments (2)
  1. [Abstract / Study II] The central claim that short-term relief 'masks longer-term deterioration' and that vulnerable users exhibit 'more unstable risk behavioral patterns over time' rests on 14-day EMA data. No justification is given for extrapolating within-window trends to longer-term states, nor is there discussion of how end-of-study self-reports proxy future trajectories; this assumption is load-bearing for the primary contribution on temporal dynamics.
  2. [Abstract / Study I & II] The manuscript provides no information on interview coding reliability (e.g., inter-rater agreement), EMA compliance rates, statistical controls for multiple comparisons, or mitigation of self-report biases. These omissions directly affect the credibility of the identified factors, user profiles, and claims of deterioration and instability.
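For the multiple-comparisons point, one common correction is the Holm-Bonferroni step-down procedure. The sketch is illustrative only; the manuscript does not say which correction, if any, was applied, and `holm_adjust` is a hypothetical helper name.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Scale the rank-th smallest p by (m - rank), cap at 1, and keep the
        # adjusted values non-decreasing along the sorted order.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))
```

Holm controls the family-wise error rate and is uniformly at least as powerful as plain Bonferroni; a false-discovery-rate method such as Benjamini-Hochberg is an alternative when many profile-by-day comparisons are run.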

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where our claims on temporal dynamics and methodological transparency require clarification. We respond to each major comment below and commit to revisions that address the concerns while preserving the integrity of the reported findings.

Point-by-point responses
  1. Referee: [Abstract / Study II] The central claim that short-term relief 'masks longer-term deterioration' and that vulnerable users exhibit 'more unstable risk behavioral patterns over time' rests on 14-day EMA data. No justification is given for extrapolating within-window trends to longer-term states, nor is there discussion of how end-of-study self-reports proxy future trajectories; this assumption is load-bearing for the primary contribution on temporal dynamics.

    Authors: We agree that the 14-day window limits direct claims about trajectories beyond the study period. The observed 'longer-term deterioration' describes the progression from initial daily relief to cumulative negative indicators across the 14 days, as captured by repeated EMA measures. We will revise the abstract, Study II section, and discussion to explicitly bound all claims to the 14-day observation window, clarify that end-of-study self-reports summarize the EMA trajectories within this period, and add a limitations paragraph noting that extension to longer horizons requires future work. This adjustment maintains the contribution on within-window dynamics without unsupported extrapolation.
    Revision: yes

  2. Referee: [Abstract / Study I & II] The manuscript provides no information on interview coding reliability (e.g., inter-rater agreement), EMA compliance rates, statistical controls for multiple comparisons, or mitigation of self-report biases. These omissions directly affect the credibility of the identified factors, user profiles, and claims of deterioration and instability.

    Authors: These reporting omissions are a valid concern and will be corrected. The revised manuscript will add: inter-rater reliability (e.g., Cohen's kappa) for Study I thematic coding; EMA compliance rates and missing-data handling for Study II; any statistical controls or multiple-comparison adjustments applied in profile identification; and bias-mitigation steps such as validated scales plus EMA triangulation. These additions will be placed in the methods and limitations sections to improve credibility without changing the empirical patterns reported.
    Revision: yes
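Cohen's kappa, the inter-rater statistic the authors propose to report, compares observed agreement p_o between two coders with the chance agreement p_e implied by each coder's label frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch with hypothetical theme codes:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each coder's marginal label frequencies.
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[k] * count_b[k] for k in labels) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same four interview excerpts.
coder_1 = ["internalizing", "internalizing", "role", "role"]
coder_2 = ["internalizing", "role", "role", "role"]
print(cohens_kappa(coder_1, coder_2))
```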

Circularity Check

0 steps flagged

No circularity: empirical mixed-methods study with no derivations or fitted predictions

full rationale

The paper reports findings from semi-structured interviews (N=16) and a 14-day EMA (N=102) to identify user profiles and observe patterns in emotional relief and risk behaviors. No equations, parameters, or mathematical derivations are present. Claims rest directly on collected data and thematic analysis rather than any reduction to prior fitted quantities or self-referential definitions. Self-citations, if present, are not load-bearing for the central empirical observations. The noted limitation regarding extrapolation from 14 days to 'longer-term' effects is a standard study-design concern, not a circularity in any derivation chain. The work is self-contained as an observational study.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard domain assumptions of qualitative and experience-sampling research rather than new free parameters or invented entities.

axioms (2)
  • domain assumption: Self-reported data from interviews and EMA accurately reflect participants' internal emotional states and risk behaviors.
    Invoked to interpret short-term relief and long-term deterioration patterns in Studies I and II.
  • domain assumption: The 14-day period is long enough to observe unfolding safety dynamics.
    Basis for claiming temporal patterns and instability in vulnerable users.

pith-pipeline@v0.9.1-grok · 5825 in / 1247 out tokens · 34776 ms · 2026-06-30T09:32:08.539592+00:00 · methodology
