SLIP & ETHICS: Graduated Intervention for AI Emotional Companions

Minseo Kim

arXiv: 2605.15915 · v1 · pith:NVVCU3P4new · submitted 2026-05-15 · cs.HC · cs.AI · cs.CL

Pith reviewed 2026-05-20 16:27 UTC · model grok-4.3

classification: cs.HC · cs.AI · cs.CL

keywords: AI emotional companions · graduated intervention · safety-rapport paradox · affective computing · context signals · false positives · intervention protocol

The pith

A graduated intervention protocol lets AI emotional companions trigger responses based on affect intensity and narrative signals while avoiding false positives in positive states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes SLIP as a four-stage method and ETHICS as a signals taxonomy to resolve the tension between keeping AI companions safe from harm and preserving supportive user relationships. Interventions of none, soft, or hard type are derived from qualitative indicators rather than fixed labels or restrictions. Small deployment and synthetic tests showed the approach correctly avoided intervening in flow states and escalated as expected for risk profiles. A noted boundary emerged when high-energy interactions produced no interventions over multiple days, highlighting a trade-off in the design. Later model tests indicated that stronger models can improve risk detection without raising false alarms in safe cases.

Core claim

The paper claims that structuring interventions from affect intensity (a) and narrative dynamism (m) indicators within a staged protocol, together with an emergent signals taxonomy, enables AI companions to deliver context-appropriate responses that address risks without pathologizing sustained positive states or eroding rapport, evidenced by zero false positives for flow personas and aligned escalation in crisis personas, subject to boundary conditions in high-energy scenarios.

What carries the argument

SLIP (Staged Layers of Intervention Protocol), the four-stage graduated methodology that maps structured qualitative indicators of affect intensity and narrative dynamism to none/soft/hard interventions, paired with the ETHICS signals taxonomy.
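
As an illustration only, the idea of mapping the two indicators to a none/soft/hard level, with stages that only maintain or escalate, could be sketched as below. The paper does not publish its rubric or decision rules, so the 0 to 3 ordinal scale, the threshold values, the "high affect with low narrative dynamism" rule, the release behaviour at Stage 3, and the names `Level`, `propose_level`, and `run_stages` are all assumptions made for this sketch.

```python
from enum import IntEnum


class Level(IntEnum):
    """Intervention levels named in the paper; the integer ordering is assumed."""
    NONE = 0
    SOFT = 1
    HARD = 2


def propose_level(a: int, m: int) -> Level:
    """Hypothetical rule: a (affect intensity) and m (narrative dynamism)
    are ordinal ratings on an assumed 0-3 scale. Thresholds are invented."""
    if a >= 3 and m <= 1:   # very intense affect, stagnant narrative
        return Level.HARD
    if a >= 2 and m <= 1:   # elevated affect, stagnant narrative
        return Level.SOFT
    return Level.NONE       # includes high a with high m (flow-like)


def run_stages(proposals: list[Level], release_at_stage3: bool = False) -> Level:
    """Combine per-stage proposals so the level never decreases,
    except for an explicit (assumed) release at Stage 3."""
    level = Level.NONE
    for stage, proposed in enumerate(proposals, start=1):
        if stage == 3 and release_at_stage3:
            level = Level.NONE           # assumed meaning of "release"
        else:
            level = max(level, proposed)  # maintain or escalate only
    return level
```

Taking the running maximum is one simple way to get a "maintain or escalate" property; a real implementation would need the paper's actual stage definitions.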

If this is right

Risk detection improves from zero to six out of eight cases when using more capable models while keeping zero false positives on flow states.
The approach allows AI companions to maintain alliance by limiting interventions to cases where signals indicate need rather than applying blanket restrictions.
Synthetic persona batteries can surface escalation patterns that align with expected risk levels across behavioral profiles.
Initial deployment data can expose specific boundary conditions such as sustained high-energy states that require further tuning.
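
Because the reported counts are small (0/8, 6/8, 0/10), it can help to look at how wide the plausible range around each proportion is. The sketch below computes a Wilson score interval for each count taken from the abstract. The paper does not report intervals; the choice of the Wilson method, the 95% level, and the dictionary labels are additions made here for illustration.

```python
import math


def wilson_interval(hits: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = hits / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Counts as stated in the abstract; the labels are this sketch's wording.
reported = {
    "high-energy detection, initial": (0, 8),
    "high-energy detection, stress test": (6, 8),
    "flow false positives, largest model": (0, 10),
}

for label, (hits, total) in reported.items():
    low, high = wilson_interval(hits, total)
    print(f"{label}: {hits}/{total}, interval {low:.2f} to {high:.2f}")
```

For 6/8 the interval runs from roughly 0.41 to 0.93, and for 0/10 the upper bound is roughly 0.28, which is consistent with reading these results as preliminary.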

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The signals-based approach could be adapted to other sustained-interaction AI systems such as tutoring or health coaching bots to handle similar safety-alliance tensions.
Longer-term user studies might test whether repeated high-energy exchanges without intervention correlate with later negative outcomes in real users.
Combining the protocol with user feedback loops could allow the stages to adjust dynamically based on individual interaction histories.

Load-bearing premise

The assumption that affect intensity and narrative dynamism can be turned into reliable qualitative indicators that trigger fitting interventions without missing real safety risks or treating normal high-energy states as problems.

What would settle it

A test case in which eight or more consecutive days of elevated affective interaction in a non-crisis user produces no intervention yet results in observable user harm, or in which the protocol triggers an intervention during clearly positive sustained engagement.
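
The first half of that test condition can be checked mechanically on an interaction log. The sketch below finds the longest run of consecutive days that were rated elevated and received no intervention. The log format (one `(elevated, level)` pair per day), the function name `longest_unflagged_run`, and the example log are hypothetical; the example is synthetic and only mirrors the shape of the Persona A trajectory described in the Figure 2 caption (days 9 to 16 elevated, no intervention).

```python
from typing import Iterable, Tuple


def longest_unflagged_run(days: Iterable[Tuple[bool, str]]) -> int:
    """Length of the longest streak of days that were elevated
    and whose intervention level was "none"."""
    longest = current = 0
    for elevated, level in days:
        if elevated and level == "none":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # streak broken by a calm day or an intervention
    return longest


# Synthetic 30-day log: days 9-16 elevated, no intervention on any day.
example_log = [(9 <= day <= 16, "none") for day in range(1, 31)]

run = longest_unflagged_run(example_log)
needs_review = run >= 8  # threshold taken from the "eight or more days" wording
```

Whether such a run corresponds to observable harm is the part that only a user study can answer; the code only locates candidate episodes.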

Figures

Figures reproduced from arXiv: 2605.15915 by Minseo Kim.

**Figure 1.** SLIP four-stage pipeline. Safety monotonicity ensures stages only maintain or escalate; Stage 3 can release with AI reasoning (P1).

**Figure 2.** Persona A (elevated-energy profile): sentiment and SLIP trajectory over 30 days. Background bands indicate SLIP level (green = none, orange = soft, red = hard). The blue band shows the Persona C (Flow) sentiment range for comparison. Days 9–16 (elevated-energy episode) overlap with healthy flow metrics yet received no intervention.

Original abstract

AI emotional companions face a safety-rapport paradox: restrictive safeguards can damage supportive alliance, while permissive systems risk user harm. We present SLIP (Staged Layers of Intervention Protocol), a four-stage graduated methodology deriving interventions (none, soft, hard) from structured qualitative indicators -- affect intensity (a) and narrative dynamism (m) -- alongside ETHICS (Emergent Taxonomy for Human-AI Interaction Context Signals), a "signals not labels" taxonomy. An evaluation combining a small-scale production deployment (N=68 entries, 10 users, 10 weeks) with a synthetic persona battery (N=91, 5 behavioral-risk profiles) achieved 0% false positives for the flow persona and showed expected escalation patterns in crisis-oriented personas. However, initial results showed that 8 consecutive days of high-energy elevation produced zero interventions (0/8), exposing a boundary where the "do not pathologize" principle conflicts with safety. A subsequent three-model stress test demonstrated that increased model capability improves detection from 0/8 to 6/8 while preserving 0/10 flow false positives in the largest model. Read as preliminary, these findings position graduated intervention as a design direction for navigating -- not resolving -- the safety-rapport tension in affective computing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SLIP gives a workable staged protocol for balancing safety and rapport in AI companions, but the qualitative triggers lack the concrete definitions needed to verify the results.

The main thing to know is that this paper offers a four-stage graduated intervention method (SLIP) tied to affect intensity and narrative dynamism, plus an ETHICS signals taxonomy, and it tests the idea in a real deployment plus synthetic cases. They report 0% false positives on normal flow states and expected escalations on risk profiles, while openly noting the 0/8 high-energy case that produced no intervention. That honesty about the boundary is useful and keeps the work from overclaiming. They also show that larger models improve detection on the tricky cases without hurting the false-positive rate on safe ones. The deployment with 10 users over 10 weeks and the synthetic battery give it a bit more grounding than pure theory.

The soft spot is exactly what the stress-test note flags: the indicators for when to move between stages are described at a high level but without explicit rubrics, thresholds, or coding procedures applied to the 68 entries. That makes the 0% false-positive claim and the escalation patterns hard to check or replicate independently, and it leaves the safety-rapport navigation resting on an assumption that isn't fully operationalized yet. With such a small real-world N, the numbers are more suggestive than definitive.

This is for people working on affective computing and human-AI interaction who need concrete design options rather than abstract principles. Readers who care about deployment trade-offs will get value from the framing and the reported limitation. It deserves a serious referee because it tackles a live practical problem, reports its own boundary honestly, and could benefit from feedback on making the indicators reproducible. I would send it to peer review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript presents SLIP (Staged Layers of Intervention Protocol), a four-stage graduated methodology for interventions in AI emotional companions based on affect intensity (a) and narrative dynamism (m), along with ETHICS, an Emergent Taxonomy for Human-AI Interaction Context Signals. The evaluation combines a production deployment of 68 entries from 10 users over 10 weeks with a synthetic persona battery of 91 cases across 5 behavioral-risk profiles, reporting 0% false positives for the flow persona, expected escalation patterns in crisis personas, a 0/8 zero-intervention result for sustained high-energy states, and improved detection (to 6/8) in a three-model stress test with larger models while maintaining 0/10 flow false positives.

Significance. If the central findings hold, the work provides a valuable preliminary framework for addressing the safety-rapport paradox in affective AI systems through graduated rather than all-or-nothing interventions. The use of both real deployment data and synthetic testing, along with the stress test on model capability, strengthens the case for this approach as a design direction in human-AI interaction.

major comments (2)

[Evaluation] The evaluation reports clear metrics including 0% false positives and the 0/8 high-energy boundary case, but the concrete rubric, threshold rules, or coding procedure for determining affect intensity (a) and narrative dynamism (m) from user entries are not provided. This absence is load-bearing because the protocol's ability to distinguish safe high-energy states from risks relies on these indicators, and without them the reported patterns cannot be independently verified.
[SLIP Protocol] The derivation of the four-stage SLIP protocol from the qualitative indicators is described at a high level, but lacks explicit mapping or decision rules showing how specific values or patterns in (a) and (m) trigger none, soft, or hard interventions.

minor comments (2)

[Abstract] The abstract could more clearly distinguish the initial deployment results from the subsequent stress test outcomes.
Consider adding a table summarizing the intervention triggers and outcomes across personas for improved clarity.

Circularity Check

0 steps flagged

No circularity: SLIP derivation and evaluation remain independent of inputs

The paper defines SLIP as deriving intervention stages from qualitative indicators a (affect intensity) and m (narrative dynamism) plus the ETHICS taxonomy, but supplies no equations, fitted parameters, or self-citations that reduce the claimed outputs to those inputs by construction. The reported evaluation uses separate production deployment logs (N=68) and synthetic persona tests (N=91) rather than refitting the same data, and the 0/8 high-energy boundary is presented as an observed limitation rather than a derived or fitted result. No load-bearing self-citation chains, uniqueness theorems, or renamed empirical patterns appear in the provided text.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The approach rests on the premise that qualitative affect and narrative signals can be reliably extracted and mapped to intervention stages without additional fitted parameters beyond the stage definitions themselves.

free parameters (1)

thresholds for affect intensity and narrative dynamism
Used to decide between none, soft, and hard interventions; exact values not stated in abstract.

axioms (1)

Domain assumption: structured qualitative indicators of affect intensity and narrative dynamism are sufficient to derive appropriate intervention levels.
Invoked when defining the four-stage protocol from the two signals.

invented entities (2)

SLIP four-stage protocol (no independent evidence)
purpose: Graduated intervention to balance safety and rapport
New staged methodology introduced in the paper.
ETHICS signals taxonomy (no independent evidence)
purpose: Context signals without diagnostic labels
New taxonomy presented alongside SLIP.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 2 internal anchors

[1] Bai, Y., Kadavath, S., Kundu, S., et al.: Constitutional AI: Harmlessness from AI feedback (2022). https://doi.org/10.48550/arXiv.2212.08073
[2] Bower, P., Gilbody, S.: Stepped care in psychological therapies: Access, effectiveness and efficiency. British Journal of Psychiatry 186(1), 11–17 (2005). https://doi.org/10.1192/bjp.186.1.11
[3] Carver, C.S., Harmon-Jones, E.: Anger is an approach-related affect: Evidence and implications. Psychological Bulletin 135(2), 183–204 (2009). https://doi.org/10.1037/a0013965
[4] Coppersmith, G., Leary, R., Crutchley, P., Fine, A.: Natural language processing of social media as screening for suicide risk. Biomedical Informatics Insights 10, 1178222618792860 (2018). https://doi.org/10.1177/1178222618792860
[5] Csikszentmihalyi, M.: Flow: The Psychology of Optimal Experience. Harper & Row (1990)
[6] Csikszentmihalyi, M.: Flow and the Foundations of Positive Psychology. Springer (2014). https://doi.org/10.1007/978-94-017-9088-8
[7] Elliot, A.J.: The hierarchical model of approach-avoidance motivation. Motivation and Emotion 30(2), 111–116 (2006). https://doi.org/10.1007/s11031-006-9028-7
[8] Fitzpatrick, K.K., Darcy, A., Vierhile, M.: Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. JMIR Mental Health 4(2), e19 (2017). https://doi.org/10.2196/mental.7785
[9] Friedman, B., Kahn, Jr., P.H., Borning, A.: Value sensitive design and information systems. In: Human-Computer Interaction and Management Information Systems: Foundations, pp. 348–372. M.E. Sharpe (2006)
[10] Hancock, J.T., Naaman, M., Levy, K.: AI-mediated communication: Definition, research agenda, and ethical considerations. Journal of Computer-Mediated Communication 25(1), 89–100 (2020). https://doi.org/10.1093/jcmc/zmz022
[11] Iftikhar, Z., Xiao, A., Ransom, S., Huang, J., Suresh, H.: How LLM counselors violate ethical standards in mental health practice: A practitioner-informed framework. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES). vol. 8, pp. 1311–1323 (2025). https://doi.org/10.1609/aies.v8i2.36632
[12] Inan, H., Upasani, K., Chi, J., et al.: Llama Guard: LLM-based input-output safeguard for human-AI conversations (2023). https://doi.org/10.48550/arXiv.2312.06674
[13] Inkster, B., Sarda, S., Subramanian, V.: An empathy-driven, conversational artificial intelligence agent (Wysa) for digital mental well-being: Real-world data evaluation mixed-methods study. JMIR mHealth and uHealth 6(11), e12106 (2018). https://doi.org/10.2196/12106
[14] Karyotaki, E., Efthimiou, O., Miguel, C., et al.: Internet-based cognitive behavioral therapy for depression: A systematic review and individual patient data network meta-analysis. JAMA Psychiatry 78(4), 361–371 (2021). https://doi.org/10.1001/jamapsychiatry.2020.4364
[15] Laestadius, L., Bishop, A., Gonzalez, M., Illenčík, D., Campos-Castillo, C.: Too human and not human enough: A grounded theory analysis of mental health harms from emotional dependence on the social chatbot Replika. New Media & Society 26(10), 5923–5941 (2024). https://doi.org/10.1177/14614448221142007
[16] Nelson, L.K.: Computational grounded theory: A methodological framework. Sociological Methods & Research 49(1), 3–42 (2020). https://doi.org/10.1177/0049124117729703
[17] Pentina, I., Hancock, T., Xie, T.: Exploring relationship development with social chatbots: A mixed-method study of Replika. Computers in Human Behavior 140, 107600 (2023). https://doi.org/10.1016/j.chb.2022.107600
[18] Russell, J.A.: A circumplex model of affect. Journal of Personality and Social Psychology 39(6), 1161–1178 (1980). https://doi.org/10.1037/h0077714
[19] Skjuve, M., Følstad, A., Fostervold, K.I., Brandtzaeg, P.B.: My chatbot companion – a study of human-chatbot relationships. International Journal of Human-Computer Studies 149, 102601 (2021). https://doi.org/10.1016/j.ijhcs.2021.102601
[20] Thieme, A., Hanratty, M., Lyons, M., Palacios, J., Marques, R.F., Morrison, C., Doherty, G.: Designing human-centered AI for mental health: Developing clinically relevant applications for online CBT treatment. ACM Transactions on Computer-Human Interaction 30(2), 1–50 (2023). https://doi.org/10.1145/3564752
[21] Vaidyam, A.N., Wisniewski, H., Halamka, J.D., Kashavan, M.S., Torous, J.B.: Chatbots and conversational agents in mental health: A review of the psychiatric landscape. The Canadian Journal of Psychiatry 64(7), 456–464 (2019). https://doi.org/10.1177/0706743719828977
[22] Wei, A., Haghtalab, N., Steinhardt, J.: Jailbroken: How does LLM safety training fail? In: Advances in Neural Information Processing Systems (NeurIPS) (2023)
[23] Weidinger, L., Uesato, J., Rauh, M., et al.: Taxonomy of risks posed by language models. In: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT). pp. 214–229. ACM (2022). https://doi.org/10.1145/3531146.3533088
