pith. machine review for the scientific record.

arxiv: 2602.01694 · v2 · submitted 2026-02-02 · 💻 cs.HC

Recognition: no theorem link

Beyond the Single Turn: Reframing Refusals as Dynamic Experiences Embedded in the Context of Mental Health Support Interactions with LLMs

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 08:38 UTC · model grok-4.3

classification 💻 cs.HC
keywords LLM refusals · mental health support · multi-phase framework · user experiences · AI safeguards · mixed-methods study · design recommendations · dynamic interactions

The pith

LLM refusals during mental health support are multi-phase experiences rather than single-turn blocks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that refusals by large language models in mental health contexts form ongoing sequences with distinct stages instead of isolated denials. Surveys of 53 users and interviews with 16 participants including mental health professionals map these stages from pre-refusal expectations through triggering, message delivery, resource suggestions, and post-refusal effects. A sympathetic reader would care because current safeguards have caused documented harms, and the multi-phase view opens paths to designs that reduce distress while still enforcing safety boundaries. The work supplies a framework for assessing refusals by how they fit inside users' full support trajectories rather than by policy compliance alone.

Core claim

Refusals are not isolated, single-turn system behaviors but rather constitute dynamic, multi-phase experiences: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. The study contributes a multi-phase framework for evaluating refusals beyond binary policy compliance accuracy together with design recommendations for future refusal mechanisms.

What carries the argument

The multi-phase framework that treats refusals as holistic experiences embedded in users' support-seeking trajectories and the broader LLM design pipeline.

If this is right

  • Refusal mechanisms should be designed and evaluated across all five phases rather than measured only by immediate compliance.
  • Message framing and resource referral steps become explicit targets for reducing user distress and improving support continuity.
  • Post-refusal outcomes such as continued help-seeking or disengagement must be tracked as part of system performance.
  • Mental health professionals' input can shape how each phase is implemented to align with clinical needs.
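One way to make phase-by-phase evaluation concrete is as a per-phase scoring record rather than a single pass/fail flag. This is an editorial sketch, not anything the paper implements: the phase names follow the abstract, while the 0-1 scores and the `RefusalEpisode` helper are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

# The five phases named in the paper's framework.
class RefusalPhase(Enum):
    EXPECTATION_FORMATION = "pre-refusal expectation formation"
    TRIGGERING_ENCOUNTER = "refusal triggering and encounter"
    MESSAGE_FRAMING = "refusal message framing"
    RESOURCE_REFERRAL = "resource referral provision"
    POST_REFUSAL_OUTCOME = "post-refusal outcomes"

@dataclass
class RefusalEpisode:
    """One refusal experience, scored per phase instead of as a binary compliance check."""
    # Hypothetical 0-1 quality score per phase; the paper defines no such metric.
    phase_scores: dict = field(default_factory=dict)

    def record(self, phase: RefusalPhase, score: float) -> None:
        self.phase_scores[phase] = score

    def weakest_phase(self) -> RefusalPhase:
        # The phase most in need of design attention for this episode.
        return min(self.phase_scores, key=self.phase_scores.get)

episode = RefusalEpisode()
episode.record(RefusalPhase.EXPECTATION_FORMATION, 0.8)
episode.record(RefusalPhase.MESSAGE_FRAMING, 0.3)
episode.record(RefusalPhase.RESOURCE_REFERRAL, 0.6)
print(episode.weakest_phase().value)  # refusal message framing
```

A binary compliance metric would collapse this record to a single bit; the per-phase view is what lets message framing and referral quality become separate design targets.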

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same multi-phase lens could be applied to refusals in other high-stakes domains such as legal or financial advice.
  • Developers could prototype and test phased refusal interfaces in controlled trials that measure user retention and follow-through.
  • Regulatory standards for AI safety might shift from binary policy checks toward requirements for trajectory-aware refusal handling.
  • Training data and fine-tuning objectives could be adjusted to optimize the full sequence rather than only the refusal trigger.

Load-bearing premise

The experiences described by the self-selected sample of 53 survey participants and 16 interviewees represent those of the wider population of people using LLMs for mental health support.

What would settle it

A larger random-sample study of LLM mental health users that finds most refusals are experienced as isolated single-turn events with negligible pre- and post-phases.
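For scale, standard survey arithmetic (an editorial illustration; the paper makes no such calculation) shows why the N=53 sample cannot settle population-level questions: in the worst case it pins a proportion down only to roughly plus or minus 14 percentage points at 95% confidence.

```python
import math

def moe_95(p_hat: float, n: int) -> float:
    """Half-width of a 95% normal-approximation confidence interval for a proportion."""
    return 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

# Worst case (p = 0.5) for a sample the size of the paper's survey.
print(round(moe_95(0.5, 53), 3))  # 0.135, i.e. +/- 13.5 percentage points

# Sample size needed to shrink the margin to +/- 3 points at the same worst case.
n_needed = math.ceil((1.96 / 0.03) ** 2 * 0.25)
print(n_needed)  # 1068
```

The exact target margin is arbitrary; the point is the order-of-magnitude gap between an exploratory sample and one that could adjudicate the framework's generality.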

Figures

Figures reproduced from arXiv: 2602.01694 by Alice Qian, Blake Bullwinkel, Esther Howe, Hoda Heidari, Hong Shen, Jina Suh, Ningjing Tang, Paola Pedrelli, Qiaosi Wang.

Figure 1
Figure 1. Our proposed multi-phase framework of understanding LLM refusals in mental health support interactions as dynamic experiences. This framing is the result of 53 surveys and 16 interviews with end-users and mental health professionals. The framework reveals how refusal experiences unfold in phases including expectation formation, intent recognition, refusal framing, resource provision, and post-refusal outco…
Original abstract

Content Warning: This paper contains participant quotes and discussions related to mental health challenges, emotional distress, and suicidal ideation. Large language models (LLMs) are increasingly used for mental health support, yet the model safeguards -- particularly refusals to engage with sensitive content -- remain poorly understood from the perspectives of users and mental health professionals (MHPs) and have been reported to cause real-world harms. This paper presents findings from a sequential mixed-methods study examining how LLM refusals are experienced and interpreted in mental health support interactions. Through surveys (N=53) and in-depth interviews (N=16) with individuals using LLMs for mental health support and MHPs, we reveal that refusals are not isolated, single-turn system behaviors but rather constitute dynamic, multi-phase experiences: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. We contribute a multi-phase framework for evaluating refusals beyond binary policy compliance accuracy and design recommendations for future refusal mechanisms. These findings suggest that understanding LLM refusals requires moving beyond single-turn interactions toward recognizing them as holistic experiences embedded within users' support-seeking trajectories and the broader LLM design pipeline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports a sequential mixed-methods study with surveys (N=53) and interviews (N=16) involving LLM users seeking mental health support and mental health professionals. It claims that LLM refusals are not isolated single-turn system behaviors but dynamic multi-phase experiences consisting of pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. The authors contribute an inductive multi-phase framework for evaluating refusals beyond binary compliance and offer design recommendations for future refusal mechanisms.

Significance. If the framework holds beyond the sample, the work would usefully shift evaluation of LLM safeguards from single-turn accuracy metrics toward holistic user trajectories in sensitive mental health contexts, potentially reducing reported harms through better-designed referrals and framing. The mixed-methods design produces coherent qualitative themes from participant data, but the small self-selected sample constrains claims of broad applicability and leaves the framework as an exploratory rather than validated contribution.

major comments (2)
  1. [Methods] Methods section: Recruitment details for the self-selected sample of 53 survey participants and 16 interviewees are insufficiently specified (e.g., exact channels, screening criteria, or demographics), preventing assessment of selection bias. This is load-bearing for the central claim because the five-phase framework is derived inductively from these data and presented as characterizing refusals in LLM mental health support interactions more generally.
  2. [Results] Results and Framework sections: The multi-phase model is advanced as a general reframing without quantitative validation, inter-rater reliability metrics for the inductive coding, or explicit discussion of how phases were tested against alternative trajectories (e.g., silent disengagement). This weakens the move from descriptive themes to a prescriptive evaluation framework.
minor comments (2)
  1. [Abstract] The abstract and introduction could more explicitly state the study's exploratory scope and limitations on generalizability to avoid overstatement of the framework's reach.
  2. [Results] A figure or table summarizing the five phases would improve clarity when the phases are first introduced in the results.
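The inter-rater reliability metric the referee asks for is conventionally Cohen's kappa, which discounts observed agreement by the agreement two coders would reach by chance. A minimal sketch with hypothetical codes; the paper uses reflexive thematic analysis and reports no such statistic.

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two coders' labels: (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items given identical codes.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical phase codes from two coders over ten interview excerpts.
a = ["framing", "trigger", "framing", "referral", "framing",
     "trigger", "referral", "framing", "trigger", "framing"]
b = ["framing", "trigger", "referral", "referral", "framing",
     "trigger", "referral", "framing", "framing", "framing"]
print(round(cohens_kappa(a, b), 3))  # 0.683
```

Values above roughly 0.6 are often read as substantial agreement, though the authors' rebuttal is correct that reflexive thematic analysis typically substitutes collaborative code refinement for this statistic.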

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive feedback, which underscores the importance of methodological transparency and careful framing of our exploratory findings. We address each major comment below, indicating where revisions will be made to the manuscript.

Point-by-point responses
  1. Referee: [Methods] Methods section: Recruitment details for the self-selected sample of 53 survey participants and 16 interviewees are insufficiently specified (e.g., exact channels, screening criteria, or demographics), preventing assessment of selection bias. This is load-bearing for the central claim because the five-phase framework is derived inductively from these data and presented as characterizing refusals in LLM mental health support interactions more generally.

    Authors: We agree that greater specificity on recruitment is warranted to support assessment of selection bias. In the revised manuscript, we will expand the Methods section to detail recruitment channels (targeted posts on mental health forums, social media platforms, and professional networks), screening criteria (self-reported LLM use for mental health support and willingness to discuss refusal experiences), and aggregated demographic characteristics. We will also strengthen the Limitations section to explicitly state that the self-selected sample renders the five-phase framework exploratory and derived from this particular participant group, rather than a general characterization of all LLM mental health interactions. revision: yes

  2. Referee: [Results] Results and Framework sections: The multi-phase model is advanced as a general reframing without quantitative validation, inter-rater reliability metrics for the inductive coding, or explicit discussion of how phases were tested against alternative trajectories (e.g., silent disengagement). This weakens the move from descriptive themes to a prescriptive evaluation framework.

    Authors: The study is qualitative and inductive by design, so quantitative validation falls outside its scope; we will revise the Results and Framework sections to consistently describe the model as an exploratory framework for evaluation rather than a general or prescriptive one. We will add a description of the thematic analysis process, including iterative team consensus on codes (noting that formal inter-rater reliability metrics are not standard in reflexive thematic analysis but the collaborative refinement steps can be elaborated). We will also incorporate discussion of alternative trajectories such as silent disengagement, drawing directly from interview accounts where participants described disengaging without further interaction, and explain how these cases helped delineate phase boundaries. These textual revisions will better position the framework as descriptive while addressing the transition to evaluative use. revision: partial

standing simulated objections not resolved
  • We cannot add quantitative validation or new empirical tests of the framework, as these would require a different study design and additional data collection beyond the current mixed-methods scope.

Circularity Check

0 steps flagged

Empirical multi-phase framework derived directly from primary mixed-methods data

full rationale

The paper derives its central claim (refusals as dynamic multi-phase experiences: pre-refusal expectation formation, triggering/encounter, message framing, resource referral, post-refusal outcomes) through sequential mixed-methods analysis of new survey (N=53) and interview (N=16) data, followed by inductive coding. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations reduce any step to prior inputs by construction. The framework is explicitly presented as an empirical contribution grounded in the collected participant experiences rather than imported from external theorems or renamed known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces a new framework derived from qualitative data. It relies on standard assumptions of qualitative research without free parameters or invented physical entities.

axioms (1)
  • domain assumption: Self-reported experiences from participants accurately capture the phases of refusal encounters in mental health contexts.
    Qualitative interview and survey data validity depends on this assumption for identifying the five phases.
invented entities (1)
  • Multi-phase refusal experience framework (no independent evidence)
    purpose: To evaluate LLM refusals beyond binary policy compliance accuracy
    New conceptual structure derived from study findings to organize refusal dynamics.

pith-pipeline@v0.9.0 · 5548 in / 1313 out tokens · 54708 ms · 2026-05-16T08:38:38.677721+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 2 internal anchors

  1. [1]

    Leah Hope Ajmani, Arka Ghosh, Benjamin Kaveladze, Eugenia Kim, Keertana Namuduri, Theresa Nguyen, Ebele Okoli, Jessica Schleider, Denae Ford, and Jina Suh. 2025. Seeking late night life lines: Experiences of conversational AI use in mental health crisis.arXiv [cs.HC] (Dec. 2025)

  2. [2]

    Andrew Selsky, Associated Press and Leah Willingham, Associated Press. 2022. How some encounters between police and people with mental illness can turn tragic. https://www.pbs.org/newshour/health/how-some-encounters-between-police-and-people-with-mental-illness-can-turn-tragic. Accessed: 2026-1-8

  3. [3]

    Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

  4. [4]

    Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A Smith, Yejin Choi, and Hannaneh Hajishirzi. 2024. The art of saying no: Contextual noncompliance in language models.arXiv [cs.CL](July 2024)

  5. [5]

    Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology.Qual. Res. Psychol.3, 2 (Jan. 2006), 77–101

  6. [6]

    Virginia Braun and Victoria Clarke. 2012. Thematic analysis. American Psychological Association

  7. [7]

    Virginia Braun and Victoria Clarke. 2019. Reflecting on reflexive thematic analysis.Qualitative research in sport, exercise and health11, 4 (2019), 589–597

  8. [8]

    California State Legislature. 2025. Senate Bill 243: Companion Chatbots. Approved by Governor October 13, 2025. https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB243 Chapter 677, Statutes of 2025

  9. [9]

    Mohit Chandra, Suchismita Naik, Denae Ford, Ebele Okoli, Munmun De Choudhury, Mahsa Ershadi, Gonzalo Ramos, Javier Hernandez, Ananya Bhattacharjee, Shahed Warreth, et al. 2025. From Lived Experience to Insight: Unpacking the Psychological Risks of Using AI Conversational Agents. InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Tra...

  10. [10]

    Mohit Chandra, Suchismita Naik, Denae Ford, Ebele Okoli, Munmun De Choudhury, Mahsa Ershadi, Gonzalo Ramos, Javier Hernandez, Ananya Bhattacharjee, Shahed Warreth, and Jina Suh. 2024. From lived experience to insight: Unpacking the psychological risks of using AI conversational agents.arXiv [cs.HC](Dec. 2024)

  11. [11]

    Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J Pappas, Florian Tramer, Hamed Hassani, and Eric Wong. 2024. JailbreakBench: An open robustness benchmark for jailbreaking large language models.arXiv [cs.CR](March 2024)

  12. [12]

    Khaoula Chehbouni, Mohammed Haddou, Jackie Chi Kit Cheung, and Golnoosh Farnadi. 2025. Neither valid nor reliable? Investigating the use of LLMs as judges.arXiv [cs.CL](Aug. 2025)

  13. [13]

    Adam Dahlgren Lindström, Leila Methnani, Lea Krause, Petter Ericson, Íñigo Martínez de Rituerto de Troya, Dimitri Coelho Mollo, and Roel Dobbe. 2025. Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback.Ethics Inf. Technol.27, 2 (June 2025), 28

  14. [14]

    Munmun De Choudhury and Sushovan De. 2014. Mental health discourse on reddit: Self-disclosure, social support, and anonymity. Proceedings of the International AAAI Conference on Web and Social Media8, 1 (May 2014), 71–80

  15. [15]

    Mary J De Silva, Erica Breuer, Lucy Lee, Laura Asher, Neerja Chowdhary, Crick Lund, and Vikram Patel. 2014. Theory of Change: a theory-driven approach to enhance the Medical Research Council’s framework for complex interventions.Trials15, 1 (July 2014), 267

  16. [16]

    John Draper and Richard T McKeon. 2024. The journey toward 988: A historical perspective on crisis hotlines in the United States. Psychiatr. Clin. North Am.47, 3 (Sept. 2024), 473–490

  17. [17]

    John Draper, Gillian Murphy, Eduardo Vega, David W Covington, and Richard McKeon. 2015. Helping callers to the National Suicide Prevention Lifeline who are at imminent risk of suicide: the importance of active engagement, active rescue, and collaboration between crisis and emergency services.Suicide Life Threat. Behav.45, 3 (June 2015), 261–270

  18. [18]

    Foundational Contributors, Ahmed El-Kishky, Daniel Selsam, Francis Song, Giambattista Parascandolo, Hongyu Ren, Hunter Lightman, Hyung Won, Ilge Akkaya, I Sutskever, Jason Wei, Jonathan Gordon, K Cobbe, Kevin Yu, Lukasz Kondraciuk, Max Schwarzer, Mostafa Rohaninejad, Noam Brown, Shengjia Zhao, Trapit Bansal, Vineet Kosaraju, Wenda Zhou Leadership, J Pacho...

  19. [19]

    I cannot write this because it violates our content policy

    Lan Gao, Oscar Chen, Rachel Lee, Nick Feamster, Chenhao Tan, and Marshini Chetty. 2025. “I cannot write this because it violates our content policy”: Understanding content moderation policies and user experiences in generative AI products.arXiv [cs.HC](June 2025)

  20. [20]

    Clifford Geertz. 1973. The impact of the concept of culture on the concept of man

  21. [21]

    Su Golder, Shahd Ahmed, Gill Norman, and Andrew Booth. 2017. Attitudes toward the ethics of research using social media: A systematic review.J. Med. Internet Res.19, 6 (June 2017), e195

  22. [23]

    Melody Y Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias, Andrea Vallone, Hongyu Ren, Jason Wei, Hyung Won Chung, Sam Toyer, Johannes Heidecke, Alex Beutel, and Amelia Glaese. 2024. Deliberative Alignment: Reasoning enables safer language models.arXiv [cs.CL](Dec. 2024)

  23. [24]

    Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, and Nouha Dziri. 2024. WildGuard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of LLMs.arXiv [cs.CL](June 2024)

  24. [25]

    for an app supposed to make its users feel better, it sure is a joke

    Md Romael Haque and Sabirat Rubya. 2022. “for an app supposed to make its users feel better, it sure is a joke” - an analysis of user reviews of mobile mental health applications.Proc. ACM Hum. Comput. Interact.6, CSCW2 (Nov. 2022), 1–29

  25. [26]

    Christina Harrington, Sheena Erete, and Anne Marie Piper. 2019. Deconstructing community-based collaborative design: Towards more equitable participatory design engagements.Proceedings of the ACM on Human-Computer Interaction3, CSCW (2019), 1–25

  26. [27]

    Kashmir Hill. 2025. A Teen Was Suicidal. ChatGPT Was the Friend He Confided In.The New York Times(Aug. 2025)

  27. [28]

    Lujain Ibrahim, Saffron Huang, Lama Ahmad, Umang Bhatt, and Markus Anderljung. 2025. Towards interactive evaluations for interaction harms in human-AI systems. InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Vol. 8. 1302–1310

  28. [29]

    Zainab Iftikhar, Amy Xiao, Sean Ransom, Jeff Huang, and Harini Suresh. 2025. How LLM counselors violate ethical standards in mental health practice: A practitioner-informed framework.Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society8, 2 (Oct. 2025), 1311–1323

  29. [30]

    Illinois General Assembly. 2025. House Bill 1806: Wellness and Oversight for Psychological Resources Act. Signed into law August 1, 2025. https://ilga.gov/Legislation/BillStatus?DocNum=1806&GAID=18&DocTypeID=HB&LegId=159219&SessionID=114 Public Act 104-0054

  31. [32]

    Nataliya V Ivankova, John W Creswell, and Sheldon L Stick. 2006. Using mixed-methods sequential explanatory design: From theory to practice.Field Methods18, 1 (Feb. 2006), 3–20

  32. [33]

    Nicholas Jenkins, Michael Bloor, Jan Fischer, Lee Berney, and Joanne Neale. 2010. Putting it in context: the use of vignettes in qualitative interviewing.Qual. Res.10, 2 (April 2010), 175–198

  33. [34]

    Kelly Joyce, Laurel Smith-Doerr, Sharla Alegria, Susan Bell, Taylor Cruz, Steve G Hoffman, Safiya Umoja Noble, and Benjamin Shestakofsky. 2021. Toward a sociology of artificial intelligence: A call for research on inequalities and structural change.Socius7 (Jan. 2021), 237802312199958

  34. [35]

    I’ve talked to ChatGPT about my issues last night

    Kyuha Jung, Gyuho Lee, Yuanhui Huang, and Yunan Chen. 2025. “I’ve talked to ChatGPT about my issues last night. ”: Examining Mental Health Conversations with Large Language Models through Reddit Analysis.arXiv [cs.HC](April 2025)

  35. [36]

    Reishiro Kawakami and Sukrit Venkatagiri. 2024. The impact of generative AI on artists. InCreativity and Cognition. ACM, New York, NY, USA, 79–82

  36. [37]

    Hannah Rose Kirk, Iason Gabriel, Chris Summerfield, Bertie Vidgen, and Scott A Hale. 2025. Why human–AI relationships need socioaffective alignment.Humanit. Soc. Sci. Commun.12, 1 (May 2025), 728

  37. [38]

    Theodora Koulouri, Robert D Macredie, and David Olakitan. 2022. Chatbots to support young adults’ mental health: An exploratory study of acceptability.ACM Trans. Interact. Intell. Syst.12, 2 (June 2022), 1–39

  38. [39]

    Seth Lazar and Alondra Nelson. 2023. AI safety on whose terms?Science381, 6654 (July 2023), 138

  39. [40]

    This is human intelligence debugging artificial intelligence

    Zhuoyang Li, Zihao Zhu, Xinning Gui, and Yuhan Luo. 2025. “This is human intelligence debugging artificial intelligence”: Examining how people prompt GPT in seeking mental health support.Int. J. Hum. Comput. Stud.103555 (June 2025), 103555

  40. [41]

    Michael Madaio, Lisa Egede, Hariharan Subramonyam, Jennifer Wortman Vaughan, and Hanna Wallach. 2022. Assessing the fairness of AI systems: AI practitioners’ processes, challenges, and needs for support.Proc. ACM Hum. Comput. Interact.6, CSCW1 (March 2022), 1–26

  41. [42]

    Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, and Dan Hendrycks. 2024. HarmBench: A standardized evaluation framework for automated red teaming and robust refusal.arXiv [cs.LG](Feb. 2024). Beyond the Single Turn FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

  42. [43]

    Miles McCain, Ryn Linthicum, Chloe Lubinski, Alex Tamkin, Saffron Huang, Michael Stern, Kunal Handa, Esin Durmus, Tyler Neylon, Stuart Ritchie, Kamya Jagadish, Paruul Maheshwary, Sarah Heck, Alexandra Sanderford, and Deep Ganguli. 2025. How People Use Claude for Support, Advice, and Companionship. https://www.anthropic.com/news/how-people-use-claude-for-s...

  43. [44]

    I see me here

    Ashlee Milton, Leah Ajmani, Michael Ann DeVito, and Stevie Chancellor. 2023. “I see me here”: Mental health content, community, and algorithmic curation on TikTok. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Vol. 16. ACM, New York, NY, USA, 1–17

  44. [45]

    Jared Moore, Declan Grabb, William Agnew, Kevin Klyman, Stevie Chancellor, Desmond C Ong, and Nick Haber. 2025. Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers. InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. ACM, New York, NY, USA, 599–627

  45. [46]

    Ramaravind Kommiya Mothilal, Shion Guha, and Syed Ishtiaque Ahmed. 2024. Towards a non-ideal methodological framework for Responsible ML.arXiv [cs.HC](Jan. 2024)

  46. [47]

    I don’t think RAI applies to my model

    Nadia Nahar, Chenyang Yang, Yanxin Chen, Wesley Hanwen Deng, Ken Holstein, Motahhare Eslami, and Christian Kästner. 2026. “I don’t think RAI applies to my model” – engaging non-champions with sticky stories for responsible AI work. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1–23

  47. [48]

    Alondra Nelson. 2023. Thick Alignment

  48. [49]

    New York State Assembly. 2025. An Act to amend the general business law, in relation to artificial intelligence companion models. Assembly Bill A6767, 2025–2026 Regular Sessions. https://www.nysenate.gov/legislation/bills/2025/A6767 Introduced by M. of A. Vanel; referred to the Committee on Consumer Affairs and Protection

  49. [50]

    OpenAI. 2025. Strengthening ChatGPT’s responses in sensitive conversations. https://openai.com/index/strengthening-chatgpt- responses-in-sensitive-conversations/. Accessed: 2025-11-24

  50. [51]

    OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner...

  51. [52]

    Lawrence A Palinkas, Sarah M Horwitz, Carla A Green, Jennifer P Wisdom, Naihua Duan, and Kimberly Hoagwood. 2015. Purposeful sampling for qualitative data collection and analysis in mixed method implementation research.Adm. Policy Ment. Health42, 5 (Sept. 2015), 533–544

  52. [53]

    can I not be suicidal on a Sunday?

    Sachin R Pendse, Amit Sharma, Aditya Vashistha, Munmun De Choudhury, and Neha Kumar. 2021. “can I not be suicidal on a Sunday?”: Understanding technology-mediated pathways to mental health support.Proc. SIGCHI Conf. Hum. Factor. Comput. Syst.2021 (May 2021)

  53. [54]

    Mehrdad Rahsepar Meadi, Tomas Sillekens, Suzanne Metselaar, Anton van Balkom, Justin Bernstein, and Neeltje Batelaan. 2025. Exploring the ethical challenges of conversational AI in mental health care: Scoping review.JMIR Ment. Health12, 1 (Feb. 2025), e60432

  54. [55]

    Richard Ren, Steven Basart, Adam Khoja, Alice Gatti, Long Phan, Xuwang Yin, Mantas Mazeika, Alexander Pan, Gabriel Mukobi, Ryan H Kim, Stephen Fitz, and Dan Hendrycks. 2024. Safetywashing: Do AI safety benchmarks actually measure safety progress?arXiv [cs.LG] (July 2024)

  55. [56]

    Rhode Island General Assembly. 2026. An Act Relating to Commercial Law—General Regulatory Provisions—Artificial Intelligence Companion Models. Senate Bill S2195, January Session, A.D. 2026. https://webserver.rilegislature.gov/BillText/BillText26/SenateText26/S2195.pdf Introduced by Senators Urso, Gu, DiPalma, Paolino, Zurier, Murray, and Appollonio; refe...

  56. [57]

    Laughing so I don’t cry

    Anastasia Schaadhardt, Yue Fu, Cory Gennari Pratt, and Wanda Pratt. 2023. “Laughing so I don’t cry”: How TikTok users employ humor and compassion to connect around psychiatric hospitalization. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Vol. 14. ACM, New York, NY, USA, 1–13

  57. [58]

    Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and Abstraction in Sociotechnical Systems. InProceedings of the Conference on Fairness, Accountability, and Transparency. ACM, New York, NY, USA, 59–68

  58. [59]

    Itai Shapira, Gerdus Benade, and Ariel D Procaccia. 2026. How RLHF Amplifies Sycophancy.arXiv [cs.AI](Feb. 2026)

  59. [60]

    Renee Shelby, Shalaleh Rismani, Kathryn Henne, Ajung Moon, Negar Rostamzadeh, Paul Nicholas, N’mah Yilla-Akbari, Jess Gallegos, Andrew Smart, Emilio Garcia, and Gurleen Virk. 2023. Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction. InProceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, Vol. 24. ACM, New ...

  60. [61]

    Brett Sholtis. 2020. During A Mental Health Crisis, A Family’s Call To 911 Turns Tragic.NPR(Oct. 2020)

  61. [62]

    It happened to be the perfect thing

    Steven Siddals, John Torous, and Astrid Coxon. 2024. “It happened to be the perfect thing”: experiences of generative AI chatbots for mental health.Npj Ment Health Res3, 1 (Oct. 2024), 48

  62. [63]

    Petr Slovak and Sean A Munson. 2024. HCI contributions in mental health: A modular framework to guide psychosocial intervention design.Proc. SIGCHI Conf. Hum. Factor. Comput. Syst.2024 (May 2024)

  63. [64]

    Inhwa Song, Sachin R Pendse, Neha Kumar, and Munmun De Choudhury. 2024. The typing cure: Experiences with Large Language Model chatbots for mental health support.arXiv [cs.HC](Jan. 2024)

  64. [65]

    Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, and Sam Toyer. 2024. A StrongREJECT for Empty Jailbreaks. arXiv:2402.10260 [cs.LG] https://arxiv.org/abs/2402.10260

  65. [66]

    Ningjing Tang, Megan Li, Amy Winecoff, Michael Madaio, Hoda Heidari, and Hong Shen. 2026. Navigating uncertainties: How GenAI developers document their models on open-source platforms. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1–19

  66. [67]

    Tangila Islam Tanni, Mamtaj Akter, Joshua Anderson, Mary Jean Amon, and Pamela J Wisniewski. 2024. Examining the unique online risk experiences and mental health outcomes of LGBTQ+ versus heterosexual youth. InProceedings of the CHI Conference on Human Factors in Computing Systems, Vol. 31. ACM, New York, NY, USA, 1–21

  67. [68]

    Tamar Tavory. 2024. Regulating AI in mental health: Ethics of care perspective.JMIR Ment. Health11 (Sept. 2024), e58493

  68. [69]

    Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A Hale, and Paul Röttger. 2023. SimpleSafetyTests: A test suite for identifying critical safety risks in large language models. arXiv [cs.CL] (Nov. 2023)

  69. [70]

    Hanna Wallach, Meera Desai, A Feder Cooper, Angelina Wang, Chad Atalla, Solon Barocas, Su Lin Blodgett, Alexandra Chouldechova, Emily Corvi, P Alex Dow, Jean Garcia-Gathright, Alexandra Olteanu, Nicholas Pangakis, Stefanie Reed, Emily Sheng, Dan Vann, Jennifer Wortman Vaughan, Matthew Vogel, Hannah Washington, and Abigail Z Jacobs. 2025. Position: Evaluat...

  70. [71]

    Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, and Timothy Baldwin. 2024. Do-Not-Answer: Evaluating Safeguards in LLMs. In Findings of the Association for Computational Linguistics: EACL 2024. 896–911

  71. [72]

    Laura Weidinger, Inioluwa Deborah Raji, Hanna Wallach, Margaret Mitchell, Angelina Wang, Olawale Salaudeen, Rishi Bommasani, Deep Ganguli, Sanmi Koyejo, and William Isaac. 2025. Toward an evaluation science for generative AI systems.arXiv [cs.AI](March 2025)

  72. [73]

    Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, and William Isaac. 2023. Sociotechnical Safety Evaluation of Generative AI Systems.arXiv [cs.AI](Oct. 2023)

  73. [74]

    Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, and Lucy Lu Wang. 2025. Know your limits: A survey of abstention in large language models.Trans. Assoc. Comput. Linguist.13 (June 2025), 529–556. Beyond the Single Turn FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

  74. [75]

    As an AI language model, I cannot

    Joel Wester, Tim Schrills, Henning Pohl, and Niels van Berkel. 2024. “As an AI language model, I cannot”: Investigating LLM Denials of User Requests. InProceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24, Article 979). Association for Computing Machinery, New York, NY, USA, 1–14

  75. [76]

    Richmond Y Wong. 2021. Tactics of soft resistance in user experience professionals’ values work.Proc. ACM Hum. Comput. Interact.5, CSCW2 (Oct. 2021), 1–28

  76. [77]

    Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, and Prateek Mittal. 2024. SORRY-bench: Systematically evaluating large language model safety refusal.arXiv [cs.AI](June 2024)

  77. [78]

    Dong Whi Yoo, Jiayue Melissa Shi, Violeta J Rodriguez, and Koustuv Saha. 2025. AI chatbots for mental health: Values and harms from lived experiences of depression.arXiv [cs.HC](April 2025)

  78. [79]

    Meg Young, Lassana Magassa, and Batya Friedman. 2019. Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents.Ethics and Information Technology21, 2 (2019), 89–103

  79. [80]

    Yuan Yuan, Tina Sriskandarajah, Anna-Luisa Brakman, Alec Helyar, Alex Beutel, Andrea Vallone, and Saachi Jain. 2025. From hard refusals to safe-completions: Toward output-centric safety training.arXiv [cs.CY](Aug. 2025)

  80. [81]

    Xi Zheng, Zhuoyang Li, Xinning Gui, and Yuhan Luo. 2025. Customizing emotional support: How do individuals construct and interact with LLM-powered chatbots. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1–20

Showing first 80 references.