
arxiv: 2602.01694 · v3 · pith:D7HW75Y6new · submitted 2026-02-02 · 💻 cs.HC

Beyond the Single Turn: Reframing Refusals as Dynamic Experiences Embedded in the Context of Mental Health Support Interactions with LLMs

Pith reviewed 2026-05-25 06:46 UTC · model grok-4.3

classification 💻 cs.HC
keywords LLM refusals · mental health support · user experience · mixed methods · AI safety · refusal mechanisms · human-AI interaction

The pith

LLM refusals in mental health support form multi-phase experiences instead of isolated single-turn events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that when large language models refuse to engage with sensitive mental health topics, users experience this not as a single response but through five connected phases. These phases are pre-refusal expectation formation, the triggering and encounter with the refusal, how the refusal message is framed, the provision of resource referrals, and the post-refusal outcomes. A sympathetic reader would care because these refusals have been linked to real-world harms, and current design focuses only on whether the model correctly follows policy in one turn rather than the full user journey. The study uses surveys of 53 people and interviews with 16 to build a framework that treats refusals as embedded in ongoing support interactions.

Core claim

Through surveys (N=53) and in-depth interviews (N=16) with individuals using LLMs for mental health support and mental health professionals, we reveal that refusals are not isolated, single-turn system behaviors but rather constitute dynamic, multi-phase experiences: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. We contribute a multi-phase framework for evaluating refusals beyond binary policy compliance accuracy and design recommendations for future refusal mechanisms.

What carries the argument

The multi-phase framework that identifies five stages in the refusal experience to move evaluation from single-turn compliance to holistic user trajectories.

If this is right

  • Refusal design must account for pre-refusal expectations users bring to the interaction.
  • Message framing and resource referrals need to be coordinated to avoid negative post-refusal outcomes.
  • Evaluation metrics for LLM safeguards should incorporate the full sequence of phases rather than binary accuracy.
  • Future mechanisms can be improved by addressing how refusals affect users' continued support-seeking behavior.
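To make the contrast between binary accuracy and phase-level evaluation concrete, here is a minimal sketch of a per-refusal record. Only the five phase names come from the paper. The `RefusalAssessment` class, its fields, and the 0 to 1 rating scale are hypothetical choices made for illustration and are not the authors' instrument.

```python
from dataclasses import dataclass, field

# Phase names as listed in the paper's framework. Everything else in this
# block (class, fields, rating scale) is a hypothetical illustration.
PHASES = (
    "pre-refusal expectation formation",
    "refusal triggering and encounter",
    "refusal message framing",
    "resource referral provision",
    "post-refusal outcomes",
)


@dataclass
class RefusalAssessment:
    policy_compliant: bool  # the single-turn, binary judgment
    phase_ratings: dict = field(default_factory=dict)  # phase -> rating in [0, 1] (assumed scale)

    def missing_phases(self):
        """Phases that have not been rated yet."""
        return [p for p in PHASES if p not in self.phase_ratings]

    def weakest_phase(self):
        """Lowest-rated phase; requires every phase to be rated."""
        if self.missing_phases():
            raise ValueError("rate all five phases before comparing them")
        return min(PHASES, key=lambda p: self.phase_ratings[p])


# Made-up ratings for one refusal that passes the binary policy check.
assessment = RefusalAssessment(
    policy_compliant=True,
    phase_ratings=dict(zip(PHASES, (0.8, 0.7, 0.2, 0.6, 0.4))),
)
print(assessment.policy_compliant, "|", assessment.weakest_phase())
```

In this invented example the refusal counts as correct under a binary metric, yet the record still points at message framing as the weakest part of the experience, which is the kind of information a single pass/fail score would hide.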

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the five phases are consistent, then refusal systems could be prototyped and tested against each phase separately.
  • The framework could be applied to compare different LLM platforms' refusal behaviors in mental health contexts.
  • Extending the study to observe actual interactions rather than self-reports might validate or refine the phases.
  • Similar phase models might apply to refusals in other high-stakes domains like crisis support.

Load-bearing premise

The self-reported experiences from the 53 survey participants and 16 interviewees represent the main patterns in how people use LLMs for mental health support.

What would settle it

A study that collects refusal experiences from a larger and more varied group of users and checks whether the same five phases reliably describe their accounts.

Figures

Figures reproduced from arXiv: 2602.01694 by Alice Qian, Blake Bullwinkel, Esther Howe, Hoda Heidari, Hong Shen, Jina Suh, Ningjing Tang, Paola Pedrelli, Qiaosi Wang.

Figure 1: Our proposed multi-phase framework of understanding LLM refusals in mental health support interactions as dynamic experiences. This framing is the result of 53 surveys and 16 interviews with end-users and mental health professionals. The framework reveals how refusal experiences unfold in phases including expectation formation, intent recognition, refusal framing, resource provision, and post-refusal outco…
Original abstract

Content Warning: This paper contains participant quotes and discussions related to mental health challenges, emotional distress, and suicidal ideation.

Large language models (LLMs) are increasingly used for mental health support, yet the model safeguards -- particularly refusals to engage with sensitive content -- remain poorly understood from the perspectives of users and mental health professionals (MHPs) and have been reported to cause real-world harms. This paper presents findings from a sequential mixed-methods study examining how LLM refusals are experienced and interpreted in mental health support interactions. Through surveys (N=53) and in-depth interviews (N=16) with individuals using LLMs for mental health support and MHPs, we reveal that refusals are not isolated, single-turn system behaviors but rather constitute dynamic, multi-phase experiences: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. We contribute a multi-phase framework for evaluating refusals beyond binary policy compliance accuracy and design recommendations for future refusal mechanisms. These findings suggest that understanding LLM refusals requires moving beyond single-turn interactions toward recognizing them as holistic experiences embedded within users' support-seeking trajectories and the broader LLM design pipeline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that LLM refusals in mental health support interactions are not isolated single-turn system behaviors but instead constitute dynamic, multi-phase experiences. Drawing on a sequential mixed-methods study with surveys (N=53) and interviews (N=16) involving users and mental health professionals, it identifies five phases—pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes—and contributes a framework for evaluating refusals beyond binary policy compliance, along with design recommendations.

Significance. If the framework is robustly supported, the work is significant for HCI and AI ethics as it shifts focus from technical refusal accuracy to holistic, user-embedded experiences in sensitive mental health contexts. This could inform more context-aware safeguard designs and reduce reported harms from abrupt refusals. The mixed-methods contribution provides empirical user perspectives on an under-examined aspect of LLM deployment.

major comments (2)
  1. [Abstract/Methods] Abstract and Methods: The claim that the data support the five-phase framework is presented without any description of the thematic analysis process, inter-rater reliability measures, coding procedures, or how saturation was determined. This absence directly affects the ability to evaluate whether the phases represent core dynamics or are shaped by the specific sample.
  2. [Methods] Methods/Participant section: No details are given on sampling strategy, platform diversity (e.g., GPT-4 vs. Claude vs. open-source models), or stratification by distress severity. The central claim that the phases capture general refusal experiences therefore rests on an unexamined assumption that the N=53 + N=16 self-selected participants are representative across user groups and LLM interfaces.
minor comments (1)
  1. [Abstract] Abstract: The content warning is appropriate; consider adding a brief note on the potential emotional impact for readers engaging with the participant quotes.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving the transparency and rigor of our methods description. We agree that expanding these sections will strengthen the manuscript and will incorporate the suggested revisions.

Point-by-point responses
  1. Referee: [Abstract/Methods] Abstract and Methods: The claim that the data support the five-phase framework is presented without any description of the thematic analysis process, inter-rater reliability measures, coding procedures, or how saturation was determined. This absence directly affects the ability to evaluate whether the phases represent core dynamics or are shaped by the specific sample.

    Authors: We agree that the manuscript would benefit from a more explicit account of the qualitative analysis. In the revised version, we will add a dedicated 'Data Analysis' subsection under Methods that details the thematic analysis process (inductive open coding of interview transcripts and open-ended survey responses, followed by iterative grouping into phases), inter-rater reliability (two coders independently coded 20% of the data with Cohen's kappa reported), coding procedures (codebook development and refinement), and saturation determination (no new themes after the 12th interview). This will clarify how the five-phase framework emerged from the data. revision: yes

  2. Referee: [Methods] Methods/Participant section: No details are given on sampling strategy, platform diversity (e.g., GPT-4 vs. Claude vs. open-source models), or stratification by distress severity. The central claim that the phases capture general refusal experiences therefore rests on an unexamined assumption that the N=53 + N=16 self-selected participants are representative across user groups and LLM interfaces.

    Authors: We acknowledge this gap in reporting. We will expand the Participants subsection to describe the convenience sampling strategy (recruitment via Reddit communities, mental health forums, and professional networks for MHPs), self-reported LLM platform usage (noting diversity across GPT-4, Claude, and other models), and data collected on distress levels (without formal stratification due to the exploratory nature and ethical constraints on screening). We will also revise the framing throughout to present the framework as derived from this convenience sample and explicitly discuss limitations on generalizability, rather than implying broad representativeness. revision: yes
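The first response names Cohen's kappa as the planned inter-rater reliability statistic. As a minimal sketch of how that statistic is computed for two coders, the snippet below uses invented excerpt codes; the label names and data are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same code.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal rates, summed over codes.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for six excerpts (illustrative only, not study data).
coder_1 = ["framing", "referral", "framing", "outcome", "referral", "framing"]
coder_2 = ["framing", "referral", "outcome", "outcome", "referral", "referral"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is usually lower than the simple percentage of matching codes; here the coders match on four of six excerpts, but kappa comes out at 0.52.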

Circularity Check

0 steps flagged

No circularity; framework constructed directly from participant data

full rationale

The paper derives its five-phase framework exclusively through thematic analysis of new survey (N=53) and interview (N=16) data. No equations, fitted parameters, predictions, or self-citations appear in the derivation. The phases are presented as emergent from the collected responses rather than defined in terms of themselves or reduced to prior author work. This matches the default case of a self-contained empirical study with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative empirical study; the framework is induced from interview and survey responses. No free parameters, mathematical axioms, or invented entities are introduced.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Engagement-Optimized Care: When LLMs become Mental Health Infrastructure

    cs.CY 2026-05 unverdicted novelty 7.0

    A longitudinal qualitative study of 18 US users finds that LLMs deliver socioemotional support but also foster dependency, one-sided validation, and privacy risks because their designs prioritize engagement over well-...

Beyond the Single Turn · FAccT ’26, June 25–28, 2026, Montreal, QC, Canada