Beyond the Single Turn: Reframing Refusals as Dynamic Experiences Embedded in the Context of Mental Health Support Interactions with LLMs
Pith reviewed 2026-05-16 08:38 UTC · model grok-4.3
The pith
LLM refusals during mental health support are multi-phase experiences rather than single-turn blocks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Refusals are not isolated, single-turn system behaviors but rather constitute dynamic, multi-phase experiences: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. The study contributes a multi-phase framework for evaluating refusals beyond binary policy compliance accuracy together with design recommendations for future refusal mechanisms.
What carries the argument
The multi-phase framework that treats refusals as holistic experiences embedded in users' support-seeking trajectories and the broader LLM design pipeline.
If this is right
- Refusal mechanisms should be designed and evaluated across all five phases rather than measured only by immediate compliance.
- Message framing and resource referral steps become explicit targets for reducing user distress and improving support continuity.
- Post-refusal outcomes such as continued help-seeking or disengagement must be tracked as part of system performance.
- Mental health professionals' input can shape how each phase is implemented to align with clinical needs.
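The first implication above, evaluating across all five phases rather than by a single compliance bit, can be made concrete with a small sketch. All names and the scoring scheme below are hypothetical illustrations; the paper proposes the phases themselves but no particular evaluation schema:

```python
from dataclasses import dataclass, field
from enum import Enum

# Illustrative labels only: the paper names five phases but prescribes no schema.
class Phase(Enum):
    EXPECTATION_FORMATION = "pre-refusal expectation formation"
    TRIGGER_AND_ENCOUNTER = "refusal triggering and encounter"
    MESSAGE_FRAMING = "refusal message framing"
    RESOURCE_REFERRAL = "resource referral provision"
    POST_REFUSAL_OUTCOME = "post-refusal outcomes"

@dataclass
class RefusalTrajectory:
    """One refusal event, scored per phase instead of as a single compliance bit."""
    policy_compliant: bool                            # the traditional binary signal
    phase_scores: dict = field(default_factory=dict)  # Phase -> score in [0, 1]

    def record(self, phase: Phase, score: float) -> None:
        # Clamp to [0, 1] so downstream aggregation stays well-defined.
        self.phase_scores[phase] = max(0.0, min(1.0, score))

    def covered(self) -> bool:
        # A trajectory-aware evaluation requires every phase to be observed,
        # not just the trigger-time compliance decision.
        return set(self.phase_scores) == set(Phase)
```

Under this hypothetical schema, a refusal that is policy-compliant but offers a poorly framed message or no usable referral would still score low on those phases, which is exactly the distinction a binary compliance metric cannot express.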
Where Pith is reading between the lines
- The same multi-phase lens could be applied to refusals in other high-stakes domains such as legal or financial advice.
- Developers could prototype and test phased refusal interfaces in controlled trials that measure user retention and follow-through.
- Regulatory standards for AI safety might shift from binary policy checks toward requirements for trajectory-aware refusal handling.
- Training data and fine-tuning objectives could be adjusted to optimize the full sequence rather than only the refusal trigger.
Load-bearing premise
The experiences described by the self-selected sample of 53 survey participants and 16 interviewees represent those of the wider population of people using LLMs for mental health support.
What would settle it
A larger random-sample study of LLM mental health users that finds most refusals are experienced as isolated single-turn events with negligible pre- and post-phases.
Original abstract
Content Warning: This paper contains participant quotes and discussions related to mental health challenges, emotional distress, and suicidal ideation. Large language models (LLMs) are increasingly used for mental health support, yet the model safeguards -- particularly refusals to engage with sensitive content -- remain poorly understood from the perspectives of users and mental health professionals (MHPs) and have been reported to cause real-world harms. This paper presents findings from a sequential mixed-methods study examining how LLM refusals are experienced and interpreted in mental health support interactions. Through surveys (N=53) and in-depth interviews (N=16) with individuals using LLMs for mental health support and MHPs, we reveal that refusals are not isolated, single-turn system behaviors but rather constitute dynamic, multi-phase experiences: pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. We contribute a multi-phase framework for evaluating refusals beyond binary policy compliance accuracy and design recommendations for future refusal mechanisms. These findings suggest that understanding LLM refusals requires moving beyond single-turn interactions toward recognizing them as holistic experiences embedded within users' support-seeking trajectories and the broader LLM design pipeline.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a sequential mixed-methods study with surveys (N=53) and interviews (N=16) involving LLM users seeking mental health support and mental health professionals. It claims that LLM refusals are not isolated single-turn system behaviors but dynamic multi-phase experiences consisting of pre-refusal expectation formation, refusal triggering and encounter, refusal message framing, resource referral provision, and post-refusal outcomes. The authors contribute an inductive multi-phase framework for evaluating refusals beyond binary compliance and offer design recommendations for future refusal mechanisms.
Significance. If the framework holds beyond the sample, the work would usefully shift evaluation of LLM safeguards from single-turn accuracy metrics toward holistic user trajectories in sensitive mental health contexts, potentially reducing reported harms through better-designed referrals and framing. The mixed-methods design produces coherent qualitative themes from participant data, but the small self-selected sample constrains claims of broad applicability and leaves the framework as an exploratory rather than validated contribution.
major comments (2)
- [Methods] Methods section: Recruitment details for the self-selected sample of 53 survey participants and 16 interviewees are insufficiently specified (e.g., exact channels, screening criteria, or demographics), preventing assessment of selection bias. This is load-bearing for the central claim because the five-phase framework is derived inductively from these data and presented as characterizing refusals in LLM mental health support interactions more generally.
- [Results] Results and Framework sections: The multi-phase model is advanced as a general reframing without quantitative validation, inter-rater reliability metrics for the inductive coding, or explicit discussion of how phases were tested against alternative trajectories (e.g., silent disengagement). This weakens the move from descriptive themes to a prescriptive evaluation framework.
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly state the study's exploratory scope and limitations on generalizability to avoid overstatement of the framework's reach.
- [Results] A figure or table summarizing the five phases would improve clarity when the phases are first introduced in the results.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which underscores the importance of methodological transparency and careful framing of our exploratory findings. We address each major comment below, indicating where revisions will be made to the manuscript.
Point-by-point responses
-
Referee: [Methods] Methods section: Recruitment details for the self-selected sample of 53 survey participants and 16 interviewees are insufficiently specified (e.g., exact channels, screening criteria, or demographics), preventing assessment of selection bias. This is load-bearing for the central claim because the five-phase framework is derived inductively from these data and presented as characterizing refusals in LLM mental health support interactions more generally.
Authors: We agree that greater specificity on recruitment is warranted to support assessment of selection bias. In the revised manuscript, we will expand the Methods section to detail recruitment channels (targeted posts on mental health forums, social media platforms, and professional networks), screening criteria (self-reported LLM use for mental health support and willingness to discuss refusal experiences), and aggregated demographic characteristics. We will also strengthen the Limitations section to explicitly state that the self-selected sample renders the five-phase framework exploratory and derived from this particular participant group, rather than a general characterization of all LLM mental health interactions. revision: yes
-
Referee: [Results] Results and Framework sections: The multi-phase model is advanced as a general reframing without quantitative validation, inter-rater reliability metrics for the inductive coding, or explicit discussion of how phases were tested against alternative trajectories (e.g., silent disengagement). This weakens the move from descriptive themes to a prescriptive evaluation framework.
Authors: The study is qualitative and inductive by design, so quantitative validation falls outside its scope; we will revise the Results and Framework sections to consistently describe the model as an exploratory framework for evaluation rather than a general or prescriptive one. We will add a description of the thematic analysis process, including iterative team consensus on codes (noting that formal inter-rater reliability metrics are not standard in reflexive thematic analysis but the collaborative refinement steps can be elaborated). We will also incorporate discussion of alternative trajectories such as silent disengagement, drawing directly from interview accounts where participants described disengaging without further interaction, and explain how these cases helped delineate phase boundaries. These textual revisions will better position the framework as descriptive while addressing the transition to evaluative use. revision: partial
- We cannot add quantitative validation or new empirical tests of the framework, as these would require a different study design and additional data collection beyond the current mixed-methods scope.
Circularity Check
Empirical multi-phase framework derived directly from primary mixed-methods data
Full rationale
The paper derives its central claim (refusals as dynamic multi-phase experiences: pre-refusal expectation formation, triggering/encounter, message framing, resource referral, post-refusal outcomes) through sequential mixed-methods analysis of new survey (N=53) and interview (N=16) data, followed by inductive coding. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations reduce any step to prior inputs by construction. The framework is explicitly presented as an empirical contribution grounded in the collected participant experiences rather than imported from external theorems or renamed known results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Self-reported experiences from participants accurately capture the phases of refusal encounters in mental health contexts.
invented entities (1)
-
Multi-phase refusal experience framework
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Leah Hope Ajmani, Arka Ghosh, Benjamin Kaveladze, Eugenia Kim, Keertana Namuduri, Theresa Nguyen, Ebele Okoli, Jessica Schleider, Denae Ford, and Jina Suh. 2025. Seeking late night life lines: Experiences of conversational AI use in mental health crisis.arXiv [cs.HC] (Dec. 2025)
work page 2025
-
[2]
Andrew Selsky and Leah Willingham, Associated Press. 2022. How some encounters between police and people with mental illness can turn tragic. https://www.pbs.org/newshour/health/how-some-encounters-between-police-and-people-with-mental-illness-can-turn-tragic. Accessed: 2026-1-8
work page 2022
-
[3]
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...
work page 2026
-
[4]
Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A Smith, Yejin Choi, and Hannaneh Hajishirzi. 2024. The art of saying no: Contextual noncompliance in language models.arXiv [cs.CL](July 2024)
work page 2024
-
[5]
Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology.Qual. Res. Psychol.3, 2 (Jan. 2006), 77–101
work page 2006
-
[6]
Virginia Braun and Victoria Clarke. 2012.Thematic analysis.American Psychological Association
work page 2012
-
[7]
Virginia Braun and Victoria Clarke. 2019. Reflecting on reflexive thematic analysis.Qualitative research in sport, exercise and health11, 4 (2019), 589–597
work page 2019
-
[8]
California State Legislature. 2025. Senate Bill 243: Companion Chatbots. Approved by Governor October 13, 2025. https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202520260SB243 Chapter 677, Statutes of 2025
work page 2025
-
[9]
Mohit Chandra, Suchismita Naik, Denae Ford, Ebele Okoli, Munmun De Choudhury, Mahsa Ershadi, Gonzalo Ramos, Javier Hernandez, Ananya Bhattacharjee, Shahed Warreth, et al. 2025. From Lived Experience to Insight: Unpacking the Psychological Risks of Using AI Conversational Agents. InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Tra...
work page 2025
-
[10]
Mohit Chandra, Suchismita Naik, Denae Ford, Ebele Okoli, Munmun De Choudhury, Mahsa Ershadi, Gonzalo Ramos, Javier Hernandez, Ananya Bhattacharjee, Shahed Warreth, and Jina Suh. 2024. From lived experience to insight: Unpacking the psychological risks of using AI conversational agents.arXiv [cs.HC](Dec. 2024)
work page 2024
-
[11]
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J Pappas, Florian Tramer, Hamed Hassani, and Eric Wong. 2024. JailbreakBench: An open robustness benchmark for jailbreaking large language models.arXiv [cs.CR](March 2024)
work page 2024
-
[12]
Khaoula Chehbouni, Mohammed Haddou, Jackie Chi Kit Cheung, and Golnoosh Farnadi. 2025. Neither valid nor reliable? Investigating the use of LLMs as judges.arXiv [cs.CL](Aug. 2025)
work page 2025
-
[13]
Adam Dahlgren Lindström, Leila Methnani, Lea Krause, Petter Ericson, Íñigo Martínez de Rituerto de Troya, Dimitri Coelho Mollo, and Roel Dobbe. 2025. Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback.Ethics Inf. Technol.27, 2 (June 2025), 28
work page 2025
-
[14]
Munmun De Choudhury and Sushovan De. 2014. Mental health discourse on reddit: Self-disclosure, social support, and anonymity. Proceedings of the International AAAI Conference on Web and Social Media8, 1 (May 2014), 71–80
work page 2014
-
[15]
Mary J De Silva, Erica Breuer, Lucy Lee, Laura Asher, Neerja Chowdhary, Crick Lund, and Vikram Patel. 2014. Theory of Change: a theory-driven approach to enhance the Medical Research Council’s framework for complex interventions.Trials15, 1 (July 2014), 267
work page 2014
-
[16]
John Draper and Richard T McKeon. 2024. The journey toward 988: A historical perspective on crisis hotlines in the United States. Psychiatr. Clin. North Am.47, 3 (Sept. 2024), 473–490
work page 2024
-
[17]
John Draper, Gillian Murphy, Eduardo Vega, David W Covington, and Richard McKeon. 2015. Helping callers to the National Suicide Prevention Lifeline who are at imminent risk of suicide: the importance of active engagement, active rescue, and collaboration between crisis and emergency services.Suicide Life Threat. Behav.45, 3 (June 2015), 261–270
work page 2015
-
[18]
Foundational Contributors, Ahmed El-Kishky, Daniel Selsam, Francis Song, Giambattista Parascandolo, Hongyu Ren, Hunter Lightman, Hyung Won, Ilge Akkaya, I Sutskever, Jason Wei, Jonathan Gordon, K Cobbe, Kevin Yu, Lukasz Kondraciuk, Max Schwarzer, Mostafa Rohaninejad, Noam Brown, Shengjia Zhao, Trapit Bansal, Vineet Kosaraju, Wenda Zhou Leadership, J Pacho...
work page 2026
-
[19]
I cannot write this because it violates our content policy
Lan Gao, Oscar Chen, Rachel Lee, Nick Feamster, Chenhao Tan, and Marshini Chetty. 2025. “I cannot write this because it violates our content policy”: Understanding content moderation policies and user experiences in generative AI products.arXiv [cs.HC](June 2025)
work page 2025
-
[20]
Clifford Geertz. 1973. The impact of the concept of culture on the concept of man
work page 1973
-
[21]
Su Golder, Shahd Ahmed, Gill Norman, and Andrew Booth. 2017. Attitudes toward the ethics of research using social media: A systematic review.J. Med. Internet Res.19, 6 (June 2017), e195
work page 2017
-
[23]
Melody Y Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias, Andrea Vallone, Hongyu Ren, Jason Wei, Hyung Won Chung, Sam Toyer, Johannes Heidecke, Alex Beutel, and Amelia Glaese. 2024. Deliberative Alignment: Reasoning enables safer language models.arXiv [cs.CL](Dec. 2024)
work page 2024
-
[24]
Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, and Nouha Dziri. 2024. WildGuard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of LLMs.arXiv [cs.CL](June 2024)
work page 2024
-
[25]
for an app supposed to make its users feel better, it sure is a joke
Md Romael Haque and Sabirat Rubya. 2022. “for an app supposed to make its users feel better, it sure is a joke” - an analysis of user reviews of mobile mental health applications.Proc. ACM Hum. Comput. Interact.6, CSCW2 (Nov. 2022), 1–29
work page 2022
-
[26]
Christina Harrington, Sheena Erete, and Anne Marie Piper. 2019. Deconstructing community-based collaborative design: Towards more equitable participatory design engagements.Proceedings of the ACM on Human-Computer Interaction3, CSCW (2019), 1–25
work page 2019
-
[27]
Kashmir Hill. 2025. A Teen Was Suicidal. ChatGPT Was the Friend He Confided In.The New York Times(Aug. 2025)
work page 2025
-
[28]
Lujain Ibrahim, Saffron Huang, Lama Ahmad, Umang Bhatt, and Markus Anderljung. 2025. Towards interactive evaluations for interaction harms in human-AI systems. InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Vol. 8. 1302–1310
work page 2025
-
[29]
Zainab Iftikhar, Amy Xiao, Sean Ransom, Jeff Huang, and Harini Suresh. 2025. How LLM counselors violate ethical standards in mental health practice: A practitioner-informed framework.Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society8, 2 (Oct. 2025), 1311–1323
work page 2025
-
[30]
Illinois General Assembly. 2025. House Bill 1806: Wellness and Oversight for Psychological Resources Act. Signed into law August 1, 2025. https://ilga.gov/Legislation/BillStatus?DocNum=1806&GAID=18&DocTypeID=HB&LegId=159219&SessionID=114 Public Act 104-0054
work page 2025
-
[32]
Nataliya V Ivankova, John W Creswell, and Sheldon L Stick. 2006. Using mixed-methods sequential explanatory design: From theory to practice.Field Methods18, 1 (Feb. 2006), 3–20
work page 2006
-
[33]
Nicholas Jenkins, Michael Bloor, Jan Fischer, Lee Berney, and Joanne Neale. 2010. Putting it in context: the use of vignettes in qualitative interviewing.Qual. Res.10, 2 (April 2010), 175–198
work page 2010
-
[34]
Kelly Joyce, Laurel Smith-Doerr, Sharla Alegria, Susan Bell, Taylor Cruz, Steve G Hoffman, Safiya Umoja Noble, and Benjamin Shestakofsky. 2021. Toward a sociology of artificial intelligence: A call for research on inequalities and structural change.Socius7 (Jan. 2021), 237802312199958
work page 2021
-
[35]
I’ve talked to ChatGPT about my issues last night
Kyuha Jung, Gyuho Lee, Yuanhui Huang, and Yunan Chen. 2025. “I’ve talked to ChatGPT about my issues last night. ”: Examining Mental Health Conversations with Large Language Models through Reddit Analysis.arXiv [cs.HC](April 2025)
work page 2025
-
[36]
Reishiro Kawakami and Sukrit Venkatagiri. 2024. The impact of generative AI on artists. InCreativity and Cognition. ACM, New York, NY, USA, 79–82
work page 2024
-
[37]
Hannah Rose Kirk, Iason Gabriel, Chris Summerfield, Bertie Vidgen, and Scott A Hale. 2025. Why human–AI relationships need socioaffective alignment.Humanit. Soc. Sci. Commun.12, 1 (May 2025), 728
work page 2025
-
[38]
Theodora Koulouri, Robert D Macredie, and David Olakitan. 2022. Chatbots to support young adults’ mental health: An exploratory study of acceptability.ACM Trans. Interact. Intell. Syst.12, 2 (June 2022), 1–39
work page 2022
-
[39]
Seth Lazar and Alondra Nelson. 2023. AI safety on whose terms?Science381, 6654 (July 2023), 138
work page 2023
-
[40]
This is human intelligence debugging artificial intelligence
Zhuoyang Li, Zihao Zhu, Xinning Gui, and Yuhan Luo. 2025. “This is human intelligence debugging artificial intelligence”: Examining how people prompt GPT in seeking mental health support.Int. J. Hum. Comput. Stud.103555 (June 2025), 103555
work page 2025
-
[41]
Michael Madaio, Lisa Egede, Hariharan Subramonyam, Jennifer Wortman Vaughan, and Hanna Wallach. 2022. Assessing the fairness of AI systems: AI practitioners’ processes, challenges, and needs for support.Proc. ACM Hum. Comput. Interact.6, CSCW1 (March 2022), 1–26
work page 2022
-
[42]
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, and Dan Hendrycks. 2024. HarmBench: A standardized evaluation framework for automated red teaming and robust refusal.arXiv [cs.LG](Feb. 2024). Beyond the Single Turn FAccT ’26, June 25–28, 2026, Montreal, QC, Canada
work page 2024
-
[43]
Miles McCain, Ryn Linthicum, Chloe Lubinski, Alex Tamkin, Saffron Huang, Michael Stern, Kunal Handa, Esin Durmus, Tyler Neylon, Stuart Ritchie, Kamya Jagadish, Paruul Maheshwary, Sarah Heck, Alexandra Sanderford, and Deep Ganguli. 2025. How People Use Claude for Support, Advice, and Companionship. https://www.anthropic.com/news/how-people-use-claude-for-s...
work page 2025
-
[44]
Ashlee Milton, Leah Ajmani, Michael Ann DeVito, and Stevie Chancellor. 2023. “I see me here”: Mental health content, community, and algorithmic curation on TikTok. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Vol. 16. ACM, New York, NY, USA, 1–17
work page 2023
-
[45]
Jared Moore, Declan Grabb, William Agnew, Kevin Klyman, Stevie Chancellor, Desmond C Ong, and Nick Haber. 2025. Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers. InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. ACM, New York, NY, USA, 599–627
work page 2025
-
[46]
Ramaravind Kommiya Mothilal, Shion Guha, and Syed Ishtiaque Ahmed. 2024. Towards a non-ideal methodological framework for Responsible ML.arXiv [cs.HC](Jan. 2024)
work page 2024
-
[47]
I don’t think RAI applies to my model
Nadia Nahar, Chenyang Yang, Yanxin Chen, Wesley Hanwen Deng, Ken Holstein, Motahhare Eslami, and Christian Kästner. 2026. “I don’t think RAI applies to my model” – engaging non-champions with sticky stories for responsible AI work. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1–23
work page 2026
-
[48]
Alondra Nelson. 2023. Thick Alignment
work page 2023
-
[49]
New York State Assembly. 2025. An Act to amend the general business law, in relation to artificial intelligence companion models. Assembly Bill A6767, 2025–2026 Regular Sessions. https://www.nysenate.gov/legislation/bills/2025/A6767 Introduced by M. of A. Vanel; referred to the Committee on Consumer Affairs and Protection
work page 2025
-
[50]
OpenAI. 2025. Strengthening ChatGPT’s responses in sensitive conversations. https://openai.com/index/strengthening-chatgpt-responses-in-sensitive-conversations/. Accessed: 2025-11-24
work page 2025
-
[51]
OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner...
work page 2023
-
[52]
Lawrence A Palinkas, Sarah M Horwitz, Carla A Green, Jennifer P Wisdom, Naihua Duan, and Kimberly Hoagwood. 2015. Purposeful sampling for qualitative data collection and analysis in mixed method implementation research.Adm. Policy Ment. Health42, 5 (Sept. 2015), 533–544
work page 2015
-
[53]
can I not be suicidal on a Sunday?
Sachin R Pendse, Amit Sharma, Aditya Vashistha, Munmun De Choudhury, and Neha Kumar. 2021. “can I not be suicidal on a Sunday?”: Understanding technology-mediated pathways to mental health support.Proc. SIGCHI Conf. Hum. Factor. Comput. Syst.2021 (May 2021)
work page 2021
-
[54]
Mehrdad Rahsepar Meadi, Tomas Sillekens, Suzanne Metselaar, Anton van Balkom, Justin Bernstein, and Neeltje Batelaan. 2025. Exploring the ethical challenges of conversational AI in mental health care: Scoping review.JMIR Ment. Health12, 1 (Feb. 2025), e60432
work page 2025
-
[55]
Richard Ren, Steven Basart, Adam Khoja, Alice Gatti, Long Phan, Xuwang Yin, Mantas Mazeika, Alexander Pan, Gabriel Mukobi, Ryan H Kim, Stephen Fitz, and Dan Hendrycks. 2024. Safetywashing: Do AI safety benchmarks actually measure safety progress?arXiv [cs.LG] (July 2024)
work page 2024
-
[56]
Rhode Island General Assembly. 2026. An Act Relating to Commercial Law—General Regulatory Provisions—Artificial Intelligence Companion Models. Senate Bill S2195, January Session, A.D. 2026. https://webserver.rilegislature.gov/BillText/BillText26/SenateText26/S2195.pdf Introduced by Senators Urso, Gu, DiPalma, Paolino, Zurier, Murray, and Appollonio; refe...
work page 2026
-
[57]
Anastasia Schaadhardt, Yue Fu, Cory Gennari Pratt, and Wanda Pratt. 2023. “Laughing so I don’t cry”: How TikTok users employ humor and compassion to connect around psychiatric hospitalization. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Vol. 14. ACM, New York, NY, USA, 1–13
work page 2023
-
[58]
Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and Abstraction in Sociotechnical Systems. InProceedings of the Conference on Fairness, Accountability, and Transparency. ACM, New York, NY, USA, 59–68
work page 2019
-
[59]
Itai Shapira, Gerdus Benade, and Ariel D Procaccia. 2026. How RLHF Amplifies Sycophancy.arXiv [cs.AI](Feb. 2026)
work page 2026
-
[60]
Renee Shelby, Shalaleh Rismani, Kathryn Henne, Ajung Moon, Negar Rostamzadeh, Paul Nicholas, N’mah Yilla-Akbari, Jess Gallegos, Andrew Smart, Emilio Garcia, and Gurleen Virk. 2023. Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction. InProceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, Vol. 24. ACM, New ...
work page 2023
-
[61]
Brett Sholtis. 2020. During A Mental Health Crisis, A Family’s Call To 911 Turns Tragic.NPR(Oct. 2020)
work page 2020
-
[62]
It happened to be the perfect thing
Steven Siddals, John Torous, and Astrid Coxon. 2024. “It happened to be the perfect thing”: experiences of generative AI chatbots for mental health.Npj Ment Health Res3, 1 (Oct. 2024), 48
work page 2024
-
[63]
Petr Slovak and Sean A Munson. 2024. HCI contributions in mental health: A modular framework to guide psychosocial intervention design.Proc. SIGCHI Conf. Hum. Factor. Comput. Syst.2024 (May 2024)
work page 2024
-
[64]
Inhwa Song, Sachin R Pendse, Neha Kumar, and Munmun De Choudhury. 2024. The typing cure: Experiences with Large Language Model chatbots for mental health support.arXiv [cs.HC](Jan. 2024)
work page 2024
-
[65]
Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, and Sam Toyer. 2024. A StrongREJECT for Empty Jailbreaks. arXiv:2402.10260 [cs.LG] https://arxiv.org/abs/2402.10260
work page 2024
-
[66]
Ningjing Tang, Megan Li, Amy Winecoff, Michael Madaio, Hoda Heidari, and Hong Shen. 2026. Navigating uncertainties: How GenAI developers document their models on open-source platforms. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1–19
work page 2026
-
[67]
Tangila Islam Tanni, Mamtaj Akter, Joshua Anderson, Mary Jean Amon, and Pamela J Wisniewski. 2024. Examining the unique online risk experiences and mental health outcomes of LGBTQ+ versus heterosexual youth. InProceedings of the CHI Conference on Human Factors in Computing Systems, Vol. 31. ACM, New York, NY, USA, 1–21
work page 2024
-
[68]
Tamar Tavory. 2024. Regulating AI in mental health: Ethics of care perspective.JMIR Ment. Health11 (Sept. 2024), e58493
work page 2024
-
[69]
Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A Hale, and Paul Röttger. 2023. SimpleSafetyTests: A test suite for identifying critical safety risks in large language models.arXiv [cs.CL](Nov. 2023)
work page 2023
-
[70]
Hanna Wallach, Meera Desai, A Feder Cooper, Angelina Wang, Chad Atalla, Solon Barocas, Su Lin Blodgett, Alexandra Chouldechova, Emily Corvi, P Alex Dow, Jean Garcia-Gathright, Alexandra Olteanu, Nicholas Pangakis, Stefanie Reed, Emily Sheng, Dan Vann, Jennifer Wortman Vaughan, Matthew Vogel, Hannah Washington, and Abigail Z Jacobs. 2025. Position: Evaluat...
work page 2025
-
[71]
Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, and Timothy Baldwin. 2024. Do-Not-Answer: Evaluating Safeguards in LLMs. In Findings of the Association for Computational Linguistics: EACL 2024. 896–911
work page 2024
-
[72]
Laura Weidinger, Inioluwa Deborah Raji, Hanna Wallach, Margaret Mitchell, Angelina Wang, Olawale Salaudeen, Rishi Bommasani, Deep Ganguli, Sanmi Koyejo, and William Isaac. 2025. Toward an evaluation science for generative AI systems.arXiv [cs.AI](March 2025)
work page 2025
-
[73]
Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, and William Isaac. 2023. Sociotechnical Safety Evaluation of Generative AI Systems.arXiv [cs.AI](Oct. 2023)
work page 2023
-
[74]
Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, and Lucy Lu Wang. 2025. Know your limits: A survey of abstention in large language models.Trans. Assoc. Comput. Linguist.13 (June 2025), 529–556
work page 2025
-
[75]
As an AI language model, I cannot
Joel Wester, Tim Schrills, Henning Pohl, and Niels van Berkel. 2024. “As an AI language model, I cannot”: Investigating LLM Denials of User Requests. InProceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24, Article 979). Association for Computing Machinery, New York, NY, USA, 1–14
work page 2024
-
[76]
Richmond Y Wong. 2021. Tactics of soft resistance in user experience professionals’ values work.Proc. ACM Hum. Comput. Interact.5, CSCW2 (Oct. 2021), 1–28
work page 2021
-
[77]
Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, and Prateek Mittal. 2024. SORRY-bench: Systematically evaluating large language model safety refusal.arXiv [cs.AI](June 2024)
work page 2024
-
[78]
Dong Whi Yoo, Jiayue Melissa Shi, Violeta J Rodriguez, and Koustuv Saha. 2025. AI chatbots for mental health: Values and harms from lived experiences of depression.arXiv [cs.HC](April 2025)
work page 2025
-
[79]
Meg Young, Lassana Magassa, and Batya Friedman. 2019. Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents.Ethics and Information Technology21, 2 (2019), 89–103
work page 2019
-
[80]
Yuan Yuan, Tina Sriskandarajah, Anna-Luisa Brakman, Alec Helyar, Alex Beutel, Andrea Vallone, and Saachi Jain. 2025. From hard refusals to safe-completions: Toward output-centric safety training.arXiv [cs.CY](Aug. 2025)
work page 2025
-
[81]
Xi Zheng, Zhuoyang Li, Xinning Gui, and Yuhan Luo. 2025. Customizing emotional support: How do individuals construct and interact with LLM-powered chatbots. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1–20
work page 2025