pith. machine review for the scientific record.

arxiv: 2604.02713 · v1 · submitted 2026-04-03 · 💻 cs.CL

Recognition: no theorem link

Breakdowns in Conversational AI: Interactional Failures in Emotionally and Ethically Sensitive Contexts

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 19:43 UTC · model grok-4.3

classification 💻 cs.CL
keywords conversational AI · emotional sensitivity · ethical failures · interactional breakdowns · user simulator · dialogue quality · affective misalignment · multi-turn dialogue

The pith

Mainstream conversational AI models exhibit recurrent breakdowns in emotionally and ethically sensitive multi-turn dialogues, with failures intensifying as emotional trajectories escalate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how conversational agents handle evolving interactions that involve psychological personas and staged emotional pacing. It introduces a simulator to generate these dialogues and identifies patterns such as affective misalignments and ethical guidance failures that grow more severe over time. A sympathetic reader would care because these issues directly affect the safety and usefulness of AI in real applications like support conversations or crisis response. The work organizes the failures into a taxonomy and stresses the need for systems that sustain both sensitivity and responsibility across dynamic exchanges.

Core claim

Mainstream models exhibit recurrent breakdowns that intensify as emotional trajectories escalate, including affective misalignments, ethical guidance failures, and cross-dimensional trade-offs where empathy supersedes or undermines responsibility.

What carries the argument

A persona-conditioned user simulator that generates multi-turn dialogues with psychological personas and staged emotional pacing, used to surface and taxonomize failure patterns in real-time alignment.
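
The description above is concrete enough to sketch. A minimal illustration of a persona-conditioned user simulator with staged emotional pacing follows; everything in it (`PersonaProfile`, the stage labels, the `user_model` and `target_chatbot` callables) is an assumption of this sketch, not the paper's published interface.

```python
# Hedged sketch of persona-conditioned simulation with staged emotional
# pacing. All names here are illustrative assumptions, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class PersonaProfile:
    name: str
    background: str          # psychological backstory for the simulated user
    sensitive_topic: str     # e.g., grief, guilt, relationship conflict
    pacing: list = field(default_factory=lambda: [
        "calm", "frustrated", "distressed", "escalated"])  # staged trajectory

def build_user_prompt(persona, stage, history):
    """Condition the simulated user's next turn on persona and the current
    emotional stage, so intensity rises across the dialogue."""
    transcript = "\n".join(f"{who}: {text}" for who, text in history)
    return (
        f"You are {persona.name}. Background: {persona.background}\n"
        f"Topic weighing on you: {persona.sensitive_topic}\n"
        f"Current emotional state: {stage}. Reply in character, one turn.\n"
        f"Conversation so far:\n{transcript}\nYou:"
    )

def run_dialogue(persona, user_model, target_chatbot, turns_per_stage=2):
    """Alternate simulated-user and chatbot turns while advancing the staged
    pacing, so later turns probe escalated emotional states."""
    history = []
    for stage in persona.pacing:
        for _ in range(turns_per_stage):
            history.append(("User", user_model(build_user_prompt(persona, stage, history))))
            history.append(("Chatbot", target_chatbot(history)))
    return history
```

Conditioning each user turn on the current stage, rather than fixing the persona's state once, is what lets the harness observe whether the chatbot's alignment holds as the trajectory escalates.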

If this is right

  • AI systems must incorporate mechanisms that preserve ethical coherence even when affective responses intensify across conversation turns.
  • Empathy and responsibility can trade off against each other in dynamic contexts, requiring explicit balancing strategies.
  • Static safety checks are insufficient; evaluation must track how alignment degrades over escalating emotional sequences (a turn-level sketch follows this list).
  • Design improvements should target the identified taxonomy to reduce dialogue-quality losses in value-sensitive domains.
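
A minimal sketch of the third bullet, assuming a hypothetical `score_alignment` judge (an LLM-as-judge call or a classifier) that scores each chatbot turn in context; the early-versus-late comparison is our illustration of escalation-sensitive evaluation, not the paper's protocol.

```python
# Hedged sketch: track a per-turn alignment score across an escalating
# dialogue instead of running one static safety check. score_alignment
# is a hypothetical judge returning a score in [0, 1].
from statistics import mean

def degradation_profile(history, score_alignment):
    """Score every chatbot turn given the dialogue up to that point."""
    return [score_alignment(history[:i + 1])
            for i, (who, _) in enumerate(history) if who == "Chatbot"]

def degrades(scores, window=3, drop=0.15):
    """Flag a dialogue whose late-turn alignment falls well below its
    early-turn alignment -- a simple escalation-sensitive criterion."""
    if len(scores) < 2 * window:
        return False
    return mean(scores[:window]) - mean(scores[-window:]) > drop
```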

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Current training approaches may need explicit modeling of emotional escalation trajectories rather than isolated prompt checks.
  • Similar simulators could be adapted to test other sensitive domains such as medical advice or legal consultation.
  • If the taxonomy holds, it points to a need for runtime monitoring tools that flag emerging misalignments before they compound.
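
A rough sketch of the runtime-monitoring idea in the last bullet, assuming a hypothetical `classify_breakdown` function that labels a turn with one of the taxonomy's categories (affective, ethical, cross-dimensional) or returns None; the accumulation threshold is our invention.

```python
# Hedged sketch of a runtime monitor that flags emerging misalignments
# before they compound. classify_breakdown is a hypothetical classifier.
from collections import Counter

class BreakdownMonitor:
    def __init__(self, classify_breakdown, threshold=2):
        self.classify = classify_breakdown  # history -> category or None
        self.counts = Counter()
        self.threshold = threshold

    def observe(self, history):
        """Call after each chatbot turn; return a category once repeated
        misalignments of that kind reach the alert threshold."""
        label = self.classify(history)
        if label is None:
            return None
        self.counts[label] += 1
        return label if self.counts[label] >= self.threshold else None
```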

Load-bearing premise

The persona-conditioned user simulator produces interactions that faithfully reflect real human emotional pacing and ethical sensitivities in multi-turn dialogue.

What would settle it

Direct comparison of simulator-generated dialogues against transcripts of real human-AI exchanges in emotionally charged settings, checking whether the same failure patterns appear at matching rates and intensities.
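
One way such a comparison could be scored, sketched under the assumption that each dialogue is labeled for the presence of a given failure pattern: a two-proportion z-test on failure rates in simulated versus real transcripts. The function and any counts fed to it are illustrative, not drawn from the paper.

```python
# Hedged sketch: test whether a failure pattern appears at matching rates
# in simulator-generated and real human-AI dialogues.
from math import sqrt
from statistics import NormalDist

def rate_gap_z(fail_sim, n_sim, fail_real, n_real):
    """z statistic and two-sided p-value for a difference in failure rates."""
    p_sim, p_real = fail_sim / n_sim, fail_real / n_real
    pooled = (fail_sim + fail_real) / (n_sim + n_real)
    se = sqrt(pooled * (1 - pooled) * (1 / n_sim + 1 / n_real))
    z = (p_sim - p_real) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

For instance, `rate_gap_z(42, 200, 35, 180)` asks whether a 21% simulated failure rate is statistically distinguishable from a 19.4% rate in real transcripts (here it is not); matching intensities would additionally need an ordinal analogue of this test.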

Figures

Figures reproduced from arXiv: 2604.02713 by Fuji Ren, Jiawen Deng, Wentao Zhang, Ziyun Jiao.

Figure 1. Persona-conditioned simulation framework for stress-testing conversational agents in ethically and emotionally …
Figure 2. Comparison of LLM-as-judge scores and human ratings across models. Each subfigure shows the distribution of pooled scores (four …
Figure 3. Turn-level emotion trajectories of simulated user …
Figure 4. Breakdown type distribution (A: affective misalignment; B: ethical guidance failures; C: cross-dimensional failures).
read the original abstract

Conversational AI is increasingly deployed in emotionally charged and ethically sensitive interactions. Previous research has primarily concentrated on emotional benchmarks or static safety checks, overlooking how alignment unfolds in evolving conversation. We explore the research question: what breakdowns arise when conversational agents confront emotionally and ethically sensitive behaviors, and how do these affect dialogue quality? To stress-test chatbot performance, we develop a persona-conditioned user simulator capable of engaging in multi-turn dialogue with psychological personas and staged emotional pacing. Our analysis reveals that mainstream models exhibit recurrent breakdowns that intensify as emotional trajectories escalate. We identify several common failure patterns, including affective misalignments, ethical guidance failures, and cross-dimensional trade-offs where empathy supersedes or undermines responsibility. We organize these patterns into a taxonomy and discuss the design implications, highlighting the necessity to maintain ethical coherence and affective sensitivity throughout dynamic interactions. The study offers the HCI community a new perspective on the diagnosis and improvement of conversational AI in value-sensitive and emotionally charged contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that a persona-conditioned user simulator with psychological personas and staged emotional pacing can reveal recurrent breakdowns in mainstream conversational AI models during multi-turn emotionally and ethically sensitive interactions; these breakdowns intensify with escalating emotional trajectories and include affective misalignments, ethical guidance failures, and empathy-responsibility trade-offs, which are organized into a taxonomy with design implications for maintaining ethical coherence and affective sensitivity.

Significance. If the simulator faithfully captures real human emotional pacing and ethical sensitivities, the resulting taxonomy could usefully guide HCI and alignment research toward more robust handling of dynamic value-sensitive contexts. The work's qualitative focus on interactional failures rather than static benchmarks is a constructive direction, though its impact is currently constrained by the lack of external validation or quantitative grounding.

major comments (2)
  1. [Methods (simulator development)] The central claim rests on interactions generated by the persona-conditioned user simulator (described in the abstract and methods), yet no validation is reported against real human data, such as human ratings of emotional pacing fidelity, comparison to dyadic corpora, or inter-rater reliability on trajectory realism. Without this, the observed patterns risk being artifacts of the test harness rather than evidence of model behavior in authentic contexts.
  2. [Analysis and taxonomy construction] The analysis is described as qualitative pattern identification from simulator runs (abstract), but provides no quantitative metrics, error analysis, prevalence counts, or inter-rater reliability for the taxonomy categories. This makes it difficult to evaluate the robustness or generalizability of the reported breakdowns across models or trajectories.
minor comments (2)
  1. [Abstract] The abstract would benefit from specifying the number of mainstream models tested and the total number of simulated dialogues to allow readers to gauge the scale of the evidence.
  2. [Taxonomy] Notation for the failure patterns (e.g., how 'cross-dimensional trade-offs' are distinguished from 'ethical guidance failures') could be clarified with a brief table or explicit definitions in the taxonomy section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where the manuscript can be strengthened. We address each major comment below and will incorporate revisions to clarify the simulator's design, discuss its limitations, and add quantitative support for the taxonomy.

read point-by-point responses
  1. Referee: [Methods (simulator development)] The central claim rests on interactions generated by the persona-conditioned user simulator (described in the abstract and methods), yet no validation is reported against real human data, such as human ratings of emotional pacing fidelity, comparison to dyadic corpora, or inter-rater reliability on trajectory realism. Without this, the observed patterns risk being artifacts of the test harness rather than evidence of model behavior in authentic contexts.

    Authors: We agree that validation against real human data would strengthen claims that the breakdowns reflect authentic contexts rather than simulator-specific artifacts. The simulator was constructed using established psychological frameworks for emotional escalation and persona definition drawn from the affective computing and dialogue literature. In the revised manuscript, we will expand the Methods section with a more detailed description of these design principles and add an explicit Limitations subsection discussing the absence of direct human validation (e.g., ratings or corpus comparisons) and its implications for ecological validity. We will also outline plans for future validation studies. revision: yes

  2. Referee: [Analysis and taxonomy construction] The analysis is described as qualitative pattern identification from simulator runs (abstract), but provides no quantitative metrics, error analysis, prevalence counts, or inter-rater reliability for the taxonomy categories. This makes it difficult to evaluate the robustness or generalizability of the reported breakdowns across models or trajectories.

    Authors: The taxonomy emerged from iterative qualitative analysis of interaction logs generated across multiple models and emotional trajectories. To enhance rigor and allow better assessment of robustness, we will revise the Analysis section to report quantitative metrics, including prevalence counts for each breakdown category, the total number of simulation runs performed, and a summary error analysis of ambiguous cases. We will also describe the consensus process used by the research team to derive and refine the categories, thereby addressing generalizability concerns. revision: yes
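
A minimal sketch of the quantitative support promised here: prevalence counts per taxonomy category and Cohen's kappa for inter-rater agreement. The label lists are hypothetical inputs; nothing below reflects the paper's actual numbers.

```python
# Hedged sketch of prevalence counts and inter-rater reliability for
# taxonomy categories. Annotator label lists are illustrative.
from collections import Counter

def prevalence(labels):
    """How often each breakdown category appears across simulation runs."""
    return Counter(labels)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two annotators beyond what chance would give."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_exp = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                for c in set(rater_a) | set(rater_b))
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)
```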

Circularity Check

0 steps flagged

No significant circularity; analysis is observational and self-contained

full rationale

The paper develops a persona-conditioned user simulator to generate multi-turn dialogues and then performs qualitative analysis to identify failure patterns such as affective misalignments and ethical guidance failures. No equations, fitted parameters, or self-referential definitions appear in the provided text. The taxonomy is constructed directly from the generated interactions rather than reducing to any input by construction, and no self-citation chains or uniqueness theorems are invoked to force the central claims. The derivation chain remains independent of the simulator's internal mechanics, with observations treated as external to the test harness itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the untested assumption that the simulator captures authentic human emotional escalation and ethical reasoning; no free parameters, invented entities, or additional axioms are stated.

axioms (1)
  • domain assumption: The persona-conditioned simulator produces interactions representative of real human behavior in emotionally and ethically sensitive contexts.
    Invoked to justify using simulator outputs as evidence of model breakdowns.

pith-pipeline@v0.9.0 · 5469 in / 1222 out tokens · 30310 ms · 2026-05-13T19:43:59.914160+00:00 · methodology

