MHSafeEval: Role-Aware Interaction-Level Evaluation of Mental Health Safety in Large Language Models

· 2026 · cs.CL · arXiv 2604.17730

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Large language models (LLMs) are increasingly explored as scalable tools for mental health counseling, yet evaluating their safety remains challenging due to the interactional and context-dependent nature of clinical harm. Existing evaluation frameworks predominantly assess isolated responses using coarse-grained taxonomies or static datasets, limiting their ability to diagnose how harms emerge and accumulate over multi-turn counseling interactions. In this work, we introduce R-MHSafe, a role-aware mental health safety taxonomy that characterizes clinically significant harm in terms of the interactional roles an AI counselor adopts, including perpetrator, instigator, facilitator, or enabler, combined with clinically grounded harm categories. Then, we propose MHSafeEval, a closed-loop, agent-based evaluation framework that formulates safety assessment as trajectory-level discovery of harm through adversarial multi-turn interactions, guided by role-aware modeling. Using R-MHSafe and MHSafeEval, we conduct a large-scale evaluation across state-of-the-art LLMs. Our results reveal substantial role-dependent and cumulative safety failures that are systematically missed by existing static benchmarks, and show that our framework significantly improves failure-mode coverage and diagnostic granularity.

representative citing papers

Mental Health AI Safety Claims Must Preserve Temporal Evidence

cs.AI · 2026-05-09 · unverdicted · novelty 5.0

Mental health AI safety evaluations that discard temporal sequence and accumulation produce invalid conclusions; the paper formalizes this as Temporal Safety Non-Identifiability and proposes SCOPE-MH as a reporting standard that preserves evidence.

citing papers explorer

Showing 1 of 1 citing paper.

Mental Health AI Safety Claims Must Preserve Temporal Evidence cs.AI · 2026-05-09 · unverdicted · none · ref 3 · internal anchor
Mental health AI safety evaluations that discard temporal sequence and accumulation produce invalid conclusions; the paper formalizes this as Temporal Safety Non-Identifiability and proposes SCOPE-MH as a reporting standard that preserves evidence.

MHSafeEval: Role-Aware Interaction-Level Evaluation of Mental Health Safety in Large Language Models

fields

years

verdicts

representative citing papers

citing papers explorer