Building trust in mental health chatbots: safety metrics and LLM-based evaluation tools. arXiv preprint arXiv:2408.04650, 2024.
3 Pith papers cite this work.
- Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis
  Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.
- Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
  Creates a clinical crisis taxonomy and a 2,252-example dataset, then audits five LLMs, finding variable safety with notable failures on indirect signals and in self-harm categories.
- Mental Health AI Safety Claims Must Preserve Temporal Evidence
  Mental health AI safety evaluations that discard temporal sequence and accumulation produce invalid conclusions; the paper formalizes this as Temporal Safety Non-Identifiability and proposes SCOPE-MH as a reporting standard that preserves evidence.
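The LLM-as-a-Judge paper above reports judge-versus-human agreement as Cohen's κ, defined as κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The Python sketch below computes κ for two raters on a single categorical safety criterion and maps the result onto the conventional Landis and Koch bands, under which 0.75 falls in the "substantial" range. It is illustrative only: the function names, the "safe"/"unsafe" labels, and the example verdicts are hypothetical and are not data or code from any of the listed papers. Whether those papers use exactly this interpretation scale is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Map kappa to the Landis and Koch (1977) descriptive bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical per-response verdicts on one safety criterion (not real data).
human = ["safe", "safe", "unsafe", "safe", "unsafe", "safe", "unsafe", "safe"]
judge = ["safe", "safe", "unsafe", "unsafe", "unsafe", "safe", "safe", "safe"]

k = cohens_kappa(human, judge)
print(round(k, 3), landis_koch(k))  # 0.467 moderate
print(landis_koch(0.75))            # substantial
```

In this toy example the two raters agree on 6 of 8 responses (p_o = 0.75), but because both label most responses "safe", chance agreement is already 34/64 ≈ 0.53, so κ drops to 7/15 ≈ 0.467. This gap between raw agreement and κ is why chance-corrected agreement is the usual yardstick for comparing an LLM judge against human consensus.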