The opportunities and risks of large language models in mental health
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
CounselBench introduces expert-rated evaluations and an adversarial test set showing LLMs frequently produce unconstructive, overgeneralized, or unsafe responses in mental health QA compared to human therapists.