Predicting Depression via Social Media. International AAAI Conference on Web and Social Media, 7(1):128–137, 2013.
1 Pith paper cites this work.
Fields: cs.AI (1); Years: 2026 (1); Verdicts: UNVERDICTED (1)
Representative citing papers:
Expert Evaluation and the Limits of Human Feedback in Mental Health AI Safety Testing
Psychiatrists show low inter-rater reliability when evaluating LLM mental health responses, with systematic disagreement reflecting distinct clinical frameworks rather than random error.