1 Pith paper cites this work.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses
Hybrid human-LLM features let traditional ML models reach 0.717-0.849 F1 for hallucination detection and 0.59-0.64 F1 for omission detection in mental health data, outperforming LLM judges, which reached only 52% accuracy.