A systematic study of 2800 AI responses shows automated metrics using clinician references closely match human ratings for answering hospitalization questions, using clinical notes, and applying medical knowledge.
Ayers, Adam Poliak, Mark Dredze, Eric C
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Automated Evaluation can Distinguish the Good and Bad AI Responses to Patient Questions about Hospitalization