Introduces a competency-based GPBench benchmark and evaluates ten LLMs, concluding they require continuous human supervision for clinical general practice.
Preventing diseases before they occur, preventing disease progression during illness, and preventing recurrence after illness
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark