MedProbeBench is the first benchmark using clinical guidelines as references to evaluate LLMs and deep research agents on multi-step evidence integration for expert-level medical guideline generation, with evaluations showing major gaps.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
MedProbeBench: Systematic Benchmarking at Deep Evidence Integration for Expert-level Medical Guideline