MedProbeBench is the first benchmark using clinical guidelines as references to evaluate LLMs and deep research agents on multi-step evidence integration for expert-level medical guideline generation, with evaluations showing major gaps.
Tumours composed of fat are no longer a simple diagnosis: an overview of fatty tumours with a spindle cell component.J Clin Pathol.2018;71(6):483–92
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
MedProbeBench: Systematic Benchmarking at Deep Evidence Integration for Expert-level Medical Guideline
MedProbeBench is the first benchmark using clinical guidelines as references to evaluate LLMs and deep research agents on multi-step evidence integration for expert-level medical guideline generation, with evaluations showing major gaps.