MedProbeBench is the first benchmark using clinical guidelines as references to evaluate LLMs and deep research agents on multi-step evidence integration for expert-level medical guideline generation, with evaluations showing major gaps.
Atypical spindle cell lipomatous tumor: clinicopathologic characterization of 232 cases demonstrating a morphologic spectrum. Am J Surg Pathol. 2017;41(2):234–44.
1 Pith paper cites this work (field: cs.CV; year: 2026; verdict: UNVERDICTED). Citation polarity classification is still being indexed.

Representative citing paper:
MedProbeBench: Systematic Benchmarking at Deep Evidence Integration for Expert-level Medical Guideline