Med-StepBench is the first large-scale step-wise hallucination benchmark for 3D oncological PET/CT that decomposes clinical reasoning into four stages and reveals systematic VLM failures hidden by aggregate metrics.
EHRXQA: A multi-modal question answering dataset for electronic health records with chest X-ray images. Advances in Neural Information Processing Systems, 36:3867–3880
1 Pith paper cites this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields: cs.CV 1
years: 2026 1
verdicts: UNVERDICTED 1
roles: background 1
polarities: background 1
representative citing papers
Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models
Med-StepBench is the first large-scale step-wise hallucination benchmark for 3D oncological PET/CT that decomposes clinical reasoning into four stages and reveals systematic VLM failures hidden by aggregate metrics.