Med-StepBench is the first large-scale step-wise hallucination benchmark for 3D oncological PET/CT that decomposes clinical reasoning into four stages and reveals systematic VLM failures hidden by aggregate metrics.
MedHEval: Benchmarking Hallucinations and Mitigation Strategies in Medical Large Vision-Language Models
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models