Chain-of-thought underperforms direct answering in medical VQA due to a perception bottleneck, but ROI cues and textual grounding interventions can improve results and reverse the gap.
arXiv preprint arXiv:2506.13793 (2025)
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Representative citing papers:
MRPO is a step-aware RL method that penalizes early reasoning errors exponentially more when the final answer is incorrect, reducing early-stage failures from 64% to 13% and outperforming baselines including larger models on medical VQA tasks.
Citing papers:
- Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine
  Chain-of-thought underperforms direct answering in medical VQA due to a perception bottleneck, but ROI cues and textual grounding interventions can improve results and reverse the gap.
- Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning
  MRPO is a step-aware RL method that penalizes early reasoning errors exponentially more when the final answer is incorrect, reducing early-stage failures from 64% to 13% and outperforming baselines including larger models on medical VQA tasks.