A new multi-frame VQA benchmark on volumetric MRI demonstrates that bounding-box supervised fine-tuning improves spatial grounding in VLMs over zero-shot baselines.
Med- max: Mixed-modal instruction tuning for training biomedical assistants.arXiv preprint arXiv:2412.12661
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Beyond a Single Frame: Multi-Frame Spatially Grounded Reasoning Across Volumetric MRI
A new multi-frame VQA benchmark on volumetric MRI demonstrates that bounding-box supervised fine-tuning improves spatial grounding in VLMs over zero-shot baselines.