LIMA: Less Is More for Alignment
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers

- Instruction-Free Tuning of Large Vision Language Models for Medical Instruction Following
  Instruction-free tuning of LVLMs on medical image-description pairs via momentum proxy instructions and response shuffling achieves SOTA accuracy on VQA tasks across SKINCON, WBCAtt, CBIS, and MIMIC-CXR.
- AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation
  AutoVQA-G is a self-improving framework that generates VQA-G datasets with higher visual grounding accuracy than leading multimodal LLMs via iterative CoT verification and prompt refinement.