Small well-performing LVLMs gain the most from test-time scaling with up to 30% improvements that can match or exceed larger models, while visual information is used mainly early in reasoning chains.
ArXivabs/2509.03321(2025),https://api.semanticscholar.org/CorpusID: 281092552
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
On Test-Time Scaling for Vision-Language Models
Small well-performing LVLMs gain the most from test-time scaling with up to 30% improvements that can match or exceed larger models, while visual information is used mainly early in reasoning chains.