1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling

cs.CL · 2026-03-17 · unverdicted · novelty 5.0

Knowledge density in image captions, not task format diversity, is the primary driver of multimodal LLM scaling performance.

citing papers explorer

Showing 1 of 1 citing paper.

Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling

cs.CL · 2026-03-17 · unverdicted · none · ref 4

Knowledge density in image captions, not task format diversity, is the primary driver of multimodal LLM scaling performance.
