pith. sign in

Visual sketchpad: Sketching as a visual chain of thought for multimodal language models.Advances in Neural Information Processing Systems, 37:139348–139379,

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.CV 1

years

2026 1

verdicts

CONDITIONAL 1

representative citing papers

Leveraging Latent Visual Reasoning in Silence

cs.CV · 2026-05-18 · conditional · novelty 6.0

Latent visual reasoning improves multimodal models via training effects even without using latent tokens at inference, enabled by an attention-based RL reward that promotes interaction with text tokens.

citing papers explorer

Showing 1 of 1 citing paper.

  • Leveraging Latent Visual Reasoning in Silence cs.CV · 2026-05-18 · conditional · none · ref 12

    Latent visual reasoning improves multimodal models via training effects even without using latent tokens at inference, enabled by an attention-based RL reward that promotes interaction with text tokens.