Long grounded thoughts: Distilling com- positional visual reasoning chains at scale.arXiv preprint arXiv:2511.05705, 2025

David Acuna et al · 2025 · arXiv 2511.05705

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

cs.CV · 2025-12-11 · unverdicted · novelty 5.0

GETok partitions images with grid tokens and refines locations via offset tokens to enable better native 2D spatial reasoning in MLLMs.

Showing 1 of 1 citing paper.

Grounding Everything in Tokens for Multimodal Large Language Models cs.CV · 2025-12-11 · unverdicted · none · ref 1
GETok partitions images with grid tokens and refines locations via offset tokens to enable better native 2D spatial reasoning in MLLMs.