Beyond static visual tokens: Structured sequential visual chain-of-thought reasoning. arXiv preprint arXiv:2603.26737, 2026
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Position Rebinding Cache Reuse: Replay-Free Visual Revisiting for Interleaved Multimodal Reasoning
PRCR enables replay-free visual revisiting in interleaved multimodal reasoning by storing raw visual KV caches with spatial coordinates and rebinding keys to position-compatible coordinates, matching replay performance while cutting computation by orders of magnitude.