1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation
DPVR-LF routes saturated vision tokens into a one-layer side branch after layer 4, runs text-only processing through layers 5-17, and performs late fusion at the final layer to reduce visual computation while preserving multimodal performance.
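The routing schedule in this summary can be sketched as a toy forward pass. This is a minimal illustration under stated assumptions, not the paper's implementation: the total depth of 18 layers, the 576 vision tokens, and the helpers `make_layer` and `dual_path_forward` are hypothetical, and the stand-in layers only add a bias instead of running attention. The counter reports how many vision-token layer evaluations the schedule performs, compared with passing every vision token through all layers.

```python
from typing import Callable, List, Tuple

Token = List[float]
Layer = Callable[[List[Token]], List[Token]]


def make_layer(idx: int) -> Layer:
    """Hypothetical stand-in for a decoder block: adds a layer-specific bias.

    Real blocks mix tokens through attention; this toy only records, via the
    accumulated bias, which layers each token passed through.
    """
    def layer(tokens: List[Token]) -> List[Token]:
        return [[x + 0.01 * idx for x in tok] for tok in tokens]
    return layer


def dual_path_forward(
    vision: List[Token],
    text: List[Token],
    main_layers: List[Layer],
    side_layer: Layer,
    route_after: int = 4,
) -> Tuple[List[Token], int]:
    """Route vision tokens around the middle layers and fuse them at the end.

    Returns the fused token list (vision first, then text) and the number of
    vision-token layer evaluations performed.
    """
    vision_evals = 0
    # Early layers (1..route_after): vision and text tokens are processed jointly.
    tokens = vision + text
    for layer in main_layers[:route_after]:
        tokens = layer(tokens)
        vision_evals += len(vision)
    v, t = tokens[:len(vision)], tokens[len(vision):]
    # Side path: the routed vision tokens pass through one extra layer only.
    v = side_layer(v)
    vision_evals += len(v)
    # Main path: text-only processing through the middle layers.
    for layer in main_layers[route_after:-1]:
        t = layer(t)
    # Late fusion: vision tokens rejoin the text stream at the final layer.
    fused = main_layers[-1](v + t)
    vision_evals += len(v)
    return fused, vision_evals


# Assumed configuration: 18 layers, so the text-only span is layers 5..17.
DEPTH = 18
layers = [make_layer(i) for i in range(1, DEPTH + 1)]
vision = [[0.0, 0.0] for _ in range(576)]  # hypothetical vision-token count
text = [[1.0, 1.0] for _ in range(32)]
out, evals = dual_path_forward(vision, text, layers, make_layer(100))
print(len(out), evals, len(vision) * DEPTH)  # 608 3456 10368
```

Under these assumptions each vision token is evaluated 6 times (layers 1 to 4, the side branch, and the final fusion layer) instead of 18 times. Actual savings depend on the model's real depth and on attention cost, which this sketch does not model.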