LLaVA-NeXT: Stronger LLMs supercharge multimodal capabilities in the wild
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs
Fre-Res compresses video tokens by preserving spatial anchors and representing temporal dynamics with low-frequency residual tokens derived from 1D-DCT on inter-frame residuals, plus a Spatial-Guided Absorber to reinject the information.
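The summary above can be illustrated with a minimal sketch of the frequency-residual idea: keep one anchor frame, take inter-frame residuals, apply a 1D DCT along the time axis, and retain only the lowest-frequency coefficients. This is an illustration under stated assumptions, not the paper's implementation. The function names `dct_1d` and `compress_token_track`, the choice of the first frame as the anchor, the unnormalized DCT-II, and the `keep` parameter are all hypothetical, and the Spatial-Guided Absorber is not modeled.

```python
import math


def dct_1d(signal):
    """Unnormalized DCT-II of a 1D sequence of floats."""
    n = len(signal)
    return [
        sum(signal[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def compress_token_track(track, keep):
    """Compress one token position tracked across T frames.

    track: list of T feature vectors (lists of floats of equal length).
    keep:  number of low-frequency coefficient vectors to retain (assumed knob).
    Returns (anchor, low_freq), where anchor is the first frame's vector and
    low_freq holds up to `keep` DCT coefficient vectors of the residuals.
    """
    anchor = track[0]  # assumption: the first frame serves as the spatial anchor
    dim = len(anchor)

    # Inter-frame residuals: difference between consecutive frames.
    residuals = [
        [track[t][d] - track[t - 1][d] for d in range(dim)]
        for t in range(1, len(track))
    ]
    keep = min(keep, len(residuals))

    # 1D DCT along the time axis, computed independently per feature dimension.
    per_dim = [dct_1d([r[d] for r in residuals]) for d in range(dim)]

    # Keep only the lowest `keep` frequencies, regrouped as coefficient vectors.
    low_freq = [[per_dim[d][k] for d in range(dim)] for k in range(keep)]
    return anchor, low_freq
```

For a token whose feature grows linearly over time, the residuals are constant, so all the energy lands in the zero-frequency coefficient and the higher ones vanish. That is the intuition behind discarding high-frequency temporal terms.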