Msjoe: Jointly evolving MLLM and sampler for efficient long-form video understanding. arXiv preprint arXiv:2602.22932, 2026.
Cited by 1 paper (field: cs.CL; year: 2026):
AVOC: Enhancing Hour-Level Audio-Video Understanding in Omni-Modal LLMs via Retrieval-Inspired Token Compression
AVOC is a retrieval-inspired token compression framework that improves long-form audio-video understanding in multimodal LLMs by selecting informative tokens based on classical IR principles.