arXiv preprint arXiv:2512.03405 (2025)
2 Pith papers cite this work. Polarity classification is still being indexed.
Pith papers citing it: 2
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)

Representative citing papers:
CodecCap introduces a keyframe-residual captioning structure inspired by video codecs to achieve higher-fidelity dense video captions than direct VLM generation.

Citing papers explorer:
- X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding
  The X-Stream benchmark, built with a dual-verification pipeline to avoid single-stream bias, shows that state-of-the-art (SOTA) MLLMs score ~50% on concurrent multi-stream tasks and lack proactive ability.
- CodecCap: High-Fidelity Codec-Inspired Residual Modeling for Dense Video Captioning
  CodecCap introduces a keyframe-residual captioning structure inspired by video codecs to achieve higher-fidelity dense video captions than direct VLM generation.