Streamingvlm: Real-time understanding for infinite video streams.arXiv preprint arXiv:2510.02295, 2025

Enxin Song, Wenhao Chai, Shusheng Yang, Ethan Armand, Xiaojun Shan, Haiyang Xu, Jianwen Xie, Zhuowen Tu · 2025 · arXiv 2510.02295

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction

cs.CV · 2026-05-17 · conditional · novelty 7.0

Omni-DuplexEval creates a new benchmark and LLM-as-a-Judge framework for real-time duplex omni-modal interaction, revealing that current models score below 40% overall and struggle especially with proactive responses.

BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations

cs.IR · 2025-12-15 · unverdicted · novelty 6.0

BlossomRec is a sparse attention mechanism that uses two distinct block-level patterns for long-term and short-term interests, fused by a gated output, to reduce computation in sequential recommendation Transformers.

Towards Effective Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval

cs.CV · 2025-12-09 · unverdicted · novelty 6.0

OneClip-RAG enables MLLMs to handle long videos via one-shot clip retrieval and unified chunking-retrieval, delivering performance gains like matching GPT-5 level on MLVU with high efficiency on standard GPUs.

citing papers explorer

Showing 3 of 3 citing papers.

Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction cs.CV · 2026-05-17 · conditional · none · ref 28
Omni-DuplexEval creates a new benchmark and LLM-as-a-Judge framework for real-time duplex omni-modal interaction, revealing that current models score below 40% overall and struggle especially with proactive responses.
BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations cs.IR · 2025-12-15 · unverdicted · none · ref 58
BlossomRec is a sparse attention mechanism that uses two distinct block-level patterns for long-term and short-term interests, fused by a gated output, to reduce computation in sequential recommendation Transformers.
Towards Effective Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval cs.CV · 2025-12-09 · unverdicted · none · ref 44
OneClip-RAG enables MLLMs to handle long videos via one-shot clip retrieval and unified chunking-retrieval, delivering performance gains like matching GPT-5 level on MLVU with high efficiency on standard GPUs.

Streamingvlm: Real-time understanding for infinite video streams.arXiv preprint arXiv:2510.02295, 2025

fields

years

verdicts

representative citing papers

citing papers explorer