pith. sign in

Don't Pause! Every prediction matters in a streaming video

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it
abstract

Streaming video models should respond the moment an event unfolds, not after the moment has passed. Yet existing online VideoQA benchmarks remain largely retrospective. They pause the video at fixed timestamps, pose questions about current or past events, and score models only at those moments. This protocol leaves streaming predictions untested. To close this gap, we introduce SPOT-Bench, featuring multi-turn proactive queries that evaluate general streaming perception and assistive capabilities required by an always-on, real-time assistant. SPOT-Bench comes with Timeliness-F1, a consolidated metric that measures streaming predictions by their temporal precision and balanced coverage across the entire video. Our benchmark reveals: (i) offline models detect events reliably but spam predictions unprompted; (ii) post-training for silence reduces spamming but induces unresponsiveness; (iii) half of the streaming video expects no response, which we term dead-time - compute spent here does not affect response latency. These findings motivate AsynKV, a training-free streaming adaptation of offline models, that retains their event perception while improving their streaming behavior. AsynKV features a long-short term memory, utilized efficiently by scaling compute during dead-time. It serves as a strong baseline on SPOT-Bench, outperforming existing streaming models, and achieves state-of-the-art on retrospective benchmarks.

fields

cs.CL 1 cs.CV 1

years

2026 2

verdicts

UNVERDICTED 2

clear filters

representative citing papers

Harnessing Streaming Video in the Wild

cs.CV · 2026-06-07 · unverdicted · novelty 6.0

Presents Streaming-Train-248K dataset, Streaming Harness system, and Streaming-Eval benchmark to enable VLMs for proactive, memory-equipped streaming video understanding.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • Harnessing Streaming Video in the Wild cs.CV · 2026-06-07 · unverdicted · none · ref 5 · internal anchor

    Presents Streaming-Train-248K dataset, Streaming Harness system, and Streaming-Eval benchmark to enable VLMs for proactive, memory-equipped streaming video understanding.