ViSpeak: Visual Instruction Feedback in Streaming Videos
6 Pith papers cite this work.
citing papers explorer
- Don't Pause! Every prediction matters in a streaming video
  SPOT-Bench tests real-time streaming video perception with timeliness metrics, exposing limitations in current models and introducing AsynKV as an improved baseline.
- StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos
  StreamGaze is a new benchmark and QA generation pipeline that measures how well MLLMs leverage gaze trajectories for temporal reasoning and proactive intention prediction in streaming egocentric videos.
- Don't Pause: Streaming Video-Language Synchrony for Online Video Understanding
  LyraV uses FDTC and SToP for per-frame incremental decoding to reach 98.29% video synchrony at 3.89 FPS while preserving general understanding.
- Decouple and Cache: KV Cache Construction for Streaming Video Understanding
  DSCache decouples cumulative past and instant KV caches with position-agnostic encoding to adapt offline VideoVLLMs to streaming video, delivering 2.5% average accuracy gains on QA benchmarks.
- Seed1.8 Model Card: Towards Generalized Real-World Agency
  Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.
- Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity
  Seed2.0 model series reports gains in reasoning, visual understanding, search, and reliability on intricate long-horizon tasks via an internal evaluation system.