Proactivevideoqa: A comprehensive benchmark evaluating proactive interactions in video large language models

Yueqian Wang, Xiaojun Meng, Yifan Wang, Huishuai Zhang, Dongyan Zhao · 2025 · arXiv 2507.09313

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

baseline 1 dataset 1

citation-polarity summary

baseline 1 use dataset 1

representative citing papers

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

cs.CV · 2026-06-01 · unverdicted · novelty 7.0 · 2 refs

X-Stream benchmark shows SOTA MLLMs score ~50% on concurrent multi-stream tasks and lack proactive ability, using a dual-verification pipeline to avoid single-stream bias.

An Efficient Streaming Video Understanding Framework with Agentic Control

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

R3-Streaming uses cascaded control with age-aware memory forgetting and TB-GRPO reinforcement learning to reach SOTA scores of 57.92 on OVO-Bench and 76.36 on StreamingBench with 95-96% fewer visual tokens.

StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

StreamPro introduces a benchmark and training method using CB-Stream Loss and GRPO to enable proactive decision-making in streaming videos, achieving 41.5 on StreamPro-Bench compared to 10.4 previously.

Don't Pause! Every prediction matters in a streaming video

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

SPOT-Bench tests real-time streaming video perception with timeliness metrics, exposing limitations in current models and introducing AsynKV as an improved baseline.

From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

ProVoice-Bench is the first framework to evaluate proactive voice agents, revealing that state-of-the-art multimodal LLMs struggle with over-triggering and context-aware reasoning.

IPIBench: Evaluating Interactive Proactive Intelligence of MLLMs under Continuous Streams

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

IPIBench evaluates MLLMs on interactive proactive intelligence in streaming videos, identifies unstable triggering and poor coordination, and proposes the training-free IPI-Agent framework to improve performance across settings.

Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction

cs.CV · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

Omni-DuplexEval provides a new benchmark and automatic evaluation method for real-time duplex omni-modal interaction, showing state-of-the-art models reach only 39.6% overall and 20% on proactive reminders.

citing papers explorer

Showing 7 of 7 citing papers after filters.

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding cs.CV · 2026-06-01 · unverdicted · none · ref 43 · 2 links
X-Stream benchmark shows SOTA MLLMs score ~50% on concurrent multi-stream tasks and lack proactive ability, using a dual-verification pipeline to avoid single-stream bias.
An Efficient Streaming Video Understanding Framework with Agentic Control cs.CV · 2026-05-18 · unverdicted · none · ref 27
R3-Streaming uses cascaded control with age-aware memory forgetting and TB-GRPO reinforcement learning to reach SOTA scores of 57.92 on OVO-Bench and 76.36 on StreamingBench with 95-96% fewer visual tokens.
StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video cs.CV · 2026-05-11 · unverdicted · none · ref 17
StreamPro introduces a benchmark and training method using CB-Stream Loss and GRPO to enable proactive decision-making in streaming videos, achieving 41.5 on StreamPro-Bench compared to 10.4 previously.
Don't Pause! Every prediction matters in a streaming video cs.CV · 2026-04-27 · unverdicted · none · ref 61
SPOT-Bench tests real-time streaming video perception with timeliness metrics, exposing limitations in current models and introducing AsynKV as an improved baseline.
From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench cs.AI · 2026-04-16 · unverdicted · none · ref 23
ProVoice-Bench is the first framework to evaluate proactive voice agents, revealing that state-of-the-art multimodal LLMs struggle with over-triggering and context-aware reasoning.
IPIBench: Evaluating Interactive Proactive Intelligence of MLLMs under Continuous Streams cs.CV · 2026-05-26 · unverdicted · none · ref 44
IPIBench evaluates MLLMs on interactive proactive intelligence in streaming videos, identifies unstable triggering and poor coordination, and proposes the training-free IPI-Agent framework to improve performance across settings.
Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction cs.CV · 2026-05-17 · unverdicted · none · ref 11 · 2 links
Omni-DuplexEval provides a new benchmark and automatic evaluation method for real-time duplex omni-modal interaction, showing state-of-the-art models reach only 39.6% overall and 20% on proactive reminders.

Proactivevideoqa: A comprehensive benchmark evaluating proactive interactions in video large language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer