Omni-DuplexEval creates a new benchmark and LLM-as-a-Judge framework for real-time duplex omni-modal interaction, revealing that current models score below 40% overall and struggle especially with proactive responses.
Spot-bench: Benchmarking real-time spoken proactive video understanding.arXiv preprint arXiv:2505.08765, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 2roles
dataset 1polarities
use dataset 1representative citing papers
Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.
citing papers explorer
-
Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
Omni-DuplexEval creates a new benchmark and LLM-as-a-Judge framework for real-time duplex omni-modal interaction, revealing that current models score below 40% overall and struggle especially with proactive responses.
-
How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace
Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.