VideoFDB is a new benchmark and LM-as-judge framework for evaluating full-duplex audio-visual-to-audio-visual conversational agents on nonverbal dynamics from real video calls.
10 • Jimin Tang et al.
6 Pith papers cite this work.
fields: cs.CV (6)
years: 2026 (6)
citing papers explorer
- VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents
  VideoFDB is a new benchmark and LM-as-judge framework for evaluating full-duplex audio-visual-to-audio-visual conversational agents on nonverbal dynamics from real video calls.
- Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models
  Wan-Streamer is a unified end-to-end Transformer for low-latency streaming audio-visual interaction using block-causal attention on interleaved multimodal tokens.
- VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors
  VidSplat iteratively synthesizes novel views with geometry-guided video diffusion to enable robust Gaussian splatting reconstruction from sparse or single-image inputs.
- LongCat-Video-Avatar 1.5 Technical Report
  LongCat-Video-Avatar 1.5 delivers an engineering-focused upgrade to audio-driven video generation with claimed competitive performance against closed-source systems on a 500-case benchmark.
- OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation
- Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey