VideoFDB is a new benchmark and LM-as-judge framework for evaluating full-duplex audio-visual-to-audio-visual conversational agents on nonverbal dynamics from real video calls.
Jimin Tang et al.
6 Pith papers cite this work.
Citation roles: background (1), baseline (1), method (1).
Fields: cs.CV (6).
Years: 2026 (6).
Representative citing papers:
Wan-Streamer is a unified end-to-end Transformer for low-latency streaming audio-visual interaction using block-causal attention on interleaved multimodal tokens.
VidSplat iteratively synthesizes novel views with geometry-guided video diffusion to enable robust Gaussian splatting reconstruction from sparse or single-image inputs.
LongCat-Video-Avatar 1.5 is an engineering-focused upgrade to audio-driven video generation that claims performance competitive with closed-source systems on a 500-case benchmark.