10•Jimin Tang et al

Yikang Ding, Jiwen Liu, Wenyuan Zhang, Zekun Wang, Wentao Hu, Liyuan Cui, Mingming Lao, Yingchao Shao, Hui Liu, Xiaohan Li, Ming Chen, Xiaoqiang Liu, Yu-shen Liu, Pengfei Wan · 2025 · arXiv 2509.09595

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · baseline: 1 · method: 1

citation-polarity summary

background: 1 · baseline: 1 · use method: 1

representative citing papers

VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents

cs.CV · 2026-05-28 · unverdicted · novelty 8.0

VideoFDB is a new benchmark and LM-as-judge framework for evaluating full-duplex audio-visual-to-audio-visual conversational agents on nonverbal dynamics from real video calls.

Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

cs.CV · 2026-06-23 · unverdicted · novelty 6.0

Wan-Streamer is a unified end-to-end Transformer for low-latency streaming audio-visual interaction using block-causal attention on interleaved multimodal tokens.

VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VidSplat iteratively synthesizes novel views with geometry-guided video diffusion to enable robust Gaussian splatting reconstruction from sparse or single-image inputs.

LongCat-Video-Avatar 1.5 Technical Report

cs.CV · 2026-05-26 · unverdicted · novelty 3.0

LongCat-Video-Avatar 1.5 delivers an engineering-focused upgrade to audio-driven video generation with claimed competitive performance against closed-source systems on a 500-case benchmark.

OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation

cs.CV · 2026-04-20

Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey

cs.CV · 2026-04-13

citing papers explorer

Showing 6 of 6 citing papers.

VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents cs.CV · 2026-05-28 · unverdicted · none · ref 12
Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models cs.CV · 2026-06-23 · unverdicted · none · ref 14
VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors cs.CV · 2026-05-12 · unverdicted · none · ref 6
LongCat-Video-Avatar 1.5 Technical Report cs.CV · 2026-05-26 · unverdicted · none · ref 17
OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation cs.CV · 2026-04-20 · unreviewed · ref 12
Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey cs.CV · 2026-04-13 · unreviewed · ref 131
