MultiTalk: Enhancing 3D talking head generation across languages with multilingual video dataset

Kim Sung-Bin, Lee Chae-Yeon, Gihun Son, Oh Hyun-Bin, Janghoon Ju, Suekyeong Nam, Tae-Hyun Oh · 2024 · arXiv 2406.14272

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body

cs.CV · 2025-12-16 · unverdicted · novelty 7.0

ViBES introduces a speech-language-behavior model using modality-specific transformer experts that jointly generates dialogue and 3D body actions, showing gains over separate co-speech and text-to-motion baselines on multi-turn metrics.

The DeepSpeak Dataset

cs.CV · 2024-08-09 · unverdicted · novelty 7.0

DeepSpeak provides over 100 hours of consented, identity-matched real and modern deepfake audiovisual content focused on talking heads, with evaluations showing existing detectors fail to generalize without retraining.

EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation

cs.CV · 2026-02-14 · unverdicted · novelty 4.0

EchoTorrent combines multi-teacher distillation, adaptive CFG calibration, hybrid long-tail forcing, and VAE decoder refinement to enable few-pass autoregressive streaming video generation with improved temporal consistency and audio-lip sync.

OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation

cs.CV · 2026-04-20

citing papers explorer

Showing 4 of 4 citing papers.

ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body cs.CV · 2025-12-16 · unverdicted · none · ref 101
ViBES introduces a speech-language-behavior model using modality-specific transformer experts that jointly generates dialogue and 3D body actions, showing gains over separate co-speech and text-to-motion baselines on multi-turn metrics.
The DeepSpeak Dataset cs.CV · 2024-08-09 · unverdicted · none · ref 49
DeepSpeak provides over 100 hours of consented, identity-matched real and modern deepfake audiovisual content focused on talking heads, with evaluations showing existing detectors fail to generalize without retraining.
EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation cs.CV · 2026-02-14 · unverdicted · none · ref 10
EchoTorrent combines multi-teacher distillation, adaptive CFG calibration, hybrid long-tail forcing, and VAE decoder refinement to enable few-pass autoregressive streaming video generation with improved temporal consistency and audio-lip sync.
OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation cs.CV · 2026-04-20 · unreviewed · ref 41

MultiTalk: Enhancing 3D talking head generation across languages with multilingual video dataset

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer