arXiv preprint arXiv:2504.02386 (2025)

Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng, Joon Son Chung, Tae-Hyun Oh, David Harwath · 2025 · arXiv 2504.02386

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

CoSyncDiT: Cognitive Synchronous Diffusion Transformer for Movie Dubbing

cs.SD · 2026-04-14 · unverdicted · novelty 7.0

CoSyncDiT is a cognitive-inspired diffusion transformer that achieves state-of-the-art lip synchronization and naturalness in movie dubbing by guiding noise-to-speech generation through acoustic, visual, and contextual stages plus joint regularization.

JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching

cs.CV · 2025-06-30 · unverdicted · novelty 6.0

JAM-Flow introduces a unified flow-matching model with a Multi-Modal Diffusion Transformer that jointly synthesizes facial motion and speech from text, audio, or motion inputs.

citing papers explorer

Showing 2 of 2 citing papers.

CoSyncDiT: Cognitive Synchronous Diffusion Transformer for Movie Dubbing · cs.SD · 2026-04-14 · unverdicted · none · ref 38
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching · cs.CV · 2025-06-30 · unverdicted · none · ref 39
