Lip Forcing distills a 14B-parameter bidirectional video diffusion teacher into autoregressive students that achieve real-time lip synchronization at 31 FPS using two denoising steps and no classifier-free guidance (CFG).
arXiv:2512.25066 [cs.CV] https://arxiv.org/abs/2512.25066 Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
representative citing papers
JUST-DUB-IT adapts a joint audio-visual diffusion model via LoRA to generate high-quality dubbed videos with translated audio and lip-synced facial motion.
MindFlow presents a neuroscience-inspired dual-stream generative model that uses chunk-state emotional modeling and conditional flow matching to produce facial animations with improved semantic fit and motion realism in dyadic conversations.
citing papers explorer
- JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion