Dream-talk: Diffusion-based realistic emotional audio-driven method for single image talking face generation, 2023
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs
FlashLips delivers 100+ FPS mask-free lip-sync by reconstructing target frames in latent space from an audio-predicted lips-pose vector using a compact U-Net trained solely on reconstruction losses and self-supervised mask removal.