DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs
FlashLips delivers 100+ FPS mask-free lip-sync by reconstructing target frames in latent space from an audio-predicted lips-pose vector using a compact U-Net trained solely on reconstruction losses and self-supervised mask removal.