Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks
classification
cs.CV
keywords
audio, networks, adversarial, conditional, face, generative, input, landmarks
abstract
We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. A recurrent neural network predicts mouth landmarks from audio features. A conditional generative adversarial network then produces a highly realistic face conditioned on a set of landmarks. Together, these two networks can produce a sequence of natural faces in sync with an input audio track.
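The two-stage data flow described in the abstract (audio features to mouth landmarks, then landmarks to face frames) can be sketched roughly as follows. This is a minimal, untrained illustration and not the authors' implementation: the names `LandmarkRNN`, `rasterize_landmarks`, and `synthesize`, the 13-dimensional audio features, the 20 landmarks, and the 8x8 output are all assumptions made for the example, and the rasterizer is only a placeholder standing in for the conditional GAN generator.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weights; untrained, used only to illustrate shapes and data flow."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LandmarkRNN:
    """Hypothetical stage 1: a simple recurrent net, audio features -> landmarks."""

    def __init__(self, n_audio, n_hidden, n_landmarks, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = make_matrix(n_hidden, n_audio, rng)
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(2 * n_landmarks, n_hidden, rng)

    def predict(self, audio_frames):
        hidden = [0.0] * self.n_hidden
        landmarks = []
        for features in audio_frames:
            from_input = matvec(self.w_in, features)
            from_state = matvec(self.w_rec, hidden)
            # The hidden state carries context from earlier audio frames.
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
            # One flattened (x0, y0, x1, y1, ...) landmark vector per frame.
            landmarks.append(matvec(self.w_out, hidden))
        return landmarks


def rasterize_landmarks(flat_landmarks, height=8, width=8):
    """Placeholder for stage 2 (NOT a GAN): marks each landmark on a blank grid."""
    image = [[0.0] * width for _ in range(height)]
    for x, y in zip(flat_landmarks[0::2], flat_landmarks[1::2]):
        # Squash coordinates into [0, 1], then scale to pixel indices.
        col = int(round((math.tanh(x) + 1) / 2 * (width - 1)))
        row = int(round((math.tanh(y) + 1) / 2 * (height - 1)))
        image[row][col] = 1.0
    return image


def synthesize(audio_frames, landmark_model):
    """Chain both stages: one output image per input audio frame."""
    return [rasterize_landmarks(lm) for lm in landmark_model.predict(audio_frames)]
```

For example, `synthesize([[0.1] * 13 for _ in range(5)], LandmarkRNN(13, 16, 20))` returns five 8x8 grids, one per audio frame. In a real system the placeholder would be replaced by a trained generator that takes the landmarks as its conditioning input.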
Forward citations
Cited by 1 Pith paper
Test-Time Self-Adaptive Conditioning for Stable Audio-Driven Talking-Head Generation
TT-SAC is a parameter-free inference framework that uses a generator-encoder feedback loop to adapt conditioning representations and stabilize identity and motion in audio-driven talking-head videos.