Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks

2018 · cs.CV · arXiv 1803.07461

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. Using a recurrent neural network, we predict mouth landmarks from audio features. We then use a conditional generative adversarial network to produce highly realistic faces conditioned on a set of landmarks. Together, these two networks can produce a sequence of natural faces in sync with an input audio track.
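
The two-stage structure described in the abstract (audio features to mouth landmarks, then landmarks to a face image) can be illustrated with a minimal data-flow sketch. Everything here is an assumption made for illustration: the feature size, hidden size, landmark count, the plain recurrent cell, the random untrained weights, and the names `LandmarkRNN`, `generate_face`, and `reenact` are hypothetical and are not taken from the paper. In particular, `generate_face` is only a placeholder that rasterises landmarks where a trained conditional generator would sit.

```python
import math
import random

# All sizes are illustrative assumptions, not values from the paper.
N_AUDIO = 13   # audio features per frame
N_HIDDEN = 8   # recurrent state size
N_POINTS = 20  # mouth landmarks per frame, each an (x, y) pair


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LandmarkRNN:
    """Stage 1 (hypothetical): recurrent cell mapping audio frames to landmarks."""

    def __init__(self, rng):
        self.w_in = rand_matrix(N_HIDDEN, N_AUDIO, rng)
        self.w_rec = rand_matrix(N_HIDDEN, N_HIDDEN, rng)
        self.w_out = rand_matrix(2 * N_POINTS, N_HIDDEN, rng)

    def predict(self, audio_frames):
        h = [0.0] * N_HIDDEN
        landmarks = []
        for frame in audio_frames:
            a = matvec(self.w_in, frame)
            b = matvec(self.w_rec, h)
            # The hidden state carries context from earlier audio frames.
            h = [math.tanh(x + y) for x, y in zip(a, b)]
            flat = matvec(self.w_out, h)
            landmarks.append(list(zip(flat[0::2], flat[1::2])))
        return landmarks


def generate_face(landmarks, size=16):
    """Stage 2 placeholder: a trained conditional generator would go here.

    This stub only marks each landmark on a size x size grid, to show that
    the output image is a function of the conditioning landmarks.
    """
    image = [[0.0] * size for _ in range(size)]
    for x, y in landmarks:
        # Map coordinates from roughly [-1, 1] to pixel indices, clamped.
        col = min(size - 1, max(0, int((x + 1) / 2 * size)))
        row = min(size - 1, max(0, int((y + 1) / 2 * size)))
        image[row][col] = 1.0
    return image


def reenact(audio_frames, rng):
    """Chain both stages: one output image per input audio frame."""
    rnn = LandmarkRNN(rng)
    return [generate_face(lm) for lm in rnn.predict(audio_frames)]


rng = random.Random(0)
audio = [[rng.gauss(0, 1) for _ in range(N_AUDIO)] for _ in range(5)]
frames = reenact(audio, rng)
```

The sketch shows only the interface between the stages: the landmark sequence is the sole signal passed from the audio model to the image model, so each output frame stays aligned with its audio frame.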

representative citing papers

Test-Time Self-Adaptive Conditioning for Stable Audio-Driven Talking-Head Generation

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

TT-SAC is a parameter-free inference framework that uses a generator-encoder feedback loop to adapt conditioning representations and stabilize identity and motion in audio-driven talking-head videos.

citing papers explorer

Test-Time Self-Adaptive Conditioning for Stable Audio-Driven Talking-Head Generation

cs.CV · 2026-05-25 · unverdicted · none · ref 22 · internal anchor
