Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks

Hamid Aghajan; Hosein Hasani; Seyed Ali Jalalifar

arxiv: 1803.07461 · v1 · pith:JZMQLUJQnew · submitted 2018-03-20 · 💻 cs.CV

Seyed Ali Jalalifar, Hosein Hasani, Hamid Aghajan

classification 💻 cs.CV

keywords audio, networks, adversarial, conditional, face, generative, input, landmarks

We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. Using a recurrent neural network, we predicted mouth landmarks from audio features. We exploited the power of conditional generative adversarial networks to produce highly realistic faces conditioned on a set of landmarks. Together, these two networks are capable of producing a sequence of natural faces in sync with an input audio track.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Test-Time Self-Adaptive Conditioning for Stable Audio-Driven Talking-Head Generation
cs.CV 2026-05 unverdicted novelty 6.0

TT-SAC is a parameter-free inference framework that uses a generator-encoder feedback loop to adapt conditioning representations and stabilize identity and motion in audio-driven talking-head videos.