AUHead uses audio-language models to generate Action Unit sequences from speech and feeds them into a controllable diffusion model to synthesize realistic emotional talking-head videos.
Animatable 3d-aware face image generation for video avatars.arXiv preprint arXiv:2210.06465,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
AUHead: Realistic Emotional Talking Head Generation via Action Units Control
AUHead uses audio-language models to generate Action Unit sequences from speech and feeds them into a controllable diffusion model to synthesize realistic emotional talking-head videos.