FLOAT: generative motion latent flow matching for audio-driven talking portrait. CoRR, abs/2412.01064

Taekyung Ki, Dongchan Min, Gyeongsu Chae · 2024 · arXiv 2412.01064

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

C-MET transfers emotions from speech to facial video by learning cross-modal semantic vectors with pretrained audio and disentangled expression encoders, yielding 14% higher emotion accuracy on MEAD and CREMA-D even for unseen emotions.

THEval. Evaluation Framework for Talking Head Video Generation

cs.CV · 2025-11-06 · conditional · novelty 6.0

THEval proposes eight metrics for evaluating talking head videos on quality, naturalness, and synchronization, tested on 85,000 videos from 17 models with a new curated dataset.

PortraitDirector: A Hierarchical Disentanglement Framework for Controllable and Real-time Facial Reenactment

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

PortraitDirector uses hierarchical disentanglement of spatial physical motions and semantic emotions to deliver controllable, high-fidelity real-time facial reenactment at 20 FPS.

citing papers explorer

Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video
cs.CV · 2026-04-09 · unverdicted · none · ref 25

THEval. Evaluation Framework for Talking Head Video Generation
cs.CV · 2025-11-06 · conditional · none · ref 4

PortraitDirector: A Hierarchical Disentanglement Framework for Controllable and Real-time Facial Reenactment
cs.CV · 2026-04-21 · unverdicted · none · ref 17
