Talker-T2AV achieves better lip-sync accuracy, video quality, and audio quality than dual-branch baselines by separating high-level shared autoregressive modeling from modality-specific low-level diffusion refinement in a joint audio-video generation framework.
Real3d-portrait: One-shot realistic 3d talking portrait synthesis
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 6roles
baseline 1polarities
baseline 1representative citing papers
AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.
UIKA is a feed-forward animatable Gaussian head model using UV-guided correspondence estimation and learnable UV tokens with dual-level attention, trained on large-scale synthetic data to handle pose-free inputs.
SDTalk proposes a generalizable one-shot 3DGS talking head method that uses structured facial priors for complete reconstruction and dual-branch motion fields for dynamics, outperforming prior identity-specific approaches.
FlexAvatar introduces bias sinks in a transformer to unify monocular and multi-view training, yielding complete 3D head avatars with strong generalization and view extrapolation from single images.
THEval proposes eight metrics for evaluating talking head videos on quality, naturalness, and synchronization, tested on 85,000 videos from 17 models with a new curated dataset.
citing papers explorer
-
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling
Talker-T2AV achieves better lip-sync accuracy, video quality, and audio quality than dual-branch baselines by separating high-level shared autoregressive modeling from modality-specific low-level diffusion refinement in a joint audio-video generation framework.
-
AvatarPointillist: AutoRegressive 4D Gaussian Avatarization
AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.
-
UIKA: Fast Universal Head Avatar from Pose-Free Images
UIKA is a feed-forward animatable Gaussian head model using UV-guided correspondence estimation and learnable UV tokens with dual-level attention, trained on large-scale synthetic data to handle pose-free inputs.
-
SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis
SDTalk proposes a generalizable one-shot 3DGS talking head method that uses structured facial priors for complete reconstruction and dual-branch motion fields for dynamics, outperforming prior identity-specific approaches.
-
FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision
FlexAvatar introduces bias sinks in a transformer to unify monocular and multi-view training, yielding complete 3D head avatars with strong generalization and view extrapolation from single images.
-
THEval. Evaluation Framework for Talking Head Video Generation
THEval proposes eight metrics for evaluating talking head videos on quality, naturalness, and synchronization, tested on 85,000 videos from 17 models with a new curated dataset.