LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang · 2024 · arXiv 2407.03168

Mixed citation behavior. The most common role is background (60% of classified citations).

18 Pith papers cite it.

citation-role summary

background: 3 · dataset: 1 · method: 1

citation-polarity summary

background: 3 · use dataset: 1 · use method: 1

representative citing papers

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

Talker-T2AV achieves better lip-sync accuracy, video quality, and audio quality than dual-branch baselines by separating high-level shared autoregressive modeling from modality-specific low-level diffusion refinement in a joint audio-video generation framework.

AvatarPointillist: AutoRegressive 4D Gaussian Avatarization

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

AvatarPointillist autoregressively generates adaptive 3D point clouds via Transformer for photorealistic 4D Gaussian avatars from one image, jointly predicting animation bindings and using a conditioned Gaussian decoder.

UIKA: Fast Universal Head Avatar from Pose-Free Images

cs.CV · 2026-01-12 · conditional · novelty 7.0

UIKA is a feed-forward animatable Gaussian head model using UV-guided correspondence estimation and learnable UV tokens with dual-level attention, trained on large-scale synthetic data to handle pose-free inputs.

ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body

cs.CV · 2025-12-16 · unverdicted · novelty 7.0

ViBES introduces a speech-language-behavior model using modality-specific transformer experts that jointly generates dialogue and 3D body actions, showing gains over separate co-speech and text-to-motion baselines on multi-turn metrics.

Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

cs.CV · 2025-12-11 · unverdicted · novelty 7.0

Omni-Attribute is a new open-vocabulary image attribute encoder trained on semantically linked pairs with dual objectives to produce disentangled representations for personalization and compositional generation.

Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing

cs.CV · 2025-10-03 · unverdicted · novelty 7.0

A pose-conditioned large-margin contrastive encoder isolates persistent biometric identity cues from transmitted latents in talking-head videoconferencing to flag impersonation attacks via cosine similarity without inspecting the output video.

FluentAvatar: Flicker-Free Talking-Head Animation via Phoneme-Guided Autoregressive Modeling

cs.CV · 2025-09-15 · unverdicted · novelty 7.0

Phoneme-guided autoregressive framework for talking-head animation that reduces inter-frame flicker via causal keyframe generation and timestamp-aware interpolation, outperforming diffusion baselines on FVD and a new BG-Flicker metric.

Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer

cs.CV · 2025-09-04 · conditional · novelty 7.0

Durian introduces a dual-reference diffusion model trained via self-reconstruction on video frames to enable cross-identity attribute transfer in portrait animations, supporting multi-attribute composition and interpolation.

TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.

The Alpha Blending Hypothesis: Compositing Shortcut in Deepfake Detection

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Deepfake detectors act as alpha blending searchers; training solely on self-blended real images yields top cross-dataset generalization on 15 datasets without using synthetic deepfakes.

SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

SocialDirector uses spatiotemporal actor masking and directional reweighting on cross-attention maps to reduce actor-action mismatches and improve target-directed interactions in generated multi-person videos.

MeshLAM: Feed-Forward One-Shot Animatable Textured Mesh Avatar Reconstruction

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

MeshLAM reconstructs high-fidelity animatable textured mesh head avatars from a single image via a feed-forward dual shape-texture architecture with iterative GRU decoding and reprojection-based guidance.

ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

ARGen generates high-fidelity dynamic facial expression videos using affective semantic injection and adaptive reinforcement diffusion to improve emotion recognition models facing data scarcity and long-tail distributions.

JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching

cs.CV · 2025-06-30 · unverdicted · novelty 6.0

JAM-Flow introduces a unified flow-matching model with a Multi-Modal Diffusion Transformer that jointly synthesizes facial motion and speech from text, audio, or motion inputs.

EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

EasyVFX decouples VFX generation via frequency-aware Mixture-of-Experts and test-time training to achieve realistic effects with limited resources.

PortraitDirector: A Hierarchical Disentanglement Framework for Controllable and Real-time Facial Reenactment

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

PortraitDirector uses hierarchical disentanglement of spatial physical motions and semantic emotions to deliver controllable, high-fidelity real-time facial reenactment at 20 FPS.

JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation

cs.CV · 2024-11-14 · unverdicted · novelty 5.0

JoyVASA decouples static 3D facial representations from identity-independent dynamic motion sequences generated by a diffusion transformer to produce audio-driven animations for humans and animals.

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

cs.CV · 2026-04-06

citing papers explorer

Showing 18 of 18 citing papers.

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling
cs.CV · 2026-04-26 · unverdicted · none · ref 4

AvatarPointillist: AutoRegressive 4D Gaussian Avatarization
cs.CV · 2026-04-06 · unverdicted · none · ref 19

UIKA: Fast Universal Head Avatar from Pose-Free Images
cs.CV · 2026-01-12 · conditional · none · ref 23

ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body
cs.CV · 2025-12-16 · unverdicted · none · ref 35

Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
cs.CV · 2025-12-11 · unverdicted · none · ref 21

Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing
cs.CV · 2025-10-03 · unverdicted · none · ref 72

FluentAvatar: Flicker-Free Talking-Head Animation via Phoneme-Guided Autoregressive Modeling
cs.CV · 2025-09-15 · unverdicted · none · ref 8

Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
cs.CV · 2025-09-04 · conditional · none · ref 5

TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation
cs.CV · 2026-05-14 · unverdicted · none · ref 137

The Alpha Blending Hypothesis: Compositing Shortcut in Deepfake Detection
cs.CV · 2026-05-11 · unverdicted · none · ref 9

SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation
cs.CV · 2026-05-11 · unverdicted · none · ref 16

MeshLAM: Feed-Forward One-Shot Animatable Textured Mesh Avatar Reconstruction
cs.CV · 2026-04-23 · unverdicted · none · ref 19

ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception
cs.CV · 2026-04-14 · unverdicted · none · ref 12

JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching
cs.CV · 2025-06-30 · unverdicted · none · ref 7

EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation
cs.CV · 2026-05-21 · unverdicted · none · ref 9

PortraitDirector: A Hierarchical Disentanglement Framework for Controllable and Real-time Facial Reenactment
cs.CV · 2026-04-21 · unverdicted · none · ref 11

JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation
cs.CV · 2024-11-14 · unverdicted · none · ref 17

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
cs.CV · 2026-04-06 · unreviewed · ref 38
