Latent image animator: Learning to animate images via latent space navigation
4 Pith papers cite this work.
citing papers explorer
- Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing
  A pose-conditioned large-margin contrastive encoder isolates persistent biometric identity cues from transmitted latents in talking-head videoconferencing to flag impersonation attacks via cosine similarity without inspecting the output video.
- Learning Interactive Real-World Simulators
  UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
- Personalized Cross-Modal Emotional Correlation Learning for Speech-Preserving Facial Expression Manipulation
  PCMECL improves speech-preserving facial expression manipulation by learning personalized prompts from individual visuals and using feature differencing to align visual and semantic changes from VLMs.
- AUHead: Realistic Emotional Talking Head Generation via Action Units Control
  AUHead uses audio-language models to generate Action Unit sequences from speech and feeds them into a controllable diffusion model to synthesize realistic emotional talking-head videos.