Hallo3: Highly dynamic and realistic portrait image animation with diffusion transformer networks

Jiahui Cheng, Yuqi Ma, Shan Mu, Hang Zhou, Jingdong Wang, Siyu Zhu · 2024 · arXiv 2412.00733

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SyncBreaker:Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.

JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching

cs.CV · 2025-06-30 · unverdicted · novelty 6.0

JAM-Flow introduces a unified flow-matching model with a Multi-Modal Diffusion Transformer that jointly synthesizes facial motion and speech from text, audio, or motion inputs.

citing papers explorer

Showing 2 of 2 citing papers.

SyncBreaker:Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation cs.CV · 2026-04-09 · unverdicted · none · ref 7
SyncBreaker jointly attacks image and audio streams with Multi-Interval Sampling and Cross-Attention Fooling to degrade speech-driven talking head generation more than single-modality baselines.
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching cs.CV · 2025-06-30 · unverdicted · none · ref 44
JAM-Flow introduces a unified flow-matching model with a Multi-Modal Diffusion Transformer that jointly synthesizes facial motion and speech from text, audio, or motion inputs.

Hallo3: Highly dynamic and realistic portrait image animation with diffusion transformer networks

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer