Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling

Andong Wang; Heung-Yeung Shum; Hongfa Wang; Jingyun Xue; Kaihao Zhang; Mengyang Liu; Qi Tian; Shaobo Min; Wei Liu; Wenhan Luo

arxiv: 2406.03035 · v4 · pith:PHWFLORLnew · submitted 2024-06-05 · 💻 cs.CV

Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

classification 💻 cs.CV

keywords background, character, characters, image, animation, multiple, information, decoupling


Controllable character image animation has a wide range of applications. Although existing studies have steadily improved performance, challenges remain, particularly in keeping complex backgrounds stable and in handling multiple characters. To address these challenges, we propose a novel multi-condition guided framework for character image animation that employs several input modules designed to enhance the model's implicit decoupling capability. First, the optical flow guider computes the background optical flow map as guidance, which lets the model implicitly learn during training to decouple background motion into background constants and background momentum, and to generate a stable background at inference by setting the background momentum to zero. Second, the depth order guider computes an order map of the characters, transforming depth information into positional information for multiple characters. This facilitates implicit learning to decouple different characters, especially to accurately separate their occluded body parts. Third, the reference pose map is provided as input to strengthen the decoupling of character texture and pose information in the reference image. Furthermore, to fill the gap in fair evaluation of multi-character image animation, we propose a new benchmark comprising about 4,000 frames. Extensive qualitative and quantitative evaluations demonstrate that our method excels at generating high-quality character animations, especially in scenarios with complex backgrounds and multiple characters.
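The abstract does not say how the two guidance maps are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a per-pixel depth map where smaller values are nearer, one non-empty boolean mask per character, and a dense optical flow array of shape (H, W, 2). The function names `depth_order_map` and `background_flow` are hypothetical.

```python
import numpy as np

def depth_order_map(depth, masks):
    """Turn depth into a per-pixel character order map (illustrative only).

    Assumptions (not from the paper): smaller depth means nearer, and each
    mask is a non-empty boolean array with the same shape as `depth`.
    Returns an int map: 0 = background, 1 = nearest character, 2 = next, ...
    """
    medians = [float(np.median(depth[m])) for m in masks]
    order = np.argsort(medians)  # character indices, nearest first
    order_map = np.zeros(depth.shape, dtype=np.int32)
    # Paint the farthest character first so nearer ones overwrite
    # any overlapping (occluded) pixels.
    for rank in range(len(order) - 1, -1, -1):
        order_map[masks[order[rank]]] = rank + 1
    return order_map

def background_flow(flow, masks, inference=False):
    """Keep optical flow only on background pixels (illustrative only).

    At inference the whole map is zeroed, which is one way to read
    "setting zero background momentum" in the abstract.
    """
    if inference:
        return np.zeros_like(flow)
    foreground = np.zeros(flow.shape[:2], dtype=bool)
    for m in masks:
        foreground |= m
    bg = flow.copy()
    bg[foreground] = 0.0  # drop character motion, keep background motion
    return bg

# Tiny example: two characters whose masks overlap in column 1.
depth = np.array([[1.0, 1.0, 3.0, 3.0]] * 2)
mask_a = np.zeros((2, 4), dtype=bool)
mask_a[:, :2] = True   # nearer character
mask_b = np.zeros((2, 4), dtype=bool)
mask_b[:, 1:] = True   # farther character
order_map = depth_order_map(depth, [mask_b, mask_a])
flow = np.ones((2, 4, 2))
train_flow = background_flow(flow, [mask_a])
infer_flow = background_flow(flow, [mask_a], inference=True)
```

In this sketch the overlapping column is assigned to the nearer character, which is the kind of positional cue the order map is described as providing for occluded body parts.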


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Through the PRISM: Preference Representation in Intermediate States of Video Diffusion Models
cs.CV 2026-06 unverdicted novelty 7.0

PRISM shows video diffusion models inherently encode preference information in noisy latents, achieving SOTA accuracy and enabling noise-robust early-stage sampling with a correlation to generative performance.

HunyuanVideo: A Systematic Framework For Large Video Generative Models
cs.CV 2024-12 unverdicted novelty 5.0

HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models inclu...