Mojito: Motion Trajectory and Intensity Control for Video Generation

Jianwei Yang; Kuan Wang; Olatunji Ruwase; Shuohang Wang; Xiaoxia Wu; Xin Eric Wang; Xuehai He; Yelong Shen; Yiping Wang; Zheng Zhan

arxiv: 2412.08948 · v2 · pith:MQ4ISQDJnew · submitted 2024-12-12 · 💻 cs.CV · cs.CL

Xuehai He, Shuohang Wang, Jianwei Yang, Xiaoxia Wu, Yiping Wang, Kuan Wang, Zheng Zhan, Olatunji Ruwase, Yelong Shen, Xin Eric Wang

keywords motion, intensity, control, mojito, diffusion, trajectory, video, directional

Recent advancements in diffusion models have shown great promise in producing high-quality video content. However, efficiently training video diffusion models capable of integrating directional guidance and controllable motion intensity remains a challenging and under-explored area. To tackle these challenges, this paper introduces Mojito, a diffusion model that incorporates both motion trajectory and intensity control for text-to-video generation. Specifically, Mojito features a Directional Motion Control (DMC) module that leverages cross-attention to efficiently direct the generated object's motion without training, alongside a Motion Intensity Modulator (MIM) that uses optical flow maps generated from videos to guide varying levels of motion intensity. Extensive experiments demonstrate Mojito's effectiveness in achieving precise trajectory and intensity control with high computational efficiency, generating motion patterns that closely match specified directions and intensities, providing realistic dynamics that align well with natural motion in real-world scenarios.
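
As a rough illustration of how an optical flow map can be reduced to a single motion-intensity value, the sketch below averages the per-pixel flow magnitude of a frame pair and then averages that over a clip. The abstract does not say how Mojito's MIM computes or encodes intensity, so the mean-magnitude definition, the function names `motion_intensity` and `video_motion_intensity`, and the grid-of-tuples flow format are all assumptions made for this example.

```python
import math

def motion_intensity(flow):
    """Mean optical-flow magnitude for one frame pair.

    `flow` is a hypothetical H x W grid of (dx, dy) displacement tuples;
    this format is an assumption, not taken from the paper.
    """
    magnitudes = [math.hypot(dx, dy) for row in flow for dx, dy in row]
    return sum(magnitudes) / len(magnitudes)

def video_motion_intensity(flows):
    """Average the per-pair intensity over all frame pairs in a clip."""
    return sum(motion_intensity(f) for f in flows) / len(flows)

# Two toy 2x2 flow fields: one with no motion, one where half the pixels move.
static = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
moving = [[(3.0, 4.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]]

print(motion_intensity(static))
print(motion_intensity(moving))
print(video_motion_intensity([static, moving]))
```

A scalar like this could in principle serve as a conditioning signal for a generator. Whether Mojito uses a scalar, a binned level, or the full flow map is not stated in the abstract.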

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EMA: Effort Metric Attention for Anatomical Effort-Guided Human Motion Diffusion
cs.CV · 2026-05 · unverdicted · novelty 6.0

EMA is a new cross-attention module that uses two kinematic metrics to approximate LMA effort factors and enables numerical, region-wise control of motion intensity in human motion diffusion models.