MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Zhouxia Wang, Ziyang Yuan, Xintao Wang, Yaowei Li, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan · 2024

3 Pith papers cite this work.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

Reshoot-Anything trains a diffusion transformer on pseudo multi-view triplets created by cropping and warping monocular videos to achieve temporally consistent video reshooting with robust camera control on dynamic scenes.

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

cs.CV · 2026-03-31 · unverdicted · novelty 6.0

HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.

Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

cs.CV · 2025-11-21 · unverdicted · novelty 6.0

Fine-tuning text-to-video models on sparse low-quality synthetic data for physical camera controls outperforms fine-tuning on photorealistic data.

citing papers explorer

Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting cs.CV · 2026-04-23 · unverdicted · none · ref 38
HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis cs.CV · 2026-03-31 · unverdicted · none · ref 70
Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation cs.CV · 2025-11-21 · unverdicted · none · ref 45