MotiMotion adds visual reasoning via a training-free VLM to refine primary trajectories and hallucinate secondary motions, plus a confidence-aware guidance scheme, yielding more plausible interactions on the new MotiBench benchmark.
arXiv preprint arXiv:2512.02015 (2025)
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 4years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
PREX decomposes target 4D video volumes into Preserve, Reveal, and Expand roles with a region-aware adapter on a frozen diffusion backbone, trained via proxy tasks, and introduces the PREBench benchmark to reduce region-structured editing failures.
A synthetic data pipeline and fine-tuned video model enable generative editing to move object 3D trajectories in videos while keeping relative motion.
A new occlusion-aware control module generates high-fidelity egocentric videos from sparse 3D hand joints, supported by a million-clip dataset and cross-embodiment benchmark.
citing papers explorer
-
MotiMotion: Motion-Controlled Video Generation with Visual Reasoning
MotiMotion adds visual reasoning via a training-free VLM to refine primary trajectories and hallucinate secondary motions, plus a confidence-aware guidance scheme, yielding more plausible interactions on the new MotiBench benchmark.
-
Preserve, Reveal, Expand: Faithful 4D Video Editing with Region-Aware Conditioning
PREX decomposes target 4D video volumes into Preserve, Reveal, and Expand roles with a region-aware adapter on a frozen diffusion backbone, trained via proxy tasks, and introduces the PREBench benchmark to reduce region-structured editing failures.
-
TrajectoryMover: Generative Movement of Object Trajectories in Videos
A synthetic data pipeline and fine-tuned video model enable generative editing to move object 3D trajectories in videos while keeping relative motion.
-
Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints
A new occlusion-aware control module generates high-fidelity egocentric videos from sparse 3D hand joints, supported by a million-clip dataset and cross-embodiment benchmark.