Trajectory attention for fine-grained video motion control

URL: https://arxiv · 2024 · arXiv 2411.19324

8 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary: background 2, baseline 2

citation-polarity summary: background 2, baseline 2

representative citing papers

DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis

cs.CV · 2026-05-16 · unverdicted · novelty 7.0

DEVIS-GRPO applies online policy gradients with an accumulative small-to-large view sampling strategy and multi-level rewards to improve trajectory-controlled extreme view video generation, reporting gains on the Kubric-4D, iPhone, and DL3DV datasets.

Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

Reshoot-Anything trains a diffusion transformer on pseudo multi-view triplets created by cropping and warping monocular videos to achieve temporally consistent video reshooting with robust camera control on dynamic scenes.

$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement

cs.CV · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

h-control augments hard-replacement guidance with block-conditional pseudo-Gibbs refinement on unobserved latent sites and adaptive 3D patch freezing to achieve superior FVD on RealEstate10K and DAVIS.

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos, built on the STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.

Real2SAM2Real: Generative 3D Caches as Complementary Context for Video Diffusion

cs.CV · 2026-05-29 · unverdicted · novelty 5.0

Real2SAM2Real uses 3D caches from lifting models as complementary context for video diffusion models to enable precise decoupled control over camera trajectories and multi-entity motions while maintaining spatiotemporal consistency.

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

cs.CV · 2026-05-25 · unverdicted · novelty 5.0

Pantheon360 introduces a controllable 360° video diffusion framework that uses an explicit 3D cache from sparse inputs to enforce geometric consistency for digital twin generation.

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

cs.CV · 2026-04-27 · unverdicted · novelty 4.0 · 3 refs

World-R1 applies reinforcement learning via Flow-GRPO and a text dataset to align text-to-video models with 3D constraints from pre-trained foundation models, improving consistency while keeping original visual quality.

Making Time Editable in Video Diffusion Transformers

cs.CV · 2026-06-08 · unverdicted · novelty 3.0

This work introduces a lightweight temporal module that extends pretrained DiT video models with time-editing capabilities while preserving the original generative prior.

citing papers explorer


DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis · cs.CV · 2026-05-16 · unverdicted · none · ref 58
Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting · cs.CV · 2026-04-23 · unverdicted · none · ref 41
$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement · cs.CV · 2026-05-12 · unverdicted · none · ref 34 · 2 links
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling · cs.CV · 2026-04-08 · unverdicted · none · ref 90
Real2SAM2Real: Generative 3D Caches as Complementary Context for Video Diffusion · cs.CV · 2026-05-29 · unverdicted · none · ref 40
Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion · cs.CV · 2026-05-25 · unverdicted · none · ref 82
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation · cs.CV · 2026-04-27 · unverdicted · none · ref 51 · 3 links
Making Time Editable in Video Diffusion Transformers · cs.CV · 2026-06-08 · unverdicted · none · ref 11
