CoMoVi co-generates 3D human motions and 2D videos synchronously in a single diffusion denoising loop using 3D-to-2D projection and dual-branch diffusion with 3D-2D cross attentions.
arXiv preprint arXiv:2507.21045 (2025)
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 4
verdicts
UNVERDICTED 4
representative citing papers
HA-HOI produces physically plausible 4D HOI animations from monocular videos by anchoring object reconstruction to human motion and refining the result in a physics-based humanoid-object simulator.
Stitch4D reconstructs coherent 4D urban scenes from sparse non-overlapping camera placements by synthesizing bridge views and enforcing inter-location spatio-temporal consistency.
PAGE-4D is a feedforward extension of VGGT that uses a dynamics-aware aggregator and mask to disentangle pose estimation from geometry reconstruction in videos with moving objects.
citing papers explorer
- CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos
  CoMoVi co-generates 3D human motions and 2D videos synchronously in a single diffusion denoising loop using 3D-to-2D projection and dual-branch diffusion with 3D-2D cross attentions.
- Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos
  HA-HOI produces physically plausible 4D HOI animations from monocular videos by anchoring object reconstruction to human motion and refining the result in a physics-based humanoid-object simulator.
- Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation
  Stitch4D reconstructs coherent 4D urban scenes from sparse non-overlapping camera placements by synthesizing bridge views and enforcing inter-location spatio-temporal consistency.
- PAGE-4D: VGGT-4D Perception via Disentangled Pose and Geometry Estimation
  PAGE-4D is a feedforward extension of VGGT that uses a dynamics-aware aggregator and mask to disentangle pose estimation from geometry reconstruction in videos with moving objects.