A training-free method reformulates camera control as geometric displacement fields applied via differentiable latent resampling, enabling control and bias probing in video diffusion models.
arXiv preprint arXiv:2601.14674 (2026) 6
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
representative citing papers
VolFill uses a hybrid 3D VAE to compress sparse truncated unsigned distance function grids into latent space and a latent Diffusion Transformer to denoise complete scenes, conditioned on geometry foundation models, outperforming baselines on SCRREAM and NRGB-D datasets.
Syn4D is a new multiview synthetic 4D dataset supplying dense ground-truth annotations for dynamic scene reconstruction, tracking, and human pose estimation.
citing papers explorer
- Probing into Camera Control of Video Models
A training-free method reformulates camera control as geometric displacement fields applied via differentiable latent resampling, enabling control and bias probing in video diffusion models.
- VolFill: Single-View Amodal 3D Scene Reconstruction with Volumetric Flow Matching
VolFill uses a hybrid 3D VAE to compress sparse truncated unsigned distance function grids into latent space and a latent Diffusion Transformer to denoise complete scenes, conditioned on geometry foundation models, outperforming baselines on SCRREAM and NRGB-D datasets.
- Syn4D: A Multiview Synthetic 4D Dataset
Syn4D is a new multiview synthetic 4D dataset supplying dense ground-truth annotations for dynamic scene reconstruction, tracking, and human pose estimation.