ChopGrad truncates backpropagation to local frame windows in video diffusion models, reducing memory from linear in frame count to constant while enabling pixel-wise loss fine-tuning.
Orient anything: Learning robust object orientation estimation from rendering 3d mod- els
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3verdicts
UNVERDICTED 3representative citing papers
GenHSI is a training-free three-stage pipeline that turns a scene image, character image, and complex HSI prompt into long videos with plausible chained interactions by generating atomic actions, 3D keyframes via 2D inpainting plus optimization, and then feeding them to pre-trained video diffusion.
Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.
citing papers explorer
-
ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation
ChopGrad truncates backpropagation to local frame windows in video diffusion models, reducing memory from linear in frame count to constant while enabling pixel-wise loss fine-tuning.
-
GenHSI: Controllable Generation of Human-Scene Interaction Videos
GenHSI is a training-free three-stage pipeline that turns a scene image, character image, and complex HSI prompt into long videos with plausible chained interactions by generating atomic actions, 3D keyframes via 2D inpainting plus optimization, and then feeding them to pre-trained video diffusion.
-
Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images
Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.