LDM3D: Latent Diffusion Model for 3D, 2023
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible
  World Tracing introduces a multi-layer pixel-aligned 3D point representation instantiated via a diffusion transformer (WT-DiT) trained with pixel-space flow matching to jointly reconstruct visible surfaces and generate occluded geometry.
- UniGP: Taming Diffusion Transformer for Prior-Preserved Unified Generation and Perception
  UniGP unifies controllable generation and dense prediction in an MMDiT-based diffusion model through simple joint training that preserves backbone priors.
- Modality Forcing for Scalable Spatial Generation
  Modality Forcing lets a single DiT produce image and depth outputs in any order after training on sparse real-world depth, with larger image-pretrained models yielding better depth accuracy and a 57% AbsRel reduction versus prior joint generative baselines.