LDM3D: Latent Diffusion Model for 3D, 2023

Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal · 2023 · arXiv 2305.10853

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

cs.CV · 2026-06-11 · unverdicted · novelty 7.0

World Tracing introduces a multi-layer pixel-aligned 3D point representation instantiated via a diffusion transformer (WT-DiT) trained with pixel-space flow matching to jointly reconstruct visible surfaces and generate occluded geometry.

UniGP: Taming Diffusion Transformer for Prior-Preserved Unified Generation and Perception

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

UniGP unifies controllable generation and dense prediction in an MMDiT-based diffusion model through simple joint training that preserves backbone priors.

Modality Forcing for Scalable Spatial Generation

cs.CV · 2026-06-11 · unverdicted · novelty 6.0

Modality Forcing lets a single DiT produce image and depth outputs in any order after training on sparse real-world depth, with larger image-pretrained models yielding better depth accuracy and a 57% AbsRel reduction versus prior joint generative baselines.

citing papers explorer

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible · cs.CV · 2026-06-11 · unverdicted · none · ref 60
UniGP: Taming Diffusion Transformer for Prior-Preserved Unified Generation and Perception · cs.CV · 2026-06-29 · unverdicted · none · ref 23
Modality Forcing for Scalable Spatial Generation · cs.CV · 2026-06-11 · unverdicted · none · ref 33
