3 Pith papers cite this work.
Citing papers
- OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation
  OcclusionFormer adds explicit Z-order modeling via a new SA-Z dataset and volume-rendering compositing in a diffusion transformer to resolve occlusion ambiguities in layout-grounded image synthesis.
- MIND: Decoupling Model-Induced Label Noise via Latent Manifold Disentanglement
  MIND decouples high-dimensional model-induced label noise into subspace components via latent manifold disentanglement and a Latent Decoupling Estimator.
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
  PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days, using decomposed training stages, cross-attention text injection, and dense captions generated by a vision-language model.