2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
citing papers explorer
- OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation
  OcclusionFormer adds explicit Z-order modeling via a new SA-Z dataset and volume-rendering compositing in a diffusion transformer to resolve occlusion ambiguities in layout-grounded image synthesis.
- Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
  A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.