A pixel-space Diffusion Transformer with Unified Transformer architecture unifies image generation, editing, and personalization in an end-to-end model that maps all inputs to a shared token space and scales from 8B to over 200B parameters.
arXiv preprint arXiv:2411.06558 (2024)
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
A training-free method with time-dependent attention gating and trajectory pruning enhances object-background balance in diffusion-based image synthesis.
citing papers explorer
-
HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer
A pixel-space Diffusion Transformer with Unified Transformer architecture unifies image generation, editing, and personalization in an end-to-end model that maps all inputs to a shared token space and scales from 8B to over 200B parameters.
-
Training-Free Object-Background Compositional T2I via Dynamic Spatial Guidance and Multi-Path Pruning
A training-free method with time-dependent attention gating and trajectory pruning enhances object-background balance in diffusion-based image synthesis.