Mean-Variance Split residuals separate centered variation from mean updates to prevent collapse and enable stable training of 1000-layer Diffusion Transformers.
Lumina-image 2.0: A unified and efficient image generative framework, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.
citing papers explorer
-
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers
Mean-Variance Split residuals separate centered variation from mean updates to prevent collapse and enable stable training of 1000-layer Diffusion Transformers.
-
Nucleus-Image: Sparse MoE for Image Generation
A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.