The paper decomposes masked diffusion model training variance into masking pattern noise, masking rate noise, and data noise, then introduces P-POTS and MIRROR to reduce variance and close the performance gap with autoregressive models.
A.3 RELATED WORK. To reduce training variance in diffusion models, the following strategies have been proposed: • Meng et al
1 Pith paper cites this work.
Fields: cs.LG (1). Years: 2025 (1). Verdicts: UNVERDICTED (1).
Representative citing paper:
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models