The paper decomposes masked diffusion model training variance into masking pattern noise, masking rate noise, and data noise, then introduces P-POTS and MIRROR to reduce variance and close the performance gap with autoregressive models.
By the law of total variance, Varx0,t,xt(Y) =E x0 [Vart,xt(Y|x 0)] + Varx0 [Et,xt(Y|x 0)]
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models
The paper decomposes masked diffusion model training variance into masking pattern noise, masking rate noise, and data noise, then introduces P-POTS and MIRROR to reduce variance and close the performance gap with autoregressive models.