The paper decomposes masked diffusion model training variance into masking pattern noise, masking rate noise, and data noise, then introduces P-POTS and MIRROR to reduce variance and close the performance gap with autoregressive models.
A.5.2 P-POTS: PARETO-OPTIMAL PROPERTY. Consider the reweighted estimator $\frac{1}{p(t)}\,\ell_\theta(x_0, t, x_t)$, $t \sim p(t)$
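The reweighting in the excerpt above is the standard importance-sampling form: dividing the loss by p(t) keeps the expectation over the masking rate t unchanged whichever density p is used, while the variance does depend on p. The sketch below illustrates only that generic property. The toy loss `toy_loss`, the two samplers, and all constants are assumptions made for illustration; they are not the paper's loss and not the P-POTS sampler.

```python
import math
import random

def toy_loss(t):
    # Hypothetical stand-in for l_theta(x0, t, xt): large at small t, small at large t.
    return 1.0 / (t + 0.05)

def sample_uniform(rng):
    # Uniform masking rate on [0, 1): density p(t) = 1.
    return rng.random(), 1.0

def sample_skewed(rng):
    # Assumed alternative density p(t) = 1.5 - t on [0, 1), which favours small t.
    # Inverse-CDF sampling: F(t) = 1.5 t - t^2 / 2, so t = (3 - sqrt(9 - 8u)) / 2.
    u = rng.random()
    t = (3.0 - math.sqrt(9.0 - 8.0 * u)) / 2.0
    return t, 1.5 - t

def estimate(sampler, n, seed):
    # Monte Carlo mean and sample variance of the reweighted term l(t) / p(t), t ~ p.
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        t, density = sampler(rng)
        values.append(toy_loss(t) / density)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

# Exact value of the integral of toy_loss over [0, 1]: ln(1.05 / 0.05) = ln(21).
TARGET = math.log(21.0)

if __name__ == "__main__":
    for name, sampler in [("uniform", sample_uniform), ("skewed", sample_skewed)]:
        mean, var = estimate(sampler, 100_000, seed=0)
        print(f"{name}: mean={mean:.4f} (target {TARGET:.4f}), variance={var:.4f}")
```

Both samplers target the same integral, so their means should agree up to Monte Carlo error. In this toy setting the skewed density yields the lower per-sample variance, because it places more mass where the toy loss is large.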
Cited by 1 Pith paper; polarity classification is still indexing.
Fields: cs.LG (1). Years: 2025 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models