A.5.2 P-POTS: PARETO-OPTIMAL PROPERTY. Consider the reweighted estimator $\frac{1}{p(t)}\,\ell_\theta(x_0, t, x_t)$, $t \sim p(t)$
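The excerpt above is cut off, so the following is only a minimal sketch of the general idea behind a reweighted estimator, not the paper's P-POTS sampler. It assumes t lives on [0, 1] and uses a hypothetical toy `loss(t)` in place of the loss averaged over x0 and xt. By numerical quadrature it checks that dividing by p(t) keeps the mean the same for any valid p, while the variance depends on the choice of p.

```python
import numpy as np

# Midpoint grid on [0, 1]; every quantity here is a toy assumption.
n = 100_000
t = (np.arange(n) + 0.5) / n
dt = 1.0 / n


def loss(t):
    # Hypothetical per-rate loss, standing in for l_theta averaged over x0 and xt.
    return 1.0 / (t + 0.1)


def estimator_moments(p):
    """Mean and variance of loss(t) / p(t) when t ~ p, by numerical quadrature."""
    w = loss(t) / p                        # values of the reweighted estimator
    mean = np.sum(p * w) * dt              # equals the integral of loss for any valid p
    var = np.sum(p * w**2) * dt - mean**2  # second moment minus squared mean
    return mean, var


uniform = np.ones(n)                              # p(t) = 1 on [0, 1]
proportional = loss(t) / (np.sum(loss(t)) * dt)   # p(t) proportional to loss(t)

mean_u, var_u = estimator_moments(uniform)
mean_p, var_p = estimator_moments(proportional)

print(f"uniform:      mean={mean_u:.4f}  var={var_u:.4f}")
print(f"proportional: mean={mean_p:.4f}  var={var_p:.4f}")
```

In this toy setting the two proposals give the same mean, and the proposal proportional to the loss drives the variance to roughly zero. That ideal proposal is only available here because the toy loss is known in closed form.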

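Combine Everything. Putting steps 1 and 2 together, we have:

$$
\mathrm{Var}_{x_0,t,x_t}(\ell_\theta) = \underbrace{\mathbb{E}_{x_0,t}\big[\mathrm{Var}_{x_t}(\ell_\theta \mid x_0, t)\big]}_{A} + \underbrace{\mathbb{E}_{x_0}\big[\mathrm{Var}_{t}(g_\theta(x_0,t) \mid x_0)\big]}_{B} + \underbrace{\mathrm{Var}_{x_0}\big(\mathbb{E}_{t}[g_\theta(x_0,t)]\big)}_{C},
$$

which is exactly the claimed decomposition Eq · 2025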
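The three-term decomposition can be checked exactly on a small discrete toy model. The sketch below assumes that g_θ(x0, t) is the conditional mean of ℓ_θ over xt given (x0, t), and that t is drawn independently of x0. The excerpt does not show either definition, so both are assumptions. All tables (`loss`, `p_x0`, `p_t`, `p_xt`) are hypothetical random inputs, not values from the paper.

```python
import numpy as np


def decompose_variance(loss, p_x0, p_t, p_xt):
    """Return (A, B, C, total) for a discrete toy model.

    loss[i, j, k] is the loss at x0=i, t=j, xt=k.
    p_xt[i, j, k] is P(xt=k | x0=i, t=j); t is assumed independent of x0.
    """
    # Assumed definition: g[i, j] = E_{xt}[loss | x0=i, t=j]
    g = (p_xt * loss).sum(axis=2)

    # A: expected conditional variance over xt
    var_xt = (p_xt * (loss - g[:, :, None]) ** 2).sum(axis=2)
    A = p_x0 @ var_xt @ p_t

    # B: expected variance of g over t, given x0
    h = g @ p_t  # h[i] = E_t[g | x0=i]
    var_t = ((g - h[:, None]) ** 2) @ p_t
    B = p_x0 @ var_t

    # C: variance over x0 of the t-averaged g
    mean = p_x0 @ h
    C = p_x0 @ (h - mean) ** 2

    # Total variance computed directly from the joint distribution
    joint = p_x0[:, None, None] * p_t[None, :, None] * p_xt
    total = (joint * (loss - mean) ** 2).sum()
    return A, B, C, total


rng = np.random.default_rng(0)
p_x0 = rng.dirichlet(np.ones(3))               # 3 data points
p_t = rng.dirichlet(np.ones(2))                # 2 masking rates
p_xt = rng.dirichlet(np.ones(2), size=(3, 2))  # 2 masking patterns per (x0, t)
loss = rng.normal(size=(3, 2, 2))

A, B, C, total = decompose_variance(loss, p_x0, p_t, p_xt)
print(f"A={A:.6f}  B={B:.6f}  C={C:.6f}  A+B+C={A + B + C:.6f}  total={total:.6f}")
```

Under these assumptions, A + B + C matches the directly computed total variance up to floating-point error. The terms line up naturally with variability from xt, from t, and from x0 respectively.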

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models

cs.LG · 2025-11-22 · unverdicted · novelty 7.0

The paper decomposes masked diffusion model training variance into masking pattern noise, masking rate noise, and data noise, then introduces P-POTS and MIRROR to reduce variance and close the performance gap with autoregressive models.

citing papers explorer

Showing 1 of 1 citing paper.

Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models cs.LG · 2025-11-22 · unverdicted · none · ref 16