Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.
3 Pith papers cite this work. Polarity classification is still indexing.
roles
background 1
polarities
background 1
representative citing papers
Absorbing discrete diffusion implicitly models the conditional distributions of clean data; reparameterizing yields a time-independent model (RADD) that unifies with any-order autoregressive models (AO-ARMs) and reaches state-of-the-art perplexity among diffusion models on zero-shot language benchmarks.
Spherical flows on S^{d-1} with vMF noise reduce the continuity equation to a scalar ODE in cosine similarity, yielding posterior-weighted marginal velocity and score that enable ODE and predictor-corrector sampling for categorical sequences, with the posterior trained by cross-entropy and empirical
citing papers explorer
- Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion
  Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.
- Spherical Flows for Sampling Categorical Data
  Spherical flows on S^{d-1} with vMF noise reduce the continuity equation to a scalar ODE in cosine similarity, yielding posterior-weighted marginal velocity and score that enable ODE and predictor-corrector sampling for categorical sequences, with the posterior trained by cross-entropy and empirical