Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Categorical flow matching models scale to 1.7B parameters on 2.1T tokens, enabling 4-step text generation with competitive quality and benchmark performance.
citing papers explorer
-
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.
-
Scaling Categorical Flow Maps
Categorical flow matching models scale to 1.7B parameters on 2.1T tokens, enabling 4-step text generation with competitive quality and benchmark performance.
- Spherical Flows for Sampling Categorical Data