

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Canonical reference. 80% of citing Pith papers cite this work as background.

72 Pith papers citing it
Background 80% of classified citations
abstract

Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel loss that naturally extends score matching to discrete spaces, integrates seamlessly to build discrete diffusion models, and significantly boosts performance. Experimentally, we test our Score Entropy Discrete Diffusion models (SEDD) on standard language modeling tasks. For comparable model sizes, SEDD beats existing language diffusion paradigms (reducing perplexity by $25$-$75$\%) and is competitive with autoregressive models, in particular outperforming GPT-2. Furthermore, compared to autoregressive models, SEDD generates faithful text without requiring distribution annealing techniques like temperature scaling (around $6$-$8\times$ better generative perplexity than un-annealed GPT-2), can trade compute and quality (similar quality with $32\times$ fewer network evaluations), and enables controllable infilling (matching nucleus sampling quality while enabling other strategies besides left-to-right prompting).
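
The abstract describes score entropy as a loss for estimating ratios of the data distribution. As a rough illustration only, the sketch below evaluates a score-entropy-style objective on a toy categorical distribution. It assumes the per-pair form s - a log s + K(a), with a = p(y)/p(x) and K(a) = a(log a - 1), which is zero exactly when the estimate s equals the true ratio. The function name `score_entropy`, the arguments `p`, `s`, `w`, and the uniform default weights are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np

def score_entropy(p, s, w=None):
    """Toy score-entropy objective (illustrative sketch, not the paper's code).

    p : 1-D array of strictly positive probabilities over n states.
    s : (n, n) array of positive estimates, s[x, y] ~ p[y] / p[x].
    w : optional (n, n) array of non-negative weights (assumed uniform if None).
    """
    p = np.asarray(p, dtype=float)
    s = np.asarray(s, dtype=float)
    n = len(p)
    if w is None:
        w = np.ones((n, n))
    ratio = p[None, :] / p[:, None]        # ratio[x, y] = p(y) / p(x)
    k = ratio * (np.log(ratio) - 1.0)      # K(a) = a (log a - 1)
    per_pair = w * (s - ratio * np.log(s) + k)
    np.fill_diagonal(per_pair, 0.0)        # the sum runs over y != x only
    return float(np.sum(p[:, None] * per_pair))

# Toy check: the objective vanishes at the true ratios and grows away from them.
p = np.array([0.5, 0.3, 0.2])
true_ratios = p[None, :] / p[:, None]
loss_at_truth = score_entropy(p, true_ratios)
loss_perturbed = score_entropy(p, 1.5 * true_ratios)
```

In this toy setting the true ratios are known, so the objective can be evaluated directly; in practice they are not, which is why the paper works with a tractable training variant rather than this form.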


citation-role summary

background 8 · method 2



representative citing papers

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

Masked Language Flow Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MLFMs combine masking with continuous flows to scale flow-based language models to reasoning and instruction-following tasks on GSM8K and MT-Bench.

Variational Learning for Insertion-based Generation

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

Introduces the Insertion Process model for variable-length non-monotonic sequence generation via a bijective permutation mapping and permutation-based variational inference.

Free energy Estimation on Any State Space

stat.ML · 2026-05-29 · unverdicted · novelty 7.0

Generalizes neural transport methods for free energy estimation to any state space with added algebraic and group-theoretic results on time reversal and h-transforms.

Machine Unlearning for Masked Diffusion Language Models

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.

Support Before Frequency in Discrete Diffusion

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.

Layer Collapse in Diffusion Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.

GD4: Graph-based Discrete Denoising Diffusion for MIMO Detection

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

GD4 is a graph-based discrete denoising diffusion method for MIMO detection that yields higher-quality suboptimal solutions than prior diffusion detectors and classical baselines under similar compute budgets in both under- and over-determined settings.

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

cs.RO · 2026-04-27 · unverdicted · novelty 7.0 · 2 refs

Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.

citing papers explorer

Showing 4 of 4 citing papers after filters.

  • Free energy Estimation on Any State Space · stat.ML · 2026-05-29 · unverdicted · none · ref 31 · internal anchor

    Generalizes neural transport methods for free energy estimation to any state space with added algebraic and group-theoretic results on time reversal and h-transforms.

  • Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster · stat.ML · 2026-05-18 · unverdicted · none · ref 6 · internal anchor

    FLDD learns non-Markovian marginal and posterior distributions for the forward process so a factorized reverse process can match the target better and produce higher-quality samples in fewer steps.

  • A Diffusive Classification Loss for Learning Energy-based Generative Models · stat.ML · 2026-01-28 · unverdicted · none · ref 5 · internal anchor

    DiffCLF reframes EBM training as supervised classification across noise levels to avoid mode blindness while remaining computationally efficient for generative models.

  • Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space · stat.ML · 2025-10-14 · unverdicted · none · ref 14 · internal anchor

    Proposes Latent Interacting Particle Systems with an efficient parameterization of twist potentials to enable approximate posterior inference for coupled continuous-time hidden Markov models via twisted sequential Monte Carlo, demonstrated on a latent SIRS graph model and real wildfire data.