Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.
Tess 2: A large-scale generalist diffusion language model.arXiv preprint arXiv:2502.13917
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4representative citing papers
Dataset-level metrics in diffusion language models mask substantial sample-level non-determinism that varies with model and system factors, which a new Factor Variance Attribution metric can decompose.
FlowLM converts diffusion LMs to flow matching via fine-tuning, achieving few-step generation that rivals or beats 2000-step diffusion and saturates faster than training flow models from scratch.
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
citing papers explorer
-
Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion
Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.
-
Dataset-Level Metrics Attenuate Non-Determinism: A Fine-Grained Non-Determinism Evaluation in Diffusion Language Models
Dataset-level metrics in diffusion language models mask substantial sample-level non-determinism that varies with model and system factors, which a new Factor Variance Attribution metric can decompose.
-
FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation
FlowLM converts diffusion LMs to flow matching via fine-tuning, achieving few-step generation that rivals or beats 2000-step diffusion and saturates faster than training flow models from scratch.
-
Flow Map Language Models: One-step Language Modeling via Continuous Denoising
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.