hub

A survey on diffusion language models

Li, S · 2025 · arXiv 2508.10875

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

read on arXiv browse 12 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Support Before Frequency in Discrete Diffusion

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

DepCap accelerates diffusion LM inference up to 5.63x by using last-block influence for adaptive block boundaries and conflict-free token selection for parallel decoding, with negligible quality loss.

Diffusion Language Models for Speech Recognition

cs.CL · 2026-04-15 · unverdicted · novelty 7.0

Diffusion language models and a CTC-USDM joint decoder improve ASR accuracy over standard approaches.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

Diagnoses mask prior drift and positional attention collapse in LDVLMs and introduces two plug-and-play decoding interventions that raise long-form generation quality without retraining.

ELF: Embedded Language Flows

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

ELF is a continuous embedding-space flow matching model for language that stays continuous until the last step and outperforms prior discrete and continuous diffusion language models with fewer sampling steps.

TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory Generation

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

TrajDLM applies block diffusion language models to discrete road-segment sequences with topology constraints to generate realistic trajectories up to 2.8 times faster than prior methods while supporting zero-shot transfer.

Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model

cs.AI · 2025-10-20 · unverdicted · novelty 6.0

Saber improves both speed and accuracy of diffusion language models on code generation by dynamically adjusting unmasking steps and reverting low-confidence tokens via backtracking.

FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models

cs.CL · 2025-09-24 · conditional · novelty 6.0

FS-DFM enables 1024-token generation at perplexity parity with 1024-step baselines using only 8 steps via explicit step-budget training, reliable updates, and teacher guidance.

Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs

cs.LG · 2026-05-05 · unverdicted · novelty 5.0 · 2 refs

Predict-then-Diffuse predicts response length for diffusion LLMs before inference, cutting FLOPs with a data-driven safety buffer while preserving output quality.

Deep Thinking by Markov Chain of Continuous Thoughts

cs.LG · 2025-09-29 · unverdicted · novelty 5.0

MarCos modifies transformers to perform continuous multi-step reasoning by mapping thought-level continuous states directly to next-thought distributions, achieving substantial wall-clock speedups on math problems.

citing papers explorer

Showing 12 of 12 citing papers.

Support Before Frequency in Discrete Diffusion cs.LG · 2026-05-13 · unverdicted · none · ref 28
Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.
TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM cs.CL · 2026-05-10 · unverdicted · none · ref 32
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference cs.LG · 2026-04-17 · unverdicted · none · ref 20
DepCap accelerates diffusion LM inference up to 5.63x by using last-block influence for adaptive block boundaries and conflict-free token selection for parallel decoding, with negligible quality loss.
Diffusion Language Models for Speech Recognition cs.CL · 2026-04-15 · unverdicted · none · ref 16
Diffusion language models and a CTC-USDM joint decoder improve ASR accuracy over standard approaches.
DMax: Aggressive Parallel Decoding for dLLMs cs.LG · 2026-04-09 · conditional · none · ref 42 · 2 links
DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.
Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models cs.CV · 2026-05-14 · unverdicted · none · ref 11 · 2 links
Diagnoses mask prior drift and positional attention collapse in LDVLMs and introduces two plug-and-play decoding interventions that raise long-form generation quality without retraining.
ELF: Embedded Language Flows cs.CL · 2026-05-11 · unverdicted · none · ref 33
ELF is a continuous embedding-space flow matching model for language that stays continuous until the last step and outperforms prior discrete and continuous diffusion language models with fewer sampling steps.
TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory Generation cs.LG · 2026-05-11 · unverdicted · none · ref 29
TrajDLM applies block diffusion language models to discrete road-segment sequences with topology constraints to generate realistic trajectories up to 2.8 times faster than prior methods while supporting zero-shot transfer.
Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model cs.AI · 2025-10-20 · unverdicted · none · ref 16
Saber improves both speed and accuracy of diffusion language models on code generation by dynamically adjusting unmasking steps and reverting low-confidence tokens via backtracking.
FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models cs.CL · 2025-09-24 · conditional · none · ref 5
FS-DFM enables 1024-token generation at perplexity parity with 1024-step baselines using only 8 steps via explicit step-budget training, reliable updates, and teacher guidance.
Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs cs.LG · 2026-05-05 · unverdicted · none · ref 8 · 2 links
Predict-then-Diffuse predicts response length for diffusion LLMs before inference, cutting FLOPs with a data-driven safety buffer while preserving output quality.
Deep Thinking by Markov Chain of Continuous Thoughts cs.LG · 2025-09-29 · unverdicted · none · ref 9
MarCos modifies transformers to perform continuous multi-step reasoning by mapping thought-level continuous states directly to next-thought distributions, achieving substantial wall-clock speedups on math problems.

A survey on diffusion language models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer