Accelerating diffusion llms via adaptive parallel decoding

Daniel Israel, Guy Van den Broeck, Aditya Grover · 2025 · arXiv 2506.00413

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Drifting Objectives for Refining Discrete Diffusion Language Models

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.

Support Before Frequency in Discrete Diffusion

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.

BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

BARD bridges autoregressive and diffusion VLMs with progressive block merging plus stage-wise intra-diffusion distillation, delivering 3x speedup and new SOTA on open dVLMs using under 4.4M data points.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

cs.CL · 2026-05-16 · unverdicted · novelty 6.0

Diffusion LLMs can act as their own efficiency teachers by using revokable parallel decoding to identify reliable token orders and then distilling those orders into the model parameters for faster inference.

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

cs.CL · 2025-12-16 · unverdicted · novelty 6.0

Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.

Diffusion Language Models Know the Answer Before Decoding

cs.CL · 2025-08-27 · conditional · novelty 6.0

DLMs show early answer convergence allowing Prophet to cut decoding steps by up to 3.4x on LLaDA-8B and Dream-7B while keeping output quality.

citing papers explorer

Showing 7 of 7 citing papers.

Drifting Objectives for Refining Discrete Diffusion Language Models cs.CL · 2026-05-19 · unverdicted · none · ref 27
TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.
Support Before Frequency in Discrete Diffusion cs.LG · 2026-05-13 · unverdicted · none · ref 31
Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.
BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation cs.CV · 2026-04-15 · unverdicted · none · ref 9
BARD bridges autoregressive and diffusion VLMs with progressive block merging plus stage-wise intra-diffusion distillation, delivering 3x speedup and new SOTA on open dVLMs using under 4.4M data points.
DMax: Aggressive Parallel Decoding for dLLMs cs.LG · 2026-04-09 · conditional · none · ref 37 · 2 links
DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.
Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers cs.CL · 2026-05-16 · unverdicted · none · ref 11
Diffusion LLMs can act as their own efficiency teachers by using revokable parallel decoding to identify reliable token orders and then distilling those orders into the model parameters for faster inference.
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed cs.CL · 2025-12-16 · unverdicted · none · ref 43
Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.
Diffusion Language Models Know the Answer Before Decoding cs.CL · 2025-08-27 · conditional · none · ref 11
DLMs show early answer convergence allowing Prophet to cut decoding steps by up to 3.4x on LLaDA-8B and Dream-7B while keeping output quality.

Accelerating diffusion llms via adaptive parallel decoding

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer