Large Language Diffusion Models

Fengqi Zhu, Jingyang Ou, Jun Hu, Shen Nie, Xiaolu Zhang, Zebin You · 2025 · cs.CL · arXiv 2502.09992

Canonical reference. 72% of citing Pith papers cite this work as background.

164 Pith papers citing it

abstract

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a principled generative approach for probabilistic inference by optimizing a likelihood lower bound. Across extensive benchmarks on general tasks, math, code, and so on, LLaDA demonstrates strong scalability and performs comparably to our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings show the promise of diffusion models for language modeling at scale and challenge the common assumption that core LLM capabilities discussed above inherently depend on ARMs. Project page and codes: https://ml-gsai.github.io/LLaDA-demo/.
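The forward masking process and the masked-token likelihood bound mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each token is masked independently with probability t and that the loss is a 1/t-weighted negative log-likelihood over masked positions, a common formulation for masked diffusion that the abstract itself does not spell out. The names `MASK`, `forward_mask`, `masked_nll`, and `uniform_predictor` are hypothetical, and a uniform distribution stands in for the Transformer.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def forward_mask(tokens, t, rng):
    """Forward process (assumed): mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_nll(tokens, noisy, t, predict_proba):
    """Assumed objective: 1/t-weighted negative log-likelihood on masked positions only."""
    total = 0.0
    for i, (clean, seen) in enumerate(zip(tokens, noisy)):
        if seen == MASK:
            # predict_proba returns a dict token -> probability for position i
            total -= math.log(predict_proba(noisy, i)[clean])
    return total / t


def uniform_predictor(vocab):
    """Stand-in for the Transformer: a uniform distribution at every position."""
    p = 1.0 / len(vocab)
    return lambda noisy, i: {tok: p for tok in vocab}


rng = random.Random(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(tokens))
t = 0.5  # masking ratio; in training it would be sampled from (0, 1]

noisy = forward_mask(tokens, t, rng)
loss = masked_nll(tokens, noisy, t, uniform_predictor(vocab))
print(noisy, loss)
```

With the uniform stand-in, the loss reduces to (number of masked tokens) × log(vocabulary size) / t, which makes the sketch easy to sanity-check before swapping in a real model.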

citation-role summary

background 22 · method 5 · baseline 2

citation-polarity summary

background 21 · use method 5 · baseline 2 · unclear 1

representative citing papers

NPU Design for Diffusion Language Model Inference

cs.AR · 2026-01-28 · unverdicted · novelty 8.0

Introduces the first NPU accelerator for diffusion language models with dLLM-specific ISA, hardware execution model, BAOS KV quantization, and 7nm RTL synthesis.

Flow Reasoning Models: Scaling Reasoning Through Iterative Self-Refinement

cs.AI · 2026-06-28 · conditional · novelty 7.0

Flow models reach 99.2% Sudoku accuracy in 7 passes and 96.1% on out-of-distribution Sudoku-Extreme by selecting dynamically stable candidates and training with self-conditioning plus DPO to avoid failed outputs.

Masked Diffusion Decoding as $x$-Prediction Flow

cs.CL · 2026-06-27 · unverdicted · novelty 7.0

Masked diffusion LMs can use continuous x-prediction flow with token-wise asynchronous updates and an RL policy network to reach 97% performance on HumanEval using only 25% of the usual decoding budget.

Masked Language Flow Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MLFMs combine masking with continuous flows to scale flow-based language models to reasoning and instruction-following tasks on GSM8K and MT-Bench.

Unified Energy for Invariant and Independent Decoding in Diffusion Language Models

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

The paper introduces Uni-E, a unified energy for DLMs that accounts for model capacity, dependency and invariance, can be computed exactly, and corrects distribution shifts from dependency and invariance.

Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics

cs.CL · 2026-06-07 · accept · novelty 7.0

Naive samplers beat published diffusion and flow models on gen-PPL with incoherent output, proving the metric unsound and motivating distributional evaluation suites.

AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding

cs.CL · 2026-06-07 · unverdicted · novelty 7.0 · 2 refs

AsyncLane decouples refinement from advancement in DLM decoding via lane forking at delimiters plus efficiency optimizations, yielding up to 3x throughput gains on math and code benchmarks without retraining.

Beyond Matching: Category-Guided Latent Intent Reasoning for Generative Retrieval in E-Commerce

cs.IR · 2026-06-05 · unverdicted · novelty 7.0

CaLIR learns continuous latent intent states guided by product category hierarchies for generative retrieval, combining hierarchical reasoning and dynamic prefix tries to balance effectiveness and low-latency inference on multilingual e-commerce data.

Knowledge Editing in Masked Diffusion Language Models

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

Locate-then-edit succeeds at the same early-to-mid MLP locations in masked diffusion models as in autoregressive models, but requires optimization over intermediate partial-mask states to handle multi-token targets.

MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models

cs.CR · 2026-06-01 · unverdicted · novelty 7.0

MaskForge reaches 79.3% average attack success rate on five dLLMs by adaptively searching and accumulating structural attack patterns with a UCB bandit, improving 17.6% over baselines and transferring to 88.2% on AdvBench.

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

cs.AI · 2026-05-30 · unverdicted · novelty 7.0

TAPS converts diffusion marginal probabilities into path-conditioned acceptance estimates to select prefix-closed subtrees under a fixed verification budget, achieving up to 7.9x end-to-end speedup over autoregressive decoding.

Adaptive Order Policies for Masked Diffusion

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

A policy network learns to choose unmasking order in masked diffusion by reweighting the loss, outperforming random and heuristic baselines on ordering-sensitive tasks.

Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

BASTION is a budget-aware speculative decoding framework with adaptive tree-structured block diffusion drafting that reports up to 6.61x speedup and 39% improvement over block-diffusion baselines.

Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion

cs.AI · 2026-05-28 · unverdicted · novelty 7.0

Mind-Omni unifies seven brain-vision-language tasks in one discrete-diffusion framework with a brain tokenizer and a new BQA dataset, claiming SOTA multi-task performance competitive with larger single-task models.

TUBE: Tangent Upper Bound on Evidence for Discrete Diffusion Language Models

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

TUBE is a new upper bound on evidence for discrete diffusion models that shows block MDMs and AO-ARMs have strictly lower likelihood than exact ARMs.

Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.

Drifting Objectives for Refining Discrete Diffusion Language Models

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.

Backdooring Masked Diffusion Language Models

cs.LG · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

SHADOWMASK backdoors MDLMs by replacing the all-mask terminal distribution with a trigger-mask mixture prior, achieving near-100% attack success on DiT and LLaDA-8B models across multiple datasets while resisting fine-tuning and some defenses.

Machine Unlearning for Masked Diffusion Language Models

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.

Constrained Code Generation with Discrete Diffusion

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.

Dynamic Chunking for Diffusion Language Models

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.

PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

PSD is a training-free framework that jointly optimizes spatial unmasking and temporal speculative decoding in diffusion LLMs to reach up to 5.5x tokens per forward pass while preserving accuracy comparable to greedy decoding.

From Table to Cell: Attention for Better Reasoning with TABALIGN

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

TABALIGN pairs a diffusion language model planner emitting binary cell masks with a trained attention verifier, raising average accuracy 15.76 points over strong baselines on eight table benchmarks while speeding execution 44.64%.

Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding

cs.CL · 2026-05-14 · unverdicted · novelty 7.0

FeF-DLLM achieves factorization-error-free generation in discrete diffusion language models via prefix-conditioned posterior factorization and speculative decoding, delivering 5.04 pp higher accuracy and 3.86x faster inference on GSM8K, MATH, HumanEval, and MBPP.

citing papers explorer

Showing 50 of 63 citing papers after filters.

Masked Diffusion Decoding as $x$-Prediction Flow cs.CL · 2026-06-27 · unverdicted · none · ref 4 · internal anchor
Masked diffusion LMs can use continuous x-prediction flow with token-wise asynchronous updates and an RL policy network to reach 97% performance on HumanEval using only 25% of the usual decoding budget.
Masked Language Flow Models cs.CL · 2026-06-26 · unverdicted · none · ref 31 · internal anchor
MLFMs combine masking with continuous flows to scale flow-based language models to reasoning and instruction-following tasks on GSM8K and MT-Bench.
Unified Energy for Invariant and Independent Decoding in Diffusion Language Models cs.CL · 2026-06-08 · unverdicted · none · ref 32 · internal anchor
The paper introduces Uni-E, a unified energy for DLMs that accounts for model capacity, dependency and invariance, can be computed exactly, and corrects distribution shifts from dependency and invariance.
Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics cs.CL · 2026-06-07 · accept · none · ref 21 · internal anchor
Naive samplers beat published diffusion and flow models on gen-PPL with incoherent output, proving the metric unsound and motivating distributional evaluation suites.
AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding cs.CL · 2026-06-07 · unverdicted · none · ref 13 · 2 links · internal anchor
AsyncLane decouples refinement from advancement in DLM decoding via lane forking at delimiters plus efficiency optimizations, yielding up to 3x throughput gains on math and code benchmarks without retraining.
Knowledge Editing in Masked Diffusion Language Models cs.CL · 2026-06-02 · unverdicted · none · ref 56 · internal anchor
Locate-then-edit succeeds at the same early-to-mid MLP locations in masked diffusion models as in autoregressive models, but requires optimization over intermediate partial-mask states to handle multi-token targets.
Drifting Objectives for Refining Discrete Diffusion Language Models cs.CL · 2026-05-19 · unverdicted · none · ref 7 · internal anchor
TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.
Machine Unlearning for Masked Diffusion Language Models cs.CL · 2026-05-18 · unverdicted · none · ref 2 · internal anchor
MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.
Constrained Code Generation with Discrete Diffusion cs.CL · 2026-05-16 · unverdicted · none · ref 23 · internal anchor
Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.
Dynamic Chunking for Diffusion Language Models cs.CL · 2026-05-15 · unverdicted · none · ref 31 · internal anchor
DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.
PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding cs.CL · 2026-05-15 · unverdicted · none · ref 5 · internal anchor
PSD is a training-free framework that jointly optimizes spatial unmasking and temporal speculative decoding in diffusion LLMs to reach up to 5.5x tokens per forward pass while preserving accuracy comparable to greedy decoding.
Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding cs.CL · 2026-05-14 · unverdicted · none · ref 11 · internal anchor
FeF-DLLM achieves factorization-error-free generation in discrete diffusion language models via prefix-conditioned posterior factorization and speculative decoding, delivering 5.04 pp higher accuracy and 3.86x faster inference on GSM8K, MATH, HumanEval, and MBPP.
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 7 · 2 links · internal anchor
TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.
TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM cs.CL · 2026-05-10 · unverdicted · none · ref 5 · internal anchor
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
Guidance Is Not a Hyperparameter: Learning Dynamic Control in Diffusion Language Models cs.CL · 2026-05-08 · unverdicted · none · ref 1 · internal anchor
Adaptive guidance trajectories learned via PPO outperform fixed-scale CFG on controllability-quality balance in three controlled NLP generation tasks with discrete diffusion models.
Focus on the Core: Empowering Diffusion Large Language Models by Self-Contrast cs.CL · 2026-05-02 · unverdicted · none · ref 22 · internal anchor
FoCore uses self-contrast on early-converging high-density tokens to boost diffusion LLM quality on reasoning benchmarks while cutting decoding steps by over 2x.
LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling cs.CL · 2026-04-13 · unverdicted · none · ref 14 · internal anchor
LangFlow is the first continuous diffusion language model to rival discrete diffusion on perplexity and generative perplexity while exceeding autoregressive baselines on several zero-shot tasks.
Unlocking Prompt Infilling Capability for Diffusion Language Models cs.CL · 2026-04-04 · unverdicted · none · ref 17 · internal anchor
Full-sequence masking in SFT unlocks prompt infilling for masked diffusion language models, producing templates that match or surpass hand-designed ones and transfer across models.
Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models cs.CL · 2026-04-02 · unverdicted · none · ref 11 · internal anchor
DEMASK adds a lightweight pairwise-dependency predictor to dLLMs and uses greedy selection to enable parallel unmasking whose total-variation error is provably bounded under sub-additivity.
LogicDiff: Logic-Guided Denoising Improves Zero-Shot Reasoning in Masked Diffusion Language Models cs.CL · 2026-03-24 · conditional · none · ref 10 · internal anchor
Logic-role-guided unmasking in masked diffusion models raises zero-shot GSM8K accuracy from 22% to 61% by enforcing logical generation order.
A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs cs.CL · 2026-03-08 · unverdicted · none · ref 11 · internal anchor
Diffusion language models form more global representations with early-layer redundancy compared to autoregressive models, allowing layer skipping for up to 18.75% FLOP savings while maintaining over 90% performance.
Nemotron-Labs-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context cs.CL · 2026-06-25 · unverdicted · none · ref 6 · internal anchor
TwoTower decouples autoregressive context from diffusion denoising using a frozen AR tower and bidirectional diffusion tower, retaining 98.7% baseline quality with 2.42X throughput on a 30B model trained on 2.1T tokens.
Back on Track: Aligning Rewards and States for Reasoning in Diffusion Large Language Models cs.CL · 2026-06-07 · unverdicted · none · ref 8 · internal anchor
PAPO improves reasoning performance in diffusion LLMs by converting sparse terminal rewards into dense step-wise credit and replaying real high-uncertainty trajectories, reporting gains up to 42.2% on Countdown.
NAVIRA: Decoupled Stochastic Remasking for Masked Diffusion Language Models cs.CL · 2026-06-04 · unverdicted · none · ref 4 · internal anchor
NAVIRA decouples quality scoring from regeneration via stochastic remasking in masked diffusion LMs, improving fluency and LLM-judge scores on a 170M model.
SemBlock: Semantic Boundary Dynamic Blocks for Diffusion LLMs cs.CL · 2026-06-03 · unverdicted · none · ref 13 · internal anchor
SemBlock adds semantic-boundary prediction to enable dynamic block decoding in diffusion LLMs and reports gains over fixed-block and AdaBlock baselines on GSM8K, IFEval, MATH, and HumanEval.
Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference cs.CL · 2026-06-01 · conditional · none · ref 3 · internal anchor
Fast-dLLM++ generalizes Fast-dLLM decoding to heterogeneous confidence profiles via Fréchet profile selection, delivering up to 37% throughput gains on GSM8K, MATH, HumanEval, and MBPP with LLaDA-8B.
dMoE: dLLMs with Learnable Block Experts cs.CL · 2026-05-29 · unverdicted · none · ref 3 · internal anchor
dMoE aggregates token expert distributions to block level in dLLMs, cutting unique experts from 69.5 to 14.6, memory by 76-80%, and latency by 1.14-1.66x while retaining 99.11% performance.
Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation cs.CL · 2026-05-29 · unverdicted · none · ref 18 · internal anchor
Introduces TSPD with a trajectory-feature controller and training-free CE to reduce denoising steps in dLLMs while aiming to preserve quality.
Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving cs.CL · 2026-05-22 · unverdicted · none · ref 10 · 2 links · internal anchor
Fast-dDrive is a block-diffusion VLA that reports SOTA accuracy on WOD-E2E and nuScenes driving benchmarks together with 12x throughput over autoregressive baselines via section scaffolds and test-time averaging.
PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models cs.CL · 2026-05-20 · unverdicted · none · ref 18 · internal anchor
PulseCol introduces periodically refreshed column-sparse attention to achieve up to 1.95x speedup over FlashAttention in diffusion LLMs with maintained model quality.
Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers cs.CL · 2026-05-16 · unverdicted · none · ref 10 · internal anchor
Diffusion LLMs can act as their own efficiency teachers by using revokable parallel decoding to identify reliable token orders and then distilling those orders into the model parameters for faster inference.
Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space cs.CL · 2026-05-14 · unverdicted · none · ref 33 · 3 links · internal anchor
Manta-LM approximates the HJB equation via flow matching in latent control space to realize closed-loop optimal control for language generation.
BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion cs.CL · 2026-05-12 · unverdicted · none · ref 19 · internal anchor
BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.
Edit-Based Refinement for Parallel Masked Diffusion Language Models cs.CL · 2026-05-10 · unverdicted · none · ref 22 · internal anchor
ME-DLM augments parallel masked diffusion models with edit-distance-supervised refinements to raise quality on coding and math benchmarks while using far fewer diffusion steps.
TextLDM: Language Modeling with Continuous Latent Diffusion cs.CL · 2026-05-08 · unverdicted · none · ref 11 · internal anchor
TextLDM applies DiT-style latent diffusion with flow matching to language modeling via a REPA-aligned VAE, outperforming prior diffusion LMs and matching GPT-2 when trained from scratch on OpenWebText2.
Continuous Latent Diffusion Language Model cs.CL · 2026-05-07 · unverdicted · none · ref 70 · internal anchor
Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model
Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics cs.CL · 2026-04-26 · unverdicted · none · ref 24 · internal anchor
Query position is a first-order variable in dLLM ICL whose variance matches semantic quality impact; mitigated via Average Confidence metric and training-free Auto-ICL routing.
A Universal Avoidance Method for Diverse Multi-branch Generation cs.CL · 2026-04-19 · unverdicted · none · ref 3 · internal anchor
UAG is a universal avoidance generation method that increases multi-branch diversity in diffusion and transformer models by penalizing output similarity, delivering up to 1.9x higher diversity with 4.4x speed and 1/64th the FLOPs of prior methods.
Stability-Weighted Decoding for Diffusion Language Models cs.CL · 2026-04-18 · unverdicted · none · ref 12 · internal anchor
Stability-Weighted Decoding improves diffusion LLM accuracy by modulating token scores with temporal stability from KL divergence between prediction steps.
Differences in Text Generated by Diffusion and Autoregressive Language Models cs.CL · 2026-04-04 · unverdicted · none · ref 25 · internal anchor
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
The Illusion of Latent Generalization: Bi-directionality and the Reversal Curse cs.CL · 2026-03-13 · unverdicted · none · ref 6 · internal anchor
Bidirectional objectives mitigate reversal by requiring explicit source-as-target signals and storing directions as distinct representations instead of inducing latent generalization.
Flow Map Language Models: One-step Language Modeling via Continuous Denoising cs.CL · 2026-02-18 · conditional · none · ref 22 · 2 links · internal anchor
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
Diffusion-State Policy Optimization for Masked Diffusion Language Models cs.CL · 2026-02-06 · unverdicted · none · ref 6 · 2 links · internal anchor
DiSPO optimizes intermediate decisions in masked diffusion LMs by branching at selected masked states, resampling tokens, scoring completions, and updating only new tokens using a derived policy-gradient estimator that reuses terminal rollouts.
Training-Trajectory-Aware Token Selection cs.CL · 2026-01-15 · unverdicted · none · ref 17 · internal anchor
Training-Trajectory-Aware Token Selection (T3S) reconstructs the token-level training objective to overcome a performance bottleneck in continual distillation of reasoning capabilities from large to small language models.
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed cs.CL · 2025-12-16 · unverdicted · none · ref 3 · internal anchor
Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.
FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models cs.CL · 2025-09-24 · conditional · none · ref 7 · internal anchor
FS-DFM enables 1024-token generation at perplexity parity with 1024-step baselines using only 8 steps via explicit step-budget training, reliable updates, and teacher guidance.
Diffusion Language Models Know the Answer Before Decoding cs.CL · 2025-08-27 · conditional · none · ref 15 · internal anchor
DLMs show early answer convergence allowing Prophet to cut decoding steps by up to 3.4x on LLaDA-8B and Dream-7B while keeping output quality.
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference cs.CL · 2025-08-04 · unverdicted · none · ref 15 · internal anchor
Seed Diffusion Preview is a discrete diffusion language model that reaches 2146 tokens per second inference on H20 GPUs with competitive code benchmark performance, establishing a new speed-quality Pareto frontier.
BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding cs.CL · 2026-06-30 · unverdicted · none · ref 28 · internal anchor
BlockPilot is an instance-adaptive policy that predicts optimal block size from the prefilling representation for diffusion speculative decoding, reporting 5.92 acceptance length and 4.20x speedup on Qwen3-4B.
Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models cs.CL · 2026-06-03 · unverdicted · none · ref 67 · internal anchor
DIA is a training-free method that dynamically adjusts anchor positions in diffusion LLMs to improve format compliance and accuracy on reasoning benchmarks like GSM8K and MATH.
