hub

DKV-Cache: The cache for diffusion language models

Xinyin Ma, Runpeng Yu, Gongfan Fang, Xinchao Wang · 2025 · arXiv 2505.15781

18 Pith papers cite this work. Polarity classification is still indexing.

18 Pith papers citing it

read on arXiv browse 18 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 method 1

citation-polarity summary

background 3 use method 1

representative citing papers

Drifting Objectives for Refining Discrete Diffusion Language Models

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.

Muninn: Your Trajectory Diffusion Model But Faster

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

DARE: Diffusion Language Model Activation Reuse for Efficient Inference

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.

$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

R²-dLLM reduces dLLM decoding steps by up to 75% via spatio-temporal redundancy reduction while keeping generation quality competitive.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs

cs.CL · 2026-03-08 · unverdicted · novelty 7.0

Diffusion language models form more global representations with early-layer redundancy compared to autoregressive models, allowing layer skipping for up to 18.75% FLOP savings while maintaining over 90% performance.

Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

cs.LG · 2025-10-06 · unverdicted · novelty 7.0

Theoretical analysis reveals MaskGIT's implicit temperature sampling in masked diffusion; proposes equivalent moment sampler and efficiency techniques for adaptive unmasking with image and text experiments.

PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

PulseCol introduces periodically refreshed column-sparse attention to achieve up to 1.95x speedup over FlashAttention in diffusion LLMs with maintained model quality.

Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer sequences.

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Orthrus unifies autoregressive LLMs and diffusion models via shared KV cache and consensus to enable up to 7.8x parallel token generation speedup with O(1) memory overhead and lossless results.

Consistent Diffusion Language Models

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

CDLM trains denoisers to be path-invariant across stochastic posterior bridges in discrete diffusion, unifying prior methods and achieving new SOTA few-step text generation performance.

Stability-Weighted Decoding for Diffusion Language Models

cs.CL · 2026-04-18 · unverdicted · novelty 6.0

Stability-Weighted Decoding improves diffusion LLM accuracy by modulating token scores with temporal stability from KL divergence between prediction steps.

DualDiffusion: A Speculative Decoding Strategy for Masked Diffusion Models

cs.LG · 2026-04-06 · unverdicted · novelty 6.0

DualDiffusion combines a lightweight drafter using approximations with a full verifier to reduce generation steps in masked diffusion models while keeping accuracy on MMLU and GSM8K.

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

cs.CL · 2025-12-16 · unverdicted · novelty 6.0

Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.

Diffusion Language Models Know the Answer Before Decoding

cs.CL · 2025-08-27 · conditional · novelty 6.0

DLMs show early answer convergence allowing Prophet to cut decoding steps by up to 3.4x on LLaDA-8B and Dream-7B while keeping output quality.

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

cs.LG · 2025-05-29 · unverdicted · novelty 6.0

Muddit is a unified discrete diffusion transformer that integrates strong visual priors from a pretrained text-to-image model with a lightweight text decoder to enable fast parallel generation across text and image modalities.

TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

TIDE schedules I/O-aware expert offloading for MoE diffusion LLMs by solving for an optimal refresh interval that exploits temporal stability of activations, yielding up to 1.5x throughput gain losslessly.

citing papers explorer

Showing 18 of 18 citing papers.

Drifting Objectives for Refining Discrete Diffusion Language Models cs.CL · 2026-05-19 · unverdicted · none · ref 29
TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.
Muninn: Your Trajectory Diffusion Model But Faster cs.RO · 2026-05-11 · unverdicted · none · ref 40
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM cs.CL · 2026-05-10 · unverdicted · none · ref 45
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
DARE: Diffusion Language Model Activation Reuse for Efficient Inference cs.LG · 2026-05-01 · unverdicted · none · ref 10
DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.
$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction cs.CL · 2026-04-21 · unverdicted · none · ref 12
R²-dLLM reduces dLLM decoding steps by up to 75% via spatio-temporal redundancy reduction while keeping generation quality competitive.
DMax: Aggressive Parallel Decoding for dLLMs cs.LG · 2026-04-09 · conditional · none · ref 53 · 2 links
DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.
A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs cs.CL · 2026-03-08 · unverdicted · none · ref 9
Diffusion language models form more global representations with early-layer redundancy compared to autoregressive models, allowing layer skipping for up to 18.75% FLOP savings while maintaining over 90% performance.
Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion cs.LG · 2025-10-06 · unverdicted · none · ref 7
Theoretical analysis reveals MaskGIT's implicit temperature sampling in masked diffusion; proposes equivalent moment sampler and efficiency techniques for adaptive unmasking with image and text experiments.
PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models cs.CL · 2026-05-20 · unverdicted · none · ref 17
PulseCol introduces periodically refreshed column-sparse attention to achieve up to 1.95x speedup over FlashAttention in diffusion LLMs with maintained model quality.
Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs cs.LG · 2026-05-18 · unverdicted · none · ref 18
Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer sequences.
Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion cs.LG · 2026-05-12 · unverdicted · none · ref 16 · 2 links
Orthrus unifies autoregressive LLMs and diffusion models via shared KV cache and consensus to enable up to 7.8x parallel token generation speedup with O(1) memory overhead and lossless results.
Consistent Diffusion Language Models cs.LG · 2026-04-30 · unverdicted · none · ref 50
CDLM trains denoisers to be path-invariant across stochastic posterior bridges in discrete diffusion, unifying prior methods and achieving new SOTA few-step text generation performance.
Stability-Weighted Decoding for Diffusion Language Models cs.CL · 2026-04-18 · unverdicted · none · ref 11
Stability-Weighted Decoding improves diffusion LLM accuracy by modulating token scores with temporal stability from KL divergence between prediction steps.
DualDiffusion: A Speculative Decoding Strategy for Masked Diffusion Models cs.LG · 2026-04-06 · unverdicted · none · ref 4
DualDiffusion combines a lightweight drafter using approximations with a full verifier to reduce generation steps in masked diffusion models while keeping accuracy on MMLU and GSM8K.
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed cs.CL · 2025-12-16 · unverdicted · none · ref 42
Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.
Diffusion Language Models Know the Answer Before Decoding cs.CL · 2025-08-27 · conditional · none · ref 14
DLMs show early answer convergence allowing Prophet to cut decoding steps by up to 3.4x on LLaDA-8B and Dream-7B while keeping output quality.
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model cs.LG · 2025-05-29 · unverdicted · none · ref 14
Muddit is a unified discrete diffusion transformer that integrates strong visual priors from a pretrained text-to-image model with a lightweight text decoder to enable fast parallel generation across text and image modalities.
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload cs.CL · 2026-05-19 · unverdicted · none · ref 14
TIDE schedules I/O-aware expert offloading for MoE diffusion LLMs by solving for an optimal refresh interval that exploits temporal stability of activations, yielding up to 1.5x throughput gain losslessly.

DKV-Cache: The cache for diffusion language models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer