Attention Is All You Need for KV Cache in Diffusion LLMs
5 Pith papers cite this work, all from 2026. Polarity classification is still indexing, so all 5 citing papers are currently unverdicted.
Citing papers
- Set Diffusion: Interpolating Token Orderings Between Autoregression and Diffusion for Fast and Flexible Decoding
  Set diffusion factorizes the likelihood over arbitrary token sets and uses a set-causal diffusion architecture to support KV caching and any-order decoding, yielding better speed-quality tradeoffs than prior diffusion LMs.
- AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding
  AsyncLane decouples refinement from advancement in DLM decoding by forking lanes at delimiters, combined with efficiency optimizations, yielding up to 3x higher throughput on math and code benchmarks without retraining.
- Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs
  Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer sequences.
- WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering
  WaveFilter applies wavelet decomposition to filter critical tokens for sparse KV caching, improving long-context performance of diffusion LLMs as a plug-and-play addition to existing methods.
- Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder
  Reports a streaming pipeline with asymmetric CUDA pipelining and batched MLLM amortization that sustains 27.4 fps at 512×512 on an RTX 3090 Ti for oil-painting stylization.
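The Set Diffusion entry credits its set-causal architecture with enabling KV caching. The minimal Python sketch below illustrates why a set-causal attention pattern makes caching possible, under an assumption of ours rather than the paper's: each token attends to every token in its own decoding set and in all earlier sets, and never to later ones. The helpers `set_causal_mask` and `visible_positions` are hypothetical, and the sketch ignores how the masked inputs of the set currently being predicted are handled.

```python
from typing import Dict, List


def set_causal_mask(set_ids: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j.

    Hypothetical rule for illustration (not taken from the Set Diffusion
    paper): a token sees its own decoding set and all earlier sets only.
    """
    n = len(set_ids)
    return [[set_ids[j] <= set_ids[i] for j in range(n)] for i in range(n)]


def visible_positions(positions: List[int], set_ids: List[int]) -> Dict[int, List[int]]:
    """Map each token position to the positions it can attend to."""
    mask = set_causal_mask(set_ids)
    return {
        positions[i]: [positions[j] for j, ok in enumerate(mask[i]) if ok]
        for i in range(len(positions))
    }


# Positions 0-1 are the prompt (set 0). The rest are decoded out of
# positional order: set 1 fills positions 3 and 4, set 2 fills 2 and 5.
positions = [0, 1, 2, 3, 4, 5]
set_ids = [0, 0, 2, 1, 1, 2]
full = visible_positions(positions, set_ids)

# Snapshot after set 1 is decoded, before any set-2 token exists.
keep = [k for k, s in enumerate(set_ids) if s <= 1]
partial = visible_positions([positions[k] for k in keep], [set_ids[k] for k in keep])

# Tokens from sets 0 and 1 see the same context in both cases, so keys and
# values computed for them at the earlier step could be reused from a cache.
for pos in partial:
    assert partial[pos] == full[pos]
print(full[3])  # [0, 1, 3, 4]
print(full[2])  # [0, 1, 2, 3, 4, 5]
```

Under this rule, anything a token can see belongs to a set no later than its own, so its context (and, layer by layer, its keys and values) does not change when later sets are decoded. Assigning sets out of positional order, as in the example, is what permits any-order generation.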
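The Elastic-dLLM entry describes position-preserving MASK token compression. As a rough illustration of what "position-preserving" can mean, the sketch below drops surplus MASK placeholders from the model input while every surviving token keeps its original position index, so positional encodings are unaffected. The `MASK` string, the `compress_masks` helper, and the rule of keeping only the first `keep_per_run` masks of each contiguous run are hypothetical choices for illustration, not Elastic-dLLM's actual method.

```python
from typing import List, Tuple

MASK = "<mask>"  # hypothetical placeholder string for the MASK token


def compress_masks(tokens: List[str], keep_per_run: int) -> Tuple[List[str], List[int]]:
    """Drop MASK tokens beyond the first `keep_per_run` of each contiguous run.

    Hypothetical compression rule. Returns the kept tokens together with
    their ORIGINAL position ids, so the positions the model sees for the
    kept tokens are unchanged.
    """
    out_tokens: List[str] = []
    out_positions: List[int] = []
    run = 0  # length of the current run of consecutive MASK tokens
    for pos, tok in enumerate(tokens):
        run = run + 1 if tok == MASK else 0
        if tok != MASK or run <= keep_per_run:
            out_tokens.append(tok)
            out_positions.append(pos)
    return out_tokens, out_positions


seq = ["The", "cat", MASK, MASK, MASK, MASK, "on", MASK, MASK]
toks, pos = compress_masks(seq, keep_per_run=2)
print(toks)  # ['The', 'cat', '<mask>', '<mask>', 'on', '<mask>', '<mask>']
print(pos)   # [0, 1, 2, 3, 6, 7, 8]
```

In this sketch the returned position ids would be supplied to the model alongside the shortened input, so "on" is still encoded at position 6 even though two masks before it were removed.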