Attention Is All You Need for KV Cache in Diffusion LLMs

Attention Is All You Need for KV Cache in Diffusion LLMs · 2025 · arXiv 2510.14973

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Set Diffusion: Interpolating Token Orderings Between Autoregression and Diffusion for Fast and Flexible Decoding

cs.LG · 2026-07-02 · unverdicted · novelty 7.0

Set diffusion factorizes likelihood over arbitrary token sets and uses a set-causal diffusion architecture to support KV caching and any-order decoding, yielding improved speed-quality tradeoffs versus prior diffusion LMs.

AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

AsyncLane decouples refinement from advancement in DLM decoding via lane forking at delimiters plus efficiency optimizations, yielding up to 3x throughput gains on math and code benchmarks without retraining.

Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer sequences.

WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering

cs.CL · 2026-05-30 · unverdicted · novelty 5.0

WaveFilter applies wavelet decomposition to filter critical tokens for sparse KV caching, improving long-context performance of diffusion LLMs as a plug-and-play addition to existing methods.

Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder

cs.CV · 2026-06-04 · unverdicted · novelty 4.0

Reports a streaming pipeline with asymmetric CUDA pipelining and batched MLLM amortization that sustains 27.4 fps at 512x512 on RTX 3090 Ti for oil-painting stylization.

citing papers explorer

Set Diffusion: Interpolating Token Orderings Between Autoregression and Diffusion for Fast and Flexible Decoding · cs.LG · 2026-07-02 · unverdicted · none · ref 148
AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding · cs.CL · 2026-06-07 · unverdicted · none · ref 12
Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs · cs.LG · 2026-05-18 · unverdicted · none · ref 20
WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering · cs.CL · 2026-05-30 · unverdicted · none · ref 19
Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder · cs.CV · 2026-06-04 · unverdicted · none · ref 20
