Entropycache: Decoded token entropy guided kv caching for diffusion language models. arXiv preprint arXiv:2603.18489, 2026

Minsoo Cheong, Donghyun Son, Woosang Lim, Sungjoo Yoo · 2026 · arXiv 2603.18489

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Improved Large Language Diffusion Models

cs.CL · 2026-06-24 · unverdicted · novelty 6.0

iLLaDA is an 8B masked diffusion LM trained from scratch with bidirectional attention, reporting gains of 14-21 points on BBH, ARC, MATH and HumanEval over prior diffusion models while remaining competitive with Qwen2.5-7B.

citing papers explorer

Showing 1 of 1 citing paper.

Improved Large Language Diffusion Models · cs.CL · 2026-06-24 · unverdicted · none · ref 29
