Parallelbench: Understanding the trade-offs of parallel decoding in diffusion llms

Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjae Lee, Yuchen Zeng, Shuibai Zhang, Coleman Hooper, Yuezhou Hu, Hyung Il Koo, Nam Ik Cho, et al · 2025 · arXiv 2510.04767

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Kernel-Gradient Drifting Models

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Kernel-gradient drifting reformulates drifting models via kernel gradients to yield identifiable one-step generation with smoothed score matching and KL descent on Euclidean, Riemannian, and discrete spaces.

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

Token-to-Mask remasking improves self-correction in diffusion LLMs by resetting erroneous commitments to masks rather than overwriting them, yielding +13.33 points on AIME 2025 and +8.56 on CMATH.

Flow Map Language Models: One-step Language Modeling via Continuous Denoising

cs.CL · 2026-02-18 · conditional · novelty 6.0 · 2 refs

Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.

citing papers explorer

Showing 4 of 4 citing papers.

Kernel-Gradient Drifting Models cs.LG · 2026-05-11 · unverdicted · none · ref 46
Kernel-gradient drifting reformulates drifting models via kernel gradients to yield identifiable one-step generation with smoothed score matching and KL descent on Euclidean, Riemannian, and discrete spaces.
TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM cs.CL · 2026-05-10 · unverdicted · none · ref 6
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models cs.CL · 2026-04-20 · unverdicted · none · ref 12
Token-to-Mask remasking improves self-correction in diffusion LLMs by resetting erroneous commitments to masks rather than overwriting them, yielding +13.33 points on AIME 2025 and +8.56 on CMATH.
Flow Map Language Models: One-step Language Modeling via Continuous Denoising cs.CL · 2026-02-18 · conditional · none · ref 11 · 2 links
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.

Parallelbench: Understanding the trade-offs of parallel decoding in diffusion llms

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer