From next-token to next-block: A principled adaptation path for diffusion LLMs

Tian, Y · 2025 · arXiv 2512.06776

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

BlockVLA: Accelerating Autoregressive VLA via Block Diffusion Finetuning

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

BlockVLA accelerates autoregressive VLA models by 3.3x using block diffusion finetuning, with faster training convergence and better early performance on long-horizon robotic tasks.

BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

BARD bridges autoregressive and diffusion VLMs with progressive block merging plus stage-wise intra-diffusion distillation, delivering 3x speedup and new SOTA on open dVLMs using under 4.4M data points.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration

cs.CL · 2026-02-09 · unverdicted · novelty 7.0

TEAM accelerates MoE dLLMs up to 2.2x by exploiting temporal-spatial consistency in expert routing to accept more tokens with fewer activations.

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Orthrus unifies autoregressive LLMs and diffusion models via shared KV cache and consensus to enable up to 7.8x parallel token generation speedup with O(1) memory overhead and lossless results.

Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow

cs.CL · 2026-01-22 · unverdicted · novelty 5.0

MDLMs lag autoregressive models in performance because parallel modeling weakens inter-token dependencies, yet they adapt generation order to task demands and show promise in a generate-then-edit paradigm.

citing papers explorer

Showing 6 of 6 citing papers.

BlockVLA: Accelerating Autoregressive VLA via Block Diffusion Finetuning

cs.RO · 2026-05-13 · unverdicted · none · ref 16

BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation

cs.CV · 2026-04-15 · unverdicted · none · ref 16

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · none · ref 75 · 2 links

TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration

cs.CL · 2026-02-09 · unverdicted · none · ref 22

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

cs.LG · 2026-05-12 · unverdicted · none · ref 18 · 2 links

Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow

cs.CL · 2026-01-22 · unverdicted · none · ref 6
