DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.
Wedlm: Reconciling diffusion language models with standard causal atten- tion for fast inference.arXiv preprint arXiv:2512.22737
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4roles
background 1polarities
background 1representative citing papers
Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.
TEAM accelerates MoE dLLMs up to 2.2x by exploiting temporal-spatial consistency in expert routing to accept more tokens with fewer activations.
Extends Tweedie's formulae to GBM, BESQ, and CIR processes to enable non-Gaussian diffusion generative models and empirical Bayes applications.
citing papers explorer
-
DMax: Aggressive Parallel Decoding for dLLMs
DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.