Improving Sampling for Masked Diffusion Models via Information Gain

2026 · cs.CL · arXiv 2602.18176

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Masked Diffusion Models (MDMs) enable flexible decoding orders, yet existing samplers remain largely greedy, selecting locally certain tokens without accounting for their downstream effects. We show that this myopia can increase cumulative uncertainty and lead to suboptimal generation. To address this, we propose the **Info-Gain Sampler**, a training-free decoding method that uses the bidirectional structure of MDMs to balance immediate uncertainty with the information gained over remaining masked positions. Across reasoning, coding, creative writing, and image generation tasks, Info-Gain Sampler consistently outperforms existing MDM samplers, improving average reasoning accuracy by 2.9 to 11.6 percentage points and achieving a 62.8% average win rate in creative writing. The code is available at https://github.com/yks23/Information-Gain-Sampler.
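To make the contrast in the abstract concrete, here is a minimal sketch of a greedy (lowest-entropy) pick next to a one-step lookahead pick that also credits the entropy reduction at the other masked positions. It is not the paper's implementation: the toy joint distribution stands in for an MDM's bidirectional predictions, the score `gain - immediate entropy` is one assumed way to balance the two terms, and the names `marginals`, `greedy_pick`, and `info_gain_pick` are hypothetical.

```python
import math
from itertools import product

MASK = None  # hypothetical marker for a position that is still masked


def marginals(joint, seq):
    """Exact per-position conditionals of a toy joint, given the revealed tokens.

    Stand-in (assumption) for the per-position predictions of an MDM.
    """
    consistent = {x: p for x, p in joint.items()
                  if all(s is MASK or s == t for s, t in zip(seq, x))}
    total = sum(consistent.values())
    out = {}
    for i, s in enumerate(seq):
        if s is MASK:
            dist = {}
            for x, p in consistent.items():
                dist[x[i]] = dist.get(x[i], 0.0) + p / total
            out[i] = dist
    return out


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def greedy_pick(joint, seq):
    """Commit the most likely token at the locally most certain position."""
    m = marginals(joint, seq)
    i = min(m, key=lambda k: entropy(m[k]))
    return i, max(m[i], key=m[i].get)


def info_gain_pick(joint, seq):
    """One-step lookahead: reward entropy removed elsewhere, penalise local entropy.

    The score below is an assumed trade-off, not the formula from the paper.
    """
    m = marginals(joint, seq)
    best = None
    for i, dist in m.items():
        tok = max(dist, key=dist.get)
        trial = list(seq)
        trial[i] = tok
        after = marginals(joint, trial)
        before_rest = sum(entropy(m[j]) for j in m if j != i)
        after_rest = sum(entropy(d) for d in after.values())
        score = (before_rest - after_rest) - entropy(dist)
        if best is None or score > best[0]:
            best = (score, i, tok)
    return best[1], best[2]


# Toy joint: position 0 is confident but independent of the rest;
# position 2 is a deterministic function of position 1.
joint = {}
for (t0, p0), (t1, p1) in product([("a", 0.9), ("b", 0.1)],
                                  [("c", 0.7), ("d", 0.15), ("e", 0.15)]):
    joint[(t0, t1, "g" if t1 == "c" else "h")] = p0 * p1

seq = [MASK, MASK, MASK]
print(greedy_pick(joint, seq))     # (0, 'a')
print(info_gain_pick(joint, seq))  # (2, 'g')
```

In this toy case the greedy rule commits position 0 because it is the most certain, although doing so tells the model nothing about the other positions. The lookahead rule commits position 2 instead, since fixing it to `g` also resolves position 1. The sketch only looks ahead with the argmax token of each candidate position; a fuller treatment would weigh several candidate tokens.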

representative citing papers

Looped Diffusion Language Models

cs.LG · 2026-05-25 · conditional · novelty 6.0

LoopMDM loops early-middle layers in masked diffusion models to match same-size MDM performance with up to 3.3x fewer training FLOPs and outperform on reasoning tasks by up to 8.5 points on GSM8K.

Decoding in Order-Agnostic Language Models: Chain-Rule Deviation and Uniform Spreading

cs.CL · 2026-05-31 · unverdicted · novelty 5.0

OALMs exhibit order-dependent likelihoods up to 0.49 nats/token and a uniform confidence spread maximizes recoverability, motivating Var(log q_t) as a decoding diagnostic.
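As a rough illustration of the diagnostic named in this summary, the sketch below computes the variance of log q_t over a decoding trajectory. It assumes q_t is the probability the model assigned to the token committed at step t, which the summary does not spell out; the function name `log_confidence_variance` and the example values are hypothetical.

```python
import math
from statistics import pvariance


def log_confidence_variance(step_probs):
    """Population variance of log q_t across decoding steps.

    step_probs: probabilities assigned to the committed token at each step
    (an assumed reading of q_t). A value near zero means confidence is
    spread evenly over the steps.
    """
    return pvariance([math.log(q) for q in step_probs])


evenly_spread = [0.5, 0.5, 0.5, 0.5]      # same confidence at every step
front_loaded = [0.99, 0.99, 0.99, 0.065]  # easy tokens first, one hard token last

print(log_confidence_variance(evenly_spread))
print(log_confidence_variance(front_loaded))
```

The evenly spread trajectory has zero variance, while the front-loaded one, which defers a single low-confidence commitment to the end, scores markedly higher.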

