DPRM introduces a Doob h-transform process reward module as a plug-in for token ordering in diffusion language models, with convergence proofs and empirical gains over confidence baselines, especially on hard reasoning and scientific design tasks.
We run two configurations that [...] Figure 7. Per-rank accuracy comparison on GSM8K.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DPRM: A Plug-in Doob h transform-induced Token-Ordering Module for Diffusion Language Models