A 130M-parameter continuous bitstream diffusion model with entropy-gated Langevin sampling achieves GenPPL 59.76 on LM1B and 27.06 on OWT, closing the gap to autoregressive models at matched entropy with 256 NFEs.
Advances in Neural Information Processing Systems , year =
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion
A 130M-parameter continuous bitstream diffusion model with entropy-gated Langevin sampling achieves GenPPL 59.76 on LM1B and 27.06 on OWT, closing the gap to autoregressive models at matched entropy with 256 NFEs.