pith. sign in

Discrete Stochastic Localization for Non-autoregressive Generation

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

Continuous diffusion is a natural framework for non-autoregressive generation but has generally lagged behind masked discrete diffusion models (MDMs) on discrete sequence generation. We argue that the bottleneck is not continuity itself, but a representation in which denoising depends on timestep-indexed noise regimes. We introduce \emph{Discrete Stochastic Localization} (DSL), a continuous-state framework with unit-sphere token embeddings whose Bayes-optimal denoiser is invariant to the nominal signal-to-noise ratio (SNR) under the localization channel. One trained network then supports an entire family of per-token SNR paths, with endpoint masked-diffusion paths as a special case. Fine-tuning a pretrained MDLM checkpoint with DSL substantially improves distributional faithfulness (MAUVE) on OpenWebText across all step budgets from $T{=}128$ to $T{=}1024$, and the same checkpoint supports random-order autoregressive sampling, as well as a hybrid continuous-then-discrete sampler using as few as T=48 total steps -- without distillation or retraining.

fields

cs.CL 1

years

2026 1

verdicts

UNVERDICTED 1

clear filters

representative citing papers

DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs

cs.CL · 2026-05-31 · unverdicted · novelty 6.0

Adapting LLaDA-8B-Instruct via Discrete Stochastic Localization with continuous per-token Gaussian noise yields continuous denoising that achieves top ROUGE-1 on zero-shot summarization at low step budgets and adds selective noisy-state robustness.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs cs.CL · 2026-05-31 · unverdicted · none · ref 10 · internal anchor

    Adapting LLaDA-8B-Instruct via Discrete Stochastic Localization with continuous per-token Gaussian noise yields continuous denoising that achieves top ROUGE-1 on zero-shot summarization at low step budgets and adds selective noisy-state robustness.