Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.
Advances in neural information processing systems , volume=
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5verdicts
UNVERDICTED 5representative citing papers
DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.
URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.
CDLM trains denoisers to be path-invariant across stochastic posterior bridges in discrete diffusion, unifying prior methods and achieving new SOTA few-step text generation performance.
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
citing papers explorer
-
Support Before Frequency in Discrete Diffusion
Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.
-
DARE: Diffusion Language Model Activation Reuse for Efficient Inference
DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.
-
Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures
URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.
-
Consistent Diffusion Language Models
CDLM trains denoisers to be path-invariant across stochastic posterior bridges in discrete diffusion, unifying prior methods and achieving new SOTA few-step text generation performance.
-
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.