Advances in Neural Information Processing Systems

Diffusion-LM Improves Controllable Text Generation

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Support Before Frequency in Discrete Diffusion

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.

DARE: Diffusion Language Model Activation Reuse for Efficient Inference

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.

Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

stat.ML · 2026-05-18 · unverdicted · novelty 6.0

URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.

Consistent Diffusion Language Models

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

CDLM trains denoisers to be path-invariant across stochastic posterior bridges in discrete diffusion, unifying prior methods and achieving new SOTA few-step text generation performance.

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.

citing papers explorer

Showing 5 of 5 citing papers.

Support Before Frequency in Discrete Diffusion · cs.LG · 2026-05-13 · unverdicted · none · ref 32
DARE: Diffusion Language Model Activation Reuse for Efficient Inference · cs.LG · 2026-05-01 · unverdicted · none · ref 13
Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures · stat.ML · 2026-05-18 · unverdicted · none · ref 12
Consistent Diffusion Language Models · cs.LG · 2026-04-30 · unverdicted · none · ref 21
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models · cs.LG · 2026-04-20 · unverdicted · none · ref 231