hub

Train for the worst, plan for the best: Understand- ing token ordering in masked diffusions.arXiv preprint arXiv:2502.06768, 2025a

URL https: //arxiv · 2025 · arXiv 2502.06768

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Flow Reasoning Models: Scaling Reasoning Through Iterative Self-Refinement

cs.AI · 2026-06-28 · conditional · novelty 7.0

Flow models reach 99.2% Sudoku accuracy in 7 passes and 96.1% on out-of-distribution Sudoku-Extreme by selecting dynamically stable candidates and training with self-conditioning plus DPO to avoid failed outputs.

Adaptive Order Policies for Masked Diffusion

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

A policy network learns to choose unmasking order in masked diffusion by reweighting the loss, outperforming random and heuristic baselines on ordering-sensitive tasks.

Machine Unlearning for Masked Diffusion Language Models

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

cs.CR · 2026-05-10 · unverdicted · novelty 7.0

BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.

Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.

Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.

NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.

Improving Sampling for Masked Diffusion Models via Information Gain

cs.CL · 2026-02-20 · unverdicted · novelty 7.0

Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.

Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner

cs.AI · 2025-10-03 · unverdicted · novelty 7.0

CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.

Fixed-Point Masked Generative Modeling

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

FP-MGMs with consistency loss and three-state reuse (CoFRe) reduce parameters by up to 38.8% and improve low-budget perplexity and FID versus standard masked generative models on text and images.

Looped Diffusion Language Models

cs.LG · 2026-05-25 · conditional · novelty 6.0

LoopMDM loops early-middle layers in masked diffusion models to match same-size MDM performance with up to 3.3x fewer training FLOPs and outperform on reasoning tasks by up to 8.5 points on GSM8K.

Differences in Text Generated by Diffusion and Autoregressive Language Models

cs.CL · 2026-04-04 · unverdicted · novelty 6.0

DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.

Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

cs.LG · 2026-05-21

DVD: Discrete Voxel Diffusion for 3D Generation and Editing

cs.CV · 2026-05-08

citing papers explorer

Showing 14 of 14 citing papers.

Flow Reasoning Models: Scaling Reasoning Through Iterative Self-Refinement cs.AI · 2026-06-28 · conditional · none · ref 12
Flow models reach 99.2% Sudoku accuracy in 7 passes and 96.1% on out-of-distribution Sudoku-Extreme by selecting dynamically stable candidates and training with self-conditioning plus DPO to avoid failed outputs.
Adaptive Order Policies for Masked Diffusion cs.LG · 2026-05-29 · unverdicted · none · ref 125
A policy network learns to choose unmasking order in masked diffusion by reweighting the loss, outperforming random and heuristic baselines on ordering-sensitive tasks.
Machine Unlearning for Masked Diffusion Language Models cs.CL · 2026-05-18 · unverdicted · none · ref 19
MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.
BadDLM: Backdooring Diffusion Language Models with Diverse Targets cs.CR · 2026-05-10 · unverdicted · none · ref 29
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion cs.LG · 2026-05-05 · unverdicted · none · ref 5
Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.
Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes cs.CV · 2026-04-22 · unverdicted · none · ref 19
Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization cs.LG · 2026-04-20 · unverdicted · none · ref 7
NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.
Improving Sampling for Masked Diffusion Models via Information Gain cs.CL · 2026-02-20 · unverdicted · none · ref 9
Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.
Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner cs.AI · 2025-10-03 · unverdicted · none · ref 23
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.
Fixed-Point Masked Generative Modeling cs.LG · 2026-05-29 · unverdicted · none · ref 38
FP-MGMs with consistency loss and three-state reuse (CoFRe) reduce parameters by up to 38.8% and improve low-budget perplexity and FID versus standard masked generative models on text and images.
Looped Diffusion Language Models cs.LG · 2026-05-25 · conditional · none · ref 32
LoopMDM loops early-middle layers in masked diffusion models to match same-size MDM performance with up to 3.3x fewer training FLOPs and outperform on reasoning tasks by up to 8.5 points on GSM8K.
Differences in Text Generated by Diffusion and Autoregressive Language Models cs.CL · 2026-04-04 · unverdicted · none · ref 16
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models cs.LG · 2026-05-21 · unreviewed · ref 15
DVD: Discrete Voxel Diffusion for 3D Generation and Editing cs.CV · 2026-05-08 · unreviewed · ref 52

Train for the worst, plan for the best: Understand- ing token ordering in masked diffusions.arXiv preprint arXiv:2502.06768, 2025a

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer