DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models

2022 · arXiv 2211.15029

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that is competitive with autoregressive LLMs such as LLaMA3 8B on in-context learning tasks and surpasses GPT-4o on a reversal poem completion task.

NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

cs.LG · 2026-02-04 · unverdicted · novelty 7.0

Early and late denoising steps in masked diffusion LMs are robust to smaller-model replacement, enabling 17% FLOPs reduction with modest generative quality loss.

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

cs.CL · 2025-05-28 · conditional · novelty 7.0

Fast-dLLM adds reusable KV cache blocks and selective parallel decoding to diffusion LLMs, closing most of the speed gap with autoregressive models without retraining.

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

cs.LG · 2024-06-06 · conditional · novelty 7.0

Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.

Measuring Temporal Linguistic Emergence in Diffusion Language Models

cs.CL · 2026-04-25 · unverdicted · novelty 6.0

In diffusion language models, coarse linguistic labels stabilize earlier than exact token identity, uncertainty tracks correctness, and mid-trajectory states are most sensitive to perturbations.

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

cs.CL · 2025-12-16 · unverdicted · novelty 6.0

Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.

AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models

cs.RO · 2025-11-18 · unverdicted · novelty 6.0

AsyncVLA adds asynchronous flow matching and a confidence rater to VLA models so they can generate actions on flexible schedules and selectively refine low-confidence tokens before execution.

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

cs.LG · 2025-05-22 · conditional · novelty 6.0

LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction tasks while using a non-autoregressive architecture.

GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation

cs.GR · 2025-02-25 · unverdicted · novelty 6.0

GCDance is a text-and-music-conditioned diffusion framework that generates genre-consistent 3D dance sequences and reports better results than prior methods on FineDance and AIST++.

Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

cs.LG · 2025-05-23 · unverdicted · novelty 5.0

VADD augments masked diffusion models with an auxiliary recognition model and variational inference to implicitly model inter-dimensional correlations, yielding higher-quality samples than standard MDMs at low denoising step counts on toy data, images, and text.

citing papers explorer

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · none · ref 61

NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization

cs.LG · 2026-04-20 · unverdicted · none · ref 37

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

cs.LG · 2026-02-04 · unverdicted · none · ref 9

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

cs.CL · 2025-05-28 · conditional · none · ref 10

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

cs.LG · 2024-06-06 · conditional · none · ref 62

Measuring Temporal Linguistic Emergence in Diffusion Language Models

cs.CL · 2026-04-25 · unverdicted · none · ref 2

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

cs.CL · 2025-12-16 · unverdicted · none · ref 1

AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models

cs.RO · 2025-11-18 · unverdicted · none · ref 23

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

cs.LG · 2025-05-22 · conditional · none · ref 36

GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation

cs.GR · 2025-02-25 · unverdicted · none · ref 32

Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

cs.LG · 2025-05-23 · unverdicted · none · ref 5
