super hub Canonical reference

Large Language Diffusion Models

Fengqi Zhu, Jingyang Ou, Jun Hu, Shen Nie, Xiaolu Zhang, Zebin You · 2025 · cs.CL · arXiv 2502.09992

Canonical reference. 72% of citing Pith papers cite this work as background.

133 Pith papers citing it

Background 72% of classified citations

open full Pith review browse 133 citing papers more from Fengqi Zhu arXiv PDF

abstract

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a principled generative approach for probabilistic inference by optimizing a likelihood lower bound. Across extensive benchmarks on general tasks, math, code, and so on, LLaDA demonstrates strong scalability and performs comparably to our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings show the promise of diffusion models for language modeling at scale and challenge the common assumption that core LLM capabilities discussed above inherently depend on ARMs. Project page and codes: https://ml-gsai.github.io/LLaDA-demo/.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 22 method 5 baseline 2

citation-polarity summary

background 21 use method 5 baseline 2 unclear 1

claims ledger

abstract The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a principled generative approach for probabilistic inference by optimizing a likelihood lower bound. Across extensive benchmarks on general tasks, math, code, and so on, LLaDA demonstrate

authors

Fengqi Zhu Jingyang Ou Jun Hu Shen Nie Xiaolu Zhang Zebin You

co-cited works

representative citing papers

NPU Design for Diffusion Language Model Inference

cs.AR · 2026-01-28 · unverdicted · novelty 8.0

Introduces the first NPU accelerator for diffusion language models with dLLM-specific ISA, hardware execution model, BAOS KV quantization, and 7nm RTL synthesis.

AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

AsyncLane decouples refinement from advancement in DLM decoding via lane forking at delimiters plus efficiency optimizations, yielding up to 3x throughput gains on math and code benchmarks without retraining.

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

cs.AI · 2026-05-30 · unverdicted · novelty 7.0

TAPS converts diffusion marginal probabilities into path-conditioned acceptance estimates to select prefix-closed subtrees under a fixed verification budget, achieving up to 7.9x end-to-end speedup over autoregressive decoding.

Adaptive Order Policies for Masked Diffusion

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

A policy network learns to choose unmasking order in masked diffusion by reweighting the loss, outperforming random and heuristic baselines on ordering-sensitive tasks.

Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

BASTION is a budget-aware speculative decoding framework with adaptive tree-structured block diffusion drafting that reports up to 6.61x speedup and 39% improvement over block-diffusion baselines.

Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion

cs.AI · 2026-05-28 · unverdicted · novelty 7.0

Mind-Omni unifies seven brain-vision-language tasks in one discrete-diffusion framework with a brain tokenizer and a new BQA dataset, claiming SOTA multi-task performance competitive with larger single-task models.

Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.

Drifting Objectives for Refining Discrete Diffusion Language Models

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.

Machine Unlearning for Masked Diffusion Language Models

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.

Constrained Code Generation with Discrete Diffusion

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.

Dynamic Chunking for Diffusion Language Models

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.

PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

PSD is a training-free framework that jointly optimizes spatial unmasking and temporal speculative decoding in diffusion LLMs to reach up to 5.5x tokens per forward pass while preserving accuracy comparable to greedy decoding.

From Table to Cell: Attention for Better Reasoning with TABALIGN

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

TABALIGN pairs a diffusion language model planner emitting binary cell masks with a trained attention verifier, raising average accuracy 15.76 points over strong baselines on eight table benchmarks while speeding execution 44.64%.

Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding

cs.CL · 2026-05-14 · unverdicted · novelty 7.0

FeF-DLLM achieves factorization-error-free generation in discrete diffusion language models via prefix-conditioned posterior factorization and speculative decoding, delivering 5.04 pp higher accuracy and 3.86x faster inference on GSM8K, MATH, HumanEval, and MBPP.

Support Before Frequency in Discrete Diffusion

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

cs.LG · 2026-05-13 · conditional · novelty 7.0

TraFL applies trajectory flow balancing to post-train diffusion language models, preventing mode collapse and delivering consistent gains on reasoning tasks that hold under increased sampling.

AIS: Adaptive Importance Sampling for Quantized RL

stat.ML · 2026-05-13 · unverdicted · novelty 7.0

AIS adaptively corrects non-stationary policy gradient bias in quantized LLM RL, matching BF16 performance while retaining 1.5-2.76x FP8 rollout speedup.

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.

UniRank: Unified List-wise Reranking via Confidence-Ordered Denoising

cs.IR · 2026-05-11 · unverdicted · novelty 7.0

UniRank unifies autoregressive and non-autoregressive list-wise reranking via bidirectional modeling in a confidence-ordered iterative denoising process, outperforming baselines on datasets and online tests.

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

cs.CR · 2026-05-10 · unverdicted · novelty 7.0

BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.

Discrete Langevin-Inspired Posterior Sampling

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.

LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

LEAP detects early-converging tokens in dLLMs via future context filtering and multi-sequence superposition, reducing average denoising steps by about 30% while maintaining accuracy.

citing papers explorer

Showing 50 of 133 citing papers.

NPU Design for Diffusion Language Model Inference cs.AR · 2026-01-28 · unverdicted · none · ref 4 · internal anchor
Introduces the first NPU accelerator for diffusion language models with dLLM-specific ISA, hardware execution model, BAOS KV quantization, and 7nm RTL synthesis.
AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding cs.CL · 2026-06-07 · unverdicted · none · ref 14 · internal anchor
AsyncLane decouples refinement from advancement in DLM decoding via lane forking at delimiters plus efficiency optimizations, yielding up to 3x throughput gains on math and code benchmarks without retraining.
TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding cs.AI · 2026-05-30 · unverdicted · none · ref 28 · internal anchor
TAPS converts diffusion marginal probabilities into path-conditioned acceptance estimates to select prefix-closed subtrees under a fixed verification budget, achieving up to 7.9x end-to-end speedup over autoregressive decoding.
Adaptive Order Policies for Masked Diffusion cs.LG · 2026-05-29 · unverdicted · none · ref 76 · internal anchor
A policy network learns to choose unmasking order in masked diffusion by reweighting the loss, outperforming random and heuristic baselines on ordering-sensitive tasks.
Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting cs.LG · 2026-05-28 · unverdicted · none · ref 48 · internal anchor
BASTION is a budget-aware speculative decoding framework with adaptive tree-structured block diffusion drafting that reports up to 6.61x speedup and 39% improvement over block-diffusion baselines.
Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion cs.AI · 2026-05-28 · unverdicted · none · ref 55 · internal anchor
Mind-Omni unifies seven brain-vision-language tasks in one discrete-diffusion framework with a brain tokenizer and a new BQA dataset, claiming SOTA multi-task performance competitive with larger single-task models.
Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation cs.LG · 2026-05-21 · unverdicted · none · ref 22 · internal anchor
Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.
Drifting Objectives for Refining Discrete Diffusion Language Models cs.CL · 2026-05-19 · unverdicted · none · ref 7 · internal anchor
TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.
Machine Unlearning for Masked Diffusion Language Models cs.CL · 2026-05-18 · unverdicted · none · ref 2 · internal anchor
MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.
Constrained Code Generation with Discrete Diffusion cs.CL · 2026-05-16 · unverdicted · none · ref 23 · internal anchor
Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.
Dynamic Chunking for Diffusion Language Models cs.CL · 2026-05-15 · unverdicted · none · ref 31 · internal anchor
DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.
PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding cs.CL · 2026-05-15 · unverdicted · none · ref 5 · internal anchor
PSD is a training-free framework that jointly optimizes spatial unmasking and temporal speculative decoding in diffusion LLMs to reach up to 5.5x tokens per forward pass while preserving accuracy comparable to greedy decoding.
From Table to Cell: Attention for Better Reasoning with TABALIGN cs.AI · 2026-05-14 · unverdicted · none · ref 44 · internal anchor
TABALIGN pairs a diffusion language model planner emitting binary cell masks with a trained attention verifier, raising average accuracy 15.76 points over strong baselines on eight table benchmarks while speeding execution 44.64%.
Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding cs.CL · 2026-05-14 · unverdicted · none · ref 11 · internal anchor
FeF-DLLM achieves factorization-error-free generation in discrete diffusion language models via prefix-conditioned posterior factorization and speculative decoding, delivering 5.04 pp higher accuracy and 3.86x faster inference on GSM8K, MATH, HumanEval, and MBPP.
Support Before Frequency in Discrete Diffusion cs.LG · 2026-05-13 · unverdicted · none · ref 12 · internal anchor
Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.
Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models cs.LG · 2026-05-13 · conditional · none · ref 16 · internal anchor
TraFL applies trajectory flow balancing to post-train diffusion language models, preventing mode collapse and delivering consistent gains on reasoning tasks that hold under increased sampling.
AIS: Adaptive Importance Sampling for Quantized RL stat.ML · 2026-05-13 · unverdicted · none · ref 13 · internal anchor
AIS adaptively corrects non-stationary policy gradient bias in quantized LLM RL, matching BF16 performance while retaining 1.5-2.76x FP8 rollout speedup.
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 7 · 2 links · internal anchor
TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.
Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models cs.LG · 2026-05-12 · unverdicted · none · ref 38 · 2 links · internal anchor
Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
UniRank: Unified List-wise Reranking via Confidence-Ordered Denoising cs.IR · 2026-05-11 · unverdicted · none · ref 14 · internal anchor
UniRank unifies autoregressive and non-autoregressive list-wise reranking via bidirectional modeling in a confidence-ordered iterative denoising process, outperforming baselines on datasets and online tests.
TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM cs.CL · 2026-05-10 · unverdicted · none · ref 5 · internal anchor
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
BadDLM: Backdooring Diffusion Language Models with Diverse Targets cs.CR · 2026-05-10 · unverdicted · none · ref 37 · internal anchor
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
Discrete Langevin-Inspired Posterior Sampling cs.LG · 2026-05-10 · unverdicted · none · ref 23 · internal anchor
ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.
LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection cs.LG · 2026-05-09 · unverdicted · none · ref 12 · internal anchor
LEAP detects early-converging tokens in dLLMs via future context filtering and multi-sequence superposition, reducing average denoising steps by about 30% while maintaining accuracy.
Guidance Is Not a Hyperparameter: Learning Dynamic Control in Diffusion Language Models cs.CL · 2026-05-08 · unverdicted · none · ref 1 · internal anchor
Adaptive guidance trajectories learned via PPO outperform fixed-scale CFG on controllability-quality balance in three controlled NLP generation tasks with discrete diffusion models.
GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization cs.CV · 2026-05-08 · unverdicted · none · ref 1 · 2 links · internal anchor
GPO-V jailbreaks dVLMs by globally optimizing probabilities in the denoising process to bypass refusal patterns, achieving stealthy and transferable attacks.
Focus on the Core: Empowering Diffusion Large Language Models by Self-Contrast cs.CL · 2026-05-02 · unverdicted · none · ref 22 · internal anchor
FoCore uses self-contrast on early-converging high-density tokens to boost diffusion LLM quality on reasoning benchmarks while cutting decoding steps by over 2x.
DARE: Diffusion Language Model Activation Reuse for Efficient Inference cs.LG · 2026-05-01 · unverdicted · none · ref 4 · internal anchor
DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.
Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes cs.CV · 2026-04-22 · unverdicted · none · ref 29 · internal anchor
Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization cs.LG · 2026-04-20 · unverdicted · none · ref 4 · internal anchor
NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.
One Pass for All: A Discrete Diffusion Model for Knowledge Graph Triple Set Prediction cs.AI · 2026-04-20 · unverdicted · none · ref 21 · internal anchor
DiffTSP applies discrete diffusion to knowledge graph triple set prediction, recovering all missing triples simultaneously via edge-masking noise reversal and a structure-aware transformer, achieving SOTA on three datasets.
DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference cs.LG · 2026-04-17 · unverdicted · none · ref 29 · internal anchor
DepCap accelerates diffusion LM inference up to 5.63x by using last-block influence for adaptive block boundaries and conflict-free token selection for parallel decoding, with negligible quality loss.
BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation cs.CV · 2026-04-15 · unverdicted · none · ref 14 · internal anchor
BARD bridges autoregressive and diffusion VLMs with progressive block merging plus stage-wise intra-diffusion distillation, delivering 3x speedup and new SOTA on open dVLMs using under 4.4M data points.
LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling cs.CL · 2026-04-13 · unverdicted · none · ref 14 · internal anchor
LangFlow is the first continuous diffusion language model to rival discrete diffusion on perplexity and generative perplexity while exceeding autoregressive baselines on several zero-shot tasks.
DMax: Aggressive Parallel Decoding for dLLMs cs.LG · 2026-04-09 · conditional · none · ref 58 · 2 links · internal anchor
DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.
Unlocking Prompt Infilling Capability for Diffusion Language Models cs.CL · 2026-04-04 · unverdicted · none · ref 17 · internal anchor
Full-sequence masking in SFT unlocks prompt infilling for masked diffusion language models, producing templates that match or surpass hand-designed ones and transfer across models.
NeuralLVC: Neural Lossless Video Compression via Masked Diffusion with Temporal Conditioning eess.IV · 2026-04-03 · unverdicted · none · ref 29 · internal anchor
NeuralLVC achieves better lossless compression than H.264 and H.265 on video sequences by combining masked diffusion with temporal conditioning on frame differences.
Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models cs.CL · 2026-04-02 · unverdicted · none · ref 11 · internal anchor
DEMASK adds a lightweight pairwise-dependency predictor to dLLMs and uses greedy selection to enable parallel unmasking whose total-variation error is provably bounded under sub-additivity.
LogicDiff: Logic-Guided Denoising Improves Zero-Shot Reasoning in Masked Diffusion Language Models cs.CL · 2026-03-24 · conditional · none · ref 10 · internal anchor
Logic-role-guided unmasking in masked diffusion models raises zero-shot GSM8K accuracy from 22% to 61% by enforcing logical generation order.
A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs cs.CL · 2026-03-08 · unverdicted · none · ref 11 · internal anchor
Diffusion language models form more global representations with early-layer redundancy compared to autoregressive models, allowing layer skipping for up to 18.75% FLOP savings while maintaining over 90% performance.
Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension cs.CV · 2026-02-10 · unverdicted · none · ref 13 · internal anchor
Visual Para-Thinker is the first parallel reasoning framework for MLLMs that uses visual partitioning strategies, Pa-Attention, and LPRoPE to extend test-time scaling benefits to visual comprehension tasks.
DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching cs.CV · 2026-02-05 · unverdicted · none · ref 47 · internal anchor
DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.
dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models cs.CV · 2025-12-22 · conditional · none · ref 18 · internal anchor
dMLLM-TTS delivers up to 6x more efficient test-time scaling for diffusion MLLMs via O(N+T) hierarchical search and self-verified feedback, improving generation quality on GenEval across three models.
PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion cs.CV · 2025-11-24 · unverdicted · none · ref 27 · internal anchor
PartDiffuser is a semi-autoregressive discrete diffusion framework that generates high-fidelity 3D meshes from point clouds by combining inter-part autoregression with intra-part parallel diffusion using a part-aware DiT architecture.
Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion cs.LG · 2025-10-06 · unverdicted · none · ref 8 · internal anchor
Theoretical analysis reveals MaskGIT's implicit temperature sampling in masked diffusion; proposes equivalent moment sampler and efficiency techniques for adaptive unmasking with image and text experiments.
Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner cs.AI · 2025-10-03 · unverdicted · none · ref 30 · internal anchor
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.
Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement cs.LG · 2025-07-11 · conditional · none · ref 33 · internal anchor
PG-DLM applies particle Gibbs sampling over full trajectories in diffusion language models to enable iterative refinement, yielding higher accuracy on reward-guided generation with theoretical convergence guarantees.
Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and Vision-Language Models cs.CR · 2026-06-25 · unverdicted · none · ref 48 · internal anchor
A narrative survey that catalogs fifty papers on diffusion-based adversarial techniques across text, vision, and vision-language models, proposes a six-class taxonomy of diffusion roles plus a unified five-dimension evaluation framework, and releases a companion catalog.
NAVIRA: Decoupled Stochastic Remasking for Masked Diffusion Language Models cs.CL · 2026-06-04 · unverdicted · none · ref 4 · internal anchor
NAVIRA decouples quality scoring from regeneration via stochastic remasking in masked diffusion LMs, improving fluency and LLM-judge scores on a 170M model.
AMix-2: Establishing Protein as a Native Modality in Large Language Models q-bio.BM · 2026-05-29 · unverdicted · none · ref 39 · internal anchor
AMix-2 unifies protein sequences and text in one LLM via shared tokens and block-wise diffusion modeling, introduces the ProteinArena benchmark, and reports competitive performance against task-specific protein models and frontier LLMs.

Large Language Diffusion Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer