Discrete diffusion in large language and multimodal models: A survey

Runpeng Yu, Qi Li, Xinchao Wang · 2025 · arXiv 2506.13759

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

ViMU: Benchmarking Video Metaphorical Understanding

cs.CV · 2026-05-14 · unverdicted · novelty 8.0

ViMU is the first benchmark for evaluating video models on metaphorical and subtextual understanding using hint-free questions grounded in multimodal evidence.

NPU Design for Diffusion Language Model Inference

cs.AR · 2026-01-28 · unverdicted · novelty 8.0

Introduces the first NPU accelerator for diffusion language models with dLLM-specific ISA, hardware execution model, BAOS KV quantization, and 7nm RTL synthesis.

DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

DepCap accelerates diffusion LM inference up to 5.63x by using last-block influence for adaptive block boundaries and conflict-free token selection for parallel decoding, with negligible quality loss.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

FATE lets LLM agents self-evolve safer behaviors by generating and filtering repairs from their own failure trajectories using verifiers and Pareto optimization.

Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation

q-bio.BM · 2026-05-11 · unverdicted · novelty 6.0

Yeti is a compact tokenizer for protein structures that delivers strong codebook use, token diversity, and reconstruction while enabling from-scratch multimodal generation of plausible sequences and structures with 10x fewer parameters than ESM3.

Simple Self-Conditioning Adaptation for Masked Diffusion Models

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

SCMDM adapts trained masked diffusion models to condition denoising steps on their own prior clean predictions, cutting generative perplexity nearly in half on open-web text while improving discretized image, molecule, and genomic synthesis.

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

cs.LG · 2025-12-10 · conditional · novelty 6.0

LLaDA2.0 scales discrete diffusion language models to 100B parameters via systematic conversion from autoregressive models using a 3-phase WSD training scheme and releases open-source 16B and 100B MoE variants.

Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance

cs.RO · 2026-03-26 · unverdicted · novelty 5.0

Parameter differences from two training runs on a small task set are treated as auxiliary capability vectors that are merged into a pretrained VLA model, yielding auxiliary-task gains at the cost of ordinary supervised finetuning plus a simple regularization term.

citing papers explorer

ViMU: Benchmarking Video Metaphorical Understanding
cs.CV · 2026-05-14 · unverdicted · none · ref 35

NPU Design for Diffusion Language Model Inference
cs.AR · 2026-01-28 · unverdicted · none · ref 1

DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference
cs.LG · 2026-04-17 · unverdicted · none · ref 37

DMax: Aggressive Parallel Decoding for dLLMs
cs.LG · 2026-04-09 · conditional · none · ref 95 · 2 links

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment
cs.AI · 2026-05-12 · unverdicted · none · ref 46

Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation
q-bio.BM · 2026-05-11 · unverdicted · none · ref 29

Simple Self-Conditioning Adaptation for Masked Diffusion Models
cs.LG · 2026-04-28 · unverdicted · none · ref 10

LLaDA2.0: Scaling Up Diffusion Language Models to 100B
cs.LG · 2025-12-10 · conditional · none · ref 39

Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance
cs.RO · 2026-03-26 · unverdicted · none · ref 28
