{"total":10,"items":[{"citing_arxiv_id":"2605.19470","ref_index":4,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Drifting Objectives for Refining Discrete Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-19T07:22:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19376","ref_index":41,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Generative Recursive Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-19T05:20:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19262","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Backdooring Masked Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-19T02:20:08+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12836","ref_index":22,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Discrete Stochastic Localization for Non-autoregressive 
Generation","primary_cat":"cs.LG","submitted_at":"2026-05-13T00:12:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11854","ref_index":33,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-12T09:39:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"generalized masked diffusion for discrete data. arXiv preprint arXiv:2406.04329, 2024. [32] Subham Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models. Advances in Neural Information Processing Systems, 37:130136-130184, 2024. [33] Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling. arXiv preprint arXiv:2409.02908, 2024. [34] Jingyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhenguo Li, and Chongxuan Li. 
Your absorbing discrete diffusion secretly models the conditional distributions of clean data."},{"citing_arxiv_id":"2605.09302","ref_index":31,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Discrete Langevin-Inspired Posterior Sampling","primary_cat":"cs.LG","submitted_at":"2026-05-10T03:59:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"the analytic posterior $q(z_s \\mid z_t, z_0)$ to define a reverse transition for $s < t$: $p_\\theta(z_s \\mid z_t) = \\sum_{\\hat{z}_0} q(z_s \\mid z_t, \\hat{z}_0) p_\\theta(\\hat{z}_0; z_t)$. (1) Masked diffusion models instantiate this framework using an absorbing mask state. For a special token [M], the corruption process can be written as $q(z_t[\\ell] \\mid z_0[\\ell]) = \\mathrm{Cat}(z_t[\\ell]; \\alpha_t e_{z_0[\\ell]} + (1-\\alpha_t) e_{[M]})$, where $\\alpha_t$ decreases with $t$. MDLM-style models [31] build on this absorbing process and train the denoiser with a weighted masked-token prediction objective. 
A different line of work, including Duo-style uniform-state diffusion [32], instead uses uniform corruption, $q(z_t[\\ell] \\mid z_0[\\ell]) = \\mathrm{Cat}(z_t[\\ell]; \\alpha_t e_{z_0[\\ell]} + (1-\\alpha_t) u)$, $u = \\frac{1}{K}\\mathbf{1}$, (2) so corrupted variables are replaced by uniformly random tokens rather than a single absorbing mask."},{"citing_arxiv_id":"2605.07971","ref_index":25,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"DVD: Discrete Voxel Diffusion for 3D Generation and Editing","primary_cat":"cs.CV","submitted_at":"2026-05-08T16:32:17+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"While the prior distribution could be arbitrary, they are mainly split into two parts: uniform state diffusion models (USDMs) and mask diffusion models (MDMs). USDMs have a uniform prior distribution across all possible states and all tokens, while MDMs mask every position with a special MASK token. Discrete diffusion models have been developed and studied in various tasks, such as text generation [25], image generation [26], multimodal modeling [27], protein generation [24], and pose estimation [28], etc. For applications of discrete diffusion in 3D tasks, Song et al. 
[29] investigated mesh generation using discrete diffusion models, and TD3D [30] leveraged discrete diffusion models for shape generation in a quantized latent space, scaffold diffusion [31] used DDMs"},{"citing_arxiv_id":"2605.03360","ref_index":28,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"A-CODE: Fully Atomic Protein Co-Design with Unified Multimodal Diffusion","primary_cat":"q-bio.QM","submitted_at":"2026-05-05T04:41:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08302","ref_index":66,"ref_count":2,"confidence":0.55,"is_internal_anchor":false,"paper_title":"DMax: Aggressive Parallel Decoding for dLLMs","primary_cat":"cs.LG","submitted_at":"2026-04-09T14:35:42+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.00688","ref_index":22,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-01T09:45:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OmniVoice introduces a diffusion 
language model-style non-autoregressive TTS system that directly maps text to multi-codebook acoustic tokens, scaling zero-shot synthesis to over 600 languages with SOTA results on multilingual benchmarks using 581k hours of open data.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"[Figure 1: Illustration of OmniVoice architecture.] To bridge this gap, we introduce OmniVoice, an architecturally streamlined yet highly effective discrete NAR TTS framework. OmniVoice employs a discrete masked diffusion objective [22] with a bidirectional Transformer [23] to directly map text to multi-codebook acoustic tokens, thereby bypassing the complexity and limitations of cascaded pipelines. Its core modeling philosophy extends the success of diffusion language models [24, 25] to the speech domain. We demonstrate that the potential of this minimalist architecture can be fully unleashed through two technical innovations:"}],"limit":50,"offset":0}