{"total":14,"items":[{"citing_arxiv_id":"2606.00295","ref_index":124,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Adaptive Order Policies for Masked Diffusion","primary_cat":"cs.LG","submitted_at":"2026-05-29T19:26:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A policy network learns to choose unmasking order in masked diffusion by reweighting the loss, outperforming random and heuristic baselines on ordering-sensitive tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26106","ref_index":75,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Looped Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-25T17:58:24+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LoopMDM loops early-middle layers in masked diffusion models to match same-size MDM performance with up to 3.3x fewer training FLOPs and outperform on reasoning tasks by up to 8.5 points on GSM8K.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25638","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Reinforcement Learning from Denoising Feedback","primary_cat":"cs.CL","submitted_at":"2026-05-25T09:39:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RLDF is a new RL paradigm for diffusion language models that optimizes toward clipped clean states with weighted timestep sampling and reports substantial gains on reasoning benchmarks for LLaDA and 
Dream.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22765","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation","primary_cat":"cs.LG","submitted_at":"2026-05-21T17:27:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17497","ref_index":88,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Self-Supervised On-Policy Distillation for Reasoning Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-17T15:14:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SSOPD converts intra-group correct-wrong contrast into process supervision by distilling a teacher distribution from the shortest correct completion into prefixes of the longest wrong completion, improving GRPO on AIME and HMMT benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17174","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation","primary_cat":"cs.SE","submitted_at":"2026-05-16T22:18:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Static 
checking rewards and moderate AST-based hints improve diffusion RL performance for code generation, with effectiveness varying by task difficulty across HumanEval, MBPP, and LiveCodeBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11726","ref_index":69,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-12T08:09:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Association for Computational Linguistics (ACL), 2025. [67] J. Yang, Y. Jiang, X. Hu, S. Cheng, B. Qi, and J. Shao. Dare: Diffusion large language models alignment and reinforcement executor. arXiv preprint arXiv:2604.04215, 2026. [68] L. Yang, Y. Tian, B. Li, X. Zhang, K. Shen, Y. Tong, and M. Wang. Mmada: Multimodal Large Diffusion Language Models. arXiv preprint arXiv:2505.15809, 2025. [69] J. Ye, J. Gao, S. Gong, L. Zheng, X. Jiang, Z. Li, and L. Kong. Beyond autoregression: Discrete diffusion for complex reasoning and planning. arXiv preprint arXiv:2410.14157, 2024. [70] J. Ye, Z. Xie, L. Zheng, J. Gao, Z. Wu, X. Jiang, Z. Li, and L. Kong. Dream 7B. URL https://hkunlp.github.io/blog/2025/dream, 2025. [71] O. Zekri and N. Boullé. 
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods."},{"citing_arxiv_id":"2605.07933","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"How to Train Your Latent Diffusion Language Model Jointly With the Latent Space","primary_cat":"cs.CL","submitted_at":"2026-05-08T16:05:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Joint training of the latent space with the diffusion process produces a competitive latent diffusion language model that is faster than existing discrete and continuous diffusion baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"viacheslav@gmail.com. Preprint. arXiv:2605.07933v1 [cs.CL] 8 May 2026 1 Introduction Autoregressive models are the current standard for text generation [33, 20, 15]. However, despite their prevalence, they are constrained by their left-to-right generation pattern, which prevents them from correcting previous mistakes or generating more than one token at a time [54]. Diffusion language models offer an alternative paradigm [27, 29, 41, 37]: they generate text through iterative refinement, updating all positions in parallel and providing greater control over the generated sequence. Text diffusion models are commonly divided into discrete and continuous approaches. 
Discrete diffusion operates directly in token space by corrupting and denoising categorical states, and has"},{"citing_arxiv_id":"2605.06548","ref_index":103,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Continuous Latent Diffusion Language Model","primary_cat":"cs.CL","submitted_at":"2026-05-07T16:44:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[101] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025. [102] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems, 32, 2019. [103] Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Beyond autoregression: Discrete diffusion for complex reasoning and planning. arXiv preprint arXiv:2410.14157, 2024. [104] Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, and Yingyan Celine Lin. When linear attention meets autoregressive decoding: Towards more effective and efficient linearized large language models."},{"citing_arxiv_id":"2606.19349","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Where to Place the Query? 
Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics","primary_cat":"cs.CL","submitted_at":"2026-04-26T20:22:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Query position is a first-order variable in dLLM ICL whose variance matches semantic quality impact; mitigated via Average Confidence metric and training-free Auto-ICL routing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.12554","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages","primary_cat":"cs.LG","submitted_at":"2026-03-13T01:38:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Derives an exact unbiased policy gradient for RL post-training of diffusion LLMs via entropy-guided step selection and one-step denoising rewards, achieving state-of-the-art results on coding and logical reasoning benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.18176","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Improving Sampling for Masked Diffusion Models via Information Gain","primary_cat":"cs.CL","submitted_at":"2026-02-20T12:26:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing 
tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.12538","ref_index":149,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentic Reasoning for Large Language Models","primary_cat":"cs.AI","submitted_at":"2026-01-18T18:58:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"explicitly separates observation and reasoning modules to optimize for efficiency. Similarly, works like [143, 144, 145, 146, 147, 142, 148] break reasoning into reusable or hierarchical abstractions. [76] promotes hierarchical thinking through hypertrees, while [82] abstracts the world with symbolic predicates to reduce planning burden. Others, such as [149] and [119], decompose via latent variables or state spaces. These decompositions not only enhance tractability, but also align with neural-symbolic hybrid frameworks. They are especially common in long-horizon or multi-agent planning scenarios, such as [150, 151]. 
External Aid / Tool Use. Many systems leverage external structures or tools to aid planning, including"},{"citing_arxiv_id":"2508.14685","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SSA: Improving Performance With a Better Scoring Function","primary_cat":"cs.CL","submitted_at":"2025-08-20T13:01:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Replacing Softmax with Scaled Signed Averaging in transformer attention improves generalization under distribution shifts for in-context learning and boosts results on NLP benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}