{"total":133,"items":[{"citing_arxiv_id":"2606.26566","ref_index":48,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and Vision-Language Models","primary_cat":"cs.CR","submitted_at":"2026-06-25T03:32:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A narrative survey that catalogs fifty papers on diffusion-based adversarial techniques across text, vision, and vision-language models, proposes a six-class taxonomy of diffusion roles plus a unified five-dimension evaluation framework, and releases a companion catalog.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08411","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding","primary_cat":"cs.CL","submitted_at":"2026-06-07T02:24:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AsyncLane decouples refinement from advancement in DLM decoding via lane forking at delimiters plus efficiency optimizations, yielding up to 3x throughput gains on math and code benchmarks without retraining.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06031","ref_index":4,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"NAVIRA: Decoupled Stochastic Remasking for Masked Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-06-04T11:24:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NAVIRA decouples quality scoring from regeneration via stochastic remasking in masked 
diffusion LMs, improving fluency and LLM-judge scores on a 170M model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00724","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering","primary_cat":"cs.CL","submitted_at":"2026-05-30T13:32:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"WaveFilter applies wavelet decomposition to filter critical tokens for sparse KV caching, improving long-context performance of diffusion LLMs as a plug-and-play addition to existing methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00487","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding","primary_cat":"cs.AI","submitted_at":"2026-05-30T02:39:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TAPS converts diffusion marginal probabilities into path-conditioned acceptance estimates to select prefix-closed subtrees under a fixed verification budget, achieving up to 7.9x end-to-end speedup over autoregressive decoding.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00295","ref_index":76,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Adaptive Order Policies for Masked Diffusion","primary_cat":"cs.LG","submitted_at":"2026-05-29T19:26:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A policy network learns to choose unmasking order in masked diffusion by 
reweighting the loss, outperforming random and heuristic baselines on ordering-sensitive tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30963","ref_index":39,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AMix-2: Establishing Protein as a Native Modality in Large Language Models","primary_cat":"q-bio.BM","submitted_at":"2026-05-29T07:58:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AMix-2 unifies protein sequences and text in one LLM via shared tokens and block-wise diffusion modeling, introduces the ProteinArena benchmark, and reports competitive performance against task-specific protein models and frontier LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30876","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"dMoE: dLLMs with Learnable Block Experts","primary_cat":"cs.CL","submitted_at":"2026-05-29T06:03:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"dMoE aggregates token expert distributions to block level in dLLMs, cutting unique experts from 69.5 to 14.6, memory by 76-80%, and latency by 1.14-1.66x while retaining 99.11% performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30753","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation","primary_cat":"cs.CL","submitted_at":"2026-05-29T02:29:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces TSPD with a trajectory-feature controller and training-free 
CE to reduce denoising steps in dLLMs while aiming to preserve quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29858","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Masked Diffusion Vision-Language Models for Temporal Action Localization","primary_cat":"cs.CV","submitted_at":"2026-05-28T12:39:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Adapts MDVLMs to TAL via planned training objective and step-level IoU reward, reporting gains over autoregressive baselines on ActivityNet and THUMOS datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29727","ref_index":48,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting","primary_cat":"cs.LG","submitted_at":"2026-05-28T10:21:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"BASTION is a budget-aware speculative decoding framework with adaptive tree-structured block diffusion drafting that reports up to 6.61x speedup and 39% improvement over block-diffusion baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29613","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding","primary_cat":"eess.AS","submitted_at":"2026-05-28T08:48:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Threshold-based decoding for diffusion ASR outperforms fixed schemes by accepting high-confidence tokens early 
and matches autoregressive accuracy with better speed.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29591","ref_index":55,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion","primary_cat":"cs.AI","submitted_at":"2026-05-28T08:33:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Mind-Omni unifies seven brain-vision-language tasks in one discrete-diffusion framework with a brain tokenizer and a new BQA dataset, claiming SOTA multi-task performance competitive with larger single-task models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29398","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-28T05:47:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GDSD reduces RL for dLLMs to likelihood-free self-distillation via a normalization-free logit-matching objective, outperforming ELBO methods with more stable training on LLaDA-8B and Dream-7B.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29123","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models","primary_cat":"cs.AI","submitted_at":"2026-05-27T21:33:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Confidence-based decoding and training in masked diffusion models 
shortcut long-range dependencies in reasoning, producing errors on complex inputs that random masking avoids.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26120","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM","primary_cat":"cs.CL","submitted_at":"2026-05-27T02:47:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Dynamic-dLLM achieves over 3x average inference speedup on dLLMs like LLaDA-8B via adaptive cache budgets and decoding thresholds while preserving benchmark performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23163","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving","primary_cat":"cs.CL","submitted_at":"2026-05-22T02:31:32+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22967","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learned Relay Representations for Forward-Thinking Discrete Diffusion Models","primary_cat":"cs.LG","submitted_at":"2026-05-21T18:53:22+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22765","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing 
State Reformulation","primary_cat":"cs.LG","submitted_at":"2026-05-21T17:27:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20813","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-20T07:06:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PulseCol introduces periodically refreshed column-sparse attention to achieve up to 1.95x speedup over FlashAttention in diffusion LLMs with maintained model quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20179","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload","primary_cat":"cs.CL","submitted_at":"2026-05-19T17:59:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TIDE schedules I/O-aware expert offloading for MoE diffusion LLMs by solving for an optimal refresh interval that exploits temporal stability of activations, yielding up to 1.5x throughput gain 
losslessly.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19470","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Drifting Objectives for Refining Discrete Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-19T07:22:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TokenDrift refines discrete diffusion language models by applying anti-symmetric drifting to soft-token features during training, yielding large reductions in generation perplexity at low NFEs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19391","ref_index":46,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian","primary_cat":"stat.ML","submitted_at":"2026-05-19T05:36:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Extends Tweedie's formulae to GBM, BESQ, and CIR processes to enable non-Gaussian diffusion generative models and empirical Bayes applications.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19262","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Backdooring Masked Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-19T02:20:08+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18745","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SURGE: Approximation and Training Free Particle Filter for Diffusion 
Surrogate","primary_cat":"stat.ML","submitted_at":"2026-05-18T17:59:00+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18253","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Machine Unlearning for Masked Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-18T11:54:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18165","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs","primary_cat":"cs.LG","submitted_at":"2026-05-18T10:09:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer sequences.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17850","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path 
Measures","primary_cat":"stat.ML","submitted_at":"2026-05-18T04:45:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17174","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation","primary_cat":"cs.SE","submitted_at":"2026-05-16T22:18:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Static checking rewards and moderate AST-based hints improve diffusion RL performance for code generation, with effectiveness varying by task difficulty across HumanEval, MBPP, and LiveCodeBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16941","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers","primary_cat":"cs.CL","submitted_at":"2026-05-16T11:27:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Diffusion LLMs can act as their own efficiency teachers by using revokable parallel decoding to identify reliable token orders and then distilling those orders into the model parameters for faster 
inference.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16842","ref_index":42,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models","primary_cat":"cs.AI","submitted_at":"2026-05-16T06:59:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Proposes HT-GRPO with sketch-then-paint staged updates, prompt-conditioned importance ratios, and hierarchical credit assignment for dMLLMs, reporting gains on GenEval and DPG plus quality metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16829","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Constrained Code Generation with Discrete Diffusion","primary_cat":"cs.CL","submitted_at":"2026-05-16T06:15:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15676","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Dynamic Chunking for Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-15T06:56:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block 
and unstructured diffusion baselines up to 1.5B parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15609","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding","primary_cat":"cs.CL","submitted_at":"2026-05-15T04:43:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PSD is a training-free framework that jointly optimizes spatial unmasking and temporal speculative decoding in diffusion LLMs to reach up to 5.5x tokens per forward pass while preserving accuracy comparable to greedy decoding.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14531","ref_index":33,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space","primary_cat":"cs.CL","submitted_at":"2026-05-14T08:13:43+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14530","ref_index":13,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2026-05-14T08:11:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Diagnoses mask prior drift and positional attention collapse in LDVLMs and introduces two plug-and-play decoding interventions that raise long-form generation quality without 
retraining.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14465","ref_index":44,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Table to Cell: Attention for Better Reasoning with TABALIGN","primary_cat":"cs.AI","submitted_at":"2026-05-14T07:00:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TABALIGN pairs a diffusion language model planner emitting binary cell masks with a trained attention verifier, raising average accuracy 15.76 points over strong baselines on eight table benchmarks while speeding execution 44.64%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14305","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding","primary_cat":"cs.CL","submitted_at":"2026-05-14T03:15:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FeF-DLLM achieves factorization-error-free generation in discrete diffusion language models via prefix-conditioned posterior factorization and speculative decoding, delivering 5.04 pp higher accuracy and 3.86x faster inference on GSM8K, MATH, HumanEval, and MBPP.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13999","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Support Before Frequency in Discrete Diffusion","primary_cat":"cs.LG","submitted_at":"2026-05-13T18:08:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Discrete diffusion models learn data support before frequencies because the exact reverse process 
decomposes edits into a dominant validity scale and a finer probability coefficient.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13935","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-13T16:14:46+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TraFL applies trajectory flow balancing to post-train diffusion language models, preventing mode collapse and delivering consistent gains on reasoning tasks that hold under increased sampling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13907","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AIS: Adaptive Importance Sampling for Quantized RL","primary_cat":"stat.ML","submitted_at":"2026-05-13T03:36:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AIS adaptively corrects non-stationary policy gradient bias in quantized LLM RL, matching BF16 performance while retaining 1.5-2.76x FP8 rollout speedup.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12836","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Discrete Stochastic Localization for Non-autoregressive Generation","primary_cat":"cs.LG","submitted_at":"2026-05-13T00:12:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and 
allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12825","ref_index":17,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion","primary_cat":"cs.LG","submitted_at":"2026-05-12T23:47:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Orthrus unifies autoregressive LLMs and diffusion models via shared KV cache and consensus to enable up to 7.8x parallel token generation speedup with O(1) memory overhead and lossless results.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11854","ref_index":7,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-12T09:39:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[5] Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7b: Diffusion large language models. arXiv preprint arXiv:2508.15487, 2025. [6] Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models. arXiv preprint arXiv:2502.09992, 2025. 
[7] Jingyang Ou, Jiaqi Han, Minkai Xu, Shaoxuan Xu, Jianwen Xie, Stefano Ermon, Yi Wu, and Chongxuan Li. Principled rl for diffusion llms emerges from a sequence-level perspective, 2025. URL https://arxiv.org/abs/2512.03759. [8] Jingyi Yang, Yuxian Jiang, Xuhao Hu, Shuang Cheng, Biqing Qi, and Jing Shao. Dare: Diffusion large language models alignment and reinforcement executor, 2026."},{"citing_arxiv_id":"2605.11726","ref_index":38,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-12T08:09:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"conflict; (3) a new benchmark, Block-R1, for flexible RL post-training for dLLMs in both single and cross domain; and (4) a simple yet powerful cross-domain post-training method with sample-level best-improved training block sizes. Extensive experiments on 13 distinct datasets, 7 latest RL algorithms, and various different dLLM backbones are covered in Block-R1. 1 Introduction Diffusion Large Language Models (dLLMs) [38, 70, 2, 8, 7, 77] have recently emerged as a promising alternative to conventional autoregressive large language models by enabling parallel token generation through a block-based semi-autoregressive decoding mechanism [2]. 
Specifically, the sequence is divided into equal-length structured blocks, enabling parallel decoding of multiple tokens within"},{"citing_arxiv_id":"2605.18810","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting","primary_cat":"cs.LG","submitted_at":"2026-05-12T06:27:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"D-PACE derives per-position weights from a surrogate of expected accepted draft length to shift training focus toward currently limiting positions, yielding measured gains in wall-clock speedup and emitted length across benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11577","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion","primary_cat":"cs.CL","submitted_at":"2026-05-12T06:02:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"A central property of BitLM is that the diffusion head predicts the entire target block jointly. The input to Eq. (10) is the full noisy tensor $A_t^{(n)} \\in \\mathbb{R}^{m \\times B}$, and the output is the full denoised tensor $\\hat{A}_0^{(n)} \\in \\mathbb{R}^{m \\times B}$. Consequently, the model does not impose a factorization of the form (Devlin et al., 2019; Nie et al., 2025; Bie et al., 2025) $p_\\theta\\left(A_0^{(n)} \\mid C^{(n-1)}\\right) = \\prod_{i=1}^{m} \\prod_{b=1}^{B} p_\\theta\\left(a_{i,b}^{(n)} \\mid C^{(n-1)}\\right)$, (19) Algorithm 1: BitDance-LM training. 
Require: Token sequence $y_{1:L}$, block size $m$, code length $B$ Ensure: Training loss $L_{\\text{BitDance-LM}}$ 1: $N \\leftarrow \\lceil L/m \\rceil$ 2: $A_{1:L,0} \\leftarrow \\phi(y_{1:L})$ ▷ token ids → binary codes in $\\{-1, 1\\}^{B}$ 3: Pad $A_{1:L,0}$ to length $Nm$, yielding $A_{1:Nm,0}$ 4: Split $A_{1:Nm,0}$ into blocks $\\{A_0^{(1)}, \\ldots, A_0^{(N)}\\}$, where $A_0^{(n)} \\in \\{-1, 1\\}^{m \\times B}$ 5: $E_{1:Nm} \\leftarrow \\mathrm{MLP}(A_{1:Nm,0})$ ▷ lift binary codes to"},{"citing_arxiv_id":"2605.11567","ref_index":32,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Dynamic Execution Commitment of Vision-Language-Action Models","primary_cat":"cs.CV","submitted_at":"2026-05-12T05:52:58+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"gressive language models by drafting multiple tokens and verifying them using a stronger or identical model. Subsequent work, including self-speculative decoding [15, 31], further improves efficiency by leveraging structured verification trees to enable parallel token prediction while preserving exact decoding correctness in discrete token spaces. For instance, SSD [15] leverages self-drafting to eliminate auxiliary models, enabling dLLMs [32] to generate and verify multiple tokens per iteration [33]. While structurally related [34], our work addresses a fundamentally different objective. Speculative decoding aims to improve generation efficiency without altering model outputs.
In contrast, we focus on execution commitment in VLA systems, determining how much of a predicted continuous action"},{"citing_arxiv_id":"2605.10527","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"UniRank: Unified List-wise Reranking via Confidence-Ordered Denoising","primary_cat":"cs.IR","submitted_at":"2026-05-11T13:12:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"UniRank unifies autoregressive and non-autoregressive list-wise reranking via bidirectional modeling in a confidence-ordered iterative denoising process, outperforming baselines on datasets and online tests.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"trained language model to support retrieval, ranking, and explanation, whereas decoder-only models emphasize industrial scaling, with HSTU [27] proposing a high-throughput sequential transduction architecture for long and dynamic user histories. More recently, LLaDA-Rec [21] replaces left-to-right decoding with a masked discrete diffusion backbone [14] that generates the semantic identifier of the next item. (3) Optimization. Beyond next token prediction, TallRec [28] shows that instruction tuning with small amounts of recommendation data can improve few-shot recommendation and cross-domain transfer.
In contrast to these methods, which generate the next item from a global item vocabulary conditioned on user history, UniRank targets the reranking setting, where the output must"},{"citing_arxiv_id":"2605.10020","ref_index":33,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory Generation","primary_cat":"cs.LG","submitted_at":"2026-05-11T05:44:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TrajDLM applies block diffusion language models to discrete road-segment sequences with topology constraints to generate realistic trajectories up to 2.8 times faster than prior methods while supporting zero-shot transfer.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[31] A. Lou, C. Meng, and S. Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. arXiv preprint arXiv:2310.16834, 2023. [32] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang. Traffic flow prediction with big data: A deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 16(2):865-873, 2014. [33] S. Nie, F. Zhu, Z. You, X. Zhang, J. Ou, J. Hu, J. Zhou, Y. Lin, J.-R. Wen, and C. Li. Large language diffusion models. arXiv preprint arXiv:2502.09992, 2025. [34] Y. Qin, H. Wu, W. Ju, X. Luo, and M. Zhang. A diffusion model for poi recommendation. ACM Transactions on Information Systems, 42(2):1-27, 2023. [35] R. Rombach, A. Blattmann, D. Lorenz, P."}],"limit":50,"offset":0}