Mdpo: Overcom- ing the training-inference divide of masked diffusion lan- guage models.arXiv preprint arXiv:2508.13148

He, H · 2025 · arXiv 2508.13148

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.

Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

Token-to-Mask remasking improves self-correction in diffusion LLMs by resetting erroneous commitments to masks rather than overwriting them, yielding +13.33 points on AIME 2025 and +8.56 on CMATH.

BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

BARD bridges autoregressive and diffusion VLMs with progressive block merging plus stage-wise intra-diffusion distillation, delivering 3x speedup and new SOTA on open dVLMs using under 4.4M data points.

MemDLM: Memory-Enhanced DLM Training

cs.CL · 2026-03-23 · unverdicted · novelty 7.0

MemDLM embeds a simulated denoising trajectory into DLM training via bi-level optimization, creating a parametric memory that improves convergence and long-context performance even when the memory is dropped at test time.

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.

Diffusion-State Policy Optimization for Masked Diffusion Language Models

cs.CL · 2026-02-06 · unverdicted · novelty 6.0 · 2 refs

DiSPO optimizes intermediate decisions in masked diffusion LMs by branching at selected masked states, resampling tokens, scoring completions, and updating only new tokens using a derived policy-gradient estimator that reuses terminal rollouts.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning

cs.LG · 2026-05-04

citing papers explorer

Showing 1 of 1 citing paper after filters.

Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning cs.LG · 2026-05-04 · unreviewed · ref 3

Mdpo: Overcom- ing the training-inference divide of masked diffusion lan- guage models.arXiv preprint arXiv:2508.13148

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer