Title resolution pending

URL https: //arxiv · 2025 · arXiv 2509.06949

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

cs.LG · 2026-05-13 · conditional · novelty 7.0

TraFL applies trajectory flow balancing to post-train diffusion language models, preventing mode collapse and delivering consistent gains on reasoning tasks that hold under increased sampling.

Relative Score Policy Optimization for Diffusion Language Models

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

RSPO interprets reward advantages as targets for relative log-ratios in dLLMs, calibrating noisy estimates to stabilize RLVR training and achieve strong gains on planning tasks with competitive math reasoning performance.

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

MemDLM: Memory-Enhanced DLM Training

cs.CL · 2026-03-23 · unverdicted · novelty 7.0

MemDLM embeds a simulated denoising trajectory into DLM training via bi-level optimization, creating a parametric memory that improves convergence and long-context performance even when the memory is dropped at test time.

Improving Sampling for Masked Diffusion Models via Information Gain

cs.CL · 2026-02-20 · unverdicted · novelty 7.0

Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.

Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

Proposes HT-GRPO with sketch-then-paint staged updates, prompt-conditioned importance ratios, and hierarchical credit assignment for dMLLMs, reporting gains on GenEval and DPG plus quality metrics.

Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation

cs.SE · 2026-05-16 · unverdicted · novelty 4.0

Static checking rewards and moderate AST-based hints improve diffusion RL performance for code generation, with effectiveness varying by task difficulty across HumanEval, MBPP, and LiveCodeBench.

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

cs.CL · 2026-05-22

citing papers explorer

Showing 8 of 8 citing papers.

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models cs.LG · 2026-05-13 · conditional · none · ref 21
TraFL applies trajectory flow balancing to post-train diffusion language models, preventing mode collapse and delivering consistent gains on reasoning tasks that hold under increased sampling.
Relative Score Policy Optimization for Diffusion Language Models cs.CL · 2026-05-11 · unverdicted · none · ref 95
RSPO interprets reward advantages as targets for relative log-ratios in dLLMs, calibrating noisy estimates to stabilize RLVR training and achieve strong gains on planning tasks with competitive math reasoning performance.
TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM cs.CL · 2026-05-10 · unverdicted · none · ref 38
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
MemDLM: Memory-Enhanced DLM Training cs.CL · 2026-03-23 · unverdicted · none · ref 11
MemDLM embeds a simulated denoising trajectory into DLM training via bi-level optimization, creating a parametric memory that improves convergence and long-context performance even when the memory is dropped at test time.
Improving Sampling for Masked Diffusion Models via Information Gain cs.CL · 2026-02-20 · unverdicted · none · ref 14
Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.
Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models cs.AI · 2026-05-16 · unverdicted · none · ref 22
Proposes HT-GRPO with sketch-then-paint staged updates, prompt-conditioned importance ratios, and hierarchical credit assignment for dMLLMs, reporting gains on GenEval and DPG plus quality metrics.
Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation cs.SE · 2026-05-16 · unverdicted · none · ref 21
Static checking rewards and moderate AST-based hints improve diffusion RL performance for code generation, with effectiveness varying by task difficulty across HumanEval, MBPP, and LiveCodeBench.
Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving cs.CL · 2026-05-22 · unreviewed · ref 16

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer