Multiplex thinking: Reasoning via token-wise branch-and-merge. arXiv preprint arXiv:2601.08808

· 2026 · arXiv 2601.08808

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.

Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning

cs.LG · 2026-05-04

Select to Think: Unlocking SLM Potential with Local Sufficiency

cs.CL · 2026-04-29