Multiplex thinking: Reasoning via token-wise branch- and-merge.arXiv preprint arXiv:2601.08808

· 2026 · arXiv 2601.08808

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.

Select to Think: Unlocking SLM Potential with Local Sufficiency

cs.CL · 2026-04-29 · conditional · novelty 6.0

Small language models can achieve near large-model reasoning performance by learning to re-rank their own top-K token predictions after distilling selection from the large model.

Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning

cs.LG · 2026-05-04

citing papers explorer

Showing 4 of 4 citing papers.

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models cs.LG · 2026-05-12 · unverdicted · none · ref 54 · 2 links
Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning cs.LG · 2026-04-30 · unverdicted · none · ref 37
Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.
Select to Think: Unlocking SLM Potential with Local Sufficiency cs.CL · 2026-04-29 · conditional · none · ref 20
Small language models can achieve near large-model reasoning performance by learning to re-rank their own top-K token predictions after distilling selection from the large model.
Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning cs.LG · 2026-05-04 · unreviewed · ref 15

Multiplex thinking: Reasoning via token-wise branch- and-merge.arXiv preprint arXiv:2601.08808

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer