Multiplex thinking: Reasoning via token-wise branch- and-merge.arXiv preprint arXiv:2601.08808

Tang, Y · 2026 · arXiv 2601.08808

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing

cs.CL · 2026-06-24 · unverdicted · novelty 7.0

LBR performs token-level test-time scaling via local branch routing on hidden states, enabling end-to-end RL training and improving Pass@1 and Pass@32 on math benchmarks over CoT and RLVR baselines.

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

cs.LG · 2026-05-23 · unverdicted · novelty 7.0

CurveRL derives a quantile-coordinate reweighting rule from a utility functional on pass rates and shows it outperforms GRPO on reasoning benchmarks.

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.

Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning

cs.LG · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

b1 is a plug-and-play post-training framework that trains diffusion LLMs to produce dynamic-size reasoning blocks by optimizing a monotonic entropy descent objective via reinforcement learning.

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.

Select to Think: Unlocking SLM Potential with Local Sufficiency

cs.CL · 2026-04-29 · unverdicted · novelty 5.0 · 2 refs

Select to Think reframes LLM help as ranking among SLM top-K candidates and distills the ranking ability back into the SLM for improved single-pass reasoning.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing cs.CL · 2026-06-24 · unverdicted · none · ref 25
LBR performs token-level test-time scaling via local branch routing on hidden states, enabling end-to-end RL training and improving Pass@1 and Pass@32 on math benchmarks over CoT and RLVR baselines.
CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning cs.LG · 2026-05-23 · unverdicted · none · ref 29
CurveRL derives a quantile-coordinate reweighting rule from a utility functional on pass rates and shows it outperforms GRPO on reasoning benchmarks.
Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models cs.LG · 2026-05-12 · unverdicted · none · ref 54 · 2 links
Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning cs.LG · 2026-05-04 · unverdicted · none · ref 15 · 2 links
b1 is a plug-and-play post-training framework that trains diffusion LLMs to produce dynamic-size reasoning blocks by optimizing a monotonic entropy descent objective via reinforcement learning.
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning cs.LG · 2026-04-30 · unverdicted · none · ref 37
Latent-GRPO stabilizes reinforcement learning in latent space, delivering 7.86 Pass@1 gains on low-difficulty tasks over latent baselines and 4.27 points over explicit GRPO on high-difficulty tasks with 3-4x shorter reasoning chains.
Select to Think: Unlocking SLM Potential with Local Sufficiency cs.CL · 2026-04-29 · unverdicted · none · ref 20 · 2 links
Select to Think reframes LLM help as ranking among SLM top-K candidates and distills the ranking ability back into the SLM for improved single-pass reasoning.

Multiplex thinking: Reasoning via token-wise branch- and-merge.arXiv preprint arXiv:2601.08808

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer