AdaptThink: Reasoning Models Can Learn When to Think

Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li · 2025 · DOI 10.18653/v1/2025.emnlp-main.184

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DART: Draft-Agreement Routing for Training-Free Adaptive Thinking Budgets in Hybrid Reasoning Models

cs.AI · 2026-06-22 · unverdicted · novelty 7.0

DART is a training-free router that accepts direct answers on draft agreement and allocates thinking budgets via draft entropy on disagreement, reporting accuracy gains and token reductions on math and code benchmarks across model scales.
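The routing idea in this summary can be made concrete with a small sketch. It is not DART's implementation. It assumes several direct-answer drafts have already been sampled. It also assumes "draft agreement" means that all drafts give the same final answer, and that "draft entropy" means the Shannon entropy of the distribution of final answers across drafts. The paper may define both differently. The names `route`, `answer_entropy`, `RouteDecision`, `min_budget`, and `max_budget`, and the linear entropy-to-budget mapping, are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RouteDecision:
    """Outcome of routing one query (hypothetical structure, not DART's API)."""
    direct_answer: Optional[str]  # set when the drafts agree; no thinking needed
    thinking_budget: int          # 0 when the direct answer is accepted


def answer_entropy(drafts: Sequence[str]) -> float:
    """Shannon entropy (nats) of the empirical distribution of draft answers.

    Assumption: 'draft entropy' is measured over final answers, not tokens.
    """
    counts = Counter(drafts)
    n = len(drafts)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def route(drafts: Sequence[str],
          min_budget: int = 512,
          max_budget: int = 8192) -> RouteDecision:
    """Accept on unanimous agreement; otherwise scale the budget with entropy."""
    if not drafts:
        raise ValueError("need at least one draft")
    if len(set(drafts)) == 1:
        # All drafts agree: accept the direct answer and skip thinking.
        return RouteDecision(direct_answer=drafts[0], thinking_budget=0)
    # Disagreement implies k >= 2, so log(k) > 0. Dividing by log(k), the
    # maximum entropy for k drafts, puts the ratio in (0, 1].
    k = len(drafts)
    ratio = answer_entropy(drafts) / math.log(k)
    # Hypothetical linear mapping from normalized entropy to a token budget.
    budget = min_budget + round(ratio * (max_budget - min_budget))
    return RouteDecision(direct_answer=None, thinking_budget=budget)


if __name__ == "__main__":
    print(route(["42", "42", "42", "42"]))  # agreement: direct answer, budget 0
    print(route(["42", "42", "42", "41"]))  # mild disagreement: mid-range budget
    print(route(["42", "41", "40", "39"]))  # all distinct: maximum budget
```

Normalizing by log k keeps every budget within `[min_budget, max_budget]`, whatever the number of drafts. The linear mapping is one arbitrary choice among many.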

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression

cs.LG · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

ExpThink reduces average CoT response length by up to 77% while improving accuracy on math benchmarks via experience-guided reward shaping and difficulty-adaptive advantage in RL.

Spatial Reasoning via Modality Switching Between Language and Symbolic Representation

cs.AI · 2026-06-30 · unverdicted · novelty 5.0 · 2 refs

Introduces a mechanism that lets LLMs switch between language and symbolic representations on spatial reasoning tasks, guided by a trustworthiness- and complexity-based metric, and reports up to a 42% performance improvement.

citing papers explorer

ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression cs.LG · 2026-05-08 · unverdicted · none · ref 52 · 2 links
