AdaptThink: Reasoning Models Can Learn When to Think

Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li · 2025 · DOI 10.18653/v1/2025.emnlp-main.184

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DART: Draft-Agreement Routing for Training-Free Adaptive Thinking Budgets in Hybrid Reasoning Models

cs.AI · 2026-06-22 · unverdicted · novelty 7.0

DART is a training-free router that accepts direct answers on draft agreement and allocates thinking budgets via draft entropy on disagreement, reporting accuracy gains and token reductions on math and code benchmarks across model scales.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression

cs.LG · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

ExpThink reduces average CoT response length by up to 77% while improving accuracy on math benchmarks via experience-guided reward shaping and difficulty-adaptive advantage in RL.

Spatial Reasoning via Modality Switching Between Language and Symbolic Representation

cs.AI · 2026-06-30 · unverdicted · novelty 5.0

Introduces a trustworthiness-and-complexity switching metric that lets LLMs choose between language and grid modalities for spatial reasoning, yielding up to 42% gains in tested settings.

citing papers explorer

DART: Draft-Agreement Routing for Training-Free Adaptive Thinking Budgets in Hybrid Reasoning Models

cs.AI · 2026-06-22 · unverdicted · none · ref 1

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · none · ref 169

ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression

cs.LG · 2026-05-08 · unverdicted · none · ref 52 · 2 links

Spatial Reasoning via Modality Switching Between Language and Symbolic Representation

cs.AI · 2026-06-30 · unverdicted · none · ref 90
