DART is a training-free router that accepts direct answers on draft agreement and allocates thinking budgets via draft entropy on disagreement, reporting accuracy gains and token reductions on math and code benchmarks across model scales.
Adaptthink: Reasoning models can learn when to think
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.
ExpThink reduces average CoT response length by up to 77% while improving accuracy on math benchmarks via experience-guided reward shaping and difficulty-adaptive advantage in RL.
Introduces a trustworthiness-and-complexity switching metric that lets LLMs choose between language and grid modalities for spatial reasoning, yielding up to 42% gains in tested settings.
citing papers explorer
-
DART: Draft-Agreement Routing for Training-Free Adaptive Thinking Budgets in Hybrid Reasoning Models
DART is a training-free router that accepts direct answers on draft agreement and allocates thinking budgets via draft entropy on disagreement, reporting accuracy gains and token reductions on math and code benchmarks across model scales.
-
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.
-
ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression
ExpThink reduces average CoT response length by up to 77% while improving accuracy on math benchmarks via experience-guided reward shaping and difficulty-adaptive advantage in RL.
-
Spatial Reasoning via Modality Switching Between Language and Symbolic Representation
Introduces a trustworthiness-and-complexity switching metric that lets LLMs choose between language and grid modalities for spatial reasoning, yielding up to 42% gains in tested settings.