DART: Draft-Agreement Routing for Training-Free Adaptive Thinking Budgets in Hybrid Reasoning Models

Chanjun Park; Heuiseok Lim; Hyeongju Park; Hyeonseok Moon; Jaehyung Seo; Jungseob Lee; Junyoung Son; Seongtae Hong; Seungjun Lee; Sugyeong Eo

arxiv: 2606.23181 · v1 · pith:ROFHMXESnew · submitted 2026-06-22 · 💻 cs.AI · cs.CL


Pith reviewed 2026-06-26 08:33 UTC · model grok-4.3


keywords: hybrid reasoning, training-free routing, draft agreement, entropy-based budget, adaptive thinking, math reasoning, code reasoning, no labeled data


The pith

DART samples two no-think drafts and routes to direct answering when they agree, setting adaptive thinking budgets from draft entropy when they disagree.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DART, a training-free routing method for hybrid reasoning models that can either answer a query directly or allocate extra tokens for extended thinking. It generates two inexpensive drafts without thinking; when the drafts match, the model answers directly, and when they differ it uses the entropy across the drafts to select a suitable thinking budget. This routing preserves or raises accuracy on math and code benchmarks while cutting the thinking tokens consumed, and the signal works across model sizes from 0.6B to 32B without any labeled data or parameter updates.

Core claim

DART samples two cheap no-think drafts, accepts direct answering when the drafts agree, and predicts a thinking budget from draft entropy when they disagree, thereby adapting the thinking budget per query without any labeled data or gradient updates.
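The routing rule in the core claim can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `dart_route` and `entropy_to_budget`, the budget bounds of 512 and 8192 tokens, and the linear entropy-to-budget mapping are all assumptions, since the abstract does not give the mapping. How the draft entropy is computed is also left to the caller here, and agreement is checked by exact match on already-extracted answers.

```python
from typing import List, Optional, Tuple


def entropy_to_budget(entropy: float, max_entropy: float,
                      min_budget: int = 512, max_budget: int = 8192) -> int:
    """Map a draft-entropy score to a thinking-token budget.

    ASSUMPTION: linear interpolation between two hypothetical bounds.
    The paper's actual mapping is not given in the abstract.
    """
    if max_entropy <= 0:
        return min_budget
    frac = min(max(entropy / max_entropy, 0.0), 1.0)  # clamp to [0, 1]
    return int(round(min_budget + frac * (max_budget - min_budget)))


def dart_route(draft_answers: List[str], draft_entropy: float,
               max_entropy: float) -> Tuple[str, Optional[str], int]:
    """Return (mode, answer, thinking_budget) for one query.

    Stage 1: if all no-think drafts give the same extracted answer,
    answer directly with a zero thinking budget.
    Stage 2: otherwise request a thinking completion whose budget
    grows with the draft entropy.
    """
    if not draft_answers:
        raise ValueError("at least one draft is required")
    if len(set(draft_answers)) == 1:
        return ("direct", draft_answers[0], 0)
    return ("think", None, entropy_to_budget(draft_entropy, max_entropy))
```

A caller would sample the two drafts, extract their answers, and pass them in together with whatever entropy score it computes from the drafts. On the "think" branch the returned budget would cap a separate thinking-mode completion.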

What carries the argument

Agreement between two no-think drafts as the signal for direct answering, with draft entropy used to set the thinking budget on disagreement.

If this is right

Accuracy on Olympiad-level math improves up to 9.0 points while thinking tokens fall 15-69 percent.
Code reasoning accuracy rises up to 22.5 points under execution-based checks while thinking tokens drop 51-63 percent.
The same Stage-1 agreement signal operates across 0.6B-32B models, multiple families, and API-only hosted models without training.
Accuracy is preserved or improved relative to always-thinking in most evaluated settings.
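To make the token-saving claims concrete, the arithmetic behind a routed thinking-token count can be written out. The sketch below is illustrative only: the function names are hypothetical, and the agreement rate and token counts in the usage note are invented inputs, not results from the paper. It counts thinking tokens only, so the cost of the two no-think drafts would need to be added separately.

```python
def routed_thinking_tokens(agree_rate: float,
                           mean_budget_on_disagree: float) -> float:
    """Mean thinking tokens per query under draft-agreement routing.

    Queries whose drafts agree spend zero thinking tokens; the rest
    spend mean_budget_on_disagree on average.
    """
    if not 0.0 <= agree_rate <= 1.0:
        raise ValueError("agree_rate must lie in [0, 1]")
    return (1.0 - agree_rate) * mean_budget_on_disagree


def relative_reduction(baseline_tokens: float, routed_tokens: float) -> float:
    """Fractional drop in thinking tokens versus an always-thinking baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - routed_tokens / baseline_tokens
```

With hypothetical inputs of a 60 percent agreement rate, 3000 thinking tokens on disagreed queries, and an always-thinking baseline of 2500 tokens per query, the routed mean is about 1200 tokens, a reduction of roughly 52 percent.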

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The draft-agreement signal could be tested on additional domains such as multi-step planning or tool-use tasks.
Production systems might combine the two-draft check with a small set of follow-up drafts when entropy is borderline to further refine the budget.
The approach suggests that consistency checks across a model's own sampled drafts can substitute for separately trained routers in many hybrid setups.

Load-bearing premise

Agreement between two no-think drafts reliably signals an easy query that can be answered directly without accuracy loss.

What would settle it

A dataset in which queries where the two drafts agree show substantially lower direct-answer accuracy than the always-thinking baseline would falsify the core routing signal.
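The falsification test described above amounts to comparing two accuracies on the agreed subset. A minimal sketch follows, assuming per-query records with three boolean fields; the `Record` type and `agreed_subset_gap` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    drafts_agree: bool     # did the two no-think drafts match?
    direct_correct: bool   # is the direct (no-think) answer correct?
    think_correct: bool    # is the always-thinking answer correct?


def agreed_subset_gap(records: List[Record]) -> Tuple[float, float, float]:
    """Return (direct_acc, think_acc, gap) on queries whose drafts agree.

    gap = think_acc - direct_acc. A large positive gap would mean that
    agreement does not mark queries that are safe to answer directly.
    """
    agreed = [r for r in records if r.drafts_agree]
    if not agreed:
        raise ValueError("no queries with agreeing drafts")
    direct_acc = sum(r.direct_correct for r in agreed) / len(agreed)
    think_acc = sum(r.think_correct for r in agreed) / len(agreed)
    return direct_acc, think_acc, think_acc - direct_acc
```

Running the same computation on the disagreed subset would give the complementary half of the per-subset breakdown.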

Figures

Figures reproduced from arXiv: 2606.23181 by Chanjun Park, Heuiseok Lim, Hyeongju Park, Hyeonseok Moon, Jaehyung Seo, Jungseob Lee, Junyoung Son, Seongtae Hong, Seungjun Lee, Sugyeong Eo.

**Figure 1.** Overview of DART. Stage 1 draws K=2 no-think drafts and accepts the unanimous answer when they agree. On disagreement, Stage 2 maps draft entropy to a query-specific thinking budget and produces the answer in a separate completion.

Algorithm 1: DART Inference Pipeline. Require: query q, model M, drafts K, temperature T. Ensure: response ŷ.
1: Draw r1, …, rK ∼ M(q, think=false, T)
2: ai ← EXTRACTANSWER(r…

**Figure 2.** Efficiency Pareto (left) and response-level failure-mode breakdown (right) for Qwen3-8B on MATH-500.

**Figure 3.** Difficulty-stratified Stage-1 behaviour on …

**Figure 4.** Draft entropy vs. actual thinking tokens on …

Original abstract

Hybrid reasoning models can answer directly or spend extra tokens on extended thinking. A practical router should choose between these modes for each query, so easy problems avoid unnecessary reasoning and hard problems receive enough budget to finish the answer. Existing routers move in this direction, but they typically require labeled training data or fix thinking budgets up front, ignoring answer-level evidence from the model itself. We introduce DART, a training-free routing framework that samples two cheap no-think drafts, accepts direct answering when the drafts agree, and predicts a thinking budget from draft entropy when they disagree. Across the main comparisons, DART preserves or improves always-thinking accuracy in most settings while reducing thinking-token use. On math reasoning, accuracy improves by up to +9.0 points on Olympiad-level problems while thinking tokens drop 15-69%. On code reasoning under execution-based equivalence, accuracy improves by up to +22.5 points while thinking tokens drop 51-63%. The Stage 1 signal extends across model scales (0.6B-32B), model families, and API-only hosted settings, with no labeled data and no gradient updates required.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DART gives a clean training-free router via draft agreement plus entropy, but the results skip the key check that agreement actually marks easy queries.


The paper introduces DART, a router that samples two cheap no-think drafts from a hybrid model. If the drafts match, it accepts the direct answer; if they differ, it turns the entropy of those drafts into a thinking budget. No training data or gradient steps are needed.

This combination is new relative to routers that demand labeled examples or preset budgets. The abstract reports accuracy holding or rising on math and code tasks while cutting thinking tokens, with the pattern holding from 0.6B to 32B models and even on API endpoints.

The soft spot is exactly the one the stress-test flags. The whole method rests on agreement being a reliable signal that the direct answer is correct. The paper gives no per-subset split showing higher direct-answer accuracy on agreed cases than on disagreed ones, and no correlation between entropy and actual difficulty or required budget. Without those numbers, the reported gains (+9 on Olympiad math, +22.5 on code) could come from sampling variance or evaluation choices rather than the router itself. The mapping from entropy to budget size is also left vague.
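One way to supply the missing entropy-versus-budget evidence is a rank correlation between per-query draft entropy and the thinking tokens actually consumed. The sketch below implements Spearman's rank correlation in plain Python; the choice of Spearman is an assumption made for illustration, not a statistic the paper reports, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

Here `x` would hold draft entropies and `y` the thinking-token counts for the same queries. A value near zero would suggest entropy carries little information about the budget a query needs.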

The work is aimed at people building efficient inference pipelines for hybrid reasoning models. A reader looking for training-free ideas will find a practical starting point, but anyone planning to deploy it will need the missing validation splits first.

The paper deserves a serious referee. The core idea is simple enough to test, and the gaps are fixable with additional tables rather than a full redesign.

Referee Report

3 major / 2 minor

Summary. The paper introduces DART, a training-free routing framework for hybrid reasoning models. It samples two no-think drafts from the model; if the drafts agree, it accepts the direct (no-think) answer; if they disagree, it allocates an adaptive thinking budget derived from the entropy of the drafts. The central empirical claim is that this routing preserves or improves accuracy relative to always-thinking baselines while reducing thinking-token usage, with reported gains of up to +9.0 points on Olympiad-level math and +22.5 points on execution-based code tasks, holding across model scales (0.6B–32B), families, and API-only settings without any labeled data or gradient updates.

Significance. If the routing signal is validated, the result would be significant for practical hybrid reasoning systems: it supplies an adaptive, zero-training method for choosing between direct and extended thinking modes using only model-internal evidence. Credit is due for the fully training-free design, the absence of fitted parameters, the demonstrated applicability to API-only models, and the coverage across scales and task types.

major comments (3)

[Experimental Results] The load-bearing assumption—that draft agreement preferentially occurs on queries where the no-think answer is correct—is not supported by any per-subset accuracy breakdown (agreed vs. disagreed cases). Without this split, the reported accuracy gains (+9.0 math, +22.5 code) cannot be attributed to the routing rule rather than sampling variance or evaluation protocol; this must be added to the main results.
[Method] The mapping from draft entropy to thinking budget is described only at a high level; no explicit formula, calibration procedure, or justification with error bars or statistical testing is supplied. This leaves the adaptive-budget component of the central claim ungrounded and prevents assessment of robustness.
[Evaluation Protocol] No information is given on data splits, number of evaluation runs, or statistical significance tests for the token-reduction and accuracy claims. This is required to substantiate the cross-task and cross-scale results.
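For the significance question raised in the major comments, a paired bootstrap over queries is one common option. The following is a sketch under stated assumptions: the function name, the default of 2000 resamples, and the percentile interval are illustrative choices, not the protocol the authors use or promise.

```python
import random
from typing import Sequence, Tuple


def paired_bootstrap_ci(correct_a: Sequence[int], correct_b: Sequence[int],
                        n_resamples: int = 2000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float]:
    """Percentile CI for accuracy(A) - accuracy(B) on the same queries.

    correct_a[i] and correct_b[i] are 0/1 correctness flags for query i
    under systems A and B (e.g. the router and always-thinking).
    """
    n = len(correct_a)
    if n == 0 or n != len(correct_b):
        raise ValueError("need two non-empty, equal-length sequences")
    rng = random.Random(seed)  # fixed seed for reproducibility
    diffs = []
    for _ in range(n_resamples):
        # Resample query indices with replacement, keeping pairs intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi
```

If the resulting interval excludes zero, the accuracy difference is unlikely to be explained by query-sampling noise alone. Variance across decoding seeds would need separate repeated runs.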

minor comments (2)

[Abstract] The term 'Stage 1 signal' appears in the abstract without prior definition; introduce it explicitly in the method section.
[Notation] Ensure consistent terminology between 'thinking tokens' and 'thinking budget' across text and figures.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript accordingly to strengthen the empirical support and methodological clarity.


Referee: [Experimental Results] The load-bearing assumption—that draft agreement preferentially occurs on queries where the no-think answer is correct—is not supported by any per-subset accuracy breakdown (agreed vs. disagreed cases). Without this split, the reported accuracy gains (+9.0 math, +22.5 code) cannot be attributed to the routing rule rather than sampling variance or evaluation protocol; this must be added to the main results.

Authors: We agree that a per-subset breakdown is necessary to substantiate the routing signal. In the revision we will add a table reporting accuracy on the agreed subset (where the direct answer is used) versus the disagreed subset (where the thinking budget is allocated), together with the corresponding always-thinking and no-thinking baselines. This will allow readers to verify that agreement correlates with correctness of the no-think answer.
Revision: yes

Referee: [Method] The mapping from draft entropy to thinking budget is described only at a high level; no explicit formula, calibration procedure, or justification with error bars or statistical testing is supplied. This leaves the adaptive-budget component of the central claim ungrounded and prevents assessment of robustness.

Authors: We acknowledge the description was high-level. The revised manuscript will include the explicit formula relating draft entropy to the allocated thinking budget, the procedure used to select the budget values, and any accompanying error bars or statistical tests that demonstrate robustness of the mapping.
Revision: yes

Referee: [Evaluation Protocol] No information is given on data splits, number of evaluation runs, or statistical significance tests for the token-reduction and accuracy claims. This is required to substantiate the cross-task and cross-scale results.

Authors: We will expand the experimental section to report the data splits employed, the number of independent evaluation runs, and the statistical significance tests (including p-values where appropriate) supporting the accuracy and token-reduction claims across tasks and model scales.
Revision: yes

Circularity Check

0 steps flagged

No circularity: routing rule defined directly from model outputs with no fitted parameters or self-referential reductions.


The DART method is introduced as sampling two no-think drafts from the model, accepting the direct answer on agreement, and using draft entropy to set thinking budget on disagreement. This is presented as a direct heuristic operating on raw model generations without any equations, fitted parameters, or derivations that reduce to the inputs by construction. No self-citations are invoked to justify the core routing logic, and performance numbers are reported as empirical observations across scales and tasks rather than as predictions forced by the method's own definitions. The approach is therefore self-contained and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method is described as operating on raw model outputs.



