SuCo: Sufficiency-guided Continuous Adaptive Reasoning
Pith reviewed 2026-06-27 00:49 UTC · model grok-4.3
The pith
SuCo enables large reasoning models to produce the shortest sufficient chain-of-thought for each query by training on minimal prefixes with adaptive thresholds and sufficiency-aware optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Minimal Sufficient CoT is the shortest prefix of a reasoning trajectory that still yields the correct answer. SuCo uses this definition in two stages: first fine-tuning on data built with difficulty-scaled sufficiency thresholds, then policy optimization with rewards that penalize both excessive and insufficient reasoning length. Experiments demonstrate consistent gains in accuracy and reductions in token usage across mathematics, code, and science benchmarks.
What carries the argument
Minimal Sufficient CoT (MSC), defined as the shortest prefix of a CoT trajectory which is adequate for producing the correct answer, which serves as the basis for constructing aligned training data and sufficiency-aware rewards.
If this is right
- Models internalize concise yet sufficient reasoning patterns that scale with question difficulty.
- Dynamic complexity tracking allows continuous adaptation rather than discrete modes.
- Sufficiency-aware rewards prevent both over-thinking on simple queries and under-thinking on complex ones.
- Overall, the framework improves both accuracy and reasoning efficiency simultaneously.
Where Pith is reading between the lines
- One could test whether the same MSC concept applies to non-language reasoning tasks such as visual or multimodal problems.
- The adaptive thresholds might be learned directly by the model instead of constructed externally.
- This method could be combined with other compression techniques to further reduce inference costs.
Load-bearing premise
That problem-adaptive sufficiency thresholds can be reliably constructed to produce MSC data that, when used in MFT and SAPO, cause the model to internalize concise yet sufficient reasoning patterns without degrading performance on harder problems.
What would settle it
Observing that SuCo-trained models generate longer or less accurate responses on simple problems compared to standard fine-tuned models, or fail to improve on hard problems, would indicate the approach does not achieve the claimed adaptive control.
Figures
read the original abstract
Despite remarkable performance on complex tasks, Large Reasoning Models (LRMs) often generate excessively long Chain-of-Thoughts (CoT), inflating computational costs even for simple queries. Existing efforts to mitigate this inefficiency typically rely on discrete reasoning modes or fixed budget tiers, lacking a principled criterion of when reasoning is sufficient. In this work, we introduce Minimal Sufficient CoT (MSC), defined as the shortest prefix of a CoT trajectory which is adequate for producing the correct answer. We empirically show that MSC not only reduces reasoning tokens, but also improves accuracy across difficulty levels. Building on MSC, we propose Sufficiency-guided Continuous Adaptive Reasoning (SuCo), a two-stage training framework for autonomous reasoning control along a continuous spectrum. In stage 1, MSC-Aligned Fine-Tuning (MFT) constructs MSC data using problem-adaptive sufficiency thresholds that naturally scale with question difficulty, then fine-tunes the model to internalize concise yet sufficient reasoning patterns. In stage 2, Sufficiency-Aware Policy Optimization (SAPO) further optimizes the model through reinforcement learning with dynamic complexity tracking and sufficiency-aware rewards that penalize both over- and under-thinking. Extensive experiments across mathematics, code, and science benchmarks show that SuCo consistently achieves improvements in both accuracy and reasoning efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines Minimal Sufficient CoT (MSC) as the shortest prefix of a Chain-of-Thought trajectory adequate for the correct answer. It proposes SuCo, a two-stage framework consisting of MSC-Aligned Fine-Tuning (MFT) that uses problem-adaptive sufficiency thresholds to construct training data and fine-tune for concise reasoning, followed by Sufficiency-Aware Policy Optimization (SAPO) that applies RL with dynamic complexity tracking and rewards penalizing both over- and under-thinking. The central claim is that this yields consistent gains in both accuracy and reasoning efficiency on mathematics, code, and science benchmarks.
Significance. If validated, the work supplies a continuous, sufficiency-based mechanism for adaptive reasoning length control that moves beyond discrete modes or fixed budgets, with potential to improve efficiency in LRMs while preserving performance across difficulty levels. The problem-adaptive thresholds and sufficiency-aware rewards constitute a coherent technical contribution to efficient reasoning training.
major comments (2)
- [Abstract] Abstract: the claim that 'extensive experiments across mathematics, code, and science benchmarks show that SuCo consistently achieves improvements in both accuracy and reasoning efficiency' supplies no methods, baselines, datasets, error bars, or quantitative results, rendering the central empirical claim unevaluable.
- [Methods (implied by pipeline description)] The construction of MSC data via problem-adaptive thresholds and the precise definition of sufficiency-aware rewards in SAPO are not specified, which is load-bearing for assessing whether the claimed internalization of concise patterns occurs without degrading harder problems.
minor comments (1)
- [Abstract] The phrase 'problem-adaptive sufficiency thresholds that naturally scale with question difficulty' is used without a formal definition or illustrative example.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for highlighting areas where the presentation can be strengthened. We address each major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that 'extensive experiments across mathematics, code, and science benchmarks show that SuCo consistently achieves improvements in both accuracy and reasoning efficiency' supplies no methods, baselines, datasets, error bars, or quantitative results, rendering the central empirical claim unevaluable.
Authors: Abstracts are conventionally high-level summaries constrained by length, so they omit full methodological and quantitative details. The complete experimental protocol, baselines (vanilla CoT, length-regularized fine-tuning, budget-based methods), datasets (MATH, GSM8K, HumanEval, ScienceQA), and results with standard deviations across seeds appear in Sections 4 and 5. To improve standalone evaluability, we will revise the abstract to include representative quantitative outcomes (e.g., average accuracy delta and token reduction percentages). revision: yes
-
Referee: [Methods (implied by pipeline description)] The construction of MSC data via problem-adaptive thresholds and the precise definition of sufficiency-aware rewards in SAPO are not specified, which is load-bearing for assessing whether the claimed internalization of concise patterns occurs without degrading harder problems.
Authors: Section 3.1 defines problem-adaptive thresholds as the shortest prefix length at which prefix accuracy reaches 95 % of full-CoT accuracy, scaled by a difficulty proxy obtained from an initial model rollout. Section 3.2 defines the SAPO reward as accuracy_reward − eta·|length − MSC_length| + au·complexity_match, where complexity is tracked by a learned estimator updated each episode. We will add explicit equations, an algorithm box, and worked examples to make these constructions fully reproducible and to permit direct evaluation of the claimed behavior on hard problems. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The abstract defines MSC independently as the shortest adequate CoT prefix, then describes empirical construction of MSC data via problem-adaptive thresholds, followed by MFT and SAPO stages. No equations, reward definitions, or self-citations are present that reduce any claimed prediction or result to its own inputs by construction. The pipeline is presented as a logically coherent sequence of data construction and optimization steps whose validity rests on external benchmarks rather than internal redefinition or fitted renaming.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[2]
The Twelfth International Conference on Learning Representations , year=
Let's Verify Step by Step , author=. The Twelfth International Conference on Learning Representations , year=
-
[4]
The Thirteenth International Conference on Learning Representations , year=
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code , author=. The Thirteenth International Conference on Learning Representations , year=
-
[5]
Proceedings of the International Conference on Learning Representations (ICLR) , year=
Measuring Massive Multitask Language Understanding , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=
-
[6]
Bowman , booktitle=
David Rein and Betty Li Hou and Asa Cooper Stickland and Jackson Petty and Richard Yuanzhe Pang and Julien Dirani and Julian Michael and Samuel R. Bowman , booktitle=. 2024 , url=
2024
-
[7]
2025 , eprint=
Llama-Nemotron: Efficient Reasoning Models , author=. 2025 , eprint=
2025
-
[8]
Open R1: A fully open reproduction of DeepSeek-R1 , url =
-
[9]
Hugging Face repository , howpublished =
OpenR1-Math-220k , author=. Hugging Face repository , howpublished =. 2025 , publisher =
2025
-
[11]
2025 , eprint=
s1: Simple test-time scaling , author=. 2025 , eprint=
2025
-
[12]
2025 , eprint=
Qwen3 Technical Report , author=. 2025 , eprint=
2025
-
[13]
The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
Think Only When You Need with Large Hybrid-Reasoning Models , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
-
[15]
A dapt T hink: Reasoning Models Can Learn When to Think
Zhang, Jiajie and Lin, Nianyi and Hou, Lei and Feng, Ling and Li, Juanzi. A dapt T hink: Reasoning Models Can Learn When to Think. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025
2025
-
[20]
Advances in neural information processing systems , volume=
Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=
-
[21]
Advances in neural information processing systems , volume=
Large language models are zero-shot reasoners , author=. Advances in neural information processing systems , volume=
-
[22]
2025 , eprint=
gpt-oss-120b & gpt-oss-20b Model Card , author=. 2025 , eprint=
2025
-
[23]
NeurIPS , year=
Measuring Mathematical Problem Solving With the MATH Dataset , author=. NeurIPS , year=
-
[24]
Qwen2.5: A Party of Foundation Models , url =
Qwen Team , month =. Qwen2.5: A Party of Foundation Models , url =
-
[27]
Mz Dai and Chenxu Yang and Qingyi Si , booktitle=. S-. 2025 , url=
2025
-
[30]
Second Conference on Language Modeling , year=
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning , author=. Second Conference on Language Modeling , year=
-
[31]
Charlie Victor Snell and Jaehoon Lee and Kelvin Xu and Aviral Kumar , booktitle=. Scaling. 2025 , url=
2025
-
[32]
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
Deduplicating training data makes language models better , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[33]
2024 , url=
Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan , booktitle=. 2024 , url=
2024
-
[38]
Forty-second International Conference on Machine Learning , year=
T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling , author=. Forty-second International Conference on Machine Learning , year=
-
[41]
The Fourteenth International Conference on Learning Representations , year=
CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling , author=. The Fourteenth International Conference on Learning Representations , year=
-
[42]
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=
Alphaone: Reasoning models thinking slow and fast at test time , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=
2025
-
[44]
Advances in Neural Information Processing Systems , volume=
Does thinking more always help? mirage of test-time scaling in reasoning models , author=. Advances in Neural Information Processing Systems , volume=
-
[45]
Geva, Mor and Khashabi, Daniel and Segal, Elad and Khot, Tushar and Roth, Dan and Berant, Jonathan , journal =
-
[46]
Commonsenseqa: A question answering challenge targeting commonsense knowledge , author=. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , pages=
2019
-
[47]
Hashimoto , title =
Xuechen Li and Tianyi Zhang and Yann Dubois and Rohan Taori and Ishaan Gulrajani and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. GitHub repository , howpublished =. 2023 , month =
2023
-
[48]
System Report for CCL 25-Eval Task 10: SRAG - MAV for Fine-Grained C hinese Hate Speech Recognition
Wang, Jiahao and Liu, Ramen and Zhang, Longhui and Li, Jing. System Report for CCL 25-Eval Task 10: SRAG - MAV for Fine-Grained C hinese Hate Speech Recognition. Proceedings of the 24th C hina National Conference on Computational Linguistics ( CCL 2025). 2025
2025
-
[50]
H., Bhattacharya, P., Brundyn, A., Casper, J., Catanzaro, B., Clay, S., Cohen, J., et al
Adler, B., Agarwal, N., Aithal, A., Anh, D. H., Bhattacharya, P., Brundyn, A., Casper, J., Catanzaro, B., Clay, S., Cohen, J., et al. Nemotron-4 340b technical report. arXiv preprint arXiv:2406.11704, 2024
arXiv 2024
-
[51]
and Welleck, S
Aggarwal, P. and Welleck, S. L1: Controlling how long a reasoning model thinks with reinforcement learning. In Second Conference on Language Modeling, 2025. URL https://openreview.net/forum?id=4jdIxXBNve
2025
-
[52]
U., Narenthiran, S., Majumdar, S., Ficek, A., Jain, S., Huang, J., Noroozi, V., and Ginsburg, B
Ahmad, W. U., Narenthiran, S., Majumdar, S., Ficek, A., Jain, S., Huang, J., Noroozi, V., and Ginsburg, B. Opencodereasoning: Advancing data distillation for competitive coding. arXiv preprint arXiv:2504.01943, 2025
Pith/arXiv arXiv 2025
-
[53]
Program synthesis with large language models
Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., et al. Program synthesis with large language models. arXiv preprint arXiv:2108.07732, 2021
Pith/arXiv arXiv 2021
-
[54]
Bercovich, A., Levy, I., Golan, I., Dabbah, M., El-Yaniv, R., Puny, O., Galil, I., Moshe, Z., Ronen, T., Nabwani, N., Shahaf, I., Tropp, O., Karpas, E., Zilberstein, R., Zeng, J., Singhal, S., Bukharin, A., Zhang, Y., Konuk, T., Shen, G., Mahabaleshwarkar, A. S., Kartal, B., Suhara, Y., Delalleau, O., Chen, Z., Wang, Z., Mosallanezhad, D., Renduchintala, ...
arXiv 2025
-
[55]
V., R \'e , C., and Mirhoseini, A
Brown, B., Juravsky, J., Ehrlich, R., Clark, R., Le, Q. V., R \'e , C., and Mirhoseini, A. Large language monkeys: Scaling inference compute with repeated sampling. arXiv preprint arXiv:2407.21787, 2024
Pith/arXiv arXiv 2024
-
[56]
Training verifiers to solve math word problems
Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., and Schulman, J. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021
Pith/arXiv arXiv 2021
-
[57]
S- GRPO : Early exit via reinforcement learning in reasoning models
Dai, M., Yang, C., and Si, Q. S- GRPO : Early exit via reinforcement learning in reasoning models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=wNMK5o0Vfg
2025
-
[58]
O., and Liu, S
Fan, C., Zhang, Y., Jia, J., Hero, A. O., and Liu, S. Cyclicreflex: Improving reasoning models via cyclical reflection token scheduling. In The Fourteenth International Conference on Learning Representations, 2026
2026
-
[59]
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Geva, M., Khashabi, D., Segal, E., Khot, T., Roth, D., and Berant, J. Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies . Transactions of the Association for Computational Linguistics (TACL), 2021
2021
-
[60]
S., Chakraborty, S., Reddy, A., Lu, Y., Wang, M., Manocha, D., Huang, F., Ghavamzadeh, M., and Bedi, A
Ghosal, S. S., Chakraborty, S., Reddy, A., Lu, Y., Wang, M., Manocha, D., Huang, F., Ghavamzadeh, M., and Bedi, A. S. Does thinking more always help? mirage of test-time scaling in reasoning models. Advances in Neural Information Processing Systems, 38: 0 172664--172691, 2026
2026
-
[61]
Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning
Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025
Pith/arXiv arXiv 2025
-
[62]
Thinkdial: An open recipe for controlling reasoning effort in large language models
He, Q., Yuan, S., Li, X., Wang, M., and Chen, J. Thinkdial: An open recipe for controlling reasoning effort in large language models. arXiv preprint arXiv:2508.18773, 2025
arXiv 2025
-
[63]
Measuring massive multitask language understanding
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., and Steinhardt, J. Measuring massive multitask language understanding. Proceedings of the International Conference on Learning Representations (ICLR), 2021 a
2021
-
[64]
Measuring mathematical problem solving with the math dataset
Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D., and Steinhardt, J. Measuring mathematical problem solving with the math dataset. NeurIPS, 2021 b
2021
-
[65]
Thinkprune: Pruning long chain-of-thought of llms via reinforcement learning
Hou, B., Zhang, Y., Ji, J., Liu, Y., Qian, K., Andreas, J., and Chang, S. Thinkprune: Pruning long chain-of-thought of llms via reinforcement learning. arXiv preprint arXiv:2504.01296, 2025 a
Pith/arXiv arXiv 2025
-
[66]
T1: Advancing language model reasoning through reinforcement learning and inference scaling
Hou, Z., Lv, X., Lu, R., Zhang, J., Li, Y., Yao, Z., Li, J., Tang, J., and Dong, Y. T1: Advancing language model reasoning through reinforcement learning and inference scaling. In Forty-second International Conference on Machine Learning, 2025 b . URL https://openreview.net/forum?id=tnxONP8zTE
2025
-
[67]
Open r1: A fully open reproduction of deepseek-r1, January 2025
Hugging Face . Open r1: A fully open reproduction of deepseek-r1, January 2025. URL https://github.com/huggingface/open-r1
2025
-
[68]
Jaech, A., Kalai, A., Lerer, A., Richardson, A., El-Kishky, A., Low, A., Helyar, A., Madry, A., Beutel, A., Carney, A., et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024
Pith/arXiv arXiv 2024
-
[69]
Livecodebench: Holistic and contamination free evaluation of large language models for code
Jain, N., Han, K., Gu, A., Li, W.-D., Yan, F., Zhang, T., Wang, S., Solar-Lezama, A., Sen, K., and Stoica, I. Livecodebench: Holistic and contamination free evaluation of large language models for code. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=chfJJYC3iL
2025
-
[70]
Think only when you need with large hybrid-reasoning models
Jiang, L., Wu, X., Huang, S., Dong, Q., Chi, Z., Dong, L., Zhang, X., Lv, T., Cui, L., and Wei, F. Think only when you need with large hybrid-reasoning models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=fDjDVE4qdj
2025
-
[71]
E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., and Narasimhan, K
Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., and Narasimhan, K. R. SWE -bench: Can language models resolve real-world github issues? In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=VTF8yNQM66
2024
-
[72]
S., Reid, M., Matsuo, Y., and Iwasawa, Y
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., and Iwasawa, Y. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35: 0 22199--22213, 2022
2022
-
[73]
Deduplicating training data makes language models better
Lee, K., Ippolito, D., Nystrom, A., Zhang, C., Eck, D., Callison-Burch, C., and Carlini, N. Deduplicating training data makes language models better. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 8424--8445, 2022
2022
-
[74]
Li, X., Zhang, T., Dubois, Y., Taori, R., Gulrajani, I., Guestrin, C., Liang, P., and Hashimoto, T. B. Alpacaeval: An automatic evaluator of instruction-following models. https://github.com/tatsu-lab/alpaca_eval, 5 2023
2023
-
[75]
Let's verify step by step
Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., and Cobbe, K. Let's verify step by step. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=v8L0pN6EOi
2024
-
[76]
Adacot: Pareto-optimal adaptive chain-of-thought triggering via reinforcement learning
Lou, C., Sun, Z., Liang, X., Qu, M., Shen, W., Wang, W., Li, Y., Yang, Q., and Wu, S. Adacot: Pareto-optimal adaptive chain-of-thought triggering via reinforcement learning. arXiv preprint arXiv:2505.11896, 2025
arXiv 2025
-
[77]
B., Penedo, G., Beeching, E., Gallouédec, Q., Habib, N., Tunstall, L., and von Werra, L
Lozhkov, A., Kydlíček, H., Allal, L. B., Penedo, G., Beeching, E., Gallouédec, Q., Habib, N., Tunstall, L., and von Werra, L. Openr1-math-220k. https://huggingface.co/datasets/open-r1/OpenR1-Math-220k, 2025
2025
-
[78]
L., Fei-Fei, L., Hajishirzi, H., Zettlemoyer, L., Liang, P., Candès, E., and Hashimoto, T
Muennighoff, N., Yang, Z., Shi, W., Li, X. L., Fei-Fei, L., Hajishirzi, H., Zettlemoyer, L., Liang, P., Candès, E., and Hashimoto, T. s1: Simple test-time scaling, 2025. URL https://arxiv.org/abs/2501.19393
Pith/arXiv arXiv 2025
-
[79]
gpt-oss-120b & gpt-oss-20b model card, 2025
OpenAI. gpt-oss-120b & gpt-oss-20b model card, 2025. URL https://arxiv.org/abs/2508.10925
Pith/arXiv arXiv 2025
-
[80]
Qwen Team . Qwen3 technical report, 2025. URL https://arxiv.org/abs/2505.09388
Pith/arXiv arXiv 2025
-
[81]
L., Stickland, A
Rein, D., Hou, B. L., Stickland, A. C., Petty, J., Pang, R. Y., Dirani, J., Michael, J., and Bowman, S. R. GPQA : A graduate-level google-proof q&a benchmark. In First Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=Ti67584b98
2024
-
[82]
Deepseekmath: Pushing the limits of mathematical reasoning in open language models
Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Bi, X., Zhang, H., Zhang, M., Li, Y., et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024
Pith/arXiv arXiv 2024
-
[83]
V., Lee, J., Xu, K., and Kumar, A
Snell, C. V., Lee, J., Xu, K., and Kumar, A. Scaling LLM test-time compute optimally can be more effective than scaling parameters for reasoning. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=4FWAwZtd2n
2025
-
[84]
Stop overthinking: A survey on efficient reasoning for large language models
Sui, Y., Chuang, Y.-N., Wang, G., Zhang, J., Zhang, T., Yuan, J., Liu, H., Wen, A., Zhong, S., Zou, N., et al. Stop overthinking: A survey on efficient reasoning for large language models. arXiv preprint arXiv:2503.16419, 2025
Pith/arXiv arXiv 2025
-
[85]
Commonsenseqa: A question answering challenge targeting commonsense knowledge
Talmor, A., Herzig, J., Lourie, N., and Berant, J. Commonsenseqa: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp.\ 4149--4158, 2019
2019
-
[86]
System report for CCL 25-eval task 10: SRAG - MAV for fine-grained C hinese hate speech recognition
Wang, J., Liu, R., Zhang, L., and Li, J. System report for CCL 25-eval task 10: SRAG - MAV for fine-grained C hinese hate speech recognition. In Lin, H., Li, B., and Tan, H. (eds.), Proceedings of the 24th C hina National Conference on Computational Linguistics ( CCL 2025) , pp.\ 395--402, Jinan, China, August 2025 a . Chinese Information Processing Socie...
2025
-
[87]
Thoughts are all over the place: On the underthinking of o1-like llms
Wang, Y., Liu, Q., Xu, J., Liang, T., Chen, X., He, Z., Song, L., Yu, D., Li, J., Zhang, Z., et al. Thoughts are all over the place: On the underthinking of o1-like llms. arXiv preprint arXiv:2501.18585, 2025 b
arXiv 2025
-
[88]
V., Zhou, D., et al
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35: 0 24824--24837, 2022
2022
-
[89]
From efficiency to adaptivity: A deeper look at adaptive reasoning in large language models
Wu, C., Li, B., Gao, M., and Wang, Z. From efficiency to adaptivity: A deeper look at adaptive reasoning in large language models. arXiv preprint arXiv:2511.10788, 2025
arXiv 2025
-
[90]
Towards large reasoning models: A survey of reinforced reasoning with large language models
Xu, F., Hao, Q., Zong, Z., Wang, J., Zhang, Y., Wang, J., Lan, X., Gong, J., Ouyang, T., Meng, F., et al. Towards large reasoning models: A survey of reinforced reasoning with large language models. arXiv preprint arXiv:2501.09686, 2025
Pith/arXiv arXiv 2025
-
[91]
Qwen2.5-math technical report: Toward mathematical expert model via self-improvement
Yang, A., Zhang, B., Hui, B., Gao, B., Yu, B., Li, C., Liu, D., Tu, J., Zhou, J., Lin, J., Lu, K., Xue, M., Lin, R., Liu, T., Ren, X., and Zhang, Z. Qwen2.5-math technical report: Toward mathematical expert model via self-improvement. arXiv preprint arXiv:2409.12122, 2024
Pith/arXiv arXiv 2024
-
[92]
Alphaone: Reasoning models thinking slow and fast at test time
Zhang, J., Dong, R., Wang, H., Ning, X., Geng, H., Li, P., He, X., Bai, Y., Malik, J., Gupta, S., et al. Alphaone: Reasoning models thinking slow and fast at test time. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp.\ 11340--11365, 2025 a
2025
-
[93]
A dapt T hink: Reasoning models can learn when to think
Zhang, J., Lin, N., Hou, L., Feng, L., and Li, J. A dapt T hink: Reasoning models can learn when to think. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025 b . URL https://aclanthology.org/2025.emnlp-main.184/
2025
-
[94]
Speed up your code: Progressive code acceleration through bidirectional tree editing
Zhang, L., Wang, J., Zhang, M., Cao, G., Shi, E., Ma, Y., Yu, J., Liu, H., Li, J., and Zhang, M. Speed up your code: Progressive code acceleration through bidirectional tree editing. In Che, W., Nabende, J., Shutova, E., and Pilehvar, M. T. (eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Pape...
-
[95]
Tinyllama: An open-source small language model
Zhang, P., Zeng, G., Wang, T., and Lu, W. Tinyllama: An open-source small language model. arXiv preprint arXiv:2401.02385, 2024
Pith/arXiv arXiv 2024
-
[96]
Saber: Switchable and balanced training for efficient llm reasoning
Zhao, K., Zhao, Y., Song, J., He, S., Zhang, L., Zhang, Q., and Li, T. Saber: Switchable and balanced training for efficient llm reasoning. arXiv preprint arXiv:2508.10026, 2025
arXiv 2025
-
[97]
X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 1 0 (2), 2023
Pith/arXiv arXiv 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.