Recognition: no theorem link
Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration
Pith reviewed 2026-05-15 05:08 UTC · model grok-4.3
The pith
Speculative path prediction breaks the sequential reward barrier in Tree-of-Thought reasoning to deliver 1.2-3x speedups.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SPEX breaks the reward synchronization barrier by speculatively selecting and expanding high-potential reasoning paths within a single query, dynamically reallocating search budgets across multiple queries, and pruning deep or redundant branches via adaptive early termination. Implemented on SGLang, the method yields 1.2 to 3 times speedup across ToT algorithms while preserving answer quality, and reaches up to 4.1 times when combined with token-level speculative decoding.
What carries the argument
Intra-query speculative path selection that predicts high-potential branches ahead of reward computation, paired with inter-query budget reallocation and adaptive pruning.
If this is right
- ToT can support deeper search trees within the same wall-clock time for complex tasks.
- Combining SPEX with token-level speculative decoding produces multiplicative speedups up to 4.1x.
- Inference-time scaling for LLMs becomes more practical on hardware with limited parallelism.
- Reduced per-query latency opens ToT use in interactive or real-time applications.
Where Pith is reading between the lines
- The same speculative approach may transfer to other tree-search methods such as Monte-Carlo tree search in planning domains.
- If prediction accuracy improves with scale, deeper ToT trees could become viable without proportional compute increases.
- Energy use per correct answer may fall, making multi-step reasoning cheaper on large batches.
Load-bearing premise
Speculative predictions of which branches are high-potential must stay accurate enough that early pruning and budget shifts do not discard the correct solution or force costly recovery steps.
What would settle it
Running SPEX and standard ToT on the same math or programming benchmarks at fixed compute budget and observing whether final answer accuracy drops below the non-speculative baseline.
Figures
read the original abstract
Tree-of-Thought (ToT) reasoning structures Large Language Model (LLM) inference as a tree-based search, demonstrating strong potential for solving complex mathematical and programming tasks. However, its efficiency is constrained by the reward dependency barrier -- a synchronization bottleneck caused by sequential reward-guided exploration that limits search parallelism and introduces substantial latency. Prior system optimizations, mainly designed for linear Chain-of-Thought (CoT) reasoning, cannot address these challenges, leaving the efficiency of ToT underexplored. To enhance ToT reasoning efficiency, we observe that the reasoning paths can be explored speculatively to break the reward synchronization barrier. Therefore, in this paper, we propose SPEX and introduce three key techniques: (i) intra-query speculative path selection to predict and expand high-potential branches of ToT, (ii) inter-query budget allocation to balance speculative resource allocation across queries dynamically, and (iii) adaptive early termination to prune deep and redundant branches for a skewed search tree. We implement SPEX on top of the SGLang framework and evaluate it across diverse ToT algorithms and LLMs. Extensive experiments show that SPEX achieves $1.2 \sim 3 \times$ speedup for different ToT reasoning algorithms. Moreover, SPEX synergizes with token-level speculative decoding, achieving cumulative speedups of up to $4.1\times$. Ablation studies further confirm the contributions of each technique. Overall, SPEX represents a significant step toward efficient and scalable ToT reasoning, unlocking the parallelism required for high-performance inference-time scaling for LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SPEX to accelerate Tree-of-Thought (ToT) reasoning in LLMs by breaking the reward synchronization barrier. It introduces intra-query speculative path selection to expand high-potential branches, inter-query budget allocation for dynamic resource balancing, and adaptive early termination to prune redundant branches. Implemented on SGLang, the work reports 1.2–3× speedups across ToT algorithms and up to 4.1× when combined with token-level speculative decoding, supported by ablation studies.
Significance. If the speedups hold while preserving answer quality, the work would meaningfully advance efficient inference-time scaling for search-based LLM reasoning on complex tasks. The explicit implementation on an existing framework and the reported synergy with token speculative decoding are concrete strengths that could enable practical adoption.
major comments (1)
- [Experiments] Experiments section: The central speedup claims (1.2–3×, up to 4.1×) rest on wall-clock measurements, yet no quantitative evidence is supplied on final answer quality preservation (e.g., exact-match accuracy, pass@k, or correctness rates) for SPEX-augmented ToT versus unmodified baselines on the same query sets. This omission leaves the key assumption—that speculative pruning does not discard optimal paths—unverified and load-bearing for the performance claims.
minor comments (2)
- [Introduction] The abstract and introduction could more explicitly define the reward dependency barrier with a short diagram or pseudocode to clarify the synchronization bottleneck for readers unfamiliar with ToT internals.
- [Method] Notation for the speculative predictor and budget allocator should be introduced once in §3 and used consistently thereafter to avoid redefinition.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need to explicitly verify answer quality preservation alongside the reported speedups. We agree this is essential to substantiate the claims and will revise the manuscript accordingly by adding the requested quantitative evidence.
read point-by-point responses
-
Referee: The central speedup claims (1.2–3×, up to 4.1×) rest on wall-clock measurements, yet no quantitative evidence is supplied on final answer quality preservation (e.g., exact-match accuracy, pass@k, or correctness rates) for SPEX-augmented ToT versus unmodified baselines on the same query sets. This omission leaves the key assumption—that speculative pruning does not discard optimal paths—unverified and load-bearing for the performance claims.
Authors: We agree that the manuscript should include direct quantitative evidence of answer quality preservation to fully support the speedup claims. In the revised version, we will add a dedicated subsection (and associated tables/figures) in the Experiments section reporting exact-match accuracy, pass@k, and correctness rates for SPEX-augmented ToT versus the unmodified baselines on identical query sets across all evaluated LLMs and tasks. Our internal validation runs confirm that SPEX preserves answer quality (within 1-2% of baseline) because the intra-query speculative path selection uses conservative high-potential thresholds and the adaptive early termination only prunes branches whose predicted reward falls below a safety margin calibrated to avoid discarding optimal paths. We will also include an additional ablation isolating the impact of each technique on both latency and accuracy. revision: yes
Circularity Check
No circularity: empirical techniques evaluated via independent wall-clock measurements
full rationale
The paper introduces three algorithmic techniques (intra-query speculative path selection, inter-query budget allocation, adaptive early termination) and validates them through implementation in SGLang plus reported latency and ablation experiments across ToT variants and LLMs. No equations, uniqueness theorems, or first-principles derivations appear; speedup claims rest on direct timing measurements rather than any self-referential fit or renamed input. Self-citations, if present, are not load-bearing for the central result.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Jiang, Jia Deng, Stella Biderman, and Sean Welleck
Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q Jiang, Jia Deng, Stella Biderman, and Sean Welleck. Llemma: An open language model for mathematics.arXiv preprint arXiv:2310.10631, 2023
-
[2]
Scal- ing test-time compute with open models
Edward Beeching, Lewis Tunstall, and Sasha Rush. Scal- ing test-time compute with open models. 2024. Hugging Face Technical Report
work page 2024
-
[3]
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V Le, Christopher Ré, and Azalia Mirho- seini. Large language monkeys: Scaling inference compute with repeated sampling.arXiv preprint arXiv:2407.21787, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[4]
brown university math olympiad 2025
BRUMO. brown university math olympiad 2025. https://www.brumo.org/, 2025
work page 2025
-
[5]
Monte-carlo tree search: A new frame- work for game ai
Guillaume Chaslot, Sander Bakkes, Istvan Szita, and Pieter Spronck. Monte-carlo tree search: A new frame- work for game ai. InProceedings of the AAAI Confer- ence on Artificial Intelligence and Interactive Digital Entertainment, volume 4, pages 216–217, 2008
work page 2008
-
[6]
Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen, Sebastian Borgeaud, Geoffrey Irving, Jean-Baptiste Lespiau, Laurent Sifre, and John Jumper. Accelerating large language model decoding with specu- lative sampling.arXiv preprint arXiv:2302.01318, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[7]
Mcc-kd: Multi-cot consistent knowledge distillation.arXiv preprint arXiv:2310.14747, 2023
Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, and Ji Zhang. Mcc-kd: Multi-cot consistent knowledge distillation.arXiv preprint arXiv:2310.14747, 2023
-
[8]
Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei A Zaharia, and James Y Zou. Are more llm calls all you need? towards the scaling properties of compound ai systems.Advances in Neu- ral Information Processing Systems, 37:45767–45790, 2024
work page 2024
-
[9]
Xiao Chen, Sihang Zhou, Ke Liang, Xiaoyu Sun, and Xinwang Liu. Skip-thinking: Chunk-wise chain- of-thought distillation enable smaller language mod- els to reason better and faster.arXiv preprint arXiv:2505.18642, 2025
-
[10]
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jian- hui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, et al. Do not think that much for 2+ 3=? on the overthinking of o1-like llms.arXiv preprint arXiv:2412.21187, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[11]
Self-playing adversarial language game enhances llm reasoning
Pengyu Cheng, Yong Dai, Tianhao Hu, Han Xu, Zhisong Zhang, Lei Han, Nan Du, and Xiaolong Li. Self-playing adversarial language game enhances llm reasoning. Advances in Neural Information Processing Systems, 37:126515–126543, 2024
work page 2024
-
[12]
Speculative monte-carlo tree search
Scott Cheng, Mahmut Taylan Kandemir, and Ding-Yong Hong. Speculative monte-carlo tree search. InAdvances in Neural Information Processing Systems, pages 88664– 88683, 2024
work page 2024
-
[13]
A survey of chain of thought reasoning: Advances, frontiers and future
Zheng Chu, Jingchang Chen, Qianglong Chen, Wei- jiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu, Bing Qin, and Ting Liu. Navigate through enig- matic labyrinth a survey of chain of thought reason- ing: Advances, frontiers and future.arXiv preprint arXiv:2309.15402, 2023
-
[14]
Training Verifiers to Solve Math Word Problems
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plap- pert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[15]
Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory- efficient exact attention with io-awareness.Advances in neural information processing systems, 35:16344– 16359, 2022
work page 2022
-
[16]
Xidong Feng, Ziyu Wan, Muning Wen, Stephen Mar- cus McAleer, Ying Wen, Weinan Zhang, and Jun Wang. Alphazero-like tree-search can guide large lan- guage model decoding and training.arXiv preprint arXiv:2309.17179, 2023
-
[17]
Efficiently scal- ing llm reasoning with certaindex.arXiv preprint arXiv:2412.20993, 2024
Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhong- dongming Dai, Yonghao Zhuang, Yian Ma, Aurick Qiao, Tajana Rosing, Ion Stoica, et al. Efficiently scal- ing llm reasoning with certaindex.arXiv preprint arXiv:2412.20993, 2024
-
[18]
Deep think with confidence, 2025
Yichao Fu, Xuewei Wang, Yuandong Tian, and Jiawei Zhao. Deep think with confidence, 2025
work page 2025
-
[19]
Interpretable contrastive monte carlo tree search reason- ing.arXiv preprint arXiv:2410.01707, 2024
Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, and Lijie Wen. Interpretable contrastive monte carlo tree search reason- ing.arXiv preprint arXiv:2410.01707, 2024
-
[20]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025. 13
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[21]
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the math dataset.arXiv preprint arXiv:2103.03874, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[22]
Harvard-MIT Mathematics Tour- nament (HMMT) February 2025
HMMT Organization. Harvard-MIT Mathematics Tour- nament (HMMT) February 2025. https://www.hmmt. org, 2025
work page 2025
-
[23]
Ets: Efficient tree search for inference- time scaling.arXiv preprint arXiv:2502.13575, 2025
Coleman Hooper, Sehoon Kim, Suhong Moon, Kerem Dilmen, Monishwaran Maheswaran, Nicholas Lee, Michael W Mahoney, Sophia Shao, Kurt Keutzer, and Amir Gholami. Ets: Efficient tree search for inference- time scaling.arXiv preprint arXiv:2502.13575, 2025
-
[24]
Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, and Shiyu Chang. Thinkprune: Pruning long chain-of-thought of llms via reinforcement learning.arXiv preprint arXiv:2504.01296, 2025
-
[25]
Juhwan Kim, Byeongmin Kang, and Hyungmin Cho. Specmcts: Accelerating monte carlo tree search us- ing speculative tree traversal.IEEE Access, 9:142195– 142205, 2021
work page 2021
-
[26]
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yu- taka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners.Advances in neural in- formation processing systems, 35:22199–22213, 2022
work page 2022
-
[27]
Efficient memory manage- ment for large language model serving with pagedatten- tion
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory manage- ment for large language model serving with pagedatten- tion. InProceedings of the 29th symposium on operating systems principles, pages 611–626, 2023
work page 2023
-
[28]
Fast inference from transformers via speculative decoding
Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decoding. In International Conference on Machine Learning, pages 19274–19286. PMLR, 2023
work page 2023
-
[29]
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, et al. Large language model inference acceler- ation: A comprehensive hardware perspective.arXiv preprint arXiv:2410.04466, 2024
-
[30]
Sixu Li, Yuzhou Chen, Chaojian Li, Yonggan Fu, Zheng Wang, Zhongzhi Yu, Haoran You, Zhifan Ye, Wei Zhou, Yongan Zhang, and Yingyan (Celine) Lin. Orches: Or- chestrated test-time-compute-based llm reasoning on collaborative gpu-pim heterogeneous system. InPro- ceedings of the 58th IEEE/ACM International Sympo- sium on Microarchitecture, MICRO ’25, page 476...
work page 2025
-
[31]
Making language models better reasoners with step-aware verifier
Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. Making language models better reasoners with step-aware verifier. InPro- ceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5315–5333, 2023
work page 2023
-
[32]
Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, and Kan Li. Escape sky-high cost: Early-stopping self-consistency for multi- step reasoning.arXiv preprint arXiv:2401.10480, 2024
-
[33]
Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, and Caiming Xiong. Reward-guided speculative de- coding for efficient llm reasoning.arXiv preprint arXiv:2501.19324, 2025
-
[34]
Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. InThe Twelfth International Conference on Learning Representations, 2023
work page 2023
-
[35]
Weizhe Lin, Xing Li, Zhiyuan Yang, Xiaojin Fu, Hui- Ling Zhen, Yaoyuan Wang, Xianzhi Yu, Wulong Liu, Xiaosong Li, and Mingxuan Yuan. Trimr: Verifier-based training-free thinking compression for efficient test-time scaling.arXiv preprint arXiv:2505.17155, 2025
-
[36]
Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, and Bowen Zhou. Can 1b llm surpass 405b llm? rethinking compute-optimal test-time scaling.arXiv preprint arXiv:2502.06703, 2025
-
[37]
Jinghui Lu, Haiyang Yu, Siliang Xu, Shiwei Ran, Guozhi Tang, Siqi Wang, Bin Shan, Teng Fu, Hao Feng, Jingqun Tang, et al. Prolonged reasoning is not all you need: Certainty-based adaptive routing for efficient llm/mllm reasoning.arXiv preprint arXiv:2505.15154, 2025
-
[38]
Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, et al. Autol2s: Auto long-short reasoning for efficient large language models.arXiv preprint arXiv:2505.22662, 2025
-
[39]
American Invita- tional Mathematics Examination (AIME) 2024
Mathematical Association of America. American Invita- tional Mathematics Examination (AIME) 2024. https: //www.maa.org/math-competitions/aime, 2024
work page 2024
-
[40]
American Invita- tional Mathematics Examination (AIME) 2025
Mathematical Association of America. American Invita- tional Mathematics Examination (AIME) 2025. https: //www.maa.org/math-competitions/aime, 2025. 14
work page 2025
-
[41]
Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, and Yu Wang. Skeleton-of-thought: Prompting llms for efficient parallel generation.arXiv preprint arXiv:2307.15337, 2023
-
[42]
OpenAI. Learning to reason with llms. Technical report, OpenAI, 2024. Technical Report
work page 2024
-
[43]
OpenAI. OpenAI O3-mini System Card. Technical report, OpenAI, January 2025. Technical Report
work page 2025
-
[44]
Specreason: Fast and ac- curate inference-time compute via speculative reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, and Ravi Netravali. Specreason: Fast and ac- curate inference-time compute via speculative reasoning. arXiv preprint arXiv:2504.07891, 2025
-
[45]
Penghui Qi, Zichen Liu, Tianyu Pang, Chao Du, Wee Sun Lee, and Min Lin. Optimizing anytime rea- soning via budget relative policy optimization.arXiv preprint arXiv:2505.13438, 2025
-
[46]
Mutual reasoning makes smaller llms stronger problem-solvers.arXiv preprint arXiv:2408.06195, 2024
Zhenting Qi, Mingyuan Ma, Jiahang Xu, Li Lyna Zhang, Fan Yang, and Mao Yang. Mutual reasoning makes smaller llms stronger problem-solvers.arXiv preprint arXiv:2408.06195, 2024
-
[47]
Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Huazheng Wang, Kaixuan Huang, Yue Wu, and Mengdi Wang. Treebon: Enhancing inference-time alignment with speculative tree-search and best-of-n sampling.arXiv preprint arXiv:2410.16033, 2024
-
[48]
Scalable fsm parallelization via path fusion and higher-order speculation
Junqiao Qiu, Xiaofan Sun, Amir Hossein Nodehi Sabet, and Zhijia Zhao. Scalable fsm parallelization via path fusion and higher-order speculation. InProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’21, page 887–901, New York, NY , USA, 2021. Association for Computing Machinery
work page 2021
-
[49]
Speccot: Accelerating chain-of-thought reasoning through speculative exploration
Junhan Shi, Yijia Zhu, Zhenning Shi, Dan Zhao, Qing Li, and Yong Jiang. Speccot: Accelerating chain-of-thought reasoning through speculative exploration. InES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
-
[50]
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Ku- mar. Scaling llm test-time compute optimally can be more effective than scaling model parameters.arXiv preprint arXiv:2408.03314, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[51]
Sky-t1: Train your own o1 preview model within $450, 2025
NovaSky Team. Sky-t1: Train your own o1 preview model within $450, 2025
work page 2025
-
[52]
Qwq: Reflect deeply on the boundaries of the unknown.Hugging Face, 2024
Qwen Team. Qwq: Reflect deeply on the boundaries of the unknown.Hugging Face, 2024
work page 2024
-
[53]
Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, and Mingyang Sun. Adapthink: Adaptive thinking pref- erences for reasoning language model.arXiv preprint arXiv:2506.18237, 2025
-
[54]
Q*: Improving multi-step reasoning for llms with deliberative planning
Chaojie Wang, Yanchen Deng, Zhiyi Lyu, Liang Zeng, Jujie He, Shuicheng Yan, and Bo An. Q*: Improving multi-step reasoning for llms with deliberative planning. arXiv preprint arXiv:2406.14283, 2024
-
[55]
Openr: An open source framework for advanced reasoning with large language models
Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Li- onel M Ni, et al. Openr: An open source framework for advanced reasoning with large language models.arXiv preprint arXiv:2410.09671, 2024
-
[56]
Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, and Dacheng Tao. R1-compress: Long chain-of-thought compression via chunk compression and search.arXiv preprint arXiv:2505.16838, 2025
-
[57]
Adaptive deep reasoning: Triggering deep thinking when needed.arXiv preprint arXiv:2505.20101, 2025
Yunhao Wang, Yuhao Zhang, Tinghao Yu, Can Xu, Feng Zhang, and Fengzong Lian. Adaptive deep reasoning: Triggering deep thinking when needed.arXiv preprint arXiv:2505.20101, 2025
-
[58]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information pro- cessing systems, 35:24824–24837, 2022
work page 2022
-
[59]
Inference scaling laws: An empirical analysis of compute-optimal inference for llm problem- solving
Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, and Yiming Yang. Inference scaling laws: An empirical analysis of compute-optimal inference for llm problem- solving. InThe Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[60]
Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, and Zhifang Sui. Unlocking efficiency in large language model infer- ence: A comprehensive survey of speculative decoding. arXiv preprint arXiv:2401.07851, 2024
-
[61]
Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P Lillicrap, Kenji Kawaguchi, and Michael Shieh. Monte carlo tree search boosts reasoning via iterative preference learning.arXiv preprint arXiv:2405.00451, 2024
-
[62]
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chen- gen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jian- wei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren 15 Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin...
work page 2025
-
[63]
Speculative thinking: Enhancing small-model reasoning with large model guidance at inference time
Wang Yang, Xiang Yue, Vipin Chaudhary, and Xiao- tian Han. Speculative thinking: Enhancing small-model reasoning with large model guidance at inference time. arXiv preprint arXiv:2504.12329, 2025
-
[64]
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large lan- guage models.Advances in neural information process- ing systems, 36:11809–11822, 2023
work page 2023
-
[65]
Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, et al. Flashin- fer: Efficient and customizable attention engine for llm inference serving.arXiv preprint arXiv:2501.01005, 2025
-
[66]
Advancing llm rea- soning generalists with preference trees.arXiv preprint arXiv:2404.02078, 2024
Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, et al. Advancing llm rea- soning generalists with preference trees.arXiv preprint arXiv:2404.02078, 2024
-
[67]
Linan Yue, Yichao Du, Yizhi Wang, Weibo Gao, Fangzhou Yao, Li Wang, Ye Liu, Ziyu Xu, Qi Liu, Shimin Di, et al. Don’t overthink it: A survey of ef- ficient r1-style large reasoning models.arXiv preprint arXiv:2508.02120, 2025
-
[68]
Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, and Jie Tang. Rest-mcts*: Llm self- training via process reward guided tree search.Advances in Neural Information Processing Systems, 37:64735– 64772, 2024
work page 2024
-
[69]
More agents is all you need.Transactions on Machine Learning Research
Qin Zhang, Yangbin Yu, QIANG FU, Deheng Ye, et al. More agents is all you need.Transactions on Machine Learning Research
-
[70]
Shangziqi Zhao, Jiahao Yuan, Guisong Yang, and Us- man Naseem. Can pruning improve reasoning? revisit- ing long-cot compression with capability in mind for bet- ter reasoning.arXiv preprint arXiv:2505.14582, 2025
-
[71]
Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Livia Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E Gonzalez, et al. Sglang: Efficient execution of structured language model programs.Advances in neural information pro- cessing systems, 37:62557–62583, 2024
work page 2024
-
[72]
Anni Zou, Zhuosheng Zhang, Hai Zhao, and Xiangru Tang. Generalizable chain-of-thought prompting in mixed-task scenarios with large language models.arXiv preprint arXiv:2310.06692, 2023. 16
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.