LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

Liwei Kang; Wee Sun Lee; Yee Whye Teh

arxiv: 2605.31492 · v1 · pith:6BCIUEKJnew · submitted 2026-05-29 · 💻 cs.AI

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

Liwei Kang , Yee Whye Teh , Wee Sun Lee This is my paper

Pith reviewed 2026-06-28 22:03 UTC · model grok-4.3

classification 💻 cs.AI

keywords LLM reasoningsearch treesparent pointerslinearized tracesBlocks WorldSokobangrid navigationheuristic search

0 comments

The pith

Adding parent pointers to make search tree structure explicit improves LLM reasoning performance and efficiency over implicit histories and heuristic search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLMs benefit from full search history in reasoning traces, which can be viewed as linearized trees where the model extends, abandons, and backtracks partial solutions. It finds that access to history alone does not reliably beat heuristic search that only sees the current state. The key issue is that standard traces represent the tree only implicitly, without marking which prior state is revisited on backtrack. Adding simple parent pointers creates an explicit LinTree representation, which raises both accuracy and efficiency in Blocks World, grid Navigation, and Sokoban. A reader would care because the change is minimal yet targets a structural limitation in how models use their own traces.

Core claim

The paper claims that LLM reasoning traces implicitly encode search trees, and that raw history access does not suffice to outperform LLM-heuristic best-first search; explicitly marking parent pointers to form a LinTree structure produces higher task success rates and fewer steps across the three tested environments.

What carries the argument

LinTree, the linearized search tree made explicit by parent pointers that identify which earlier state is revisited when the model backtracks.

If this is right

Explicit parent pointers allow models to condition on the full tree rather than only the linear trace.
Search efficiency improves because the model can more reliably avoid redundant branches.
The advantage appears in puzzle domains where backtracking is frequent.
The same explicit structure can be added on top of existing trace-based reasoning without retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be tested on longer or more open-ended reasoning chains where implicit backtracking becomes harder to track.
Models trained to emit parent pointers natively might internalize tree structure without extra tokens at inference time.
The finding suggests that other implicit structures in LLM outputs, such as call graphs or dependency chains, may also benefit from explicit markers.

Load-bearing premise

The performance and efficiency gains come from the explicit parent pointers rather than from incidental changes in token count, prompt wording, or overall model behavior.

What would settle it

Equating total tokens and prompt format between the LinTree version and the implicit-trace version, then observing whether the accuracy gap disappears in the same three environments.

Figures

Figures reproduced from arXiv: 2605.31492 by Liwei Kang, Wee Sun Lee, Yee Whye Teh.

read the original abstract

Large language models (LLMs) often solve reasoning problems by generating intermediate traces that explore and revise partial solutions. From a search perspective, these traces can be viewed as linearized search trees, where the model extends a partial solution, abandons it when it fails, and backtracks to try alternatives. Compared with traditional heuristic-guided search, such a policy has a potential advantage: it conditions on the whole search trace rather than only on the current local state. We first test whether LLMs utilize this advantage by comparing trace-conditioned reasoning policies against best-first search equipped with an LLM heuristic that only observes the current local state. Across three controlled reasoning environments, Blocks World, grid Navigation, and Sokoban, we find that raw access to search history alone is not enough to reliably outperform heuristic search. We then study one possible reason: in LLM reasoning traces, the underlying search tree is only implicitly represented, and when the model backtracks or switches branches, the trace does not explicitly identify which earlier search state is being revisited. We show that adding simple parent pointers to explicitly represent the linearized tree (LinTree) structure improves both task performance and search efficiency relative to implicit reasoning models and LLM-heuristic-guided search. These results suggest that search history becomes most useful when its tree structure is made explicit, motivating more structure-aware representations for LLM reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LinTree shows gains from parent pointers in three planning domains, but the experiments leave open whether those gains come from the tree structure or from changes in prompt format and token count.

read the letter

The paper first checks whether full search history helps LLMs over a local-state heuristic in Blocks World, grid navigation, and Sokoban. It finds history alone does not reliably beat the heuristic. It then adds parent pointers to make the underlying tree explicit in the trace and reports better task performance and efficiency.

That comparison is the main new piece. The setup is straightforward and the domains are standard, so the result is easy to understand even if the numbers need more detail.

The soft spot is exactly the one in the stress test. Adding pointers changes the trace format and length. The abstract gives no sign that token budgets or surface formatting were matched between the implicit and LinTree conditions. Without that control, or at least reporting on it, the performance lift could be produced by the altered input distribution rather than by the explicit structure. Sample sizes, statistical tests, and how backtracking was actually implemented are also missing from the abstract, which makes the central claim harder to evaluate.

The work is aimed at people building LLM agents that do search or planning. A reader who already works on trace-based reasoning will get the most out of it, because the question is concrete and the proposed fix is simple.

It deserves a serious referee. The idea is testable, the domains are controlled, and the potential confound is fixable with more careful reporting. I would send it out rather than desk reject.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that while providing LLMs with raw search history in reasoning traces does not reliably outperform LLM-heuristic-guided best-first search across Blocks World, grid Navigation, and Sokoban, explicitly representing the underlying search tree using parent pointers in the LinTree structure leads to improved task performance and search efficiency.

Significance. If the results hold after appropriate controls, this would indicate that making the tree structure of search histories explicit is a key factor enabling LLMs to better leverage past reasoning steps, providing a lightweight prompting-based approach to improve reasoning without additional training or model changes.

major comments (2)

[Abstract] Abstract: The abstract reports comparative results across three environments but supplies no information on sample sizes, statistical tests, controls for prompt length, or how backtracking was implemented, leaving the central empirical claim under-specified.
[Experimental comparison] Experimental comparison (implicit vs. LinTree conditions): The claim that gains are caused by explicit parent pointers requires that token counts, prompt lengths, and surface formatting are matched across conditions; the manuscript provides no indication that these covariates were controlled, so observed lifts could arise from altered input distributions rather than the structural information itself.

minor comments (1)

[Environments] The environments are described as 'controlled' but no details are given on how environment stochasticity or state representations were standardized across conditions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects of experimental reporting and controls. We address each major comment below and will make the indicated revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The abstract reports comparative results across three environments but supplies no information on sample sizes, statistical tests, controls for prompt length, or how backtracking was implemented, leaving the central empirical claim under-specified.

Authors: We agree that the abstract would be improved by including these details to better support the central claims. In the revision we will expand the abstract to report the number of independent trials per environment, note that statistical significance was assessed with non-parametric tests (e.g., Wilcoxon signed-rank), confirm that prompt lengths were matched across conditions, and briefly describe the backtracking procedure. Because abstracts are length-constrained, the expanded version will still direct readers to the methods section for full implementation details. revision: yes
Referee: [Experimental comparison] Experimental comparison (implicit vs. LinTree conditions): The claim that gains are caused by explicit parent pointers requires that token counts, prompt lengths, and surface formatting are matched across conditions; the manuscript provides no indication that these covariates were controlled, so observed lifts could arise from altered input distributions rather than the structural information itself.

Authors: The referee correctly identifies a methodological gap. The manuscript does not report or explicitly enforce matched token counts and prompt lengths between the implicit-trace and LinTree conditions; the parent-pointer tokens necessarily add a modest number of tokens. We will revise the experimental-setup section to (i) tabulate average prompt lengths and token counts per condition, (ii) describe the formatting standardization applied, and (iii) add an ablation that further equalizes lengths (via padding or truncation) and reports the resulting performance. These additions will allow readers to assess whether the observed gains persist under stricter length controls. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison of prompting variants

full rationale

The paper reports controlled experiments comparing implicit vs. explicit (LinTree) search traces and LLM-heuristic search across three toy domains. No equations, fitted parameters, or derivations are present; performance numbers are measured outcomes, not quantities that reduce to inputs by construction. No self-citation chains or uniqueness theorems are invoked as load-bearing premises. The work is self-contained against external benchmarks (task success rates and efficiency metrics) with no reduction of claims to self-referential definitions or renamings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that LLM-generated traces can be interpreted as search processes and that the three puzzle domains are representative of reasoning tasks; no free parameters or invented physical entities are introduced.

axioms (1)

domain assumption LLM reasoning traces can be viewed as linearized search trees that explore and backtrack
Stated in the first paragraph of the abstract as the starting point for the experiments.

invented entities (1)

LinTree structure no independent evidence
purpose: Explicit representation of the search tree via parent pointers in linearized traces
Introduced by the authors as the key modification being tested.

pith-pipeline@v0.9.1-grok · 5767 in / 1241 out tokens · 26672 ms · 2026-06-28T22:03:05.405206+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 11 canonical work pages · 6 internal anchors

[1]

Graph of thoughts: Solving elaborate problems with large language models

Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, et al. Graph of thoughts: Solving elaborate problems with large language models. InProceedings of the AAAI conference on artificial intelligence, volume 38, pages 17682–17690, 2024

2024
[2]

Toward adaptive reasoning in large language models with thought rollback.arXiv preprint arXiv:2412.19707, 2024

Sijia Chen and Baochun Li. Toward adaptive reasoning in large language models with thought rollback.arXiv preprint arXiv:2412.19707, 2024

work page arXiv 2024
[3]

Boosting of thoughts: Trial-and-error problem solving with large language models.arXiv preprint arXiv:2402.11140, 2024

Sijia Chen, Baochun Li, and Di Niu. Boosting of thoughts: Trial-and-error problem solving with large language models.arXiv preprint arXiv:2402.11140, 2024

work page arXiv 2024
[4]

Xgrammar: Flexible and efficient structured generation engine for large language models

Yixin Dong, Charlie F Ruan, Yaxing Cai, Ziyi Xu, Yilong Zhao, Ruihang Lai, and Tianqi Chen. Xgrammar: Flexible and efficient structured generation engine for large language models. Proceedings of Machine Learning and Systems, 7, 2025

2025
[5]

An investigation of model-free planning: boxoban levels

Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sebastien Racaniere, Theophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap, and Victor Valdes. An investigation of model-free planning: boxoban levels. https://github.com/deepmind/boxoban-levels/, 2018

2018
[6]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Reason- ing with language model is planning with world model

Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, and Zhiting Hu. Reason- ing with language model is planning with world model. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8154–8173, 2023

2023
[8]

A formal basis for the heuristic determination of minimum cost paths.IEEE transactions on Systems Science and Cybernetics, 4(2):100–107, 1968

Peter E Hart, Nils J Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths.IEEE transactions on Systems Science and Cybernetics, 4(2):100–107, 1968

1968
[9]

OpenAI o1 System Card

Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card.arXiv preprint arXiv:2412.16720, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[10]

Sokoban: A challenging single-agent search problem

Andreas Junghanns and Jonathan Schaeffer. Sokoban: A challenging single-agent search problem. InIJCAI Workshop on Using Games as an Experimental Testbed for AI Reasearch, pages 27–36. Morgan Kaufmann Publishers San Francisco, CA, 1997

1997
[11]

Thought of search: Planning with language models through the lens of efficiency.Advances in Neural Information Processing Systems, 37:138491–138568, 2024

Michael Katz, Harsha Kokel, Kavitha Srinivas, and Shirin Sohrabi. Thought of search: Planning with language models through the lens of efficiency.Advances in Neural Information Processing Systems, 37:138491–138568, 2024

2024
[12]

arXiv preprint arXiv:2502.07374 , year=

Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Eric Tang, Sumanth Hegde, Kourosh Hakhamaneshi, Shishir G Patil, Matei Zaharia, et al. Llms can easily learn to reason from demonstrations structure, not content, is what matters!arXiv preprint arXiv:2502.07374, 2025

work page arXiv 2025
[13]

Self-refine: Iterative refinement with self-feedback.Advances in neural information processing systems, 36:46534–46594, 2023

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback.Advances in neural information processing systems, 36:46534–46594, 2023

2023
[14]

Llm-a*: Large language model enhanced incremental heuristic search on path planning

Silin Meng, Yiwei Wang, Cheng-Fu Yang, Nanyun Peng, and Kai-Wei Chang. Llm-a*: Large language model enhanced incremental heuristic search on path planning. InFindings of the Association for Computational Linguistics: EMNLP 2024, pages 1087–1102, 2024

2024
[15]

Learning to search from demonstration sequences

Dixant Mittal, Liwei Kang, and Wee Sun Lee. Learning to search from demonstration sequences. InThe Thirteenth International Conference on Learning Representations, 2025

2025
[16]

Show your work: Scratchpads for intermediate computation with language models

Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, et al. Show your work: Scratchpads for intermediate computation with language models. 2021. 10

2021
[17]

Heuristics: intelligent search strategies for computer problem solving

Judea Pearl. Heuristics: intelligent search strategies for computer problem solving. 1983

1983
[18]

Diversity and dissimilarity coefficients: a unified approach.Theoretical population biology, 21(1):24–43, 1982

C Radhakrishna Rao. Diversity and dissimilarity coefficients: a unified approach.Theoretical population biology, 21(1):24–43, 1982

1982
[19]

arXiv preprint arXiv:2405.06682 , year=

Matthew Renze and Erhan Guven. Self-reflection in llm agents: Effects on problem-solving performance.arXiv preprint arXiv:2405.06682, 2024

work page arXiv 2024
[20]

Algorithm of thoughts: Enhancing exploration of ideas in large language models.arXiv preprint arXiv:2308.10379, 2023

Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Ruoxi Jia, and Ming Jin. Algorithm of thoughts: Enhancing exploration of ideas in large language models.arXiv preprint arXiv:2308.10379, 2023

work page arXiv 2023
[21]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[22]

Reflexion: Language agents with verbal reinforcement learning.Advances in neural information processing systems, 36:8634–8652, 2023

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning.Advances in neural information processing systems, 36:8634–8652, 2023

2023
[23]

Towards clause-learning state space search: Learning to recognize dead-ends

Marcel Steinmetz and Jörg Hoffmann. Towards clause-learning state space search: Learning to recognize dead-ends. InProceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016

2016
[24]

Large language models still can’t plan (a benchmark for llms on planning and reasoning about change)

Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. Large language models still can’t plan (a benchmark for llms on planning and reasoning about change). InNeurIPS 2022 Foundation Models for Decision Making Workshop, 2022

2022
[25]

Don’t get lost in the trees: Streamlining llm reasoning by overcoming tree search exploration pitfalls

Ante Wang, Linfeng Song, Ye Tian, Dian Yu, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, and Dong Yu. Don’t get lost in the trees: Streamlining llm reasoning by overcoming tree search exploration pitfalls. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23946–23959, 2025

2025
[26]

Chain-of-thought prompting elicits reasoning in large language models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022

2022
[27]

Self-evaluation guided beam search for reasoning.Advances in Neural Information Processing Systems, 36:41618–41650, 2023

Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, James Xu Zhao, Min-Yen Kan, Junxian He, and Michael Xie. Self-evaluation guided beam search for reasoning.Advances in Neural Information Processing Systems, 36:41618–41650, 2023

2023
[28]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[29]

Tree of thoughts: Deliberate problem solving with large language models.Ad- vances in neural information processing systems, 36:11809–11822, 2023

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models.Ad- vances in neural information processing systems, 36:11809–11822, 2023

2023
[30]

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. Language agent tree search unifies reasoning acting and planning in language models.arXiv preprint arXiv:2310.04406, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, et al. Least-to-most prompting enables complex reasoning in large language models.arXiv preprint arXiv:2205.10625, 2022. 11 A SFT Trace Examples We show one short instance for each domain and trace formant, drawn from the t...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[1] [1]

Graph of thoughts: Solving elaborate problems with large language models

Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, et al. Graph of thoughts: Solving elaborate problems with large language models. InProceedings of the AAAI conference on artificial intelligence, volume 38, pages 17682–17690, 2024

2024

[2] [2]

Toward adaptive reasoning in large language models with thought rollback.arXiv preprint arXiv:2412.19707, 2024

Sijia Chen and Baochun Li. Toward adaptive reasoning in large language models with thought rollback.arXiv preprint arXiv:2412.19707, 2024

work page arXiv 2024

[3] [3]

Boosting of thoughts: Trial-and-error problem solving with large language models.arXiv preprint arXiv:2402.11140, 2024

Sijia Chen, Baochun Li, and Di Niu. Boosting of thoughts: Trial-and-error problem solving with large language models.arXiv preprint arXiv:2402.11140, 2024

work page arXiv 2024

[4] [4]

Xgrammar: Flexible and efficient structured generation engine for large language models

Yixin Dong, Charlie F Ruan, Yaxing Cai, Ziyi Xu, Yilong Zhao, Ruihang Lai, and Tianqi Chen. Xgrammar: Flexible and efficient structured generation engine for large language models. Proceedings of Machine Learning and Systems, 7, 2025

2025

[5] [5]

An investigation of model-free planning: boxoban levels

Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sebastien Racaniere, Theophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap, and Victor Valdes. An investigation of model-free planning: boxoban levels. https://github.com/deepmind/boxoban-levels/, 2018

2018

[6] [6]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

Reason- ing with language model is planning with world model

Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, and Zhiting Hu. Reason- ing with language model is planning with world model. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8154–8173, 2023

2023

[8] [8]

A formal basis for the heuristic determination of minimum cost paths.IEEE transactions on Systems Science and Cybernetics, 4(2):100–107, 1968

Peter E Hart, Nils J Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths.IEEE transactions on Systems Science and Cybernetics, 4(2):100–107, 1968

1968

[9] [9]

OpenAI o1 System Card

Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card.arXiv preprint arXiv:2412.16720, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[10] [10]

Sokoban: A challenging single-agent search problem

Andreas Junghanns and Jonathan Schaeffer. Sokoban: A challenging single-agent search problem. InIJCAI Workshop on Using Games as an Experimental Testbed for AI Reasearch, pages 27–36. Morgan Kaufmann Publishers San Francisco, CA, 1997

1997

[11] [11]

Thought of search: Planning with language models through the lens of efficiency.Advances in Neural Information Processing Systems, 37:138491–138568, 2024

Michael Katz, Harsha Kokel, Kavitha Srinivas, and Shirin Sohrabi. Thought of search: Planning with language models through the lens of efficiency.Advances in Neural Information Processing Systems, 37:138491–138568, 2024

2024

[12] [12]

arXiv preprint arXiv:2502.07374 , year=

Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Eric Tang, Sumanth Hegde, Kourosh Hakhamaneshi, Shishir G Patil, Matei Zaharia, et al. Llms can easily learn to reason from demonstrations structure, not content, is what matters!arXiv preprint arXiv:2502.07374, 2025

work page arXiv 2025

[13] [13]

Self-refine: Iterative refinement with self-feedback.Advances in neural information processing systems, 36:46534–46594, 2023

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback.Advances in neural information processing systems, 36:46534–46594, 2023

2023

[14] [14]

Llm-a*: Large language model enhanced incremental heuristic search on path planning

Silin Meng, Yiwei Wang, Cheng-Fu Yang, Nanyun Peng, and Kai-Wei Chang. Llm-a*: Large language model enhanced incremental heuristic search on path planning. InFindings of the Association for Computational Linguistics: EMNLP 2024, pages 1087–1102, 2024

2024

[15] [15]

Learning to search from demonstration sequences

Dixant Mittal, Liwei Kang, and Wee Sun Lee. Learning to search from demonstration sequences. InThe Thirteenth International Conference on Learning Representations, 2025

2025

[16] [16]

Show your work: Scratchpads for intermediate computation with language models

Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, et al. Show your work: Scratchpads for intermediate computation with language models. 2021. 10

2021

[17] [17]

Heuristics: intelligent search strategies for computer problem solving

Judea Pearl. Heuristics: intelligent search strategies for computer problem solving. 1983

1983

[18] [18]

Diversity and dissimilarity coefficients: a unified approach.Theoretical population biology, 21(1):24–43, 1982

C Radhakrishna Rao. Diversity and dissimilarity coefficients: a unified approach.Theoretical population biology, 21(1):24–43, 1982

1982

[19] [19]

arXiv preprint arXiv:2405.06682 , year=

Matthew Renze and Erhan Guven. Self-reflection in llm agents: Effects on problem-solving performance.arXiv preprint arXiv:2405.06682, 2024

work page arXiv 2024

[20] [20]

Algorithm of thoughts: Enhancing exploration of ideas in large language models.arXiv preprint arXiv:2308.10379, 2023

Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Ruoxi Jia, and Ming Jin. Algorithm of thoughts: Enhancing exploration of ideas in large language models.arXiv preprint arXiv:2308.10379, 2023

work page arXiv 2023

[21] [21]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[22] [22]

Reflexion: Language agents with verbal reinforcement learning.Advances in neural information processing systems, 36:8634–8652, 2023

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning.Advances in neural information processing systems, 36:8634–8652, 2023

2023

[23] [23]

Towards clause-learning state space search: Learning to recognize dead-ends

Marcel Steinmetz and Jörg Hoffmann. Towards clause-learning state space search: Learning to recognize dead-ends. InProceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016

2016

[24] [24]

Large language models still can’t plan (a benchmark for llms on planning and reasoning about change)

Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. Large language models still can’t plan (a benchmark for llms on planning and reasoning about change). InNeurIPS 2022 Foundation Models for Decision Making Workshop, 2022

2022

[25] [25]

Don’t get lost in the trees: Streamlining llm reasoning by overcoming tree search exploration pitfalls

Ante Wang, Linfeng Song, Ye Tian, Dian Yu, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, and Dong Yu. Don’t get lost in the trees: Streamlining llm reasoning by overcoming tree search exploration pitfalls. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23946–23959, 2025

2025

[26] [26]

Chain-of-thought prompting elicits reasoning in large language models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022

2022

[27] [27]

Self-evaluation guided beam search for reasoning.Advances in Neural Information Processing Systems, 36:41618–41650, 2023

Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, James Xu Zhao, Min-Yen Kan, Junxian He, and Michael Xie. Self-evaluation guided beam search for reasoning.Advances in Neural Information Processing Systems, 36:41618–41650, 2023

2023

[28] [28]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[29] [29]

Tree of thoughts: Deliberate problem solving with large language models.Ad- vances in neural information processing systems, 36:11809–11822, 2023

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models.Ad- vances in neural information processing systems, 36:11809–11822, 2023

2023

[30] [30]

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. Language agent tree search unifies reasoning acting and planning in language models.arXiv preprint arXiv:2310.04406, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[31] [31]

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, et al. Least-to-most prompting enables complex reasoning in large language models.arXiv preprint arXiv:2205.10625, 2022. 11 A SFT Trace Examples We show one short instance for each domain and trace formant, drawn from the t...

work page internal anchor Pith review Pith/arXiv arXiv 2022