Planning in the LLM Era: Building for Reliability and Efficiency

Harsha Kokel; Kavitha Srinivas; Michael Katz; Shirin Sohrabi

arXiv: 2605.21902 · v1 · pith:2TI36DTP · submitted 2026-05-21 · 💻 cs.AI · cs.CL

Pith reviewed 2026-05-22 06:48 UTC · model grok-4.3

classification 💻 cs.AI cs.CL

keywords planning · large language models · LLM planning · symbolic solvers · AI agents · reliability · efficiency · planner generation

The pith

LLMs are shifting from directly generating plans to creating verifiable symbolic solvers for families of problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Early LLM planning work relied on single-shot plan generation or hybrid methods that pair models with limited external search. These techniques prove unsound and incomplete, often consuming substantial resources while failing to improve results on unseen problems. Newer methods instead use LLMs only during solution construction to produce symbolic solvers that handle entire problem families, which can then be verified and executed efficiently without the model. The paper argues this change marks a realignment of the planning field toward agents that are both reliable and resource-efficient, with minimal language-model dependence at runtime. It reviews three categories of such planner-generation methods, notes their current limits, and outlines steps for further progress.

Core claim

The paper claims that the planning field is realigning in the LLM era by moving away from single-shot and hybrid approaches and toward using LLMs at solution construction time to generate symbolic solvers for families of problems. These solvers can be verified and then used efficiently at inference time, supporting agents that are reliable and resource-efficient while maintaining minimal dependence on language models during operation.

What carries the argument

Planner-generation methods that task LLMs with producing verifiable symbolic solvers for problem families, which are then deployed independently of the model.
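The mechanism can be made concrete with a minimal sketch. It assumes a toy Countdown-style domain (combine two numbers with +, - or * until a target value appears; cf. [17]). `successors` and `make_goal_test` are hypothetical stand-ins for domain code an LLM would write once per problem family, loosely in the spirit of Thought of Search [18], while `bfs` is ordinary search code that runs with no model in the loop. None of these names come from the paper.

```python
from collections import deque
from itertools import combinations
from typing import Callable, Iterator, Optional

State = tuple[int, ...]  # multiset of remaining numbers, kept sorted

# --- Hypothetical "LLM-generated" domain code, written once per problem family ---
def successors(state: State) -> Iterator[tuple[str, State]]:
    """Yield (action label, next state) for every way to combine two numbers."""
    for i, j in combinations(range(len(state)), 2):
        a, b = state[i], state[j]
        rest = [x for k, x in enumerate(state) if k not in (i, j)]
        outcomes = {f"{a}+{b}": a + b, f"{a}*{b}": a * b}
        hi, lo = max(a, b), min(a, b)
        outcomes[f"{hi}-{lo}"] = hi - lo  # keep values non-negative
        for label, value in outcomes.items():
            yield label, tuple(sorted(rest + [value]))

def make_goal_test(target: int) -> Callable[[State], bool]:
    """Goal: the target value appears among the remaining numbers."""
    return lambda state: target in state

# --- Generic, model-free search: no LLM calls at inference time ---
def bfs(start: State, is_goal: Callable[[State], bool]) -> Optional[list[str]]:
    """Breadth-first search; returns a shortest plan, or None if none exists."""
    root = tuple(sorted(start))
    frontier = deque([root])
    parent: dict = {root: None}  # state -> (predecessor, action label)
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            plan = []
            while parent[state] is not None:  # walk back to the root
                state, label = parent[state]
                plan.append(label)
            return plan[::-1]
        for label, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, label)
                frontier.append(nxt)
    return None
```

For instance, `bfs((1, 2, 3, 4), make_goal_test(24))` returns a two-action plan (24 first becomes reachable at depth two, e.g. 3*4 and then 2*12), and `bfs((1, 1), make_goal_test(100))` returns `None`. Every action shrinks the multiset, so the state space is finite and the search is exhaustive; the soundness and completeness come from the search rather than from the model, assuming the generated `successors` and goal test are themselves correct.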

If this is right

Agents gain reliability because solutions come from verified symbolic solvers rather than direct model output.
Inference-time resource demands drop since the language model is no longer needed for each planning request.
Generated planners become maintainable with far less ongoing reliance on large language models.
Research can prioritize improvements in solver generation and verification to widen applicability.
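Verification in this setting can be as simple as replaying a plan against a symbolic model. The sketch below assumes a STRIPS-style encoding (facts as strings; each action has precondition, add and delete sets) and a hypothetical two-block example; the `Action` type and `validate` function are illustrative, not taken from the paper. The check does not depend on how the plan was produced, which is what allows a generated solver's output to be checked without consulting an LLM.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset[str]     # facts that must hold before the action
    add: frozenset[str]     # facts the action makes true
    delete: frozenset[str]  # facts the action makes false

def validate(init: set[str], goal: set[str], plan: list[Action]) -> tuple[bool, str]:
    """Replay `plan` from `init`; fail at the first unmet precondition."""
    state = set(init)
    for step, act in enumerate(plan):
        missing = act.pre - state
        if missing:
            return False, f"step {step} ({act.name}): unmet {sorted(missing)}"
        state = (state - act.delete) | act.add  # STRIPS semantics: delete, then add
    if not goal <= state:
        return False, f"goal not reached, missing {sorted(goal - state)}"
    return True, "valid"

# Hypothetical two-block example: put block a on block b.
F = frozenset
PICK_A = Action("pick-up a", F({"clear a", "ontable a", "handempty"}),
                F({"holding a"}), F({"clear a", "ontable a", "handempty"}))
STACK_AB = Action("stack a b", F({"holding a", "clear b"}),
                  F({"on a b", "clear a", "handempty"}), F({"holding a", "clear b"}))
INIT = {"clear a", "clear b", "ontable a", "ontable b", "handempty"}

print(validate(INIT, {"on a b"}, [PICK_A, STACK_AB]))  # (True, 'valid')
print(validate(INIT, {"on a b"}, [STACK_AB]))          # rejected: 'holding a' unmet
```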

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same construction-time generation pattern could be explored for other agent capabilities such as reasoning or tool use.
Empirical tests on concrete application domains would clarify whether the efficiency gains hold in practice.
Integration with established symbolic planning systems might produce hybrid tools that combine verification with learned generation.

Load-bearing premise

The limitations of single-shot and hybrid LLM planning are inherent to those methods, and planner-generation approaches meaningfully overcome them for problems not encountered during generation.

What would settle it

An experiment showing that single-shot or hybrid LLM planning matches or exceeds the reliability and efficiency of generated symbolic solvers on a diverse set of previously unseen planning problems would challenge the central claim.
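The experiment proposed above depends on checking every returned plan without trusting the solver that produced it. A minimal sketch, assuming a toy integer domain invented only for illustration (start at 0; actions `inc` and `dbl`; goal: reach the target): `evaluate` replays each plan with an independent checker over held-out instances and reports the valid-plan rate and wall-clock time. `binary_solver` and `overfit_solver` are hypothetical stand-ins for generated solvers; the second passes all of its development instances yet fails every unseen one, which is the kind of gap that instance-level testing during generation cannot rule out.

```python
import time
from typing import Callable, Optional

Plan = list[str]

def replay(target: int, plan: Plan) -> bool:
    """Independent checker: start at 0, apply each action, test the goal."""
    n = 0
    for action in plan:
        if action == "inc":
            n += 1
        elif action == "dbl":
            n *= 2
        else:
            return False  # unknown action names make the plan invalid
    return n == target

def binary_solver(target: int) -> Optional[Plan]:
    """Hypothetical solver that is correct for every non-negative target."""
    plan: Plan = []
    for bit in bin(target)[2:]:  # most significant bit first
        plan.append("dbl")       # n -> 2n
        if bit == "1":
            plan.append("inc")   # n -> n + 1
    return plan

# Hypothetical solver that only memorized its development instances (0..7).
_DEV_PLANS = {t: binary_solver(t) for t in range(8)}

def overfit_solver(target: int) -> Optional[Plan]:
    return _DEV_PLANS.get(target)

def evaluate(solver: Callable[[int], Optional[Plan]], instances: list[int]) -> dict:
    """Fraction of instances with a replay-validated plan, plus wall-clock time."""
    valid = 0
    start = time.perf_counter()
    for target in instances:
        plan = solver(target)
        if plan is not None and replay(target, plan):
            valid += 1
    return {"valid_rate": valid / len(instances),
            "seconds": time.perf_counter() - start}

held_out = list(range(8, 100))
print(evaluate(binary_solver, held_out)["valid_rate"])   # 1.0
print(evaluate(overfit_solver, held_out)["valid_rate"])  # 0.0
```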

Figures

Figures reproduced from arXiv: 2605.21902 by Harsha Kokel, Kavitha Srinivas, Michael Katz, Shirin Sohrabi.

Original abstract

Growing attention to intelligent agents has put a spotlight on one of their central capabilities: planning. Early attempts to leverage large language models (LLMs) for planning relied on single-shot plan generation, followed by hybrid approaches that coupled LLMs with limited external search. These methods, unsound and incomplete by their very nature, often require substantial resources without yielding better solutions on unseen problems. As the limitations of LLMs become clearer, recent work has shifted toward using them at solution construction time -- generating symbolic solvers for a family of problems that can be verified and then used efficiently at inference time. This trend reflects the growing need for agents that are both reliable and resource-efficient. It also offers a path towards generating maintainable planners with minimal dependence on language models at inference time. In this paper, we argue that this shift reflects a broader realignment of the planning field in the LLM era. We examine three major categories of planner-generation methods, discuss their current limitations, and outline research steps towards a more reliable and efficient LLM-based generation of planners.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note: private letter to a colleague

This is a position paper that clearly frames the move from direct LLM planning to generating verifiable symbolic solvers, but offers no new evidence that the approach actually generalizes.

Colleague, the main point is that the authors see the field shifting from single-shot or hybrid LLM planning toward using models to generate symbolic planners that can be verified once and then run efficiently on new instances. They treat this as a broader realignment that could deliver better reliability and lower inference costs. The paper does a solid job of laying out the problems with earlier methods, such as their lack of soundness and weak results on unseen problems. It then groups recent planner-generation work into three categories, notes the role of verification, and lists some concrete research steps to strengthen the approach. That kind of organized overview can help people keep track of where things are heading.

The soft spot is the lack of support for the key claim. The argument assumes that verification will reliably catch failures when problem structure or constraints shift, yet the text gives no formal argument, new experiment, or specific cited result showing this actually happens. The earlier critique of single-shot methods as inherently limited is reasonable, but the same gap reappears if the generated solver's tests or proofs are incomplete for novel cases. The paper stays at the level of observed trends and future directions rather than demonstrating the improvement.

This is useful for researchers working on LLM agents and planning who want context on current directions and a set of open questions to consider. It will not supply new methods or data for a methods section, but it can help frame discussions or motivate follow-up work. I would send it to peer review. The topic is timely, the categorization is clear, and referees could usefully push on the evidence needed to back the reliability claims.

Referee Report

2 major / 1 minor

Summary. The paper claims that early LLM-based planning relied on single-shot plan generation and hybrid LLM-external search methods, which are unsound and incomplete by nature and often fail to yield better solutions on unseen problems despite high resource costs. It argues that the field is shifting toward using LLMs at solution construction time to generate verifiable symbolic solvers for families of problems; these solvers can then be used efficiently at inference time with minimal LLM dependence. The manuscript examines three major categories of such planner-generation methods, discusses their limitations, and outlines research steps toward more reliable and efficient LLM-based planner generation.

Significance. If the central thesis holds and the generated planners generalize reliably, this realignment could enable more maintainable, verifiable, and resource-efficient planning for intelligent agents, reducing runtime dependence on LLMs. The significance hinges on whether verification in these approaches addresses soundness gaps for unseen problems, an aspect the manuscript does not fully substantiate.

major comments (2)

[Section 3] The discussion of the three categories of planner-generation methods describes verification steps and efficiency gains at inference time but provides no formal argument, proof sketch, or cited empirical result showing that verification catches failures under distribution shift in problem structure or constraints for unseen instances.
[Abstract] The claim that single-shot and hybrid methods are 'unsound and incomplete by their very nature' and 'often require substantial resources without yielding better solutions on unseen problems' is central to motivating the realignment, yet the manuscript offers no specific evidence, data, or detailed analysis to support this characterization or the comparison to newer approaches.

minor comments (1)

The manuscript would benefit from an explicit early definition or taxonomy of the three planner-generation categories to improve accessibility for readers new to the subfield.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below, indicating planned revisions where appropriate to strengthen the presentation of our position.

Referee: [Section 3] The discussion of the three categories of planner-generation methods describes verification steps and efficiency gains at inference time but provides no formal argument, proof sketch, or cited empirical result showing that verification catches failures under distribution shift in problem structure or constraints for unseen instances.

Authors: We agree that Section 3 would benefit from a more explicit treatment of verification under distribution shift. The manuscript is a position paper that surveys emerging planner-generation approaches and their reported verification mechanisms from the literature, rather than presenting new formal proofs or original empirical results. In revision, we will expand the section to discuss the challenges of verifying soundness for unseen problem structures, reference relevant empirical findings from cited works where they exist, and emphasize this as a key open issue aligned with the research steps we already outline. revision: yes
Referee: [Abstract] The claim that single-shot and hybrid methods are 'unsound and incomplete by their very nature' and 'often require substantial resources without yielding better solutions on unseen problems' is central to motivating the realignment, yet the manuscript offers no specific evidence, data, or detailed analysis to support this characterization or the comparison to newer approaches.

Authors: The abstract statements summarize well-documented limitations of early LLM planning methods as established in the broader literature. To provide the requested support, we will revise the abstract for precision and add targeted references plus brief illustrative analysis in the introduction, drawing on specific studies that document unsoundness, incompleteness, and performance degradation on unseen instances for single-shot and hybrid approaches. This will ground the motivation without changing the paper's core argument. revision: yes

Circularity Check

0 steps flagged

No circularity: observational position paper on field trends

The paper presents an observational summary of trends in LLM-based planning, categorizing single-shot, hybrid, and planner-generation methods while discussing limitations and future directions. It contains no equations, derivations, fitted parameters, or self-referential definitions that reduce claims to their own inputs. The central argument relies on external literature and field observations rather than internal reductions or load-bearing self-citations, rendering the analysis self-contained against external benchmarks with no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the paper is a high-level discussion of research trends.

pith-pipeline@v0.9.0 · 5713 in / 988 out tokens · 45312 ms · 2026-05-22T06:48:57.167232+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Passage: "recent work has shifted toward using them at solution construction time -- generating symbolic solvers for a family of problems that can be verified and then used efficiently at inference time"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.
Passage: "We examine three major categories of planner-generation methods"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

[1] Asai, M.; and Fukunaga, A. 2018. Classical Planning in Deep Latent Space: Bridging the Subsymbolic-Symbolic Boundary. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 6094--6101. AAAI Press

[2] Bäckström, C.; and Nebel, B. 1995. Complexity Results for SAS+ Planning. Computational Intelligence, 11(4): 625--655

[3] Besta, M.; Blach, N.; Kubicek, A.; Gerstenberger, R.; Podstawski, M.; Gianinazzi, L.; Gajda, J.; Lehmann, T.; Niewiadomski, H.; Nyczyk, P.; and Hoefler, T. 2024. Graph of Thoughts: Solving Elaborate Problems with Large Language Models. In Dy, J.; and Natarajan, S., eds., Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 20...

[4] Caglar, T.; Belhaj, S.; Chakraborti, T.; Katz, M.; and Sreedharan, S. 2024. Can LLMs Fix Issues with Reasoning Models? Towards More Likely Models for AI Planning. In Dy, J.; and Natarajan, S., eds., Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024), 20061--20069. AAAI Press

[5] Cao, D.; Katz, M.; Kokel, H.; Srinivas, K.; and Sohrabi, S. 2024. Automating Thought of Search: A Journey Towards Soundness and Completeness. In NeurIPS 2024 Workshop on Open-World Agents

[6] Corrêa, A. B.; Giacomo, G. D.; Helmert, M.; and Rubin, S. 2024. Planning with Object Creation. In Bernardini, S.; and Muise, C., eds., Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling (ICAPS 2024), 104--113. AAAI Press

[7] Corrêa, A. B.; Pereira, A. G.; and Seipp, J. 2025a. The 2025 Planning Performance of Frontier Large Language Models. arXiv:2511.09378

[8] Corrêa, A. B.; Pereira, A. G.; and Seipp, J. 2025b. Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code. In Proceedings of the Thirty-Eight Annual Conference on Neural Information Processing Systems (NeurIPS 2025)

[9] Echchahed, A.; and Castro, P. S. 2025. A Survey of State Representation Learning for Deep Reinforcement Learning. Trans. Mach. Learn. Res., 2025

[10] Gestrin, E.; Kuhlmann, M.; and Seipp, J. 2024. NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions. arXiv:2405.04215

[11] Guan, L.; Valmeekam, K.; Sreedharan, S.; and Kambhampati, S. 2023. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023), 79081--79094

[12] Hao, S.; Gu, Y.; Ma, H.; Hong, J.; Wang, Z.; Wang, D.; and Hu, Z. 2023. Reasoning with Language Model is Planning with World Model. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

[13] Hodel, N. 2024. Exploring the use of LLMs in generalized planning. Bachelor's thesis, Saarland University

[14] Huang, S.; Lipovetzky, N.; and Cohn, T. 2025. Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts. In Walsh, T.; Shah, J.; and Kolter, Z., eds., Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI 2025), 26542--26550. AAAI Press

[15] Jiménez, S.; Segovia-Aguas, J.; and Jonsson, A. 2019. A Review of Generalized Planning. The Knowledge Engineering Review, 34: e5

[16] Kambhampati, S.; Valmeekam, K.; Guan, L.; Verma, M.; Stechly, K.; Bhambri, S.; Saldyt, L. P.; and Murthy, A. B. 2024. Position: LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024). JMLR.org

[17] Katz, M.; Kokel, H.; and Sreedharan, S. 2025. Seemingly Simple Planning Problems are Computationally Challenging: The Countdown Game. arXiv:2508.02900 [cs.AI]

[18] Katz, M.; Kokel, H.; Srinivas, K.; and Sohrabi, S. 2024. Thought of Search: Planning with Language Models Through The Lens of Efficiency. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

[19] Konidaris, G.; Kaelbling, L. P.; and Lozano-Perez, T. 2018. From skills to symbols: Learning symbolic representations for abstract high-level planning. Journal of Artificial Intelligence Research, 61: 215--289

[20] Liang, J.; Huang, W.; Xia, F.; Xu, P.; Hausman, K.; Ichter, B.; Florence, P.; and Zeng, A. 2023. Code as Policies: Language Model Programs for Embodied Control. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 9493--9500

[21] Lindsay, A.; Read, J.; Ferreira, J. F.; Hayton, T.; Porteous, J.; and Gregory, P. 2017. Framer: Planning Models from Natural Language Action Descriptions. In Barbulescu, L.; Frank, J.; Mausam; and Smith, S. F., eds., Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling (ICAPS 2017), 434--442. AAAI Press

[22] McDermott, D.; Ghallab, M.; Howe, A.; Knoblock, C.; Ram, A.; Veloso, M.; Weld, D.; and Wilkins, D. 1998. PDDL -- The Planning Domain Definition Language -- Version 1.2. Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control, Yale University

[23] Oswald, J.; Srinivas, K.; Kokel, H.; Lee, J.; Katz, M.; and Sohrabi, S. 2024. Large Language Models as Planning Domain Generators. In Bernardini, S.; and Muise, C., eds., Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling (ICAPS 2024). AAAI Press

[24] Palacios, H.; and Geffner, H. 2009. Compiling Uncertainty Away in Conformant Planning Problems with Bounded Width. Journal of Artificial Intelligence Research, 35: 623--675

[25] Sel, B.; Al-Tawaha, A.; Khattar, V.; Wang, L.; Jia, R.; and Jin, M. 2023. Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models. CoRR, abs/2308.10379

[26] Shinn, N.; Cassano, F.; Gopinath, A.; Narasimhan, K.; and Yao, S. 2023. Reflexion: language agents with verbal reinforcement learning. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)

[27] Silver, T.; Dan, S.; Srinivas, K.; Tenenbaum, J.; Pack Kaelbling, L.; and Katz, M. 2024. Generalized Planning in PDDL Domains with Pretrained Large Language Models. In Dy, J.; and Natarajan, S., eds., Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024). AAAI Press

[28] Silver, T.; Hariprasad, V.; Shuttleworth, R. S.; Kumar, N.; Lozano-Pérez, T.; and Kaelbling, L. P. 2022. PDDL Planning with Pretrained Large Language Models. In NeurIPS 2022 Workshop on Foundation Models for Decision Making

[29] Singh, I.; Blukis, V.; Mousavian, A.; Goyal, A.; Xu, D.; Tremblay, J.; Fox, D.; Thomason, J.; and Garg, A. 2023. ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 11523--11530

[30] Sohrabi, S.; Riabov, A. V.; Katz, M.; and Udrea, O. 2018. An AI Planning Solution to Scenario Generation for Enterprise Risk Management. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 160--167. AAAI Press

[31] Song, L.; Dai, Y.; Prabhu, V.; Zhang, J.; Shi, T.; Li, L.; Li, J.; Savarese, S.; Chen, Z.; Zhao, J.; et al. 2025. Coact-1: Computer-using agents with coding as actions. arXiv preprint arXiv:2508.03923

[32] Stein, K.; Hodel, N.; Fišer, D.; Hoffmann, J.; Katz, M.; and Koller, A. 2025. Improved Generalized Planning with LLMs through Strategy Refinement and Reflection. arXiv:2508.13876

[33] Sun, H.; Zhuang, Y.; Kong, L.; Dai, B.; and Zhang, C. 2023. AdaPlanner: Adaptive Planning from Feedback with Language Models. In Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; and Levine, S., eds., Advances in Neural Information Processing Systems, volume 36, 58202--58245. Curran Associates, Inc

[34] Tantakoun, M.; Muise, C.; and Zhu, X. 2025. LLMs as Planning Formalizers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models. In Che, W.; Nabende, J.; Shutova, E.; and Pilehvar, M. T., eds., Findings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics

[35] Trivedi, H.; Khot, T.; Hartmann, M.; Manku, R.; Dong, V.; Li, E.; Gupta, S.; Sabharwal, A.; and Balasubramanian, N. 2024. AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents. In Ku, L.; Martins, A.; and Srikumar, V., eds., Findings of the Association for Computational Linguistics: ACL 2024, 16022--16076. Associatio...

[36] Tuisov, A.; Vernik, Y.; and Shleyfman, A. 2025. LLM-Generated Heuristics for AI Planning: Do We Even Need Domain-Independence Anymore? arXiv:2501.18784

[37] Vafa, K.; Chen, J. Y.; Rambachan, A.; Kleinberg, J.; and Mullainathan, S. 2024. Evaluating the World Model Implicit in a Generative Model. arXiv:2406.03689

[38] Vallati, M.; Barták, R.; Chrpa, L.; McCluskey, T. L.; and Petrick, R. P. A. 2025. Knowledge Engineering for Planning and Scheduling in the LLM Era. In Harabor, D.; and Ramirez, M., eds., Proceedings of the Thirty-Fifth International Conference on Automated Planning and Scheduling (ICAPS 2025), 391--395. AAAI Press

[39] Valmeekam, K.; Marquez, M.; Olmo, A.; Sreedharan, S.; and Kambhampati, S. 2023a. PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023), 38975--38987

[40] Valmeekam, K.; Marquez, M.; Sreedharan, S.; and Kambhampati, S. 2023b. On the Planning Abilities of Large Language Models - A Critical Investigation. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)

[41] Wang, X.; Chen, Y.; Yuan, L.; Zhang, Y.; Li, Y.; Peng, H.; and Ji, H. 2024. Executable Code Actions Elicit Better LLM Agents. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024). OpenReview.net

[42] Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; and Narasimhan, K. 2023. Tree of thoughts: Deliberate problem solving with large language models. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)

[43] Zhou, S.; Xu, F. F.; Zhu, H.; Zhou, X.; Lo, R.; Sridhar, A.; Cheng, X.; Ou, T.; Bisk, Y.; Fried, D.; Alon, U.; and Neubig, G. 2024. WebArena: A Realistic Web Environment for Building Autonomous Agents. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024). OpenReview.net

[1] [1]

Asai, M.; and Fukunaga, A. 2018. Classical Planning in Deep Latent Space: Bridging the Subsymbolic-Symbolic Boundary. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence ( AAAI 2018) , 6094--6101. AAAI Press

work page 2018

[2] [2]

a ckstr \

B \"a ckstr \"o m, C.; and Nebel, B. 1995. Complexity Results for SAS ^ + Planning. Computational Intelligence, 11(4): 625--655

work page 1995

[3] [3]

Besta, M.; Blach, N.; Kubicek, A.; Gerstenberger, R.; Podstawski, M.; Gianinazzi, L.; Gajda, J.; Lehmann, T.; Niewiadomski, H.; Nyczyk, P.; and Hoefler, T. 2024. Graph of Thoughts: Solving Elaborate Problems with Large Language Models. In Dy, J.; and Natarajan, S., eds., Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence ( AAAI 20...

work page 2024

[4] [4]

Caglar, T.; Belhaj, S.; Chakraborti, T.; Katz, M.; and Sreedharan, S. 2024. Can LLM s Fix Issues with Reasoning Models? Towards More Likely Models for AI Planning. In Dy, J.; and Natarajan, S., eds., Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence ( AAAI 2024) , 20061--20069. AAAI Press

work page 2024

[5] [5]

Cao, D.; Katz, M.; Kokel, H.; Srinivas, K.; and Sohrabi, S. 2024. Automating T hought of S earch: A Journey Towards Soundness and Completeness. In NeurIPS 2024 Workshop on Open-World Agents

work page 2024

[6] [6]

B.; Giacomo, G

Corr \^e a, A. B.; Giacomo, G. D.; Helmert, M.; and Rubin, S. 2024. Planning with Object Creation. In Bernardini, S.; and Muise, C., eds., Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling (ICAPS 2024), 104--113. AAAI Press

work page 2024

[7] [7]

Frontier Large Language Models Rival State-of-the-Art Planners

Corr \^e a, A. B.; Pereira, A. G.; and Seipp, J. 2025 a . The 2025 Planning Performance of Frontier Large Language Models. arXiv:2511.09378

work page internal anchor Pith review Pith/arXiv arXiv 2025

[8] [8]

B.; Pereira, A

Corr \^e a, A. B.; Pereira, A. G.; and Seipp, J. 2025 b . Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code. In Proceedings of the Thirty-Eight Annual Conference on Neural Information Processing Systems ( NeurIPS 2025)

work page 2025

[9] [9]

Echchahed, A.; and Castro, P. S. 2025. A Survey of State Representation Learning for Deep Reinforcement Learning. Trans. Mach. Learn. Res., 2025

work page 2025

[10] [10]

Gestrin, E.; Kuhlmann, M.; and Seipp, J. 2024. NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions. arXiv:2405.04215

work page arXiv 2024

[11] [11]

Guan, L.; Valmeekam, K.; Sreedharan, S.; and Kambhampati, S. 2023. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems ( NeurIPS 2023) , 79081--79094

work page 2023

[12] [12]

Hao, S.; Gu, Y.; Ma, H.; Hong, J.; Wang, Z.; Wang, D.; and Hu, Z. 2023. Reasoning with Language Model is Planning with World Model. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing ( EMNLP 2023)

work page 2023

[13] [13]

Hodel, N. 2024. Exploring the use of LLMs in generalized planning. Bachelor's thesis, Saarland University

work page 2024

[14] [14]

Huang, S.; Lipovetzky, N.; and Cohn, T. 2025. Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts. In Walsh, T.; Shah, J.; and Kolter, Z., eds., Proceedings of the Thirty-Nineth AAAI Conference on Artificial Intelligence ( AAAI 2025) , 26542--26550. AAAI Press

work page 2025

[15] [15]

Jim \'e nez, S.; Segovia-Aguas, J.; and Jonsson, A. 2019. A Review of Generalized Planning. The Knowledge Engineering Review, 34: e5

work page 2019

[16] [16]

P.; and Murthy, A

Kambhampati, S.; Valmeekam, K.; Guan, L.; Verma, M.; Stechly, K.; Bhambri, S.; Saldyt, L. P.; and Murthy, A. B. 2024. Position: LLM s Can t Plan, But Can Help Planning in LLM -Modulo Frameworks. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024) . JMLR .org

work page 2024

[17] [17]

Katz, M.; Kokel, H.; and Sreedharan, S. 2025. Seemingly Simple Planning Problems are Computationally Challenging: The Countdown Game. arXiv:2508.02900 [cs.AI]

work page internal anchor Pith review Pith/arXiv arXiv 2025

[18] [18]

Katz, M.; Kokel, H.; Srinivas, K.; and Sohrabi, S. 2024. Thought of Search: Planning with Language Models Through The Lens of Efficiency. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems ( NeurIPS 2024)

work page 2024

[19] [19]

P.; and Lozano-Perez, T

Konidaris, G.; Kaelbling, L. P.; and Lozano-Perez, T. 2018. From skills to symbols: Learning symbolic representations for abstract high-level planning. Journal of Artificial Intelligence Research, 61: 215--289

work page 2018

[20] [20]

Liang, J.; Huang, W.; Xia, F.; Xu, P.; Hausman, K.; Ichter, B.; Florence, P.; and Zeng, A. 2023. Code as Policies: Language Model Programs for Embodied Control. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 9493--9500

work page 2023

[21] [21]

F.; Hayton, T.; Porteous, J.; and Gregory, P

Lindsay, A.; Read, J.; Ferreira, J. F.; Hayton, T.; Porteous, J.; and Gregory, P. 2017. Framer: Planning Models from Natural Language Action Descriptions. In Barbulescu, L.; Frank, J.; Mausam; and Smith, S. F., eds., Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling (ICAPS 2017), 434--442. AAAI Press

work page 2017

[22] [22]

McDermott, D.; Ghallab, M.; Howe, A.; Knoblock, C.; Ram, A.; Veloso, M.; Weld, D.; and Wilkins, D. 1998. PDDL -- The Planning Domain Definition Language -- Version 1.2. Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control, Yale University

[23]

Oswald, J.; Srinivas, K.; Kokel, H.; Lee, J.; Katz, M.; and Sohrabi, S. 2024. Large Language Models as Planning Domain Generators. In Bernardini, S.; and Muise, C., eds., Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling (ICAPS 2024). AAAI Press

[24]

Palacios, H.; and Geffner, H. 2009. Compiling Uncertainty Away in Conformant Planning Problems with Bounded Width. Journal of Artificial Intelligence Research, 35: 623--675

[25]

Sel, B.; Al-Tawaha, A.; Khattar, V.; Wang, L.; Jia, R.; and Jin, M. 2023. Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models. CoRR, abs/2308.10379

[26]

Shinn, N.; Cassano, F.; Gopinath, A.; Narasimhan, K.; and Yao, S. 2023. Reflexion: language agents with verbal reinforcement learning. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)

[27]

Silver, T.; Dan, S.; Srinivas, K.; Tenenbaum, J.; Pack Kaelbling, L.; and Katz, M. 2024. Generalized Planning in PDDL Domains with Pretrained Large Language Models. In Dy, J.; and Natarajan, S., eds., Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024). AAAI Press

[28]

Silver, T.; Hariprasad, V.; Shuttleworth, R. S.; Kumar, N.; Lozano-Pérez, T.; and Kaelbling, L. P. 2022. PDDL Planning with Pretrained Large Language Models. In NeurIPS 2022 Workshop on Foundation Models for Decision Making

[29]

Singh, I.; Blukis, V.; Mousavian, A.; Goyal, A.; Xu, D.; Tremblay, J.; Fox, D.; Thomason, J.; and Garg, A. 2023. ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. In 2023 IEEE International Conference on Robotics and Automation (ICRA), 11523--11530

[30]

Sohrabi, S.; Riabov, A. V.; Katz, M.; and Udrea, O. 2018. An AI Planning Solution to Scenario Generation for Enterprise Risk Management. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 160--167. AAAI Press

[31]

Song, L.; Dai, Y.; Prabhu, V.; Zhang, J.; Shi, T.; Li, L.; Li, J.; Savarese, S.; Chen, Z.; Zhao, J.; et al. 2025. Coact-1: Computer-using agents with coding as actions. arXiv preprint arXiv:2508.03923

[32]

Stein, K.; Hodel, N.; Fišer, D.; Hoffmann, J.; Katz, M.; and Koller, A. 2025. Improved Generalized Planning with LLMs through Strategy Refinement and Reflection. arXiv:2508.13876

[33]

Sun, H.; Zhuang, Y.; Kong, L.; Dai, B.; and Zhang, C. 2023. AdaPlanner: Adaptive Planning from Feedback with Language Models. In Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; and Levine, S., eds., Advances in Neural Information Processing Systems, volume 36, 58202--58245. Curran Associates, Inc

[34]

Tantakoun, M.; Muise, C.; and Zhu, X. 2025. LLMs as Planning Formalizers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models. In Che, W.; Nabende, J.; Shutova, E.; and Pilehvar, M. T., eds., Findings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics

[35]

Trivedi, H.; Khot, T.; Hartmann, M.; Manku, R.; Dong, V.; Li, E.; Gupta, S.; Sabharwal, A.; and Balasubramanian, N. 2024. AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents. In Ku, L.; Martins, A.; and Srikumar, V., eds., Findings of the Association for Computational Linguistics: ACL 2024, 16022--16076. Associatio...

[36]

Tuisov, A.; Vernik, Y.; and Shleyfman, A. 2025. LLM-Generated Heuristics for AI Planning: Do We Even Need Domain-Independence Anymore? arXiv:2501.18784

[37]

Vafa, K.; Chen, J. Y.; Rambachan, A.; Kleinberg, J.; and Mullainathan, S. 2024. Evaluating the World Model Implicit in a Generative Model. arXiv:2406.03689

[38]

Vallati, M.; Barták, R.; Chrpa, L.; McCluskey, T. L.; and Petrick, R. P. A. 2025. Knowledge Engineering for Planning and Scheduling in the LLM Era. In Harabor, D.; and Ramirez, M., eds., Proceedings of the Thirty-Fifth International Conference on Automated Planning and Scheduling (ICAPS 2025), 391--395. AAAI Press

[39]

Valmeekam, K.; Marquez, M.; Olmo, A.; Sreedharan, S.; and Kambhampati, S. 2023a. PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023), 38975--38987

[40]

Valmeekam, K.; Marquez, M.; Sreedharan, S.; and Kambhampati, S. 2023b. On the Planning Abilities of Large Language Models - A Critical Investigation. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)

[41]

Wang, X.; Chen, Y.; Yuan, L.; Zhang, Y.; Li, Y.; Peng, H.; and Ji, H. 2024. Executable Code Actions Elicit Better LLM Agents. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024) . OpenReview.net

[42]

Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; and Narasimhan, K. 2023. Tree of thoughts: Deliberate problem solving with large language models. In Proceedings of the Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023)

[43]

Zhou, S.; Xu, F. F.; Zhu, H.; Zhou, X.; Lo, R.; Sridhar, A.; Cheng, X.; Ou, T.; Bisk, Y.; Fried, D.; Alon, U.; and Neubig, G. 2024. WebArena: A Realistic Web Environment for Building Autonomous Agents. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024). OpenReview.net