LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Pith reviewed 2026-05-14 18:30 UTC · model grok-4.3
The pith
LLM+P lets language models generate optimal plans by routing problems through classical planners via PDDL translation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM+P takes a natural language description of a planning problem, translates it into a syntactically and semantically correct PDDL file, invokes a classical planner to compute a correct or optimal plan, and translates that plan back into natural language; on the introduced benchmark problems this yields optimal solutions for most instances while standalone LLMs produce no feasible plan for most instances.
What carries the argument
The bidirectional LLM-to-PDDL-to-LLM translation pipeline that lets an LLM describe the problem and interpret the solution while delegating the actual search to a classical planner.
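That pipeline can be sketched as plain function composition. This is not the authors' implementation. `translate_to_pddl`, `run_planner`, and `translate_plan` are hypothetical callables that stand in for the LLM prompt, the classical-planner call, and the back-translation prompt. For illustration, the sketch assumes the domain file is fixed and only the problem file is generated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PipelineResult:
    problem_pddl: str
    plan: Optional[List[str]]  # None when the planner returns no plan
    answer: str


def llm_plus_p(
    nl_problem: str,
    translate_to_pddl: Callable[[str], str],            # hypothetical LLM call: NL -> PDDL problem
    run_planner: Callable[[str], Optional[List[str]]],  # hypothetical planner wrapper
    translate_plan: Callable[[List[str]], str],         # hypothetical LLM call: plan -> NL
) -> PipelineResult:
    """Describe with the LLM, search with the planner, explain with the LLM."""
    problem_pddl = translate_to_pddl(nl_problem)
    plan = run_planner(problem_pddl)
    if plan is None:
        # Malformed or unsolvable PDDL surfaces here rather than as an improvised plan.
        return PipelineResult(problem_pddl, None, "No plan found.")
    return PipelineResult(problem_pddl, plan, translate_plan(plan))


# Usage with stub callables in place of real LLM and planner calls.
result = llm_plus_p(
    "Put block a on block b.",
    translate_to_pddl=lambda text: "(define (problem stack-ab) ...)",
    run_planner=lambda pddl: ["(pick-up a)", "(stack a b)"],
    translate_plan=lambda plan: f"{len(plan)} steps: " + ", ".join(plan),
)
print(result.answer)  # 2 steps: (pick-up a), (stack a b)
```

The explicit `None` branch is deliberate. When translation fails, the failure is reported, and the LLM is never allowed to fill the gap with a plan of its own.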
If this is right
- Classical planners become usable inside LLM pipelines without requiring users to write PDDL themselves.
- LLMs can be restricted to the easier sub-tasks of problem description and solution interpretation, while the search step (given an optimal planner) guarantees optimality with respect to the generated PDDL.
- The same translation pattern can be applied to other structured reasoning domains that already have efficient solvers.
- Benchmark results indicate that feasibility and optimality rates rise sharply once the planner is inserted between the two LLM calls.
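A toy planner illustrates how search, rather than the LLM, supplies optimality. The sketch below is a hypothetical stand-in and not the planner used in the paper. It runs blind breadth-first search over grounded STRIPS actions, which returns a shortest plan when every action costs one. The guarantee is only relative to the encoding the planner receives, so a mistranslated problem yields an optimal plan for the wrong problem. Production planners obtain the same kind of guarantee at scale through heuristic search, such as A* with an admissible heuristic.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Grounded STRIPS action: (preconditions, add effects, delete effects).
Action = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]
State = FrozenSet[str]


def act(pre: Iterable[str], add: Iterable[str], delete: Iterable[str]) -> Action:
    return frozenset(pre), frozenset(add), frozenset(delete)


def bfs_plan(init: State, goal: State, actions: Dict[str, Action]) -> Optional[List[str]]:
    """Blind breadth-first search: with unit action costs the first plan found is shortest."""
    frontier = deque([init])
    parent: Dict[State, Optional[Tuple[State, str]]] = {init: None}
    while frontier:
        state = frontier.popleft()
        if goal <= state:
            plan: List[str] = []
            step = parent[state]
            while step is not None:              # walk back to the initial state
                prev, name = step
                plan.append(name)
                step = parent[prev]
            return plan[::-1]
        for name, (pre, add, delete) in actions.items():
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in parent:            # first visit is via a shortest path
                    parent[nxt] = (state, name)
                    frontier.append(nxt)
    return None                                  # goal unreachable from init


# Hypothetical two-block domain (no unstack action, to keep it small).
actions = {
    "pick-up-a": act({"clear a", "ontable a", "handempty"}, {"holding a"},
                     {"clear a", "ontable a", "handempty"}),
    "pick-up-b": act({"clear b", "ontable b", "handempty"}, {"holding b"},
                     {"clear b", "ontable b", "handempty"}),
    "put-down-a": act({"holding a"}, {"clear a", "ontable a", "handempty"}, {"holding a"}),
    "put-down-b": act({"holding b"}, {"clear b", "ontable b", "handempty"}, {"holding b"}),
    "stack-a-b": act({"holding a", "clear b"}, {"on a b", "clear a", "handempty"},
                     {"holding a", "clear b"}),
    "stack-b-a": act({"holding b", "clear a"}, {"on b a", "clear b", "handempty"},
                     {"holding b", "clear a"}),
}
init = frozenset({"clear a", "clear b", "ontable a", "ontable b", "handempty"})
print(bfs_plan(init, frozenset({"on a b"}), actions))  # ['pick-up-a', 'stack-a-b']
```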
Where Pith is reading between the lines
- The method could be tested on problems whose optimal solutions are already known from independent solvers to measure exact translation error rates.
- If PDDL translation remains the bottleneck, fine-tuning the LLM specifically on paired natural-language and PDDL examples might further improve reliability.
- The framework naturally extends to any domain that possesses both a natural-language interface and an existing classical solver, such as scheduling or verification tasks.
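The test in the first point above could be run by binning each instance according to how its plan length compares with an independently known optimum. This is a minimal sketch that assumes unit action costs. The category names are illustrative. A plan shorter than the true optimum cannot be valid for the intended problem, so it flags a mistranslation even though the planner succeeded. Matching the optimal length is necessary but not sufficient evidence of a faithful encoding.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def compare_to_known_optima(records: Iterable[Tuple[Optional[int], int]]) -> Counter:
    """Bin (plan_length or None, known_optimal_length) pairs, assuming unit action costs."""
    counts: Counter = Counter()
    for length, optimum in records:
        if length is None:
            counts["no_plan"] += 1
        elif length < optimum:
            counts["too_short"] += 1       # impossible for a faithful encoding
        elif length == optimum:
            counts["optimal_length"] += 1  # necessary, not sufficient, for correctness
        else:
            counts["too_long"] += 1        # suboptimal search or over-constrained encoding
    return counts


print(compare_to_known_optima([(None, 4), (3, 4), (4, 4), (6, 4), (4, 4)]))
# Counter({'optimal_length': 2, 'no_plan': 1, 'too_short': 1, 'too_long': 1})
```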
Load-bearing premise
Large language models can produce PDDL encodings that are accurate enough for the classical planner to return valid and optimal plans rather than invalid or empty ones.
What would settle it
A collection of natural-language planning problems on which the LLM repeatedly emits PDDL that is either syntactically malformed or semantically inconsistent with the original description, causing the planner to return no solution or an incorrect one.
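One failure mode described above is a plan that is valid for the generated PDDL but not for the intended problem. Replaying the plan against a trusted reference encoding detects it. This is a minimal sketch that assumes a hand-written reference model in grounded STRIPS form; the model and the mistranslated goal are hypothetical. Plan validators such as VAL perform the equivalent check on full PDDL.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Grounded STRIPS action: (preconditions, add effects, delete effects).
Action = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def validate_plan(init: Iterable[str], goal: Iterable[str],
                  actions: Dict[str, Action], plan: List[str]) -> Tuple[bool, str]:
    """Simulate `plan` from `init` under the reference model and check the goal."""
    state = frozenset(init)
    for i, name in enumerate(plan):
        if name not in actions:
            return False, f"step {i}: unknown action {name!r}"
        pre, add, delete = actions[name]
        missing = pre - state
        if missing:
            return False, f"step {i}: {name} has unmet preconditions {sorted(missing)}"
        state = (state - delete) | add
    unmet = frozenset(goal) - state
    if unmet:
        return False, f"goal not reached; missing {sorted(unmet)}"
    return True, "valid"


# Hypothetical reference model: stack block a on block b.
actions = {
    "pick-up-a": (frozenset({"clear a", "ontable a", "handempty"}),
                  frozenset({"holding a"}),
                  frozenset({"clear a", "ontable a", "handempty"})),
    "stack-a-b": (frozenset({"holding a", "clear b"}),
                  frozenset({"on a b", "clear a", "handempty"}),
                  frozenset({"holding a", "clear b"})),
}
init = {"clear a", "clear b", "ontable a", "ontable b", "handempty"}
reference_goal = {"on a b"}

# Suppose a mistranslated goal, (holding a), led the planner to return this plan.
print(validate_plan(init, reference_goal, actions, ["pick-up-a"]))
# (False, "goal not reached; missing ['on a b']")
print(validate_plan(init, reference_goal, actions, ["pick-up-a", "stack-a-b"]))
# (True, 'valid')
```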
Original abstract
Large language models (LLMs) have demonstrated remarkable zero-shot generalization abilities: state-of-the-art chatbots can provide plausible answers to many common questions that arise in daily life. However, so far, LLMs cannot reliably solve long-horizon planning problems. By contrast, classical planners, once a problem is given in a formatted way, can use efficient search algorithms to quickly identify correct, or even optimal, plans. In an effort to get the best of both worlds, this paper introduces LLM+P, the first framework that incorporates the strengths of classical planners into LLMs. LLM+P takes in a natural language description of a planning problem, then returns a correct (or optimal) plan for solving that problem in natural language. LLM+P does so by first converting the language description into a file written in the planning domain definition language (PDDL), then leveraging classical planners to quickly find a solution, and then translating the found solution back into natural language. Along with LLM+P, we define a diverse set of different benchmark problems taken from common planning scenarios. Via a comprehensive set of experiments on these benchmark problems, we find that LLM+P is able to provide optimal solutions for most problems, while LLMs fail to provide even feasible plans for most problems. (The code and results are publicly available at https://github.com/Cranial-XIX/llm-pddl.git.)
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LLM+P, a hybrid framework in which an LLM first translates a natural-language planning problem into PDDL, a classical planner then computes an optimal (or correct) solution, and the LLM finally renders the plan back into natural language. The authors define a collection of benchmark problems drawn from common planning domains, and report that LLM+P produces optimal solutions on most instances while pure LLMs fail to produce even feasible plans on most instances. Code and results are released publicly.
Significance. If the central empirical claim holds, the work supplies a concrete, reproducible demonstration that classical planners can be grafted onto LLMs to obtain optimality guarantees that current LLMs lack. The public release of code strengthens the result. The significance is limited by the absence of direct evidence that the LLM-generated PDDL faithfully encodes the original natural-language specification; downstream planner success alone does not certify semantic fidelity.
major comments (2)
- [§4] §4 (Experiments) and the associated tables: success is measured solely by whether the classical planner returns a plan; no separate human or automated audit of the generated PDDL files is reported. Consequently the headline claim that LLM+P solves the intended problems optimally rests on an unverified assumption that the LLM translation step preserves preconditions, fluents, and goal conditions exactly.
- [§3.1] §3.1 (PDDL Generation): the prompt templates and few-shot examples used to elicit PDDL are not accompanied by any quantitative measure of syntactic or semantic error rates. Because classical planners will solve any well-formed PDDL they receive, end-to-end success rates do not isolate whether the LLM step is reliable or merely lucky on the chosen benchmarks.
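A syntactic error rate of the kind requested can be approximated even without a full PDDL toolchain. The sketch below is hypothetical and deliberately shallow. It checks parenthesis balance and the presence of a `(define (problem ...))` form containing `:domain`, `:init`, and `:goal` sections. It does not check predicate names, arities, or types, all of which a real PDDL parser would also reject.

```python
import re
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def parse_sexpr(text: str) -> SExpr:
    """Parse exactly one s-expression; raise ValueError on unbalanced parentheses."""
    text = re.sub(r";[^\n]*", "", text)               # drop PDDL line comments
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack: List[List[SExpr]] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            closed = stack.pop()
            stack[-1].append(closed)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed '('")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one top-level expression")
    return stack[0][0]


def is_wellformed_problem(text: str) -> bool:
    """Shallow check for (define (problem NAME) ... (:domain ...) (:init ...) (:goal ...))."""
    try:
        expr = parse_sexpr(text)
    except ValueError:
        return False
    if not (isinstance(expr, list) and len(expr) >= 2
            and isinstance(expr[0], str) and expr[0].lower() == "define"):
        return False
    header = expr[1]
    if not (isinstance(header, list) and len(header) == 2
            and isinstance(header[0], str) and header[0].lower() == "problem"):
        return False
    sections = {item[0].lower() for item in expr[2:]
                if isinstance(item, list) and item and isinstance(item[0], str)}
    return {":domain", ":init", ":goal"} <= sections


def validity_rate(files: List[str]) -> float:
    """Fraction of generated problem files that pass the shallow check."""
    return sum(map(is_wellformed_problem, files)) / len(files) if files else 0.0


good = "(define (problem p1) (:domain blocks) (:objects a b) (:init (clear a)) (:goal (on a b)))"
bad = "(define (problem p1) (:domain blocks) (:goal (on a b)"   # truncated output
print(validity_rate([good, bad]))  # 0.5
```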
minor comments (2)
- [§4.1] The abstract states that benchmarks are 'taken from common planning scenarios' but §4.1 provides only high-level descriptions; an explicit list of domains, instance counts, and difficulty parameters would improve reproducibility.
- [Figure 2] Figure 2 (or equivalent) comparing LLM-only versus LLM+P trajectories would benefit from error bars or per-domain breakdowns rather than aggregate percentages.
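Per-domain breakdowns with error bars are cheap to produce from per-instance outcomes. This is a minimal sketch that assumes each record is a (domain, success) pair. The Wilson score interval is one reasonable choice for small per-domain sample sizes; neither the report nor the paper prescribes it.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def per_domain_summary(
    records: Iterable[Tuple[str, bool]]
) -> Dict[str, Tuple[int, int, float, float, float]]:
    """Map domain -> (successes, total, rate, ci_low, ci_high)."""
    counts = defaultdict(lambda: [0, 0])  # domain -> [successes, total]
    for domain, success in records:
        counts[domain][0] += int(success)
        counts[domain][1] += 1
    summary = {}
    for domain, (k, n) in sorted(counts.items()):
        lo, hi = wilson_interval(k, n)
        summary[domain] = (k, n, k / n, lo, hi)
    return summary


records = [("blocksworld", True), ("blocksworld", True), ("blocksworld", False),
           ("logistics", True)]
for domain, (k, n, rate, lo, hi) in per_domain_summary(records).items():
    print(f"{domain}: {k}/{n} = {rate:.2f}  95% CI [{lo:.2f}, {hi:.2f}]")
```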
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of our evaluation methodology. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
- Referee: §4 (Experiments) and the associated tables: success is measured solely by whether the classical planner returns a plan; no separate human or automated audit of the generated PDDL files is reported. Consequently the headline claim that LLM+P solves the intended problems optimally rests on an unverified assumption that the LLM translation step preserves preconditions, fluents, and goal conditions exactly.
Authors: We acknowledge that our evaluation does not include an independent audit of the generated PDDL against the natural-language specifications. While the public code release permits inspection of the PDDL outputs, we agree that this leaves the semantic fidelity of the translation step unverified in the paper. In the revised manuscript we will add a new subsection under Experiments that reports a human audit of a random sample of generated PDDL files (at least 20% of instances per domain), checking that all preconditions, effects, and goal conditions match the original problem statement. We will also report the fraction of cases where the generated PDDL is syntactically invalid. revision: yes
- Referee: §3.1 (PDDL Generation): the prompt templates and few-shot examples used to elicit PDDL are not accompanied by any quantitative measure of syntactic or semantic error rates. Because classical planners will solve any well-formed PDDL they receive, end-to-end success rates do not isolate whether the LLM step is reliable or merely lucky on the chosen benchmarks.
Authors: We agree that isolating the reliability of the PDDL-generation step is valuable. In the revision we will augment §3.1 with quantitative error analysis: (1) syntactic validity rate measured by attempting to parse every generated PDDL file with a standard PDDL parser, and (2) semantic fidelity measured on the subset of domains for which we possess ground-truth PDDL (Blocksworld, Logistics, etc.) by comparing generated predicates, actions, and goals against the reference encodings. These metrics will be reported alongside the existing end-to-end success rates. revision: yes
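The two audit steps promised in these responses can be sketched together. Both functions are hypothetical helpers written for illustration. `stratified_audit_sample` draws a reproducible sample of at least 20% of instance ids per domain, rounding up with integer arithmetic. `goal_fidelity` compares generated goal atoms with a reference encoding. The comparison assumes flat conjunctive goals and identical object names. Renamed objects, quantified goals, and differences in action schemas would need a more careful matcher.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def stratified_audit_sample(instances: Iterable[Tuple[str, str]], min_percent: int = 20,
                            seed: int = 0) -> Dict[str, List[str]]:
    """Reproducibly pick at least `min_percent` percent of instance ids from each domain."""
    by_domain: Dict[str, List[str]] = defaultdict(list)
    for domain, instance_id in instances:
        by_domain[domain].append(instance_id)
    rng = random.Random(seed)
    sample = {}
    for domain in sorted(by_domain):
        ids = sorted(by_domain[domain])
        k = (len(ids) * min_percent + 99) // 100  # integer ceiling, no float rounding
        sample[domain] = sorted(rng.sample(ids, k))
    return sample


def normalize_atom(atom: str) -> Tuple[str, ...]:
    """'(On  A B)' -> ('on', 'a', 'b'); PDDL names are case-insensitive."""
    return tuple(atom.strip().strip("()").lower().split())


def goal_fidelity(generated: Iterable[str], reference: Iterable[str]) -> dict:
    """Compare flat conjunctive goal atoms from generated and reference PDDL."""
    gen = {normalize_atom(a) for a in generated}
    ref = {normalize_atom(a) for a in reference}
    overlap = gen & ref
    return {
        "precision": len(overlap) / len(gen) if gen else 1.0,
        "recall": len(overlap) / len(ref) if ref else 1.0,
        "exact_match": gen == ref,
        "missing": sorted(ref - gen),    # goal atoms the translation dropped
        "spurious": sorted(gen - ref),   # goal atoms the translation invented
    }


instances = ([("blocksworld", f"bw-{i:02d}") for i in range(10)]
             + [("logistics", f"lg-{i}") for i in range(3)])
print({d: len(ids) for d, ids in stratified_audit_sample(instances).items()})
# {'blocksworld': 2, 'logistics': 1}
print(goal_fidelity(["(on a b)", "(On B C)"], ["(on a b)", "(on b c)", "(on c d)"])["missing"])
# [('on', 'c', 'd')]
```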
Circularity Check
No significant circularity; empirical hybrid framework
The paper presents LLM+P as an empirical system: LLM translates NL problem descriptions to PDDL, a classical planner computes a solution (optimal or feasible), and the plan is translated back to NL. Central claims rest on experiments over author-defined benchmarks where end-to-end success rates are measured against ground-truth solvability. No equations, parameter fits, or derivations appear; no self-citation chain supports a uniqueness theorem or ansatz; success is externally validated by planner output rather than by construction from the LLM step itself. This matches the default non-circular case for a systems paper.
Axiom & Free-Parameter Ledger
axioms (2)
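- Domain assumption: LLMs can accurately convert natural-language planning problems into correct PDDL.
- Standard math: Optimal classical planners (for example, A* search with an admissible heuristic) return optimal plans for valid, solvable PDDL input; satisficing planners guarantee only that a returned plan is valid.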
Forward citations
Cited by 23 Pith papers
- Self-Improvement for Fast, High-Quality Plan Generation
  Self-improvement of a decoder-only transformer yields plans averaging 30% shorter than a source symbolic planner, over 80% optimal where known, with sub-exponential latency scaling.
- LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning
  LLM+ASP framework enables task-agnostic nonmonotonic reasoning by having LLMs generate and self-correct ASP programs using solver feedback, outperforming SMT alternatives on diverse benchmarks.
- LLM-Flax: Generalizable Robotic Task Planning via Neuro-Symbolic Approaches with Large Language Models
  LLM-Flax automates neuro-symbolic robotic task planning with three LLM stages for rule generation, failure recovery, and zero-shot scoring, outperforming manual baselines on MazeNamo grids.
- ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation
  ANCHOR raises mobile manipulation success from 53.3% to 71.7% in unseen homes by binding plans to observable geometry, ensuring operable navigation endpoints, and using layered local recovery instead of global replans.
- Using large language models for embodied planning introduces systematic safety risks
  LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.
- Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS
  Self-Correcting RAG formalizes retrieval as MMKP to maximize information density under token limits and uses NLI-guided MCTS to validate faithfulness, raising accuracy and cutting hallucinations on six multi-hop QA an...
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
  LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
  VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
- CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations
  CSR with ASR enables infinite-horizon real-time LLM policies via stable KV-cache properties and background eviction, delivering 26x lower latency and SOTA recall on embodied benchmarks.
- Decoupled Travel Planning with Behavior Forest
  Behavior Forest decouples multi-constraint travel planning into parallel behavior trees with LLM nodes and global coordination, yielding 6.67% and 11.82% gains over prior methods on two benchmarks.
- Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs
  COMPASS formalizes prompt engineering as a POMDP-based cognitive decision process for self-adaptive generation of task plan explanations via LLMs.
- SYMBOLIZER: Symbolic Model-free Task Planning with VLMs
  SYMBOLIZER grounds symbolic states from images via VLMs using only lifted predicates and solves long-horizon tasks with goal-count and width-based heuristic search, outperforming direct VLM planning and matching VLM-h...
- Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents
  A learned embedding-based router selecting among six reasoning paradigms improves LLM agent accuracy from 47.6% to 53.1% on average, beating the best fixed paradigm by 2.8pp.
- A Survey on Large Language Model based Autonomous Agents
  A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future di...
- Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
  The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.
- Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning
  Novelty estimation via LLM prompts enables pruning in Tree-of-Thought search, reducing overall token usage on language planning benchmarks.
- Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents
  ValuePlanner is a hierarchical architecture that uses LLMs to generate value-based subgoals and PDDL planners to produce executable actions, enabling self-directed behavior in embodied agents.
- From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents
  AdaPlan-H enables LLM agents to generate self-adaptive hierarchical plans that adjust detail level to task difficulty, improving success rates in multi-step tasks.
- AssemPlanner: A Multi-Agent Based Task Planning Framework for Flexible Assembly System
  AssemPlanner is a ReAct-based multi-agent system that autonomously generates production plans from natural language inputs by integrating scheduling, knowledge, line balancing, and scene graph feedback.
- Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation
  Compiled AI generates deterministic code artifacts from LLMs in a one-time compilation step, enabling reliable workflow execution with zero runtime tokens after break-even.
- Understanding the planning of LLM agents: A survey
  A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
- Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding
  Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.