pith. machine review for the scientific record.

arxiv: 2602.04129 · v2 · submitted 2026-02-04 · 💻 cs.RO · cs.AI · cs.ET · cs.MA

Recognition: no theorem link

KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 08:05 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.ET · cs.MA
keywords knowledge graph · multi-robot planning · large language models · PDDL · adaptive replanning · heterogeneous robots · dynamic environments

The pith

A knowledge graph guides an LLM to generate and update accurate PDDL problem specifications for heterogeneous robot teams in dynamic settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KGLAMP as a way to combine a structured knowledge graph with large language models for multi-robot planning. The graph stores facts about objects, reachability, and each robot's capabilities, then directs the LLM to output correct PDDL problem descriptions instead of relying on the model to invent everything from text alone. When new observations arrive, the graph updates and checks for inconsistencies that would break the current plan, prompting the LLM to generate a revised PDDL file for replanning. Experiments on the MAT-THOR benchmark demonstrate at least a 25.3 percent performance gain compared with both pure LLM planners and classical PDDL planners that lack this memory structure. The approach therefore tackles the manual-modeling burden of traditional planners and the inconsistency problems of unstructured language-model planning in long-horizon, uncertain environments.

Core claim

KGLAMP maintains a structured knowledge graph encoding object relations, spatial reachability, and robot capabilities, which guides the LLM in generating accurate PDDL problem specifications. The knowledge graph serves as a persistent, dynamically updated memory that incorporates new observations and triggers replanning upon detecting inconsistencies, enabling symbolic plans to adapt to evolving world states.

What carries the argument

The knowledge graph that encodes object relations, spatial reachability, and robot capabilities to direct the LLM toward correct PDDL outputs and to detect when replanning is required.
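The grounding step this describes can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the names `KnowledgeGraph` and `to_pddl_problem`, and the triple encoding, are assumptions made for the example.

```python
class KnowledgeGraph:
    """Hypothetical store of (subject, predicate, object) triples covering
    relations, properties, and reachability, in the spirit of Figure 4."""

    def __init__(self):
        self.triples = set()

    def add(self, subj, pred, obj):
        self.triples.add((subj, pred, obj))

    def facts(self):
        # Render each triple as a ground PDDL atom, e.g. (inside watch drawer).
        return sorted(f"({p} {s} {o})" for s, p, o in self.triples)


def to_pddl_problem(name, objects, kg, goal):
    """Assemble a problem file whose :init block contains only graph-backed
    facts, so the downstream planner never sees invented state."""
    return "\n".join([
        f"(define (problem {name})",
        "  (:objects " + " ".join(objects) + ")",
        "  (:init " + " ".join(kg.facts()) + ")",
        f"  (:goal {goal}))",
    ])


kg = KnowledgeGraph()
kg.add("watch", "inside", "drawer")
kg.add("robot1", "can-reach", "drawer")
problem = to_pddl_problem("tidy", ["watch", "drawer", "robot1"],
                          kg, "(inside keychain drawer)")
print(problem)
```

In KGLAMP the LLM, not a serializer, emits the predicates; the point of the sketch is only that the graph bounds what the :init block may contain.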

If this is right

  • Plans stay consistent with changing observations without requiring a human to rewrite the entire symbolic model.
  • Heterogeneous teams coordinate more reliably because capability differences are explicitly represented in the shared graph.
  • Replanning occurs only when the graph flags an inconsistency, avoiding unnecessary full replans.
  • The same graph can be reused across multiple tasks, reducing the cost of starting each new mission from scratch.
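The third bullet, replanning only on a flagged inconsistency, amounts to a subset check between what the current plan assumes and what the updated graph now asserts. A minimal sketch, with the function name and fact encoding assumed for illustration:

```python
def needs_replan(plan_assumptions, observed_facts):
    """Trigger a replan only when a fact the current plan relies on is
    no longer supported by the updated graph, rather than on every
    new observation."""
    return not plan_assumptions.issubset(observed_facts)


# The current plan assumes the drawer is open; a fresh observation
# updates the graph to say it is closed (cf. the Figure 6 example).
plan_assumptions = {("drawer", "state", "open")}
observed = {("drawer", "state", "closed"), ("watch", "at", "table")}
print(needs_replan(plan_assumptions, observed))  # → True
```

An observation that touches no plan-relevant fact leaves the check False, which is what keeps full replans rare.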

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In real deployments the graph could be populated directly from onboard perception pipelines rather than simulated observations.
  • Extending the graph with temporal relations might allow the system to anticipate future inconsistencies before they occur.
  • The framework's separation of persistent memory from the LLM could be applied to single-robot tasks that still require long-horizon symbolic reasoning.

Load-bearing premise

The knowledge graph can be kept accurate from robot observations and the LLM will reliably turn that graph into correct, consistent PDDL specifications.

What would settle it

Running the MAT-THOR experiments and observing that KGLAMP does not improve success rate by at least 25.3 percent over the LLM-only and PDDL baselines, or that plans fail because the generated PDDL files contain errors.
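Once per-baseline success rates are published, the headline number is a one-line check. The rates below are placeholders chosen for illustration, not results from the paper:

```python
def relative_improvement(ours, baseline):
    """Percentage gain of `ours` over `baseline`."""
    return 100.0 * (ours - baseline) / baseline


# Hypothetical success rates; the claim is "at least 25.3%" over BOTH
# baselines, so it is the minimum gain that must clear the bar.
baselines = {"llm_only": 0.48, "pddl_only": 0.52}
kglamp = 0.66
gains = {name: relative_improvement(kglamp, rate)
         for name, rate in baselines.items()}
print(min(gains.values()) >= 25.3)  # → True
```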

Figures

Figures reproduced from arXiv: 2602.04129 by Chak Lam Shek, David Isele, Faizan M. Tariq, Piyush Gupta, Sangjae Bae.

Figure 1
Figure 1. Impact of relational knowledge on task planning. (a) Without relational graphs, PDDL models miss object relationships, leading to failed plans. (b) Incorporating relationship, property, and reachability graphs enables accurate PDDL generation and feasible plans.
Figure 2
Figure 2. Minimal STRIPS PDDL example illustrating (a) Domain PDDL and (b) Problem PDDL.
Figure 3
Figure 3. Overview of the KGLAMP framework. Environment and robot information are encoded as relationship, property, and reachability knowledge graphs. LLM agents generate goal, relational, property, and reachability predicates in a dependency-aware manner to synthesize a PDDL problem, execute the resulting plan, and iteratively update the graphs and replan upon execution failures.
Figure 4
Figure 4. An example knowledge graph. (a) Grelation captures semantic and geometric relationships among objects. (b) Gproperty encodes object attributes and robot capabilities. (c) Greach models spatial connectivity.
Figure 5
Figure 5. An example LLM prompt for LLMrelation. The prompt uses contextual examples, a scenario definition, spatial data, and output constraints to extract relevant spatial tuples.
Figure 6
Figure 6. Qualitative example of planning and replanning. In the task "Put the watch and keychain inside the drawer," the robot fails when placing the watch into a closed drawer; it recovers by replanning to open the drawer and completes the task.
original abstract

Heterogeneous multi-robot systems are increasingly used in long-horizon missions requiring coordinated planning across diverse capabilities. However, existing planning approaches struggle to construct accurate symbolic representations and maintain plan consistency in dynamic environments. Classical PDDL planners require manually crafted symbolic models, while LLM-based planners often ignore agent heterogeneity and environmental uncertainty. We introduce KGLAMP, a knowledge-graph-guided LLM planning framework for heterogeneous multi-robot teams. The framework maintains a structured knowledge graph encoding object relations, spatial reachability, and robot capabilities, which guides the LLM in generating accurate PDDL problem specifications. The knowledge graph serves as a persistent, dynamically updated memory that incorporates new observations and triggers replanning upon detecting inconsistencies, enabling symbolic plans to adapt to evolving world states. Experiments on the MAT-THOR benchmark show that KGLAMP improves performance by at least 25.3% over both LLM-only and PDDL-based variants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces KGLAMP, a framework that maintains a dynamically updated knowledge graph encoding object relations, spatial reachability, and robot capabilities to guide an LLM in generating accurate PDDL problem specifications for planning and replanning in heterogeneous multi-robot teams. The KG acts as persistent memory that incorporates observations and triggers replanning on detected inconsistencies. The central claim is that KGLAMP achieves at least 25.3% performance improvement over LLM-only and classical PDDL baselines on the MAT-THOR benchmark.

Significance. If the reported gains are substantiated with controlled experiments and component-level validation, the work would offer a practical integration of symbolic representations with LLM flexibility for adaptive multi-robot planning under uncertainty and heterogeneity, addressing limitations of purely classical or neural approaches.

major comments (2)
  1. [Experiments] The experimental evaluation reports a 25.3% improvement on MAT-THOR but supplies no protocol details, statistical tests, error bars, baseline code or hyperparameter settings, or controls for heterogeneity/uncertainty; without these the central empirical claim cannot be verified or attributed to the KG-LLM-PDDL loop.
  2. [Framework and Evaluation] No separate metrics are provided for knowledge-graph triple precision/recall from raw observations or for syntactic/semantic correctness of LLM-generated PDDL; these are load-bearing assumptions for the replanning mechanism, yet only aggregate task success is reported.
minor comments (1)
  1. [Abstract] The abstract states the improvement percentage without defining the exact metric (success rate, completion time, etc.) or the number of trials.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that additional experimental details and component-level metrics are necessary to substantiate the claims and will revise the manuscript accordingly. Below we respond to each major comment.

point-by-point responses
  1. Referee: [Experiments] The experimental evaluation reports a 25.3% improvement on MAT-THOR but supplies no protocol details, statistical tests, error bars, baseline code or hyperparameter settings, or controls for heterogeneity/uncertainty; without these the central empirical claim cannot be verified or attributed to the KG-LLM-PDDL loop.

    Authors: We acknowledge the need for greater transparency in the experimental protocol. In the revised manuscript we will add: (i) a full description of the evaluation protocol including number of independent trials per scenario, random seeds, and environment variations; (ii) statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) comparing KGLAMP against baselines; (iii) error bars (standard deviation or 95% confidence intervals) on all reported success rates; (iv) explicit hyperparameter settings for the LLM (temperature, prompt templates) and classical planner; (v) links or pseudocode for baseline implementations; and (vi) dedicated ablation studies that isolate the contributions of heterogeneity handling and uncertainty detection. These additions will allow readers to reproduce the 25.3% aggregate improvement and attribute it specifically to the KG-LLM-PDDL loop. revision: yes

  2. Referee: [Framework and Evaluation] No separate metrics are provided for knowledge-graph triple precision/recall from raw observations or for syntactic/semantic correctness of LLM-generated PDDL; these are load-bearing assumptions for the replanning mechanism, yet only aggregate task success is reported.

    Authors: We agree that aggregate task success alone is insufficient to validate the core mechanisms. In the revision we will introduce and report two new evaluation sections: (1) Knowledge-graph quality metrics—precision, recall, and F1 for triples extracted from raw observations, measured against ground-truth annotations on a held-out subset of MAT-THOR episodes; (2) PDDL generation quality—syntactic validity rate (percentage of outputs accepted by a PDDL parser) and semantic correctness (percentage of generated problem files whose initial state and goal match the observed world state, verified by automated simulation or manual inspection on sampled cases). These metrics will be presented alongside the replanning frequency and overall success rates to demonstrate that the KG update and LLM-PDDL steps are reliable. revision: yes
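The triple-quality metrics the rebuttal promises are standard set-overlap scores; a sketch of how they would be computed, assuming exact-match (subject, predicate, object) triples against ground-truth annotations (the annotation format is our assumption, not the paper's):

```python
def triple_prf(predicted, gold):
    """Precision, recall, and F1 over exact-match (s, p, o) triples
    extracted from observations versus ground-truth annotations."""
    tp = len(predicted & gold)  # triples both extracted and annotated
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical episode: one correct extraction, one spurious one,
# one annotated triple missed.
gold = {("watch", "inside", "drawer"), ("robot1", "can-reach", "drawer")}
pred = {("watch", "inside", "drawer"), ("watch", "on", "table")}
p, r, f = triple_prf(pred, gold)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.5 0.5 0.5
```

The PDDL-side metric (syntactic validity rate) would sit on top of this: the fraction of generated problem files a PDDL parser accepts.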

Circularity Check

0 steps flagged

No circularity: framework combines standard components without self-referential derivations or fitted predictions

full rationale

The manuscript presents KGLAMP as an engineering combination of knowledge graphs for state tracking, LLMs for PDDL generation, and classical planners for execution, with dynamic updates and replanning on inconsistency detection. No equations, parameters, or derivations appear in the provided text that reduce by construction to inputs (e.g., no fitted scale parameters renamed as predictions, no uniqueness theorems imported from self-citations, no ansatzes smuggled via prior work). The 25.3% performance delta is an empirical claim on the MAT-THOR benchmark rather than a logical tautology. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the assumption that an LLM can translate graph-encoded relations into valid PDDL when suitably prompted; no free parameters or new invented entities are introduced in the abstract.

axioms (1)
  • domain assumption LLMs guided by a structured knowledge graph can generate accurate and consistent PDDL problem specifications for heterogeneous robots.
    This assumption is required for the LLM component to produce usable symbolic plans from the graph.

pith-pipeline@v0.9.0 · 5474 in / 1382 out tokens · 64933 ms · 2026-05-16T08:05:34.094192+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 4 internal anchors

  1. [1]

    Multi-robot coordination and layout design for automated warehousing

    Yulun Zhang, Matthew C Fontaine, Varun Bhatt, Stefanos Nikolaidis, and Jiaoyang Li. Multi-robot coordination and layout design for automated warehousing. In Proceedings of the International Symposium on Combinatorial Search, volume 17, pages 305–306, 2024.

  2. [2]

    Multi-robot task planning under individual and collaborative temporal logic specifications

    Ruofei Bai, Ronghao Zheng, Meiqin Liu, and Senlin Zhang. Multi-robot task planning under individual and collaborative temporal logic specifications. In International Conference on Intelligent Robots and Systems (IROS), pages 6382–6389. IEEE, 2021.

  3. [3]

    Incentivizing collaboration in heterogeneous teams via common-pool resource games

    Piyush Gupta, Shaunak D Bopardikar, and Vaibhav Srivastava. Incentivizing collaboration in heterogeneous teams via common-pool resource games. IEEE Transactions on Automatic Control, 68(3):1902–1909, 2022.

  4. [4]

    Achieving efficient collaboration in decentralized heterogeneous teams using common-pool resource games

    Piyush Gupta, Shaunak D Bopardikar, and Vaibhav Srivastava. Achieving efficient collaboration in decentralized heterogeneous teams using common-pool resource games. In 58th Conference on Decision and Control (CDC), pages 6924–6929. IEEE, 2019.

  5. [5]

    Autonomous robot task execution in flexible manufacturing: Integrating PDDL and behavior trees in ARIAC 2023

    Ruikai Liu, Guangxi Wan, Maowei Jiang, Haojie Chen, and Peng Zeng. Autonomous robot task execution in flexible manufacturing: Integrating PDDL and behavior trees in ARIAC 2023. Biomimetics, 9(10):612, 2024.

  6. [6]

    LLM+P: Empowering large language models with optimal planning proficiency

    Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, and Peter Stone. LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477, 2023.

  7. [7]

    Robots that ask for help: Uncertainty alignment for large language model planners

    Allen Z Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, et al. Robots that ask for help: Uncertainty alignment for large language model planners. arXiv preprint arXiv:2307.01928, 2023.

  8. [8]

    Leveraging pre-trained large language models to construct and utilize world models for model-based task planning

    Lin Guan, Karthik Valmeekam, Sarath Sreedharan, and Subbarao Kambhampati. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning. Advances in Neural Information Processing Systems, 36:79081–79094, 2023.

  9. [9]

    Distributed allocation and scheduling of tasks with cross-schedule dependencies for heterogeneous multi-robot teams

    Barbara Arbanas Ferreira, Tamara Petrović, Matko Orsag, J Ramiro Martínez-de Dios, and Stjepan Bogdan. Distributed allocation and scheduling of tasks with cross-schedule dependencies for heterogeneous multi-robot teams. IEEE Access, 12:74327–74342, 2024.

  10. [10]

    LaMMA-P: Generalizable multi-agent long-horizon task allocation and planning with LM-driven PDDL planner

    Xiaopan Zhang, Hao Qin, Fuquan Wang, Yue Dong, and Jiachen Li. LaMMA-P: Generalizable multi-agent long-horizon task allocation and planning with LM-driven PDDL planner. In International Conference on Robotics and Automation, pages 10221–10221. IEEE, 2025.

  11. [11]

    Zero-shot iterative formalization and planning in partially observable environments

    Liancheng Gong, Wang Zhu, Jesse Thomason, and Li Zhang. Zero-shot iterative formalization and planning in partially observable environments. arXiv preprint arXiv:2505.13126, 2025.

  12. [12]

    NOVELGYM: A flexible ecosystem for hybrid planning and learning agents designed for open worlds

    Shivam Goel, Yichen Wei, Panagiotis Lymperopoulos, Klára Churá, Matthias Scheutz, and Jivko Sinapov. NOVELGYM: A flexible ecosystem for hybrid planning and learning agents designed for open worlds. arXiv preprint arXiv:2401.03546, 2024.

  13. [13]

    GFlowVLM: Enhancing multi-step reasoning in vision-language models with generative flow networks

    Haoqiang Kang, Enna Sachdeva, Piyush Gupta, Sangjae Bae, and Kwonjoon Lee. GFlowVLM: Enhancing multi-step reasoning in vision-language models with generative flow networks. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 3815–3825, 2025.

  14. [14]

    Generalized mission planning for heterogeneous multi-robot teams via LLM-constructed hierarchical trees

    Piyush Gupta, David Isele, Enna Sachdeva, Pin-Hao Huang, Behzad Dariush, Kwonjoon Lee, and Sangjae Bae. Generalized mission planning for heterogeneous multi-robot teams via LLM-constructed hierarchical trees. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 10187–10193, 2025.

  15. [15]

    Robot behavior-tree-based task generation with large language models

    Yue Cao and CS Lee. Robot behavior-tree-based task generation with large language models. arXiv preprint arXiv:2302.12927, 2023.

  16. [16]

    Skill reinforcement learning and planning for open-world long-horizon tasks

    Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, and Zongqing Lu. Skill reinforcement learning and planning for open-world long-horizon tasks. arXiv preprint arXiv:2303.16563, 2023.

  17. [17]

    PLAN-AND-ACT: Improving planning of agents for long-horizon tasks

    Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, and Amir Gholami. PLAN-AND-ACT: Improving planning of agents for long-horizon tasks. arXiv preprint arXiv:2503.09572, 2025.

  18. [18]

    Smart-LLM: Smart multi-agent robot task planning using large language models

    Shyam Sundar Kannan, Vishnunandan LN Venkatesh, and Byung-Cheol Min. Smart-LLM: Smart multi-agent robot task planning using large language models. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 12140–12147. IEEE, 2024.

  19. [19]

    Graph-grounded LLMs: Leveraging graphical function calling to minimize LLM hallucinations

    Piyush Gupta, Sangjae Bae, and David Isele. Graph-grounded LLMs: Leveraging graphical function calling to minimize LLM hallucinations. arXiv preprint arXiv:2503.10941, 2025.

  20. [20]

    Graph-enhanced large language models in asynchronous plan reasoning

    Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, and Janet B Pierrehumbert. Graph-enhanced large language models in asynchronous plan reasoning. arXiv preprint arXiv:2402.02805, 2024.

  21. [21]

    Compositional coordination for multi-robot teams with large language models

    Zhehui Huang, Guangyao Shi, Yuwei Wu, Vijay Kumar, and Gaurav S Sukhatme. Compositional coordination for multi-robot teams with large language models. arXiv preprint arXiv:2507.16068, 2025.

  22. [22]

    COHERENT: Collaboration of heterogeneous multi-robot system with large language models

    Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Xuelong Li, and Bin Zhao. COHERENT: Collaboration of heterogeneous multi-robot system with large language models. In International Conference on Robotics and Automation, pages 10208–10214. IEEE, 2025.

  23. [23]

    M2PA: A multi-memory planning agent for open worlds inspired by cognitive theory

    Yanfang Zhou, Xiaodong Li, Yuntao Liu, Yongqiang Zhao, Xintong Wang, Zhenyu Li, Jinlong Tian, and Xinhai Xu. M2PA: A multi-memory planning agent for open worlds inspired by cognitive theory. In Findings of the Association for Computational Linguistics: ACL 2025, pages 23204–23220, 2025.

  24. [24]

    RAP: Retrieval-augmented planning with contextual memory for multimodal LLM agents

    Tomoyuki Kagaya, Thong Jing Yuan, Yuxuan Lou, Jayashree Karlekar, Sugiri Pranata, Akira Kinose, Koki Oguri, Felix Wick, and Yang You. RAP: Retrieval-augmented planning with contextual memory for multimodal LLM agents. arXiv preprint arXiv:2402.03610, 2024.

  25. [25]

    REMEMBER: Building and reasoning over long-horizon spatio-temporal memory for robot navigation

    Abrar Anwar, John Welsh, Joydeep Biswas, Soha Pouya, and Yan Chang. REMEMBER: Building and reasoning over long-horizon spatio-temporal memory for robot navigation. In International Conference on Robotics and Automation, pages 2838–2845. IEEE, 2025.

  26. [26]

    OPTIMUS-1: Hybrid multimodal memory empowered agents excel in long-horizon tasks

    Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, and Liqiang Nie. OPTIMUS-1: Hybrid multimodal memory empowered agents excel in long-horizon tasks. Advances in Neural Information Processing Systems, 37:49881–49913, 2024.

  27. [27]

    KARMA: Augmenting embodied AI agents with long-and-short term memory systems

    Zixuan Wang, Bo Yu, Junzhe Zhao, Wenhao Sun, Sai Hou, Shuai Liang, Xing Hu, Yinhe Han, and Yiming Gan. KARMA: Augmenting embodied AI agents with long-and-short term memory systems. In International Conference on Robotics and Automation, pages 1–8. IEEE, 2025.

  28. [28]

    L3M+P: Lifelong planning with large language models

    Krish Agarwal, Yuqian Jiang, Jiaheng Hu, Bo Liu, and Peter Stone. L3M+P: Lifelong planning with large language models. arXiv preprint arXiv:2508.01917, 2025.

  29. [29]

    Learning STRIPS action models with classical planning

    Diego Aineto, Sergio Jiménez, and Eva Onaindia. Learning STRIPS action models with classical planning. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 28, pages 399–407, 2018.

  30. [30]

    The Fast Downward planning system

    Malte Helmert. The Fast Downward planning system. Journal of Artificial Intelligence Research, 26:191–246, 2006.

  31. [31]

    GPT-5

    OpenAI. GPT-5. https://openai.com/gpt-5/, 2025. Accessed: 2025-12-03.

  32. [32]

    Scale-Plan: Scalable language-enabled task planning for heterogeneous multi-robot teams

    Piyush Gupta, Sangjae Bae, Jiachen Li, and David Isele. Scale-Plan: Scalable language-enabled task planning for heterogeneous multi-robot teams. arXiv preprint arXiv:2603.08814, 2026.

  33. [33]

    AI2-THOR: An interactive 3D environment for visual AI

    Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, et al. AI2-THOR: An interactive 3D environment for visual AI. arXiv preprint arXiv:1712.05474, 2017.

  34. [34]

    Large language model as a policy teacher for training reinforcement learning agents

    Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, and Bin Liu. Large language model as a policy teacher for training reinforcement learning agents. arXiv preprint arXiv:2311.13373, 2023.

  35. [35]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.

  36. [36]

    SayPlan: Grounding large language models using 3D scene graphs for scalable task planning

    Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, and Niko Suenderhauf. SayPlan: Grounding large language models using 3D scene graphs for scalable task planning. In 7th Annual Conference on Robot Learning, 2023.

  37. [37]

    Hierarchical planning for complex tasks with knowledge graph-RAG and symbolic verification

    Flavio Petruzzellis, Cristina Cornelio, and Pietro Lio. Hierarchical planning for complex tasks with knowledge graph-RAG and symbolic verification. In Forty-Second International Conference on Machine Learning (ICML), 2025.

  38. [38]

    The Llama 3 herd of models

    Llama Team, AI @ Meta. The Llama 3 herd of models, 2024.

  39. [39]

    Phi-4 technical report

    Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report. arXiv preprint arXiv:2412.08905, 2024.

  40. [40]

    Mistral 7B

    Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, et al. Mistral 7B, 2023.

  41. [41]

    Qwen2 technical report

    An Yang, Baosong Yang, Binyuan Hui, et al. Qwen2 technical report, 2024.

  42. [42]

    Towards scalable & efficient interaction-aware planning in autonomous vehicles using knowledge distillation

    Piyush Gupta, David Isele, and Sangjae Bae. Towards scalable & efficient interaction-aware planning in autonomous vehicles using knowledge distillation. In 2024 IEEE Intelligent Vehicles Symposium (IV), pages 2735–2742. IEEE, 2024.