Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning

Dillon Z. Chen; Sheila A. McIlraith; Till Hofmann; Toryn Q. Klassen

arxiv: 2605.15975 · v2 · pith:ISPZPM4Jnew · submitted 2026-05-15 · 💻 cs.AI · cs.RO

Pith reviewed 2026-05-20 18:29 UTC · model grok-4.3

keywords: bilevel policies · symbolic abstractions · long-horizon planning · imitation learning · embodied AI · MetaWorld benchmarks · inductive generalisation

The pith

Bilevel policies pair symbolic high-level abstraction with learned low-level control to solve long-horizon embodied tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to show that high-level symbolic policies, built from abstractions of low-level demonstrations via inductive generalisation, can be paired with a neural low-level policy to enable reliable long-horizon planning. This matters because pure imitation learning struggles to generate extended sequences while symbolic methods bring efficiency and scalability. If correct, the result is agents that handle many more objects and longer tasks than end-to-end or VLA baselines allow, with clear gains in training and inference speed.

Core claim

Bilevel policies of the form (π^hl, π^ll) are constructed so that the high-level symbolic policy is derived from symbolic abstractions of low-level demonstrations combined with inductive generalisation; the low-level component is a neural policy learned from demonstrations. Realised in the BISON system, this structure generalises to long horizons and greater object counts than VLA or end-to-end methods and is more time- and memory-efficient, with high-level policies solving problems involving 10,000 relevant objects in under a minute when low-level execution is ignored.

What carries the argument

Bilevel policy (π^hl, π^ll) in which the high-level symbolic component operates over a symbolic world model abstracted from demonstrations and extended by inductive generalisation, while the low-level component is a neural controller trained by imitation.
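
A minimal sketch of how such a pair might be executed, assuming a loop that re-abstracts the observation and re-queries the high-level policy at every control step. The function names (`run_bilevel`, `toy_label`, `toy_hl_policy`, `toy_ll_policy`, `toy_step`) and the one-dimensional toy world are hypothetical and are not taken from BISON; the hand-written `toy_ll_policy` stands in for the neural controller.

```python
from typing import FrozenSet, List, Optional, Tuple

Atom = Tuple[str, ...]              # e.g. ("at", "3")
AbstractState = FrozenSet[Atom]


def run_bilevel(obs, goal, label, hl_policy, ll_policy, step, max_steps=100):
    """Alternate symbolic HL decisions with LL control steps.

    Returns (success, hl_trace); hl_trace records each HL action once per
    consecutive run of identical selections.
    """
    hl_trace: List[Atom] = []
    for _ in range(max_steps):
        s_hl = label(obs)                    # abstract the raw observation
        if goal <= s_hl:                     # every goal atom already holds
            return True, hl_trace
        a_hl = hl_policy(s_hl, goal)         # symbolic decision, or None
        if a_hl is None:
            return False, hl_trace
        if not hl_trace or hl_trace[-1] != a_hl:
            hl_trace.append(a_hl)
        a_ll = ll_policy(obs, a_hl, goal)    # low-level command for this step
        obs = step(obs, a_ll)
    return goal <= label(obs), hl_trace


# Hypothetical toy world: the observation is an integer gripper position.
def toy_label(obs: int) -> AbstractState:
    return frozenset({("at", str(obs))})


def toy_hl_policy(s_hl: AbstractState, goal: AbstractState) -> Optional[Atom]:
    for atom in sorted(goal):
        if atom not in s_hl:
            return ("move_to", atom[1])
    return None


def toy_ll_policy(obs: int, a_hl: Atom, goal: AbstractState) -> int:
    return 1 if int(a_hl[1]) > obs else -1   # stand-in for the neural policy


def toy_step(obs: int, a_ll: int) -> int:
    return obs + a_ll
```

Calling `run_bilevel(0, frozenset({("at", "3")}), toy_label, toy_hl_policy, toy_ll_policy, toy_step)` drives the toy gripper from position 0 to position 3 under a single high-level action. The design point is that the symbolic level only ever sees `label(obs)`, so its cost does not depend on the dimensionality of the raw observation.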

If this is right

The approach generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods.
Training and inference take less time and memory than they do for the baselines.
High-level policies alone can solve problems with 10,000 relevant objects in under a minute.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same abstraction-plus-generalisation route might be applied to other continuous domains such as navigation or multi-robot coordination.
Learned symbolic policies could reduce dependence on hand-designed features across a wider range of planning problems.
Physical-robot experiments would test whether the abstracted policies remain robust under sensor noise and execution uncertainty.

Load-bearing premise

Symbolic abstractions extracted from low-level demonstrations, combined with inductive generalisation, suffice to construct a high-level policy that preserves all planning-relevant structure without manual feature engineering or loss of critical constraints.

What would settle it

A case in which the derived high-level policy produces incomplete or invalid plans on a long-horizon task with many objects because critical constraints were lost during abstraction or generalisation would falsify the central claim.
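
One way to look for such a case is to replay each high-level plan against the symbolic action model and report the first step that fails. The sketch below assumes STRIPS-style ground actions with precondition, add, and delete sets; the `GroundAction` class, the `first_failure` helper, and the pick/place example are hypothetical illustrations, not part of the paper's evaluation. It checks symbolic validity only and says nothing about whether the low-level policy can realise each step.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple

Atom = Tuple[str, ...]


@dataclass(frozen=True)
class GroundAction:
    name: str
    pre: FrozenSet[Atom]      # atoms that must hold before execution
    add: FrozenSet[Atom]      # atoms made true
    delete: FrozenSet[Atom]   # atoms made false


def first_failure(state: FrozenSet[Atom],
                  plan: Sequence[GroundAction],
                  goal: FrozenSet[Atom]) -> Optional[int]:
    """Return None if the plan is valid.

    Otherwise return the index of the first inapplicable action, or
    len(plan) if every action applies but the goal is not reached.
    """
    current = set(state)
    for i, act in enumerate(plan):
        if not act.pre <= current:
            return i
        current = (current - act.delete) | act.add
    return None if goal <= current else len(plan)


# Hypothetical two-action example.
pick = GroundAction(
    "pick(b, table)",
    pre=frozenset({("on", "b", "table"), ("handempty",)}),
    add=frozenset({("holding", "b")}),
    delete=frozenset({("on", "b", "table"), ("handempty",)}),
)
place = GroundAction(
    "place(b, shelf)",
    pre=frozenset({("holding", "b")}),
    add=frozenset({("on", "b", "shelf"), ("handempty",)}),
    delete=frozenset({("holding", "b")}),
)
init = frozenset({("on", "b", "table"), ("handempty",)})
goal = frozenset({("on", "b", "shelf")})
```

With these definitions, `first_failure(init, [pick, place], goal)` returns `None`, while a plan that tries to place before picking is rejected at index 0.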

Figures

Figures reproduced from arXiv: 2605.15975 by Dillon Z. Chen, Sheila A. McIlraith, Till Hofmann, Toryn Q. Klassen.

**Figure 1.** Top Left – inputs for learning and executing bilevel policies: a domain theory $D$, a labelling function $L$ that maps observations to state abstractions, and LL demos with HL goals. Bottom Left – bilevel policy learning: LL demos induce HL demos via $L$, and LL/HL policies are separately learned from LL/HL demos. Right – bilevel policy execution: state abstractions $s^{\mathrm{hl}}$ are computed from observations $s^{\mathrm{ll}}$ via $L$…

**Figure 2.** The HL policy $\pi^{\mathrm{hl}}(a^{\mathrm{hl}} \mid s^{\mathrm{hl}}, g^{\mathrm{hl}})$ learning process. Step 1: we use the labelling function $L$ to construct HL traces from the LL demonstrations paired with HL goals. Step 2: we utilise goal regression to extract condition-action rules from the HL traces and goals (underlined). Step 3: we inductively generalise the rules by replacing objects with variables to produce symbolic policies.

**Figure 3.** The LL policy $\pi^{\mathrm{ll}}(a^{\mathrm{ll}} \mid s^{\mathrm{ll}}, a^{\mathrm{hl}}, g^{\mathrm{hl}})$ represented by a GNN. In this example, the input action is $a^{\mathrm{hl}} = \mathit{pick}(\mathit{obj}, \mathit{loc})$, and the resulting output is $a^{\mathrm{ll}}$. Solid lines represent graph edges, and dashed lines represent how information is passed. Bold font indicates Euclidean vectors.

Theorem 1. Let $D = \langle P, A \rangle$ be an HL domain, $L$ a labelling function, and $C \in \mathbb{N}$. There exists a finite dataset $T$ such that…

**Figure 4.** Median (line) and range (shaded) of success rate (…

**Figure 5.** Success rate (↑) vs. time (↓) for training and inference with 10 objects.

(Q4) Is BISON more efficient than (re)planning and end-to-end approaches? From …

**Figure 6.** HL solving time (↓) vs. number of objects.

(Q5) Do learned HL policies generalise to arbitrary numbers of objects? We answer this question by inspecting the learned policies and evaluating their performance on HL planning problems over numbers of objects. Learned HL policies in BISON are symbolic and hence can be interpreted manually or by an LLM to generalise over arbitrary numbers of objects, as displaye…

**Figure 7.** Visualisations of benchmark environments.
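
Steps 2 and 3 in the Figure 2 caption can be illustrated with a small sketch. It assumes textbook STRIPS regression, where regressing a goal $G$ through an action gives $(G \setminus \mathit{add}) \cup \mathit{pre}$ provided the action deletes nothing in $G$, and it assumes that every argument of an atom is an object to be lifted. The helpers `regress` and `lift` and the variable naming scheme `?x0, ?x1, …` are hypothetical; the paper's exact procedure may differ.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Atom = Tuple[str, ...]          # (predicate_or_action_name, arg1, arg2, ...)


def regress(goal: FrozenSet[Atom], pre: FrozenSet[Atom],
            add: FrozenSet[Atom], delete: FrozenSet[Atom]
            ) -> Optional[FrozenSet[Atom]]:
    """Condition that must hold before the action so that `goal` holds after.

    Returns None when the action deletes part of the goal.
    """
    if goal & delete:
        return None
    return (goal - add) | pre


def lift(condition: FrozenSet[Atom], action: Atom
         ) -> Tuple[FrozenSet[Atom], Atom]:
    """Replace each distinct object with a variable, consistently."""
    names: Dict[str, str] = {}

    def var(obj: str) -> str:
        # The default is evaluated before insertion, so the first object
        # seen becomes ?x0, the second ?x1, and so on.
        return names.setdefault(obj, f"?x{len(names)}")

    def lift_atom(atom: Atom) -> Atom:
        return (atom[0],) + tuple(var(o) for o in atom[1:])

    lifted_action = lift_atom(action)   # action arguments are named first
    lifted_condition = frozenset(lift_atom(a) for a in sorted(condition))
    return lifted_condition, lifted_action


# Hypothetical ground action place(b1, shelf).
pre = frozenset({("holding", "b1")})
add = frozenset({("on", "b1", "shelf"), ("handempty",)})
delete = frozenset({("holding", "b1")})
goal = frozenset({("on", "b1", "shelf")})

condition = regress(goal, pre, add, delete)
rule = lift(condition, ("place", "b1", "shelf"))
```

Here the ground rule "if holding(b1) then place(b1, shelf)" becomes "if holding(?x0) then place(?x0, ?x1)", which no longer mentions any particular object and so applies to instances with more objects than the demonstration contained.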

Original abstract

We tackle the challenge of building embodied AI agents that can reliably solve long-horizon planning problems. Imitation learning from demonstrations has shown itself to be effective in training robots to solve a diversity of complex tasks requiring fine motor control and manipulation over low-level (LL), continuous environments. Yet, it remains a difficult endeavour to generate long-horizon plans from imitation learning alone. In contrast, high-level (HL), symbolic abstractions facilitate efficient and interpretable long-horizon planning. We propose to combine the strengths of LL imitation learning for manipulation and control, and HL symbolic abstractions for long-horizon planning. We realise this idea via *bilevel policies* of the form $(\pi^{\mathrm{hl}}, \pi^{\mathrm{ll}})$, consisting of a neural policy $\pi^{\mathrm{ll}}$ learned from LL demonstrations, and an HL symbolic policy $\pi^{\mathrm{hl}}$ that is constructed from symbolic abstractions of the LL demonstrations combined with inductive generalisation. We implement these ideas in the BISON system. Experiments on extended MetaWorld benchmarks demonstrate that BISON generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods, and is more time and memory efficient in training and inference. Notably, when ignoring LL execution, BISON's HL policies can solve HL problems with 10,000 relevant objects in under a minute. Project page: https://dillonzchen.github.io/bison

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

BISON builds a high-level symbolic policy from low-level demo abstractions via inductive generalization and pairs it with a neural low-level policy, delivering better scaling and efficiency on extended MetaWorld than baselines.

The main takeaway is that this paper constructs a bilevel policy where the high-level symbolic part comes directly from abstracting low-level demonstrations and then generalizing inductively, while the low-level part is a standard neural policy trained on the same data. They implement it as BISON and test on extended MetaWorld tasks. The high-level component alone solves problems with 10,000 objects in under a minute when low-level execution is set aside, and the full system handles longer horizons and more objects than VLA or end-to-end methods while using less time and memory for training and inference. That scaling result is the clearest practical signal in the work.

The construction of the high-level policy without manual feature engineering is the piece that feels distinct from prior hierarchical or symbolic planning approaches. The experiments give concrete efficiency numbers and generalization claims that line up with the goal of reliable long-horizon planning in embodied settings. The hybrid setup makes sense for keeping symbolic speed at the top level and neural flexibility at the bottom.

The main soft spots are in the experimental reporting. The abstract and high-level description mention benchmark gains but skip statistical tests, variance across runs, exact baseline implementations, and data exclusion rules. That leaves the generalization claims resting on summary numbers rather than detailed evidence. The load-bearing assumption that symbolic abstractions extracted from low-level demos plus inductive generalization will preserve all planning-relevant structure also needs more scrutiny. In continuous manipulation domains, things like grasp stability or metric reachability constraints can matter, and it is not obvious from the given description whether the pipeline catches cases where the high-level plan becomes infeasible for the low-level policy. If the full paper includes an executability audit or formal check on the abstractions, that would address the concern directly.

This work is aimed at researchers in robotics and planning who are trying to combine symbolic methods with imitation learning for longer sequences. A reader focused on hybrid systems or scaling planning in manipulation tasks would find the bilevel construction and MetaWorld results worth examining. The paper shows clear thinking on the problem and engages honestly with the literature on hierarchical approaches, so it deserves a serious referee even if revisions are needed for tighter experimental details and more on abstraction soundness.

Referee Report

2 major / 2 minor

Summary. The paper proposes BISON, a bilevel policy framework for long-horizon embodied planning that pairs a neural low-level policy learned via imitation from demonstrations with a high-level symbolic policy constructed by extracting symbolic abstractions from those demonstrations and applying inductive generalization. Experiments on extended MetaWorld benchmarks claim that BISON generalizes to longer horizons and instances with more objects than VLA or end-to-end baselines, while being more efficient in training and inference; notably, the HL symbolic component solves problems with 10,000 objects in under a minute when LL execution is ignored.

Significance. If the central claims hold, the work would demonstrate a practical way to combine the scalability of symbolic planning with the robustness of neural control for manipulation, offering efficiency and generalization advantages on long-horizon tasks without manual feature engineering. The bilevel separation and use of inductive generalization over predicates are promising directions, though the absence of formal guarantees on abstraction fidelity limits immediate impact.

major comments (2)

[Experiments] Experiments section: the abstract and reported benchmark gains provide no details on statistical tests, number of runs, variance, data exclusion rules, or exact baseline implementations; this leaves the generalization claims to long horizons and greater object counts resting on high-level description only.
[Abstraction pipeline] Abstraction pipeline (Section 3): no formal soundness argument or exhaustive executability audit is described for the symbolic abstraction extraction step; without this, it is unclear whether continuous constraints such as grasp stability under varying masses or metric reachability are preserved when the HL policy is applied to the neural LL executor.

minor comments (2)

[Preliminaries] Notation: the bilevel policy is written as (π^hl, π^ll); explicitly state whether π^hl is a policy over predicates or a planner that invokes the LL policy at each step.
[Figures] Figure clarity: ensure that diagrams of the abstraction extraction and inductive generalization steps label all inputs/outputs and distinguish learned versus hand-specified components.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to improve clarity and rigor.

Referee: [Experiments] Experiments section: the abstract and reported benchmark gains provide no details on statistical tests, number of runs, variance, data exclusion rules, or exact baseline implementations; this leaves the generalization claims to long horizons and greater object counts resting on high-level description only.

Authors: We agree that the current presentation of results lacks sufficient methodological detail for full reproducibility and assessment of the generalization claims. In the revised manuscript we will expand the Experiments section to report the number of independent runs (five random seeds per method and task), mean success rates with standard deviations, the statistical tests performed (paired t-tests with p-values), data exclusion rules (none applied beyond the standard success criterion of task completion within the horizon), and precise implementation details for all baselines including VLA and end-to-end architectures, training hyperparameters, and evaluation protocols.

revision: yes

Referee: [Abstraction pipeline] Abstraction pipeline (Section 3): no formal soundness argument or exhaustive executability audit is described for the symbolic abstraction extraction step; without this, it is unclear whether continuous constraints such as grasp stability under varying masses or metric reachability are preserved when the HL policy is applied to the neural LL executor.

Authors: We acknowledge that the manuscript does not supply a formal soundness proof for the abstraction extraction procedure. The predicates are induced directly from successful low-level demonstration trajectories, and the neural low-level policy is trained via imitation learning to realize them in the continuous domain; task success rates in our benchmarks provide empirical evidence that grasp stability and reachability are handled adequately by the combined bilevel system. In the revision we will add an explicit discussion subsection in Section 3 that describes the extraction heuristics, reports observed failure modes related to continuous constraints, and states the absence of formal guarantees as a limitation.

revision: partial
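
The reporting named in the first response (per-seed success rates, mean with standard deviation, and a paired t-test) can be sketched with the Python standard library. The helpers `summarise` and `paired_t` are hypothetical, and the five-seed success rates below are made-up placeholders, not results from the paper. The standard library has no t-distribution CDF, so the sketch stops at the t statistic and degrees of freedom; a p-value would need a table or a library routine such as `scipy.stats.ttest_rel`.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarise(rates: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-seed success rates."""
    return statistics.mean(rates), statistics.stdev(rates)


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched seeds."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 seeds")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder data: one success rate per seed, same seeds for both methods.
method_a = [0.9, 0.8, 1.0, 0.9, 0.9]
method_b = [0.5, 0.6, 0.4, 0.5, 0.5]

mean_a, sd_a = summarise(method_a)
t_stat, dof = paired_t(method_a, method_b)
```

Pairing by seed matters here: it removes seed-to-seed variation that both methods share, which an unpaired comparison would count as noise.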

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmarks and independent comparisons

full rationale

The paper presents a system proposal for bilevel policies (neural LL policy from demonstrations plus HL symbolic policy from abstractions and inductive generalization) and supports its claims via experimental results on extended MetaWorld benchmarks against VLA and end-to-end baselines. No mathematical derivation chain, equations, or fitted parameters are described that reduce by construction to the paper's own inputs. Evaluation relies on external benchmark performance metrics (time, memory, generalization to long horizons and 10k objects), which are falsifiable outside the system and do not invoke self-citation chains or self-definitional loops for the central results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that demonstration-derived symbolic abstractions plus inductive generalisation yield a complete and sound high-level policy; no free parameters or invented physical entities are described in the abstract.

axioms (1)

domain assumption: Symbolic abstractions extracted from low-level demonstrations combined with inductive generalisation produce an effective high-level policy for long-horizon planning.
Invoked to justify automatic construction of π^hl without manual engineering.

invented entities (1)

Bilevel policy (π^hl, π^ll) · no independent evidence
purpose: To separate symbolic long-horizon planning from neural low-level execution.
Core architectural construct introduced by the BISON system.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We apply goal regression ... to extract a set of Condition → Action rules. We inductively generalise the resulting rules to produce compact and expressive HL policies consisting of first-order, condition-action rules"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "HL policies π^hl consist of sets of first-order, condition-action rules ... with associated priority values related to goal proximity"
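
The quoted description of HL policies as prioritised first-order condition-action rules can be made concrete with a small matcher. This is a sketch under stated assumptions: rules with a lower priority number are tried first, goal atoms are folded into the state under a `goal_` prefix, and variables are strings starting with `?`. The names `Rule`, `matches`, and `select_action` are hypothetical, and how BISON actually assigns and uses priorities is not specified on this page.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Sequence, Tuple

Atom = Tuple[str, ...]


class Rule(NamedTuple):
    priority: int                 # assumption: lower value is tried first
    condition: Tuple[Atom, ...]   # atoms that may contain ?variables
    action: Atom


def matches(condition: Sequence[Atom], facts: Sequence[Atom],
            binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that makes all condition atoms facts."""
    binding = binding or {}
    if not condition:
        yield dict(binding)
        return
    first, rest = condition[0], condition[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new = dict(binding)
        ok = True
        for term, obj in zip(first[1:], fact[1:]):
            if term.startswith("?"):
                if new.setdefault(term, obj) != obj:   # conflicting binding
                    ok = False
                    break
            elif term != obj:                          # constant mismatch
                ok = False
                break
        if ok:
            yield from matches(rest, facts, new)


def select_action(rules: Iterable[Rule], state: Iterable[Atom]) -> Optional[Atom]:
    """Ground action of the first matching rule in priority order, or None."""
    facts = sorted(state)         # fixed order keeps the choice deterministic
    for rule in sorted(rules, key=lambda r: r.priority):
        for binding in matches(rule.condition, facts):
            return tuple(binding.get(term, term) for term in rule.action)
    return None


# Hypothetical rules for a pick-and-place task.
rules = [
    Rule(0, (("holding", "?x"), ("goal_on", "?x", "?l")), ("place", "?x", "?l")),
    Rule(1, (("handempty",), ("on", "?x", "table"), ("goal_on", "?x", "?l")),
         ("pick", "?x", "table")),
]
```

Because the rules mention variables rather than objects, the same two rules cover a table with two blocks or two thousand; only the matching cost grows with the number of facts.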

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

153 extracted references · 153 canonical work pages · 7 internal anchors

[1]

David Abel, Dilip Arumugam, Lucas Lehnert, and Michael L. Littman. State abstractions for lifelong reinforcement learning. InICML, 2018

work page 2018
[2]

Ellis Hershkowitz, and Michael L

David Abel, D. Ellis Hershkowitz, and Michael L. Littman. Near optimal behavior via approximate state abstraction. InICML, 2016

work page 2016
[3]

Tenenbaum, Christopher Bates, and Samuel J

Zergham Ahmed, Joshua B. Tenenbaum, Christopher Bates, and Samuel J. Gershman. Synthe- sizing world models for bilevel planning.Trans. Mach. Learn. Res., 2025, 2025

work page 2025
[4]

Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alexander Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J. Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-H...

work page 2022
[5]

De- von Hjelm

Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, and R. De- von Hjelm. Unsupervised state representation learning in Atari. InNeurIPS, 2019

work page 2019
[6]

Classical planning in deep latent space: Bridging the subsymbolic-symbolic boundary

Masataro Asai and Alex Fukunaga. Classical planning in deep latent space: Bridging the subsymbolic-symbolic boundary. InAAAI, 2018. 10

work page 2018
[7]

Classical planning in deep latent space.J

Masataro Asai, Hiroshi Kajino, Alex Fukunaga, and Christian Muise. Classical planning in deep latent space.J. Artif. Intell. Res., 74:1599–1686, 2022

work page 2022
[8]

From pixels to predicates: Learning symbolic world models via pretrained VLMs.IEEE Robotics and Automation Letters, 11(4):4002–4009, 2026

Ashay Athalye, Nishanth Kumar, Tom Silver, Yichao Liang, Jiuguang Wang, Tomás Lozano- Pérez, and Leslie Pack Kaelbling. From pixels to predicates: Learning symbolic world models via pretrained VLMs.IEEE Robotics and Automation Letters, 11(4):4002–4009, 2026

work page 2026
[9]

Downward refinement and the efficiency of hierarchical problem solving.Artif

Fahiem Bacchus and Qiang Yang. Downward refinement and the efficiency of hierarchical problem solving.Artif. Intell., 71(1):43–100, 1994

work page 1994
[10]

A survey on hierarchical planning - one abstract idea, many concrete realizations

Pascal Bercher, Ron Alford, and Daniel Höller. A survey on hierarchical planning - one abstract idea, many concrete realizations. InIJCAI, 2019

work page 2019
[11]

Planning for temporally extended goals in pure-past linear temporal logic.Artif

Luigi Bonassi, Giuseppe De Giacomo, Marco Favorito, Francesco Fuggitti, Alfonso Emilio Gerevini, and Enrico Scala. Planning for temporally extended goals in pure-past linear temporal logic.Artif. Intell., 348:104409, 2025

work page 2025
[12]

Learning to predict action feasibility for task and motion planning in 3d environments

Smail Ait Bouhsain, Rachid Alami, and Thierry Siméon. Learning to predict action feasibility for task and motion planning in 3d environments. InICRA, 2023

work page 2023
[13]

Learning geometric reasoning networks for robot task and motion planning

Smail Ait Bouhsain, Rachid Alami, and Thierry Siméon. Learning geometric reasoning networks for robot task and motion planning. InICLR, 2025

work page 2025
[14]

Using abstractions for decision-theoretic planning with time constraints

Craig Boutilier and Richard Dearden. Using abstractions for decision-theoretic planning with time constraints. InAAAI, 1994

work page 1994
[15]

Zavlanos, and Miroslav Pajic

Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, and Miroslav Pajic. Control synthesis from linear temporal logic specifications using model-free reinforcement learning. InICRA, 2020

work page 2020
[16]

MONet: Unsupervised Scene Decomposition and Representation

Christopher P. Burgess, Loïc Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matthew M. Botvinick, and Alexander Lerchner. MONet: Unsupervised scene decomposition and representation.CoRR, abs/1901.11390, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901
[17]

Klassen, Richard Anthony Valenzano, and Sheila A

Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, and Sheila A. McIlraith. LTL and beyond: Formal languages for reward function specification in reinforcement learning. InIJCAI, 2019

work page 2019
[18]

Diffusion policy: Visuomotor policy learning via action diffusion

Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. InRSS, 2023

work page 2023
[19]

Guided search for task and motion plans using learned heuristics

Rohan Chitnis, Dylan Hadfield-Menell, Abhishek Gupta, Siddharth Srivastava, Edward Gro- shev, Christopher Lin, and Pieter Abbeel. Guided search for task and motion plans using learned heuristics. InICRA, 2016

work page 2016
[20]

Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling

Rohan Chitnis, Tom Silver, Joshua B. Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Learning neuro-symbolic relational transition models for bilevel planning. InIROS, 2022

work page 2022
[21]

Learning long-horizon action dependencies in sampling-based bilevel planning

Bartlomiej Cieslar, Leslie Pack Kaelbling, Tomás Lozano-Pérez, and Jorge Mendez-Mendez. Learning long-horizon action dependencies in sampling-based bilevel planning. InCoRL, 2024

work page 2024
[22]

Corrêa, Florian Pommerening, Malte Helmert, and Guillem Francès

Augusto B. Corrêa, Florian Pommerening, Malte Helmert, and Guillem Francès. Lifted successor generation using query optimization techniques. InICAPS, 2020

work page 2020
[23]

Long-horizon manipulation of unknown objects via task and motion planning with estimated affordances

Aidan Curtis, Xiaolin Fang, Leslie Pack Kaelbling, Tomás Lozano-Pérez, and Caelan Reed Garrett. Long-horizon manipulation of unknown objects via task and motion planning with estimated affordances. InICRA, 2022

work page 2022
[24]

Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling

Aidan Curtis, George Matheos, Nishad Gothoskar, Vikash Mansinghka, Joshua B. Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Partially observable task and motion planning with uncertainty and risk awareness. InRSS, 2024. 11

work page 2024
[25]

Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling

Aidan Curtis, Tom Silver, Joshua B. Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Discovering state and action abstractions for generalized task and motion planning. InAAAI, 2022

work page 2022
[26]

Dantam, Zachary K

Neil T. Dantam, Zachary K. Kingston, Swarat Chaudhuri, and Lydia E. Kavraki. An incremen- tal constraint-based framework for task and motion planning.Int. J. Robotics Res., 37(10), 2018

work page 2018
[27]

Peter Dayan and Geoffrey E. Hinton. Feudal reinforcement learning. InNeurIPS, 1992

work page 1992
[28]

Dean and Robert Givan

Thomas L. Dean and Robert Givan. Model minimization in Markov decision processes. In AAAI, 1997

work page 1997
[29]

Dietterich

Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition.J. Artif. Intell. Res., 13:227–303, 2000

work page 2000
[30]

Deep visual reasoning: Learning to predict action sequences for task and motion planning from an initial scene image

Danny Driess, Jung-Su Ha, and Marc Toussaint. Deep visual reasoning: Learning to predict action sequences for task and motion planning from an initial scene image. InRSS, 2020

work page 2020
[31]

Oguz, Jung-Su Ha, and Marc Toussaint

Danny Driess, Ozgur S. Oguz, Jung-Su Ha, and Marc Toussaint. Deep visual heuristics: Learning feasibility of mixed-integer programs for manipulation planning. InICRA, 2020

work page 2020
[32]

Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, and Pete Florence. Palm-e: An embodied ...

work page 2023
[33]

Fast task planning with neuro-symbolic relaxation.IEEE Robotics Autom

Qiwei Du, Bowen Li, Yi Du, Shaoshu Su, Taimeng Fu, Zitong Zhan, Zhipeng Zhao, and Chen Wang. Fast task planning with neuro-symbolic relaxation.IEEE Robotics Autom. Lett., 11(3):3684–3691, 2026

work page 2026
[34]

Hwang, Soumya Sanyal, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, and Yejin Choi

Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Sean Welleck, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, and Yejin Choi. Faith and fate: Limits of transformers on compositionality. InNeurIPS, 2023

work page 2023
[35]

Hendler, and Dana S

Kutluhan Erol, James A. Hendler, and Dana S. Nau. Complexity results for HTN planning. Ann. Math. Artif. Intell., 18(1):69–93, 1996

work page 1996
[36]

Hart, and Nils J

Richard Fikes, Peter E. Hart, and Nils J. Nilsson. Learning and executing generalized robot plans.Artif. Intell., 3(1-3):251–288, 1972

work page 1972
[37]

Richard Fikes and Nils J. Nilsson. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving.Artif. Intell., 2(3/4):189–208, 1971

work page 1971
[38]

McIlraith

Christian Fritz and Sheila A. McIlraith. Monitoring plan optimality during execution. In ICAPS, 2007

work page 2007
[39]

Holladay, Beomjoon Kim, Tom Silver, Leslie Pack Kaelbling, and Tomás Lozano-Pérez

Caelan Reed Garrett, Rohan Chitnis, Rachel M. Holladay, Beomjoon Kim, Tom Silver, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Integrated task and motion planning.Annu. Rev. Control. Robotics Auton. Syst., 4:265–293, 2021

work page 2021
[40]

Synthesis Lectures on Artificial Intelligence and Machine Learning

Hector Geffner and Blai Bonet.A Concise Introduction to Models and Methods for Automated Planning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2013

work page 2013
[41]

Nau, and Paolo Traverso.Automated planning - theory and practice

Malik Ghallab, Dana S. Nau, and Paolo Traverso.Automated planning - theory and practice. Elsevier, 2004

work page 2004
[42]

Schoenholz, Patrick F

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. InICML, 2017

work page 2017
[43]

A theory of abstraction.Artif

Fausto Giunchiglia and Toby Walsh. A theory of abstraction.Artif. Intell., 57:323–389, 1992. 12

work page 1992
[44]

Cordell Green

C. Cordell Green. Application of theorem proving to problem solving. In Donald E. Walker and Lewis M. Norton, editors,IJCAI, pages 219–240. William Kaufmann, 1969

work page 1969
[45] Charles Gretton and Sylvie Thiébaux. Exploiting first-order regression in inductive policy selection. In UAI, 2004.

[46] Muzhi Han, Yifeng Zhu, Song-Chun Zhu, Ying Nian Wu, and Yuke Zhu. INTERPRET: interactive predicate learning from language feedback for generalizable task planning. In RSS, 2024.

[47] Dan Haramati, Tal Daniel, and Aviv Tamar. Entity-centric reinforcement learning for object manipulation from pixels. In ICLR, 2024.

[48] Patrik Haslum, Nir Lipovetzky, Daniele Magazzeni, and Christian Muise. An Introduction to the Planning Domain Definition Language. 2019.

[49] Milos Hauskrecht, Nicolas Meuleau, Leslie Pack Kaelbling, Thomas L. Dean, and Craig Boutilier. Hierarchical solution of Markov decision processes using macro-actions. In UAI, 1998.

[50] Malte Helmert. The fast downward planning system. J. Artif. Intell. Res., 26:191–246, 2006.

[51] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.

[52] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, DDL Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. In NeurIPS, 2022.

[53] Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, and Yang Gao. Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning. CoRR, abs/2311.17842, 2023.

[54] Jinbang Huang, Allen Tao, Rozilyn Marco, Miroslav Bogdanovic, Jonathan Kelly, and Florian Shkurti. Automated planning domain inference for task and motion planning. In ICRA, 2025.

[55] Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Tomas Jackson, Noah Brown, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. Inner monologue: Embodied reasoning through planning with language models. In CoRL, 2022.

[56] León Illanes, Xi Yan, Rodrigo Toro Icarte, and Sheila A. McIlraith. Symbolic plans as high-level instructions for reinforcement learning. In ICAPS, 2020.

[57] Physical Intelligence, Bo Ai, Ali Amin, Raichelle Aniceto, Ashwin Balakrishna, Greg Balke, Kevin Black, George Bokinsky, Shihao Cao, Thomas Charbonnier, Vedant Choudhary, Foster Collins, Ken Conley, Grace Connors, James Darpinian, Karan Dhabalia, Maitrayee Dhaka, Jared DiCarlo, Danny Driess, Michael Equi, Adnan Esmail, Yunhao Fang, Chelsea Finn, Catherine... arXiv, 2026.

[58] Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsc... arXiv, 2025.

[59] Mathias Jackermeier and Alessandro Abate. Deepltl: Learning to efficiently satisfy complex LTL specifications for multi-task RL. In ICLR, 2025.

[60] Steven James, Benjamin Rosman, and George Konidaris. Autonomous learning of object-centric abstractions for high-level planning. In ICLR, 2022.

[61] Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, and Anil Murthy. Position: Llms can’t plan, but can help planning in llm-modulo frameworks. In ICML, 2024.

[62] Mohamed Khodeir, Ben Agro, and Florian Shkurti. Learning to search in task and motion planning with streams. IEEE Robotics Autom. Lett., 8(4):1983–1990, 2023.

[63] Beomjoon Kim, Zi Wang, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Learning to guide task and motion planning using score-space representation. Int. J. Robotics Res., 38(7), 2019.

[64] Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Paul Foster, Pannag R. Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. Openvla: An open-source vision-language-action model. In CoRL, 2024.

[65] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.

[66] Thomas N. Kipf, Elise van der Pol, and Max Welling. Contrastive learning of structured world models. In ICLR, 2020.

[67] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.

[68] George Konidaris. On the necessity of abstraction. Current Opinion in Behavioral Sciences, 29:1–7, 2019. Artificial Intelligence.

[69] George Dimitri Konidaris and Andrew G. Barto. Efficient skill learning using abstraction selection. In IJCAI, 2009.

[70] George Dimitri Konidaris, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Constructing symbolic representations for high-level planning. In AAAI, 2014.

[71] George Dimitri Konidaris, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. From skills to symbols: Learning symbolic representations for abstract high-level planning. J. Artif. Intell. Res., 61:215–289, 2018.

[72] Richard E. Korf. Planning as search: A quantitative approach. Artif. Intell., 33(1):65–88, 1987.

[73] Nishanth Kumar, Tom Silver, Willie McClinton, Linfeng Zhao, Stephen Proulx, Tomás Lozano-Pérez, Leslie Pack Kaelbling, and Jennifer L. Barry. Practice makes perfect: Planning to learn skill parameter policies. In RSS, 2024.

[74] Hoang Minh Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, and Hal Daumé III. Hierarchical imitation and reinforcement learning. In ICML, 2018.

[75] Andrew C. Li, Toryn Q. Klassen, Andrew Wang, Parand A. Alamdari, and Sheila A. McIlraith. Ground-compose-reinforce: Grounding language in agentic behaviours using limited data. In NeurIPS, 2025.

[76] Bowen Li, Tom Silver, Sebastian A. Scherer, and Alexander G. Gray. Bilevel learning for bilevel planning. In RSS, 2025.

[77] Lihong Li, Thomas J. Walsh, and Michael L. Littman. Towards a unified theory of state abstraction for mdps. In International Symposium on Artificial Intelligence and Mathematics, 2006.

[78] Xiao Li, Cristian Ioan Vasile, and Calin Belta. Reinforcement learning with temporal logic rewards. In IROS, 2017.

[79] Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, Joshua B. Tenenbaum, Tom Silver, João F. Henriques, and Kevin Ellis. Visualpredicator: Learning abstract world models with neuro-symbolic predicates for robot planning. In ICLR, 2025.

[80] Yichao Liang, Dat Nguyen, Cambridge Yang, Tianyang Li, Joshua B. Tenenbaum, Carl Edward Rasmussen, Adrian Weller, Zenna Tavares, Tom Silver, and Kevin Ellis. Exopredicator: Learning abstract models of dynamic worlds for robot planning. In ICLR, 2026.
