
arxiv: 2605.15975 · v2 · pith:ISPZPM4Jnew · submitted 2026-05-15 · 💻 cs.AI · cs.RO

Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning

Pith reviewed 2026-05-20 18:29 UTC · model grok-4.3

classification 💻 cs.AI cs.RO
keywords bilevel policies · symbolic abstractions · long-horizon planning · imitation learning · embodied AI · MetaWorld benchmarks · inductive generalisation

The pith

Bilevel policies pair symbolic high-level abstraction with learned low-level control to solve long-horizon embodied tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to show that high-level symbolic policies, built from abstractions of low-level demonstrations via inductive generalisation, can be paired with a neural low-level policy to enable reliable long-horizon planning. This matters because pure imitation learning struggles to generate long-horizon plans, while symbolic methods bring efficiency and scalability. If correct, the result is agents that handle many more objects and longer tasks than end-to-end or vision-language-action (VLA) baselines allow, with gains in time and memory efficiency for both training and inference.

Core claim

Bilevel policies of the form (π^hl, π^ll) are constructed so that the high-level symbolic policy is derived from symbolic abstractions of low-level demonstrations combined with inductive generalisation; the low-level component is a neural policy learned from demonstrations. Realised in the BISON system, this structure generalises to long horizons and greater object counts than VLA or end-to-end methods and is more time- and memory-efficient, with high-level policies solving problems involving 10,000 relevant objects in under a minute when low-level execution is ignored.

What carries the argument

Bilevel policy (π^hl, π^ll) in which the high-level symbolic component operates over a symbolic world model abstracted from demonstrations and extended by inductive generalisation, while the low-level component is a neural controller trained by imitation.
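
The division of labour described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BISON's implementation: the toy one-dimensional world, the predicate `at_goal`, and the names `label`, `hl_policy`, `ll_policy` and `run` are all hypothetical, and the low-level policy is a hand-written stand-in for the learned neural controller.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Atom = Tuple[str, ...]              # ground atom, e.g. ("at_goal", "a")
AbstractState = FrozenSet[Atom]     # HL state: a set of ground atoms
Observation = Dict[str, float]      # toy LL state: object name -> x position


def label(obs: Observation, goal_x: float, tol: float = 0.05) -> AbstractState:
    """Hypothetical labelling function: map an observation to ground atoms."""
    return frozenset(
        ("at_goal", name) for name, x in obs.items() if abs(x - goal_x) <= tol
    )


def hl_policy(state: AbstractState, goal: AbstractState) -> Optional[Atom]:
    """Symbolic HL policy: choose a 'move' action for some unmet goal atom."""
    for atom in sorted(goal - state):
        return ("move", atom[1])
    return None  # every goal atom already holds


def ll_policy(obs: Observation, action: Atom, goal_x: float,
              step: float = 0.1) -> Observation:
    """Stand-in for the neural LL policy: nudge the chosen object toward goal_x."""
    _, name = action
    delta = max(-step, min(step, goal_x - obs[name]))
    new_obs = dict(obs)
    new_obs[name] = obs[name] + delta
    return new_obs


def run(obs: Observation, goal: AbstractState, goal_x: float,
        max_steps: int = 1000) -> Tuple[Observation, int]:
    """Alternate abstraction, HL action selection and LL execution."""
    for t in range(max_steps):
        state = label(obs, goal_x)          # abstract the observation
        action = hl_policy(state, goal)     # pick the next HL action
        if action is None:
            return obs, t                   # goal reached after t LL steps
        obs = ll_policy(obs, action, goal_x)
    raise RuntimeError("step budget exhausted before reaching the goal")


if __name__ == "__main__":
    goal = frozenset({("at_goal", "a"), ("at_goal", "b")})
    final_obs, steps = run({"a": 0.0, "b": 1.0}, goal, goal_x=0.5)
    print(final_obs, steps)
```

The point of the sketch is only the control flow: the HL policy never sees continuous positions, and the LL policy never reasons about which goal atom to pursue next.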

If this is right

  • The approach generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods.
  • Training and inference are more time- and memory-efficient than the VLA and end-to-end baselines.
  • High-level policies alone can solve problems with 10,000 relevant objects in under a minute when low-level execution is ignored.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same abstraction-plus-generalisation route might be applied to other continuous domains such as navigation or multi-robot coordination.
  • Learned symbolic policies could reduce dependence on hand-designed features across a wider range of planning problems.
  • Physical-robot experiments would test whether the abstracted policies remain robust under sensor noise and execution uncertainty.

Load-bearing premise

Symbolic abstractions extracted from low-level demonstrations, combined with inductive generalisation, suffice to construct a high-level policy that preserves all planning-relevant structure without manual feature engineering or loss of critical constraints.

What would settle it

The central claim would be falsified by a long-horizon, many-object task on which the derived high-level policy produces incomplete or invalid plans because critical constraints were lost during abstraction or generalisation.
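
One way to make "invalid plan" concrete is a symbolic executability check. The sketch below assumes STRIPS-style ground actions with precondition, add and delete sets; that representation, and the names `GroundAction` and `first_failure`, are assumptions for illustration and are not taken from the paper. It catches only symbolic failures, so constraints that were never captured by the abstraction (for example metric reachability) would pass unnoticed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Sequence, Tuple

Atom = Tuple[str, ...]


@dataclass(frozen=True)
class GroundAction:
    """Hypothetical STRIPS-style ground action."""
    name: str
    pre: FrozenSet[Atom]      # atoms that must hold before execution
    add: FrozenSet[Atom]      # atoms made true
    delete: FrozenSet[Atom]   # atoms made false


def first_failure(init: Iterable[Atom], plan: Sequence[GroundAction],
                  goal: Iterable[Atom]) -> Optional[str]:
    """Return a description of the first symbolic failure, or None if the plan is valid."""
    state = set(init)
    for i, act in enumerate(plan):
        missing = act.pre - state
        if missing:
            return f"step {i} ({act.name}): unmet preconditions {sorted(missing)}"
        state = (state - act.delete) | act.add
    unmet = set(goal) - state
    if unmet:
        return f"goal atoms not achieved: {sorted(unmet)}"
    return None


if __name__ == "__main__":
    pick = GroundAction(
        "pick(b1)",
        pre=frozenset({("clear", "b1"), ("handempty",)}),
        add=frozenset({("holding", "b1")}),
        delete=frozenset({("clear", "b1"), ("handempty",)}),
    )
    place = GroundAction(
        "place(b1, shelf)",
        pre=frozenset({("holding", "b1")}),
        add=frozenset({("on", "b1", "shelf"), ("handempty",)}),
        delete=frozenset({("holding", "b1")}),
    )
    init = {("clear", "b1"), ("handempty",)}
    goal = {("on", "b1", "shelf")}
    print(first_failure(init, [pick, place], goal))   # valid ordering
    print(first_failure(init, [place, pick], goal))   # place before pick
```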

Figures

Figures reproduced from arXiv: 2605.15975 by Dillon Z. Chen, Sheila A. McIlraith, Till Hofmann, Toryn Q. Klassen.

Figure 1. Top Left – inputs for learning and executing bilevel policies: a domain theory D, a labelling function L that maps observations to state abstractions, and LL demos with HL goals. Bottom Left – bilevel policy learning: LL demos induce HL demos via L, and LL/HL policies are separately learned from LL/HL demos. Right – bilevel policy execution: state abstractions s^hl are computed from observations s^ll via L…
Figure 2. The HL policy π^hl(a^hl | s^hl, g^hl) learning process. Step 1: we use the labelling function L to construct HL traces from the LL demonstrations paired with HL goals. Step 2: we utilise goal regression to extract condition-action rules from the HL traces and goals (underlined). Step 3: we inductively generalise the rules by replacing objects with variables to produce symbolic policies.
Figure 3. The LL policy π^ll(a^ll | s^ll, a^hl, g^hl) represented by a GNN. In this example, the input action is a^hl = pick(obj, loc), and the resulting output is a^ll. Solid lines represent graph edges, and dashed lines represent how information is passed. Bold font indicates Euclidean vectors. Theorem 1. Let D = ⟨P, A⟩ be an HL domain, L a labelling function, and C ∈ N. There exists a finite dataset T such that…
Figure 4. Median (line) and range (shaded) of success rate (…
Figure 5. Success rate (↑) vs. time (↓) for training and inference with 10 objects. (Q4) Is BISON more efficient than (re)planning and end-to-end approaches?
Figure 6. HL solving time (↓) vs. number of objects. (Q5) Do learned HL policies generalise to arbitrary numbers of objects? We answer this question by inspecting the learned policies and evaluating their performance on HL planning problems over numbers of objects. Learned HL policies in BISON are symbolic and hence can be interpreted manually or by an LLM to generalise over arbitrary numbers of objects, as displaye…
Figure 7. Visualisations of benchmark environments.
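
Step 3 in the Figure 2 caption, replacing objects with variables, can be illustrated with a short sketch. It assumes a condition-action rule is a set of ground atoms plus one ground action; the function name `lift_rule` and the `?x0, ?x1, ...` variable naming are hypothetical. Goal regression (Step 2) is not shown, and this naive version also lifts constants, which a real system may want to keep.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Atom = Tuple[str, ...]   # (predicate_or_action_name, arg1, arg2, ...)


def lift_rule(condition: Iterable[Atom],
              action: Atom) -> Tuple[FrozenSet[Atom], Atom]:
    """Replace every object in a ground rule with a variable, consistently."""
    mapping: Dict[str, str] = {}

    def var(obj: str) -> str:
        # Reuse the same variable for repeated occurrences of an object.
        if obj not in mapping:
            mapping[obj] = f"?x{len(mapping)}"
        return mapping[obj]

    def lift(atom: Atom) -> Atom:
        return (atom[0],) + tuple(var(o) for o in atom[1:])

    # Lift the action first so its arguments receive the lowest variable indices;
    # sort the condition so variable numbering is deterministic.
    lifted_action = lift(action)
    lifted_condition = frozenset(lift(a) for a in sorted(condition))
    return lifted_condition, lifted_action


if __name__ == "__main__":
    ground_condition = {("on", "b1", "table"), ("clear", "b1")}
    ground_action = ("pick", "b1", "table")
    print(lift_rule(ground_condition, ground_action))
```

Two ground rules that differ only in object names lift to the same rule, which is what lets a policy learned on a few objects apply to instances with many more.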
Original abstract

We tackle the challenge of building embodied AI agents that can reliably solve long-horizon planning problems. Imitation learning from demonstrations has shown itself to be effective in training robots to solve a diversity of complex tasks requiring fine motor control and manipulation over low-level (LL), continuous environments. Yet, it remains a difficult endeavour to generate long-horizon plans from imitation learning alone. In contrast, high-level (HL), symbolic abstractions facilitate efficient and interpretable long-horizon planning. We propose to combine the strengths of LL imitation learning for manipulation and control, and HL symbolic abstractions for long-horizon planning. We realise this idea via bilevel policies of the form (π^hl, π^ll), consisting of a neural policy π^ll learned from LL demonstrations, and an HL symbolic policy π^hl that is constructed from symbolic abstractions of the LL demonstrations combined with inductive generalisation. We implement these ideas in the BISON system. Experiments on extended MetaWorld benchmarks demonstrate that BISON generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods, and is more time and memory efficient in training and inference. Notably, when ignoring LL execution, BISON's HL policies can solve HL problems with 10,000 relevant objects in under a minute. Project page: https://dillonzchen.github.io/bison

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes BISON, a bilevel policy framework for long-horizon embodied planning that pairs a neural low-level policy learned via imitation from demonstrations with a high-level symbolic policy constructed by extracting symbolic abstractions from those demonstrations and applying inductive generalization. Experiments on extended MetaWorld benchmarks claim that BISON generalizes to longer horizons and instances with more objects than VLA or end-to-end baselines, while being more efficient in training and inference; notably, the HL symbolic component solves problems with 10,000 objects in under a minute when LL execution is ignored.

Significance. If the central claims hold, the work would demonstrate a practical way to combine the scalability of symbolic planning with the robustness of neural control for manipulation, offering efficiency and generalization advantages on long-horizon tasks without manual feature engineering. The bilevel separation and use of inductive generalization over predicates are promising directions, though the absence of formal guarantees on abstraction fidelity limits immediate impact.

major comments (2)
  1. [Experiments] Experiments section: the abstract and reported benchmark gains provide no details on statistical tests, number of runs, variance, data exclusion rules, or exact baseline implementations; this leaves the generalization claims to long horizons and greater object counts resting on high-level description only.
  2. [Abstraction pipeline] Abstraction pipeline (Section 3): no formal soundness argument or exhaustive executability audit is described for the symbolic abstraction extraction step; without this, it is unclear whether continuous constraints such as grasp stability under varying masses or metric reachability are preserved when the HL policy is applied to the neural LL executor.
minor comments (2)
  1. [Preliminaries] Notation: the bilevel policy is written as (π^hl, π^ll); explicitly state whether π^hl is a policy over predicates or a planner that invokes the LL policy at each step.
  2. [Figures] Figure clarity: ensure that diagrams of the abstraction extraction and inductive generalization steps label all inputs/outputs and distinguish learned versus hand-specified components.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: the abstract and reported benchmark gains provide no details on statistical tests, number of runs, variance, data exclusion rules, or exact baseline implementations; this leaves the generalization claims to long horizons and greater object counts resting on high-level description only.

    Authors: We agree that the current presentation of results lacks sufficient methodological detail for full reproducibility and assessment of the generalization claims. In the revised manuscript we will expand the Experiments section to report the number of independent runs (five random seeds per method and task), mean success rates with standard deviations, the statistical tests performed (paired t-tests with p-values), data exclusion rules (none applied beyond the standard success criterion of task completion within the horizon), and precise implementation details for all baselines including VLA and end-to-end architectures, training hyperparameters, and evaluation protocols.
    revision: yes

  2. Referee: [Abstraction pipeline] Abstraction pipeline (Section 3): no formal soundness argument or exhaustive executability audit is described for the symbolic abstraction extraction step; without this, it is unclear whether continuous constraints such as grasp stability under varying masses or metric reachability are preserved when the HL policy is applied to the neural LL executor.

    Authors: We acknowledge that the manuscript does not supply a formal soundness proof for the abstraction extraction procedure. The predicates are induced directly from successful low-level demonstration trajectories, and the neural low-level policy is trained via imitation learning to realize them in the continuous domain; task success rates in our benchmarks provide empirical evidence that grasp stability and reachability are handled adequately by the combined bilevel system. In the revision we will add an explicit discussion subsection in Section 3 that describes the extraction heuristics, reports observed failure modes related to continuous constraints, and states the absence of formal guarantees as a limitation.
    revision: partial
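
The reporting protocol named in the first response (per-seed success rates, mean with standard deviation, paired t-test) can be sketched with the Python standard library. The function names are hypothetical and the numbers in the usage section are placeholders chosen for illustration, not results from the paper. Only the t statistic is computed here; obtaining a p-value needs the Student t distribution, for which a library routine such as `scipy.stats.ttest_rel` is one option.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarise(rates: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-seed success rates."""
    return mean(rates), stdev(rates)


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for two methods evaluated on the same seeds."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 seeds")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))


if __name__ == "__main__":
    # Placeholder per-seed success rates, NOT taken from the paper.
    method_a = [0.9, 0.8, 1.0, 0.9, 0.9]
    method_b = [0.5, 0.6, 0.4, 0.5, 0.6]
    print("A: mean=%.3f sd=%.3f" % summarise(method_a))
    print("B: mean=%.3f sd=%.3f" % summarise(method_b))
    print("paired t = %.3f (df = %d)" % (paired_t(method_a, method_b), len(method_a) - 1))
```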

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmarks and independent comparisons

Full rationale

The paper presents a system proposal for bilevel policies (neural LL policy from demonstrations plus HL symbolic policy from abstractions and inductive generalization) and supports its claims via experimental results on extended MetaWorld benchmarks against VLA and end-to-end baselines. No mathematical derivation chain, equations, or fitted parameters are described that reduce by construction to the paper's own inputs. Evaluation relies on external benchmark performance metrics (time, memory, and generalization to long horizons and 10,000 objects), which are falsifiable outside the system and do not invoke self-citation chains or self-definitional loops for the central results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that demonstration-derived symbolic abstractions plus inductive generalisation yield a complete and sound high-level policy; no free parameters or invented physical entities are described in the abstract.

axioms (1)
  • domain assumption: Symbolic abstractions extracted from low-level demonstrations combined with inductive generalisation produce an effective high-level policy for long-horizon planning.
    Invoked to justify automatic construction of π^hl without manual engineering.
invented entities (1)
  • Bilevel policy (π^hl, π^ll) (no independent evidence)
    Purpose: to separate symbolic long-horizon planning from neural low-level execution.
    Core architectural construct introduced by the BISON system.



