Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
Pith reviewed 2026-05-20 18:29 UTC · model grok-4.3
The pith
Bilevel policies pair symbolic high-level abstraction with learned low-level control to solve long-horizon embodied tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Bilevel policies of the form (π^hl, π^ll) are constructed so that the high-level symbolic policy is derived from symbolic abstractions of low-level demonstrations combined with inductive generalisation; the low-level component is a neural policy learned from demonstrations. Realised in the BISON system, this structure generalises to long horizons and greater object counts than VLA or end-to-end methods and is more time- and memory-efficient, with high-level policies solving problems involving 10,000 relevant objects in under a minute when low-level execution is ignored.
What carries the argument
Bilevel policy (π^hl, π^ll) in which the high-level symbolic component operates over a symbolic world model abstracted from demonstrations and extended by inductive generalisation, while the low-level component is a neural controller trained by imitation.
If this is right
- The approach generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods.
- Training and inference are more time and memory efficient than baselines.
- High-level policies alone can solve problems with 10,000 relevant objects in under a minute.
Where Pith is reading between the lines
- The same abstraction-plus-generalisation route might be applied to other continuous domains such as navigation or multi-robot coordination.
- Learned symbolic policies could reduce dependence on hand-designed features across a wider range of planning problems.
- Physical-robot experiments would test whether the abstracted policies remain robust under sensor noise and execution uncertainty.
Load-bearing premise
Symbolic abstractions extracted from low-level demonstrations, combined with inductive generalisation, suffice to construct a high-level policy that preserves all planning-relevant structure without manual feature engineering or loss of critical constraints.
What would settle it
A case in which the derived high-level policy produces incomplete or invalid plans on a long-horizon task with many objects because critical constraints were lost during abstraction or generalisation would falsify the central claim.
Figures
read the original abstract
We tackle the challenge of building embodied AI agents that can reliably solve long-horizon planning problems. Imitation learning from demonstrations has shown itself to be effective in training robots to solve a diversity of complex tasks requiring fine motor control and manipulation over low-level (LL), continuous environments. Yet, it remains a difficult endeavour to generate long-horizon plans from imitation learning alone. In contrast, high-level (HL), symbolic abstractions facilitate efficient and interpretable long-horizon planning. We propose to combine the strengths of LL imitation learning for manipulation and control, and HL symbolic abstractions for long-horizon planning. We realise this idea via \emph{bilevel policies} of the form $(\pi^{\mathrm{hl}}, \pi^{\mathrm{ll}})$, consisting of a neural policy $\pi^{\mathrm{ll}}$ learned from LL demonstrations, and an HL symbolic policy $\pi^{\mathrm{hl}}$ that is constructed from symbolic abstractions of the LL demonstrations combined with inductive generalisation. We implement these ideas in the BISON system. Experiments on extended MetaWorld benchmarks demonstrate that BISON generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods, and is more time and memory efficient in training and inference. Notably, when ignoring LL execution, BISON's HL policies can solve HL problems with 10,000 relevant objects in under a minute. Project page: https://dillonzchen.github.io/bison
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes BISON, a bilevel policy framework for long-horizon embodied planning that pairs a neural low-level policy learned via imitation from demonstrations with a high-level symbolic policy constructed by extracting symbolic abstractions from those demonstrations and applying inductive generalization. Experiments on extended MetaWorld benchmarks claim that BISON generalizes to longer horizons and instances with more objects than VLA or end-to-end baselines, while being more efficient in training and inference; notably, the HL symbolic component solves problems with 10,000 objects in under a minute when LL execution is ignored.
Significance. If the central claims hold, the work would demonstrate a practical way to combine the scalability of symbolic planning with the robustness of neural control for manipulation, offering efficiency and generalization advantages on long-horizon tasks without manual feature engineering. The bilevel separation and use of inductive generalization over predicates are promising directions, though the absence of formal guarantees on abstraction fidelity limits immediate impact.
major comments (2)
- [Experiments] Experiments section: the abstract and reported benchmark gains provide no details on statistical tests, number of runs, variance, data exclusion rules, or exact baseline implementations; this leaves the generalization claims to long horizons and greater object counts resting on high-level description only.
- [Abstraction pipeline] Abstraction pipeline (Section 3): no formal soundness argument or exhaustive executability audit is described for the symbolic abstraction extraction step; without this, it is unclear whether continuous constraints such as grasp stability under varying masses or metric reachability are preserved when the HL policy is applied to the neural LL executor.
minor comments (2)
- [Preliminaries] Notation: the bilevel policy is written as (π^hl, π^ll); explicitly state whether π^hl is a policy over predicates or a planner that invokes the LL policy at each step.
- [Figures] Figure clarity: ensure that diagrams of the abstraction extraction and inductive generalization steps label all inputs/outputs and distinguish learned versus hand-specified components.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to improve clarity and rigor.
read point-by-point responses
-
Referee: [Experiments] Experiments section: the abstract and reported benchmark gains provide no details on statistical tests, number of runs, variance, data exclusion rules, or exact baseline implementations; this leaves the generalization claims to long horizons and greater object counts resting on high-level description only.
Authors: We agree that the current presentation of results lacks sufficient methodological detail for full reproducibility and assessment of the generalization claims. In the revised manuscript we will expand the Experiments section to report the number of independent runs (five random seeds per method and task), mean success rates with standard deviations, the statistical tests performed (paired t-tests with p-values), data exclusion rules (none applied beyond the standard success criterion of task completion within the horizon), and precise implementation details for all baselines including VLA and end-to-end architectures, training hyperparameters, and evaluation protocols. revision: yes
-
Referee: [Abstraction pipeline] Abstraction pipeline (Section 3): no formal soundness argument or exhaustive executability audit is described for the symbolic abstraction extraction step; without this, it is unclear whether continuous constraints such as grasp stability under varying masses or metric reachability are preserved when the HL policy is applied to the neural LL executor.
Authors: We acknowledge that the manuscript does not supply a formal soundness proof for the abstraction extraction procedure. The predicates are induced directly from successful low-level demonstration trajectories, and the neural low-level policy is trained via imitation learning to realize them in the continuous domain; task success rates in our benchmarks provide empirical evidence that grasp stability and reachability are handled adequately by the combined bilevel system. In the revision we will add an explicit discussion subsection in Section 3 that describes the extraction heuristics, reports observed failure modes related to continuous constraints, and states the absence of formal guarantees as a limitation. revision: partial
Circularity Check
No circularity: empirical claims rest on external benchmarks and independent comparisons
full rationale
The paper presents a system proposal for bilevel policies (neural LL policy from demonstrations plus HL symbolic policy from abstractions and inductive generalization) and supports its claims via experimental results on extended MetaWorld benchmarks against VLA and end-to-end baselines. No mathematical derivation chain, equations, or fitted parameters are described that reduce by construction to the paper's own inputs. Evaluation relies on external benchmark performance metrics (time, memory, generalization to long horizons and 10k objects), which are falsifiable outside the system and do not invoke self-citation chains or self-definitional loops for the central results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Symbolic abstractions extracted from low-level demonstrations combined with inductive generalisation produce an effective high-level policy for long-horizon planning.
invented entities (1)
-
Bilevel policy (π^hl, π^ll)
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We apply goal regression ... to extract a set of Condition → Action rules. We inductively generalise the resulting rules to produce compact and expressive HL policies consisting of first-order, condition-action rules
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
HL policies πhl consist of sets of first-order, condition-action rules ... with associated priority values related to goal proximity
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
David Abel, Dilip Arumugam, Lucas Lehnert, and Michael L. Littman. State abstractions for lifelong reinforcement learning. InICML, 2018
work page 2018
-
[2]
Ellis Hershkowitz, and Michael L
David Abel, D. Ellis Hershkowitz, and Michael L. Littman. Near optimal behavior via approximate state abstraction. InICML, 2016
work page 2016
-
[3]
Tenenbaum, Christopher Bates, and Samuel J
Zergham Ahmed, Joshua B. Tenenbaum, Christopher Bates, and Samuel J. Gershman. Synthe- sizing world models for bilevel planning.Trans. Mach. Learn. Res., 2025, 2025
work page 2025
-
[4]
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alexander Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J. Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-H...
work page 2022
-
[5]
Ankesh Anand, Evan Racah, Sherjil Ozair, Yoshua Bengio, Marc-Alexandre Côté, and R. De- von Hjelm. Unsupervised state representation learning in Atari. InNeurIPS, 2019
work page 2019
-
[6]
Classical planning in deep latent space: Bridging the subsymbolic-symbolic boundary
Masataro Asai and Alex Fukunaga. Classical planning in deep latent space: Bridging the subsymbolic-symbolic boundary. InAAAI, 2018. 10
work page 2018
-
[7]
Classical planning in deep latent space.J
Masataro Asai, Hiroshi Kajino, Alex Fukunaga, and Christian Muise. Classical planning in deep latent space.J. Artif. Intell. Res., 74:1599–1686, 2022
work page 2022
-
[8]
Ashay Athalye, Nishanth Kumar, Tom Silver, Yichao Liang, Jiuguang Wang, Tomás Lozano- Pérez, and Leslie Pack Kaelbling. From pixels to predicates: Learning symbolic world models via pretrained VLMs.IEEE Robotics and Automation Letters, 11(4):4002–4009, 2026
work page 2026
-
[9]
Downward refinement and the efficiency of hierarchical problem solving.Artif
Fahiem Bacchus and Qiang Yang. Downward refinement and the efficiency of hierarchical problem solving.Artif. Intell., 71(1):43–100, 1994
work page 1994
-
[10]
A survey on hierarchical planning - one abstract idea, many concrete realizations
Pascal Bercher, Ron Alford, and Daniel Höller. A survey on hierarchical planning - one abstract idea, many concrete realizations. InIJCAI, 2019
work page 2019
-
[11]
Planning for temporally extended goals in pure-past linear temporal logic.Artif
Luigi Bonassi, Giuseppe De Giacomo, Marco Favorito, Francesco Fuggitti, Alfonso Emilio Gerevini, and Enrico Scala. Planning for temporally extended goals in pure-past linear temporal logic.Artif. Intell., 348:104409, 2025
work page 2025
-
[12]
Learning to predict action feasibility for task and motion planning in 3d environments
Smail Ait Bouhsain, Rachid Alami, and Thierry Siméon. Learning to predict action feasibility for task and motion planning in 3d environments. InICRA, 2023
work page 2023
-
[13]
Learning geometric reasoning networks for robot task and motion planning
Smail Ait Bouhsain, Rachid Alami, and Thierry Siméon. Learning geometric reasoning networks for robot task and motion planning. InICLR, 2025
work page 2025
-
[14]
Using abstractions for decision-theoretic planning with time constraints
Craig Boutilier and Richard Dearden. Using abstractions for decision-theoretic planning with time constraints. InAAAI, 1994
work page 1994
-
[15]
Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, and Miroslav Pajic. Control synthesis from linear temporal logic specifications using model-free reinforcement learning. InICRA, 2020
work page 2020
-
[16]
MONet: Unsupervised Scene Decomposition and Representation
Christopher P. Burgess, Loïc Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matthew M. Botvinick, and Alexander Lerchner. MONet: Unsupervised scene decomposition and representation.CoRR, abs/1901.11390, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1901
-
[17]
Klassen, Richard Anthony Valenzano, and Sheila A
Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, and Sheila A. McIlraith. LTL and beyond: Formal languages for reward function specification in reinforcement learning. InIJCAI, 2019
work page 2019
-
[18]
Diffusion policy: Visuomotor policy learning via action diffusion
Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. InRSS, 2023
work page 2023
-
[19]
Guided search for task and motion plans using learned heuristics
Rohan Chitnis, Dylan Hadfield-Menell, Abhishek Gupta, Siddharth Srivastava, Edward Gro- shev, Christopher Lin, and Pieter Abbeel. Guided search for task and motion plans using learned heuristics. InICRA, 2016
work page 2016
-
[20]
Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling
Rohan Chitnis, Tom Silver, Joshua B. Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Learning neuro-symbolic relational transition models for bilevel planning. InIROS, 2022
work page 2022
-
[21]
Learning long-horizon action dependencies in sampling-based bilevel planning
Bartlomiej Cieslar, Leslie Pack Kaelbling, Tomás Lozano-Pérez, and Jorge Mendez-Mendez. Learning long-horizon action dependencies in sampling-based bilevel planning. InCoRL, 2024
work page 2024
-
[22]
Corrêa, Florian Pommerening, Malte Helmert, and Guillem Francès
Augusto B. Corrêa, Florian Pommerening, Malte Helmert, and Guillem Francès. Lifted successor generation using query optimization techniques. InICAPS, 2020
work page 2020
-
[23]
Long-horizon manipulation of unknown objects via task and motion planning with estimated affordances
Aidan Curtis, Xiaolin Fang, Leslie Pack Kaelbling, Tomás Lozano-Pérez, and Caelan Reed Garrett. Long-horizon manipulation of unknown objects via task and motion planning with estimated affordances. InICRA, 2022
work page 2022
-
[24]
Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling
Aidan Curtis, George Matheos, Nishad Gothoskar, Vikash Mansinghka, Joshua B. Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Partially observable task and motion planning with uncertainty and risk awareness. InRSS, 2024. 11
work page 2024
-
[25]
Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling
Aidan Curtis, Tom Silver, Joshua B. Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Discovering state and action abstractions for generalized task and motion planning. InAAAI, 2022
work page 2022
-
[26]
Neil T. Dantam, Zachary K. Kingston, Swarat Chaudhuri, and Lydia E. Kavraki. An incremen- tal constraint-based framework for task and motion planning.Int. J. Robotics Res., 37(10), 2018
work page 2018
-
[27]
Peter Dayan and Geoffrey E. Hinton. Feudal reinforcement learning. InNeurIPS, 1992
work page 1992
-
[28]
Thomas L. Dean and Robert Givan. Model minimization in Markov decision processes. In AAAI, 1997
work page 1997
-
[29]
Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition.J. Artif. Intell. Res., 13:227–303, 2000
work page 2000
-
[30]
Danny Driess, Jung-Su Ha, and Marc Toussaint. Deep visual reasoning: Learning to predict action sequences for task and motion planning from an initial scene image. InRSS, 2020
work page 2020
-
[31]
Oguz, Jung-Su Ha, and Marc Toussaint
Danny Driess, Ozgur S. Oguz, Jung-Su Ha, and Marc Toussaint. Deep visual heuristics: Learning feasibility of mixed-integer programs for manipulation planning. InICRA, 2020
work page 2020
-
[32]
Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, and Pete Florence. Palm-e: An embodied ...
work page 2023
-
[33]
Fast task planning with neuro-symbolic relaxation.IEEE Robotics Autom
Qiwei Du, Bowen Li, Yi Du, Shaoshu Su, Taimeng Fu, Zitong Zhan, Zhipeng Zhao, and Chen Wang. Fast task planning with neuro-symbolic relaxation.IEEE Robotics Autom. Lett., 11(3):3684–3691, 2026
work page 2026
-
[34]
Hwang, Soumya Sanyal, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, and Yejin Choi
Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Sean Welleck, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, and Yejin Choi. Faith and fate: Limits of transformers on compositionality. InNeurIPS, 2023
work page 2023
-
[35]
Kutluhan Erol, James A. Hendler, and Dana S. Nau. Complexity results for HTN planning. Ann. Math. Artif. Intell., 18(1):69–93, 1996
work page 1996
-
[36]
Richard Fikes, Peter E. Hart, and Nils J. Nilsson. Learning and executing generalized robot plans.Artif. Intell., 3(1-3):251–288, 1972
work page 1972
-
[37]
Richard Fikes and Nils J. Nilsson. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving.Artif. Intell., 2(3/4):189–208, 1971
work page 1971
- [38]
-
[39]
Holladay, Beomjoon Kim, Tom Silver, Leslie Pack Kaelbling, and Tomás Lozano-Pérez
Caelan Reed Garrett, Rohan Chitnis, Rachel M. Holladay, Beomjoon Kim, Tom Silver, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Integrated task and motion planning.Annu. Rev. Control. Robotics Auton. Syst., 4:265–293, 2021
work page 2021
-
[40]
Synthesis Lectures on Artificial Intelligence and Machine Learning
Hector Geffner and Blai Bonet.A Concise Introduction to Models and Methods for Automated Planning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2013
work page 2013
-
[41]
Nau, and Paolo Traverso.Automated planning - theory and practice
Malik Ghallab, Dana S. Nau, and Paolo Traverso.Automated planning - theory and practice. Elsevier, 2004
work page 2004
-
[42]
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. InICML, 2017
work page 2017
-
[43]
Fausto Giunchiglia and Toby Walsh. A theory of abstraction.Artif. Intell., 57:323–389, 1992. 12
work page 1992
-
[44]
C. Cordell Green. Application of theorem proving to problem solving. In Donald E. Walker and Lewis M. Norton, editors,IJCAI, pages 219–240. William Kaufmann, 1969
work page 1969
-
[45]
Exploiting first-order regression in inductive policy selection
Charles Gretton and Sylvie Thiébaux. Exploiting first-order regression in inductive policy selection. InUAI, 2004
work page 2004
-
[46]
INTERPRET: interactive predicate learning from language feedback for generalizable task planning
Muzhi Han, Yifeng Zhu, Song-Chun Zhu, Ying Nian Wu, and Yuke Zhu. INTERPRET: interactive predicate learning from language feedback for generalizable task planning. InRSS, 2024
work page 2024
-
[47]
Entity-centric reinforcement learning for object manipulation from pixels
Dan Haramati, Tal Daniel, and Aviv Tamar. Entity-centric reinforcement learning for object manipulation from pixels. InICLR, 2024
work page 2024
-
[48]
Patrik Haslum, Nir Lipovetzky, Daniele Magazzeni, and Christian Muise.An Introduction to the Planning Domain Definition Language. 2019
work page 2019
-
[49]
Milos Hauskrecht, Nicolas Meuleau, Leslie Pack Kaelbling, Thomas L. Dean, and Craig Boutilier. Hierarchical solution of Markov decision processes using macro-actions. InUAI, 1998
work page 1998
-
[50]
The fast downward planning system.J
Malte Helmert. The fast downward planning system.J. Artif. Intell. Res., 26:191–246, 2006
work page 2006
-
[51]
Denoising diffusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020
work page 2020
-
[52]
Training compute-optimal large language models
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, DDL Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. InNeurIPS, 2022
work page 2022
-
[53]
Look before you leap: Unveiling the power of gpt-4v in robotic vision- language planning,
Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, and Yang Gao. Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning.CoRR, abs/2311.17842, 2023
-
[54]
Automated planning domain inference for task and motion planning
Jinbang Huang, Allen Tao, Rozilyn Marco, Miroslav Bogdanovic, Jonathan Kelly, and Florian Shkurti. Automated planning domain inference for task and motion planning. InICRA, 2025
work page 2025
-
[55]
Inner monologue: Embodied reasoning through planning with language models
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Tomas Jackson, Noah Brown, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. Inner monologue: Embodied reasoning through planning with language models. InCoRL, 2022
work page 2022
- [56]
-
[57]
Physical Intelligence, Bo Ai, Ali Amin, Raichelle Aniceto, Ashwin Balakrishna, Greg Balke, Kevin Black, George Bokinsky, Shihao Cao, Thomas Charbonnier, Vedant Choudhary, Foster Collins, Ken Conley, Grace Connors, James Darpinian, Karan Dhabalia, Maitrayee Dhaka, Jared DiCarlo, Danny Driess, Michael Equi, Adnan Esmail, Yunhao Fang, Chelsea Finn, Catherine...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[58]
Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Manuel Y . Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsc...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[59]
Deepltl: Learning to efficiently satisfy complex LTL specifications for multi-task RL
Mathias Jackermeier and Alessandro Abate. Deepltl: Learning to efficiently satisfy complex LTL specifications for multi-task RL. InICLR, 2025
work page 2025
-
[60]
Autonomous learning of object- centric abstractions for high-level planning
Steven James, Benjamin Rosman, and George Konidaris. Autonomous learning of object- centric abstractions for high-level planning. InICLR, 2022
work page 2022
-
[61]
Position: Llms can’t plan, but can help planning in llm-modulo frameworks
Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Saldyt, and Anil Murthy. Position: Llms can’t plan, but can help planning in llm-modulo frameworks. InICML, 2024
work page 2024
-
[62]
Learning to search in task and motion planning with streams.IEEE Robotics Autom
Mohamed Khodeir, Ben Agro, and Florian Shkurti. Learning to search in task and motion planning with streams.IEEE Robotics Autom. Lett., 8(4):1983–1990, 2023
work page 1983
-
[63]
Learning to guide task and motion planning using score-space representation.Int
Beomjoon Kim, Zi Wang, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Learning to guide task and motion planning using score-space representation.Int. J. Robotics Res., 38(7), 2019
work page 2019
-
[64]
Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Paul Foster, Pannag R. Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn. Openvla: An open-source vision-language-action model. InCoRL, 2024
work page 2024
-
[65]
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InICLR, 2015
work page 2015
-
[66]
Kipf, Elise van der Pol, and Max Welling
Thomas N. Kipf, Elise van der Pol, and Max Welling. Contrastive learning of structured world models. InICLR, 2020
work page 2020
-
[67]
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. InICLR, 2017
work page 2017
-
[68]
On the necessity of abstraction.Current Opinion in Behavioral Sciences, 29:1–7, 2019
George Konidaris. On the necessity of abstraction.Current Opinion in Behavioral Sciences, 29:1–7, 2019. Artificial Intelligence
work page 2019
-
[69]
George Dimitri Konidaris and Andrew G. Barto. Efficient skill learning using abstraction selection. InIJCAI, 2009
work page 2009
-
[70]
Constructing symbolic representations for high-level planning
George Dimitri Konidaris, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Constructing symbolic representations for high-level planning. InAAAI, 2014
work page 2014
-
[71]
From skills to symbols: Learning symbolic representations for abstract high-level planning.J
George Dimitri Konidaris, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. From skills to symbols: Learning symbolic representations for abstract high-level planning.J. Artif. Intell. Res., 61:215–289, 2018
work page 2018
-
[72]
Richard E. Korf. Planning as search: A quantitative approach.Artif. Intell., 33(1):65–88, 1987
work page 1987
-
[73]
Nishanth Kumar, Tom Silver, Willie McClinton, Linfeng Zhao, Stephen Proulx, Tomás Lozano- Pérez, Leslie Pack Kaelbling, and Jennifer L. Barry. Practice makes perfect: Planning to learning skill parameter policies. InRSS, 2024
work page 2024
-
[74]
Hierarchical imitation and reinforcement learning
Hoang Minh Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, and Hal Daumé III. Hierarchical imitation and reinforcement learning. InICML, 2018
work page 2018
-
[75]
Andrew C. Li, Toryn Q. Klassen, Andrew Wang, Parand A. Alamdari, and Sheila A. McIlraith. Ground-compose-reinforce: Grounding language inagentic behaviours using limited data. In NeurIPS, 2025. 14
work page 2025
-
[76]
Bowen Li, Tom Silver, Sebastian A. Scherer, and Alexander G. Gray. Bilevel learning for bilevel planning. InRSS, 2025
work page 2025
-
[77]
Lihong Li, Thomas J. Walsh, and Michael L. Littman. Towards a unified theory of state abstraction for mdps. InInternational Symposium on Artificial Intelligence and Mathematics, 2006
work page 2006
-
[78]
Reinforcement learning with temporal logic rewards
Xiao Li, Cristian Ioan Vasile, and Calin Belta. Reinforcement learning with temporal logic rewards. InIROS, 2017
work page 2017
-
[79]
Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, Joshua B. Tenenbaum, Tom Silver, João F. Henriques, and Kevin Ellis. Visualpredicator: Learning abstract world models with neuro-symbolic predicates for robot planning. InICLR, 2025
work page 2025
-
[80]
Tenenbaum, Carl Edward Rasmussen, Adrian Weller, Zenna Tavares, Tom Silver, and Kevin Ellis
Yichao Liang, Dat Nguyen, Cambridge Yang, Tianyang Li, Joshua B. Tenenbaum, Carl Edward Rasmussen, Adrian Weller, Zenna Tavares, Tom Silver, and Kevin Ellis. Exopredicator: Learning abstract models of dynamic worlds for robot planning. InICLR, 2026
work page 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.