Differentiable Learning of Lifted Action Schemas for Classical Planning
Pith reviewed 2026-05-14 19:37 UTC · model grok-4.3
The pith
A differentiable neural network learns lifted action schemas from fully observed state traces by inferring unobserved action arguments from state changes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central contribution is a neural network architecture that learns lifted action schemas from state traces in which states are fully observed but action arguments are unobserved. By simultaneously identifying the arguments from observed state changes and learning the schemas, the architecture recovers the ground-truth structure across various planning domains.
What carries the argument
A differentiable neural network that processes sequences of states to infer action arguments and learn the corresponding lifted action schemas that add or delete atoms.
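The supervision signal available to such a network can be sketched as plain set differences between consecutive states. The following is an illustrative sketch only; the function name and blocks-world atoms are our own, not the paper's code:

```python
# Minimal sketch: the effect of each transition is the add/delete
# difference between consecutive fully observed states, with states
# represented as sets of ground atoms (tuples).

def transition_effects(state_before, state_after):
    """Return (adds, deletes): atoms that appeared and disappeared."""
    adds = state_after - state_before      # atoms that appeared
    deletes = state_before - state_after   # atoms that disappeared
    return adds, deletes

# Blocks-world style example: block a is moved from b onto the table.
s0 = {("on", "a", "b"), ("clear", "a")}
s1 = {("ontable", "a"), ("clear", "a"), ("clear", "b")}
adds, deletes = transition_effects(s0, s1)
# adds    == {("ontable", "a"), ("clear", "b")}
# deletes == {("on", "a", "b")}
```

The network's task, in this framing, is to explain sequences of such add/delete sets with a small set of lifted schemas and per-transition argument bindings.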
If this is right
- The learned schemas enable effective planning in large deterministic MDPs represented in STRIPS or PDDL.
- The architecture can be integrated into neuro-symbolic models for learning from more complex data like images.
- Recovery of ground-truth structure holds across various planning domains.
- The method shows robustness to observation noise.
- It handles a variation related to slot-based dynamics models.
Where Pith is reading between the lines
- If the schemas are learned perfectly, they could support structural generalization to infinitely many domain instances.
- This approach might serve as a building block for learning planning domains directly from sequences of images and action labels.
- Extensions could address cases with partial state observations or ambiguous action effects.
- Integration with reinforcement learning could allow learning relational dynamics from experience.
Load-bearing premise
States are fully observed as sets of atoms, and action arguments can be uniquely recovered from observed state changes without ambiguity or additional supervision.
What would settle it
Running the architecture on standard planning domains such as blocks world and observing whether the learned schemas match the ground-truth lifted representations exactly when action arguments are hidden.
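A minimal version of this check can be sketched in code: given a candidate lifted schema and an observed transition, enumerate argument bindings and keep those that reproduce the observed effects exactly. All names and the unstack-style fragment below are illustrative assumptions, not the paper's implementation:

```python
from itertools import permutations

# Sketch of the evaluation idea: a lifted schema "explains" a transition
# if some binding of its parameters to objects reproduces exactly the
# observed add/delete sets. Schema and atom names are hypothetical.

def ground(lifted_atoms, binding):
    """Substitute a parameter binding into a list of lifted atoms."""
    return {(pred,) + tuple(binding[v] for v in args)
            for pred, *args in lifted_atoms}

def explaining_bindings(params, add_schema, del_schema, objects, adds, deletes):
    """Return every binding of params to objects that matches the effects."""
    found = []
    for combo in permutations(objects, len(params)):
        binding = dict(zip(params, combo))
        if (ground(add_schema, binding) == adds
                and ground(del_schema, binding) == deletes):
            found.append(binding)
    return found

# Ground-truth-style "unstack to table" fragment (illustrative):
params = ("x", "y")
add_schema = [("ontable", "x"), ("clear", "y")]
del_schema = [("on", "x", "y")]
adds = {("ontable", "a"), ("clear", "b")}
deletes = {("on", "a", "b")}
bindings = explaining_bindings(params, add_schema, del_schema,
                               ["a", "b"], adds, deletes)
# A single surviving binding means the hidden arguments are recoverable.
```

If exactly one binding survives for every transition and the learned schemas are syntactically equivalent to the ground truth, the recoverability claim would be settled for that domain.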
Original abstract
Classical planners can effectively solve very large deterministic MDPs represented in STRIPS or PDDL where states are sets of atoms over objects and relations, and lifted action schemas add or delete these atoms. This compact representation yields strong search heuristics and provides an ideal setting for structural generalization, since lifted relations and action schemas give rise to infinitely many domain instances. A central challenge is to learn these relations and action schemas from data, and recent approaches have addressed this problem using different types of observations. In this work, we develop a novel neural network architecture for learning action schemas from traces where states are fully observed but action arguments are unobserved. The problem is a simplification but an important step towards learning planning domains from sequences of images and action labels, and we aim to solve this simplification in a nearly perfect manner. The challenge lies in learning the action schemas while simultaneously identifying the action arguments from observed state changes. Our approach yields a robust differentiable component that can then be integrated into larger neuro-symbolic models. We evaluate the architecture on various planning domains, where the learned lifted action schemas must recover the ground-truth structure. Additionally, we report experiments on robustness to observation noise and on a variation related to slot-based dynamics models.
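The STRIPS-style setting the abstract describes can be sketched as follows; the `Schema` class and the unstack-to-table fragment are hedged illustrations of the representation, not code from the paper:

```python
from dataclasses import dataclass

# Sketch of the representation described in the abstract: states are sets
# of ground atoms, and a lifted action schema adds/deletes atoms once its
# parameters are bound to objects. Names are illustrative.

@dataclass
class Schema:
    name: str
    params: tuple
    adds: list      # lifted atoms, e.g. ("clear", "y")
    deletes: list

    def apply(self, state, binding):
        """Return the successor state under a parameter binding."""
        g = lambda atom: (atom[0],) + tuple(binding[v] for v in atom[1:])
        return (state - {g(a) for a in self.deletes}) | {g(a) for a in self.adds}

unstack = Schema("unstack-to-table", ("x", "y"),
                 adds=[("ontable", "x"), ("clear", "y")],
                 deletes=[("on", "x", "y")])
s0 = {("on", "a", "b"), ("clear", "a")}
s1 = unstack.apply(s0, {"x": "a", "y": "b"})
# s1 == {("ontable", "a"), ("clear", "a"), ("clear", "b")}
```

Because a schema is lifted over its parameters, one `Schema` instance covers every instance of the domain, which is the source of the structural generalization the abstract emphasizes.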
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a differentiable neural architecture to learn lifted action schemas for classical planning from traces in which states are fully observed as sets of atoms but action arguments are latent. The model simultaneously discovers the schema predicates and infers the argument bindings that explain each observed state transition; the central claim is that this recovers ground-truth lifted schemas nearly perfectly across standard planning domains while remaining robust to observation noise.
Significance. If the recoverability result holds under the stated assumptions, the work supplies a modular, differentiable primitive that can be embedded in larger neuro-symbolic planners, directly addressing the long-standing gap between perceptual input and compact STRIPS-style representations.
Major comments (2)
- [Method and Experimental Evaluation] The central claim that the architecture recovers ground-truth structure 'nearly perfectly' rests on the premise that each observed transition admits a unique binding of the learned schema parameters to the observed objects. No formal argument is given that the chosen neural parameterization or loss eliminates symmetries (e.g., identical effects produced by distinct bindings under commutative actions or symmetric objects).
- [Experimental Evaluation] The experimental section reports high recovery rates but does not include ablations that inject controlled ambiguity (partial observability, symmetric predicates, or multiple consistent bindings) and measure the resulting degradation in schema fidelity; without such tests the robustness claim remains under-supported.
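The symmetry concern in the first comment can be made concrete with a toy commutative effect, where two distinct bindings reproduce identical add sets (all predicate and object names hypothetical):

```python
from itertools import permutations

# Illustration of the symmetry objection: for a commutative effect such
# as an undirected "connected" relation, a single transition cannot
# identify the action arguments, because swapped bindings match equally.

def ground(atoms, binding):
    return {(p,) + tuple(binding[v] for v in rest) for p, *rest in atoms}

add_schema = [("connected", "x", "y"), ("connected", "y", "x")]
adds = {("connected", "a", "b"), ("connected", "b", "a")}

consistent = [dict(zip(("x", "y"), combo))
              for combo in permutations(["a", "b"], 2)
              if ground(add_schema, dict(zip(("x", "y"), combo))) == adds]
# Both {x: a, y: b} and {x: b, y: a} reproduce the observed adds,
# so the argument binding is ambiguous on this transition alone.
```

Such cases are exactly where the paper's "unique binding" premise can fail, which is why the referee asks for controlled ablations rather than only aggregate recovery rates.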
Minor comments (1)
- Notation for the slot-based dynamics variant and the precise form of the reconstruction loss could be clarified with an additional diagram or pseudocode block.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address the major concerns point-by-point below, providing clarifications where possible and committing to revisions that strengthen the empirical support and discussion of the method.
Point-by-point responses
Referee: [Method and Experimental Evaluation] The central claim that the architecture recovers ground-truth structure 'nearly perfectly' rests on the premise that each observed transition admits a unique binding of the learned schema parameters to the observed objects. No formal argument is given that the chosen neural parameterization or loss eliminates symmetries (e.g., identical effects produced by distinct bindings under commutative actions or symmetric objects).
Authors: We acknowledge that the manuscript does not contain a formal proof that the neural parameterization and loss guarantee unique bindings in the presence of symmetries. The architecture relies on end-to-end optimization of a reconstruction loss over state transitions, which empirically selects the ground-truth schemas and bindings in the evaluated domains; the joint inference of schemas and argument bindings appears to break many symmetries because incorrect bindings produce inconsistent effects across multiple transitions. However, we agree this is an informal observation rather than a rigorous argument. In the revision we will add a dedicated discussion subsection that (i) explicitly identifies the symmetry issue, (ii) explains why the current loss and parameterization tend to avoid it in practice, and (iii) notes the conditions under which multiple bindings could remain consistent. revision: partial
Referee: [Experimental Evaluation] The experimental section reports high recovery rates but does not include ablations that inject controlled ambiguity (partial observability, symmetric predicates, or multiple consistent bindings) and measure the resulting degradation in schema fidelity; without such tests the robustness claim remains under-supported.
Authors: We agree that the current experimental section would be strengthened by controlled ablations that systematically introduce ambiguity. The existing noise-robustness experiments already vary observation noise, but they do not isolate symmetric predicates or multiple consistent bindings. We will add two new ablation studies in the revised manuscript: (1) domains containing commutative actions and symmetric objects, measuring schema recovery accuracy as a function of the degree of symmetry, and (2) a controlled test that forces the model to choose among multiple bindings that produce identical effects on a subset of transitions. These results will be reported alongside the existing tables to directly quantify degradation in schema fidelity. revision: yes
Circularity Check
No circularity: learning driven by external traces and ground-truth recovery
Full rationale
The paper introduces a neural architecture to learn lifted action schemas from fully-observed state traces while recovering unobserved action arguments. Success is measured by fidelity to externally supplied ground-truth schemas on standard planning domains, with additional robustness experiments. No equation or claim reduces by construction to a fitted parameter, self-citation, or renamed input; the inverse problem of argument binding is solved via differentiable optimization against observed add/delete effects rather than by definitional fiat. The derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: states are fully observed as sets of atoms over objects and relations.
- Domain assumption: action arguments can be identified from observed state changes.