Differentiable Learning of Lifted Action Schemas for Classical Planning

Hector Geffner; Jakob Elias Gebler; Jonas Reiter

arxiv: 2605.13282 · v2 · pith:HMXUOUHUnew · submitted 2026-05-13 · 💻 cs.AI · cs.LG

Pith reviewed 2026-05-14 19:37 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords lifted action schemas · classical planning · differentiable learning · neuro-symbolic AI · STRIPS domains · action schema recovery · planning domain learning

The pith

A differentiable neural network learns lifted action schemas from fully observed state traces by inferring unobserved action arguments from state changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a neural network architecture to learn the lifted action schemas that classical planners use to represent actions compactly in STRIPS-style domains. These schemas are learned from traces of states that are fully observed as sets of atoms, but without labels for which objects participate in each action. The key challenge is to jointly identify the action arguments and learn the schemas in a way that recovers the ground-truth structure exactly on standard planning domains. This provides a differentiable component suitable for integration into larger neuro-symbolic planning systems. The approach is evaluated on multiple domains and shows robustness to some observation noise.

Core claim

The central discovery is a novel neural network architecture that learns lifted action schemas from state traces where states are fully observed but action arguments are unobserved, by simultaneously identifying the arguments from state changes and learning the schemas such that the ground-truth structure is recovered in various planning domains.

What carries the argument

A differentiable neural network that processes sequences of states to infer action arguments and learn the corresponding lifted action schemas that add or delete atoms.

If this is right

The learned schemas enable effective planning in large deterministic MDPs represented in STRIPS or PDDL.
The architecture can be integrated into neuro-symbolic models for learning from more complex data like images.
Recovery of ground-truth structure holds across various planning domains.
The method shows robustness to observation noise.
It handles variations related to slot-based dynamics models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the schemas are learned perfectly, they could support structural generalization to infinitely many domain instances.
This approach might serve as a building block for learning planning domains directly from sequences of images and action labels.
Extensions could address cases with partial state observations or ambiguous action effects.
Integration with reinforcement learning could allow learning relational dynamics from experience.

Load-bearing premise

States are fully observed as sets of atoms and action arguments can be uniquely recovered from observed state changes without ambiguity or additional supervision.
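
To make the premise concrete, the following minimal sketch enumerates every argument binding that explains an observed state change exactly. It is a brute-force illustration, not the paper's neural method. The schemas `MOVE` and `LINK`, the predicates, and the helper `consistent_bindings` are all hypothetical, and the check assumes that every add and delete effect is visible in the state difference.

```python
from itertools import permutations

# Hypothetical schemas; literals are (predicate, tuple of parameter names).
MOVE = {"params": ("?x", "?y"),
        "add": [("at", ("?x", "?y"))],
        "del": [("free", ("?x",))]}
# LINK treats both arguments alike, so its effects are symmetric.
LINK = {"params": ("?x", "?y"),
        "add": [("linked", ("?x", "?y")), ("linked", ("?y", "?x"))],
        "del": [("free", ("?x",)), ("free", ("?y",))]}

def ground(literals, binding):
    """Instantiate lifted literals with a parameter -> object binding."""
    return {(pred, tuple(binding[a] for a in args)) for pred, args in literals}

def consistent_bindings(schema, objects, before, after):
    """All bindings whose grounded effects equal the observed state change."""
    added, deleted = after - before, before - after
    found = []
    for combo in permutations(objects, len(schema["params"])):
        binding = dict(zip(schema["params"], combo))
        if (ground(schema["add"], binding) == added
                and ground(schema["del"], binding) == deleted):
            found.append(combo)
    return found

objects = ["a", "b", "c"]
before = frozenset({("free", ("a",)), ("free", ("b",)), ("free", ("c",))})
after_move = frozenset({("free", ("b",)), ("free", ("c",)), ("at", ("a", "b"))})
after_link = frozenset({("free", ("c",)), ("linked", ("a", "b")),
                        ("linked", ("b", "a"))})

unique = consistent_bindings(MOVE, objects, before, after_move)
ambiguous = consistent_bindings(LINK, objects, before, after_link)
```

In this toy case `MOVE` admits one binding while `LINK` admits two, which is the kind of symmetry under which the premise of unique recovery would fail.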

What would settle it

Running the architecture on standard planning domains such as blocks world and observing whether the learned schemas match the ground-truth lifted representations exactly when action arguments are hidden.
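
Figure 3 shows that learned operators use different parameter names (?x1, ?x3) from the originals (?bm, ?bt), so an exact-match test has to hold up to a renaming of parameters. The sketch below shows one way such a check could be written; it is an assumption about how the comparison might be done, not the paper's evaluation code. The schemas `TRUTH`, `LEARNED`, and `BROKEN` are hypothetical textbook-style examples rather than the Blocks-3 operators from the figure.

```python
from itertools import permutations

def rename(literals, mapping):
    """Rewrite parameter names in a list of (predicate, args) literals."""
    return frozenset((pred, tuple(mapping[a] for a in args))
                     for pred, args in literals)

def same_schema(truth, learned):
    """True if some renaming of the learned parameters reproduces the truth."""
    if len(truth["params"]) != len(learned["params"]):
        return False
    for perm in permutations(truth["params"]):
        mapping = dict(zip(learned["params"], perm))
        if all(rename(learned[key], mapping) == frozenset(truth[key])
               for key in ("pre", "add", "del")):
            return True
    return False

# Hypothetical ground-truth schema.
TRUTH = {"params": ("?x", "?y"),
         "pre": [("holding", ("?x",)), ("clear", ("?y",))],
         "add": [("on", ("?x", "?y")), ("clear", ("?x",)), ("handempty", ())],
         "del": [("holding", ("?x",)), ("clear", ("?y",))]}
# Same structure with renamed and reordered parameters.
LEARNED = {"params": ("?x1", "?x3"),
           "pre": [("clear", ("?x1",)), ("holding", ("?x3",))],
           "add": [("handempty", ()), ("on", ("?x3", "?x1")), ("clear", ("?x3",))],
           "del": [("clear", ("?x1",)), ("holding", ("?x3",))]}
# A learned schema that misses one delete effect.
BROKEN = {**LEARNED, "del": [("holding", ("?x3",))]}

match = same_schema(TRUTH, LEARNED)
mismatch = same_schema(TRUTH, BROKEN)
```

Trying every permutation is exponential in the number of parameters, which is acceptable for the small arities typical of planning schemas.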

Figures

Figures reproduced from arXiv: 2605.13282 by Hector Geffner, Jakob Elias Gebler, Jonas Reiter.

**Figure 3.** Original Blocks-3 operators (left) versus the operators learned by our model (right). Common literals are shown in black; literals that appear on only one side are highlighted. Excerpt as truncated at the source:
Original: (:action stack :parameters (?bm ?bt) :precondition (and (clear ?bm) (clear ?bt) (on-table ?bm) (not (eq ?bm ?bt)…
Learned: (:action learned_stack :parameters (?x1 ?x3) :precondition (and (clear ?x1) (clear ?x3) (on-table ?x1) …

Original abstract

Classical planners can effectively solve very large deterministic MDPs represented in STRIPS or PDDL where states are sets of atoms over objects and relations, and lifted action schemas add or delete these atoms. This compact representation yields strong search heuristics and provides an ideal setting for structural generalization, since lifted relations and action schemas give rise to infinitely many domain instances. A central challenge is to learn these relations and action schemas from data, and recent approaches have addressed this problem using different types of observations. In this work, we develop a novel neural network architecture for learning action schemas from traces where states are fully observed but action arguments are unobserved. The problem is a simplification but an important step towards learning planning domains from sequences of images and action labels, and we aim to solve this simplification in a nearly perfect manner. The challenge lies in learning the action schemas while simultaneously identifying the action arguments from observed state changes. Our approach yields a robust differentiable component that can then be integrated into larger neuro-symbolic models. We evaluate the architecture on various planning domains, where the learned lifted action schemas must recover the ground-truth structure. Additionally, we report experiments on robustness to observation noise and on a variation related to slot-based dynamics models.
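
As a concrete illustration of the representation the abstract describes, the sketch below encodes a state as a set of ground atoms and a lifted schema as lists of precondition, add, and delete literals. The `STACK` schema is a hypothetical textbook-style Blocks World operator, not one taken from the paper, and `apply_schema` is an illustrative helper name.

```python
# A lifted schema: parameter names plus literals of the form
# (predicate, tuple of parameter names). Hypothetical example schema.
STACK = {
    "name": "stack",
    "params": ("?x", "?y"),
    "pre": [("holding", ("?x",)), ("clear", ("?y",))],
    "add": [("on", ("?x", "?y")), ("clear", ("?x",)), ("handempty", ())],
    "del": [("holding", ("?x",)), ("clear", ("?y",))],
}

def ground(literals, binding):
    """Instantiate lifted literals with a parameter -> object binding."""
    return {(pred, tuple(binding[a] for a in args)) for pred, args in literals}

def apply_schema(schema, objects, state):
    """Return the successor state, or None if a precondition is missing."""
    binding = dict(zip(schema["params"], objects))
    if not ground(schema["pre"], binding) <= state:
        return None
    # STRIPS semantics: remove the delete effects, then add the add effects.
    return (state - ground(schema["del"], binding)) | ground(schema["add"], binding)

state = frozenset({("holding", ("a",)), ("clear", ("b",)), ("on-table", ("b",))})
next_state = apply_schema(STACK, ("a", "b"), state)
not_applicable = apply_schema(STACK, ("b", "a"), state)
```

The same schema grounds over any set of objects, which is what lets one lifted description cover arbitrarily many domain instances. In the learning setting of the paper, the objects passed to `apply_schema` are the part that is not observed.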

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a differentiable architecture for learning lifted schemas from fully observed states with latent action arguments, recovering ground-truth structure on standard domains while leaving the uniqueness of those arguments lightly tested.

The core idea is to train a neural net that simultaneously discovers the lifted schema and the hidden object bindings that explain each observed state transition. This is new in the combination of full differentiability with joint inference over schemas and arguments from traces alone. The experiments recover ground-truth schemas across several planning domains and include noise-robustness checks plus a slot-based variant, which is a reasonable extension. Those results are the strongest part of the work and show the approach can serve as a modular component for larger neuro-symbolic planners. The main soft spot is the recoverability premise. When two different bindings produce identical add/delete effects, the inverse problem is under-determined, yet the paper offers no formal argument that its parameterization rules this out and no ablation that injects controlled symmetry or partial observability. The reported near-perfect recovery therefore rests on the domains tested having unique bindings. This is a real but not fatal gap; it mainly means the generality claim needs more evidence. The paper is for people working on neuro-symbolic planning who need a trainable schema learner. It shows clear engagement with the problem and enough experimental grounding to deserve referee time rather than a desk reject.

Referee Report

2 major / 1 minor

Summary. The paper introduces a differentiable neural architecture to learn lifted action schemas for classical planning from traces in which states are fully observed as sets of atoms but action arguments are latent. The model simultaneously discovers the schema predicates and infers the argument bindings that explain each observed state transition; the central claim is that this recovers ground-truth lifted schemas nearly perfectly across standard planning domains while remaining robust to observation noise.

Significance. If the recoverability result holds under the stated assumptions, the work supplies a modular, differentiable primitive that can be embedded in larger neuro-symbolic planners, directly addressing the long-standing gap between perceptual input and compact STRIPS-style representations.

major comments (2)

[Method and Experimental Evaluation] The central claim that the architecture recovers ground-truth structure 'nearly perfectly' rests on the premise that each observed transition admits a unique binding of the learned schema parameters to the observed objects. No formal argument is given that the chosen neural parameterization or loss eliminates symmetries (e.g., identical effects produced by distinct bindings under commutative actions or symmetric objects).
[Experimental Evaluation] The experimental section reports high recovery rates but includes no ablations that inject controlled ambiguity (partial observability, symmetric predicates, or multiple consistent bindings) and measure the resulting degradation in schema fidelity; without such tests the robustness claim remains under-supported.

minor comments (1)

Notation for the slot-based dynamics variant and the precise form of the reconstruction loss could be clarified with an additional diagram or pseudocode block.
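
The form of the paper's reconstruction loss is not given on this page. Purely as an illustration of what a pseudocode block for such a loss might look like, the sketch below scores a soft (softmax) binding of schema parameters to objects against the observed add and delete sets using binary cross-entropy. Every name and design choice here (`reconstruction_loss`, per-parameter logits, the clamped sum of products, ignoring atoms the schema cannot produce) is an assumption made for illustration and should not be read as the authors' formulation.

```python
import math
from itertools import product

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def effect_probs(literals, weights, objects):
    """Soft probability of each ground atom; literals use parameter indices."""
    probs = {}
    for pred, slots in literals:
        for combo in product(range(len(objects)), repeat=len(slots)):
            p = 1.0
            for slot, obj in zip(slots, combo):
                p *= weights[slot][obj]  # weight of binding this slot to obj
            atom = (pred, tuple(objects[o] for o in combo))
            probs[atom] = min(1.0, probs.get(atom, 0.0) + p)
    return probs

def bce(probs, observed, eps=1e-9):
    """Binary cross-entropy over the atoms the schema could produce."""
    total = 0.0
    for atom, p in probs.items():
        y = 1.0 if atom in observed else 0.0
        total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    return total

def reconstruction_loss(schema, logits, objects, before, after):
    weights = [softmax(row) for row in logits]  # one distribution per parameter
    added, deleted = after - before, before - after
    return (bce(effect_probs(schema["add"], weights, objects), added)
            + bce(effect_probs(schema["del"], weights, objects), deleted))

# Hypothetical two-parameter schema and a single transition.
schema = {"add": [("on", (0, 1))], "del": [("holding", (0,))]}
objects = ["a", "b"]
before = frozenset({("holding", ("a",)), ("clear", ("b",))})
after = frozenset({("clear", ("b",)), ("on", ("a", "b"))})

loss_right = reconstruction_loss(schema, [[5.0, -5.0], [-5.0, 5.0]], objects, before, after)
loss_wrong = reconstruction_loss(schema, [[-5.0, 5.0], [5.0, -5.0]], objects, before, after)
```

In this toy setup the logits that bind parameter 0 to `a` and parameter 1 to `b` yield a lower loss than the swapped binding, which is the signal a gradient-based learner would follow.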

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address the major concerns point-by-point below, providing clarifications where possible and committing to revisions that strengthen the empirical support and discussion of the method.

Referee: [Method and Experimental Evaluation] The central claim that the architecture recovers ground-truth structure 'nearly perfectly' rests on the premise that each observed transition admits a unique binding of the learned schema parameters to the observed objects. No formal argument is given that the chosen neural parameterization or loss eliminates symmetries (e.g., identical effects produced by distinct bindings under commutative actions or symmetric objects).

Authors: We acknowledge that the manuscript does not contain a formal proof that the neural parameterization and loss guarantee unique bindings in the presence of symmetries. The architecture relies on end-to-end optimization of a reconstruction loss over state transitions, which empirically selects the ground-truth schemas and bindings in the evaluated domains; the joint inference of schemas and argument bindings appears to break many symmetries because incorrect bindings produce inconsistent effects across multiple transitions. However, we agree this is an informal observation rather than a rigorous argument. In the revision we will add a dedicated discussion subsection that (i) explicitly identifies the symmetry issue, (ii) explains why the current loss and parameterization tend to avoid it in practice, and (iii) notes the conditions under which multiple bindings could remain consistent. revision: partial
Referee: [Experimental Evaluation] The experimental section reports high recovery rates but includes no ablations that inject controlled ambiguity (partial observability, symmetric predicates, or multiple consistent bindings) and measure the resulting degradation in schema fidelity; without such tests the robustness claim remains under-supported.

Authors: We agree that the current experimental section would be strengthened by controlled ablations that systematically introduce ambiguity. The existing noise-robustness experiments already vary observation noise, but they do not isolate symmetric predicates or multiple consistent bindings. We will add two new ablation studies in the revised manuscript: (1) domains containing commutative actions and symmetric objects, measuring schema recovery accuracy as a function of the degree of symmetry, and (2) a controlled test that forces the model to choose among multiple bindings that produce identical effects on a subset of transitions. These results will be reported alongside the existing tables to directly quantify degradation in schema fidelity. revision: yes

Circularity Check

0 steps flagged

No circularity: learning driven by external traces and ground-truth recovery

The paper introduces a neural architecture to learn lifted action schemas from fully-observed state traces while recovering unobserved action arguments. Success is measured by fidelity to externally supplied ground-truth schemas on standard planning domains, with additional robustness experiments. No equation or claim reduces by construction to a fitted parameter, self-citation, or renamed input; the inverse problem of argument binding is solved via differentiable optimization against observed add/delete effects rather than by definitional fiat. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard planning assumptions about fully observed states and the existence of a finite set of lifted schemas that explain all observed transitions.

axioms (2)

domain assumption: States are fully observed as sets of atoms over objects and relations.
Stated in the problem setup as the input to the learning problem.
domain assumption: Action arguments can be identified from observed state changes.
Central to the joint learning task described in the abstract.

pith-pipeline@v0.9.0 · 5511 in / 1176 out tokens · 41019 ms · 2026-05-14T19:37:18.570924+00:00 · methodology
