Learning High-Level Planning Symbols from Intrinsically Motivated Experience

Angelo Oddi; Emilio Cartoni; Gabriele Sartor; Gianluca Baldassarre; Riccardo Rasconi; Vieri Giuliano Santucci

arxiv: 1907.08313 · v1 · pith:XFHETZRSnew · submitted 2019-07-18 · 💻 cs.AI · cs.LG

Pith reviewed 2026-05-24 19:29 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords options learning · PDDL domain generation · symbolic planning · intrinsic motivation · autonomous abstraction · symbol grounding · hierarchical reinforcement learning · robot interaction

The pith

An architecture acquires options autonomously via open-ended learning and then constructs a usable PDDL domain from them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an architecture that first lets an agent acquire options on its own through intrinsic motivation and open-ended interaction. These options are then turned into symbols and operators that form a complete PDDL domain. A standard planner can thereafter solve goals supplied by a user. The work begins from an earlier abstraction method tested on humanoid-robot benchmarks and extends it by examining how different classifiers ground the symbols and reduce abstraction errors observed in those domains.

Core claim

The architecture first acquires options in a fully autonomous fashion on the basis of open-ended learning, then builds a PDDL domain based on symbols and operators that can be used to accomplish user-defined goals through a standard PDDL planner. It starts from an existing procedure on benchmark domains involving a humanoid robot interacting with objects, identifies critical aspects of the information abstraction process, and mitigates them by analysing the type of classifiers that produce suitable symbol grounding.

What carries the argument

The automatic abstraction procedure that converts learned options into PDDL symbols and operators, extended through classifier selection for improved symbol grounding.
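To make the input and output of this step concrete, here is a minimal Python sketch. It assumes the standard definition of an option from Sutton et al. (1999), namely an initiation set, a policy, and a termination condition, and it assumes a purely propositional operator format. The names `Option`, `Operator`, `to_pddl_action`, and the symbol names are hypothetical and are not taken from the paper; the sketch only renders an already-abstracted operator as PDDL text and does not reproduce the authors' abstraction procedure.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

State = Tuple[int, ...]  # assumed encoding: one 0/1 entry per bulb


@dataclass(frozen=True)
class Option:
    """An option as a triple (I, pi, beta), following Sutton et al. (1999)."""
    name: str
    initiation: Callable[[State], bool]    # I: can the option start in this state?
    policy: Callable[[State], int]         # pi: low-level action chosen in a state
    termination: Callable[[State], float]  # beta: probability of stopping in a state


@dataclass(frozen=True)
class Operator:
    """Hypothetical symbolic counterpart of one option."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str] = frozenset()


def to_pddl_action(op: Operator) -> str:
    """Render a propositional operator as a PDDL :action block."""
    def conj(literals: List[str]) -> str:
        return "(and " + " ".join(literals) + ")" if literals else "(and)"

    pre = [f"({s})" for s in sorted(op.preconditions)]
    eff = [f"({s})" for s in sorted(op.add_effects)]
    eff += [f"(not ({s}))" for s in sorted(op.del_effects)]
    return (
        f"(:action {op.name}\n"
        f"  :parameters ()\n"
        f"  :precondition {conj(pre)}\n"
        f"  :effect {conj(eff)})"
    )


# Illustrative operator with made-up symbol names.
example = Operator(
    name="o1",
    preconditions=frozenset({"symbol_1"}),
    add_effects=frozenset({"symbol_3"}),
    del_effects=frozenset({"symbol_2"}),
)
print(to_pddl_action(example))
```

Sorting the symbol names keeps the generated text deterministic, which makes two generated domains easier to compare line by line.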

If this is right

User goals can be solved by an off-the-shelf PDDL planner without any expert-supplied domain description.
Options are obtained without human supervision or intervention.
Suitable classifiers mitigate abstraction difficulties that appear when options are turned into planning symbols.
The resulting PDDL domain supports planning in robot interaction settings where states change through direct contact with objects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same pipeline could be tested on non-robot environments to check whether classifier choice remains the decisive factor for abstraction quality.
If the autonomous option acquisition scales to longer horizons, the generated PDDL domains might support planning tasks that current hand-crafted domains cannot express.
The approach suggests a route for closing the loop between intrinsic-motivation learning and symbolic planning without manual symbol design.

Load-bearing premise

The classifiers examined produce symbol grounding that reduces the information abstraction problems observed in the benchmark domains.

What would settle it

Running the extended procedure on the same humanoid-robot benchmarks and finding that the chosen classifiers do not reduce the observed abstraction problems would falsify the central claim.
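One way such a check could be made concrete is to compare each operator's declared effects with the end states actually observed after running the corresponding option. The sketch below is an illustration under stated assumptions, not the paper's evaluation protocol: `effect_mismatches`, the `Grounding` type, and the two-bulb state encoding are hypothetical. The example mimics a case in which a symbol that should be a positive effect is listed as a negative one.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

State = Tuple[int, ...]              # assumed encoding: one 0/1 entry per bulb
Grounding = Callable[[State], bool]  # hypothetical: True when the symbol holds in a state


def effect_mismatches(
    add_effects: Set[str],
    del_effects: Set[str],
    groundings: Dict[str, Grounding],
    end_states: Iterable[State],
) -> List[str]:
    """List declared effects that at least one observed end state contradicts."""
    states = list(end_states)
    problems = []
    for sym in sorted(add_effects):
        # A positive effect should hold in every observed end state.
        if not all(groundings[sym](s) for s in states):
            problems.append(f"+{sym}")
    for sym in sorted(del_effects):
        # A negative effect should hold in no observed end state.
        if any(groundings[sym](s) for s in states):
            problems.append(f"-{sym}")
    return problems


# Two bulbs; the option was observed to leave both bulbs on.
groundings = {
    "sigma1": lambda s: s[1] == 1,  # second bulb on
    "sigma3": lambda s: s[0] == 1,  # first bulb on
}
observed_end_states = [(1, 1)]

# Operator that wrongly lists sigma1 as a negative effect.
wrong = effect_mismatches({"sigma3"}, {"sigma1"}, groundings, observed_end_states)
# Operator that lists both symbols as positive effects.
right = effect_mismatches({"sigma1", "sigma3"}, set(), groundings, observed_end_states)
print(wrong, right)
```

Counting such mismatches per classifier on the same benchmark transitions would give one possible quantitative basis for the comparison.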

Figures

Figures reproduced from arXiv: 1907.08313 by Angelo Oddi, Emilio Cartoni, Gabriele Sartor, Gianluca Baldassarre, Riccardo Rasconi, Vieri Giuliano Santucci.

**Figure 1.** Graphical representation of the two classifiers Cl(I) and Cl(E) for each option, trained from the data obtained through the agent's interactions with the six-bulbs environment. For example, the o4 row, Cl(I) column represents the Initiation Set classifier of option o4.

**Figure 2.** Running example: factors computation.

**Figure 3.** A visualization of the projection operation.

**Figure 4.** Running example: the produced symbols. Due to the simplicity of the selected example, each produced symbol in the figure is identical to one of the effect set classifiers Cl(E).

**Figure 5.** Reset PDDL domain using C4.5.

**Figure 6.** Projections on three different set representations.

**Figure 7.** Reset PDDL domain using IntM.

**Figure 8.** Scenario Negative: C4.5 classifier. From the accompanying text: the operator adds symbol σ3 as a positive effect (i.e., bulb b1 on), but surprisingly includes symbol σ1 as a negative effect (i.e., switches b2 off), while σ1 should be included among the positive effects.

**Figure 9.** Scenario Unreachable: C4.5 classifier.

**Figure 10.** Scenario Unreachable: IntM classifier. From the conclusions: the paper connects a goal-discovering and skill-learning robotic architecture (GRAIL) [Santucci et al., 2016] to the abstraction procedure proposed in [Konidaris et al., 2018].
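Figure 2 refers to a factors computation step. In the abstraction approach the paper builds on [Konidaris et al., 2018], a factor is, as we understand it, a group of state variables that are changed by the same set of options. The Python sketch below illustrates that idea under this assumption; the function names `option_masks` and `compute_factors`, the three-variable state, and the transitions are made up for the example and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

State = Tuple[int, ...]                # assumed encoding: one 0/1 entry per bulb
Transition = Tuple[State, str, State]  # (state before, option name, state after)


def option_masks(transitions: Iterable[Transition]) -> Dict[str, Set[int]]:
    """For each option, collect the indices of the variables it was seen to change."""
    masks: Dict[str, Set[int]] = defaultdict(set)
    for before, option, after in transitions:
        masks[option].update(
            i for i, (b, a) in enumerate(zip(before, after)) if b != a
        )
    return dict(masks)


def compute_factors(
    masks: Dict[str, Set[int]], n_vars: int
) -> List[Tuple[FrozenSet[str], List[int]]]:
    """Group variables that are modified by exactly the same set of options."""
    by_modifiers: Dict[FrozenSet[str], List[int]] = defaultdict(list)
    for var in range(n_vars):
        modifiers = frozenset(o for o, mask in masks.items() if var in mask)
        by_modifiers[modifiers].append(var)
    # Sort by variable indices so the output order is deterministic.
    return sorted(by_modifiers.items(), key=lambda kv: kv[1])


# Made-up data: o1 switches variable 0, o2 switches variables 1 and 2 together.
transitions = [
    ((0, 0, 0), "o1", (1, 0, 0)),
    ((1, 0, 0), "o2", (1, 1, 1)),
    ((0, 0, 0), "o2", (0, 1, 1)),
]
masks = option_masks(transitions)
factors = compute_factors(masks, n_vars=3)
print(factors)
```

In this toy data, variables 1 and 2 end up in the same factor because only `o2` ever changes them, while variable 0 forms its own factor.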

Original abstract

In symbolic planning systems, the knowledge on the domain is commonly provided by an expert. Recently, an automatic abstraction procedure has been proposed in the literature to create a Planning Domain Definition Language (PDDL) representation, which is the most widely used input format for most off-the-shelf automated planners, starting from "options", a data structure used to represent actions within the hierarchical reinforcement learning framework. We propose an architecture that potentially removes the need for human intervention. In particular, the architecture first acquires options in a fully autonomous fashion on the basis of open-ended learning, then builds a PDDL domain based on symbols and operators that can be used to accomplish user-defined goals through a standard PDDL planner. We start from an implementation of the above-mentioned procedure tested on a set of benchmark domains in which a humanoid robot can change the state of some objects through direct interaction with the environment. We then investigate some critical aspects of the information abstraction process that have been observed, and propose an extension that mitigates such criticalities, in particular by analysing the type of classifiers that allow a suitable grounding of symbols.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This extends prior option-to-PDDL conversion by testing classifiers for symbol grounding on humanoid robot benchmarks.

The main takeaway is that the authors start from an existing automatic procedure that turns learned options into PDDL and then examine how different classifiers affect the symbol-grounding step on a set of humanoid-robot interaction tasks. They keep the pipeline autonomous: options come from open-ended intrinsic motivation, and the resulting domain is meant to support standard PDDL planning for user goals. That framing is consistent with the abstract and the cited prior work.

The concrete addition is the focus on abstraction criticalities and the classifier analysis as a targeted fix. That is a narrow but legitimate incremental move rather than a new framework. The paper does a reasonable job laying out the architecture and the motivation for looking at classifiers.

The soft spot is the absence of any reported numbers. The abstract describes the investigation but supplies no success rates, classifier comparisons, planning success metrics, or error analysis, so it is impossible to judge whether the proposed extension actually reduces the abstraction problems in practice. The benchmarks sound relevant to robotics, yet without those results the claims stay at the level of proposal.

This is the sort of paper that would interest people working at the RL-symbolic planning boundary in robotics. It is not a broad advance, but the question it asks is real and the setup is grounded in prior results. A serious editor should send it to referees so the empirical side can be checked and strengthened if the data support the direction.

Referee Report

1 major / 0 minor

Summary. The paper proposes an architecture that first acquires options autonomously via open-ended learning and then constructs a PDDL domain model consisting of symbols and operators. This model enables a standard PDDL planner to solve user-defined goals. The work begins from a prior option-to-PDDL abstraction procedure evaluated on humanoid-robot benchmark domains involving object-state changes, identifies criticalities in the information-abstraction process, and extends the procedure by analyzing classifier types that support suitable symbol grounding.

Significance. If validated, the architecture would advance the integration of intrinsically motivated hierarchical reinforcement learning with symbolic planning, reducing reliance on expert-provided domain knowledge. The explicit focus on classifier choice for mitigating abstraction criticalities in robotic interaction domains addresses a concrete practical barrier identified in the referenced prior conversion procedure.

major comments (1)

[Abstract] The manuscript supplies no quantitative results, error analysis, success rates on the benchmark domains, or comparison of the proposed classifier extension against the baseline procedure, rendering it impossible to assess whether the extension actually mitigates the identified abstraction criticalities.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the feedback. We address the major comment below.

Referee: [Abstract] The manuscript supplies no quantitative results, error analysis, success rates on the benchmark domains, or comparison of the proposed classifier extension against the baseline procedure, rendering it impossible to assess whether the extension actually mitigates the identified abstraction criticalities.

Authors: The manuscript starts from the prior option-to-PDDL procedure (which included evaluations on the humanoid-robot benchmarks) and focuses on identifying observed criticalities in the abstraction process together with an extension that analyzes classifier types for improved symbol grounding. The contribution is the identification of these criticalities and the classifier-based mitigation strategy rather than a new end-to-end empirical study. We agree that the absence of quantitative results, error analysis, success rates, and direct comparisons against the baseline makes it difficult to quantify the improvement. We will add a dedicated experimental section that reports these metrics on the same benchmark domains.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

The paper describes an architecture that first acquires options autonomously via open-ended learning and then constructs a PDDL domain from them for use with standard planners. It references a prior automatic abstraction procedure from the literature for converting options to PDDL but extends this with an independent analysis of classifier types for symbol grounding on humanoid-robot benchmarks. No equations, fitted parameters, self-definitional steps, or load-bearing self-citations that reduce claims to inputs by construction are present in the provided text; the central claims rest on empirical investigation of abstraction criticalities rather than any renaming, ansatz smuggling, or uniqueness imported from the authors' own prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions: that open-ended intrinsic motivation can produce usable options in the robot domains, and that appropriate classifiers can ground symbols sufficiently to mitigate abstraction problems. No free parameters or invented entities are described in the abstract.

axioms (2)

domain assumption: Options can be acquired autonomously through open-ended learning in robotic environments without human intervention.
This is required to remove the need for expert-provided knowledge as stated in the abstract.
domain assumption: Classifiers of suitable type exist that allow proper grounding of symbols from the learned options.
The abstract states that the extension focuses on analyzing such classifiers to mitigate critical aspects of the abstraction process.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 1 internal anchor

[1] [Baldassarre and Mirolli, 2013] Gianluca Baldassarre and Marco Mirolli. Intrinsically motivated learning in natural and artificial systems. Springer, 2013.

[2] [Ghallab et al., 1998] M. Ghallab, A. Howe, C. Knoblock, D. McDermott, A. Ram, M. Veloso, D. Weld, and D. Wilkins. PDDL - The Planning Domain Definition Language, 1998.

[3] [Hall et al., 2009] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: an update. SIGKDD Explor. Newsl., 11(1):10–18, 2009.

[4] [Konidaris et al., 2018] George Konidaris, Leslie Pack Kaelbling, and Tomas Lozano-Perez. From skills to symbols: Learning symbolic representations for abstract high-level planning. Journal of Artificial Intelligence Research, 61:215–289, 2018.

[5] [Konidaris, 2016] George Konidaris. Constructing abstraction hierarchies using a skill-symbol loop. Proceedings of the 25th International Joint Conference on Artificial Intelligence, 61:1648–1654, 2016.

[6] [Oudeyer et al., 2007] Pierre-Yves Oudeyer, Frédéric Kaplan, and Verena V. Hafner. Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2):265–286, 2007.

[7] [Quinlan, 1993] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.

[8] [Santucci et al., 2013] Vieri Giuliano Santucci, Gianluca Baldassarre, and Marco Mirolli. Intrinsic motivation signals for driving the acquisition of multiple tasks: a simulated robotic study. In Proceedings of the 12th International Conference on Cognitive Modelling (ICCM), 2013.

[9] [Santucci et al., 2014] Vieri G. Santucci, Gianluca Baldassarre, and Marco Mirolli. Autonomous selection of the "what" and the "how" of learning: an intrinsically motivated system tested with a two armed robot. In Development and Learning and Epigenetic Robotics (ICDL-Epirob), 2014 Joint IEEE International Conferences on, pages 434–439. IEEE, 2014.

[10] [Santucci et al., 2016] Vieri Giuliano Santucci, Gianluca Baldassarre, and Marco Mirolli. GRAIL: A goal-discovering robotic architecture for intrinsically-motivated learning. IEEE Trans. Cognitive and Developmental Systems, 8(3):214–231, 2016.

[11] [Santucci et al., 2019] Vieri Giuliano Santucci, Gianluca Baldassarre, and Emilio Cartoni. Autonomous reinforcement learning of multiple interrelated tasks. arXiv preprint arXiv:1906.01374, 2019.

[12] [Sutton et al., 1999] Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell., 112(1-2):181–211, August 1999.
