Learning High-Level Planning Symbols from Intrinsically Motivated Experience
Pith reviewed 2026-05-24 19:29 UTC · model grok-4.3
The pith
An architecture acquires options autonomously via open-ended learning and then constructs a usable PDDL domain from them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The architecture first acquires options in a fully autonomous fashion on the basis of open-ended learning, then builds a PDDL domain based on symbols and operators that can be used to accomplish user-defined goals through a standard PDDL planner. It starts from an existing procedure on benchmark domains involving a humanoid robot interacting with objects, identifies critical aspects of the information abstraction process, and mitigates them by analysing the type of classifiers that produce suitable symbol grounding.
What carries the argument
The automatic abstraction procedure that converts learned options into PDDL symbols and operators, extended through classifier selection for improved symbol grounding.
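To make the target of that conversion concrete, the sketch below renders one abstracted option as a propositional PDDL action and wraps a set of them in a domain. It is an illustration under stated assumptions, not the paper's implementation: the class `AbstractOperator`, the functions `to_pddl_action` and `to_pddl_domain`, and symbol names such as `symbol_0` are hypothetical, and operators are assumed to be parameter-free with positive preconditions only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AbstractOperator:
    """Hypothetical container for one learned option after abstraction."""
    name: str
    preconditions: List[str]   # symbols that must hold before execution
    add_effects: List[str]     # symbols made true by the option
    del_effects: List[str] = field(default_factory=list)  # symbols made false


def to_pddl_action(op: AbstractOperator) -> str:
    """Render one operator as a parameter-free PDDL :action block."""
    pre = " ".join(f"({p})" for p in op.preconditions)
    effects = [f"({s})" for s in op.add_effects]
    effects += [f"(not ({s}))" for s in op.del_effects]
    eff = " ".join(effects)
    return (
        f"(:action {op.name}\n"
        f"  :parameters ()\n"
        f"  :precondition (and {pre})\n"
        f"  :effect (and {eff}))"
    )


def to_pddl_domain(domain_name: str, operators: List[AbstractOperator]) -> str:
    """Collect every symbol used by the operators and emit a STRIPS domain."""
    symbols = sorted(
        {s for op in operators
         for s in op.preconditions + op.add_effects + op.del_effects}
    )
    predicates = " ".join(f"({s})" for s in symbols)
    actions = "\n".join(to_pddl_action(op) for op in operators)
    return (
        f"(define (domain {domain_name})\n"
        f"(:requirements :strips)\n"
        f"(:predicates {predicates})\n"
        f"{actions})"
    )


# Hypothetical option: pressing a button turns symbol_0 off and symbol_1 on.
press = AbstractOperator("press_button", ["symbol_0"], ["symbol_1"], ["symbol_0"])
print(to_pddl_domain("robot_objects", [press]))
```

The printed text can be saved as a domain file and passed to a planner together with a problem file that states the user-defined goal in terms of the same symbols.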
If this is right
- User goals can be solved by an off-the-shelf PDDL planner without any expert-supplied domain description.
- Options are obtained without human supervision or intervention.
- Suitable classifiers mitigate abstraction difficulties that appear when options are turned into planning symbols.
- The resulting PDDL domain supports planning in robot interaction settings where states change through direct contact with objects.
Where Pith is reading between the lines
- The same pipeline could be tested on non-robot environments to check whether classifier choice remains the decisive factor for abstraction quality.
- If the autonomous option acquisition scales to longer horizons, the generated PDDL domains might support planning tasks that current hand-crafted domains cannot express.
- The approach suggests a route for closing the loop between intrinsic-motivation learning and symbolic planning without manual symbol design.
Load-bearing premise
The classifiers examined produce symbol grounding that reduces the information abstraction problems observed in the benchmark domains.
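As a rough illustration of what grounding a symbol with a classifier means, the sketch below fits a one-level decision stump that separates states where a symbol holds from states where it does not. The stump is a stand-in chosen for brevity and is not one of the classifiers the paper analyses; the function names `fit_stump` and `holds`, the two-feature state layout, and the sample data are all hypothetical.

```python
from typing import Sequence, Tuple

Stump = Tuple[int, float, bool]  # (feature index, threshold, polarity)


def fit_stump(states: Sequence[Sequence[float]],
              labels: Sequence[bool]) -> Stump:
    """Return the single-feature threshold test with the fewest training errors.

    Polarity True means "the symbol holds when state[feature] > threshold".
    """
    best = None
    best_err = len(labels) + 1
    for f in range(len(states[0])):
        values = sorted({s[f] for s in states})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for polarity in (True, False):
                err = sum(((s[f] > t) == polarity) != y
                          for s, y in zip(states, labels))
                if err < best_err:
                    best_err, best = err, (f, t, polarity)
    if best is None:
        raise ValueError("no feature varies across the training states")
    return best


def holds(stump: Stump, state: Sequence[float]) -> bool:
    """Evaluate the grounded symbol in a low-level state."""
    f, t, polarity = stump
    return (state[f] > t) == polarity


# Hypothetical data: (hand_x, lamp_on) states, labelled by whether an option
# could be started there. Only hand_x is informative in this toy sample.
states = [(0.1, 0.0), (0.2, 1.0), (0.8, 0.0), (0.9, 1.0)]
labels = [False, False, True, True]
symbol = fit_stump(states, labels)
print(symbol[0], holds(symbol, (0.85, 0.0)))
```

A classifier that keys on an irrelevant feature, or that overfits a small sample, would yield a symbol whose truth value does not match where the option can actually run. This is one way to read the abstraction problems the premise refers to.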
What would settle it
Running the extended procedure on the same humanoid-robot benchmarks and finding that the chosen classifiers do not reduce the observed abstraction problems would falsify the central claim.
Original abstract
In symbolic planning systems, the knowledge on the domain is commonly provided by an expert. Recently, an automatic abstraction procedure has been proposed in the literature to create a Planning Domain Definition Language (PDDL) representation, which is the most widely used input format for most off-the-shelf automated planners, starting from 'options', a data structure used to represent actions within the hierarchical reinforcement learning framework. We propose an architecture that potentially removes the need for human intervention. In particular, the architecture first acquires options in a fully autonomous fashion on the basis of open-ended learning, then builds a PDDL domain based on symbols and operators that can be used to accomplish user-defined goals through a standard PDDL planner. We start from an implementation of the above-mentioned procedure tested on a set of benchmark domains in which a humanoid robot can change the state of some objects through direct interaction with the environment. We then investigate some critical aspects of the information abstraction process that have been observed, and propose an extension that mitigates such criticalities, in particular by analysing the type of classifiers that allow a suitable grounding of symbols.
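For readers unfamiliar with options, the sketch below shows the usual three-part structure from the options framework: an initiation set, a policy, and a termination condition. It is a minimal illustration, not code from the paper. The names `Option` and `run_option` and the toy integer environment are hypothetical, and termination is simplified to a deterministic predicate rather than a probability.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    """Temporally extended action: where it may start, what it does, when it ends."""
    name: str
    can_start: Callable[[int], bool]    # membership test for the initiation set
    policy: Callable[[int], int]        # maps a state to a primitive action
    should_stop: Callable[[int], bool]  # deterministic termination condition


def run_option(option: Option, state: int,
               step: Callable[[int, int], int], max_steps: int = 100) -> int:
    """Execute the option's policy until it terminates; return the final state."""
    if not option.can_start(state):
        raise ValueError(f"{option.name} cannot start in state {state}")
    for _ in range(max_steps):
        if option.should_stop(state):
            break
        state = step(state, option.policy(state))
    return state


def step(state: int, action: int) -> int:
    """Toy environment: the state is an integer position, the action a displacement."""
    return state + action


# Hypothetical option that moves one unit at a time until position 5 is reached.
reach = Option(
    name="reach_target",
    can_start=lambda s: s < 5,
    policy=lambda s: 1,
    should_stop=lambda s: s >= 5,
)
print(run_option(reach, 0, step))
```

In the abstraction setting described in the abstract, the states where `can_start` is true and the states reached when the option stops are the raw material from which preconditions and effects are derived.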
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an architecture that first acquires options autonomously via open-ended learning and then constructs a PDDL domain model consisting of symbols and operators. This model enables a standard PDDL planner to solve user-defined goals. The work begins from a prior option-to-PDDL abstraction procedure evaluated on humanoid-robot benchmark domains involving object-state changes, identifies criticalities in the information-abstraction process, and extends the procedure by analyzing classifier types that support suitable symbol grounding.
Significance. If validated, the architecture would advance the integration of intrinsically motivated hierarchical reinforcement learning with symbolic planning, reducing reliance on expert-provided domain knowledge. The explicit focus on classifier choice for mitigating abstraction criticalities in robotic interaction domains addresses a concrete practical barrier identified in the referenced prior conversion procedure.
major comments (1)
- [Abstract] The manuscript supplies no quantitative results, error analysis, success rates on the benchmark domains, or comparison of the proposed classifier extension against the baseline procedure, rendering it impossible to assess whether the extension actually mitigates the identified abstraction criticalities.
Simulated Author's Rebuttal
We thank the referee for the feedback. We address the major comment below.
- Referee: [Abstract] The manuscript supplies no quantitative results, error analysis, success rates on the benchmark domains, or comparison of the proposed classifier extension against the baseline procedure, rendering it impossible to assess whether the extension actually mitigates the identified abstraction criticalities.
  Authors: The manuscript starts from the prior option-to-PDDL procedure (which included evaluations on the humanoid-robot benchmarks) and focuses on identifying observed criticalities in the abstraction process together with an extension that analyzes classifier types for improved symbol grounding. The contribution is the identification of these criticalities and the classifier-based mitigation strategy rather than a new end-to-end empirical study. We agree that the absence of quantitative results, error analysis, success rates, and direct comparisons against the baseline makes it difficult to quantify the improvement. We will add a dedicated experimental section that reports these metrics on the same benchmark domains.
  Revision: yes
Circularity Check
No significant circularity identified
The paper describes an architecture that first acquires options autonomously via open-ended learning and then constructs a PDDL domain from them for use with standard planners. It references a prior automatic abstraction procedure from the literature for converting options to PDDL but extends this with an independent analysis of classifier types for symbol grounding on humanoid-robot benchmarks. No equations, fitted parameters, self-definitional steps, or load-bearing self-citations that reduce claims to inputs by construction are present in the provided text; the central claims rest on empirical investigation of abstraction criticalities rather than any renaming, ansatz smuggling, or uniqueness imported from the authors' own prior work.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Options can be acquired autonomously through open-ended learning in robotic environments without human intervention.
- domain assumption: Classifiers of suitable type exist that allow proper grounding of symbols from the learned options.