
arxiv: 2510.11144 · v2 · submitted 2025-10-13 · 💻 cs.AI · cs.CL

How²: How to learn from procedural How-to questions

Pith reviewed 2026-05-18 08:12 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords how-to questions · lifelong learning · planning agents · memory framework · abstraction levels · interactive environments · knowledge reuse

The pith

Lifelong learning agents improve at planning when they store and reuse high-level answers to how-to questions rather than concrete action sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework in which planning agents ask how-to questions during tasks, receive answers at different levels of detail, and store those answers for later use. It tests the idea in a Minecraft-style crafting environment where agents must assemble items by managing an inventory. Teachers supply either step-by-step executable actions or more abstract descriptions of subgoals. The central result is that agents achieve the best long-term performance when the stored answers describe goals without depending on the exact current state. This separation lets the same piece of knowledge apply to many different situations across a sequence of tasks.

Core claim

Using teacher models that answer at varying levels of abstraction, from executable action sequences to high-level subgoal descriptions, lifelong learning agents benefit most from answers that are abstracted and decoupled from the current state.

What carries the argument

The $How^{2}$ memory agent framework that lets agents ask how-to questions, store the answers, and reuse them for planning in interactive environments.
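The ask, store, and reuse cycle can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the memory is a plain key-value mapping from a query string to stored answers and that a cached answer is reused only if it passes a relevance check against the current state. The names `HowToMemory`, `recall_or_ask`, `toy_teacher`, and `toy_relevance` are hypothetical, and the teacher, relevance check, and parse step are stubbed with toy functions where the paper uses LLM prompts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]  # hypothetical state: item name -> quantity held


@dataclass
class HowToMemory:
    """Key-value store mapping a query string to stored how-to answers."""
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def read(self, query: str) -> List[str]:
        return list(self.entries.get(query, []))

    def write(self, query: str, answer: str) -> None:
        self.entries.setdefault(query, []).append(answer)


def recall_or_ask(
    memory: HowToMemory,
    query: str,
    state: State,
    is_relevant: Callable[[str, State], bool],
    ask_teacher: Callable[[str, State], str],
    parse: Callable[[str], str],
) -> Tuple[str, bool]:
    """Return (answer, cache_hit): reuse a relevant memory, else ask and store."""
    for candidate in memory.read(query):
        if is_relevant(candidate, state):
            return candidate, True  # cache hit: no teacher call needed
    # Cache miss: nothing stored, or every stored answer failed the check.
    answer = parse(ask_teacher(query, state))
    memory.write(query, answer)
    return answer, False


# Toy stand-ins (assumptions) for the LLM-backed teacher and relevance check.
calls: List[str] = []


def toy_teacher(query: str, state: State) -> str:
    calls.append(query)
    return "  Place 2 acacia_planks side by side in the top row.  "


def toy_relevance(answer: str, state: State) -> bool:
    return state.get("acacia_planks", 0) >= 2


memory = HowToMemory()
first = recall_or_ask(memory, "acacia_pressure_plate", {"acacia_planks": 2},
                      toy_relevance, toy_teacher, str.strip)
second = recall_or_ask(memory, "acacia_pressure_plate", {"acacia_planks": 5},
                       toy_relevance, toy_teacher, str.strip)
```

In this toy run the second call reuses the stored answer under a different inventory, so the teacher is consulted only once. That reuse is only possible because the stored answer does not name specific inventory slots.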

If this is right

  • Agents reduce uncertainty in both present and future planning problems by retrieving stored how-to knowledge.
  • Abstract answers transfer across different states and tasks more reliably than state-specific action lists.
  • Over many tasks the agent accumulates a growing set of reusable procedural knowledge without retraining.
  • LLM-based agents gain a mechanism to improve their planning abilities gradually through question asking and memory reuse.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agents could be extended to select the most useful abstraction level when asking future questions based on past reuse success.
  • The same storage-and-reuse pattern might apply to robotic or navigation domains where plans must generalize across changing conditions.
  • Stored answers might be organized by similarity of tasks so that retrieval becomes more precise as the memory grows.

Load-bearing premise

The evaluation assumes that answers provided by teacher models at different abstraction levels can be effectively stored and reused by the agent without introducing errors or conflicts in the planning process across multiple tasks.

What would settle it

Test the same agent on a series of new but related assembly tasks and measure whether success rates are higher when it reuses stored abstract subgoal descriptions than when it reuses stored executable action sequences.
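A comparison of this kind reduces to tallying pass/fail outcomes per memory condition. The sketch below shows one way to do that, assuming each task attempt is recorded as a boolean. The condition names and the outcome lists are arbitrary placeholders chosen for illustration; they are not results from the paper.

```python
import math
from typing import Dict, List, Tuple


def success_rate(outcomes: List[bool]) -> Tuple[float, float]:
    """Return (success rate, binomial standard error) for pass/fail outcomes."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one outcome")
    p = sum(outcomes) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


def compare(results: Dict[str, List[bool]]) -> Dict[str, Tuple[float, float]]:
    """Summarise every condition as (success rate, standard error)."""
    return {name: success_rate(outcomes) for name, outcomes in results.items()}


# Placeholder outcomes for illustration only; NOT data from the paper.
toy_results = {
    "abstract_subgoals": [True, True, False, True],
    "executable_actions": [True, False, False, True],
}
summary = compare(toy_results)
```

With only a handful of tasks the standard errors are large, so a real test would need enough tasks per condition for the intervals to separate.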

Figures

Figures reproduced from arXiv: 2510.11144 by Alex Lascarides, Frank Keller, Gautier Dagan.

Figure 1: We solve a Minecraft planning task through a …
Figure 2: Our proposed How2 agent framework for lifelong learning with external knowledge from a teacher. 1) The agent can call a read-memory tool, which queries the memory module with a query θ. The memory is a key-value mapping that retrieves and indexes memories given the search query θ. 2) When nothing is stored under θ, or all memories fail a relevance check w.r.t. the current state, then 3) the agent asks a how…
Figure 3: The executable teacher returns a full plan that is conditioned on the current inventory, where the inventory locations are instantiated. The subgoal-partially-executable teacher returns instructions where the inventory slots are not specified and decomposes each subtask into identifiable subgoals. This generalises to unseen inventories as the crafting patterns remain the same. Lastly, the non-executable te…
Figure 4: Bar chart showing the success rate of the dif…
Figure 5: Heat-map for the performance of the executable teacher in each setup. We show the success rate (colour) and counts (values) per cache misses and cache hits. This highlights the effectiveness of How2 in improving agent performance by filtering irrelevant memories, but also the trade-off between cache hits and success.
Figure 6: Distribution of path lengths for optimal plans …
Figure 7: We plot the success rate of the different teach…
Figure 8: Heatmaps illustrating the performance (Success Rate and Cache Miss Rate) of the …
Figure 9: The system prompt, defining the environment rules and action constraints within Plancraft and used by the …
Figure 10: The JSON schema for the read_memory tool. This tool allows the agent to search the memory database for previously stored recipes and instructions.
Figure 11: The JSON schema for the think tool. This tool enables the agent to generate internal thoughts to guide its decision-making process.
Figure 12: The JSON schema for the move tool. This tool allows the agent to move items between different slots in the crafting grid or inventory.
Figure 13: The JSON schema for the smelt tool. This tool enables the agent to smelt items in Plancraft.
Figure 14: The JSON schema for the impossible tool. This tool allows the agent to declare when a task cannot be completed, providing a reason for the impossibility.
Figure 15: The prompt for the ‘ask’ role. This guides the agent in formulating a procedural ‘how-to’ question when …
Figure 16: The prompt for the ‘relevance-check’ role. This is used to validate whether a cached memory entry is …
Figure 17: The prompt for the ‘parse’ role. This structures the teacher’s answer into a generalised format suitable …
Figure 18: The prompt for the non-executable teacher. This prompt instructs the teacher model to provide high-level, …
Figure 19: Success example from Just Ask and the subgoal-partially-executable teacher. Memory is read, and since we are in the Just Ask setup, the agent asks the teacher a how-to question. The result is a plan of actions (two actions required) to craft the goal (crimson planks). The Actor is able to successfully follow the Teacher’s plan, grounding the item name to specific positions.
Figure 20: Failure example from Just Ask and the subgoal-partially-executable teacher. The agent emits an impossible action even though the task is solvable. The agent should have recognised that the oak logs can be crafted into planks, which are the main ingredient for the crafting table. In this case, the agent fails to call the memory module and therefore no teacher is consulted.
Figure 21: Success example from Memory-Only executable teacher. Compared to the partially executable plans, the executable teacher provides fully grounded answers.
Figure 22: Failure example from Memory-Only with an …
Figure 23: Success example from How2 with a non-executable teacher. The agent reads the memory for an acacia pressure plate; since there are no relevant memories, it asks the teacher a how-to question. The teacher answers with a generic recipe that is entirely ungrounded in the Plancraft environment. The parse step grounds the answer to actual slots. The agent then uses the parsed memory to successfully craft the…
Figure 24: Failure example from How2 with a non-executable teacher. The teacher provides an ungrounded explanation of the pattern, which the parse step translates into a structured memory. In this case, the parse step fails to ground the 3x2 pattern to the 6 relevant crafting slots. As the agent starts filling the crafting grid and following the instructions, the brown carpet is added to the output slot as its patt…
Original abstract

An agent facing a planning problem can use answers to how-to questions to reduce uncertainty and fill knowledge gaps, helping it solve both current and future tasks. However, their open-ended nature, where valid answers to "How do I X?" range from executable actions to high-level descriptions of X's sub-goals, makes them challenging for AI agents to ask, and for AI experts to answer, in ways that support efficient planning. We introduce $How^{2}$, a memory agent framework that enables agents to ask how-to questions, store the answers, and reuse them for lifelong learning in interactive environments. We evaluate our approach in Plancraft, a Minecraft crafting environment, where agents must complete an assembly task by manipulating inventory items. Using teacher models that answer at varying levels of abstraction, from executable action sequences to high-level subgoal descriptions, we show that lifelong learning agents benefit most from answers that are abstracted and decoupled from the current state. $How^{2}$ offers a way for LLM-based agents to improve their planning capabilities over time by asking questions in interactive environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces $How^{2}$, a memory agent framework enabling LLM-based agents to pose how-to questions, store answers from teacher models at varying abstraction levels (executable actions to high-level subgoals), and reuse them for lifelong learning. Evaluated in the Plancraft Minecraft crafting environment, the central claim is that agents benefit most from abstracted, state-decoupled answers rather than current-state-tied sequences.

Significance. If the empirical results hold, the work could meaningfully advance planning capabilities in interactive agents by providing a mechanism for accumulating procedural knowledge across tasks. The explicit variation of answer abstraction levels and the concrete Plancraft testbed are strengths; the framework's emphasis on memory reuse for future tasks addresses a relevant gap in lifelong agent learning.

major comments (2)
  1. [Abstract] Abstract: the claim that lifelong learning agents benefit most from abstracted and state-decoupled answers is stated without any quantitative results, metrics, error bars, baselines, or details on how performance was measured or compared, leaving the central empirical claim without visible supporting data.
  2. [$How^{2}$ memory agent framework] $How^{2}$ memory agent framework and Plancraft evaluation: the framework stores and reuses state-decoupled answers (e.g., high-level subgoal descriptions) but provides no explicit mechanism for grounding, updating, or resolving conflicts when inventory states change across tasks, which risks invalid plans on reuse.
minor comments (2)
  1. [Introduction] The notation and acronym $How^{2}$ should be introduced with a brief expansion and motivation in the first paragraph of the introduction.
  2. [Experiments] Figure captions and axis labels in the Plancraft results should explicitly state the performance metric and number of runs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We have carefully considered each comment and made revisions to improve the clarity and completeness of the work, particularly strengthening the abstract and addressing potential limitations in the framework description.

  1. Referee: [Abstract] Abstract: the claim that lifelong learning agents benefit most from abstracted and state-decoupled answers is stated without any quantitative results, metrics, error bars, baselines, or details on how performance was measured or compared, leaving the central empirical claim without visible supporting data.

    Authors: We agree with this observation. The abstract was intentionally kept concise, but it should better support the central claim with evidence. In the revised manuscript, we have updated the abstract to briefly include key quantitative results, such as the relative improvement in task success rates when using abstracted answers versus state-tied sequences, along with mention of the evaluation metrics (task completion success) and baselines used in the Plancraft experiments. Full details, including error bars from multiple runs, remain in the results section and figures. revision: yes

  2. Referee: [$How^{2}$ memory agent framework] $How^{2}$ memory agent framework and Plancraft evaluation: the framework stores and reuses state-decoupled answers (e.g., high-level subgoal descriptions) but provides no explicit mechanism for grounding, updating, or resolving conflicts when inventory states change across tasks, which risks invalid plans on reuse.

    Authors: This is a valid concern regarding the practical deployment of the memory. The current framework assumes that the agent will ground the stored answers to the current environment state at the time of reuse, using its perception module. However, we acknowledge that explicit mechanisms for updating memories or resolving conflicts (e.g., when an item is no longer available) were not detailed. We have added a paragraph in the framework description explaining the grounding process and a note on limitations, including that conflict resolution is handled heuristically by the planner and may require future extensions for robustness. revision: partial
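Grounding at reuse time, as described in this response, can be illustrated with a small sketch. It assumes a stored step names only an item and a target crafting slot, and that the current inventory is available as a slot-to-item mapping. The `slot_from`, `slot_to`, and `quantity` keys follow the move-call format used in the paper's examples. The function `ground_move`, the `Inventory` type, and the choice to scan slots in sorted order are hypothetical simplifications, not the authors' grounding procedure.

```python
from typing import Dict, Optional, Tuple

# Hypothetical inventory representation: slot id -> (item name, quantity).
Inventory = Dict[str, Tuple[str, int]]


def ground_move(
    item: str,
    slot_to: str,
    inventory: Inventory,
    quantity: int = 1,
) -> Optional[Dict[str, object]]:
    """Turn 'move <item> to <slot_to>' into a concrete move call.

    Returns None when no slot holds enough of the item, which signals
    that the stored step cannot be applied in the current state.
    """
    for slot, (held, count) in sorted(inventory.items()):
        if held == item and count >= quantity and slot != slot_to:
            return {"slot_from": slot, "slot_to": slot_to, "quantity": quantity}
    return None


# The same stored step grounds differently under two inventories.
inventory_a: Inventory = {"I15": ("crimson_hyphae", 1), "I3": ("stick", 4)}
inventory_b: Inventory = {"I20": ("crimson_hyphae", 2)}

move_a = ground_move("crimson_hyphae", "A1", inventory_a)
move_b = ground_move("crimson_hyphae", "A1", inventory_b)
missing = ground_move("crimson_hyphae", "A1", {"I3": ("stick", 4)})
```

Returning `None` for an unavailable item gives the planner an explicit signal to fall back on, for example by asking the teacher again, rather than executing an invalid action.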

Circularity Check

0 steps flagged

Empirical framework evaluation is self-contained with no circular derivation


The paper introduces the $How^{2}$ memory agent framework for asking, storing, and reusing how-to answers across abstraction levels in the Plancraft Minecraft crafting environment. The central result—that lifelong learning agents benefit most from abstracted, state-decoupled subgoal descriptions—is obtained via direct experimental comparison of teacher-model answers at different abstraction levels. No equations, fitted parameters, or uniqueness theorems are presented that reduce the outcome to the inputs by construction. The evaluation relies on external task performance in Plancraft rather than self-referential definitions or self-citation chains, making the derivation independent and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that abstracted answers can be decoupled from state and reused effectively. No free parameters are described, and the one invented entity has no independent evidence.

axioms (1)
  • domain assumption: Agents facing planning problems can reduce uncertainty by storing and reusing answers to how-to questions.
    This premise underpins the entire memory agent framework and lifelong learning benefit.
invented entities (1)
  • $How^{2}$ memory agent framework (no independent evidence)
    purpose: Enables agents to ask how-to questions, store answers, and reuse them for planning.
    This is the core proposed system; no external falsifiable evidence for its components is provided in the abstract.



