How²: How to learn from procedural How-to questions
Pith reviewed 2026-05-18 08:12 UTC · model grok-4.3
The pith
Lifelong learning agents improve at planning when they store and reuse high-level answers to how-to questions rather than concrete action sequences.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using teacher models that answer at varying levels of abstraction, from executable action sequences to high-level subgoal descriptions, lifelong learning agents benefit most from answers that are abstracted and decoupled from the current state.
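To make the distinction concrete, here is a minimal Python sketch under our own assumptions. It takes a hypothetical answer to "How do I craft oak_planks?" at two abstraction levels and a toy inventory keyed by slot names (A1 for a crafting-grid cell, I1 onward for inventory, 0 for the output slot). The slot names, the recipe encoding, and the helper functions are illustrative only, not Plancraft's actual API. The sketch shows why a state-tied action list can stop being valid when the inventory layout changes, while a state-decoupled subgoal description still applies.

```python
# Hypothetical answer to "How do I craft oak_planks?" at two abstraction levels.
# Executable level: concrete moves tied to the slots of the state it was written for.
EXECUTABLE_ANSWER = [("I15", "A1", 1), ("0", "I1", 4)]  # (slot_from, slot_to, quantity)

# Abstract level: subgoals that name items and grid positions, never source slots.
ABSTRACT_ANSWER = [
    "place 1 oak_log anywhere in the crafting grid",
    "move the oak_planks from the output slot to a free inventory slot",
]

def executable_answer_valid(answer, inventory):
    """Toy check: an action list is reusable only if every source slot is occupied.

    The output slot "0" is filled by crafting, so it is not checked here.
    Item identity is deliberately ignored to keep the sketch short.
    """
    return all(src == "0" or src in inventory for src, _, _ in answer)

def abstract_answer_valid(required_items, inventory):
    """Toy check: a subgoal description is reusable if the items exist in any slot."""
    held = {item for item, _ in inventory.values()}
    return required_items <= held

# The state the executable answer was written for: oak_log sits in I15.
original_state = {"I15": ("oak_log", 1)}
# A later task: the same item now sits in a different slot.
new_state = {"I7": ("oak_log", 3)}

for name, state in [("original", original_state), ("new", new_state)]:
    print(name,
          executable_answer_valid(EXECUTABLE_ANSWER, state),
          abstract_answer_valid({"oak_log"}, state))
# original True True
# new False True
```

The required items are passed to `abstract_answer_valid` explicitly; an LLM agent would instead read them off the subgoal text.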
What carries the argument
The $How^{2}$ memory agent framework that lets agents ask how-to questions, store the answers, and reuse them for planning in interactive environments.
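The ask, store, and reuse cycle can be pictured as a small memory keyed by the question text. This is a minimal Python illustration, not the authors' implementation: the class `HowToMemory`, the exact-match key, and `stub_teacher` are hypothetical stand-ins, since in the paper both the agent and the teacher are LLMs. The sketch shows that the teacher is consulted only on a memory miss, so the number of teacher queries stops growing once the relevant how-to knowledge has been stored.

```python
from typing import Callable, Dict

class HowToMemory:
    """Hypothetical store of how-to answers, keyed by a normalized question."""

    def __init__(self, teacher: Callable[[str], str]):
        self.teacher = teacher          # stand-in for a teacher model
        self.answers: Dict[str, str] = {}
        self.teacher_calls = 0

    @staticmethod
    def question_for(target: str) -> str:
        # The agent's "ask" step: formulate a procedural question about the target.
        return f"How do I craft {target}?"

    def get_answer(self, target: str) -> str:
        question = self.question_for(target)
        key = question.lower()
        if key not in self.answers:     # knowledge gap: ask the teacher and store
            self.teacher_calls += 1
            self.answers[key] = self.teacher(question)
        return self.answers[key]        # reuse on every later task

def stub_teacher(question: str) -> str:
    # Placeholder for an LLM teacher; returns a fixed abstract description.
    item = question.removeprefix("How do I craft ").rstrip("?")
    return f"Place the ingredients for {item} in the grid, then collect the output."

memory = HowToMemory(stub_teacher)
for task in ["oak_planks", "stick", "oak_planks", "stick", "oak_planks"]:
    memory.get_answer(task)
print(memory.teacher_calls)  # 2
```

Exact-match keys are the simplest possible choice; they only help when the same target recurs, which is why retrieval by similarity is a natural extension.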
If this is right
- Agents reduce uncertainty in both present and future planning problems by retrieving stored how-to knowledge.
- Abstract answers transfer across different states and tasks more reliably than state-specific action lists.
- Over many tasks the agent accumulates a growing set of reusable procedural knowledge without retraining.
- LLM-based agents gain a mechanism to improve their planning abilities gradually through question asking and memory reuse.
Where Pith is reading between the lines
- Agents could be extended to select the most useful abstraction level when asking future questions based on past reuse success.
- The same storage-and-reuse pattern might apply to robotic or navigation domains where plans must generalize across changing conditions.
- Stored answers might be organized by similarity of tasks so that retrieval becomes more precise as the memory grows.
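One way to picture the similarity-organized memory suggested in the last point: score stored items against a new target and return the closest stored answer above a threshold. The Jaccard overlap over underscore-split item tokens, the 0.5 threshold, and the example answers are our own assumptions for illustration; an LLM agent would more plausibly use embedding similarity.

```python
def tokens(item: str) -> set:
    # Split a Minecraft-style item id such as "acacia_pressure_plate" into words.
    return set(item.lower().split("_"))

def jaccard(a: set, b: set) -> float:
    # Size of the intersection divided by size of the union (0.0 for two empty sets).
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(memory: dict, target: str, threshold: float = 0.5):
    """Return (stored_item, answer) for the most similar stored item, or None."""
    best_item, best_score = None, 0.0
    for item in memory:
        score = jaccard(tokens(item), tokens(target))
        if score > best_score:
            best_item, best_score = item, score
    if best_item is None or best_score < threshold:
        return None
    return best_item, memory[best_item]

memory = {
    "oak_pressure_plate": "Place 2 planks side by side in one row of the grid.",
    "brown_banner": "Fill the top two rows with wool and put a stick below the center.",
}
print(retrieve(memory, "acacia_pressure_plate"))  # matches oak_pressure_plate (score 0.5)
print(retrieve(memory, "lime_wool"))              # None: no token overlap
```

Retrieving the oak recipe for an acacia target only works because the stored answer is phrased abstractly ("planks"); a stored slot-level action list would not transfer this way.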
Load-bearing premise
The evaluation assumes that answers provided by teacher models at different abstraction levels can be effectively stored and reused by the agent without introducing errors or conflicts in the planning process across multiple tasks.
What would settle it
Test the same agent on a series of new but related assembly tasks and measure whether success rates are higher when it reuses stored abstract subgoal descriptions than when it reuses stored executable action sequences.
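The test in the previous paragraph reduces to comparing two success rates. A minimal sketch, assuming one binary outcome per held-out task and using a standard two-proportion z-test, which is our choice of statistic and not one the paper is known to use:

```python
import math

def success_rate(outcomes):
    # Fraction of tasks solved; outcomes is a list of 0/1 values.
    return sum(outcomes) / len(outcomes)

def two_proportion_z(a, b):
    """z statistic for H0: both conditions share one success rate (pooled SE)."""
    n1, n2 = len(a), len(b)
    p1, p2 = success_rate(a), success_rate(b)
    pooled = (sum(a) + sum(b)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se if se > 0 else 0.0

def two_sided_p(z):
    # Normal approximation: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

Calling `two_proportion_z(abstract_outcomes, executable_outcomes)` with one 0/1 entry per task gives a positive z when reusing abstract subgoal descriptions succeeds more often. With only a handful of tasks, an exact test or a bootstrap over tasks would be safer than the normal approximation.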
Original abstract
An agent facing a planning problem can use answers to how-to questions to reduce uncertainty and fill knowledge gaps, helping it solve both current and future tasks. However, their open-ended nature, where valid answers to "How do I X?" range from executable actions to high-level descriptions of X's sub-goals, makes them challenging for AI agents to ask, and for AI experts to answer, in ways that support efficient planning. We introduce $How^{2}$, a memory agent framework that enables agents to ask how-to questions, store the answers, and reuse them for lifelong learning in interactive environments. We evaluate our approach in Plancraft, a Minecraft crafting environment, where agents must complete an assembly task by manipulating inventory items. Using teacher models that answer at varying levels of abstraction, from executable action sequences to high-level subgoal descriptions, we show that lifelong learning agents benefit most from answers that are abstracted and decoupled from the current state. $How^{2}$ offers a way for LLM-based agents to improve their planning capabilities over time by asking questions in interactive environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces $How^{2}$, a memory agent framework enabling LLM-based agents to pose how-to questions, store answers from teacher models at varying abstraction levels (executable actions to high-level subgoals), and reuse them for lifelong learning. Evaluated in the Plancraft Minecraft crafting environment, the central claim is that agents benefit most from abstracted, state-decoupled answers rather than current-state-tied sequences.
Significance. If the empirical results hold, the work could meaningfully advance planning capabilities in interactive agents by providing a mechanism for accumulating procedural knowledge across tasks. The explicit variation of answer abstraction levels and the concrete Plancraft testbed are strengths; the framework's emphasis on memory reuse for future tasks addresses a relevant gap in lifelong agent learning.
major comments (2)
- [Abstract] The claim that lifelong learning agents benefit most from abstracted and state-decoupled answers is stated without any quantitative results, metrics, error bars, baselines, or details on how performance was measured or compared, leaving the central empirical claim without visible supporting data.
- [$How^{2}$ memory agent framework and Plancraft evaluation] The framework stores and reuses state-decoupled answers (e.g., high-level subgoal descriptions) but provides no explicit mechanism for grounding, updating, or resolving conflicts when inventory states change across tasks, which risks invalid plans on reuse.
minor comments (2)
- [Introduction] The notation and acronym $How^{2}$ should be introduced with a brief expansion and motivation in the first paragraph of the introduction.
- [Experiments] Figure captions and axis labels in the Plancraft results should explicitly state the performance metric and number of runs.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We have carefully considered each comment and made revisions to improve the clarity and completeness of the work, particularly strengthening the abstract and addressing potential limitations in the framework description.
Point-by-point responses
- Referee: [Abstract] The claim that lifelong learning agents benefit most from abstracted and state-decoupled answers is stated without any quantitative results, metrics, error bars, baselines, or details on how performance was measured or compared, leaving the central empirical claim without visible supporting data.
  Authors: We agree with this observation. The abstract was intentionally kept concise, but it should better support the central claim with evidence. In the revised manuscript, we have updated the abstract to briefly include key quantitative results, such as the relative improvement in task success rates when using abstracted answers versus state-tied sequences, along with mention of the evaluation metrics (task completion success) and baselines used in the Plancraft experiments. Full details, including error bars from multiple runs, remain in the results section and figures. (Revision: yes.)
- Referee: [$How^{2}$ memory agent framework and Plancraft evaluation] The framework stores and reuses state-decoupled answers (e.g., high-level subgoal descriptions) but provides no explicit mechanism for grounding, updating, or resolving conflicts when inventory states change across tasks, which risks invalid plans on reuse.
  Authors: This is a valid concern regarding the practical deployment of the memory. The current framework assumes that the agent will ground the stored answers to the current environment state at the time of reuse, using its perception module. However, we acknowledge that explicit mechanisms for updating memories or resolving conflicts (e.g., when an item is no longer available) were not detailed. We have added a paragraph in the framework description explaining the grounding process and a note on limitations, including that conflict resolution is handled heuristically by the planner and may require future extensions for robustness. (Revision: partial.)
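The grounding step described in the authors' response can be illustrated as follows: a stored, state-decoupled subgoal names an item and a grid position, and the agent resolves the source slot against the current inventory only at reuse time. This is a hypothetical Python sketch; the slot naming (A1 to C3 for the 3x3 crafting grid, I1 onward for inventory), the function `ground_subgoal`, and the example inventory are assumptions for illustration, not the paper's perception module or planner. When no slot holds enough of the item, the function returns `None`, which is the kind of conflict the response says is left to heuristic handling by the planner.

```python
from typing import Dict, Optional, Tuple

Inventory = Dict[str, Tuple[str, int]]  # slot -> (item, quantity)

def ground_subgoal(item: str, grid_slot: str, quantity: int,
                   inventory: Inventory) -> Optional[dict]:
    """Turn 'place <quantity> <item> in <grid_slot>' into a concrete move.

    Returns a move in the style of move(slot_from, slot_to, quantity),
    or None when no inventory slot holds enough of the item (a conflict).
    """
    # Iterate in a fixed (lexicographic) slot order so the choice is deterministic.
    for slot, (held_item, held_qty) in sorted(inventory.items()):
        if slot.startswith("I") and held_item == item and held_qty >= quantity:
            return {"slot_from": slot, "slot_to": grid_slot, "quantity": quantity}
    return None

# Toy state: the planks are in I20, not in a slot remembered from an earlier task.
inventory = {"I3": ("turtle_spawn_egg", 19), "I20": ("oak_planks", 5)}
print(ground_subgoal("oak_planks", "A1", 1, inventory))
# {'slot_from': 'I20', 'slot_to': 'A1', 'quantity': 1}
print(ground_subgoal("stick", "C2", 1, inventory))
# None
```

Keeping slot lookup out of the stored answer is what lets the same memory entry be reused after the inventory layout changes; the cost is that every reuse needs this extra grounding step.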
Circularity Check
Empirical framework evaluation is self-contained with no circular derivation
full rationale
The paper introduces the $How^{2}$ memory agent framework for asking, storing, and reusing how-to answers across abstraction levels in the Plancraft Minecraft crafting environment. The central result—that lifelong learning agents benefit most from abstracted, state-decoupled subgoal descriptions—is obtained via direct experimental comparison of teacher-model answers at different abstraction levels. No equations, fitted parameters, or uniqueness theorems are presented that reduce the outcome to the inputs by construction. The evaluation relies on external task performance in Plancraft rather than self-referential definitions or self-citation chains, making the derivation independent and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Agents facing planning problems can reduce uncertainty by storing and reusing answers to how-to questions.
invented entities (1)
- $How^{2}$ memory agent framework (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We introduce How², a memory agent framework that enables agents to ask how-to questions, store the answers, and reuse them for lifelong learning in interactive environments."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "answers that are abstracted and decoupled from the current state"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.