arxiv: 2511.22074 · v2 · submitted 2025-11-27 · 💻 cs.AI · cs.IR

Real-Time Procedural Learning From Experience for AI Agents

Dasheng Bi, Yubin Hu, Mohammed N. Nasir

Pith reviewed 2026-05-17 05:17 UTC · model grok-4.3

classification 💻 cs.AI cs.IR

keywords AI agents, procedural learning, real-time learning, experience retrieval, LLM agents, web browsing, state matching, post-training adaptation

The pith

PRAXIS lets AI agents acquire procedural knowledge in real time by retrieving past state-action-result examples that match the current situation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PRAXIS as a lightweight mechanism that lets deployed LLM-based agents learn procedures from their own trial-and-error experiences without further training. It stores the outcomes of actions taken in past episodes and retrieves the relevant ones by comparing both the environment and the agent's internal state in those episodes with the current state. On a web-browsing benchmark the method raised task-completion accuracy and reliability while lowering cost, worked with multiple model backbones, and showed some ability to handle tasks it had not seen before. The core idea is that this real-time recall supplies concrete guidance for the next action, addressing the fact that most current agents cannot improve their procedures after they are deployed.

Core claim

PRAXIS stores the consequences of actions and retrieves them by jointly matching environmental and internal states of past episodes to the current state. It augments agentic action selection with retrieved state-action-result exemplars that are generated in real time. When evaluated on the REAL web browsing benchmark, PRAXIS improves task completion accuracy, reliability, and cost efficiency across different foundation model backbones, and shows preliminary generalization to unseen tasks in similar environments.

What carries the argument

PRAXIS, a post-training mechanism that indexes and retrieves state-action-result exemplars by joint matching of environmental and internal states from prior episodes to guide current action selection.
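The store-and-retrieve loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Exemplar` fields, the `ExemplarMemory` class, the use of plain float vectors as state embeddings, and the weighted sum of two cosine similarities are all assumptions made for the example, since the material here does not specify how states are encoded or compared.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Exemplar:
    """One stored state-action-result record (field names are hypothetical)."""
    env_vec: list[float]       # assumed embedding of the environment state
    internal_vec: list[float]  # assumed embedding of the agent's internal state
    action: str
    result: str


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ExemplarMemory:
    """In-memory store that ranks exemplars by a joint similarity score."""

    def __init__(self, env_weight=0.5):
        # env_weight balances environment vs. internal similarity (an assumption).
        self.env_weight = env_weight
        self.items = []

    def store(self, exemplar):
        self.items.append(exemplar)

    def retrieve(self, env_vec, internal_vec, k=3):
        """Return the k exemplars whose stored states best match the current state."""
        def joint(e):
            return (self.env_weight * cosine(env_vec, e.env_vec)
                    + (1 - self.env_weight) * cosine(internal_vec, e.internal_vec))
        return sorted(self.items, key=joint, reverse=True)[:k]


# Toy usage with made-up two-dimensional "embeddings".
memory = ExemplarMemory()
memory.store(Exemplar([1.0, 0.0], [1.0, 0.0], "click #checkout", "order form opened"))
memory.store(Exemplar([0.0, 1.0], [1.0, 0.0], "click #search", "results listed"))
memory.store(Exemplar([1.0, 0.0], [0.0, 1.0], "click #logout", "session ended"))
best = memory.retrieve([0.9, 0.1], [0.8, 0.2], k=1)[0]
print(best.action, "->", best.result)
```

In an agent loop, the retrieved exemplars would be formatted into the prompt before the model picks its next action, and the outcome of that action would be stored as a new exemplar. The parameter `k` plays the role of retrieval breadth.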

If this is right

Task completion accuracy rises when agents can draw on real-time retrieved examples rather than relying only on the base model.
Reliability improves because retrieved outcomes provide concrete guidance instead of purely generative responses.
Cost efficiency increases across different foundation-model backbones because fewer tokens or steps are wasted on unproductive actions.
Preliminary generalization appears for unseen tasks set in environments similar to those of the stored episodes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same state-matching retrieval could be tested in other stateful domains such as robotic manipulation or multi-step planning where environments change during execution.
If retrieval noise proves low, the approach might reduce the need for frequent model fine-tuning by letting agents accumulate procedural knowledge continuously.
The method's emphasis on internal state matching suggests it could be combined with memory architectures that already track an agent's goals or beliefs.

Load-bearing premise

Joint matching of past environmental and internal states will reliably surface useful exemplars for the current decision without introducing noise, excessive latency, or retrieval failures.

What would settle it

Run the agent in a rapidly changing web environment: if retrieval fails to improve accuracy, or adds measurable latency and errors, the central claim does not hold.

Figures

Figures reproduced from arXiv: 2511.22074 by Dasheng Bi, Mohammed N. Nasir, Yubin Hu.

**Figure 2.** Agent reliability on the REAL benchmark for different models with (blue) and without (grey) procedural memory.

**Figure 3.** Performance as a function of retrieval breadth.

Original abstract

Learning how to do things from trial and error in real time is a hallmark of biological intelligence, yet most LLM-based agents lack mechanisms to acquire procedural knowledge after deployment. We propose Procedural Recall for Agents with eXperiences Indexed by State (PRAXIS), a lightweight post-training learning mechanism that stores the consequences of actions and retrieves them by jointly matching environmental and internal states of past episodes to the current state. PRAXIS augments agentic action selection with retrieved state-action-result exemplars that are generated in real time. When evaluated on the REAL web browsing benchmark, PRAXIS improves task completion accuracy, reliability, and cost efficiency across different foundation model backbones, and shows preliminary generalization to unseen tasks in similar environments. These results demonstrate that PRAXIS enables the practical adoption of AI agents in fast-evolving stateful environments by helping them learn new procedures effectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes PRAXIS, a lightweight post-training mechanism for LLM-based agents that stores consequences of actions in real time and retrieves state-action-result exemplars by jointly matching environmental states (e.g., page DOM or screenshots) and internal states (e.g., agent memory or goal) from past episodes to the current state. This augments action selection during inference. Evaluation on the REAL web browsing benchmark is claimed to show improvements in task completion accuracy, reliability, and cost efficiency across foundation model backbones, along with preliminary generalization to unseen tasks in similar environments.

Significance. If the quantitative results and retrieval robustness hold under detailed scrutiny, the work could be significant for enabling practical deployment of agents in fast-evolving stateful settings by providing a parameter-free, post-deployment way to acquire procedural knowledge without full retraining. The real-time exemplar generation and cross-backbone consistency are potential strengths if supported by ablations and metrics.

major comments (2)

[Abstract and Evaluation section] The claim of improved task completion accuracy, reliability, and cost efficiency on the REAL benchmark is asserted without any quantitative results, error bars, statistical tests, baseline comparisons, or description of retrieval mechanics (e.g., encoding functions or similarity thresholds). This leaves the central performance claim unverifiable and weakens attribution of gains specifically to PRAXIS rather than prompt length or model variance.
[Method section on retrieval] Joint matching of environmental and internal states is described at a high level without concrete details on state encoding granularity, similarity computation, or failure modes for small layout shifts or goal drift. In the fast-evolving web environments targeted by the paper, this risks retrieving noisy or outdated exemplars, undermining the claim that observed gains derive from procedural recall rather than other factors.
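As an illustration of the error bars requested in the first major comment, one common option is a paired percentile bootstrap over per-task outcomes. The sketch below is not taken from the paper: the function name `bootstrap_diff_ci` is hypothetical, and the outcome lists are synthetic placeholders, not results from the REAL benchmark.

```python
import random
from statistics import mean


def bootstrap_diff_ci(with_memory, without_memory, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the difference in success rate.

    Both inputs are per-task outcomes (1 = success, 0 = failure) on the same
    tasks in the same order, so tasks are resampled as pairs.
    """
    if len(with_memory) != len(without_memory):
        raise ValueError("outcome lists must cover the same tasks")
    rng = random.Random(seed)
    n = len(with_memory)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(mean(with_memory[i] for i in idx)
                     - mean(without_memory[i] for i in idx))
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    high = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    point = mean(with_memory) - mean(without_memory)
    return point, low, high


# Synthetic placeholder outcomes for eight tasks (NOT data from the paper).
with_mem = [1, 1, 0, 1, 1, 0, 1, 1]
without_mem = [1, 0, 0, 1, 0, 0, 1, 1]
point, low, high = bootstrap_diff_ci(with_mem, without_mem)
print(f"difference = {point:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

An interval that excludes zero would support attributing the gain to the memory mechanism rather than run-to-run variance. Ruling out a prompt-length effect would still need a separate control, such as padding the baseline prompt with unrelated exemplars.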

minor comments (2)

[Method] Provide explicit pseudocode or diagram for the storage and retrieval loop, including how real-time exemplars are generated and indexed.
[Preliminaries or Method] Clarify the exact definition and representation of 'internal state' versus 'environmental state' with concrete examples from the web browsing domain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed and insightful comments, which have helped us identify areas for improvement in our manuscript. We address each major comment below and outline the revisions we plan to make.

Point-by-point responses

Referee: [Abstract and Evaluation section] The claim of improved task completion accuracy, reliability, and cost efficiency on the REAL benchmark is asserted without any quantitative results, error bars, statistical tests, baseline comparisons, or description of retrieval mechanics (e.g., encoding functions or similarity thresholds). This leaves the central performance claim unverifiable and weakens attribution of gains specifically to PRAXIS rather than prompt length or model variance.

Authors: We thank the referee for highlighting this issue. The Evaluation section does contain quantitative results demonstrating improvements in task completion accuracy, reliability, and cost efficiency across foundation model backbones on the REAL benchmark, along with baseline comparisons. However, we acknowledge that the presentation could be strengthened by including error bars, statistical tests, and a more explicit description of retrieval mechanics. In the revised manuscript, we will add these elements, including error bars from multiple runs, p-values for statistical significance, and details on encoding functions and similarity thresholds in both the abstract summary and the evaluation section to ensure verifiability and clear attribution to PRAXIS.

revision: yes

Referee: [Method section on retrieval] Joint matching of environmental and internal states is described at a high level without concrete details on state encoding granularity, similarity computation, or failure modes for small layout shifts or goal drift. In the fast-evolving web environments targeted by the paper, this risks retrieving noisy or outdated exemplars, undermining the claim that observed gains derive from procedural recall rather than other factors.

Authors: We agree with the referee that additional concrete details are necessary to fully substantiate the retrieval mechanism. In the revised version, we will expand the Method section to specify the state encoding granularity (e.g., using sentence embeddings for internal states and visual/DOM embeddings for environmental states), the similarity computation method (e.g., cosine similarity with a dynamic threshold), and a discussion of failure modes including small layout shifts and goal drift, along with mitigation strategies such as hierarchical matching and recency weighting. This will help demonstrate that the gains are indeed from procedural recall.

revision: yes
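The simulated rebuttal names recency weighting and a dynamic threshold without defining either. One possible reading is sketched below: similarity is discounted by exponential decay in exemplar age, and only candidates scoring within a margin of the best one (and above a fixed floor) are kept. The function names, the half-life, the margin, and the floor are all illustrative assumptions, not details from the paper.

```python
def recency_weighted(similarity, age_steps, half_life=50.0):
    """Discount a similarity score by exemplar age using exponential decay.

    half_life is the age (in agent steps) at which the score is halved.
    """
    return similarity * 0.5 ** (age_steps / half_life)


def dynamic_threshold(scores, margin=0.1, floor=0.3):
    """Cut-off that tracks the best score but never drops below `floor`."""
    if not scores:
        return floor
    return max(floor, max(scores) - margin)


def select_exemplars(candidates, half_life=50.0, margin=0.1, floor=0.3):
    """Return the ids of candidates that survive the thresholded, recency-weighted score.

    candidates: list of (exemplar_id, similarity, age_steps) tuples.
    """
    scored = [(cid, recency_weighted(sim, age, half_life)) for cid, sim, age in candidates]
    cutoff = dynamic_threshold([s for _, s in scored], margin, floor)
    return [cid for cid, s in scored if s >= cutoff]


# Toy candidates: "b" is similar but old, "c" is recent but dissimilar.
candidates = [("a", 0.9, 0), ("b", 0.85, 50), ("c", 0.5, 0)]
print(select_exemplars(candidates))
```

Under this reading, a stale exemplar from a since-changed page layout loses out to a fresher one even when its raw similarity is high, which targets the outdated-exemplar risk raised by the referee. When nothing clears the floor, the selection is empty and the agent falls back to the base model alone.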

Circularity Check

0 steps flagged

No circularity detected in derivation or claims

Full rationale

The paper introduces PRAXIS as an algorithmic mechanism for storing and retrieving state-action-result exemplars via joint environmental and internal state matching to augment agent action selection. No equations, derivations, fitted parameters, or mathematical reductions appear in the provided text or abstract. Claims of improved accuracy, reliability, and generalization rest on external empirical evaluation on the REAL web browsing benchmark across foundation model backbones, with no self-citation load-bearing steps, ansatz smuggling, or self-definitional loops. The proposal is self-contained as a practical post-training addition evaluated independently of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, background axioms, or new postulated entities are identifiable beyond standard assumptions in agent memory systems.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

PRAXIS augments agentic action selection with retrieved state-action-result exemplars that are generated in real time... jointly matching environmental and internal states

Tag definitions

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
