HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents
Pith reviewed 2026-05-10 07:19 UTC · model grok-4.3
The pith
HeLa-Mem builds LLM agent memory as an evolving associative graph using Hebbian learning to strengthen connections from repeated co-activation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HeLa-Mem models memory as a dynamic graph with Hebbian learning dynamics. It maintains an episodic memory graph that evolves through co-activation patterns and populates a semantic memory store via Hebbian Distillation, in which a Reflective Agent identifies densely connected memory hubs and distills them into structured, reusable semantic knowledge. This dual-path design combines semantic similarity with learned associations, mirroring the episodic-semantic distinction in human cognition and yielding superior performance across four question categories on LoCoMo while using significantly fewer context tokens.
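The hub-distillation step in the core claim can be made concrete with a small sketch. Using weighted degree as the "densely connected" criterion and a `top_k` cutoff are illustrative assumptions; the actual criterion the Reflective Agent applies is not specified in this summary.

```python
def find_hubs(edges, top_k=3):
    """Pick candidate memory hubs for distillation by weighted degree.

    `edges` maps (node_a, node_b) pairs to association weights.
    Degree-based selection is an assumption for illustration, not
    the paper's stated rule.
    """
    degree = {}
    for (a, b), w in edges.items():
        degree[a] = degree.get(a, 0.0) + w
        degree[b] = degree.get(b, 0.0) + w
    return sorted(degree, key=degree.get, reverse=True)[:top_k]

# m1 touches three edges and accumulates the largest weighted degree.
edges = {("m1", "m2"): 0.9, ("m1", "m3"): 0.8, ("m2", "m3"): 0.1, ("m4", "m1"): 0.2}
hubs = find_hubs(edges, top_k=1)
print(hubs)  # → ['m1']
```

In the full system, the selected hubs would then be handed to the Reflective Agent for distillation into semantic-memory entries.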
What carries the argument
A dual-level memory architecture with an episodic memory graph that updates via Hebbian co-activation rules and a semantic memory store created by reflective distillation of dense graph hubs.
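A minimal sketch of the Hebbian co-activation rule carrying the episodic graph: memories retrieved together in one step have their pairwise edges strengthened. The additive update and the `lr` constant are assumptions for illustration; the paper's actual update equation is not reproduced here.

```python
from collections import defaultdict

class EpisodicGraph:
    """Toy episodic memory graph with Hebbian edge updates."""

    def __init__(self, lr=0.1):
        self.lr = lr  # illustrative learning rate, not a value from the paper
        self.edges = defaultdict(float)  # undirected edge -> weight

    def _key(self, a, b):
        # Canonical ordering so (a, b) and (b, a) share one edge.
        return (a, b) if a <= b else (b, a)

    def co_activate(self, active_nodes):
        # "Fire together, wire together": strengthen every pairwise
        # edge among memories activated in the same retrieval step.
        nodes = sorted(set(active_nodes))
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                self.edges[self._key(a, b)] += self.lr

    def weight(self, a, b):
        return self.edges[self._key(a, b)]

g = EpisodicGraph()
g.co_activate(["m1", "m2", "m3"])  # one retrieval step
g.co_activate(["m1", "m2"])        # m1 and m2 co-activate again
```

After these two steps the m1-m2 edge is twice as strong as m2-m3, which is the asymmetry the retrieval path can later exploit.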
If this is right
- Agents can sustain coherent behavior across interaction lengths that would otherwise exceed fixed context windows.
- Retrieval draws on both vector similarity and explicitly learned associative links rather than similarity alone.
- The memory structure reduces the number of tokens needed per response while raising accuracy on question-answering tasks.
- Biological processes of association and consolidation can be approximated without retraining the underlying language model.
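The second bullet, retrieval that blends vector similarity with learned associative links, can be sketched as a simple linear blend. The `alpha` mixing weight and the max-over-anchors link score are illustrative assumptions; the paper's scoring rule may differ.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def dual_path_score(query_vec, memories, assoc, anchors, alpha=0.7):
    """Rank memories by blending embedding similarity with the strongest
    associative link to any recently active anchor memory.

    `alpha` and the linear blend are assumptions for illustration.
    """
    scores = {}
    for mid, vec in memories.items():
        sim = cosine(query_vec, vec)
        link = max((assoc.get(frozenset((mid, a)), 0.0) for a in anchors),
                   default=0.0)
        scores[mid] = alpha * sim + (1 - alpha) * link
    return sorted(scores, key=scores.get, reverse=True)

memories = {"m1": [1.0, 0.0], "m2": [0.0, 1.0]}
assoc = {frozenset(("m2", "m3")): 1.0}  # m2 is strongly linked to anchor m3
ranked = dual_path_score([1.0, 0.0], memories, assoc, anchors=["m3"])
```

With `alpha=0.7` similarity dominates and `m1` ranks first; lowering `alpha` lets the learned association pull `m2` ahead, which is the behavior similarity-only retrieval cannot reproduce.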
Where Pith is reading between the lines
- The graph structure may scale more gracefully than vector stores as total interaction history grows, because only co-activated edges strengthen.
- Integration with existing agent tool-use loops could allow the memory to influence planning and reflection without additional model calls.
- Similar Hebbian updates might be applied to other agent components, such as tool-selection histories or goal-tracking graphs.
- The approach suggests a path toward online, parameter-free memory adaptation that continues after initial deployment.
Load-bearing premise
The Hebbian update rules and reflective distillation process can be implemented efficiently inside existing LLM inference loops without introducing instability or requiring extensive unreported hyper-parameter tuning.
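The stability concern in this premise is the classic failure mode of unconstrained Hebbian learning: weights grow without bound. A common stabilization, shown here as a sketch, combines passive decay with clamping; all constants are assumptions for illustration, not values from the paper.

```python
def hebbian_step(w, pre, post, lr=0.05, decay=0.01, w_max=1.0):
    """One bounded Hebbian update: strengthen on co-activation,
    decay toward zero otherwise, clamp to [0, w_max].

    lr, decay, and w_max are illustrative constants.
    """
    w = w + lr * pre * post - decay * w
    return min(max(w, 0.0), w_max)

# Under constant co-activation the weight rises toward lr/decay
# but the clamp pins it at w_max, so the update cannot diverge.
w = 0.5
for _ in range(1000):
    w = hebbian_step(w, 1.0, 1.0)
print(w)  # → 1.0
```

Whether HeLa-Mem needs this kind of bound, and what its actual decay schedule is, are exactly the unreported details the premise hinges on.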
What would settle it
A controlled run of an agent on a multi-thousand-turn conversation sequence that measures whether retrieval accuracy falls below the embedding baseline once the graph grows large or whether token usage stops decreasing.
Original abstract
Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversation history as unstructured embedding vectors, retrieving information through semantic similarity. This paradigm fails to capture the associative structure of human memory, wherein related experiences progressively strengthen interconnections through repeated co-activation. Inspired by cognitive neuroscience, we identify three mechanisms central to biological memory: association, consolidation, and spreading activation, which remain largely absent in current research. To bridge this gap, we propose HeLa-Mem, a bio-inspired memory architecture that models memory as a dynamic graph with Hebbian learning dynamics. HeLa-Mem employs a dual-level organization: (1) an episodic memory graph that evolves through co-activation patterns, and (2) a semantic memory store populated via Hebbian Distillation, wherein a Reflective Agent identifies densely connected memory hubs and distills them into structured, reusable semantic knowledge. This dual-path design leverages both semantic similarity and learned associations, mirroring the episodic-semantic distinction in human cognition. Experiments on LoCoMo demonstrate superior performance across four question categories while using significantly fewer context tokens. Code is available on GitHub: https://github.com/ReinerBRO/HeLa-Mem
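The third mechanism the abstract names, spreading activation, can be sketched as iterative pulse propagation over the weighted graph. The `decay`, `iters`, and `threshold` parameters and the max-accumulation rule are illustrative assumptions, not the paper's specification.

```python
def spread_activation(graph, seeds, decay=0.5, iters=2, threshold=0.05):
    """Propagate activation from seed memories along weighted edges.

    Each pass sends a decayed pulse across every edge from the
    previous pass's activations; pulses below `threshold` are dropped,
    which keeps activation local to the seeds' neighborhood.
    """
    activation = {n: 1.0 for n in seeds}
    for _ in range(iters):
        new = dict(activation)
        for (a, b), w in graph.items():
            for src, dst in ((a, b), (b, a)):
                pulse = activation.get(src, 0.0) * w * decay
                if pulse > threshold:
                    new[dst] = max(new.get(dst, 0.0), pulse)
        activation = new
    return activation

# q reaches m2 through m1, but the weak m2-m3 edge never fires.
graph = {("q", "m1"): 0.9, ("m1", "m2"): 0.8, ("m2", "m3"): 0.1}
act = spread_activation(graph, ["q"])
```

This is what lets retrieval surface memories that are not semantically similar to the query but are strongly associated with ones that are.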
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes HeLa-Mem, a bio-inspired dual-level memory architecture for LLM agents. It models episodic memory as a dynamic graph that evolves via Hebbian learning on co-activation patterns and populates a semantic memory store through Hebbian Distillation, where a Reflective Agent identifies memory hubs and distills them into structured knowledge. The central claim is that this system achieves superior performance across four question categories on the LoCoMo benchmark while using significantly fewer context tokens than existing approaches.
Significance. If the empirical results are robust, the work could meaningfully advance long-term memory mechanisms in LLM agents by incorporating associative graph structures and consolidation processes drawn from cognitive neuroscience. The linked code repository supports reproducibility and allows external verification of the implementation.
major comments (2)
- [Experiments] Experiments section: the manuscript asserts superior performance on LoCoMo across four question categories but provides no quantitative scores, baseline comparisons, standard deviations, or statistical significance tests. This absence prevents evaluation of the central empirical claim.
- [§3] §3 (Method): the Hebbian update rules for the episodic graph and the reflective distillation process are described conceptually without explicit equations, pseudocode, or hyperparameter specifications, leaving the precise dynamics and potential for inference-loop instability unaddressed despite the linked repository.
minor comments (2)
- [Introduction] The abstract and introduction could more explicitly distinguish the proposed mechanisms from prior embedding-based retrieval systems to clarify novelty.
- [Figures] Figure captions and axis labels in any performance plots should include exact token counts and category breakdowns for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our empirical results and methodological details. We address each major point below and will revise the manuscript to incorporate the suggested improvements.
Point-by-point responses
- Referee: [Experiments] Experiments section: the manuscript asserts superior performance on LoCoMo across four question categories but provides no quantitative scores, baseline comparisons, standard deviations, or statistical significance tests. This absence prevents evaluation of the central empirical claim.
  Authors: We acknowledge this limitation in the current manuscript. The Experiments section will be expanded in the revision to include a detailed results table reporting exact performance scores for each of the four LoCoMo question categories, direct comparisons against the relevant baselines, standard deviations computed over multiple independent runs, and appropriate statistical significance tests (e.g., paired t-tests) to substantiate the claims of superior performance and reduced token usage. Revision: yes.
- Referee: [§3] §3 (Method): the Hebbian update rules for the episodic graph and the reflective distillation process are described conceptually without explicit equations, pseudocode, or hyperparameter specifications, leaving the precise dynamics and potential for inference-loop instability unaddressed despite the linked repository.
  Authors: We agree that the Method section would benefit from greater formality. The revised manuscript will include explicit mathematical equations for the Hebbian weight updates in the episodic memory graph (based on co-activation strength), pseudocode for the full Hebbian Distillation procedure executed by the Reflective Agent, a complete list of hyperparameters with their chosen values, and a brief discussion of stability considerations during inference to address potential loop instability. Revision: yes.
Circularity Check
No significant circularity detected
full rationale
The paper presents a bio-inspired architectural proposal for LLM agent memory, defining episodic and semantic stores via Hebbian dynamics and distillation as independent design choices drawn from neuroscience. No equations, parameter-fitting procedures, or derivation chains appear in the abstract or high-level description. Claims rest on experimental outcomes on the LoCoMo benchmark rather than any self-referential reduction of outputs to inputs. Code availability provides an external verification path. No load-bearing step matches any enumerated circularity pattern.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Biological memory mechanisms of association, consolidation, and spreading activation can be faithfully modeled by a dynamic graph with Hebbian edge updates inside an LLM agent.
invented entities (1)
- Hebbian Distillation (no independent evidence)