Lightweight LLM Agent Memory with Small Language Models

· 2026 · cs.AI · arXiv 2604.07798

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open full Pith review browse 5 citing papers arXiv PDF

abstract

Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construction and candidate filtering. In contrast, many systems use repeated large-model calls for online memory operations, improving accuracy but accumulating latency over long interactions. We propose LightMem, a lightweight memory system for better agent memory driven by Small Language Models (SLMs). LightMem modularizes memory retrieval, writing, and long-term consolidation, and separates online processing from offline consolidation to enable efficient memory invocation under bounded compute. We organize memory into short-term memory (STM) for immediate conversational context, mid-term memory (MTM) for reusable interaction summaries, and long-term memory (LTM) for consolidated knowledge, and uses user identifiers to support independent retrieval and incremental maintenance in multi-user settings. Online, LightMem operates under a fixed retrieval budget and selects memories via a two-stage procedure: vector-based coarse retrieval followed by semantic consistency re-ranking. Offline, it abstracts reusable interaction evidence and incrementally integrates it into LTM. Experiments show consistent gains across model scales, with an average F1 improvement of about 2.5 over A-MEM on LoCoMo, while achieving higher efficiency and low median latency (83 ms for retrieval and 581 ms end-to-end).

representative citing papers

CAP: Controllable Alignment Prompting for Unlearning in LLMs

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

CAP enables reversible unlearning of targeted knowledge in LLMs through optimized prompts generated via reinforcement learning, without any parameter updates.

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.

Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs

cs.CL · 2026-04-14 · unverdicted · novelty 6.0

Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.

cs.CL · 2026-04-25 · unverdicted · novelty 5.0

A hybrid graph-based training-free framework for LLM context compression matches strong baselines and shows larger gains on long-document benchmarks.

CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning

cs.AI · 2026-04-25 · unverdicted · novelty 5.0

CAP-CoT uses iterative adversarial prompt cycles to improve CoT accuracy, stability, and robustness across six benchmarks and four LLM backbones.

citing papers explorer

Showing 5 of 5 citing papers.

CAP: Controllable Alignment Prompting for Unlearning in LLMs cs.LG · 2026-04-23 · unverdicted · none · ref 5 · internal anchor
CAP enables reversible unlearning of targeted knowledge in LLMs through optimized prompts generated via reinforcement learning, without any parameter updates.
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing cs.CL · 2026-04-21 · unverdicted · none · ref 61 · internal anchor
DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.
Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs cs.CL · 2026-04-14 · unverdicted · none · ref 5 · internal anchor
Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.
From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors cs.CL · 2026-04-25 · unverdicted · none · ref 18 · internal anchor
A hybrid graph-based training-free framework for LLM context compression matches strong baselines and shows larger gains on long-document benchmarks.
CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning cs.AI · 2026-04-25 · unverdicted · none · ref 51 · internal anchor
CAP-CoT uses iterative adversarial prompt cycles to improve CoT accuracy, stability, and robustness across six benchmarks and four LLM backbones.

Lightweight LLM Agent Memory with Small Language Models

fields

years

verdicts

representative citing papers

citing papers explorer