HORMA builds a hierarchical memory structure from agent experiences and trains a lightweight RL navigator to retrieve minimal sufficient context, yielding better task performance with at most 22.17% of baseline token usage on ALFWorld, LoCoMo, and LongMemEval.
Beyond static summarization: Proactive memory extraction for LLM agents. arXiv preprint arXiv:2601.04463, 2026a
9 Pith papers cite this work.
years: 2026 (9)
representative citing papers
TMEM lets LLM agents evolve their policy mid-episode by absorbing distilled supervision into online LoRA updates, outperforming summary and retrieval baselines on several long-context benchmarks.
Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.
Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines while lowering computational cost by 92.6% on the LoCoMo and LongMemEval benchmarks.
A survey that maps safety risks in personalized LLMs, introduces a unified taxonomy, and highlights three structural inadequacies in existing research on user-invariant safety, isolated techniques, and short-term evaluations.
EMBER learns to retain source-backed evidence capsules under a fixed token budget, improving F1, Retain-Recall, and Read-Recall on LongMemEval-RR over budgeted baselines.
MemIR is a typed memory representation for LLM agents that structures memory into atoms separating evidence, cues, and claims, leading to better performance on source tracking tasks in experiments on LoCoMo and BEAM-100K.
MEMTIER reports 0.382 accuracy and 0.412 F1 on the 500-question LongMemEval-S benchmark, a 33 pp gain over the full-context baseline, using tiered memory and retrieval components on 6 GB GPU hardware.
citing papers explorer
MEMTIER: Tiered Memory Architecture and Retrieval Bottleneck Analysis for Long-Running Autonomous AI Agents