super hub Canonical reference

MemGPT: Towards LLMs as Operating Systems

Charles Packer, Ion Stoica, Kevin Lin, Sarah Wooders, Shishir G. Patil, Vivian Fang · 2023 · cs.AI · arXiv 2310.08560

Canonical reference. 77% of citing Pith papers cite this work as background.

312 Pith papers citing it

Background 77% of classified citations

open full Pith review browse 312 citing papers more from Charles Packer arXiv PDF

abstract

Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. Using this technique, we introduce MemGPT (Memory-GPT), a system that intelligently manages different memory tiers in order to effectively provide extended context within the LLM's limited context window, and utilizes interrupts to manage control flow between itself and the user. We evaluate our OS-inspired design in two domains where the limited context windows of modern LLMs severely handicaps their performance: document analysis, where MemGPT is able to analyze large documents that far exceed the underlying LLM's context window, and multi-session chat, where MemGPT can create conversational agents that remember, reflect, and evolve dynamically through long-term interactions with their users. We release MemGPT code and data for our experiments at https://memgpt.ai.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 36 baseline 3 dataset 3 method 1 other 1

citation-polarity summary

background 34 baseline 3 use dataset 3 support 2 unclear 1 use method 1

claims ledger

abstract Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. Using this technique, we introduce MemGPT (Memory-GPT), a system that intelligently manages different memory tiers i

authors

Charles Packer Ion Stoica Kevin Lin Sarah Wooders Shishir G. Patil Vivian Fang

co-cited works

representative citing papers

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

cs.AI · 2026-06-04 · unverdicted · novelty 8.0

CL-Bench is the first expert-validated benchmark for continual learning in frontier LLMs across six real-world domains, showing limited gains and that naive in-context learning outperforms dedicated memory systems.

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0 · 3 refs

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.

MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents

cs.CL · 2026-04-17 · unverdicted · novelty 8.0 · 2 refs

MemEvoBench is presented as the first standardized benchmark for long-horizon memory safety in LLM agents, covering adversarial memory injection, noisy tool outputs, and biased feedback across QA and workflow tasks.

Agentic AI for Multi-Stage Physics Experiments at a Large-Scale User Facility Particle Accelerator

physics.acc-ph · 2025-09-21 · unverdicted · novelty 8.0

A language-model-driven agentic AI system autonomously executes multi-stage physics experiments at a production synchrotron light source, reducing preparation time by two orders of magnitude while upholding safety constraints.

Why Do Multi-Agent LLM Systems Fail?

cs.AI · 2025-03-17 · unverdicted · novelty 8.0

The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.

Self-GC: Self-Governing Context for Long-Horizon LLM Agents

cs.AI · 2026-07-01 · unverdicted · novelty 7.0

Self-GC governs agent context as indexed objects with planner-proposed actions, achieving 84.85% no-impact on future continuations on a hard set versus 54-70% for baselines.

When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers

cs.DB · 2026-07-01 · unverdicted · novelty 7.0

SOLAR is a learning-augmented policy for semantic cache replacement that achieves constant competitive ratio 3 and 5-75% gains over FIFO on retrieval workloads.

CLQT: A Closed-Loop, Cost-Aware, Strategy-Consistent Benchmark for Diagnostic Evaluation of LLM Portfolio-Management Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

CLQT is a new closed-loop, cost-aware benchmark that diagnoses LLM trading agent capabilities through strategy-consistent metrics and hash-verifiable trails rather than outcome rankings.

HyphaeDB: A Living Knowledge Topology for Agent-First Memory

cs.AI · 2026-06-27 · unverdicted · novelty 7.0

HyphaeDB introduces an agent-native memory system using HNSW topology for gossip-based knowledge propagation, enabling emergent behaviors in multi-agent AI.

LLM agents security duality: a comprehensive survey of self-security and empowered cybersecurity

cs.CR · 2026-06-26 · unverdicted · novelty 7.0

A survey of LLM agent self-security threats and mitigations alongside their applications in the cybersecurity lifecycle, introducing a synergy concept and empowerment framework.

Reclaim Evaluation: A Lossy Memory Is Worse Than an Empty One

cs.CL · 2026-06-24 · unverdicted · novelty 7.0

Reclaim evaluation shows lossy memory in language models is never better than empty memory across eight models, with a source-first policy restoring correctability at fixed budget.

MemTrace: Probing What Final Accuracy Misses in Long-Term Memory

cs.AI · 2026-06-15 · unverdicted · novelty 7.0

MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

cs.LG · 2026-06-15 · accept · novelty 7.0

Formalizes four concurrency anomalies in multi-agent LLM systems and mechanically verifies a hierarchy of sound detectors and preventions realized in Rust runtimes using TLA+ and Verus.

Control-Plane Placement Shapes Forgetting: An Architectural Study of Agent Memory Across Thirteen System Configurations

cs.CL · 2026-06-14 · unverdicted · novelty 7.0

An empirical comparison of thirteen control-plane placements in agent memory pipelines identifies three regimes with complementary forgetting recovery on a new 385-case adversarial benchmark, with mutation-time placement achieving 91.7-93.2% overall.

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

OSL-MR is a learning-augmented framework that casts memory retention as constrained stochastic optimization under partial observability and outperforms heuristic baselines on LoCoMo and LongMemEval.

Self-Harness: Harnesses That Improve Themselves

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Self-Harness lets LLM agents autonomously refine their interaction harnesses through weakness mining, proposal generation, and validation, raising held-out pass rates on Terminal-Bench-2.0 from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1% across three models.

Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

DCPM reorganizes LLM agent memory into a cognitive hierarchy driven by a synchronous daytime belief writer and an asynchronous nighttime schema engine, reporting gains on cross-session inference benchmarks.

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

The paper delivers the first systems characterization of agent memory, with a four-axis taxonomy, phase-aware profiler, evaluation of ten systems on two benchmarks, and ten design recommendations.

TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory

cs.DB · 2026-06-04 · unverdicted · novelty 7.0

TOKI types four common contradiction-resolution heuristics as bitemporal operators on a dual-row schema, supplies soundness theorems, and shows via a verdict matrix that it alone avoids three write-time anomalies while retaining a language-model judge.

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

SubtleMemory benchmark with 1,522 instances over 10 histories shows current memory systems are weak at fine-grained relational discrimination in long-term AI agent interactions.

eMEM: A Hybrid Spatio-Temporal Memory System For Embodied Agents

cs.RO · 2026-06-02 · unverdicted · novelty 7.0

eMEM is a multi-index memory architecture with tiered consolidation and ten recall tools for embodied agents, scoring 80.8 weighted mean on eMEM-Bench covering eight cognitive psychology paradigms and outperforming a flat RAG baseline on context and lure rejection tasks.

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

SkillDAG builds a self-evolving typed skill graph that LLM agents query and update at inference time, raising success on ALFWorld and SkillsBench by 12.8 and 8.6 points over graph baselines.

Better with Experience: Self-Evolving LLM Agents for Evidence-Grounded Health Community Notes

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

EvoNote lets LLM agents self-evolve by distilling prior correction feedback into reusable memory for claim analysis, evidence acquisition, and note writing, outperforming human notes on a 1.2K health post benchmark.

Leyline: KV Cache Directives for Agentic Inference

cs.DC · 2026-05-31 · unverdicted · novelty 7.0

Leyline adds a policy-directed KV cache edit primitive with closed-form RoPE correction for agentic inference, reporting +11.2 pp cache-hit lift and +14.3 pp solve-rate gain.

citing papers explorer

Showing 50 of 312 citing papers.

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs cs.CL · 2026-06-09 · unverdicted · none · ref 29 · internal anchor
REAL represents long-term LLM memory as a temporal confidence-aware directed property graph with non-destructive updates and uses evaluator-guided beam search plus counterfactual inference for retrieval, reporting 22.72% average gains over baselines.
SafeGEO: Understanding Generative Engine Optimization Risks in Recommendation Agents cs.IR · 2026-06-08 · unverdicted · none · ref 4 · internal anchor
SafeGEO benchmark demonstrates that GEO attacks raise flawed product inclusion in recommendation sets by up to 83.2%, with partial mitigation from defensive prompting and evidence checks.
End-to-End Context Compression at Scale cs.CL · 2026-06-08 · unverdicted · none · ref 68 · internal anchor
LCLMs are scaled 0.6B-encoder 4B-decoder compressors pre-trained on over 350B tokens that improve the Pareto frontier for general-task performance, compression speed, and peak memory in long-context language model inference.
When More Cores Hurts: The Vector Database Scaling Paradox in HPC cs.DC · 2026-06-08 · unverdicted · none · ref 9 · internal anchor
Large-scale HPC evaluation of Qdrant, Milvus, and Weaviate reveals that workload patterns limit scaling and extra cores can reduce throughput, exposing a cloud-to-HPC design mismatch.
Emergence World: A Platform for Evaluating Long-Horizon Multi-Agent Autonomy cs.MA · 2026-06-06 · unverdicted · none · ref 82 · internal anchor
Emergence World is a model-agnostic multi-agent simulation platform integrating live data, 120+ tools, persistent memory, and democratic governance, illustrated by a 15-day study showing divergent outcomes across five LLM models.
Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses cs.CL · 2026-06-06 · unverdicted · none · ref 14 · internal anchor
Bayesian-Agent maintains feature-conditioned categorical posteriors over skills/SOPs from verified trajectories and maps them to actions that improve benchmark scores on SOP-Bench, Lifelong AgentBench, and RealFin-Bench.
SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows cs.AI · 2026-06-06 · unverdicted · none · ref 30 · internal anchor
SKILL.nb uses selective formalization and gate-conditioned execution in auditable notebooks to improve durability of agent workflows, achieving 53.7% success on WebArena-Verified with 91.7% retention across re-executions.
Constrained Dominant Sets for Multimodal Document Question Answering cs.IR · 2026-06-05 · unverdicted · none · ref 39 · internal anchor
Constrained Dominant Sets on query-augmented graphs select complementary evidence for long multimodal document QA, claiming new SOTA of 66.99 on VisDoMBench and gains of 37.1 and 4.8 points over baselines.
Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History cs.CL · 2026-06-05 · accept · none · ref 14 · internal anchor
Engram's hybrid bi-temporal retrieval from a knowledge graph with provenance yields 83.6% accuracy on LongMemEval_S using 9.6k tokens versus 73.2% with full history.
TokenMizer: Graph-Structured Session Memory for Long-Horizon LLM Context Management cs.AI · 2026-06-04 · unverdicted · none · ref 2 · internal anchor
TokenMizer builds a knowledge graph of LLM sessions and serializes it into 78-token resume blocks that retain more task, decision, and file information than flat-text baselines at roughly half the token cost.
RAMPART: Registry-based Agentic Memory with Priority-Aware Runtime Transformation cs.CL · 2026-06-03 · unverdicted · none · ref 7 · internal anchor
RAMPART is a registry-based memory system for LLM agents with priority-aware primitives that experimentally demonstrates position-dependent performance cliffs and benefits from block grouping and relevance gating.
Scaling Self-Evolving Agents via Parametric Memory cs.AI · 2026-06-03 · unverdicted · none · ref 11 · internal anchor
TMEM lets LLM agents evolve their policy mid-episode by absorbing distilled supervision into online LoRA updates, outperforming summary and retrieval baselines on several long-context benchmarks.
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories cs.LG · 2026-06-02 · unverdicted · none · ref 61 · internal anchor
Language models can use a two-stage sleep process of upward distillation for memory consolidation and RL-based dreaming for unsupervised self-improvement to enable continual learning.
SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision cs.AI · 2026-05-31 · unverdicted · none · ref 32 · internal anchor
SkillRevise iteratively refines initial LLM-generated agent skills using execution traces to diagnose defects and apply repairs, raising success rates from 36.05% to 61.63% on SkillsBench across three benchmarks and five LLMs.
MemPro: Agentic Memory Systems as Evolvable Programs cs.CL · 2026-05-30 · unverdicted · none · ref 33 · internal anchor
MemPro evolves the entire MCR pipeline as runnable programs via failure-guided refinement on a version tree and outperforms static baselines on LongMemEval, LoCoMo, HotpotQA, and NarrativeQA.
Eywa: Provenance-Grounded Long-Term Memory for AI Agents cs.CL · 2026-05-29 · unverdicted · none · ref 18 · internal anchor
Eywa introduces a provenance-grounded memory system for persistent AI agents featuring evidence-first storage, typed validation, and deterministic multi-route retrieval, reporting 90.19% accuracy on LoCoMo and 88.2% on LongMemEval-S.
SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs cs.CL · 2026-05-29 · unverdicted · none · ref 3 · internal anchor
SAGE applies a von Mises-Fisher density estimator with an adaptive threshold to route memory updates, achieving best-in-class token-F1 on LoCoMo while reducing API cost 3.4x and latency 2.5x on GPT-4o-mini.
Selective QA over Conflicting Multi-Source Personal Memory: A Diagnostic Testbed and Method Comparison cs.AI · 2026-05-28 · conditional · none · ref 27 · internal anchor
Introduces a benchmark with 34,560 instances for selective QA over conflicting multi-source personal memory and compares fusion methods against LLMs.
GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents cs.AI · 2026-05-28 · unverdicted · none · ref 3 · internal anchor
GRASP adds a regression-aware acceptance gate to skill proposal for LLM agents, producing large gains on clinical benchmarks while preventing silent regressions on prior behavior.
Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory cs.CL · 2026-05-28 · unverdicted · none · ref 2 · internal anchor
Entity-collision protocol stratifies agent-memory retrieval tests by tag and pins BM25 floor via shared entity tokens to attribute lift specifically to embedders.
PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration cs.CL · 2026-05-28 · unverdicted · none · ref 2 · internal anchor
PatchBoard introduces schema-grounded JSON Patch state mutations with an Architect agent and validation kernel, reporting 84.6% success and lower token use on 630 ALFWorld episodes versus LangGraph and Flock baselines.
Rethinking Memory as Continuously Evolving Connectivity cs.CL · 2026-05-27 · unverdicted · none · ref 36 · internal anchor
FluxMem evolves memory as a heterogeneous graph via three refinement stages and reports consistent state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.
MemCog: From Memory-as-Tool to Memory-as-Cognition in Conversational Agents cs.AI · 2026-05-27 · unverdicted · none · ref 3 · internal anchor
MemCog introduces a Memory-as-Cognition paradigm with Navigable Memory Store, Cross-Dimensional Navigation Interface, and Proactive Reasoning Protocol, claiming SOTA results on LoCoMo, LongMemEval, and a new ProactiveMemBench.
Long Live the Librarian! A Persistent Search Sub-Agent for Energy-Efficient Multi-Agent Software Engineering Systems cs.MA · 2026-05-27 · unverdicted · none · ref 4 · internal anchor
Librarian reduces per-episode GPU energy use by up to 25% in existing multi-agent SWE systems on SWE-Bench Verified by tracking search history and minimizing redundant output tokens while preserving task performance.
Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation cs.RO · 2026-05-26 · unverdicted · none · ref 55 · internal anchor
A zero-shot unified agent for VLN-CE, ObjectNav, EQA and Aerial-VLN on wheeled, quadruped, humanoid and UAV platforms that translates language and vision inputs into actions via MLLMs plus TDM and SCB mechanisms, matching trained foundation models on multiple benchmarks.
Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents cs.CL · 2026-05-25 · unverdicted · none · ref 29 · internal anchor
ProAct uses idle compute to anticipate user needs via dialogue history and memory, achieving 14.8% fewer turns, 11.7% less user effort, and 28.1% fewer hallucinations than reactive baselines on the new ProActEval benchmark.
EfficientGraph-RAG: Structured Retrieval-State Management for Cross-Task Retrieval-Augmented Generation cs.CL · 2026-05-25 · unverdicted · none · ref 13 · internal anchor
EfficientGraph-RAG structures retrieval state with TAM, MARS and SMP, ranking first on averaged LongBench answer-quality metrics while cutting token use 3.51x on HotpotQA.
SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors cs.LG · 2026-05-23 · unverdicted · none · ref 8 · internal anchor
SemanticZip is a pilot framework introducing LLM-mediated lossy text compression with an experimental interface evaluating six representation regimes on five diagnostic cases for semantic atom recovery and token efficiency.
SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent cs.AI · 2026-05-23 · unverdicted · none · ref 22 · internal anchor
SAM is a standalone memory framework for long-horizon LLM agents that creates state-adaptive cues from interactions, preserves raw trajectories for intent-driven recall, and optimizes the module via expert supervision and RL, outperforming baselines on BrowseComp and related benchmarks.
What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA cs.CL · 2026-05-21 · unverdicted · none · ref 8 · internal anchor
Controlled study shows mixed training curricula improve aggregate F1 on memory QA benchmarks while out-of-domain data transfers targeted skills like temporal reasoning, with per-question-type effects exceeding aggregate differences.
DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA cs.CL · 2026-05-21 · unverdicted · none · ref 27 · internal anchor
DeferMem decouples memory QA into high-recall retrieval and RL-based query-conditioned evidence distillation, outperforming baselines on LoCoMo and LongMemEval-S with highest accuracy, fastest runtime, and zero API token cost.
Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents cs.AI · 2026-05-21 · conditional · none · ref 5 · internal anchor
Ratchet provides a minimal hygiene recipe for self-managing skill libraries in frozen LLM agents, delivering +0.328 rolling-mean pass@1 gain on MBPP+ hard-100 and +0.22 peak lift on SWE-bench Verified.
The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems cs.AI · 2026-05-21 · unverdicted · none · ref 2 · internal anchor
ActiveGraph inverts traditional agent frameworks by treating the append-only event log as the primary source of truth, from which the reactive graph is projected, yielding deterministic replay, forking, and lineage tracking.
Mem-$\pi$: Adaptive Memory through Learning When and What to Generate cs.CL · 2026-05-20 · unverdicted · none · ref 30 · internal anchor
Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents cs.CL · 2026-05-20 · unverdicted · none · ref 22 · internal anchor
Auto-Dreamer trains an offline memory consolidator via GRPO on agent performance to abstract cross-session patterns, outperforming baselines by 7 points on ScienceWorld with 12x smaller memory and generalizing to ALFWorld and WebArena.
SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents cs.CV · 2026-05-18 · unverdicted · none · ref 19 · internal anchor
SPIKE dual-controller framework raises success rates 5-9 points and cuts tokens 55% in StarDojo agents by reusing strategic plans across stable segments and escalating only at detected events.
Compress the Context, Keep the Commitments: A Formal Framework for Verifiable LLM Context Compression cs.LG · 2026-05-17 · unverdicted · none · ref 14 · internal anchor
Context Codec is a commitment-level framework for verifiable LLM context compression using semantic atoms, defined metrics, and a compact rendering language.
H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure cs.CL · 2026-05-15 · unverdicted · none · ref 27 · internal anchor
H-Mem introduces a hybrid tree-plus-graph memory mechanism that evolves short-term agent memories into long-term summaries and enables efficient retrieval, reporting state-of-the-art QA results on three benchmarks.
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution cs.AI · 2026-05-14 · unverdicted · none · ref 40 · internal anchor
Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.
Agentic Recommender System with Hierarchical Belief-State Memory cs.CL · 2026-05-14 · unverdicted · none · ref 18 · 2 links · internal anchor
MARS uses hierarchical event-preference-profile memory with an LLM-scheduled lifecycle of six operations to achieve state-of-the-art results on InstructRec benchmarks.
Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems cs.AI · 2026-05-14 · unverdicted · none · ref 49 · internal anchor
HEAR uses a stratified hypergraph ontology to orchestrate evidence-driven multi-hop reasoning over heterogeneous business systems, reaching 94.7% accuracy on supply-chain root-cause tasks with open-weight models.
Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery cs.LG · 2026-05-13 · unverdicted · none · ref 12 · internal anchor
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents cs.CL · 2026-05-12 · unverdicted · none · ref 16 · 2 links · internal anchor
PRISM is a new inference-time retrieval system that achieves higher accuracy than baselines on long-horizon agent tasks while using an order of magnitude less context by combining hierarchical graph search, intent-based costing, compression, and adaptive routing over structured memory.
An Annotation Scheme and Classifier for Personal Facts in Dialogue cs.CL · 2026-05-11 · accept · none · ref 26 · internal anchor
An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 points with lower compute.
PREPING: Building Agent Memory without Tasks cs.AI · 2026-05-11 · unverdicted · none · ref 17 · internal anchor
Preping builds agent memory via proposer-guided synthetic practice and selective validation, matching offline/online methods at 2-3x lower deployment cost.
EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems cs.AI · 2026-05-09 · unverdicted · none · ref 20 · internal anchor
EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and outperforms static baselines on GAIA, HLE, and DeepResearcher.
Self-Consolidating Language Models: Continual Knowledge Incorporation from Context cs.CL · 2026-05-08 · unverdicted · none · ref 9 · 2 links · internal anchor
SCoL trains LLMs via meta-reinforcement learning to generate layer-specific update instructions that improve knowledge acquisition and retention from context streams over standard baselines.
Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall cs.CL · 2026-05-06 · conditional · none · ref 10 · internal anchor
True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, Zep, and matching or exceeding EverMemOS and Hindsight on other long-context benchmarks.
Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning cs.CV · 2026-05-05 · unverdicted · none · ref 49 · internal anchor
HierVA improves multi-step chart question answering by having a high-level manager maintain key joint contexts while specialized workers perform targeted reasoning with visual zoom-in.
ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting cs.AI · 2026-05-05 · unverdicted · none · ref 36 · 2 links · internal anchor
ScrapMem reports SOTA 51.0% Joint@10 on ATM-Bench with up to 93% memory reduction and 70.3% Recall@10 via optical forgetting and EM-Graph.

MemGPT: Towards LLMs as Operating Systems

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer