SEAM learns to generate utility-optimized structured experiences via rollouts to boost frozen LLM performance on mathematical reasoning benchmarks with low overhead.
MemInsight: Autonomous Memory Augmentation for LLM Agents
11 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 2
citation-polarity summary: background 2
representative citing papers
EvoMemBench evaluates 15 memory methods for LLM agents and finds that long-context baselines remain competitive and that no single memory approach works consistently across settings.
A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
APEX-MEM uses property graphs with temporal events, append-only storage, and an agentic retrieval system to reach 88.88% accuracy on LoCoMo QA and 86.2% on LongMemEval, outperforming prior session-aware methods.
ReCreate automatically creates and adapts domain LLM agents by turning past interaction experiences into scaffold edits via reasoning and hierarchical abstraction.
AgeMem unifies long-term and short-term memory management in LLM agents by exposing memory operations as learnable tool actions trained via three-stage progressive reinforcement learning, outperforming baselines on long-horizon tasks.
Memory-R1 uses PPO and GRPO to train a Memory Manager (ADD/UPDATE/DELETE/NOOP) and Answer Agent that together outperform baselines on long-context QA benchmarks after training on only 152 examples.
HyperMem is a hypergraph memory architecture that groups related conversation episodes and facts via hyperedges and reports 92.73% LLM-as-a-judge accuracy on the LoCoMo benchmark.
The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimensional (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.
LLM agent memory is organized into Storage (preserving trajectories), Reflection (refining them), and Experience (abstracting into reusable knowledge) stages driven by needs for long-range consistency, dynamic adaptation, and continual learning.
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
citing papers explorer
- EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective
  EvoMemBench evaluates 15 memory methods for LLM agents and finds that long-context baselines remain competitive and that no single memory approach works consistently across settings.
- MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents
  A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
- APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Conversational AI
  APEX-MEM uses property graphs with temporal events, append-only storage, and an agentic retrieval system to reach 88.88% accuracy on LoCoMo QA and 86.2% on LongMemEval, outperforming prior session-aware methods.
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents
  AgeMem unifies long-term and short-term memory management in LLM agents by exposing memory operations as learnable tool actions trained via three-stage progressive reinforcement learning, outperforming baselines on long-horizon tasks.
- Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
  Memory-R1 uses PPO and GRPO to train a Memory Manager (ADD/UPDATE/DELETE/NOOP) and Answer Agent that together outperform baselines on long-context QA benchmarks after training on only 152 examples.
- HyperMem: Hypergraph Memory for Long-Term Conversations
  HyperMem is a hypergraph memory architecture that groups related conversation episodes and facts via hyperedges and reports 92.73% LLM-as-a-judge accuracy on the LoCoMo benchmark.