LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.
hub
Memory operations in large language models: A survey
12 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
DeferMem decouples memory QA into high-recall retrieval and RL-based query-conditioned evidence distillation, outperforming baselines on LoCoMo and LongMemEval-S with highest accuracy, fastest runtime, and zero API token cost.
Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
A three-regime framework resolves contradictions in LLM context vs. parametric knowledge conflicts by distinguishing single-source updating, competitive integration, and task-appropriate selection, with empirical confirmation of certainty gradients and task effects across five models.
A semantics-aware KV cache hierarchy offloads tokens to slower memory with zero approximation error, demonstrating that LLM reasoning accuracy depends only on the permanent eviction ratio and not on HBM residency.
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
HSUGA improves LLM-enhanced sequential recommendation via staged hierarchical semantic understanding for better preference extraction and group-aware alignment that varies intensity by user activity level.
A comprehensive review of self-evolving AI agents that improve themselves over time, organized via a framework of inputs, agent system, environment, and optimizers, with domain-specific and safety discussions.
MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.
citing papers explorer
-
LMEB: Long-horizon Memory Embedding Benchmark
LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.
-
DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA
DeferMem decouples memory QA into high-recall retrieval and RL-based query-conditioned evidence distillation, outperforming baselines on LoCoMo and LongMemEval-S with highest accuracy, fastest runtime, and zero API token cost.
-
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.
-
Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
-
Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation
A three-regime framework resolves contradictions in LLM context vs. parametric knowledge conflicts by distinguishing single-source updating, competitive integration, and task-appropriate selection, with empirical confirmation of certainty gradients and task effects across five models.
-
Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning
A semantics-aware KV cache hierarchy offloads tokens to slower memory with zero approximation error, demonstrating that LLM reasoning accuracy depends only on the permanent eviction ratio and not on HBM residency.
-
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
-
HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues
HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.
-
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
-
HSUGA: LLM-Enhanced Recommendation with Hierarchical Semantic Understanding and Group-Aware Alignment
HSUGA improves LLM-enhanced sequential recommendation via staged hierarchical semantic understanding for better preference extraction and group-aware alignment that varies intensity by user activity level.
-
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
A comprehensive review of self-evolving AI agents that improve themselves over time, organized via a framework of inputs, agent system, environment, and optimizers, with domain-specific and safety discussions.
-
MemOS: A Memory OS for AI System
MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.