MemLens benchmark shows long-context LVLMs lose accuracy with length while memory agents lose visual fidelity, with multi-session reasoning below 30% for most systems and neither approach solving the task alone.
Title resolution pending
15 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.
MemShot renders local dialogue spans as structured visual memory units to improve long-term dialogue modeling in LLMs, achieving competitive benchmark performance with 70x faster memory construction.
DeferMem decouples memory QA into high-recall retrieval and RL-based query-conditioned evidence distillation, outperforming baselines on LoCoMo and LongMemEval-S with highest accuracy, fastest runtime, and zero API token cost.
Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
A three-regime framework resolves contradictions in LLM context vs. parametric knowledge conflicts by distinguishing single-source updating, competitive integration, and task-appropriate selection, with empirical confirmation of certainty gradients and task effects across five models.
A semantics-aware KV cache hierarchy offloads tokens to slower memory with zero approximation error, demonstrating that LLM reasoning accuracy depends only on the permanent eviction ratio and not on HBM residency.
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
SkillPCF is a closed-loop agent framework with a physics-guided memory skill bank, reinforcement-learned skill selection, and simulator-grounded evolution that improves design quality and efficiency for photonic crystal fiber inverse design under limited simulation budgets.
HSUGA improves LLM-enhanced sequential recommendation via staged hierarchical semantic understanding for better preference extraction and group-aware alignment that varies intensity by user activity level.
A comprehensive review of self-evolving AI agents that improve themselves over time, organized via a framework of inputs, agent system, environment, and optimizers, with domain-specific and safety discussions.
MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.
citing papers explorer
-
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
A comprehensive review of self-evolving AI agents that improve themselves over time, organized via a framework of inputs, agent system, environment, and optimizers, with domain-specific and safety discussions.