super hub Canonical reference

MemGPT: Towards LLMs as Operating Systems

Charles Packer, Ion Stoica, Kevin Lin, Sarah Wooders, Shishir G. Patil, Vivian Fang · 2023 · cs.AI · arXiv 2310.08560

Canonical reference. 77% of citing Pith papers cite this work as background.

313 Pith papers citing it

Background 77% of classified citations

open full Pith review browse 313 citing papers more from Charles Packer arXiv PDF

abstract

Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. Using this technique, we introduce MemGPT (Memory-GPT), a system that intelligently manages different memory tiers in order to effectively provide extended context within the LLM's limited context window, and utilizes interrupts to manage control flow between itself and the user. We evaluate our OS-inspired design in two domains where the limited context windows of modern LLMs severely handicaps their performance: document analysis, where MemGPT is able to analyze large documents that far exceed the underlying LLM's context window, and multi-session chat, where MemGPT can create conversational agents that remember, reflect, and evolve dynamically through long-term interactions with their users. We release MemGPT code and data for our experiments at https://memgpt.ai.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 36 baseline 3 dataset 3 method 1 other 1

citation-polarity summary

background 34 baseline 3 use dataset 3 support 2 unclear 1 use method 1

claims ledger

abstract Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. Using this technique, we introduce MemGPT (Memory-GPT), a system that intelligently manages different memory tiers i

authors

Charles Packer Ion Stoica Kevin Lin Sarah Wooders Shishir G. Patil Vivian Fang

co-cited works

representative citing papers

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

cs.AI · 2026-06-04 · unverdicted · novelty 8.0

CL-Bench is the first expert-validated benchmark for continual learning in frontier LLMs across six real-world domains, showing limited gains and that naive in-context learning outperforms dedicated memory systems.

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0 · 3 refs

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.

MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents

cs.CL · 2026-04-17 · unverdicted · novelty 8.0 · 2 refs

MemEvoBench is presented as the first standardized benchmark for long-horizon memory safety in LLM agents, covering adversarial memory injection, noisy tool outputs, and biased feedback across QA and workflow tasks.

Agentic AI for Multi-Stage Physics Experiments at a Large-Scale User Facility Particle Accelerator

physics.acc-ph · 2025-09-21 · unverdicted · novelty 8.0

A language-model-driven agentic AI system autonomously executes multi-stage physics experiments at a production synchrotron light source, reducing preparation time by two orders of magnitude while upholding safety constraints.

Why Do Multi-Agent LLM Systems Fail?

cs.AI · 2025-03-17 · unverdicted · novelty 8.0

The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.

Self-GC: Self-Governing Context for Long-Horizon LLM Agents

cs.AI · 2026-07-01 · unverdicted · novelty 7.0

Self-GC governs agent context as indexed objects with planner-proposed actions, achieving 84.85% no-impact on future continuations on a hard set versus 54-70% for baselines.

When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers

cs.DB · 2026-07-01 · unverdicted · novelty 7.0

SOLAR is a learning-augmented policy for semantic cache replacement that achieves constant competitive ratio 3 and 5-75% gains over FIFO on retrieval workloads.

CLQT: A Closed-Loop, Cost-Aware, Strategy-Consistent Benchmark for Diagnostic Evaluation of LLM Portfolio-Management Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

CLQT is a new closed-loop, cost-aware benchmark that diagnoses LLM trading agent capabilities through strategy-consistent metrics and hash-verifiable trails rather than outcome rankings.

HyphaeDB: A Living Knowledge Topology for Agent-First Memory

cs.AI · 2026-06-27 · unverdicted · novelty 7.0

HyphaeDB introduces an agent-native memory system using HNSW topology for gossip-based knowledge propagation, enabling emergent behaviors in multi-agent AI.

LLM agents security duality: a comprehensive survey of self-security and empowered cybersecurity

cs.CR · 2026-06-26 · unverdicted · novelty 7.0

A survey of LLM agent self-security threats and mitigations alongside their applications in the cybersecurity lifecycle, introducing a synergy concept and empowerment framework.

Reclaim Evaluation: A Lossy Memory Is Worse Than an Empty One

cs.CL · 2026-06-24 · unverdicted · novelty 7.0

Reclaim evaluation shows lossy memory in language models is never better than empty memory across eight models, with a source-first policy restoring correctability at fixed budget.

MemTrace: Probing What Final Accuracy Misses in Long-Term Memory

cs.AI · 2026-06-15 · unverdicted · novelty 7.0

MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

cs.LG · 2026-06-15 · accept · novelty 7.0

Formalizes four concurrency anomalies in multi-agent LLM systems and mechanically verifies a hierarchy of sound detectors and preventions realized in Rust runtimes using TLA+ and Verus.

Control-Plane Placement Shapes Forgetting: An Architectural Study of Agent Memory Across Thirteen System Configurations

cs.CL · 2026-06-14 · unverdicted · novelty 7.0

An empirical comparison of thirteen control-plane placements in agent memory pipelines identifies three regimes with complementary forgetting recovery on a new 385-case adversarial benchmark, with mutation-time placement achieving 91.7-93.2% overall.

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

OSL-MR is a learning-augmented framework that casts memory retention as constrained stochastic optimization under partial observability and outperforms heuristic baselines on LoCoMo and LongMemEval.

Self-Harness: Harnesses That Improve Themselves

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Self-Harness lets LLM agents autonomously refine their interaction harnesses through weakness mining, proposal generation, and validation, raising held-out pass rates on Terminal-Bench-2.0 from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1% across three models.

Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

DCPM reorganizes LLM agent memory into a cognitive hierarchy driven by a synchronous daytime belief writer and an asynchronous nighttime schema engine, reporting gains on cross-session inference benchmarks.

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

The paper delivers the first systems characterization of agent memory, with a four-axis taxonomy, phase-aware profiler, evaluation of ten systems on two benchmarks, and ten design recommendations.

TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory

cs.DB · 2026-06-04 · unverdicted · novelty 7.0

TOKI types four common contradiction-resolution heuristics as bitemporal operators on a dual-row schema, supplies soundness theorems, and shows via a verdict matrix that it alone avoids three write-time anomalies while retaining a language-model judge.

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

SubtleMemory benchmark with 1,522 instances over 10 histories shows current memory systems are weak at fine-grained relational discrimination in long-term AI agent interactions.

eMEM: A Hybrid Spatio-Temporal Memory System For Embodied Agents

cs.RO · 2026-06-02 · unverdicted · novelty 7.0

eMEM is a multi-index memory architecture with tiered consolidation and ten recall tools for embodied agents, scoring 80.8 weighted mean on eMEM-Bench covering eight cognitive psychology paradigms and outperforming a flat RAG baseline on context and lure rejection tasks.

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

SkillDAG builds a self-evolving typed skill graph that LLM agents query and update at inference time, raising success on ALFWorld and SkillsBench by 12.8 and 8.6 points over graph baselines.

Better with Experience: Self-Evolving LLM Agents for Evidence-Grounded Health Community Notes

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

EvoNote lets LLM agents self-evolve by distilling prior correction feedback into reusable memory for claim analysis, evidence acquisition, and note writing, outperforming human notes on a 1.2K health post benchmark.

Leyline: KV Cache Directives for Agentic Inference

cs.DC · 2026-05-31 · unverdicted · novelty 7.0

Leyline adds a policy-directed KV cache edit primitive with closed-form RoPE correction for agentic inference, reporting +11.2 pp cache-hit lift and +14.3 pp solve-rate gain.

citing papers explorer

Showing 50 of 313 citing papers.

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration cs.CR · 2026-05-03 · unverdicted · none · ref 61 · 2 links · internal anchor
The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.
MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents cs.CL · 2026-05-01 · unverdicted · none · ref 11 · internal anchor
A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents cs.CR · 2026-04-27 · conditional · none · ref 5 · internal anchor
AgentWard organizes stage-specific security controls with cross-layer coordination to intercept threats across the full lifecycle of autonomous AI agents.
Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents cs.AI · 2026-04-23 · unverdicted · none · ref 8 · internal anchor
Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zero ingestion cost.
StructMem: Structured Memory for Long-Horizon Behavior in LLMs cs.CL · 2026-04-23 · unverdicted · none · ref 2 · internal anchor
StructMem is a structure-enriched hierarchical memory system that improves temporal reasoning and multi-hop QA on LoCoMo while cutting token usage, API calls, and runtime versus prior flat or graph-based memories.
Stateless Decision Memory for Enterprise AI Agents cs.AI · 2026-04-22 · unverdicted · none · ref 2 · internal anchor
Deterministic Projection Memory (DPM) delivers stateless, deterministic decision memory for enterprise AI agents that matches or exceeds summarization-based approaches at tight memory budgets while improving speed, determinism, and auditability.
HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents cs.CL · 2026-04-20 · unverdicted · none · ref 4 · internal anchor
HiGMem combines hierarchical event-turn memory with LLM-guided selection to retrieve concise relevant evidence from long dialogues, improving F1 scores and cutting retrieved turns by an order of magnitude on the LoCoMo10 benchmark.
AnchorMem: Anchored Facts with Associative Contexts for Building Memory in Large Language Models cs.CL · 2026-04-19 · unverdicted · none · ref 4 · internal anchor
AnchorMem decouples atomic fact anchors and associative event graphs for retrieval from preserved raw interaction contexts, outperforming prior memory methods on the LoCoMo benchmark.
SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level Optimization cs.AI · 2026-04-19 · unverdicted · none · ref 98 · internal anchor
SOCIA-EVO generates statistically consistent simulators by separating structural refinement from parameter calibration via bi-level optimization and falsifying strategies through execution feedback in a Bayesian-weighted playbook.
GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0) cs.CL · 2026-04-18 · unverdicted · none · ref 4 · internal anchor
GenericAgent outperforms other LLM agents on long-horizon tasks by maximizing context information density with fewer tokens via minimal tools, on-demand memory, trajectory-to-SOP evolution, and compression.
Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning cs.CR · 2026-04-18 · unverdicted · none · ref 4 · internal anchor
Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.
When Agents Go Quiet: Output Generation Capacity and Format-Cost Separation for LLM Document Synthesis cs.AI · 2026-04-17 · unverdicted · none · ref 20 · internal anchor
LLM agents avoid output stalling and reduce generation tokens by 48-72% via deferred template rendering guided by Output Generation Capacity and a Format-Cost Separation Theorem.
APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Conversational AI cs.CL · 2026-04-15 · unverdicted · none · ref 2 · internal anchor
APEX-MEM uses property graphs with temporal events, append-only storage, and an agentic retrieval system to reach 88.88% accuracy on LOCOMO QA and 86.2% on LongMemEval, outperforming prior session-aware methods.
AgentSPEX: An Agent SPecification and EXecution Language cs.CL · 2026-04-14 · unverdicted · none · ref 9 · internal anchor
AgentSPEX is a new language and harness for explicitly specifying and running structured LLM-agent workflows with typed steps, control flow, parallel execution, and a visual editor.
Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents cs.AI · 2026-04-14 · conditional · none · ref 5 · internal anchor
Dual-trace encoding improves LLM agent cross-session recall from 53.5% to 73.7% accuracy by storing facts alongside concrete scene reconstructions, with largest gains in temporal reasoning and multi-session aggregation.
MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems cs.OS · 2026-04-14 · conditional · none · ref 28 · internal anchor
MARS coordinates heterogeneous GPU-CPU resources for agentic LLM workloads via decoupled admission control and agent-centric KV cache management, delivering up to 5.94x lower latency and 1.87x faster task completion.
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents cs.AI · 2026-04-14 · unverdicted · none · ref 22 · internal anchor
GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory cs.CL · 2026-04-13 · unverdicted · none · ref 2 · internal anchor
RoMem uses a Semantic Speed Gate to assign volatility to relations and continuous phase rotation to shadow obsolete facts in complex space, delivering SOTA temporal KG completion and 2-3x gains on agentic memory benchmarks.
ACF: A Collaborative Framework for Agent Covert Communication under Cognitive Asymmetry cs.AI · 2026-04-09 · unverdicted · none · ref 20 · internal anchor
ACF structurally decouples covert communication from semantic reasoning in agent networks using a shared steganographic configuration to maintain performance under cognitive asymmetry.
HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues cs.CL · 2026-04-08 · unverdicted · none · ref 36 · internal anchor
HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.
ATANT: An Evaluation Framework for AI Continuity cs.AI · 2026-04-08 · unverdicted · none · ref 5 · internal anchor
ATANT defines AI continuity via seven properties and offers a 10-checkpoint, LLM-free test using 250 stories to check if systems retrieve correct facts without cross-contamination.
HYVE: Hybrid Views for LLM Context Engineering over Machine Data cs.AI · 2026-04-07 · unverdicted · none · ref 40 · internal anchor
HYVE cuts LLM token usage on machine data by 50-90% using database-style hybrid views and a request-scoped datastore while maintaining or improving quality on tasks like anomaly detection and chart generation.
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw cs.CR · 2026-04-06 · conditional · none · ref 11 · internal anchor
Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.
Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems cs.MA · 2026-04-03 · unverdicted · none · ref 46 · internal anchor
LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.
Opal: Private Memory for Personal AI cs.CR · 2026-04-02 · unverdicted · none · ref 164 · internal anchor
Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework cs.CL · 2026-04-02 · unverdicted · none · ref 73 · internal anchor
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation cs.CL · 2026-03-31 · unverdicted · none · ref 3 · internal anchor
Oblivion is a decay-driven memory framework that decouples read and write paths in LLM agents to enable adaptive forgetting and reinforcement for better long-horizon reasoning.
PersonaVLM: Long-Term Personalized Multimodal LLMs cs.CL · 2026-03-20 · unverdicted · none · ref 31 · internal anchor
PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.
AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents cs.CL · 2026-03-17 · unverdicted · none · ref 2 · internal anchor
AdaMem introduces four memory types and adaptive retrieval to reach state-of-the-art results on long-horizon dialogue benchmarks LoCoMo and PERSONAMEM.
SoK: Agentic Skills -- Beyond Tool Use in LLM Agents cs.CR · 2026-02-24 · unverdicted · none · ref 24 · internal anchor
The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations cs.HC · 2026-01-30 · unverdicted · none · ref 77 · internal anchor
13 participants became convinced AI understands human values after chatbot interactions evaluated with the VAPT toolkit.
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding cs.CV · 2026-01-21 · unverdicted · none · ref 34 · internal anchor
HERMES organizes the KV cache into a hierarchical memory to enable real-time streaming video understanding in MLLMs, achieving 10x faster TTFT and up to 11.4% accuracy gains on streaming benchmarks with 68% fewer tokens.
Real-Time Procedural Learning From Experience for AI Agents cs.AI · 2025-11-27 · unverdicted · none · ref 13 · internal anchor
PRAXIS enables AI agents to acquire procedural knowledge in real time by indexing and retrieving state-action-result experiences, leading to better accuracy, reliability, and efficiency on web browsing benchmarks.
Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse cs.LG · 2025-11-01 · unverdicted · none · ref 10 · internal anchor
Tree Training serializes tree trajectories via DFS and uses redundancy-free partitioning to compute weighted per-token losses exactly once per token, achieving up to 6.2x training speedup on dense and MoE models.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 131 · internal anchor
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
A-MEM: Agentic Memory for LLM Agents cs.CL · 2025-02-17 · unverdicted · none · ref 25 · internal anchor
A-MEM is a dynamic memory system for LLM agents that builds and refines an interconnected network of notes with agent-driven linking and evolution, showing performance gains over prior memory methods on six models.
Retrieval-Augmented Generation for Natural Language Processing: A Survey cs.CL · 2024-07-18 · accept · none · ref 134 · internal anchor
The survey organizes RAG methods via a taxonomy of query-based, logits-based, latent, and parametric fusion with comparisons on accessibility, efficiency, applications, and challenges.
A-TMA: Decoupling State-Aware Memory Failures in Long-Term Agent Memory cs.AI · 2026-07-02 · unverdicted · none · ref 11 · internal anchor
ATMA adds state labels and evidence packets to existing memory systems to reduce ghost memory failures, with reported gains on a new LTP benchmark and LoCoMo.
Procedural Memory Distillation: Online Reflection for Self-Improving Language Models cs.AI · 2026-07-01 · unverdicted · none · ref 20 · internal anchor
PMD extracts and distills cross-episode procedural knowledge from RL rollouts into LLM policies at three abstraction levels, yielding 3.8-13.6% gains over SDPO on SCIKNOWEVAL and LIVECODEBENCH via co-evolution.
AI Native Games: A Survey and Roadmap cs.AI · 2026-07-01 · unverdicted · none · ref 70 · internal anchor
The paper proposes a counterfactual definition of AI-native games, screens 53 examples, introduces a G/N taxonomy, and outlines a research roadmap for the field.
Surprise as a Signal for Plasticity and Metacognition cs.AI · 2026-06-30 · unverdicted · none · ref 12 · internal anchor
Prediction-error surprise computed over frozen encoder latents gates episodic memory plasticity in continual ImageNet streams and modulates VLM assertiveness, hedging, and single-shot learning, with reported retention and AUROC gains.
AgRefactor: Self-Evolving Agentic Workflow for HLS Compatibility and Performance cs.AI · 2026-06-29 · unverdicted · none · ref 14 · internal anchor
AgRefactor deploys a self-evolving multi-agent workflow that combines LLM rewrites with automated tools to convert software into HLS code, matching or beating baselines on long benchmarks and delivering 6.51x geometric mean speedup after optimization.
MAS-Lab: A Specification-Driven Validation Framework for Reliable Multi-Agent Systems cs.MA · 2026-06-29 · unverdicted · none · ref 31 · internal anchor
MAS-Lab proposes a specification-driven framework with Spec, MAS-OS, and Labs layers to enable intent-based validation and reliable evolution of multi-agent systems.
EVAF: A Test-Retest Protocol for Selective Parametric Consolidation cs.LG · 2026-06-29 · unverdicted · none · ref 13 · internal anchor
EVAF and test-retest protocol show selective parametric consolidation of high-valence experiences in GPT-2 and TinyLlama while preserving factual retrieval.
RESOURCE2SKILL: Distilling Executable Agent Skills from Human-Created Multimodal Resources cs.SE · 2026-06-28 · unverdicted · none · ref 8 · internal anchor
RESOURCE2SKILL converts multimodal human resources into a hierarchical Skill Wiki of executable agent skills, reporting +11.9 percentage point average gains over no-skill baselines across seven authoring domains.
Manufactured Confidence: How Memory Consolidation Turns Hearsay into Confident Facts cs.CR · 2026-06-28 · unverdicted · none · ref 30 · internal anchor
LLM memory consolidation turns casual hedged statements into confident facts that agents obey regardless of source or verification.
KernelFlume: Elastic Core-Attention Scaling for Agentic Long-Context Decoding cs.DC · 2026-06-28 · unverdicted · none · ref 24 · internal anchor
KernelFlume presents a disaggregated decode architecture that separates core attention from projection/FFN paths to enable elastic scaling of attention nodes, reporting up to 61% lower cost per million tokens versus full-instance scaling on H100 hardware for Llama-3.1-8B under dynamic long-context w
Selective Memory Retention for Long-Horizon LLM Agents cs.AI · 2026-06-28 · conditional · none · ref 5 · internal anchor
TraceRetain applies feature-based scoring to evict low-value entries from bounded external memory in frozen LLM agents, preserving task success under 75% synthetic distractors on ALFWorld where unbounded memory degrades.
Advancing Omnimodal Embodied Agents from Isolated Skills to Everyday Physical Autonomy cs.RO · 2026-06-25 · unverdicted · none · ref 11 · internal anchor
OmniAct framework integrates planning, memory, and verification to enable persistent autonomy in omnimodal embodied agents, showing improved success and stable context in 40 real-world tasks.
When Does Overlap Help? OSU-Mem and a Cell-Conditional Analysis of Trajectory Memory for LLM Agents cs.IR · 2026-06-19 · unverdicted · none · ref 7 · internal anchor
OSU-Mem shows overlapping memory helps retrieval when evidence shares tools or entities but hurts when steps are heterogeneous, with benefits on synthetic benchmarks vanishing on mixed real ones due to query mixing.

MemGPT: Towards LLMs as Operating Systems

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer