pith. machine review for the scientific record.

arxiv: 2507.03724 · v4 · submitted 2025-07-04 · 💻 cs.CL

Recognition: no theorem link

MemOS: A Memory OS for AI System

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 08:15 UTC · model grok-4.3

classification 💻 cs.CL
keywords memory management · large language models · continual learning · MemCube · retrieval-augmented generation · personalized modeling · memory hierarchy · system framework

The pith

MemOS proposes a memory operating system that unifies plaintext, activation-based, and parameter memories in LLMs through MemCubes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MemOS as a framework that treats memory as a manageable system resource for large language models. It unifies representation, scheduling, and evolution across plaintext, activation, and parameter-level memories to address limitations in long-context reasoning and knowledge consistency. By using MemCubes as the core unit, the system enables composition, migration, and fusion of memories, bridging retrieval methods with parameter updates for more efficient and controllable operation.

Core claim

MemOS establishes a memory-centric system that unifies the handling of plaintext, activation-based, and parameter-level memories, with MemCubes serving as the basic unit that encapsulates content and metadata to support flexible transitions and evolution over time.

What carries the argument

The MemCube, which encapsulates memory content and metadata such as provenance and versioning to enable composition, migration, and fusion between different memory types.
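
The paper specifies the MemCube only at this conceptual level; the abstract gives no schema or operator definitions. As a purely illustrative sketch (every field and method name below is hypothetical, not taken from the paper), a MemCube-like record might look something like this:

```python
# Illustrative sketch only: the paper does not define a concrete MemCube schema,
# so every field and method name here is hypothetical.
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable


class MemoryType(Enum):
    PLAINTEXT = "plaintext"      # retrieved documents, notes, dialogue history
    ACTIVATION = "activation"    # cached KV states, steering vectors
    PARAMETER = "parameter"      # LoRA deltas, edited weights


@dataclass
class MemCube:
    content: Any                                      # the memory payload
    mem_type: MemoryType
    provenance: list = field(default_factory=list)    # where the memory came from
    version: int = 1

    def fuse(self, other: "MemCube") -> "MemCube":
        """Merge two cubes of the same type, keeping both provenance trails."""
        assert self.mem_type == other.mem_type
        return MemCube(
            content=(self.content, other.content),    # placeholder merge rule
            mem_type=self.mem_type,
            provenance=self.provenance + other.provenance,
            version=max(self.version, other.version) + 1,
        )

    def migrate(self, target: MemoryType, converter: Callable) -> "MemCube":
        """Re-encode the payload into another memory type (e.g. text -> adapter weights)."""
        return MemCube(
            content=converter(self.content),
            mem_type=target,
            provenance=self.provenance + [f"migrated from {self.mem_type.value}"],
            version=self.version + 1,
        )
```

The referee report below turns on exactly the parts this sketch leaves as placeholders: the merge rule inside fuse and the converter inside migrate.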

If this is right

  • LLMs gain the ability to manage knowledge across different time scales and sources with explicit lifecycle control.
  • Continual learning becomes feasible through memory composition and evolution without full retraining.
  • Personalized modeling improves by integrating user-specific memories with persistent representations.
  • Computational costs decrease by externalizing specific knowledge into an intermediate memory layer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could allow hybrid systems where temporary retrieval outputs gradually migrate into stable parameter updates (a rough sketch follows this list).
  • It opens paths to test memory fusion techniques in multi-user or multi-domain scenarios.
  • Future work might explore how MemCubes interact with existing short-context windows during inference.
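
For the first extension above, one minimal way to picture retrieval outputs hardening into parameter updates is a promotion counter: plaintext memories that keep getting retrieved are queued for consolidation. This is a hedged sketch of that idea, not a mechanism from the paper; the threshold and function names are invented.

```python
# Hypothetical promotion policy, not taken from the paper: plaintext memories that
# are retrieved often enough get queued for consolidation into parameter memory.
from collections import Counter

PROMOTION_THRESHOLD = 5          # arbitrary illustrative value

retrieval_counts = Counter()     # memory_id -> number of times retrieved
promotion_queue: list = []


def on_retrieval(memory_id: str) -> None:
    """Called whenever a plaintext memory is used to answer a query."""
    retrieval_counts[memory_id] += 1
    if retrieval_counts[memory_id] == PROMOTION_THRESHOLD:
        promotion_queue.append(memory_id)


def consolidate(fine_tune_fn) -> None:
    """Fold queued memories into parameter memory.

    fine_tune_fn stands in for whatever parameter-update mechanism a real system
    would use (a LoRA adapter update, a model edit, ...); the abstract commits to
    none of these.
    """
    if promotion_queue:
        fine_tune_fn(list(promotion_queue))
        promotion_queue.clear()
```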

Load-bearing premise

A unified memory layer using MemCubes can be practically implemented to bridge retrieval and parameter-based learning while delivering the claimed reductions in cost and gains in consistency.

What would settle it

An implementation and benchmark experiment comparing a MemOS-style unified memory layer against standard RAG combined with periodic fine-tuning on the same knowledge-update workload; if it shows no meaningful cost savings or consistency improvements, the central claim does not hold.
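
A skeleton of that experiment might look as follows. Every component is a stub and nothing here comes from the paper; the harness only pins down the two quantities under dispute.

```python
# Skeleton of the settling experiment. Both systems are stubs supplied by the
# experimenter; the harness only fixes what would have to be measured.

def run_condition(system, workload, probes):
    """Apply a stream of knowledge updates, then probe cost and answer consistency."""
    total_cost = 0.0
    for update in workload:
        total_cost += system.apply_update(update)      # tokens, FLOPs, or dollars
    correct = sum(system.answer(q) == a for q, a in probes)
    return {"cost": total_cost, "consistency": correct / len(probes)}


def settle(memos_like, rag_plus_periodic_finetune, workload, probes):
    """Report both conditions side by side; the paper's claim needs the first to
    beat the second on at least one axis without collapsing on the other."""
    return (
        run_condition(memos_like, workload, probes),
        run_condition(rag_plus_periodic_finetune, workload, probes),
    )
```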

read the original abstract

Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI), yet their lack of well-defined memory management systems hinders the development of long-context reasoning, continual personalization, and knowledge consistency. Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. While Retrieval-Augmented Generation (RAG) introduces external knowledge in plain text, it remains a stateless workaround without lifecycle control or integration with persistent representations. Recent work has modeled the training and inference cost of LLMs from a memory hierarchy perspective, showing that introducing an explicit memory layer between parameter memory and external retrieval can substantially reduce these costs by externalizing specific knowledge. Beyond computational efficiency, LLMs face broader challenges arising from how information is distributed over time and context, requiring systems capable of managing heterogeneous knowledge spanning different temporal scales and sources. To address this challenge, we propose MemOS, a memory operating system that treats memory as a manageable system resource. It unifies the representation, scheduling, and evolution of plaintext, activation-based, and parameter-level memories, enabling cost-efficient storage and retrieval. As the basic unit, a MemCube encapsulates both memory content and metadata such as provenance and versioning. MemCubes can be composed, migrated, and fused over time, enabling flexible transitions between memory types and bridging retrieval with parameter-based learning. MemOS establishes a memory-centric system framework that brings controllability, plasticity, and evolvability to LLMs, laying the foundation for continual learning and personalized modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes MemOS, a memory operating system for LLMs that unifies plaintext, activation-based, and parameter-level memories. MemCubes serve as the basic unit, encapsulating content and metadata (e.g., provenance, versioning); they can be composed, migrated, and fused to enable flexible transitions between memory types, externalize knowledge, and deliver controllability, plasticity, and evolvability for continual learning and personalized modeling.

Significance. If the unification via MemCubes can be realized with concrete mechanisms, the framework could address a genuine gap in LLM memory management by providing lifecycle control beyond static parameters or stateless RAG, potentially lowering costs through externalization and supporting long-term consistency. The conceptual contribution is clear, but significance remains prospective given the absence of implementation details or validation.

major comments (3)
  1. [Abstract] The assertion that MemCubes 'can be composed, migrated, and fused' to bridge retrieval with parameter-based learning is load-bearing for the central claim, yet no operators, fusion semantics, migration protocols, or scheduling algorithms are defined.
  2. [Abstract] The claim that an explicit memory layer 'can substantially reduce' training and inference costs lacks any cost model, equations, or quantitative comparison to RAG or fine-tuning baselines.
  3. [Abstract] The benefits of controllability, plasticity, and evolvability are asserted without specifying how transitions between memory types are realized inside training or inference loops, or how versioning and provenance metadata are maintained across fusions.
minor comments (1)
  1. The manuscript would benefit from an explicit system diagram or pseudocode section illustrating MemCube lifecycle operations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract would benefit from greater specificity on mechanisms and will revise it to better support the central claims while preserving the paper's focus as a framework proposal. Point-by-point responses to the major comments are provided below.

read point-by-point responses
  1. Referee: [Abstract] The assertion that MemCubes 'can be composed, migrated, and fused' to bridge retrieval with parameter-based learning is load-bearing for the central claim, yet no operators, fusion semantics, migration protocols, or scheduling algorithms are defined.

    Authors: The body of the manuscript (Sections 3–5) defines these elements: composition operators (union/intersection with metadata alignment), fusion semantics (provenance-preserving merge rules), migration protocols (via the MemCube scheduler), and scheduling algorithms (priority-based eviction and promotion). To address the concern that the abstract is insufficiently self-contained, we will add a concise sentence summarizing these mechanisms and reference the relevant sections. revision: yes

  2. Referee: [Abstract] The claim that an explicit memory layer 'can substantially reduce' training and inference costs lacks any cost model, equations, or quantitative comparison to RAG or fine-tuning baselines.

    Authors: The claim is grounded in the memory-hierarchy cost analysis cited in the introduction. We acknowledge that the abstract does not include the equations or direct comparisons. In revision we will insert a brief reference to the cost model and add a short quantitative comparison paragraph (drawing on the cited prior work) to the abstract and discussion section. revision: yes

  3. Referee: [Abstract] The benefits of controllability, plasticity, and evolvability are asserted without specifying how transitions between memory types are realized inside training or inference loops, or how versioning and provenance metadata are maintained across fusions.

    Authors: Section 4 describes the MemOS scheduler realizing type transitions inside both training and inference loops, with versioning and provenance maintained via immutable metadata logs that survive fusion through a defined merge protocol. We will revise the abstract to include a high-level statement of these transition and metadata mechanisms. revision: yes
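
The rebuttal's "immutable metadata logs" and "defined merge protocol" are themselves simulated, so any rendering of them is doubly hypothetical. Purely to make the claim concrete, one way an append-only provenance log could survive fusion is sketched below.

```python
# Doubly hypothetical: the rebuttal above is machine-simulated, and this is only one
# way an append-only provenance log could survive a fusion of two MemCubes.
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    op: str          # "create", "migrate", "fuse", ...
    source: str      # which cube or document the entry refers to
    version: int


def fuse_logs(log_a: tuple, log_b: tuple) -> tuple:
    """Fusion appends a new entry but never rewrites old ones, so the provenance of
    both parent cubes stays reconstructible after the merge."""
    new_version = max((entry.version for entry in log_a + log_b), default=0) + 1
    return log_a + log_b + (LogEntry("fuse", "merged", new_version),)
```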

Circularity Check

0 steps flagged

No significant circularity in conceptual system proposal

full rationale

The paper presents a high-level architectural proposal for MemOS without any mathematical derivations, equations, or quantitative predictions. Concepts such as MemCubes are introduced descriptively as encapsulating content and metadata, with operations like composition, migration, and fusion stated as capabilities rather than derived from prior definitions or fitted inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support core claims; the benefits of controllability, plasticity, and evolvability are framed as outcomes of the proposed framework itself. The text contains no load-bearing reductions where a result equals its inputs by construction, making the derivation chain self-contained as a system design document.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on domain assumptions about LLM memory limitations and the value of an explicit unified layer; no free parameters or fitted values are introduced, and MemCube and MemOS are new conceptual entities without independent evidence.

axioms (2)
  • domain assumption LLMs lack well-defined memory management systems that support long-term tracking and evolution of knowledge
    Stated as the core problem motivating the proposal in the abstract.
  • domain assumption An explicit memory layer between parameters and external retrieval can reduce costs and enable unification of heterogeneous knowledge
    Referenced via recent work and adopted as the basis for MemOS design.
invented entities (2)
  • MemCube no independent evidence
    purpose: Basic unit that encapsulates memory content together with metadata such as provenance and versioning
    Introduced as the fundamental building block enabling composition, migration, and fusion of memories.
  • MemOS no independent evidence
    purpose: Memory operating system that unifies representation, scheduling, and evolution of multiple memory types
    Core proposed framework for bringing controllability to LLM memory.

pith-pipeline@v0.9.0 · 5722 in / 1553 out tokens · 52679 ms · 2026-05-15T08:15:28.772869+00:00 · methodology


Forward citations

Cited by 25 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

    cs.CL 2026-03 unverdicted novelty 8.0

    AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.

  2. Agentic Recommender System with Hierarchical Belief-State Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MARS uses hierarchical memory and LLM planning to achieve 26.4% higher HR@1 on InstructRec benchmarks compared to prior methods.

  3. Belief Memory: Agent Memory Under Partial Observability

    cs.AI 2026-05 unverdicted novelty 7.0

    BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines o...

  4. Belief Memory: Agent Memory Under Partial Observability

    cs.AI 2026-05 unverdicted novelty 7.0

    BeliefMem stores multiple candidate conclusions with probabilities in agent memory and updates them via Noisy-OR rules to preserve uncertainty under partial observability.

  5. Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

  6. Cognifold: Always-On Proactive Memory via Cognitive Folding

    cs.AI 2026-05 unverdicted novelty 6.0

    Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-or...

  7. HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

    cs.AI 2026-05 unverdicted novelty 6.0

    HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.

  8. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy replaces privacy-sensitive spans with structured placeholders on edge devices to enable effective cloud memory management while limiting utility loss to 1.6% and outperforming general models on privacy extraction.

  9. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy uses edge-side privacy span detection and semantic placeholders to enable cloud memory management for LLM agents while limiting utility loss to 1.6% and outperforming masking baselines.

  10. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy uses edge detection of sensitive spans and type-aware placeholders to enable cloud-side memory management for LLM agents without exposing private data, achieving under 1.6% utility loss.

  11. MemReader: From Passive to Active Extraction for Long-Term Agent Memory

    cs.CL 2026-04 unverdicted novelty 6.0

    MemReader uses distilled passive and GRPO-trained active extractors to selectively write low-noise long-term memories, outperforming passive baselines on knowledge updating, temporal reasoning, and hallucination tasks.

  12. HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

    cs.CL 2026-04 unverdicted novelty 6.0

    HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.

  13. FileGram: Grounding Agent Personalization in File-System Behavioral Traces

    cs.CV 2026-04 unverdicted novelty 6.0

    FileGram grounds AI agent personalization in file-system behavioral traces via a data simulation engine, a diagnostic benchmark, and a bottom-up memory architecture.

  14. Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

    cs.CL 2026-04 unverdicted novelty 6.0

    A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.

  15. Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation

    cs.CL 2026-03 unverdicted novelty 6.0

    Oblivion is a decay-driven memory framework that decouples read and write paths in LLM agents to enable adaptive forgetting and reinforcement for better long-horizon reasoning.

  16. MemFactory: Unified Inference & Training Framework for Agent Memory

    cs.CL 2026-03 unverdicted novelty 6.0

    MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.

  17. Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

    cs.CR 2026-03 unverdicted novelty 6.0

    The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.

  18. PersonaVLM: Long-Term Personalized Multimodal LLMs

    cs.CL 2026-03 unverdicted novelty 6.0

    PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

  19. MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading

    cs.CL 2026-05 unverdicted novelty 5.0

    MemReread improves agent long-context reasoning by triggering rereading on insufficient final memory to recover discarded indirect facts, outperforming baselines at linear complexity.

  20. MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

    cs.CL 2026-05 unverdicted novelty 5.0

    MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.

  21. Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    cs.SE 2026-04 accept novelty 5.0

    LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

  22. MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought

    cs.MA 2026-04 unverdicted novelty 5.0

    MemCoT redefines long-context reasoning as iterative stateful search with zoom-in/zoom-out memory perception and dual short-term memories, claiming SOTA results on LoCoMo and LongMemEval-S benchmarks.

  23. MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

    cs.AI 2026-04 unverdicted novelty 5.0

    MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEval-S while using 80 percent fewer tokens...

  24. MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

    cs.CL 2026-05 unverdicted novelty 4.0

    MemReranker applies multi-teacher pairwise distillation, BCE pointwise training, and InfoNCE contrastive learning on mixed general and memory-specific dialogue data to produce efficient rerankers that improve calibrat...

  25. Memory as Metabolism: A Design for Companion Knowledge Systems

    cs.AI 2026-04 unverdicted novelty 4.0

    This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · cited by 21 Pith papers · 23 internal anchors
