Recognition: no theorem link
MemOS: A Memory OS for AI System
Pith reviewed 2026-05-15 08:15 UTC · model grok-4.3
The pith
MemOS proposes a memory operating system that unifies plaintext, activation-based, and parameter memories in LLMs through MemCubes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MemOS establishes a memory-centric system that unifies the handling of plaintext, activation-based, and parameter-level memories, with MemCubes serving as the basic unit that encapsulates content and metadata to support flexible transitions and evolution over time.
What carries the argument
The MemCube, which encapsulates memory content and metadata such as provenance and versioning to enable composition, migration, and fusion between different memory types.
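The paper publishes no API for the MemCube, so the following is a minimal Python sketch of what a cube and a composition operator might look like; every name and field here is hypothetical, inferred from the abstract's description of content plus provenance and versioning metadata.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemCube:
    """Hypothetical MemCube: content plus lifecycle metadata (not the paper's schema)."""
    payload: str            # plaintext, serialized activations, or a parameter-delta reference
    memory_type: str        # "plaintext" | "activation" | "parameter"
    provenance: tuple = ()  # sources the content was derived from
    version: int = 1

def compose(a: MemCube, b: MemCube) -> MemCube:
    """Compose two same-type cubes: concatenate content, union provenance, bump version."""
    assert a.memory_type == b.memory_type, "cross-type composition would need a migration step"
    return MemCube(
        payload=a.payload + "\n" + b.payload,
        memory_type=a.memory_type,
        provenance=tuple(dict.fromkeys(a.provenance + b.provenance)),  # order-preserving union
        version=max(a.version, b.version) + 1,
    )

cube = compose(
    MemCube("user prefers metric units", "plaintext", ("chat:2026-05-01",)),
    MemCube("user is a chemist", "plaintext", ("chat:2026-05-03",)),
)
```

The provenance union and version bump are roughly the minimum needed for the "composed, migrated, and fused" lifecycle the abstract claims; a real implementation would also need conflict resolution for contradictory payloads.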
If this is right
- LLMs gain the ability to manage knowledge across different time scales and sources with explicit lifecycle control.
- Continual learning becomes feasible through memory composition and evolution without full retraining.
- Personalized modeling improves by integrating user-specific memories with persistent representations.
- Computational costs decrease by externalizing specific knowledge into an intermediate memory layer.
Where Pith is reading between the lines
- The approach could allow hybrid systems where temporary retrieval outputs gradually migrate into stable parameter updates.
- It opens paths to test memory fusion techniques in multi-user or multi-domain scenarios.
- Future work might explore how MemCubes interact with existing short-context windows during inference.
Load-bearing premise
A unified memory layer using MemCubes can be practically implemented to bridge retrieval and parameter-based learning while delivering the claimed reductions in cost and gains in consistency.
What would settle it
An implementation and benchmark experiment comparing MemOS against standard RAG combined with periodic fine-tuning; a result showing no meaningful cost savings or consistency improvements would refute the load-bearing premise.
read the original abstract
Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI), yet their lack of well-defined memory management systems hinders the development of long-context reasoning, continual personalization, and knowledge consistency. Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. While Retrieval-Augmented Generation (RAG) introduces external knowledge in plain text, it remains a stateless workaround without lifecycle control or integration with persistent representations. Recent work has modeled the training and inference cost of LLMs from a memory hierarchy perspective, showing that introducing an explicit memory layer between parameter memory and external retrieval can substantially reduce these costs by externalizing specific knowledge. Beyond computational efficiency, LLMs face broader challenges arising from how information is distributed over time and context, requiring systems capable of managing heterogeneous knowledge spanning different temporal scales and sources. To address this challenge, we propose MemOS, a memory operating system that treats memory as a manageable system resource. It unifies the representation, scheduling, and evolution of plaintext, activation-based, and parameter-level memories, enabling cost-efficient storage and retrieval. As the basic unit, a MemCube encapsulates both memory content and metadata such as provenance and versioning. MemCubes can be composed, migrated, and fused over time, enabling flexible transitions between memory types and bridging retrieval with parameter-based learning. MemOS establishes a memory-centric system framework that brings controllability, plasticity, and evolvability to LLMs, laying the foundation for continual learning and personalized modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MemOS, a memory operating system for LLMs that unifies plaintext, activation-based, and parameter-level memories. MemCubes serve as the basic unit, encapsulating content and metadata (e.g., provenance, versioning); they can be composed, migrated, and fused to enable flexible transitions between memory types, externalize knowledge, and deliver controllability, plasticity, and evolvability for continual learning and personalized modeling.
Significance. If the unification via MemCubes can be realized with concrete mechanisms, the framework could address a genuine gap in LLM memory management by providing lifecycle control beyond static parameters or stateless RAG, potentially lowering costs through externalization and supporting long-term consistency. The conceptual contribution is clear, but significance remains prospective given the absence of implementation details or validation.
major comments (3)
- [Abstract] The assertion that MemCubes 'can be composed, migrated, and fused' to bridge retrieval with parameter-based learning is load-bearing for the central claim, yet no operators, fusion semantics, migration protocols, or scheduling algorithms are defined.
- [Abstract] The claim that an explicit memory layer 'can substantially reduce' training and inference costs lacks any cost model, equations, or quantitative comparison to RAG or fine-tuning baselines.
- [Abstract] The benefits of controllability, plasticity, and evolvability are asserted without specifying how transitions between memory types are realized inside training or inference loops, or how versioning and provenance metadata are maintained across fusions.
minor comments (1)
- The manuscript would benefit from an explicit system diagram or pseudocode section illustrating MemCube lifecycle operations.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the abstract would benefit from greater specificity on mechanisms and will revise it to better support the central claims while preserving the paper's focus as a framework proposal. Point-by-point responses to the major comments are provided below.
read point-by-point responses
-
Referee: [Abstract] The assertion that MemCubes 'can be composed, migrated, and fused' to bridge retrieval with parameter-based learning is load-bearing for the central claim, yet no operators, fusion semantics, migration protocols, or scheduling algorithms are defined.
Authors: The body of the manuscript (Sections 3–5) defines these elements: composition operators (union/intersection with metadata alignment), fusion semantics (provenance-preserving merge rules), migration protocols (via the MemCube scheduler), and scheduling algorithms (priority-based eviction and promotion). To address the concern that the abstract is insufficiently self-contained, we will add a concise sentence summarizing these mechanisms and reference the relevant sections. revision: yes
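The rebuttal names priority-based eviction and promotion as the scheduling mechanism. The paper's actual scheduler is not reproduced here; as a hedged sketch, a minimal tier-promotion policy over per-cube access counts might look like:

```python
# Tiers mirror the claimed hierarchy; hot cubes migrate toward parameter memory.
# All thresholds and names are illustrative assumptions, not the paper's algorithm.
TIERS = ["plaintext", "activation", "parameter"]

def schedule(cubes, promote_threshold=5, evict_threshold=1):
    """Return (promotions, evictions) from a list of (cube_id, access_count, tier).

    Frequently accessed cubes are promoted one tier up; rarely accessed
    plaintext cubes are evicted from the memory layer.
    """
    promotions, evictions = [], []
    for cube_id, hits, tier in cubes:
        if hits >= promote_threshold and tier != "parameter":
            promotions.append((cube_id, TIERS[TIERS.index(tier) + 1]))
        elif hits <= evict_threshold and tier == "plaintext":
            evictions.append(cube_id)
    return promotions, evictions

promos, evicts = schedule([
    ("prefs", 7, "plaintext"),   # hot: promote to activation memory
    ("stale", 0, "plaintext"),   # cold: evict
    ("facts", 9, "parameter"),   # already top tier: untouched
])
```

This is the shape of "priority-based eviction and promotion" in its simplest form; the claimed MemOS scheduler would additionally have to respect provenance and versioning constraints during promotion.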
-
Referee: [Abstract] The claim that an explicit memory layer 'can substantially reduce' training and inference costs lacks any cost model, equations, or quantitative comparison to RAG or fine-tuning baselines.
Authors: The claim is grounded in the memory-hierarchy cost analysis cited in the introduction. We acknowledge that the abstract does not include the equations or direct comparisons. In revision we will insert a brief reference to the cost model and add a short quantitative comparison paragraph (drawing on the cited prior work) to the abstract and discussion section. revision: yes
-
Referee: [Abstract] The benefits of controllability, plasticity, and evolvability are asserted without specifying how transitions between memory types are realized inside training or inference loops, or how versioning and provenance metadata are maintained across fusions.
Authors: Section 4 describes the MemOS scheduler realizing type transitions inside both training and inference loops, with versioning and provenance maintained via immutable metadata logs that survive fusion through a defined merge protocol. We will revise the abstract to include a high-level statement of these transition and metadata mechanisms. revision: yes
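The claimed mechanism of immutable metadata logs that survive fusion can be sketched as an append-only event log: fusion appends a new record referencing both parents rather than rewriting history. This is illustrative only; the record layout and event fields are assumptions, not the paper's code.

```python
from datetime import datetime, timezone

def fuse(mem_a, mem_b):
    """Fuse two memory records while keeping history append-only.

    Each record is {"content": str, "log": tuple of event dicts}. The fused
    record carries both parent logs plus one new 'fuse' event, so provenance
    survives the merge instead of being overwritten.
    """
    event = {
        "op": "fuse",
        "parents": (len(mem_a["log"]), len(mem_b["log"])),  # parent log lengths as cheap version ids
        "at": datetime.now(timezone.utc).isoformat(),
    }
    return {
        "content": mem_a["content"] + "\n" + mem_b["content"],
        "log": mem_a["log"] + mem_b["log"] + (event,),
    }

a = {"content": "fact A", "log": ({"op": "write"},)}
b = {"content": "fact B", "log": ({"op": "write"},)}
fused = fuse(a, b)
```

Tuples make each log immutable in Python; an auditable implementation would also need content hashing so that fused logs can be verified against their parents.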
Circularity Check
No significant circularity in conceptual system proposal
full rationale
The paper presents a high-level architectural proposal for MemOS without any mathematical derivations, equations, or quantitative predictions. Concepts such as MemCubes are introduced descriptively as encapsulating content and metadata, with operations like composition, migration, and fusion stated as capabilities rather than derived from prior definitions or fitted inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support core claims; the benefits of controllability, plasticity, and evolvability are framed as outcomes of the proposed framework itself. The text contains no load-bearing reductions where a result equals its inputs by construction, making the derivation chain self-contained as a system design document.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs lack well-defined memory management systems that support long-term tracking and evolution of knowledge
- domain assumption: An explicit memory layer between parameters and external retrieval can reduce costs and enable unification of heterogeneous knowledge
invented entities (2)
-
MemCube
no independent evidence
-
MemOS
no independent evidence
Forward citations
Cited by 25 Pith papers
-
AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment
AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.
-
Agentic Recommender System with Hierarchical Belief-State Memory
MARS uses hierarchical memory and LLM planning to achieve 26.4% higher HR@1 on InstructRec benchmarks compared to prior methods.
-
Belief Memory: Agent Memory Under Partial Observability
BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines o...
-
Belief Memory: Agent Memory Under Partial Observability
BeliefMem stores multiple candidate conclusions with probabilities in agent memory and updates them via Noisy-OR rules to preserve uncertainty under partial observability.
-
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
-
Cognifold: Always-On Proactive Memory via Cognitive Folding
Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-or...
-
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
-
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
MemPrivacy replaces privacy-sensitive spans with structured placeholders on edge devices to enable effective cloud memory management while limiting utility loss to 1.6% and outperforming general models on privacy extraction.
-
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
MemPrivacy uses edge-side privacy span detection and semantic placeholders to enable cloud memory management for LLM agents while limiting utility loss to 1.6% and outperforming masking baselines.
-
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
MemPrivacy uses edge detection of sensitive spans and type-aware placeholders to enable cloud-side memory management for LLM agents without exposing private data, achieving under 1.6% utility loss.
-
MemReader: From Passive to Active Extraction for Long-Term Agent Memory
MemReader uses distilled passive and GRPO-trained active extractors to selectively write low-noise long-term memories, outperforming passive baselines on knowledge updating, temporal reasoning, and hallucination tasks.
-
HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues
HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.
-
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
FileGram grounds AI agent personalization in file-system behavioral traces via a data simulation engine, a diagnostic benchmark, and a bottom-up memory architecture.
-
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
-
Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation
Oblivion is a decay-driven memory framework that decouples read and write paths in LLM agents to enable adaptive forgetting and reinforcement for better long-horizon reasoning.
-
MemFactory: Unified Inference & Training Framework for Agent Memory
MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.
-
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.
-
PersonaVLM: Long-Term Personalized Multimodal LLMs
PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.
-
MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading
MemReread improves agent long-context reasoning by triggering rereading on insufficient final memory to recover discarded indirect facts, outperforming baselines at linear complexity.
-
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.
-
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
-
MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
MemCoT redefines long-context reasoning as iterative stateful search with zoom-in/zoom-out memory perception and dual short-term memories, claiming SOTA results on LoCoMo and LongMemEval-S benchmarks.
-
MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents
MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEvalS while using 80 percent fewer tokens...
-
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
MemReranker applies multi-teacher pairwise distillation, BCE pointwise training, and InfoNCE contrastive learning on mixed general and memory-specific dialogue data to produce efficient rerankers that improve calibrat...
-
Memory as Metabolism: A Design for Companion Knowledge Systems
This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...
Reference graph
Works this paper leans on
-
[1]
Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, and Weinan E. Memory3: Language modeling with explicit memory. Journal of Machine Learning, 3(3):300–346, January 2024
work page 2024
-
[2]
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 1(2), 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[3]
Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi DQ Bui, Junnan Li, and Steven CH Hoi. Codet5+: Open code large language models for code understanding and generation. arXiv preprint arXiv:2305.07922, 2023
-
[4]
Shengsheng Qian, Zuyi Zhou, Dizhan Xue, Bing Wang, and Changsheng Xu. From linguistic giants to sensory maestros: A survey on cross-modal reasoning with large language models. arXiv preprint arXiv:2409.18996, 2024
-
[5]
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, and Bin Cui. Retrieval-augmented generation for ai-generated content: A survey. CoRR, abs/2402.19473, 2024
work page internal anchor Pith review arXiv 2024
-
[6]
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. CoRR, abs/2312.10997, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[7]
Qinggang Zhang, Shengyuan Chen, Yuanchen Bei, Zheng Yuan, Huachi Zhou, Zijin Hong, Junnan Dong, Hao Chen, Yi Chang, and Xiao Huang. A survey of graph retrieval-augmented generation for customized large language models. CoRR, abs/2501.13958, 2025
-
[8]
Bo Ni, Zheyuan Liu, Leyao Wang, Yongjia Lei, Yuying Zhao, Xueqi Cheng, Qingkai Zeng, Luna Dong, Yinglong Xia, Krishnaram Kenthapadi, Ryan A. Rossi, Franck Dernoncourt, Md. Mehrab Tanjim, Nesreen K. Ahmed, Xiaorui Liu, Wenqi Fan, Erik Blasch, Yu Wang, Meng Jiang, and Tyler Derr. Towards trustworthy retrieval augmented generation for large language models: ...
-
[9]
Howard Chen, Ramakanth Pasunuru, Jason Weston, and Asli Celikyilmaz. Walking down the memory maze: Beyond context limit through interactive reading. CoRR, abs/2310.05029, 2023
-
[10]
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. CoRR, abs/2404.16130, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[11]
LightRAG: Simple and Fast Retrieval-Augmented Generation
Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. Lightrag: Simple and fast retrieval-augmented generation. CoRR, abs/2410.05779, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[12]
Retrieval augmented generation (rag) in azure ai search, 2025
Microsoft. Retrieval augmented generation (rag) in azure ai search, 2025
work page 2025
- [13]
-
[14]
Build innovative ai search experiences, 2025
Elastic. Build innovative ai search experiences, 2025
work page 2025
-
[15]
Agentic rag-as-a-service company, 2025
Nuclia. Agentic rag-as-a-service company, 2025
work page 2025
-
[16]
Freshllms: Refreshing large language models with search engine augmentation, 2023
Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, and Thang Luong. Freshllms: Refreshing large language models with search engine augmentation, 2023
work page 2023
-
[17]
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[18]
Cursor - The AI Code Editor
- [19]
-
[20]
From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, April 2025
Yaxiong Wu, Sheng Liang, Chen Zhang, Yichao Wang, Yongyue Zhang, Huifeng Guo, Ruiming Tang, and Yong Liu. From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, April 2025. arXiv:2504.15965 [cs]
-
[21]
Cognitive Memory in Large Language Models, April 2025
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, and Yong Wu. Cognitive Memory in Large Language Models, April 2025. arXiv:2504.02441 [cs]
-
[22]
Language Models are Unsupervised Multitask Learners
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language Models are Unsupervised Multitask Learners
-
[23]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[24]
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li and Percy Liang. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, Online, 2021. Association for Computational Linguistics
work page 2021
-
[25]
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, and Noah Constant. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, Online and Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics
work page 2021
-
[26]
GPT Understands, Too
Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. GPT Understands, Too, October 2023. arXiv:2103.10385 [cs]
-
[27]
P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks
Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 61–68, Dublin, Ireland, 2022. Association for Computational Linguistics
work page 2022
-
[28]
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback,...
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[29]
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles, pages 611–626, Koblenz Germany, October 2023. ACM
work page 2023
-
[30]
Efficient Streaming Language Models with Attention Sinks
Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, and Mike Lewis. Efficient Streaming Language Models with Attention Sinks. October 2023
work page 2023
-
[31]
Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang "Atlas" Wang, and Beidi Chen. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. Advances in Neural Information Processing Systems, 36:34661–34710, December 2023
work page 2023
-
[32]
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi, and Beidi Chen. Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference. June 2024
work page 2024
-
[33]
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun S. Shao, Kurt Keutzer, and Amir Gholami. KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization. Advances in Neural Information Processing Systems, 37:1270–1303, December 2024
work page 2024
-
[34]
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval, December 2024
Di Liu, Meng Chen, Baotong Lu, Huiqiang Jiang, Zhenhua Han, Qianxi Zhang, Qi Chen, Chengruidong Zhang, Bailu Ding, Kai Zhang, Chen Chen, Fan Yang, Yuqing Yang, and Lili Qiu. RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval, December 2024. arXiv:2409.10516 [cs]
-
[35]
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani, Nivedita Suresh, and Matthew Peters. Extracting Latent Steering Vectors from Pretrained Language Models. In Findings of the Association for Computational Linguistics: ACL 2022, pages 566–581, Dublin, Ireland, 2022. Association for Computational Linguistics
work page 2022
-
[36]
Sheng Liu, Haotian Ye, Lei Xing, and James Zou. In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering, February 2024. arXiv:2311.06668 [cs]
-
[37]
Steering Language Models With Activation Engineering
Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, and Monte MacDiarmid. Steering Language Models With Activation Engineering, October 2024. arXiv:2308.10248 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[38]
Style vectors for steering generative large language models
Kai Konen, Sophie Jentzsch, Diaoulé Diallo, Peer Schütt, Oliver Bensch, Roxanne El Baff, Dominik Opitz, and Tobias Hecking. Style vectors for steering generative large language models. In Yvette Graham and Matthew Purver, editors, Findings of the Association for Computational Linguistics: EACL 2024, pages 782–802, St. Julian’s, Malta, March 2024. Associat...
work page 2024
-
[39]
Steering Llama 2 via Contrastive Activation Addition
Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Turner. Steering Llama 2 via Contrastive Activation Addition. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15504–15522, Bangkok, Thailand, Augus...
work page 2024
-
[40]
Zijian Feng, Hanzhang Zhou, Kezhi Mao, and Zixiao Zhu. FreeCtrl: Constructing Control Centers with Feedforward Layers for Learning-Free Controllable Text Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7627–7640, Bangkok, Thailand,
-
[41]
Association for Computational Linguistics
-
[42]
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models, April 2025
Ziwen Xu, Shuxun Wang, Kewei Xu, Haoming Xu, Mengru Wang, Xinle Deng, Yunzhi Yao, Guozhou Zheng, Huajun Chen, and Ningyu Zhang. EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models, April 2025. arXiv:2504.15133 [cs]
-
[43]
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
Yuxin Xiao, Chaoqun Wan, Yonggang Zhang, Wenxiao Wang, Binbin Lin, Xiaofei He, Xu Shen, and Jieping Ye. Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control. November 2024
work page 2024
-
[44]
Yu Li, Han Jiang, Chuanyang Gong, and Zhihua Wei. DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion, August 2024. arXiv:2404.10464 [cs]
-
[45]
Word Embeddings Are Steers for Language Models
Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, and Heng Ji. Word Embeddings Are Steers for Language Models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16410–16430, Bangkok, Thailand, A...
work page 2024
-
[46]
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through Memorization: Nearest Neighbor Language Models. September 2019
work page 2019
-
[47]
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Luiza Pozzobon, Beyza Ermis, Patrick Lewis, and Sara Hooker. Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5108–5125, Singapore, 2023. Association for Computational Linguistics
work page 2023
-
[48]
Noderag: Structuring graph-based rag with heterogeneous nodes, 2025
Tianyang Xu, Haojie Zheng, Chengze Li, Haoxiang Chen, Yixin Liu, Ruoxi Chen, and Lichao Sun. Noderag: Structuring graph-based rag with heterogeneous nodes, 2025
work page 2025
-
[49]
Peiru Yang, Xintian Li, Zhiyang Hu, Jiapeng Wang, Jinhua Yin, Huili Wang, Lizhi He, Shuai Yang, Shangguang Wang, Yongfeng Huang, and Tao Qi. Heterag: A heterogeneous retrieval-augmented generation framework with decoupled knowledge representations, 2025
work page 2025
-
[50]
Haoran Luo, Haihong E, Guanting Chen, Yandan Zheng, Xiaobao Wu, Yikai Guo, Qika Lin, Yu Feng, Ze-min Kuang, Meina Song, Yifan Zhu, and Luu Anh Tuan. Hypergraphrag: Retrieval-augmented generation with hypergraph-structured knowledge representation.CoRR, abs/2503.21322, 2025
-
[51]
Hipporag: Neurobiologically inspired long-term memory for large language models
Bernal Jimenez Gutierrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. Hipporag: Neurobiologically inspired long-term memory for large language models. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information Processing Systems 38: Annual Conference on N...
work page 2024
-
[52]
Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, and Yu Su. From RAG to memory: Non-parametric continual learning for large language models. CoRR, abs/2502.14802, 2025
-
[53]
Empowering large language models to set up a knowledge retrieval indexer via self-learning
Xiang Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Shichao Song, Hanyu Wang, Jiawei Yang, Feiyu Xiong, Bo Tang, and Chenyang Xi. Empowering large language models to set up a knowledge retrieval indexer via self-learning. CoRR, abs/2405.16933, 2024
-
[55]
A-MEM: Agentic Memory for LLM Agents
Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-MEM: agentic memory for LLM agents. CoRR, abs/2502.12110, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[56] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory, 2025.
[57] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volu...
[58] Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022.
[59] Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. CTRL: A Conditional Transformer Language Model for Controllable Generation, September 2019. arXiv:1909.05858 [cs].
[60] Tianxiang Chen, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Jieping Ye, and Nenghai Yu. Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 5991–6002, Miami, Florida, USA, 2024. Association for Computational Linguistics.
[61] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models, October 2021. arXiv:2106.09685 [cs].
[62] Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, and Yiqun Liu. Parametric Retrieval Augmented Generation, January 2025. arXiv:2501.15915 [cs].
[63] Yuqiao Tan, Shizhu He, Huanxuan Liao, Jun Zhao, and Kang Liu. Better wit than wealth: Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement, March 2025. arXiv:2503.23895 [cs].
[64] Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, and Chelsea Finn. Memory-Based Model Editing at Scale. In Proceedings of the 39th International Conference on Machine Learning, pages 15817–15831. PMLR, June 2022. ISSN: 2640-3498.
[65] Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui, and Lei Li. Calibrating Factual Knowledge in Pretrained Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5937–5947, Abu Dhabi, United Arab Emirates, 2022. Association for Computational Linguistics.
[66] Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, and Rui Yan. Decouple knowledge from parameters for plug-and-play language modeling. In Findings of the Association for Computational Linguistics: ACL 2023, pages 14288–14308, Toronto, Canada, 2023. Association for Computational Linguistics.
[67] Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, and Marzyeh Ghassemi. Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors, October 2023. arXiv:2211.11031 [cs].
[68] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and Editing Factual Associations in GPT, January 2023. arXiv:2202.05262 [cs].
[69] Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-Editing Memory in a Transformer, August 2023. arXiv:2210.07229 [cs].
[70] Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Shi Jie, Xiang Wang, Xiangnan He, and Tat-seng Chua. AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, March 2025. arXiv:2410.02355 [cs].
[71] Houcheng Jiang, Junfeng Fang, Ningyu Zhang, Guojun Ma, Mingyang Wan, Xiang Wang, Xiangnan He, and Tat-seng Chua. AnyEdit: Edit Any Knowledge Encoded in Language Models, February 2025. arXiv:2502.05628 [cs].
[72] Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, and Huajun Chen. A Comprehensive Study of Knowledge Editing for Large Language Models, November 2...
[73] Qi Li and Xiaowen Chu. Can We Continually Edit Language Models? On the Knowledge Attenuation in Sequential Model Editing. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 5438–5455, Bangkok, Thailand, August 2024. Association for Computational Linguistics.
[74] Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, and Marta Villegas. Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge. In Findings of the Association for Computational Linguistics ACL 2024, pages 5831–5847, 2024. arXiv:2502.02173 [cs].
[75] Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, and Yongfeng Zhang. Disentangling Memory and Reasoning Ability in Large Language Models, November 2024. arXiv:2411.13504 [cs].
[76] Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time. CoRR, abs/2501.00663, 2025.
[77] Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, and Ningyu Zhang. Editing Large Language Models: Problems, Methods, and Opportunities, November 2023. arXiv:2305.13172 [cs].
[78] Xin Xu, Wei Xu, Ningyu Zhang, and Julian McAuley. BiasEdit: Debiasing Stereotyped Language Models via Model Editing, March 2025. arXiv:2503.08588 [cs].
[79] Nicola De Cao, Wilker Aziz, and Ivan Titov. Editing Factual Knowledge in Language Models, September 2021. arXiv:2104.08164 [cs].
[80] Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. Fast Model Editing at Scale, October 2021.
[81] Chenmien Tan, Ge Zhang, and Jie Fu. Massive Editing for Large Language Models via Meta Learning, October 2023.