pith. machine review for the scientific record.

arxiv: 2507.03724 · v4 · submitted 2025-07-04 · 💻 cs.CL

Recognition: no theorem link

MemOS: A Memory OS for AI System

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 08:15 UTC · model grok-4.3

classification 💻 cs.CL
keywords memory management · large language models · continual learning · MemCube · retrieval-augmented generation · personalized modeling · memory hierarchy · system framework

The pith

MemOS proposes a memory operating system that unifies plaintext, activation-based, and parameter memories in LLMs through MemCubes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MemOS as a framework that treats memory as a manageable system resource for large language models. It unifies representation, scheduling, and evolution across plaintext, activation, and parameter-level memories to address limitations in long-context reasoning and knowledge consistency. By using MemCubes as the core unit, the system enables composition, migration, and fusion of memories, bridging retrieval methods with parameter updates for more efficient and controllable operation.

Core claim

MemOS establishes a memory-centric system that unifies the handling of plaintext, activation-based, and parameter-level memories, with MemCubes serving as the basic unit that encapsulates content and metadata to support flexible transitions and evolution over time.

What carries the argument

The MemCube, which encapsulates memory content and metadata such as provenance and versioning to enable composition, migration, and fusion between different memory types.
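
The paper specifies the MemCube only at this conceptual level; the abstract gives no schema or operator definitions. As a purely illustrative sketch (every field and method name below is hypothetical, not taken from the paper), a MemCube-like record might look something like this:

```python
# Illustrative sketch only: the paper does not define a concrete MemCube schema,
# so every field and method name here is hypothetical.
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable


class MemoryType(Enum):
    PLAINTEXT = "plaintext"      # retrieved documents, notes, dialogue history
    ACTIVATION = "activation"    # cached KV states, steering vectors
    PARAMETER = "parameter"      # LoRA deltas, edited weights


@dataclass
class MemCube:
    content: Any                                      # the memory payload
    mem_type: MemoryType
    provenance: list = field(default_factory=list)    # where the memory came from
    version: int = 1

    def fuse(self, other: "MemCube") -> "MemCube":
        """Merge two cubes of the same type, keeping both provenance trails."""
        assert self.mem_type == other.mem_type
        return MemCube(
            content=(self.content, other.content),    # placeholder merge rule
            mem_type=self.mem_type,
            provenance=self.provenance + other.provenance,
            version=max(self.version, other.version) + 1,
        )

    def migrate(self, target: MemoryType, converter: Callable) -> "MemCube":
        """Re-encode the payload into another memory type (e.g. text -> adapter weights)."""
        return MemCube(
            content=converter(self.content),
            mem_type=target,
            provenance=self.provenance + [f"migrated from {self.mem_type.value}"],
            version=self.version + 1,
        )
```

The referee report below turns on exactly the parts this sketch leaves as placeholders: the merge rule inside fuse and the converter inside migrate.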

If this is right

  • LLMs gain the ability to manage knowledge across different time scales and sources with explicit lifecycle control.
  • Continual learning becomes feasible through memory composition and evolution without full retraining.
  • Personalized modeling improves by integrating user-specific memories with persistent representations.
  • Computational costs decrease by externalizing specific knowledge into an intermediate memory layer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could allow hybrid systems where temporary retrieval outputs gradually migrate into stable parameter updates (a rough sketch follows this list).
  • It opens paths to test memory fusion techniques in multi-user or multi-domain scenarios.
  • Future work might explore how MemCubes interact with existing short-context windows during inference.
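
For the first extension above, one minimal way to picture retrieval outputs hardening into parameter updates is a promotion counter: plaintext memories that keep getting retrieved are queued for consolidation. This is a hedged sketch of that idea, not a mechanism from the paper; the threshold and function names are invented.

```python
# Hypothetical promotion policy, not taken from the paper: plaintext memories that
# are retrieved often enough get queued for consolidation into parameter memory.
from collections import Counter

PROMOTION_THRESHOLD = 5          # arbitrary illustrative value

retrieval_counts = Counter()     # memory_id -> number of times retrieved
promotion_queue: list = []


def on_retrieval(memory_id: str) -> None:
    """Called whenever a plaintext memory is used to answer a query."""
    retrieval_counts[memory_id] += 1
    if retrieval_counts[memory_id] == PROMOTION_THRESHOLD:
        promotion_queue.append(memory_id)


def consolidate(fine_tune_fn) -> None:
    """Fold queued memories into parameter memory.

    fine_tune_fn stands in for whatever parameter-update mechanism a real system
    would use (a LoRA adapter update, a model edit, ...); the abstract commits to
    none of these.
    """
    if promotion_queue:
        fine_tune_fn(list(promotion_queue))
        promotion_queue.clear()
```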

Load-bearing premise

A unified memory layer using MemCubes can be practically implemented to bridge retrieval and parameter-based learning while delivering the claimed reductions in cost and gains in consistency.

What would settle it

An implementation and benchmark experiment comparing a MemOS-style unified memory layer against standard RAG combined with periodic fine-tuning on the same knowledge-update workload; if it shows no meaningful cost savings or consistency improvements, the central claim does not hold.
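
A skeleton of that experiment might look as follows. Every component is a stub and nothing here comes from the paper; the harness only pins down the two quantities under dispute.

```python
# Skeleton of the settling experiment. Both systems are stubs supplied by the
# experimenter; the harness only fixes what would have to be measured.

def run_condition(system, workload, probes):
    """Apply a stream of knowledge updates, then probe cost and answer consistency."""
    total_cost = 0.0
    for update in workload:
        total_cost += system.apply_update(update)      # tokens, FLOPs, or dollars
    correct = sum(system.answer(q) == a for q, a in probes)
    return {"cost": total_cost, "consistency": correct / len(probes)}


def settle(memos_like, rag_plus_periodic_finetune, workload, probes):
    """Report both conditions side by side; the paper's claim needs the first to
    beat the second on at least one axis without collapsing on the other."""
    return (
        run_condition(memos_like, workload, probes),
        run_condition(rag_plus_periodic_finetune, workload, probes),
    )
```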

read the original abstract

Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI), yet their lack of well-defined memory management systems hinders the development of long-context reasoning, continual personalization, and knowledge consistency. Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. While Retrieval-Augmented Generation (RAG) introduces external knowledge in plain text, it remains a stateless workaround without lifecycle control or integration with persistent representations. Recent work has modeled the training and inference cost of LLMs from a memory hierarchy perspective, showing that introducing an explicit memory layer between parameter memory and external retrieval can substantially reduce these costs by externalizing specific knowledge. Beyond computational efficiency, LLMs face broader challenges arising from how information is distributed over time and context, requiring systems capable of managing heterogeneous knowledge spanning different temporal scales and sources. To address this challenge, we propose MemOS, a memory operating system that treats memory as a manageable system resource. It unifies the representation, scheduling, and evolution of plaintext, activation-based, and parameter-level memories, enabling cost-efficient storage and retrieval. As the basic unit, a MemCube encapsulates both memory content and metadata such as provenance and versioning. MemCubes can be composed, migrated, and fused over time, enabling flexible transitions between memory types and bridging retrieval with parameter-based learning. MemOS establishes a memory-centric system framework that brings controllability, plasticity, and evolvability to LLMs, laying the foundation for continual learning and personalized modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes MemOS, a memory operating system for LLMs that unifies plaintext, activation-based, and parameter-level memories. MemCubes serve as the basic unit, encapsulating content and metadata (e.g., provenance, versioning); they can be composed, migrated, and fused to enable flexible transitions between memory types, externalize knowledge, and deliver controllability, plasticity, and evolvability for continual learning and personalized modeling.

Significance. If the unification via MemCubes can be realized with concrete mechanisms, the framework could address a genuine gap in LLM memory management by providing lifecycle control beyond static parameters or stateless RAG, potentially lowering costs through externalization and supporting long-term consistency. The conceptual contribution is clear, but significance remains prospective given the absence of implementation details or validation.

major comments (3)
  1. [Abstract] The assertion that MemCubes 'can be composed, migrated, and fused' to bridge retrieval with parameter-based learning is load-bearing for the central claim, yet no operators, fusion semantics, migration protocols, or scheduling algorithms are defined.
  2. [Abstract] The claim that an explicit memory layer 'can substantially reduce' training and inference costs lacks any cost model, equations, or quantitative comparison to RAG or fine-tuning baselines.
  3. [Abstract] The benefits of controllability, plasticity, and evolvability are asserted without specifying how transitions between memory types are realized inside training or inference loops, or how versioning and provenance metadata are maintained across fusions.
minor comments (1)
  1. The manuscript would benefit from an explicit system diagram or pseudocode section illustrating MemCube lifecycle operations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract would benefit from greater specificity on mechanisms and will revise it to better support the central claims while preserving the paper's focus as a framework proposal. Point-by-point responses to the major comments are provided below.

read point-by-point responses
  1. Referee: [Abstract] The assertion that MemCubes 'can be composed, migrated, and fused' to bridge retrieval with parameter-based learning is load-bearing for the central claim, yet no operators, fusion semantics, migration protocols, or scheduling algorithms are defined.

    Authors: The body of the manuscript (Sections 3–5) defines these elements: composition operators (union/intersection with metadata alignment), fusion semantics (provenance-preserving merge rules), migration protocols (via the MemCube scheduler), and scheduling algorithms (priority-based eviction and promotion). To address the concern that the abstract is insufficiently self-contained, we will add a concise sentence summarizing these mechanisms and reference the relevant sections. revision: yes

  2. Referee: [Abstract] The claim that an explicit memory layer 'can substantially reduce' training and inference costs lacks any cost model, equations, or quantitative comparison to RAG or fine-tuning baselines.

    Authors: The claim is grounded in the memory-hierarchy cost analysis cited in the introduction. We acknowledge that the abstract does not include the equations or direct comparisons. In revision we will insert a brief reference to the cost model and add a short quantitative comparison paragraph (drawing on the cited prior work) to the abstract and discussion section. revision: yes

  3. Referee: [Abstract] The benefits of controllability, plasticity, and evolvability are asserted without specifying how transitions between memory types are realized inside training or inference loops, or how versioning and provenance metadata are maintained across fusions.

    Authors: Section 4 describes the MemOS scheduler realizing type transitions inside both training and inference loops, with versioning and provenance maintained via immutable metadata logs that survive fusion through a defined merge protocol. We will revise the abstract to include a high-level statement of these transition and metadata mechanisms. revision: yes
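
The rebuttal's "immutable metadata logs" and "defined merge protocol" are themselves simulated, so any rendering of them is doubly hypothetical. Purely to make the claim concrete, one way an append-only provenance log could survive fusion is sketched below.

```python
# Doubly hypothetical: the rebuttal above is machine-simulated, and this is only one
# way an append-only provenance log could survive a fusion of two MemCubes.
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    op: str          # "create", "migrate", "fuse", ...
    source: str      # which cube or document the entry refers to
    version: int


def fuse_logs(log_a: tuple, log_b: tuple) -> tuple:
    """Fusion appends a new entry but never rewrites old ones, so the provenance of
    both parent cubes stays reconstructible after the merge."""
    new_version = max((entry.version for entry in log_a + log_b), default=0) + 1
    return log_a + log_b + (LogEntry("fuse", "merged", new_version),)
```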

Circularity Check

0 steps flagged

No significant circularity in conceptual system proposal

full rationale

The paper presents a high-level architectural proposal for MemOS without any mathematical derivations, equations, or quantitative predictions. Concepts such as MemCubes are introduced descriptively as encapsulating content and metadata, with operations like composition, migration, and fusion stated as capabilities rather than derived from prior definitions or fitted inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support core claims; the benefits of controllability, plasticity, and evolvability are framed as outcomes of the proposed framework itself. The text contains no load-bearing reductions where a result equals its inputs by construction, making the derivation chain self-contained as a system design document.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on domain assumptions about LLM memory limitations and the value of an explicit unified layer; no free parameters or fitted values are introduced, and MemCube and MemOS are new conceptual entities without independent evidence.

axioms (2)
  • domain assumption LLMs lack well-defined memory management systems that support long-term tracking and evolution of knowledge
    Stated as the core problem motivating the proposal in the abstract.
  • domain assumption An explicit memory layer between parameters and external retrieval can reduce costs and enable unification of heterogeneous knowledge
    Referenced via recent work and adopted as the basis for MemOS design.
invented entities (2)
  • MemCube no independent evidence
    purpose: Basic unit that encapsulates memory content together with metadata such as provenance and versioning
    Introduced as the fundamental building block enabling composition, migration, and fusion of memories.
  • MemOS no independent evidence
    purpose: Memory operating system that unifies representation, scheduling, and evolution of multiple memory types
    Core proposed framework for bringing controllability to LLM memory.

pith-pipeline@v0.9.0 · 5722 in / 1553 out tokens · 52679 ms · 2026-05-15T08:15:28.772869+00:00 · methodology


Forward citations

Cited by 25 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

    cs.CL 2026-03 unverdicted novelty 8.0

    AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.

  2. Agentic Recommender System with Hierarchical Belief-State Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MARS uses hierarchical memory and LLM planning to achieve 26.4% higher HR@1 on InstructRec benchmarks compared to prior methods.

  3. Belief Memory: Agent Memory Under Partial Observability

    cs.AI 2026-05 unverdicted novelty 7.0

    BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines o...

  4. Belief Memory: Agent Memory Under Partial Observability

    cs.AI 2026-05 unverdicted novelty 7.0

    BeliefMem stores multiple candidate conclusions with probabilities in agent memory and updates them via Noisy-OR rules to preserve uncertainty under partial observability.

  5. Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

  6. Cognifold: Always-On Proactive Memory via Cognitive Folding

    cs.AI 2026-05 unverdicted novelty 6.0

    Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-or...

  7. HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

    cs.AI 2026-05 unverdicted novelty 6.0

    HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.

  8. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy replaces privacy-sensitive spans with structured placeholders on edge devices to enable effective cloud memory management while limiting utility loss to 1.6% and outperforming general models on privacy extraction.

  9. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy uses edge-side privacy span detection and semantic placeholders to enable cloud memory management for LLM agents while limiting utility loss to 1.6% and outperforming masking baselines.

  10. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy uses edge detection of sensitive spans and type-aware placeholders to enable cloud-side memory management for LLM agents without exposing private data, achieving under 1.6% utility loss.

  11. MemReader: From Passive to Active Extraction for Long-Term Agent Memory

    cs.CL 2026-04 unverdicted novelty 6.0

    MemReader uses distilled passive and GRPO-trained active extractors to selectively write low-noise long-term memories, outperforming passive baselines on knowledge updating, temporal reasoning, and hallucination tasks.

  12. HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

    cs.CL 2026-04 unverdicted novelty 6.0

    HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.

  13. FileGram: Grounding Agent Personalization in File-System Behavioral Traces

    cs.CV 2026-04 unverdicted novelty 6.0

    FileGram grounds AI agent personalization in file-system behavioral traces via a data simulation engine, a diagnostic benchmark, and a bottom-up memory architecture.

  14. Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

    cs.CL 2026-04 unverdicted novelty 6.0

    A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.

  15. Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation

    cs.CL 2026-03 unverdicted novelty 6.0

    Oblivion is a decay-driven memory framework that decouples read and write paths in LLM agents to enable adaptive forgetting and reinforcement for better long-horizon reasoning.

  16. MemFactory: Unified Inference & Training Framework for Agent Memory

    cs.CL 2026-03 unverdicted novelty 6.0

    MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.

  17. Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

    cs.CR 2026-03 unverdicted novelty 6.0

    The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.

  18. PersonaVLM: Long-Term Personalized Multimodal LLMs

    cs.CL 2026-03 unverdicted novelty 6.0

    PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

  19. MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading

    cs.CL 2026-05 unverdicted novelty 5.0

    MemReread improves agent long-context reasoning by triggering rereading on insufficient final memory to recover discarded indirect facts, outperforming baselines at linear complexity.

  20. MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

    cs.CL 2026-05 unverdicted novelty 5.0

    MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.

  21. Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    cs.SE 2026-04 accept novelty 5.0

    LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

  22. MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought

    cs.MA 2026-04 unverdicted novelty 5.0

    MemCoT redefines long-context reasoning as iterative stateful search with zoom-in/zoom-out memory perception and dual short-term memories, claiming SOTA results on LoCoMo and LongMemEval-S benchmarks.

  23. MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

    cs.AI 2026-04 unverdicted novelty 5.0

    MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEval-S while using 80 percent fewer tokens...

  24. MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

    cs.CL 2026-05 unverdicted novelty 4.0

    MemReranker applies multi-teacher pairwise distillation, BCE pointwise training, and InfoNCE contrastive learning on mixed general and memory-specific dialogue data to produce efficient rerankers that improve calibrat...

  25. Memory as Metabolism: A Design for Companion Knowledge Systems

    cs.AI 2026-04 unverdicted novelty 4.0

    This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · cited by 21 Pith papers · 23 internal anchors
