Personalization Meets Safety: Mechanisms, Risks, and Mitigations in Personalized LLMs

Chunxu Zhao; Cong Geng; Jie Liu; Junlan Feng; Lehao Xing; Pengwei Hu; Qian Hu; Qing Wang; Ruiqiao Bai; Xin Huang

arxiv: 2606.09038 · v1 · pith:S2LO6FRWnew · submitted 2026-06-08 · 💻 cs.AI

Yanyan Luo, Xue Han, Ruiqiao Bai, Xin Huang, Yitong Wang, Qian Hu, Qing Wang, Chunxu Zhao, Jie Liu, Cong Geng, Lehao Xing, Pengwei Hu, Junlan Feng

Pith reviewed 2026-06-27 16:47 UTC · model grok-4.3

classification 💻 cs.AI

keywords personalized LLMs · safety risks · risk taxonomy · personalization paradigms · mitigation strategies · evaluation frameworks · user representation · LLM agents

The pith

Safety for personalized LLMs must account for each user's context

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how personalization in large language models adapts outputs to user preferences, contexts, and histories, which also creates new safety challenges not covered by prior work. It structures the review around user representations, common personalization techniques such as prompting and fine-tuning, and evaluation methods, while building a taxonomy of associated risks at each level. The main argument identifies three shortcomings in existing studies: safety checks treat all users the same instead of varying with the user relationship, techniques are studied separately rather than in combination, and evaluations miss risks that develop over extended use. Readers would care because these models are moving into widespread use, and without addressing the intersection, harms could become more targeted and harder to detect.

Core claim

We present the first comprehensive, safety-aware review of personalized LLMs. We organize personalization along three dimensions—user representation, personalization paradigm, and evaluation—and introduce a unified taxonomy of safety risks. At the representation level, we analyze risks arising from diverse user representations. Across mainstream personalization paradigms, we delineate vulnerabilities inherent to prompting, retrieval augmentation, parameter fine-tuning, reinforcement learning, Mixture-of-Experts, pruning, agent frameworks, and multimodal personalization, and synthesize mitigation strategies across the model lifecycle. Beyond these fine-grained risks, we characterize paradigm-

What carries the argument

Unified taxonomy of safety risks structured along the three dimensions of user representation, personalization paradigm, and evaluation.

If this is right

Safety evaluations must shift from uniform metrics to ones that vary with the user relationship.
Personalization techniques require analysis in combination rather than one at a time.
Evaluation frameworks need new methods to detect risks that emerge over long periods of use.
Mitigation approaches should be applied across the full model lifecycle for all listed paradigms.
Personalized datasets and benchmarks should support testing of relational and compositional safety.
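
The first implication above, moving from uniform to user-conditioned safety metrics, can be illustrated with a minimal sketch. The profile labels, records, and function names below are hypothetical toy values chosen for illustration; they are not data or methods from the paper. The sketch only shows how an aggregate safety rate can look acceptable while one user profile fares much worse.

```python
from collections import defaultdict

# Hypothetical evaluation records: (user_profile, response_was_safe).
# Profile labels and outcomes are made-up toy data, not results from the paper.
records = [
    ("general_adult", True), ("general_adult", True), ("general_adult", True),
    ("general_adult", True), ("general_adult", True), ("general_adult", True),
    ("at_risk_user", True), ("at_risk_user", False),
]


def uniform_safety_rate(records):
    """User-invariant metric: fraction of safe responses over all records."""
    return sum(safe for _, safe in records) / len(records)


def relational_safety_rates(records):
    """User-conditioned metric: fraction of safe responses per profile."""
    totals = defaultdict(int)
    safe_counts = defaultdict(int)
    for profile, safe in records:
        totals[profile] += 1
        safe_counts[profile] += int(safe)
    return {profile: safe_counts[profile] / totals[profile] for profile in totals}


overall = uniform_safety_rate(records)
per_profile = relational_safety_rates(records)
# The profile with the lowest safety rate is the one a uniform metric hides.
worst_profile = min(per_profile, key=per_profile.get)

print(f"uniform rate: {overall:.3f}")
for profile, rate in per_profile.items():
    print(f"{profile}: {rate:.3f}")
print(f"worst-served profile: {worst_profile}")
```

In this toy setup the pooled rate is dominated by the larger profile, so a per-profile breakdown is what reveals the gap. A real evaluation would also need a defensible way to define profiles and to judge safety relative to each one.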

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The relational safety view could support experiments that create user-specific threat models to measure differences in risk.
The composition gap points to possible new tests that combine multiple personalization methods in one system.
Long-term risk assessment might use simulated multi-turn user histories to surface issues not visible in short evaluations.
The taxonomy could be extended to privacy concerns that arise when user representations are stored or updated over time.
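
The composition point above can be made concrete with a small sketch that enumerates which paradigm combinations a compositional test plan would have to cover. The paradigm names come from the paper's abstract; the function name `composition_test_plan` and the idea of testing every pair or triple are assumptions made here for illustration, not a procedure the paper describes.

```python
from itertools import combinations

# Paradigm names as listed in the paper's abstract.
PARADIGMS = [
    "prompting",
    "retrieval augmentation",
    "parameter fine-tuning",
    "reinforcement learning",
    "Mixture-of-Experts",
    "pruning",
    "agent frameworks",
    "multimodal personalization",
]


def composition_test_plan(paradigms, size=2):
    """Hypothetical helper: list every unordered combination of `size` paradigms."""
    return list(combinations(paradigms, size))


pairs = composition_test_plan(PARADIGMS)
triples = composition_test_plan(PARADIGMS, size=3)

print(f"{len(pairs)} pairwise compositions, {len(triples)} three-way compositions")
for first, second in pairs[:3]:
    print(f"test: {first} + {second}")
```

Even with only eight paradigms the number of combinations grows quickly, which suggests a practical test plan would likely prioritize the compositions that actually occur in deployed systems.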

Load-bearing premise

The proposed taxonomy and gap analysis fully capture the main risks and shortcomings at the intersection of personalization and safety without major omissions from the literature.

What would settle it

A follow-up review or empirical study that identifies major safety risks or gaps in personalized LLMs not covered by the taxonomy or the three structural inadequacies would show the analysis is incomplete.

Original abstract

Large Language Models (LLMs) have enabled increasingly personalized interactions by adapting to users' preferences, contexts, and long-term histories. However, the mechanisms that enable personalization also expand the safety landscape in ways not systematically addressed by existing literature. Existing reviews typically focus either on personalization or safety, leaving their intersection largely unexplored. We present the first comprehensive, safety-aware review of personalized LLMs. We organize personalization along three dimensions (user representation, personalization paradigm, and evaluation) and introduce a unified taxonomy of safety risks. At the representation level, we analyze risks arising from diverse user representations. Across mainstream personalization paradigms, we delineate vulnerabilities inherent to prompting, retrieval augmentation, parameter fine-tuning, reinforcement learning, Mixture-of-Experts (MoE), pruning, agent frameworks, and multimodal personalization, and synthesize mitigation strategies across the model lifecycle. Beyond these fine-grained risks, we characterize paradigm-agnostic safety risks arising from personalized adaptation. We further summarize personalized datasets and evaluation methodologies. Through a case study of OpenClaw, we analyze deployment trends in personalized agent ecosystems. Our analysis reveals three structural inadequacies in existing research: safety is evaluated as user-invariant rather than relational, personalization techniques are analyzed in isolation rather than in composition, and evaluation frameworks cannot capture emergent long-term risks. By jointly examining personalized representations, personalization paradigms, safety risks, defenses, and evaluation methods, we provide a unified framework for developing safe personalized LLMs and highlight key directions for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

A survey that organizes safety risks across personalization methods in LLMs and names three field-wide gaps, but the gap diagnosis rests on synthesis without clear checks against counterexamples in the cited work.

This paper is a review that builds a taxonomy of safety risks for personalized LLMs and calls out three structural problems in prior research: safety treated as user-invariant rather than relational, techniques studied in isolation, and evaluations missing long-term emergent risks.

It does a solid job laying out the space along user representations, then across paradigms including prompting, retrieval, fine-tuning, RL, MoE, pruning, agents, and multimodal cases. It links each to specific vulnerabilities, pulls together mitigation ideas across the model lifecycle, summarizes datasets and evaluation methods, and adds the OpenClaw case study on agent deployment trends. That kind of consolidation is useful when the two areas have mostly been reviewed separately.

The soft spot is the load-bearing claim about the three inadequacies. The paper positions its taxonomy and framework as the fix, yet the evidence is the authors' reading of the literature. If some of the cited papers already handle relational safety or compositional analysis even without using those exact terms, the diagnosis of structural failure weakens. The case study would also need to demonstrate a concrete long-term risk that current frameworks miss. As a synthesis, there is always the chance of selection effects in what gets included.

This is for researchers who need a single map of the personalization-safety intersection before starting new work on safe systems. It is not the source for new mechanisms or measurements.

The organizational effort is clear enough to merit referee time so the gap claims can be tested against the full set of references.

Referee Report

2 major / 2 minor

Summary. The paper claims to deliver the first comprehensive safety-aware review of personalized LLMs. It organizes personalization along user representation, paradigm, and evaluation dimensions; introduces a unified taxonomy of safety risks spanning prompting, RAG, fine-tuning, RL, MoE, pruning, agents, and multimodal methods; synthesizes mitigations across the model lifecycle; discusses datasets and evaluation methodologies; presents an OpenClaw case study on personalized agent ecosystems; and identifies three structural inadequacies in prior work—safety evaluated as user-invariant rather than relational, personalization techniques analyzed in isolation rather than in composition, and evaluation frameworks unable to capture emergent long-term risks—while offering a unified framework for safe personalized LLMs.

Significance. If the gap analysis and taxonomy hold, the work supplies a needed synthesis at the personalization-safety intersection and surfaces actionable directions for future research. The explicit coverage of multiple paradigms and the OpenClaw deployment case study are concrete strengths. The manuscript does not contain machine-checked proofs or parameter-free derivations, but its value lies in the breadth of the literature synthesis and the framing of paradigm-agnostic risks.

major comments (2)

[Abstract / concluding discussion of structural inadequacies] The central claim that existing research exhibits exactly the three named inadequacies (user-invariant safety evaluation, isolated paradigm analysis, inability to capture long-term emergent risks) is load-bearing for the paper's positioning of its taxonomy and framework as the remedy. The manuscript provides no systematic mapping, table, or quantitative breakdown showing how the cited works were classified with respect to these three criteria, leaving open the possibility that relational or compositional treatments already exist in the reviewed literature but were not framed as such.
[OpenClaw case study] To substantiate that current evaluation frameworks cannot capture emergent long-term risks, the case study must explicitly contrast observed deployment behaviors against the limitations of the evaluation methodologies summarized earlier in the paper. The current presentation includes neither such a direct linkage nor a falsifiable prediction that would demonstrate the claimed gap.

minor comments (2)

[Taxonomy and risk synthesis sections] The taxonomy of risks would benefit from an explicit table that cross-references each personalization paradigm with the specific risks and mitigations discussed, to improve traceability for readers.
[Organization of personalization dimensions] Notation for the three personalization dimensions (user representation, paradigm, evaluation) should be introduced once with consistent abbreviations to avoid repeated re-definition across sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our survey. We address each major comment below with targeted revisions to strengthen the evidence supporting our claims.

Referee: [Abstract / concluding discussion of structural inadequacies] The central claim that existing research exhibits exactly the three named inadequacies (user-invariant safety evaluation, isolated paradigm analysis, inability to capture long-term emergent risks) is load-bearing for the paper's positioning of its taxonomy and framework as the remedy. The manuscript provides no systematic mapping, table, or quantitative breakdown showing how the cited works were classified with respect to these three criteria, leaving open the possibility that relational or compositional treatments already exist in the reviewed literature but were not framed as such.

Authors: Our identification of the three inadequacies derives from the qualitative synthesis across the reviewed literature on personalization paradigms and safety evaluations. We agree that an explicit classification table would improve verifiability and transparency regarding how works were assessed against the criteria. In the revised manuscript we will add a table mapping representative cited works to the three inadequacies, accompanied by a short methods note on the classification process. This addition will also clarify that our review did not identify relational or compositional treatments framed as such in the existing literature.

revision: yes

Referee: [OpenClaw case study] To substantiate that current evaluation frameworks cannot capture emergent long-term risks, the case study must explicitly contrast observed deployment behaviors against the limitations of the evaluation methodologies summarized earlier in the paper. The current presentation includes neither such a direct linkage nor a falsifiable prediction that would demonstrate the claimed gap.

Authors: The OpenClaw case study is presented to illustrate real-world deployment patterns that exceed the scope of the short-term, user-invariant evaluations summarized in the paper. We concur that stronger explicit linkages are needed. In revision we will expand the case study to include direct contrasts with the evaluation methodologies discussed earlier, specifying which observed behaviors fall outside those methodologies and adding falsifiable predictions for long-term risk emergence that future evaluations could test.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; synthesis of external literature

This is a survey paper that organizes and analyzes existing external literature on personalized LLMs and safety risks. It introduces a taxonomy and identifies three structural inadequacies based on review of cited works, without any equations, fitted parameters, predictions, or derivations that reduce to the paper's own inputs by construction. No self-citation load-bearing steps, self-definitional loops, or ansatz smuggling are present. The central claims are secured by the synthesis process itself, which is independent of the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces an organizational taxonomy and gap analysis but does not introduce fitted parameters, new mathematical axioms, or invented physical entities; its contribution is the synthesis itself.

Reference graph

Works this paper leans on

298 extracted references · 14 canonical work pages · 1 internal anchor

[1]

Association for Computational Linguistics, Suzhou, China (2025)

Chu, Z., Wang, S., Xie, J., Zhu, T., Yan, Y., Ye, J., Zhong, A., Hu, X., Liang, J., Yu, P.S., Wen, Q.: LLM Agents for Education: Advances and Applications. Association for Computational Linguistics, Suzhou, China (2025)

2025
[2]

Xu, X., Yao, B., Yang, Z., Zhang, S., Rogers, E., Intille, S., Shara, N., Gao, G., Wang, D.: Talk2Care: Facilitating Asynchronous Patient-Provider Communication with Large-Language-Model (2024)

2024
[3]

Easin, A.M., Sourav, S., Tamas, O.: An intelligent llm-powered personalized assistant for digital banking using langgraph and chain of thoughts (2024) 4https://blogs.cisco.com/ai/personal-ai-agents-like-openclaw-are-a-security-nightmare; https://www.immersivelabs.com/resources/c7-blog/openclaw-what-you-need-to-know-before-it-claws-its- way-into-your-organ...

2024
[4]

Huang, Z., Tang, T., Chen, S., Lin, S., Jie, Z., Ma, L., Wang, G., Liang, X.: Making large language models better planners with reasoning-decision alignment (2024)

2024
[5]

https://arxiv.org/abs/2505.18882

Wu, Y., Sun, E., Zhu, K., Lian, J., Hernandez-Orallo, J., Caliskan, A., Wang, J.: Personalized Safety in LLMs: A Benchmark and A Planning-based Agent Approach (2026). https://arxiv.org/abs/2505.18882

arXiv 2026
[6]

Dash, T., Karri, D., Vurity, A., Datla, G., Ahmad, T., Rafi, S., Tangudu, R.: Polypersona: Persona-grounded LLM for Synthetic Survey Responses (2025)

2025
[7]

Guo, Y., Chen, Z., Zhang, J.M., Liu, Y., Ma, Y.: Personality-guided code generation using large language models (2025)

2025
[9]

https://arxiv.org/ab s/2310.11564

Jang, J., Kim, S., Lin, B.Y., Wang, Y., Hessel, J., Zettlemoyer, L., Hajishirzi, H., Choi, Y., Ammanabrolu, P.: Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging (2023). https://arxiv.org/ab s/2310.11564

arXiv 2023
[10]

Zhang, X.F., Beauchamp, N., Wang, L.: Prime: Large language model per- sonalization with cognitive dual-memory and personalized thought process (2025)

2025
[11]

Zhang, K., Kang, Y., Zhao, F., Liu, X.: LLM-based medical assistant personal- ization with short-and long-term memory coordination (2024)

2024
[12]

https://arxiv.org/abs/25 09.17183

Li, J., Zhou, J., Zhan, B., Yang, Y., Pan, Q., Chen, S., Huai, T., Li, X., Chen, Q., He, L.: LifeAlign: Lifelong Alignment for Large Language Models with Memory- augmented Focalized Preference Optimization (2025). https://arxiv.org/abs/25 09.17183

2025
[13]

ACM, Sydney NSW Australia (2025)

Prahlad, D., Lee, C., Kim, D., Kim, H.: Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph. ACM, Sydney NSW Australia (2025)

2025
[14]

https://arxiv.org/abs/2304.114 06

Salemi, A., Mysore, S., Bendersky, M., Zamani, H.: LaMP: When Large Lan- guage Models Meet Personalization (2024). https://arxiv.org/abs/2304.114 06

2024
[15]

https://arxiv.org/abs/2406.17803

Wu, B., Shi, Z., Rahmani, H.A., Ramineni, V., Yilmaz, E.: Understanding the Role of User Profile in the Personalization of Large Language Models (2024). https://arxiv.org/abs/2406.17803

arXiv 2024
[16]

Pi, R., Zhang, J., Han, T., Zhang, J., Pan, R., Zhang, T.: Personalized visual 54 instruction tuning (2024)

2024
[17]

Huang, Q., Liu, X., Ko, T., Wu, B., Wang, W., Zhang, Y., Tang, L.: Selective prompting tuning for personalized conversations with LLMs (2024)

2024
[19]

Wu, Z., Hu, Y., Shi, W., Dziri, N., Suhr, A., Ammanabrolu, P., Smith, N.A., Ostendorf, M., Hajishirzi, H.: Fine-grained human feedback gives better rewards for language model training (2023)

2023
[20]

Park, C., Liu, M., Kong, D., Zhang, K., Ozdaglar, A.: RLHF from heterogeneous feedback via personalization and preference aggregation (2024)

2024
[21]

Lee, S., Park, S.H., Kim, S., Seo, M.: Aligning to thousands of preferences via system message generalization (2024)

2024
[22]

Zhao, S., Hong, M., Liu, Y., Hazarika, D., Lin, K.: Do LLMs recognize your preferences? Evaluating personalized preference following in LLMs (2025)

2025
[23]

https://arxiv.org/abs/2501.04167

Salemi, A., Li, C., Zhang, M., Mei, Q., Kong, W., Chen, T., Li, Z., Bendersky, M., Zamani, H.: Reasoning-enhanced Self-training for Long-form Personalized Text Generation (2025). https://arxiv.org/abs/2501.04167

arXiv 2025
[24]

https://arxiv.org/abs/2409.20296

Zollo, T.P., Siah, A.W.T., Ye, N., Li, A., Namkoong, H.: PersonalLLM: Tailoring LLMs to Individual Preferences (2025). https://arxiv.org/abs/2409.20296

arXiv 2025
[25]

In, Y., Kim, W., Yoon, K., Kim, S., Tanjim, M., Park, S., Kim, K., Park, C.: Is safety standard same for everyone? User-specific safety evaluation of large language models (2025)

2025
[26]

Qi, X., Zeng, Y., Xie, T., Chen, P.-Y., Jia, R., Mittal, P., Henderson, P.: Fine- tuning aligned language models compromises safety, even when users do not intend to! arXiv preprint arXiv:2310.03693 (2023)

Pith/arXiv arXiv 2023
[27]

Association for Computational Linguistics, Albuquerque, New Mexico (2025)

An, B., Zhang, S., Dredze, M.: RAG LLMs are Not Safer: A Safety Analysis of Retrieval-augmented Generation for Large Language Models. Association for Computational Linguistics, Albuquerque, New Mexico (2025)

2025
[28]

https://arxiv.org/ab s/2407.12784

Chen, Z., Xiang, Z., Xiao, C., Song, D., Li, B.: AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases (2024). https://arxiv.org/ab s/2407.12784

arXiv 2024
[29]

https://ar xiv.org/abs/2410.14479 55

Clop, C., Teglia, Y.: Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models (2024). https://ar xiv.org/abs/2410.14479 55

arXiv 2024
[30]

https://arxiv.org/abs/2308.03958

Wei, J., Huang, D., Lu, Y., Zhou, D., Le, Q.V.: Simple synthetic data reduces sycophancy in large language models (2024). https://arxiv.org/abs/2308.03958

Pith/arXiv arXiv 2024
[31]

https: //arxiv.org/abs/2508.15036

Ding, R., Xu, T., Shen, X., Ding, A.A., Fei, Y.: MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs (2025). https: //arxiv.org/abs/2508.15036

arXiv 2025
[32]

https://arxiv.org/abs/ 2509.08747

Guo, W., Brau, F., Pintor, M., Demontis, A., Biggio, B.: Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity (2026). https://arxiv.org/abs/ 2509.08747

arXiv 2026
[33]

Liu, J., Qiu, Z., Li, Z., Dai, Q., Yu, W., Zhu, J., Hu, M., Yang, M., Chua, T.-S., King, I.: A survey of personalized large language models: Progress and future directions (2025)

2025
[34]

https://arxiv.org/abs/2503.17003

Guan, J., Wu, J., Li, J.-N., Cheng, C., Wu, W.: A Survey on Personalized Align- ment – The Missing Piece for Large Language Models in Real-world Applications (2025). https://arxiv.org/abs/2503.17003

arXiv 2025
[35]

https://arxiv.org/abs/2504.07070

Xie, Z., Wu, J., Shen, Y., Xia, Y., Li, X., Chang, A., Rossi, R., Kumar, S., Majumder, B.P., Shang, J., Ammanabrolu, P., McAuley, J.: A Survey on Person- alized and Pluralistic Preference Alignment in Large Language Models (2025). https://arxiv.org/abs/2504.07070

arXiv 2025
[36]

Xu, Y., Zhang, J., Salemi, A., Hu, X., Wang, W., Feng, F., Zamani, H., He, X., Chua, T.-S.: Personalized generation in large model era: A survey (2025)

2025
[37]

Zhang, Z., Rossi, R.A., Kveton, B., Shao, Y., Yang, D., Zamani, H., Dernoncourt, F., Barrow, J., Yu, T., Kim, S., et al.: Personalization of large language models: A survey (2024)

2024
[38]

Wu, J., Lyu, H., Xia, Y., Zhang, Z., Barrow, J., Kumar, I., Mirtaheri, M., Chen, H., Rossi, R.A., Dernoncourt, F., et al.: Personalized multimodal large language models: A survey (2024)

2024
[39]

https://arxiv.org/abs/2406.01171

Tseng, Y.-M., Huang, Y.-C., Hsiao, T.-Y., Chen, W.-L., Huang, C.-W., Meng, Y., Chen, Y.-N.: Two Tales of Persona in LLMs: A Survey of Role-playing and Personalization (2024). https://arxiv.org/abs/2406.01171

arXiv 2024
[40]

https://arxiv.org/abs/2412.17686

Shi, D., Shen, T., Huang, Y., Li, Z., Leng, Y., Jin, R., Liu, C., Wu, X., Guo, Z., Yu, L., Shi, L., Jiang, B., Xiong, D.: Large Language Model Safety: A Holistic Survey (2024). https://arxiv.org/abs/2412.17686

arXiv 2024
[41]

Journal of Electronic Science and Technology23(1), 100301 (2025) 56

Zhang, R., Li, H.-W., Qian, X.-Y., Jiang, W.-B., Chen, H.-X.: On large language models safety, security, and privacy: A survey. Journal of Electronic Science and Technology23(1), 100301 (2025) 56

2025
[42]

https: //arxiv.org/abs/2410.03198

Afzoon, S., Jamali, Z., Naseem, U., Beheshti, A.: PersoBench: Benchmarking Personalized Response Generation in Large Language Models (2026). https: //arxiv.org/abs/2410.03198

arXiv 2026
[43]

https://www.de epkeep.ai/post/top-three-scenarios-gen-ai-pii-leakage

AI, D.: Top Three Scenarios for PII Leakage in GenAI (2026). https://www.de epkeep.ai/post/top-three-scenarios-gen-ai-pii-leakage

2026
[44]

https://arxiv.org/abs/1603 .06155

Li, J., Galley, M., Brockett, C., Spithourakis, G.P., Gao, J., Dolan, B.: A Persona-based Neural Conversation Model (2016). https://arxiv.org/abs/1603 .06155

2016
[45]

https://arxiv.or g/abs/1801.07243

Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., Weston, J.: Personalizing Dialogue Agents: I have a dog, do you have pets too? (2018). https://arxiv.or g/abs/1801.07243

Pith/arXiv arXiv 2018
[46]

https://arxiv.org/abs/2406.20094

Ge, T., Chan, X., Wang, X., Yu, D., Mi, H., Yu, D.: Scaling Synthetic Data Creation with 1,000,000,000 Personas (2025). https://arxiv.org/abs/2406.20094

Pith/arXiv arXiv 2025
[47]

arXiv preprint arXiv:2503.15463 (2025)

Li, J.-N., Guan, J., Wu, S., Wu, W., Yan, R.: From 1,000,000 users to every user: Scaling up personalized preference for user-level alignment. arXiv preprint arXiv:2503.15463 (2025)

arXiv 2025
[48]

Ryan, M.J., Shaikh, O., Bhagirath, A., Frees, D., Held, W.B., Yang, D.: Syn- thesizeMe! Inducing persona-guided prompts for personalized reward models in LLMs (2025)

2025
[49]

arXiv preprint arXiv:2601.02553 (2026)

Liu, J., Su, Y., Xia, P., Han, S., Zheng, Z., Xie, C., Ding, M., Yao, H.: Simple- mem: Efficient lifelong memory for llm agents. arXiv preprint arXiv:2601.02553 (2026)

Pith/arXiv arXiv 2026
[50]

arXiv preprint arXiv:2601.04463 (2026)

Yang, C., Sun, Z., Wei, W., Hu, W.: Beyond static summarization: Proactive memory extraction for llm agents. arXiv preprint arXiv:2601.04463 (2026)

arXiv 2026
[51]

Steering llama 2 via contrastive activation addition

Jiang, H., Zhang, X., Cao, X., Breazeal, C., Roy, D., Kabbara, J.: PersonaLLM: Investigating the ability of large language models to express personality traits. In: Duh, K., Gomez, H., Bethard, S. (eds.) Findings of the Association for Computational Linguistics: NAACL 2024, pp. 3605–3627. Association for Com- putational Linguistics, Mexico City, Mexico (2...

work page doi:10.18653/v 2024
[52]

Zhong, W., Guo, L., Gao, Q., Ye, H., Wang, Y.: Memorybank: Enhancing large language models with long-term memory (2024)

2024
[53]

Zhang, Z., Dai, Q., Bo, X., Ma, C., Li, R., Chen, X., Zhu, J., Dong, Z., Wen, J.- R.: A survey on the memory mechanism of large language model-based agents 43(6) (2025) 57

2025
[54]

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Metropolitansky, D., Ness, R.O., Larson, J.: From local to global: A graph RAG approach to query-focused summarization (2024)

2024
[55]

arXiv preprint arXiv:2504.19413 (2025)

Chhikara, P., Khant, D., Aryan, S., Singh, T., Yadav, D.: Mem0: Building production-ready ai agents with scalable long-term memory. arXiv preprint arXiv:2504.19413 (2025)

Pith/arXiv arXiv 2025
[56]

arXiv preprint arXiv:2308.08239 (2023)

Lu, J., An, S., Lin, M., Pergola, G., He, Y., Yin, D., Sun, X., Wu, Y.: Memochat: Tuning llms to use memos for consistent long-range open-domain conversation. arXiv preprint arXiv:2308.08239 (2023)

arXiv 2023
[57]

Zeng, H., Kallumadi, S., Alibadi, Z., Nogueira, R., Zamani, H.: A personalized dense retrieval framework for unified information access (2023)

2023
[58]

Mysore, S., Lu, Z., Wan, M., Yang, L., Sarrafzadeh, B., Menezes, S., Baghaee, T., Gonzalez, E.B., Neville, J., Safavi, T.: Pearl: Personalizing large language model writing assistants with generation-calibrated retrievers (2024)

2024
[59]

Sun, C., Yang, K., Reddy, R.G., Fung, Y., Chan, H.P., Small, K., Zhai, C., Ji, H.: Persona-db: Efficient large language model personalization for response prediction with collaborative data refinement (2025)

2025
[60]

arXiv preprint arXiv:2509.25140 (2025)

Ouyang, S., Yan, J., Hsu, I., Chen, Y., Jiang, K., Wang, Z., Han, R., Le, L.T., Daruki, S., Tang, X., et al.: Reasoningbank: Scaling agent self-evolving with reasoning memory. arXiv preprint arXiv:2509.25140 (2025)

Pith/arXiv arXiv 2025
[61]

arXiv preprint arXiv:2512.10696 (2025)

Cao, Z., Deng, J., Yu, L., Zhou, W., Liu, Z., Ding, B., Zhao, H.: Remember me, refine me: A dynamic procedural memory framework for experience-driven agent evolution. arXiv preprint arXiv:2512.10696 (2025)

Pith/arXiv arXiv 2025
[62]

arXiv preprint arXiv:2512.18950 (2025)

Forouzandeh, S., Peng, W., Moradi, P., Yu, X., Jalili, M.: Learning hierarchical procedural memory for llm agents through bayesian selection and contrastive refinement. arXiv preprint arXiv:2512.18950 (2025)

arXiv 2025
[63]

arXiv preprint arXiv:2512.18746 (2025)

Zhang, G., Ren, H., Zhan, C., Zhou, Z., Wang, J., Zhu, H., Zhou, W., Yan, S.: Memevolve: Meta-evolution of agent memory systems. arXiv preprint arXiv:2512.18746 (2025)

Pith/arXiv arXiv 2025
[64]

arXiv preprint arXiv:2512.12818 (2025)

Latimer, C., Boschi, N., Neeser, A., Bartholomew, C., Srivastava, G., Wang, X., Ramakrishnan, N.: Hindsight is 20/20: Building agent memory that retains, recalls, and reflects. arXiv preprint arXiv:2512.12818 (2025)

arXiv 2025
[65]

Li, X., Bantupalli, J., Dharmani, R., Zhang, Y., Shang, J.: Toward multi-session personalized conversation: A large-scale dataset and hierarchical tree framework for implicit reasoning (2025) 58

2025
[66]

arXiv preprint arXiv:2511.12960 (2025)

Patel, D., Patel, S.: Engram: Effective, lightweight memory orchestration for conversational agents. arXiv preprint arXiv:2511.12960 (2025)

arXiv 2025
[67]

MemoDB: Memobase: User Profile-based Long-term Memory for AI Chatbot Applications. (2025). https://github.com/memodb-io/memobase

2025
[68]

https://github.com/langchain-ai/langmem

langchain-ai: LangMem (2025). https://github.com/langchain-ai/langmem

2025
[69]

arXiv preprint arXiv:2601.16621 (2026)

Feng, X., Gan, W., Chen, X., Dai, Q., Liu, Y.: How does personalized memory shape llm behavior? benchmarking rational preference utilization in personalized assistants. arXiv preprint arXiv:2601.16621 (2026)

arXiv 2026
[70]

arXiv preprint arXiv:2601.03192 (2026)

Zhang, S., Wang, J., Zhou, R., Liao, J., Feng, Y., Li, Z., Zheng, Y., Zhang, W., Wen, Y., Li, Z., et al.: Memrl: Self-evolving agents via runtime reinforcement learning on episodic memory. arXiv preprint arXiv:2601.03192 (2026)

Pith/arXiv arXiv 2026
[71]

arXiv preprint arXiv:2602.02474 (2026)

Zhang, H., Long, Q., Bao, J., Feng, T., Zhang, W., Yue, H., Wang, W.: Memskill: Learning and evolving memory skills for self-evolving agents. arXiv preprint arXiv:2602.02474 (2026)

Pith/arXiv arXiv 2026
[72]

arXiv preprint arXiv:2601.07470 (2026)

Liang, S., Cao, P., Zhao, J., Teng, W., Liao, X., Zhao, J., Liu, K.: Learning how to remember: A meta-cognitive management method for structured and transferable agent memory. arXiv preprint arXiv:2601.07470 (2026)

arXiv 2026
[73]

Packer, C., Fang, V., Patil, S., Lin, K., Wooders, S., Gonzalez, J.: Memgpt: towards llms as operating systems. (2023)

2023
[74]

arXiv preprint arXiv:2402.09727 (2024)

Lee, K.-H., Chen, X., Furuta, H., Canny, J., Fischer, I.: A human-inspired reading agent with gist memory of very long contexts. arXiv preprint arXiv:2402.09727 (2024)

arXiv 2024
[75]

Advances in Neural Information Processing Systems37, 59532–59569 (2024)

Jimenez Gutierrez, B., Shu, Y., Gu, Y., Yasunaga, M., Su, Y.: Hipporag: Neu- robiologically inspired long-term memory for large language models. Advances in Neural Information Processing Systems37, 59532–59569 (2024)

2024
[76]

arXiv preprint arXiv:2407.04363 (2024)

Anokhin, P., Semenov, N., Sorokin, A., Evseev, D., Kravchenko, A., Burtsev, M., Burnaev, E.: Arigraph: Learning knowledge graph world models with episodic memory for llm agents. arXiv preprint arXiv:2407.04363 (2024)

arXiv 2024
[77]

Wang, Z., Li, Z., Jiang, Z., Tu, D., Shi, W.: Crafting personalized agents through retrieval-augmented generation on editable memory graphs (2024)

2024
[78]

arXiv preprint arXiv:2501.13956 (2025)

Rasmussen, P., Paliychuk, P., Beauvais, T., Ryan, J., Chalef, D.: Zep: a temporal knowledge graph architecture for agent memory. arXiv preprint arXiv:2501.13956 (2025)

[79]

Xu, W., Liang, Z., Mei, K., Gao, H., Tan, J., Zhang, Y.: A-mem: Agentic memory for llm agents. arXiv preprint arXiv:2502.12110 (2025)

[80]

Li, Z., Xi, C., Li, C., Chen, D., Chen, B., Song, S., Niu, S., Wang, H., Yang, J., Tang, C., et al.: Memos: A memory os for ai system. arXiv preprint arXiv:2507.03724 (2025)

[81]

Jiayang, C., Ru, D., Qiu, L., Li, Y., Cao, X., Song, Y., Cai, X.: AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

[82]

Hu, Z., Zhu, Q., Yan, H., He, Y., Gui, L.: Beyond rag for agent memory: Retrieval by decoupling and aggregation. arXiv preprint arXiv:2602.02007 (2026)


[1]

Chu, Z., Wang, S., Xie, J., Zhu, T., Yan, Y., Ye, J., Zhong, A., Hu, X., Liang, J., Yu, P.S., Wen, Q.: LLM Agents for Education: Advances and Applications. Association for Computational Linguistics, Suzhou, China (2025)

[2]

Xu, X., Yao, B., Yang, Z., Zhang, S., Rogers, E., Intille, S., Shara, N., Gao, G., Wang, D.: Talk2Care: Facilitating Asynchronous Patient-Provider Communication with Large-Language-Model (2024)

[3]

Easin, A.M., Sourav, S., Tamas, O.: An intelligent llm-powered personalized assistant for digital banking using langgraph and chain of thoughts (2024)

[4]

Huang, Z., Tang, T., Chen, S., Lin, S., Jie, Z., Ma, L., Wang, G., Liang, X.: Making large language models better planners with reasoning-decision alignment (2024)

[5]

Wu, Y., Sun, E., Zhu, K., Lian, J., Hernandez-Orallo, J., Caliskan, A., Wang, J.: Personalized Safety in LLMs: A Benchmark and A Planning-based Agent Approach (2026). https://arxiv.org/abs/2505.18882

[6]

Dash, T., Karri, D., Vurity, A., Datla, G., Ahmad, T., Rafi, S., Tangudu, R.: Polypersona: Persona-grounded LLM for Synthetic Survey Responses (2025)

[7]

Guo, Y., Chen, Z., Zhang, J.M., Liu, Y., Ma, Y.: Personality-guided code generation using large language models (2025)

[9]

Jang, J., Kim, S., Lin, B.Y., Wang, Y., Hessel, J., Zettlemoyer, L., Hajishirzi, H., Choi, Y., Ammanabrolu, P.: Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging (2023). https://arxiv.org/abs/2310.11564

[10]

Zhang, X.F., Beauchamp, N., Wang, L.: Prime: Large language model personalization with cognitive dual-memory and personalized thought process (2025)

[11]

Zhang, K., Kang, Y., Zhao, F., Liu, X.: LLM-based medical assistant personalization with short- and long-term memory coordination (2024)

[12]

Li, J., Zhou, J., Zhan, B., Yang, Y., Pan, Q., Chen, S., Huai, T., Li, X., Chen, Q., He, L.: LifeAlign: Lifelong Alignment for Large Language Models with Memory-augmented Focalized Preference Optimization (2025). https://arxiv.org/abs/2509.17183

[13]

Prahlad, D., Lee, C., Kim, D., Kim, H.: Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph. ACM, Sydney NSW Australia (2025)

[14]

Salemi, A., Mysore, S., Bendersky, M., Zamani, H.: LaMP: When Large Language Models Meet Personalization (2024). https://arxiv.org/abs/2304.11406

[15]

Wu, B., Shi, Z., Rahmani, H.A., Ramineni, V., Yilmaz, E.: Understanding the Role of User Profile in the Personalization of Large Language Models (2024). https://arxiv.org/abs/2406.17803

[16]

Pi, R., Zhang, J., Han, T., Zhang, J., Pan, R., Zhang, T.: Personalized visual instruction tuning (2024)

[17]

Huang, Q., Liu, X., Ko, T., Wu, B., Wang, W., Zhang, Y., Tang, L.: Selective prompting tuning for personalized conversations with LLMs (2024)

[19]

Wu, Z., Hu, Y., Shi, W., Dziri, N., Suhr, A., Ammanabrolu, P., Smith, N.A., Ostendorf, M., Hajishirzi, H.: Fine-grained human feedback gives better rewards for language model training (2023)

[20]

Park, C., Liu, M., Kong, D., Zhang, K., Ozdaglar, A.: RLHF from heterogeneous feedback via personalization and preference aggregation (2024)

[21]

Lee, S., Park, S.H., Kim, S., Seo, M.: Aligning to thousands of preferences via system message generalization (2024)

[22]

Zhao, S., Hong, M., Liu, Y., Hazarika, D., Lin, K.: Do LLMs recognize your preferences? Evaluating personalized preference following in LLMs (2025)

[23]

Salemi, A., Li, C., Zhang, M., Mei, Q., Kong, W., Chen, T., Li, Z., Bendersky, M., Zamani, H.: Reasoning-enhanced Self-training for Long-form Personalized Text Generation (2025). https://arxiv.org/abs/2501.04167

[24]

Zollo, T.P., Siah, A.W.T., Ye, N., Li, A., Namkoong, H.: PersonalLLM: Tailoring LLMs to Individual Preferences (2025). https://arxiv.org/abs/2409.20296

[25]

In, Y., Kim, W., Yoon, K., Kim, S., Tanjim, M., Park, S., Kim, K., Park, C.: Is safety standard same for everyone? User-specific safety evaluation of large language models (2025)

[26]

Qi, X., Zeng, Y., Xie, T., Chen, P.-Y., Jia, R., Mittal, P., Henderson, P.: Fine-tuning aligned language models compromises safety, even when users do not intend to! arXiv preprint arXiv:2310.03693 (2023)

[27]

An, B., Zhang, S., Dredze, M.: RAG LLMs are Not Safer: A Safety Analysis of Retrieval-augmented Generation for Large Language Models. Association for Computational Linguistics, Albuquerque, New Mexico (2025)

[28]

Chen, Z., Xiang, Z., Xiao, C., Song, D., Li, B.: AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases (2024). https://arxiv.org/abs/2407.12784

[29]

Clop, C., Teglia, Y.: Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models (2024). https://arxiv.org/abs/2410.14479

[30]

Wei, J., Huang, D., Lu, Y., Zhou, D., Le, Q.V.: Simple synthetic data reduces sycophancy in large language models (2024). https://arxiv.org/abs/2308.03958

[31]

Ding, R., Xu, T., Shen, X., Ding, A.A., Fei, Y.: MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs (2025). https://arxiv.org/abs/2508.15036

[32]

Guo, W., Brau, F., Pintor, M., Demontis, A., Biggio, B.: Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity (2026). https://arxiv.org/abs/2509.08747

[33]

Liu, J., Qiu, Z., Li, Z., Dai, Q., Yu, W., Zhu, J., Hu, M., Yang, M., Chua, T.-S., King, I.: A survey of personalized large language models: Progress and future directions (2025)

[34]

Guan, J., Wu, J., Li, J.-N., Cheng, C., Wu, W.: A Survey on Personalized Alignment – The Missing Piece for Large Language Models in Real-world Applications (2025). https://arxiv.org/abs/2503.17003

[35]

Xie, Z., Wu, J., Shen, Y., Xia, Y., Li, X., Chang, A., Rossi, R., Kumar, S., Majumder, B.P., Shang, J., Ammanabrolu, P., McAuley, J.: A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models (2025). https://arxiv.org/abs/2504.07070

[36]

Xu, Y., Zhang, J., Salemi, A., Hu, X., Wang, W., Feng, F., Zamani, H., He, X., Chua, T.-S.: Personalized generation in large model era: A survey (2025)

[37]

Zhang, Z., Rossi, R.A., Kveton, B., Shao, Y., Yang, D., Zamani, H., Dernoncourt, F., Barrow, J., Yu, T., Kim, S., et al.: Personalization of large language models: A survey (2024)

[38]

Wu, J., Lyu, H., Xia, Y., Zhang, Z., Barrow, J., Kumar, I., Mirtaheri, M., Chen, H., Rossi, R.A., Dernoncourt, F., et al.: Personalized multimodal large language models: A survey (2024)

[39]

Tseng, Y.-M., Huang, Y.-C., Hsiao, T.-Y., Chen, W.-L., Huang, C.-W., Meng, Y., Chen, Y.-N.: Two Tales of Persona in LLMs: A Survey of Role-playing and Personalization (2024). https://arxiv.org/abs/2406.01171

[40]

Shi, D., Shen, T., Huang, Y., Li, Z., Leng, Y., Jin, R., Liu, C., Wu, X., Guo, Z., Yu, L., Shi, L., Jiang, B., Xiong, D.: Large Language Model Safety: A Holistic Survey (2024). https://arxiv.org/abs/2412.17686

[41]

Zhang, R., Li, H.-W., Qian, X.-Y., Jiang, W.-B., Chen, H.-X.: On large language models safety, security, and privacy: A survey. Journal of Electronic Science and Technology 23(1), 100301 (2025)

[42]

Afzoon, S., Jamali, Z., Naseem, U., Beheshti, A.: PersoBench: Benchmarking Personalized Response Generation in Large Language Models (2026). https://arxiv.org/abs/2410.03198

[43]

AI, D.: Top Three Scenarios for PII Leakage in GenAI (2026). https://www.deepkeep.ai/post/top-three-scenarios-gen-ai-pii-leakage

[44]

Li, J., Galley, M., Brockett, C., Spithourakis, G.P., Gao, J., Dolan, B.: A Persona-based Neural Conversation Model (2016). https://arxiv.org/abs/1603.06155

[45]

Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., Weston, J.: Personalizing Dialogue Agents: I have a dog, do you have pets too? (2018). https://arxiv.org/abs/1801.07243

[46]

Ge, T., Chan, X., Wang, X., Yu, D., Mi, H., Yu, D.: Scaling Synthetic Data Creation with 1,000,000,000 Personas (2025). https://arxiv.org/abs/2406.20094

[47]

Li, J.-N., Guan, J., Wu, S., Wu, W., Yan, R.: From 1,000,000 users to every user: Scaling up personalized preference for user-level alignment. arXiv preprint arXiv:2503.15463 (2025)

[48]

Ryan, M.J., Shaikh, O., Bhagirath, A., Frees, D., Held, W.B., Yang, D.: SynthesizeMe! Inducing persona-guided prompts for personalized reward models in LLMs (2025)

[49]

Liu, J., Su, Y., Xia, P., Han, S., Zheng, Z., Xie, C., Ding, M., Yao, H.: Simplemem: Efficient lifelong memory for llm agents. arXiv preprint arXiv:2601.02553 (2026)

[50]

Yang, C., Sun, Z., Wei, W., Hu, W.: Beyond static summarization: Proactive memory extraction for llm agents. arXiv preprint arXiv:2601.04463 (2026)

[51]

Jiang, H., Zhang, X., Cao, X., Breazeal, C., Roy, D., Kabbara, J.: PersonaLLM: Investigating the ability of large language models to express personality traits. In: Duh, K., Gomez, H., Bethard, S. (eds.) Findings of the Association for Computational Linguistics: NAACL 2024, pp. 3605–3627. Association for Computational Linguistics, Mexico City, Mexico (2...

[52]

Zhong, W., Guo, L., Gao, Q., Ye, H., Wang, Y.: Memorybank: Enhancing large language models with long-term memory (2024)

[53]

Zhang, Z., Dai, Q., Bo, X., Ma, C., Li, R., Chen, X., Zhu, J., Dong, Z., Wen, J.-R.: A survey on the memory mechanism of large language model-based agents 43(6) (2025)

[54]

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Metropolitansky, D., Ness, R.O., Larson, J.: From local to global: A graph RAG approach to query-focused summarization (2024)

[55]

Chhikara, P., Khant, D., Aryan, S., Singh, T., Yadav, D.: Mem0: Building production-ready ai agents with scalable long-term memory. arXiv preprint arXiv:2504.19413 (2025)

[56]

Lu, J., An, S., Lin, M., Pergola, G., He, Y., Yin, D., Sun, X., Wu, Y.: Memochat: Tuning llms to use memos for consistent long-range open-domain conversation. arXiv preprint arXiv:2308.08239 (2023)

[57]

Zeng, H., Kallumadi, S., Alibadi, Z., Nogueira, R., Zamani, H.: A personalized dense retrieval framework for unified information access (2023)

[58]

Mysore, S., Lu, Z., Wan, M., Yang, L., Sarrafzadeh, B., Menezes, S., Baghaee, T., Gonzalez, E.B., Neville, J., Safavi, T.: Pearl: Personalizing large language model writing assistants with generation-calibrated retrievers (2024)

[59]

Sun, C., Yang, K., Reddy, R.G., Fung, Y., Chan, H.P., Small, K., Zhai, C., Ji, H.: Persona-db: Efficient large language model personalization for response prediction with collaborative data refinement (2025)

[60]

Ouyang, S., Yan, J., Hsu, I., Chen, Y., Jiang, K., Wang, Z., Han, R., Le, L.T., Daruki, S., Tang, X., et al.: Reasoningbank: Scaling agent self-evolving with reasoning memory. arXiv preprint arXiv:2509.25140 (2025)

[61]

Cao, Z., Deng, J., Yu, L., Zhou, W., Liu, Z., Ding, B., Zhao, H.: Remember me, refine me: A dynamic procedural memory framework for experience-driven agent evolution. arXiv preprint arXiv:2512.10696 (2025)

[62]

Forouzandeh, S., Peng, W., Moradi, P., Yu, X., Jalili, M.: Learning hierarchical procedural memory for llm agents through bayesian selection and contrastive refinement. arXiv preprint arXiv:2512.18950 (2025)

[63]

Zhang, G., Ren, H., Zhan, C., Zhou, Z., Wang, J., Zhu, H., Zhou, W., Yan, S.: Memevolve: Meta-evolution of agent memory systems. arXiv preprint arXiv:2512.18746 (2025)

[64]

Latimer, C., Boschi, N., Neeser, A., Bartholomew, C., Srivastava, G., Wang, X., Ramakrishnan, N.: Hindsight is 20/20: Building agent memory that retains, recalls, and reflects. arXiv preprint arXiv:2512.12818 (2025)

[65]

Li, X., Bantupalli, J., Dharmani, R., Zhang, Y., Shang, J.: Toward multi-session personalized conversation: A large-scale dataset and hierarchical tree framework for implicit reasoning (2025)

[66]

Patel, D., Patel, S.: Engram: Effective, lightweight memory orchestration for conversational agents. arXiv preprint arXiv:2511.12960 (2025)

[67]

MemoDB: Memobase: User Profile-based Long-term Memory for AI Chatbot Applications (2025). https://github.com/memodb-io/memobase

[68]

langchain-ai: LangMem (2025). https://github.com/langchain-ai/langmem

[69]

Feng, X., Gan, W., Chen, X., Dai, Q., Liu, Y.: How does personalized memory shape llm behavior? benchmarking rational preference utilization in personalized assistants. arXiv preprint arXiv:2601.16621 (2026)
