When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents

Hongtu Chen; Jiaoyun Yang; Lingxiang Xu; Min Hu; Ning An

arxiv: 2606.06055 · v1 · pith:TOE32436new · submitted 2026-06-04 · 💻 cs.AI

When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents

Lingxiang Xu , Jiaoyun Yang , Min Hu , Hongtu Chen , Ning An This is my paper

Pith reviewed 2026-06-28 01:55 UTC · model grok-4.3

classification 💻 cs.AI

keywords memory-augmented agentssensitive memory integrationLLM evaluationpersonalization safetyretrieval systemsconversational agentsmemory boundaries

0 comments

The pith

Access to memory causes conversational agents to integrate sensitive content into responses even under benign prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RBI-Eval as a controlled way to test when memory-augmented agents should keep sensitive memories silent. It compares model outputs on identical benign prompts both with and without access to sensitive memory across full context and three retrieval setups. The evaluation covers four base language models and shows clear behavioral changes once memory becomes available. Separation scores drop for all models but by markedly different amounts, and retrieval lowers exposure without stopping integration at generation time. The work concludes that safe personalization therefore needs memory-aware choices at both retrieval and generation stages.

Core claim

RBI-Eval shows that providing access to sensitive memory produces substantial divergence from matched no-memory references. The separation score for sensitive-memory integration decreases by 8.9%--26.6% for one model and by 51.1%--82.9% for the other three. Control experiments establish that the change is specific to sensitive content rather than general personalization. Retrieval systems cut exposure but do not prevent integration once sensitive memory reaches the generator. These results indicate that safe personalization requires memory-aware decisions at both retrieval and generation time.

What carries the argument

RBI-Eval, a measurement study built on a probe set that compares model behavior with and without sensitive memory under identical benign prompts and quantifies the difference via a separation score.

If this is right

Retrieval systems reduce exposure to sensitive memory but do not eliminate its integration once the content reaches the generator.
The integration effect is specific to sensitive content rather than a general personalization tendency.
Models differ sharply in how readily they integrate sensitive memory, so safeguards must account for model variation.
Memory-aware decisions are required at both the retrieval stage and the generation stage for safe personalization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers may need to add explicit suppression logic at generation time even when retrieval has already filtered content.
Similar probe-based tests could be applied to measure boundaries for other memory categories such as factual versus preference-based content.
Training objectives that penalize use of sensitive memory under benign prompts might raise separation scores across models.
Current retrieval-augmented pipelines leave a gap between what is retrieved and what should be used that generation-time checks could close.

Load-bearing premise

The probe set and separation score definition accurately capture scenarios in which sensitive memory content should remain unused under benign prompts.

What would settle it

Re-running the same probe set and finding no measurable drop in separation score when memory is supplied would show that the claimed integration does not occur.

Figures

Figures reproduced from arXiv: 2606.06055 by Hongtu Chen, Jiaoyun Yang, Lingxiang Xu, Min Hu, Ning An.

**Figure 2.** Figure 2: RBI-Eval construction pipeline. LoCoMo dialogues are distilled into persona profiles, then used to [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Paired changes in sensitive-history integration [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: DeepSeek target-memory exposure and integration-if-exposed under neutral prompting. Yet the model contrast remains, so the interface is not deterministic. Memory packaging and generator policy interact. 3.4 Q3: How Much Comes from Retrieval Exposure? We decompose integration risk by whether a sensitive target memory was retrieved; Appendix [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Complete history-augmentation prompt schema; full per-persona values and template markdown are [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: Complete neutral-generation prompt. The prompt supplies memory context and condition name when [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 7.** Figure 7: Complete condition-blind judge prompt. The executable user payload hides evaluated model identity, [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: DeepSeek BSS-point loss by probe family; higher means larger loss versus matched no-memory responses. [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗

**Figure 9.** Figure 9: Retrieved memories for a fixed persona/query; useful outing context can also expose private anchors. [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

**Figure 10.** Figure 10: MemU-conditioned responses across base LLMs; plausible advice reuses unrequested private history. [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗

read the original abstract

Long-term memory enables language model agents to support personalized interactions, but it remains unclear when available memories warrant integration into responses. Existing memory evaluations emphasize retrieval accuracy and downstream task utility, while overlooking whether retrieved sensitive memory content is warranted in the current turn. We introduce RBI-Eval, a controlled measurement study built around a probe set that compares model behavior with and without access to sensitive memory under identical benign prompts. We evaluate four base LLMs against a matched no-memory reference across four memory-access settings: full-context exposure and three retrieval systems. Our results reveal substantial behavioral divergence. With memory available, the separation score for sensitive-memory integration decreases by 8.9\%--26.6\% relative to the matched no-memory reference for GPT-5.4-mini, but by 51.1\%--82.9\% for Claude-Sonnet-4.6, DeepSeek-V4-Flash, and Qwen3.5-9B. Control experiments on DeepSeek and GPT-5.4-mini show this effect is specific to sensitive content, rather than general personalization. Retrieval systems reduce exposure but do not eliminate integration once sensitive memory reaches the generator. These findings suggest safe personalization requires memory-aware decisions at both retrieval and generation time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RBI-Eval gives a direct probe for unwanted sensitive-memory integration under benign prompts, but the abstract leaves the probe construction and separation score too opaque to judge how cleanly it isolates the problem.

read the letter

The main point is that this paper measures when memory-augmented agents bring in sensitive content they should ignore, even on ordinary prompts. RBI-Eval compares the same benign queries with and without memory access, then tracks a separation score that drops when models integrate the sensitive stuff anyway. The numbers show clear model differences: GPT-5.4-mini changes modestly while the other three drop sharply, and retrieval cuts exposure without stopping the behavior once the memory reaches the generator.

What the work does cleanly is shift the question from retrieval accuracy to actual use. The with/without design and the control runs on non-sensitive content give a straightforward way to check specificity. That framing is new relative to the usual memory benchmarks and points to a practical requirement for checks at both retrieval and generation stages.

The soft spots sit in the missing mechanics. The abstract reports the percentage drops but gives no description of how the probe prompts were written, what counts as "benign," or the exact formula for the separation score. Without those, it is hard to know whether the probes truly require non-use or whether some prompts could legitimately draw on the memory. The control experiments help, but they do not replace seeing the construction details and any statistical reporting.

This is for people who build or evaluate memory systems in deployed agents. Anyone working on privacy constraints or agent safety will find the probe idea worth testing. The question is real and the basic comparison is reproducible in principle, so the paper should go to peer review once the methods section is expanded enough to let referees check the probe validity.

Referee Report

2 major / 2 minor

Summary. The paper introduces RBI-Eval, a controlled empirical measurement framework that compares LLM behavior with and without access to sensitive memory under identical benign prompts. It evaluates four base models across full-context and three retrieval settings, reports separation-score decreases of 8.9–26.6% for GPT-5.4-mini and 51.1–82.9% for the other three models relative to no-memory baselines, includes controls showing specificity to sensitive content, and concludes that safe personalization requires memory-aware decisions at both retrieval and generation time.

Significance. If the probe-set validity and separation-score definition hold, the work provides a direct, controlled demonstration that memory availability induces measurable integration of sensitive content even under benign prompts, with retrieval mitigating but not eliminating the effect. The inclusion of matched no-memory references and content-specificity controls is a strength that allows clear attribution of the observed divergence.

major comments (2)

[RBI-Eval construction paragraph] The paragraph describing RBI-Eval construction (abstract and methods): the headline claim that observed separation-score decreases reflect unwarranted sensitive-memory integration under benign prompts rests on the assumption that the probe prompts legitimately warrant non-use of the sensitive memory; the manuscript provides no explicit criteria, examples, or construction details for the probe set, nor any statistical tests or error bars, which is load-bearing for interpreting the reported 8.9–82.9% ranges and model differences.
[Results] Results section (model comparisons): the separation-score definition and its computation are not shown, so it is impossible to verify whether the metric cleanly isolates integration from other factors such as prompt relevance or generation stochasticity; this directly affects the cross-model claims.

minor comments (2)

[Abstract] The abstract states four memory-access settings but does not name the three retrieval systems; adding their names and a brief citation would improve clarity.
[Control experiments paragraph] Control experiments are mentioned for DeepSeek and GPT-5.4-mini only; a short note on why the other two models were omitted would prevent reader confusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight areas where additional methodological detail will improve clarity and verifiability. We address each point below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: [RBI-Eval construction paragraph] The paragraph describing RBI-Eval construction (abstract and methods): the headline claim that observed separation-score decreases reflect unwarranted sensitive-memory integration under benign prompts rests on the assumption that the probe prompts legitimately warrant non-use of the sensitive memory; the manuscript provides no explicit criteria, examples, or construction details for the probe set, nor any statistical tests or error bars, which is load-bearing for interpreting the reported 8.9–82.9% ranges and model differences.

Authors: We agree that the current description of probe-set construction is insufficiently detailed. The probe prompts were designed to be benign and topically unrelated to the sensitive memory content (e.g., general queries about daily activities when the memory concerns medical or financial details), but explicit criteria, sample prompts, and construction protocol are indeed missing. In revision we will add a dedicated subsection detailing the probe-set generation process, inclusion/exclusion criteria, and examples. We will also report bootstrap confidence intervals and appropriate statistical tests (e.g., paired t-tests or Wilcoxon tests) for the separation-score differences to support the reported ranges. revision: yes
Referee: [Results] Results section (model comparisons): the separation-score definition and its computation are not shown, so it is impossible to verify whether the metric cleanly isolates integration from other factors such as prompt relevance or generation stochasticity; this directly affects the cross-model claims.

Authors: We acknowledge that the separation-score formula and its exact computation steps are not presented in the results section. The metric is defined as the normalized difference in the rate of sensitive-content leakage between the memory-available and no-memory conditions, with controls for prompt relevance via matched benign prompts and stochasticity via multiple generations per prompt. In the revised manuscript we will insert an explicit definition, the mathematical formulation, and a description of how relevance and stochasticity are controlled, allowing readers to assess whether the metric isolates integration effects. revision: yes

Circularity Check

0 steps flagged

Empirical measurement study with no derivations or fitted parameters

full rationale

The paper is a controlled empirical evaluation introducing RBI-Eval to compare LLM behavior with versus without sensitive memory access under identical benign prompts. All reported results are direct observational differences (separation score changes relative to no-memory baselines) across models and retrieval settings, with control experiments for specificity. No mathematical derivations, first-principles predictions, parameter fitting, or self-citation chains are present; the central claims rest on the probe-set construction and score definition, which are inputs to the measurement rather than outputs that reduce to themselves by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The study rests on standard assumptions of LLM evaluation (prompts are benign, separation score measures integration) without introducing new free parameters or entities.

axioms (1)

domain assumption The separation score quantifies unwarranted sensitive-memory integration under identical benign prompts
Central metric of the RBI-Eval framework described in the abstract

pith-pipeline@v0.9.1-grok · 5762 in / 1274 out tokens · 19266 ms · 2026-06-28T01:55:42.249554+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

57 extracted references · 10 linked inside Pith

[1]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Evaluating very long-term conversational memory of llm agents , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[2]

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Personalizing dialogue agents: I have a dog, do you have pets too? , author=. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[3]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

Recent trends in personalized dialogue generation: A review of datasets, methodologies, and evaluations , author=. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

2024
[4]

Advances in neural information processing systems , volume=

Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. Advances in neural information processing systems , volume=
[5]

arXiv preprint arXiv:2312.10997 , volume=

Retrieval-augmented generation for large language models: A survey , author=. arXiv preprint arXiv:2312.10997 , volume=

Pith/arXiv arXiv
[6]

Hello again! llm-powered personalized agent for long-term dialogue , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025
[7]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Crafting personalized agents through retrieval-augmented generation on editable memory graphs , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024
[8]

International Conference on Learning Representations , volume=

Towards understanding sycophancy in language models , author=. International Conference on Learning Representations , volume=
[9]

arXiv preprint arXiv:2505.13995 , year=

ELEPHANT: Measuring and understanding social sycophancy in LLMs , author=. arXiv preprint arXiv:2505.13995 , year=

Pith/arXiv arXiv
[10]

arXiv preprint arXiv:2504.19413 , year=

Mem0: Building production-ready ai agents with scalable long-term memory , author=. arXiv preprint arXiv:2504.19413 , year=

Pith/arXiv arXiv
[11]

Advances in Neural Information Processing Systems , volume=

A-mem: Agentic memory for llm agents , author=. Advances in Neural Information Processing Systems , volume=
[12]

Proceedings of the AAAI conference on artificial intelligence , volume=

Memorybank: Enhancing large language models with long-term memory , author=. Proceedings of the AAAI conference on artificial intelligence , volume=
[13]

Advances in Neural Information Processing Systems , volume=

Augmenting language models with long-term memory , author=. Advances in Neural Information Processing Systems , volume=
[14]

Privacy as contextual integrity , author=. Wash. L. Rev. , volume=. 2004 , publisher=

2004
[15]

Privacy in context , year=

Privacy in context: Technology, policy, and the integrity of social life , author=. Privacy in context , year=
[16]

, author=

The environment and social behavior: privacy, personal space, territory, and crowding. , author=. 1975 , publisher=

1975
[17]

2002 , publisher=

Boundaries of privacy: Dialectics of disclosure , author=. 2002 , publisher=

2002
[18]

Philosophy & public affairs , pages=

Why privacy is important , author=. Philosophy & public affairs , pages=. 1975 , publisher=

1975
[19]

Companion Publication of the 2025 Conference on Computer-Supported Cooperative Work and Social Computing , pages=

Characterizing Relationships with Companion and Assistant Large Language Models , author=. Companion Publication of the 2025 Conference on Computer-Supported Cooperative Work and Social Computing , pages=

2025
[20]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Longsafety: Evaluating long-context safety of large language models , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[21]

Advances in Neural Information Processing Systems , volume=

Privacylens: Evaluating privacy norm awareness of language models in action , author=. Advances in Neural Information Processing Systems , volume=
[22]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Unveiling privacy risks in llm agent memory , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[23]

arXiv preprint arXiv:2601.13722 , year=

OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents , author=. arXiv preprint arXiv:2601.13722 , year=

arXiv
[24]

arXiv preprint arXiv:2601.17887 , year=

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents , author=. arXiv preprint arXiv:2601.17887 , year=

Pith/arXiv arXiv
[25]

arXiv preprint arXiv:2401.05459 , year=

Personal llm agents: Insights and survey about the capability, efficiency and security , author=. arXiv preprint arXiv:2401.05459 , year=

Pith/arXiv arXiv
[26]

arXiv preprint arXiv:2502.11528 , year=

A survey of personalized large language models: Progress and future directions , author=. arXiv preprint arXiv:2502.11528 , year=

arXiv
[27]

arXiv preprint arXiv:2311.08719 , year=

Think-in-memory: Recalling and post-thinking enable llms with long-term memory , author=. arXiv preprint arXiv:2311.08719 , year=

arXiv
[28]

arXiv preprint arXiv:2507.03724 , year=

Memos: A memory os for ai system , author=. arXiv preprint arXiv:2507.03724 , year=

Pith/arXiv arXiv
[29]

arXiv preprint arXiv:2511.13593 , year=

O-mem: Omni memory system for personalized, long horizon, self-evolving agents , author=. arXiv preprint arXiv:2511.13593 , year=

arXiv
[30]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

In prospect and retrospect: Reflective memory management for long-term personalized dialogue agents , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[31]

arXiv preprint arXiv:2505.16067 , year=

How memory management impacts llm agents: An empirical study of experience-following behavior , author=. arXiv preprint arXiv:2505.16067 , year=

arXiv
[32]

Advances in Neural Information Processing Systems , volume=

Teaching language models to evolve with users: Dynamic profile modeling for personalized alignment , author=. Advances in Neural Information Processing Systems , volume=
[33]

Proceedings of the ACM on Web Conference 2025 , pages=

Large language models empowered personalized web agents , author=. Proceedings of the ACM on Web Conference 2025 , pages=

2025
[34]

arXiv preprint arXiv:2506.06254 , year=

Personaagent: When large language model agents meet personalization at test time , author=. arXiv preprint arXiv:2506.06254 , year=

Pith/arXiv arXiv
[35]

arXiv preprint arXiv:2311.08377 , year=

Learning to filter context for retrieval-augmented generation , author=. arXiv preprint arXiv:2311.08377 , year=

arXiv
[36]

arXiv preprint arXiv:2209.07858 , year=

Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned , author=. arXiv preprint arXiv:2209.07858 , year=

Pith/arXiv arXiv
[37]

arXiv preprint arXiv:2404.08676 , year=

ALERT: A comprehensive benchmark for assessing large language models' safety through red teaming , author=. arXiv preprint arXiv:2404.08676 , year=

arXiv
[38]

arXiv preprint arXiv:2402.04249 , year=

Harmbench: A standardized evaluation framework for automated red teaming and robust refusal , author=. arXiv preprint arXiv:2402.04249 , year=

Pith/arXiv arXiv
[39]

International Conference on Learning Representations , volume=

Sorry-bench: Systematically evaluating large language model safety refusal , author=. International Conference on Learning Representations , volume=
[40]

Advances in Neural Information Processing Systems , volume=

Personalized safety in llms: A benchmark and a planning-based agent approach , author=. Advances in Neural Information Processing Systems , volume=
[41]

arXiv preprint arXiv:2503.15552 , year=

Personalized Attacks of Social Engineering in Multi-turn Conversations: LLM Agents for Simulation and Detection , author=. arXiv preprint arXiv:2503.15552 , year=

arXiv
[42]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track , pages=

SAGE: A Generic Framework for LLM Safety Evaluation , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track , pages=

2025
[43]

2006 IEEE symposium on security and privacy (S&P'06) , pages=

Privacy and contextual integrity: Framework and applications , author=. 2006 IEEE symposium on security and privacy (S&P'06) , pages=. 2006 , organization=

2006
[44]

Speech acts , pages=

Logic and conversation , author=. Speech acts , pages=. 1975 , publisher=

1975
[45]

Psychiatry , volume=

On face-work: An analysis of ritual elements in social interaction , author=. Psychiatry , volume=. 1955 , publisher=

1955
[46]

arXiv preprint arXiv:2502.09597 , year=

Do llms recognize your preferences? evaluating personalized preference following in llms , author=. arXiv preprint arXiv:2502.09597 , year=

arXiv
[47]

arXiv preprint arXiv:2504.14225 , year=

Know me, respond to me: Benchmarking llms for dynamic user profiling and personalized responses at scale , author=. arXiv preprint arXiv:2504.14225 , year=

arXiv
[48]

URL https://arxiv

Evaluating memory in llm agents via incremental multi-turn interactions, 2025 , author=. URL https://arxiv. org/abs/2507.05257 , year=

Pith/arXiv arXiv 2025
[49]

Proceedings of the 2023 conference on empirical methods in natural language processing , pages=

G-eval: NLG evaluation using gpt-4 with better human alignment , author=. Proceedings of the 2023 conference on empirical methods in natural language processing , pages=

2023
[50]

Advances in neural information processing systems , volume=

Judging llm-as-a-judge with mt-bench and chatbot arena , author=. Advances in neural information processing systems , volume=
[51]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Large language models are not fair evaluators , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[52]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=

USR: An unsupervised and reference free evaluation metric for dialog generation , author=. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=
[53]

Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

Beyond accuracy: Behavioral testing of NLP models with CheckList , author=. Proceedings of the 58th annual meeting of the association for computational linguistics , pages=
[54]

2026 , month = feb, url =

Claude Sonnet 4.6 System Card , author =. 2026 , month = feb, url =

2026
[55]

2026 , month = mar, url =

2026
[56]

2026 , month = apr, url =

2026
[57]

2026 , month = feb, url =

2026

[1] [1]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Evaluating very long-term conversational memory of llm agents , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[2] [2]

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Personalizing dialogue agents: I have a dog, do you have pets too? , author=. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[3] [3]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

Recent trends in personalized dialogue generation: A review of datasets, methodologies, and evaluations , author=. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

2024

[4] [4]

Advances in neural information processing systems , volume=

Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. Advances in neural information processing systems , volume=

[5] [5]

arXiv preprint arXiv:2312.10997 , volume=

Retrieval-augmented generation for large language models: A survey , author=. arXiv preprint arXiv:2312.10997 , volume=

Pith/arXiv arXiv

[6] [6]

Hello again! llm-powered personalized agent for long-term dialogue , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025

[7] [7]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Crafting personalized agents through retrieval-augmented generation on editable memory graphs , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024

[8] [8]

International Conference on Learning Representations , volume=

Towards understanding sycophancy in language models , author=. International Conference on Learning Representations , volume=

[9] [9]

arXiv preprint arXiv:2505.13995 , year=

ELEPHANT: Measuring and understanding social sycophancy in LLMs , author=. arXiv preprint arXiv:2505.13995 , year=

Pith/arXiv arXiv

[10] [10]

arXiv preprint arXiv:2504.19413 , year=

Mem0: Building production-ready ai agents with scalable long-term memory , author=. arXiv preprint arXiv:2504.19413 , year=

Pith/arXiv arXiv

[11] [11]

Advances in Neural Information Processing Systems , volume=

A-mem: Agentic memory for llm agents , author=. Advances in Neural Information Processing Systems , volume=

[12] [12]

Proceedings of the AAAI conference on artificial intelligence , volume=

Memorybank: Enhancing large language models with long-term memory , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

[13] [13]

Advances in Neural Information Processing Systems , volume=

Augmenting language models with long-term memory , author=. Advances in Neural Information Processing Systems , volume=

[14] [14]

Privacy as contextual integrity , author=. Wash. L. Rev. , volume=. 2004 , publisher=

2004

[15] [15]

Privacy in context , year=

Privacy in context: Technology, policy, and the integrity of social life , author=. Privacy in context , year=

[16] [16]

, author=

The environment and social behavior: privacy, personal space, territory, and crowding. , author=. 1975 , publisher=

1975

[17] [17]

2002 , publisher=

Boundaries of privacy: Dialectics of disclosure , author=. 2002 , publisher=

2002

[18] [18]

Philosophy & public affairs , pages=

Why privacy is important , author=. Philosophy & public affairs , pages=. 1975 , publisher=

1975

[19] [19]

Companion Publication of the 2025 Conference on Computer-Supported Cooperative Work and Social Computing , pages=

Characterizing Relationships with Companion and Assistant Large Language Models , author=. Companion Publication of the 2025 Conference on Computer-Supported Cooperative Work and Social Computing , pages=

2025

[20] [20]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Longsafety: Evaluating long-context safety of large language models , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[21] [21]

Advances in Neural Information Processing Systems , volume=

Privacylens: Evaluating privacy norm awareness of language models in action , author=. Advances in Neural Information Processing Systems , volume=

[22] [22]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Unveiling privacy risks in llm agent memory , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[23] [23]

arXiv preprint arXiv:2601.13722 , year=

OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents , author=. arXiv preprint arXiv:2601.13722 , year=

arXiv

[24] [24]

arXiv preprint arXiv:2601.17887 , year=

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents , author=. arXiv preprint arXiv:2601.17887 , year=

Pith/arXiv arXiv

[25] [25]

arXiv preprint arXiv:2401.05459 , year=

Personal llm agents: Insights and survey about the capability, efficiency and security , author=. arXiv preprint arXiv:2401.05459 , year=

Pith/arXiv arXiv

[26] [26]

arXiv preprint arXiv:2502.11528 , year=

A survey of personalized large language models: Progress and future directions , author=. arXiv preprint arXiv:2502.11528 , year=

arXiv

[27] [27]

arXiv preprint arXiv:2311.08719 , year=

Think-in-memory: Recalling and post-thinking enable llms with long-term memory , author=. arXiv preprint arXiv:2311.08719 , year=

arXiv

[28] [28]

arXiv preprint arXiv:2507.03724 , year=

Memos: A memory os for ai system , author=. arXiv preprint arXiv:2507.03724 , year=

Pith/arXiv arXiv

[29] [29]

arXiv preprint arXiv:2511.13593 , year=

O-mem: Omni memory system for personalized, long horizon, self-evolving agents , author=. arXiv preprint arXiv:2511.13593 , year=

arXiv

[30] [30]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

In prospect and retrospect: Reflective memory management for long-term personalized dialogue agents , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[31] [31]

arXiv preprint arXiv:2505.16067 , year=

How memory management impacts llm agents: An empirical study of experience-following behavior , author=. arXiv preprint arXiv:2505.16067 , year=

arXiv

[32] [32]

Advances in Neural Information Processing Systems , volume=

Teaching language models to evolve with users: Dynamic profile modeling for personalized alignment , author=. Advances in Neural Information Processing Systems , volume=

[33] [33]

Proceedings of the ACM on Web Conference 2025 , pages=

Large language models empowered personalized web agents , author=. Proceedings of the ACM on Web Conference 2025 , pages=

2025

[34] [34]

arXiv preprint arXiv:2506.06254 , year=

Personaagent: When large language model agents meet personalization at test time , author=. arXiv preprint arXiv:2506.06254 , year=

Pith/arXiv arXiv

[35] [35]

arXiv preprint arXiv:2311.08377 , year=

Learning to filter context for retrieval-augmented generation , author=. arXiv preprint arXiv:2311.08377 , year=

arXiv

[36] [36]

arXiv preprint arXiv:2209.07858 , year=

Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned , author=. arXiv preprint arXiv:2209.07858 , year=

Pith/arXiv arXiv

[37] [37]

arXiv preprint arXiv:2404.08676 , year=

ALERT: A comprehensive benchmark for assessing large language models' safety through red teaming , author=. arXiv preprint arXiv:2404.08676 , year=

arXiv

[38] [38]

arXiv preprint arXiv:2402.04249 , year=

Harmbench: A standardized evaluation framework for automated red teaming and robust refusal , author=. arXiv preprint arXiv:2402.04249 , year=

Pith/arXiv arXiv

[39] [39]

International Conference on Learning Representations , volume=

Sorry-bench: Systematically evaluating large language model safety refusal , author=. International Conference on Learning Representations , volume=

[40] [40]

Advances in Neural Information Processing Systems , volume=

Personalized safety in llms: A benchmark and a planning-based agent approach , author=. Advances in Neural Information Processing Systems , volume=

[41] [41]

arXiv preprint arXiv:2503.15552 , year=

Personalized Attacks of Social Engineering in Multi-turn Conversations: LLM Agents for Simulation and Detection , author=. arXiv preprint arXiv:2503.15552 , year=

arXiv

[42] [42]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track , pages=

SAGE: A Generic Framework for LLM Safety Evaluation , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track , pages=

2025

[43] [43]

2006 IEEE symposium on security and privacy (S&P'06) , pages=

Privacy and contextual integrity: Framework and applications , author=. 2006 IEEE symposium on security and privacy (S&P'06) , pages=. 2006 , organization=

2006

[44] [44]

Speech acts , pages=

Logic and conversation , author=. Speech acts , pages=. 1975 , publisher=

1975

[45] [45]

Psychiatry , volume=

On face-work: An analysis of ritual elements in social interaction , author=. Psychiatry , volume=. 1955 , publisher=

1955

[46] [46]

arXiv preprint arXiv:2502.09597 , year=

Do llms recognize your preferences? evaluating personalized preference following in llms , author=. arXiv preprint arXiv:2502.09597 , year=

arXiv

[47] [47]

arXiv preprint arXiv:2504.14225 , year=

Know me, respond to me: Benchmarking llms for dynamic user profiling and personalized responses at scale , author=. arXiv preprint arXiv:2504.14225 , year=

arXiv

[48] [48]

URL https://arxiv

Evaluating memory in llm agents via incremental multi-turn interactions, 2025 , author=. URL https://arxiv. org/abs/2507.05257 , year=

Pith/arXiv arXiv 2025

[49] [49]

Proceedings of the 2023 conference on empirical methods in natural language processing , pages=

G-eval: NLG evaluation using gpt-4 with better human alignment , author=. Proceedings of the 2023 conference on empirical methods in natural language processing , pages=

2023

[50] [50]

Advances in neural information processing systems , volume=

Judging llm-as-a-judge with mt-bench and chatbot arena , author=. Advances in neural information processing systems , volume=

[51] [51]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Large language models are not fair evaluators , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[52] [52]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=

USR: An unsupervised and reference free evaluation metric for dialog generation , author=. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=

[53] [53]

Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

Beyond accuracy: Behavioral testing of NLP models with CheckList , author=. Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

[54] [54]

2026 , month = feb, url =

Claude Sonnet 4.6 System Card , author =. 2026 , month = feb, url =

2026

[55] [55]

2026 , month = mar, url =

2026

[56] [56]

2026 , month = apr, url =

2026

[57] [57]

2026 , month = feb, url =

2026