Recognition: no theorem link
ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying
Pith reviewed 2026-05-10 17:55 UTC · model grok-4.3
The pith
ADAM extracts private data from LLM agent memory by estimating its distribution and adaptively querying with an entropy-guided strategy to achieve up to 100 percent success.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ADAM estimates the data distribution inside a victim agent's memory and applies an entropy-guided query strategy to maximize privacy leakage, outperforming existing attacks with attack success rates up to 100 percent.
What carries the argument
The entropy-guided query strategy, which selects queries to maximize information gain from the estimated distribution of memory contents.
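One plausible reading of such a strategy can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: it assumes the attacker scores each candidate query by the Shannon entropy of the response distribution predicted from the current memory estimate, and greedily picks the most uncertain probe. All names and the coverage model below are hypothetical.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pick_next_query(estimated_dist, candidate_queries, predict_response_dist):
    """Greedy entropy-guided selection: choose the candidate query whose
    predicted response distribution is most uncertain, i.e. the probe
    expected to reveal the most about the memory contents."""
    return max(
        candidate_queries,
        key=lambda q: shannon_entropy(predict_response_dist(estimated_dist, q)),
    )

# Toy setup (illustrative, not from the paper): the attacker's current
# estimate of how memory mass splits across topic categories, and
# candidate queries, each able to surface a subset of categories.
estimated_dist = {"medical": 0.5, "financial": 0.3, "contact": 0.2}
query_coverage = {
    "ask_about_medical": ["medical"],
    "ask_about_fin_or_contact": ["financial", "contact"],
}

def predict_response_dist(dist, query):
    # Predicted response distribution: renormalized memory mass over the
    # categories the query can surface.
    covered = query_coverage[query]
    mass = sum(dist[c] for c in covered)
    return [dist[c] / mass for c in covered]

best = pick_next_query(estimated_dist, list(query_coverage), predict_response_dist)
print(best)  # prints "ask_about_fin_or_contact": the two-category probe is more informative
```

The single-category probe has a degenerate (zero-entropy) predicted response, so the greedy rule prefers the probe over the two uncertain categories, which is the "maximize information gain" behavior the claim describes.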
If this is right
- LLM agents that store prior interactions in memory become vulnerable to high-rate data extraction.
- Entropy-based adaptive querying extracts more private information than fixed or random query patterns.
- Both memory modules and retrieval-augmented generation mechanisms in agents carry comparable privacy risks.
- Current agent architectures require new privacy-preserving techniques to limit leakage through queries.
Where Pith is reading between the lines
- The same distribution-estimation approach could be tested on other queryable components such as tool-use histories.
- Memory designs that add controlled noise or restrict adaptive query patterns might reduce leakage even if the underlying distribution remains accessible.
- Real-world deployment tests on commercial agents would reveal whether the reported success rates hold under production response constraints.
Load-bearing premise
A victim LLM agent will respond to crafted queries in ways that reliably reveal the statistical distribution of its internal memory contents without additional safeguards.
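If this premise holds, even a very simple estimator recovers the memory's category distribution from leaked responses. The sketch below is a minimal illustration under that assumption (the paper's actual estimator is not specified in the excerpt; the data and category names are hypothetical): each probing round leaks which category of record was retrieved, and Laplace-smoothed counting converges to the underlying distribution.

```python
from collections import Counter

def estimate_memory_distribution(observed_categories, categories, smoothing=1.0):
    """Laplace-smoothed empirical estimate of the memory's category
    distribution, built from categories observed in agent responses.
    Valid only under the premise above: responses sample memory faithfully."""
    counts = Counter(observed_categories)
    total = len(observed_categories) + smoothing * len(categories)
    return {c: (counts[c] + smoothing) / total for c in categories}

# Five probing rounds, each leaking one record's category (hypothetical data).
observed = ["medical", "medical", "financial", "medical", "contact"]
est = estimate_memory_distribution(observed, ["medical", "financial", "contact"])
print(est)  # {'medical': 0.5, 'financial': 0.25, 'contact': 0.25}
```

Any safeguard that breaks the faithful-sampling assumption (filtering, refusals, added noise) corrupts exactly this counting step, which is why the premise is load-bearing.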
What would settle it
A test on agents equipped with response filtering or memory-obscuring safeguards in which ADAM's attack success rate falls below that of prior non-adaptive attacks.
Original abstract
Large Language Model (LLM) agents have achieved rapid adoption and demonstrated remarkable capabilities across a wide range of applications. To improve reasoning and task execution, modern LLM agents often incorporate memory modules or retrieval-augmented generation (RAG) mechanisms, enabling them to further leverage prior interactions or external knowledge. However, such a design also introduces a group of critical privacy vulnerabilities: sensitive information stored in memory can be leaked through query-based attacks. Although feasible, existing attacks often achieve only limited performance, with low attack success rates (ASR). In this paper, we propose ADAM, a novel privacy attack that features data distribution estimation of a victim agent's memory and employs an entropy-guided query strategy for maximizing privacy leakage. Extensive experiments demonstrate that our attack substantially outperforms state-of-the-art ones, achieving up to 100% ASRs. These results thus underscore the urgent need for robust privacy-preserving methods for current LLM agents.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ADAM, a novel privacy attack on LLM agents equipped with memory modules or RAG. It estimates the empirical distribution over the victim's memory contents via adaptive queries and then uses an entropy-guided strategy to select follow-up queries that maximize leakage of sensitive information. The central claim is that this approach substantially outperforms prior state-of-the-art extraction attacks, achieving up to 100% attack success rates (ASR) across extensive experiments.
Significance. If the reported ASRs hold under realistic agent configurations, the work would be significant for the security and privacy community by demonstrating a practical, high-success query-based extraction method against memory-augmented agents and by underscoring the need for output sanitization and refusal mechanisms. The entropy-guided adaptive querying is a clear technical contribution over non-adaptive baselines.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): The 100% ASR claim is load-bearing for the paper's contribution, yet the provided text supplies no details on victim-agent implementation (e.g., presence of output sanitization, refusal training, or raw RAG access), baseline attack reproductions, dataset statistics, or statistical significance tests. Without these, it is impossible to evaluate whether the result generalizes beyond unconstrained, cooperative LLMs.
- [§3] §3 (Method): The entropy-guided query strategy presupposes that generated responses faithfully reflect the internal memory retrieval distribution. This assumption is not stress-tested against agents that apply even light post-processing or that refuse high-entropy probes; the manuscript should include an ablation or threat-model section quantifying degradation under such realistic constraints.
minor comments (2)
- [Abstract] The abstract states 'extensive experiments' but does not name the concrete LLM back-ends, memory sizes, or number of trials; adding these summary statistics would improve reproducibility.
- [§3] Notation for attack success rate (ASR) and entropy calculation should be defined at first use with a short equation or pseudocode snippet.
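For reference, the two quantities the minor comments ask to see defined can be written in a few lines. These are standard conventions, not necessarily the paper's exact definitions: ASR here is the fraction of target records recovered, and entropy is the usual Shannon measure an entropy-guided strategy would score queries by.

```python
import math

def attack_success_rate(extracted, ground_truth):
    """ASR under one common convention: fraction of target memory
    records the attacker successfully recovers."""
    targets = set(ground_truth)
    return len(set(extracted) & targets) / len(targets)

def shannon_entropy(probs):
    """H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Recovering 2 of 4 target records gives ASR = 0.5 (hypothetical records).
asr = attack_success_rate(["rec1", "rec2"], ["rec1", "rec2", "rec3", "rec4"])
print(asr)  # 0.5
```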
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which have helped us improve the clarity and rigor of our work. We address each major comment below and have made substantial revisions to the manuscript to incorporate the suggested enhancements.
Point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Experiments): The 100% ASR claim is load-bearing for the paper's contribution, yet the provided text supplies no details on victim-agent implementation (e.g., presence of output sanitization, refusal training, or raw RAG access), baseline attack reproductions, dataset statistics, or statistical significance tests. Without these, it is impossible to evaluate whether the result generalizes beyond unconstrained, cooperative LLMs.
Authors: We appreciate this observation and agree that more comprehensive details are required to support the claims. In the revised version, we have significantly expanded Section 4 (Experiments) to provide full specifications of the victim-agent setups, including configurations with and without output sanitization and refusal mechanisms. We have also included detailed reproductions of the baseline attacks with their exact parameters, comprehensive dataset statistics, and statistical significance testing (including p-values and confidence intervals across repeated trials). Furthermore, we added experiments demonstrating the attack's performance on more constrained agents to address generalizability concerns. These changes ensure the results can be properly evaluated. revision: yes
-
Referee: [§3] §3 (Method): The entropy-guided query strategy presupposes that generated responses faithfully reflect the internal memory retrieval distribution. This assumption is not stress-tested against agents that apply even light post-processing or that refuse high-entropy probes; the manuscript should include an ablation or threat-model section quantifying degradation under such realistic constraints.
Authors: We acknowledge the validity of this point regarding the assumptions in our threat model. To address it, we have added a new subsection in Section 3 (Method) that explicitly outlines the threat model and its assumptions. Additionally, we have included an ablation study in Section 4 that evaluates the attack's effectiveness when the agent applies post-processing filters or refusal strategies for high-entropy queries. This study quantifies the degradation in ASR under these more realistic conditions, providing a clearer picture of the attack's robustness and limitations. revision: yes
Circularity Check
No circularity: empirical attack with no self-referential derivations
full rationale
The paper describes an empirical privacy attack (ADAM) that estimates memory distributions via adaptive queries and entropy-guided selection, validated through experiments on LLM agents. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the abstract or description. Claims rest on experimental ASRs rather than any derivation chain that reduces outputs to inputs by construction. The work is evaluated against external benchmarks, with no uniqueness theorems or ansatzes imported from prior author work.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Bochuan Cao, Yuanpu Cao, Lu Lin, and Jinghui Chen. Defending against alignment-breaking attacks via robustly aligned llm. arXiv preprint arXiv:2309.14348, 2023.
-
[2]
Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory. arXiv preprint arXiv:2504.19413.
-
[3]
Stav Cohen, Ron Bitton, and Ben Nassi. Unleashing worms and extracting data: Escalating the outcome of attacks against rag-based inference in scale and severity using jailbreaking. arXiv preprint arXiv:2409.08045.
-
[4]
Christian Di Maio, Cristian Cosci, Marco Maggini, Valentina Poggioni, and Stefano Melacci. Pirates of the rag: Adaptively attacking llms to leak knowledge bases. arXiv preprint arXiv:2412.18295.
-
[5]
Han Ding, Yinheng Li, Junhao Wang, and Hang Chen. Large language model agent in financial trading: A survey. arXiv preprint arXiv:2408.06361.
-
[6]
Melanie Ducoffe and Frederic Precioso. Adversarial active learning for deep networks: a margin based approach. arXiv preprint arXiv:1802.09841.
-
[7]
Jinyuan Fang, Yanwen Peng, Xi Zhang, Yingxu Wang, Xinhao Yi, Guibin Zhang, Yi Xu, Bin Wu, Siwei Liu, Zihao Li, et al. A comprehensive survey of self-evolving ai agents: A new paradigm bridging foundation models and lifelong agentic systems. arXiv preprint arXiv:2508.07407.
-
[8]
Marc Glocker, Peter Hönig, Matthias Hirschmanner, and Markus Vincze. Llm-empowered embodied agent for memory-augmented task planning in household robotics. arXiv preprint arXiv:2504.21716, 2025.
-
[9]
Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, and Min Yang. Rag-thief: Scalable extraction of private data from retrieval-augmented generation applications with agent-based attacks. arXiv preprint arXiv:2411.14110.
-
[10]
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, and Himabindu Lakkaraju. Certifying llm safety against adversarial prompting. arXiv preprint arXiv:2309.02705.
-
[11]
Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, et al. Legalagentbench: Evaluating llm agents in legal domain. arXiv preprint arXiv:2412.17259.
-
[12]
Xuechen Liang, Yangfan He, Yinghui Xia, Xinyuan Song, Jianhui Wang, Meiling Tao, Li Sun, Xinhang Yuan, Jiayi Su, Keqin Li, et al. Self-evolving agents with reflective and memory-augmented abilities. arXiv preprint arXiv:2409.00872.
-
[13]
Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. Eia: Environmental injection attack on generalist web agents for privacy leakage. arXiv preprint arXiv:2409.11295.
-
[14]
Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, et al. Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems. arXiv preprint.
-
[15]
Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, and Nan Duan. Query rewriting in retrieval-augmented large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5303–5315, 2023.
-
[16]
Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of llm agents. arXiv preprint arXiv:2402.17753.
-
[17]
Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, and Himabindu Lakkaraju. Follow my instruction and spill the beans: Scalable data extraction from retrieval-augmented generation systems. arXiv preprint arXiv:2402.17840.
-
[18]
Shagoto Rahman and Ian Harris. Summary the savior: Harmful keyword and query-based summarization for llm jailbreak defense. In Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), pages 266–275, 2025.
-
[19]
Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489.
-
[20]
Wenqi Shi, Ran Xu, Yuchen Zhuang, Yue Yu, Jieyu Zhang, Hang Wu, Yuanda Zhu, Joyce Ho, Carl Yang, and May D Wang. Ehragent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2024.
-
[21]
Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Agentic retrieval-augmented generation: A survey on agentic rag. arXiv preprint arXiv:2501.09136.
-
[22]
Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, and Pengfei He. Unveiling privacy risks in llm agent memory. arXiv preprint arXiv:2502.13172.
-
[23]
Zihao Xu, Yi Liu, Gelei Deng, Yuekang Li, and Stjepan Picek. A comprehensive study of jailbreak attack versus defense for large language models. arXiv preprint arXiv:2402.13457.
-
[24]
Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, and Matthew Gadd. Rag-driver: Generalisable driving explanations with retrieval-augmented in-context learning in multi-modal large language model. arXiv preprint arXiv:2402.10828, 2024.
-
[25]
Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang, et al. The good and the bad: Exploring privacy issues in retrieval-augmented generation (rag). arXiv preprint arXiv:2402.16893.