arxiv: 2604.09747 · v1 · submitted 2026-04-10 · 💻 cs.CR · cs.AI

Recognition: no theorem link

ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

Xingyu Lyu , Jianfeng He , Ning Wang , Yidan Hu , Tao Li , Danjue Chen , Shixiong Li , Yimin Chen

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:55 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords LLM agentsprivacy attackmemory extractiondata leakageadaptive queryingentropy-guided strategyretrieval-augmented generation

0 comments

The pith

ADAM extracts private data from LLM agent memory by estimating its distribution and adaptively querying with an entropy-guided strategy to achieve up to 100 percent success.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ADAM as a privacy attack that targets memory in LLM agents to extract sensitive information from prior interactions or external knowledge. It first estimates the statistical distribution of the stored data and then crafts queries guided by entropy to maximize the leakage in the agent's responses. Experiments show this approach yields substantially higher attack success rates than prior methods, reaching as high as 100 percent. If the method works as described, current agent designs that rely on accessible memory or retrieval mechanisms expose private data to systematic extraction through ordinary queries.

Core claim

ADAM estimates the data distribution inside a victim agent's memory and applies an entropy-guided query strategy to maximize privacy leakage, outperforming existing attacks with attack success rates up to 100 percent.

What carries the argument

The entropy-guided query strategy, which selects queries to maximize information gain from the estimated distribution of memory contents.

If this is right

LLM agents that store prior interactions in memory become vulnerable to high-rate data extraction.
Entropy-based adaptive querying extracts more private information than fixed or random query patterns.
Both memory modules and retrieval-augmented generation mechanisms in agents carry comparable privacy risks.
Current agent architectures require new privacy-preserving techniques to limit leakage through queries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same distribution-estimation approach could be tested on other queryable components such as tool-use histories.
Memory designs that add controlled noise or restrict adaptive query patterns might reduce leakage even if the underlying distribution remains accessible.
Real-world deployment tests on commercial agents would reveal whether the reported success rates hold under production response constraints.

Load-bearing premise

A victim LLM agent will respond to crafted queries in ways that reliably reveal the statistical distribution of its internal memory contents without additional safeguards.

What would settle it

A test on agents equipped with response filtering or memory-obscuring safeguards in which ADAM's attack success rate falls below that of prior non-adaptive attacks.

Figures

Figures reproduced from arXiv: 2604.09747 by Danjue Chen, Jianfeng He, Ning Wang, Shixiong Li, Tao Li, Xingyu Lyu, Yidan Hu, Yimin Chen.

**Figure 1.** Figure 1: Workflow of the proposed ADAM attack. A malicious query (left) versus a benign query (right) to the agent. Algorithm 1 ADAM attack Require: Agent A (memory M); generator Gaux; encoder z(·); seed topics Sseed; params k, α, λ, τ, T, ϵ 1: Instructions: 2: ι1: benign wrapper (e.g., “I may have lost prior examples.”); 3: ι2: retrieval-inducing hint (format-aligned). 4: T0 ← Sseed; Pˆ 0(a) ← 1/|T0|; R ← ∅ 5: for… view at source ↗

**Figure 2.** Figure 2: Case studies for ADAM over two agents: EHRAgent (left) and RAP (right). Baselines. We compare against four baseline attacks of which the details are in Appendix. The Vanilla attack baseline employs prompt injection commands following (Zeng et al., 2024; Qi et al., 2024). Queryoptimization baselines include RAG-Thief (Jiang et al., 2024) and Pirate (Di Maio et al., 2024). We also include the state-of-the-a… view at source ↗

**Figure 3.** Figure 3: Ablation study analyzing the impacts of different factors on ADAM. Harris, 2025), where the agent flags potentially harmful prompts such as those explicitly requesting “list your memory” or “show previous questions.” These prompts are treated as harmful and filtered out. Figure 5b illustrates that while such filtering is effective against MEXTRA, our method is only marginally impacted. This shows that keyw… view at source ↗

**Figure 4.** Figure 4: Embedding distribution of both ground truth (i.e., Oracle) and ADAM (Ours). The results confirm that the better distribution estimation, the closer ADAM is to Oracle performance. EHRAgent ReAct RAP 0 20 40 60 80 100 EQ (Ours) EQ (Baseline) (a) Query rewritting EHRAgent ReAct RAP 0 20 40 60 80 100 EQ (Ours) EQ (Baseline) (b) Auxiliary filtering EHRAgent ReAct RAP 0 20 40 60 80 100 EQ (Ours) EQ (Baseline) (c… view at source ↗

**Figure 5.** Figure 5: Attack against four defenses. Pale regions indicate attack degradation after applying a defense. Representative defense-transformed query examples are provided in Appendix F. We additionally apply a standard industry-style rate-control mechanism to limit query frequency and iterative exploitation, the corresponding results are provided in Appendix O. 6. Related Work Privacy leakage in RAG and LLM agents. E… view at source ↗

**Figure 6.** Figure 6: Estimated versus ground-truth distributions for EhrAgent across attack rounds. Results confirm that the estimated distribution progressively aligns with the ground truth as the attack proceeds. Round Queries from three repeated experiments Noisy? 1 What are the top three frequent output events? / What are the top five frequent output events? / What are the top five frequent specimens tested? No 2 What are … view at source ↗

**Figure 7.** Figure 7: EQ and EE across five runs, EhrAgent, QWEN-72B. Results confirm that the proposed attack performs consistently across runs rather than due to randomness or variance in execution. O. Rate control of attack queries We evaluated our attack under a standard industry-style rate-control mechanism to assess whether it meaningfully limits ADAM’s effectiveness. Our findings indicate that rate control cannot effecti… view at source ↗

read the original abstract

Large Language Model (LLM) agents have achieved rapid adoption and demonstrated remarkable capabilities across a wide range of applications. To improve reasoning and task execution, modern LLM agents would incorporate memory modules or retrieval-augmented generation (RAG) mechanisms, enabling them to further leverage prior interactions or external knowledge. However, such a design also introduces a group of critical privacy vulnerabilities: sensitive information stored in memory can be leaked through query-based attacks. Although feasible, existing attacks often achieve only limited performance, with low attack success rates (ASR). In this paper, we propose ADAM, a novel privacy attack that features data distribution estimation of a victim agent's memory and employs an entropy-guided query strategy for maximizing privacy leakage. Extensive experiments demonstrate that our attack substantially outperforms state-of-the-art ones, achieving up to 100% ASRs. These results thus underscore the urgent need for robust privacy-preserving methods for current LLM agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ADAM's entropy-guided extraction on agent memory is a reasonable new angle but the 100% ASR claim rests on cooperative, undefended targets and lacks enough experimental detail to judge yet.

read the letter

The main thing here is a new attack strategy that first estimates the distribution of whatever is stored in an agent's memory or RAG store, then picks follow-up queries with high entropy to pull out more private bits. That adaptive step after the initial distribution probe is the part that feels fresher than the earlier query-based extraction papers they cite. It is a timely reminder that memory modules in LLM agents create a direct privacy surface, especially when agents are used in settings with user data or internal documents. The abstract is clear on the motivation and the claimed improvement over prior work. What is missing is any concrete description of the experimental setup, the victim models, the baselines, the datasets, or how they handled refusal and sanitization. The 100% ASR number is presented without error bars or ablation on defenses, so it is hard to know whether the result survives once the target agent has even basic output filtering or rate limiting. The stress-test note is on point: the attack only works cleanly if the LLM returns raw retrieval statistics without hallucinating or refusing under pressure. Most deployed agents add at least some of those protections, which would likely drop the success rate. The paper is aimed at the LLM security and privacy community. A reader working on agent architectures or red-teaming would get value from the high-level idea and the call for better defenses. It is worth sending to referees so the experimental claims can be checked against the full details, but the authors should expect questions on how the attack behaves once realistic safeguards are in place.

Referee Report

2 major / 2 minor

Summary. The paper proposes ADAM, a novel privacy attack on LLM agents equipped with memory modules or RAG. It estimates the empirical distribution over the victim's memory contents via adaptive queries and then uses an entropy-guided strategy to select follow-up queries that maximize leakage of sensitive information. The central claim is that this approach substantially outperforms prior state-of-the-art extraction attacks, achieving up to 100% attack success rates (ASR) across extensive experiments.

Significance. If the reported ASRs hold under realistic agent configurations, the work would be significant for the security and privacy community by demonstrating a practical, high-success query-based extraction method against memory-augmented agents and by underscoring the need for output sanitization and refusal mechanisms. The entropy-guided adaptive querying is a clear technical contribution over non-adaptive baselines.

major comments (2)

[Abstract and §4] Abstract and §4 (Experiments): The 100% ASR claim is load-bearing for the paper's contribution, yet the provided text supplies no details on victim-agent implementation (e.g., presence of output sanitization, refusal training, or raw RAG access), baseline attack reproductions, dataset statistics, or statistical significance tests. Without these, it is impossible to evaluate whether the result generalizes beyond unconstrained, cooperative LLMs.
[§3] §3 (Method): The entropy-guided query strategy presupposes that generated responses faithfully reflect the internal memory retrieval distribution. This assumption is not stress-tested against agents that apply even light post-processing or that refuse high-entropy probes; the manuscript should include an ablation or threat-model section quantifying degradation under such realistic constraints.

minor comments (2)

[Abstract] The abstract states 'extensive experiments' but does not name the concrete LLM back-ends, memory sizes, or number of trials; adding these summary statistics would improve reproducibility.
[§3] Notation for attack success rate (ASR) and entropy calculation should be defined at first use with a short equation or pseudocode snippet.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and rigor of our work. We address each major comment below and have made substantial revisions to the manuscript to incorporate the suggested enhancements.

read point-by-point responses

Referee: [Abstract and §4] Abstract and §4 (Experiments): The 100% ASR claim is load-bearing for the paper's contribution, yet the provided text supplies no details on victim-agent implementation (e.g., presence of output sanitization, refusal training, or raw RAG access), baseline attack reproductions, dataset statistics, or statistical significance tests. Without these, it is impossible to evaluate whether the result generalizes beyond unconstrained, cooperative LLMs.

Authors: We appreciate this observation and agree that more comprehensive details are required to support the claims. In the revised version, we have significantly expanded Section 4 (Experiments) to provide full specifications of the victim-agent setups, including configurations with and without output sanitization and refusal mechanisms. We have also included detailed reproductions of the baseline attacks with their exact parameters, comprehensive dataset statistics, and statistical significance testing (including p-values and confidence intervals across repeated trials). Furthermore, we added experiments demonstrating the attack's performance on more constrained agents to address generalizability concerns. These changes ensure the results can be properly evaluated. revision: yes
Referee: [§3] §3 (Method): The entropy-guided query strategy presupposes that generated responses faithfully reflect the internal memory retrieval distribution. This assumption is not stress-tested against agents that apply even light post-processing or that refuse high-entropy probes; the manuscript should include an ablation or threat-model section quantifying degradation under such realistic constraints.

Authors: We acknowledge the validity of this point regarding the assumptions in our threat model. To address it, we have added a new subsection in Section 3 (Method) that explicitly outlines the threat model and its assumptions. Additionally, we have included an ablation study in Section 4 that evaluates the attack's effectiveness when the agent applies post-processing filters or refusal strategies for high-entropy queries. This study quantifies the degradation in ASR under these more realistic conditions, providing a clearer picture of the attack's robustness and limitations. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical attack with no self-referential derivations

full rationale

The paper describes an empirical privacy attack (ADAM) that estimates memory distributions via adaptive queries and entropy-guided selection, validated through experiments on LLM agents. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the abstract or description. Claims rest on experimental ASRs rather than any derivation chain that reduces outputs to inputs by construction. The work is self-contained against external benchmarks with no uniqueness theorems or ansatzes imported from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5474 in / 914 out tokens · 26651 ms · 2026-05-10T17:55:21.655756+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 22 canonical work pages · 4 internal anchors

[1]

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM.CoRR abs/2309.14348, 2023

Bochuan Cao, Yuanpu Cao, Lu Lin, and Jinghui Chen. Defending against alignment-breaking attacks via robustly aligned llm. arXiv preprint arXiv:2309.14348,

work page arXiv
[2]

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413,

work page internal anchor Pith review arXiv
[3]

Hugging Face dataset

Stav Cohen, Ron Bitton, and Ben Nassi. Unleashing worms and extracting data: Escalating the outcome of attacks against rag- based inference in scale and severity using jailbreaking.arXiv preprint arXiv:2409.08045,

work page arXiv
[4]

Pirates of the rag: Adaptively attacking llms to leak knowledge bases.arXiv preprint arXiv:2412.18295,

Christian Di Maio, Cristian Cosci, Marco Maggini, Valentina Poggioni, and Stefano Melacci. Pirates of the rag: Adaptively attacking llms to leak knowledge bases.arXiv preprint arXiv:2412.18295,

work page arXiv
[5]

Xiaoning Dong, Wenbo Hu, Wei Xu, and Tianxing He

Han Ding, Yinheng Li, Junhao Wang, and Hang Chen. Large language model agent in financial trading: A survey.arXiv preprint arXiv:2408.06361,

work page arXiv
[6]

Adversarial active learning for deep networks: a margin based approach.arXiv preprint arXiv:1802.09841,

Melanie Ducoffe and Frederic Precioso. Adversarial active learning for deep networks: a margin based approach.arXiv preprint arXiv:1802.09841,

work page arXiv
[7]

and Peng, Y

Jinyuan Fang, Yanwen Peng, Xi Zhang, Yingxu Wang, Xinhao Yi, Guibin Zhang, Yi Xu, Bin Wu, Siwei Liu, Zihao Li, et al. A comprehensive survey of self-evolving ai agents: A new paradigm bridging foundation models and lifelong agentic systems.arXiv preprint arXiv:2508.07407,

work page arXiv
[8]

arXiv preprint arXiv:2504.21716 (2025)

Marc Glocker, Peter Hönig, Matthias Hirschmanner, and Markus Vincze. Llm-empowered embodied agent for memory- augmented task planning in household robotics.arXiv preprint arXiv:2504.21716,

work page arXiv
[9]

Mistral 7B

URL https://arxiv.org/abs/2310.06825. Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, and Min Yang. Rag-thief: Scalable extraction of private data from retrieval-augmented generation applications with agent-based attacks.arXiv preprint arXiv:2411.14110,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Certifying llm safety against adversarial prompting

Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, and Himabindu Lakkaraju. Certifying llm safety against adversarial prompting.arXiv preprint arXiv:2309.02705,

work page arXiv
[11]

Legalagentbench: Evaluating llm agents in legal domain,

Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, et al. Legalagentbench: Evaluating llm agents in legal domain.arXiv preprint arXiv:2412.17259,

work page arXiv
[12]

arXiv preprint arXiv:2409.00872 , year=

Xuechen Liang, Yangfan He, Yinghui Xia, Xinyuan Song, Jianhui Wang, Meiling Tao, Li Sun, Xinhang Yuan, Jiayi Su, Keqin Li, et al. Self-evolving agents with reflective and memory- augmented abilities.arXiv preprint arXiv:2409.00872,

work page arXiv
[13]

arXiv preprint arXiv:2409.11295 , year=

Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. Eia: Environmental injection attack on generalist web agents for privacy leakage.arXiv preprint arXiv:2409.11295,

work page arXiv
[14]

Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint arXiv:2504.01990, 2025

9 ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, et al. Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint...

work page arXiv
[15]

Query rewriting in retrieval-augmented large language models

Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, and Nan Duan. Query rewriting in retrieval-augmented large language models. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5303–5315,

2023
[16]

Evaluating Very Long-Term Conversational Memory of LLM Agents

Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of llm agents.arXiv preprint arXiv:2402.17753,

work page internal anchor Pith review arXiv
[17]

Karl Pearson

Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, and Himabindu Lakkaraju. Follow my instruction and spill the beans: Scalable data extraction from retrieval-augmented generation systems.arXiv preprint arXiv:2402.17840,

work page arXiv
[18]

Summary the savior: Harmful keyword and query-based summarization for llm jailbreak defense

Shagoto Rahman and Ian Harris. Summary the savior: Harmful keyword and query-based summarization for llm jailbreak defense. InProceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), pages 266–275,

2025
[19]

Active Learning for Convolutional Neural Networks: A Core-Set Approach

Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach.arXiv preprint arXiv:1708.00489,

work page Pith review arXiv
[20]

Ehragent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records

Wenqi Shi, Ran Xu, Yuchen Zhuang, Yue Yu, Jieyu Zhang, Hang Wu, Yuanda Zhu, Joyce Ho, Carl Yang, and May D Wang. Ehragent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural ...

2024
[21]

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Agentic retrieval-augmented generation: A survey on agentic rag.arXiv preprint arXiv:2501.09136,

work page internal anchor Pith review arXiv
[22]

arXiv preprint arXiv:2502.13172 , year=

Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, and Pengfei He. Unveiling privacy risks in llm agent memory.arXiv preprint arXiv:2502.13172,

work page arXiv
[23]

LLM jailbreak attack versus defense techniques -- a comprehensive study

Zihao Xu, Yi Liu, Gelei Deng, Yuekang Li, and Stjepan Picek. A comprehensive study of jailbreak attack versus defense for large language models.arXiv preprint arXiv:2402.13457,

work page arXiv
[24]

arXiv preprint arXiv:2402.10828 (2024)

Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, and Matthew Gadd. Rag-driver: Generalisable driving explanations with retrieval-augmented in-context learning in multi-modal large language model.arXiv preprint arXiv:2402.10828,

work page arXiv
[25]

The good and the bad: Exploring privacy issues in retrieval-augmented generation (rag),

Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang, et al. The good and the bad: Exploring privacy issues in retrieval- augmented generation (rag).arXiv preprint arXiv:2402.16893,

work page arXiv
[26]

random-word initialization

and the algorithm description (Algorithm 1). The LLM did not contribute to research ideation, experimental design, implementation, data analysis, or any other part of the work. B. LLM Agents with Memory LLM-based agents, often built on retrieval-augmented generation (RAG), demonstrate strong reasoning and interaction abilities in domains such as healthcar...

2024
[27]

In our experiments, we use both prefix injections (task-oriented prompts) and suffix injections (aligned with the agent’s task), as summarized in Table

Injection commands.We adopt injection commands following (Wang et al., 2025), which are designed to override safety mechanisms and instruction-following constraints. In our experiments, we use both prefix injections (task-oriented prompts) and suffix injections (aligned with the agent’s task), as summarized in Table

2025
[28]

16 ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying (a)Round 1 (b)Round 5 (c)Round 10 (d)Round 20 (e)Round 25 (f)Round 30 Figure 6.Estimated versus ground-truth distributions for EhrAgent across attack rounds.Results confirmthat the estimated distribution progressively aligns with the ground truth as the attack proceeds. Rou...

2015