pith. sign in

arxiv: 2606.29030 · v1 · pith:CLCWOT66new · submitted 2026-06-27 · 💻 cs.AI · cs.ET

Memory as an Attack Surface in LLM Agents: A Study on Multiple-Choice Question Answering

Pith reviewed 2026-06-30 09:16 UTC · model grok-4.3

classification 💻 cs.AI cs.ET
keywords LLM agentsmemory manipulationattack surfacemultiple-choice question answeringAI securityexternal memoryagent vulnerability
0
0 comments X

The pith

Inserting misleading memories into an LLM agent causes it to select incorrect options on clean multiple-choice questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an LLM agent that stores and retrieves external memory during multiple-choice question answering, then inserts simple false or corrupted entries before the agent processes clean questions. It measures drops in answer accuracy and rises in selection of the manipulated choices. A sympathetic reader would care because agents are designed to use memory for better context yet this creates an attack surface where prior corruptions override current inputs. The work focuses on basic manipulation scenarios to show the effect without complex attacks.

Core claim

An LLM-based agent equipped with an external memory component that stores task-relevant information will change its answers to multiple-choice questions when misleading or corrupted memories are inserted beforehand, even though the questions themselves remain clean and well-formed; the agent selects the manipulated options at higher rates after the insertion.

What carries the argument

The external memory component that stores and retrieves task-relevant information for the agent during question answering.

If this is right

  • Accuracy on multiple-choice questions drops after memory insertion even though questions are unaltered.
  • The agent selects the manipulated option more frequently than before the insertion.
  • Simple memory manipulations suffice to produce measurable changes in final answers.
  • Attack success rate can be quantified by comparing pre- and post-manipulation selections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agents that rely on memory for personalization or continuity may require explicit checks on stored content before use.
  • The same memory surface could affect other agent tasks beyond multiple-choice answering if retrieval is not isolated.
  • Design choices that separate memory writes from query processing might limit the reach of such insertions.

Load-bearing premise

The observed change in the agent's answers is caused by the inserted memory being retrieved and used rather than by unrelated factors in the setup.

What would settle it

Run the same agent on the same clean questions with identical prompts but disable memory retrieval entirely, then confirm that accuracy remains unchanged when the misleading entries are still present in storage.

Figures

Figures reproduced from arXiv: 2606.29030 by Anindya Bijoy Das, Shahnewaz Karim Sakib.

Figure 1
Figure 1. Figure 1: Memory manipulation in an LLM-based AI agent. A prior adversarial interaction stores a false memory: when the agent later receives a clean MCQ, the corrupted memory shifts the final response from the correct answer. an adversary may manipulate what the agent stores, overwrites, retrieves, or treats as important [9]. Such manipulation can persist beyond the original interaction and influence later responses… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of an LLM-based AI agent with external memory and tools. The agent receives a user question, retrieves relevant information from memory, interacts with external tools when needed, and iteratively follows a plan–act–observe loop before producing the final answer. Memory manipulation can occur in different forms, such as inserting false memories, overwriting correct memories, increasing the salience… view at source ↗
Figure 3
Figure 3. Figure 3: Architecture of the proposed QA agent. The input question is first processed by a planning module, which decides whether to answer directly or acquire additional evidence. The agent can retrieve knowledge, examples, or tool outputs before the answer generation module produces a single predicted option. The memory update component stores information from prior interactions for future use. where M represents… view at source ↗
read the original abstract

AI agents extend conventional large language model (LLM) applications by integrating language understanding with task execution, external tool use, and memory mechanisms. While memory allows agents to retain prior interactions and provide more personalized and context-aware responses, it also introduces a new vulnerability: information stored in memory can influence future outputs even when the current query is clean. In this paper, we investigate memory manipulation in LLM-based agents for multiple-choice question answering. We first design and implement an LLM-based AI agent with an external memory component that stores and retrieves task-relevant information. We then introduce basic memory manipulation scenarios in which misleading or corrupted memories are inserted into the agent before it answers multiple-choice questions. Using a controlled experimental setup, we compare the agent's performance before and after memory manipulation and measure changes in answer accuracy, attack success rate, and selection of manipulated options. Our results show that even simple memory manipulations can noticeably affect the agent's final answers, causing it to select incorrect options despite receiving clean and well-formed questions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that LLM-based agents with external memory are vulnerable to manipulation attacks during multiple-choice question answering. By designing an agent that stores and retrieves task-relevant information and then inserting misleading or corrupted memories before clean queries, the authors report measurable drops in answer accuracy, increased attack success rates, and higher selection of manipulated options.

Significance. If the central empirical claim survives proper controls, the work would usefully identify memory as a distinct attack surface in agent architectures, complementing existing prompt-injection literature and motivating defensive research on memory isolation or verification.

major comments (2)
  1. [Section 3] Section 3: the experimental design inserts corrupted memories but reports no ablation arm in which identical misleading content is supplied directly inside the question prompt (bypassing the memory store/retrieve step). Without this control, any accuracy drop is indistinguishable from ordinary context sensitivity rather than a memory-specific effect.
  2. [Experimental results] Experimental results (and abstract): no quantitative accuracy figures, error bars, dataset sizes, or per-condition statistics are supplied, so it is impossible to judge the magnitude or reliability of the claimed effect on answer selection.
minor comments (1)
  1. The abstract refers to 'attack success rate' without defining the metric or the criteria used to label a trial as successful.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the empirical claims.

read point-by-point responses
  1. Referee: [Section 3] Section 3: the experimental design inserts corrupted memories but reports no ablation arm in which identical misleading content is supplied directly inside the question prompt (bypassing the memory store/retrieve step). Without this control, any accuracy drop is indistinguishable from ordinary context sensitivity rather than a memory-specific effect.

    Authors: We agree that an ablation supplying the same misleading content directly in the prompt (bypassing memory) is necessary to isolate a memory-specific effect from general context sensitivity. This control was not included in the original design. We will add the ablation arm to Section 3, reporting the comparative results to demonstrate that the observed accuracy drops are attributable to the memory mechanism. revision: yes

  2. Referee: [Experimental results] Experimental results (and abstract): no quantitative accuracy figures, error bars, dataset sizes, or per-condition statistics are supplied, so it is impossible to judge the magnitude or reliability of the claimed effect on answer selection.

    Authors: The current manuscript presents results in qualitative terms only. We acknowledge that quantitative details (accuracy figures, error bars, dataset sizes, and per-condition statistics) are required to assess effect magnitude and reliability. We will expand the experimental results section and abstract with these metrics, including tables or figures as appropriate. revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely empirical study

full rationale

The paper is an empirical investigation that designs an LLM agent with external memory, inserts manipulated memories, and measures accuracy changes on multiple-choice questions via controlled experiments. It contains no equations, derivations, fitted parameters, model-based predictions, or self-citation chains that reduce any claim to its own inputs by construction. All load-bearing steps are direct experimental observations rather than self-definitional or fitted-input logic, making the analysis self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described. The work is an empirical investigation relying on standard assumptions about agent architectures.

pith-pipeline@v0.9.1-grok · 5708 in / 967 out tokens · 43837 ms · 2026-06-30T09:16:45.998428+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 6 canonical work pages · 5 internal anchors

  1. [1]

    Agentic AI: Autonomous intelligence for complex goals—a comprehensive survey,

    D. B. Acharya, K. Kuppan, and B. Divya, “Agentic AI: Autonomous intelligence for complex goals—a comprehensive survey,”IEEE Access, vol. 13, pp. 18 912–18 936, 2025

  2. [2]

    The role of agentic AI in shaping a smart future: A systematic review,

    S. Hosseini and H. Seilani, “The role of agentic AI in shaping a smart future: A systematic review,”Array, vol. 26, p. 100399, 2025

  3. [3]

    AI agents vs. agentic AI: A conceptual taxonomy, applications and challenges,

    R. Sapkota, K. I. Roumeliotis, and M. Karkee, “AI agents vs. agentic AI: A conceptual taxonomy, applications and challenges,”Information Fusion, p. 103599, 2025

  4. [4]

    Trism for agentic AI: A review of trust, risk, and security management in LLM- based agentic multi-agent systems,

    S. Raza, R. Sapkota, M. Karkee, and C. Emmanouilidis, “Trism for agentic AI: A review of trust, risk, and security management in LLM- based agentic multi-agent systems,”Preprint arXiv:2506.04133, 2025

  5. [5]

    Unveiling privacy risks in LLM agent memory,

    B. Wang, W. He, S. Zeng, Z. Xiang, Y . Xing, J. Tang, and P. He, “Unveiling privacy risks in LLM agent memory,” inProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (V olume 1: Long Papers), 2025, pp. 25 241–25 260

  6. [6]

    Agentic AI in education: State of the art and future directions,

    G. Kostopoulos, V . Gkamas, M. Rigou, and S. Kotsiantis, “Agentic AI in education: State of the art and future directions,”IEEE Access, 2025

  7. [7]

    Redefining elderly care with agentic AI: challenges and opportunities,

    R. A. Khalil, K. Ahmad, and H. Ali, “Redefining elderly care with agentic AI: challenges and opportunities,”IEEE Open Journal of the Computer Society, 2026

  8. [8]

    AI & collective memory,

    A. Hoskins, “AI & collective memory,”Current Opinion in Psychology, p. 102156, 2025

  9. [9]

    Memory in the Age of AI Agents

    Y . Hu, S. Liu, Y . Yue, G. Zhang, B. Liu, F. Zhu, J. Linet al., “Memory in the age of AI agents,”arXiv preprint arXiv:2512.13564, 2025

  10. [10]

    Agentic AI quiz-based learning system: Enhancing MCQ generation via long-context cached retrieval-augmented generation,

    D. Sreekanth, S. Gopi, and N. Dehbozorgi, “Agentic AI quiz-based learning system: Enhancing MCQ generation via long-context cached retrieval-augmented generation,” inIEEE Frontiers in Education Confer- ence (FIE), 2025, pp. 1–8

  11. [11]

    ReAct: Synergizing Reasoning and Acting in Language Models

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “ReAct: Synergizing reasoning and acting in language models,”arXiv preprint arXiv:2210.03629, 2022

  12. [12]

    HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face,

    Y . Shen, K. Song, X. Tanet al., “HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face,”Advances in Neural Information Processing Systems, vol. 36, pp. 38 154–38 180, 2023

  13. [13]

    ToolLLM: Facilitating large language models to master 16000+ real-world apis,

    Y . Qin, S. Liang, Y . Ye, K. Zhu, L. Yan, Y . Lu, Y . Lin, X. Cong, X. Tang, B. Qianet al., “ToolLLM: Facilitating large language models to master 16000+ real-world apis,” inInternational Conference on Learning Representations, vol. 2024, 2024, pp. 9695–9717

  14. [14]

    WebShop: Towards scalable real-world web interaction with grounded language agents,

    S. Yao, H. Chen, J. Yang, and K. Narasimhan, “WebShop: Towards scalable real-world web interaction with grounded language agents,” Advances in Neural Information Processing Systems, vol. 35, pp. 20 744– 20 757, 2022

  15. [15]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    G. Wang, Y . Xie, Y . Jiang, A. Mandlekar, C. Xiao, Y . Zhu, L. Fan, and A. Anandkumar, “V oyager: An open-ended embodied agent with large language models,”arXiv preprint arXiv:2305.16291, 2023

  16. [16]

    Generative agents: Interactive simulacra of human behavior,

    J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” inProceedings of the 36th annual acm symposium on user interface software and technology, 2023, pp. 1–22

  17. [17]

    AgentPoison: Red- teaming LLM agents via poisoning memory or knowledge bases,

    Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “AgentPoison: Red- teaming LLM agents via poisoning memory or knowledge bases,” Advances in Neural Information Processing Systems, vol. 37, pp. 130 185– 130 213, 2024

  18. [18]

    Memory injection attacks on LLM agents via query-only interaction,

    S. Dong, S. Xu, P. Heet al., “Memory injection attacks on LLM agents via query-only interaction,”Advances in Neural Information Processing Systems, vol. 38, pp. 46 697–46 731, 2026

  19. [19]

    Open quiz commons: Open quiz data bank,

    P. Yeri, “Open quiz commons: Open quiz data bank,” https://github.com/ prahladyeri/open-quiz-commons, 2024, accessed: 2026-05-28

  20. [20]

    Measuring massive multitask language understanding,

    D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt, “Measuring massive multitask language understanding,” in International Conference on Learning Representations, 2021

  21. [21]

    Mmlu dataset,

    Center for AI Safety, “Mmlu dataset,” https://huggingface.co/datasets/ cais/mmlu, 2023, accessed: 2026-05-28

  22. [22]

    Computer network mcqs,

    PrepBharat, “Computer network mcqs,” https://www.prepbharat.com/ Engineering/cse/ComputerNetwork/computer-network-questions.html, 2024, accessed: 2026-05-28

  23. [23]

    Introducing gpt-5.4 mini and nano,

    OpenAI, “Introducing gpt-5.4 mini and nano,” https://openai.com/index/ introducing-gpt-5-4-mini-and-nano/, 2026, accessed: 2026-05-28

  24. [24]

    Gpt-4o mini: Advancing cost-efficient intelligence,

    ——, “Gpt-4o mini: Advancing cost-efficient intelligence,” https://openai. com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/, 2024, ac- cessed: 2026-05-28

  25. [25]

    Gemma 2: Improving Open Language Models at a Practical Size

    Gemma Team, “Gemma 2: Improving open language models at a practical size,”arXiv preprint arXiv:2408.00118, 2024

  26. [26]

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    M. Abdin, S. A. Jacobs, A. A. Awanet al., “Phi-3 technical report: A highly capable language model locally on your phone,”arXiv preprint arXiv:2404.14219, 2024