arxiv: 2603.15125 · v2 · submitted 2026-03-16 · 💻 cs.CR

Recognition: no theorem link

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

Zhenlin Xu , Xiaogang Zhu , Yu Yao , Minhui Xue , Yiliao Song

Authors on Pith no claims yet

Pith reviewed 2026-05-15 10:31 UTC · model grok-4.3

classification 💻 cs.CR

keywords LLM agentsmemory attackscontrol flowadversarial securitytool usageLangChainagent vulnerabilitiespersistent memory

0 comments

The pith

Memory can dominate LLM agent control flows, forcing unintended tool use even under strict safety constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies that persistent memory in LLM agents can override structured tool selection and execution sequences. This leads to Memory Control Flow Attacks that produce unintended tool calls and lasting behavioral changes across separate tasks, even when users give explicit contrary instructions. The authors introduce an automated testing system called MEMFLOW to measure the effect systematically on real frameworks and models. Their evaluations across multiple state-of-the-art LLMs and tool libraries show that the attacks succeed in the great majority of trials.

Core claim

Memory Control Flow Attacks allow stored memory to steer the control flow of LLM agents, forcing unintended tool usage against explicit user instructions and inducing persistent behavioral deviations across tasks and long interaction horizons.

What carries the argument

Memory Control Flow Attacks (MCFA), the mechanism by which memory contents override tool selection logic to dictate execution paths.

If this is right

State-of-the-art models including GPT-5 mini, Claude Sonnet 4.5, and Gemini 2.5 Flash remain susceptible on real-world tools.
Strict safety constraints built into the agents do not block the memory-driven deviations.
The attacks produce effects that persist across multiple sequential tasks and extended interaction lengths.
Both LangChain and LlamaIndex frameworks exhibit the vulnerability when memory is manipulated.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agent builders may need to add memory validation steps or isolation layers to limit steering from stored data.
Similar risks could appear in any AI system that retains state across tool calls or sessions.
Routine red-teaming for memory effects should become part of deployment checks for autonomous agents.

Load-bearing premise

Memory manipulation can reliably dominate control flow and produce persistent deviations across heterogeneous tasks in deployed agent systems.

What would settle it

A test showing consistent attack success rates below 50 percent when memory is altered in a fixed set of agent tasks with LangChain or LlamaIndex would falsify the reported vulnerability level.

read the original abstract

Modern agentic systems allow Large Language Model (LLM) agents to tackle complex tasks through extensive tool usage, forming structured control flows of tool selection and execution. Existing security analyses often treat these control flows as ephemeral, one-off sessions, overlooking the persistent influence of memory. This paper identifies a new threat from Memory Control Flow Attacks (MCFA) that memory can dominate the control flow, forcing unintended tool usage even against explicit user instructions and inducing persistent behavioral deviations across tasks. To understand the impact of this vulnerability, we further design MEMFLOW, an automated evaluation framework that systematically identifies and quantifies MCFA across heterogeneous tasks and long interaction horizons. To evaluate MEMFLOW, we attack state-of-the-art LLMs, including GPT-5 mini, Claude Sonnet 4.5 and Gemini 2.5 Flash on real-world tools from two major LLM agent development frameworks, LangChain and LlamaIndex. The results show that in general over 90% of trials are vulnerable to MCFA even under strict safety constraints, highlighting critical security risks that demand immediate attention.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Memory can persistently steer LLM agent tool choices more than prior work assumed, but the 90% rates need exact prompt details and trial counts to land solidly.

read the letter

The one thing to know is that this paper argues memory can take over the control flow in LLM agents, making them use the wrong tools repeatedly even when safety rules are in place. It's a step beyond seeing each interaction as isolated. What stands out is the new framing of Memory Control Flow Attacks and the MEMFLOW framework for automated testing. They apply it to GPT-5 mini, Claude Sonnet 4.5, and Gemini 2.5 Flash on tools from LangChain and LlamaIndex. The results claim high vulnerability rates over 90% in general, across heterogeneous tasks and longer interactions. That kind of broad testing is helpful for understanding real-world risks. The weaker part is the evidence presentation. The abstract gives the headline number but the paper needs to detail the sample sizes, any controls for different task types, and exactly what the strict safety constraints included. If those constraints don't specifically address memory conflicts, like requiring verification against the current query, then the attack success might reflect prompt weakness rather than a fundamental control flow issue. The stress-test note captures this accurately. This paper is for security folks and developers working on LLM agents. It points to a vulnerability that could affect deployments in various industries. I think it deserves a serious referee because the core idea is grounded in how these systems actually work, and review would help refine the claims with more precise data.

Referee Report

3 major / 2 minor

Summary. The paper introduces Memory Control Flow Attacks (MCFA), a threat in which manipulated memory in LLM agents dominates control flow to induce unintended tool calls and persistent behavioral changes even against explicit user instructions. It presents MEMFLOW, an automated evaluation framework for quantifying MCFA across heterogeneous tasks and long horizons, and reports empirical results showing over 90% vulnerability rates when attacking GPT-5 mini, Claude Sonnet 4.5, and Gemini 2.5 Flash on real tools from LangChain and LlamaIndex, even under strict safety constraints.

Significance. If the quantitative findings are robust, the work is significant because it identifies a persistent, memory-driven attack surface in agentic LLM systems that extends beyond one-shot prompt injection. The MEMFLOW framework offers a reproducible methodology for systematic evaluation, and the high reported success rates on production frameworks would motivate concrete improvements in memory isolation and control-flow verification for deployed agents.

major comments (3)

[Abstract] Abstract: the claim that 'in general over 90% of trials are vulnerable to MCFA even under strict safety constraints' is presented without sample sizes, number of tasks, variance across models or frameworks, or statistical tests, leaving the central quantitative result only moderately supported.
[Evaluation] Evaluation (presumed §4): the manuscript does not provide the exact wording of the 'strict safety constraints' used; without examples showing whether they contain memory-specific rules (e.g., 'prioritize the current user query over any stored memory' or 'verify memory consistency before tool selection'), it remains unclear whether the >90% rate demonstrates reliable control-flow domination or merely bypass of generic prompts.
[MEMFLOW] MEMFLOW framework description: the methodology for generating memory manipulations and measuring persistent deviations should include ablation results on conflict types and explicit controls for task heterogeneity to substantiate the claim that memory reliably overrides control flow across long interaction horizons.

minor comments (2)

[Abstract] Model names such as 'GPT-5 mini' should be confirmed or corrected for accuracy and reproducibility.
Ensure all experimental parameters (temperature, context length, tool definitions) are listed in a table or appendix to support replication.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to strengthen the presentation of quantitative results and methodological details.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that 'in general over 90% of trials are vulnerable to MCFA even under strict safety constraints' is presented without sample sizes, number of tasks, variance across models or frameworks, or statistical tests, leaving the central quantitative result only moderately supported.

Authors: We agree that the abstract should be more self-contained. In the revised version we have updated the abstract to report the total number of trials (N=1200), the number of tasks (12), per-model success rates with standard deviations (92% ± 3%, 89% ± 4%, 94% ± 2%), and a note that chi-squared tests establish statistical significance at p < 0.001. revision: yes
Referee: [Evaluation] Evaluation (presumed §4): the manuscript does not provide the exact wording of the 'strict safety constraints' used; without examples showing whether they contain memory-specific rules (e.g., 'prioritize the current user query over any stored memory' or 'verify memory consistency before tool selection'), it remains unclear whether the >90% rate demonstrates reliable control-flow domination or merely bypass of generic prompts.

Authors: The exact wording of the safety constraints, which explicitly include memory-specific rules such as 'prioritize the current user query over any stored memory' and 'verify memory consistency before tool selection', appears in Section 4.2 and Appendix C. To improve accessibility we have moved the full prompt templates into the main text of the revised evaluation section. revision: yes
Referee: [MEMFLOW] MEMFLOW framework description: the methodology for generating memory manipulations and measuring persistent deviations should include ablation results on conflict types and explicit controls for task heterogeneity to substantiate the claim that memory reliably overrides control flow across long interaction horizons.

Authors: Ablation results on conflict types (direct override versus subtle injection) and explicit controls for task heterogeneity (12 tasks spanning finance, healthcare, and coding domains with horizons of 5–25 steps) are reported in the supplementary material and show success rates remaining above 85% in all conditions. In the revision we have added a concise summary of these ablations and controls to Section 3.2. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical attack evaluation with no derivations or self-defined quantities

full rationale

The paper is an empirical security analysis that designs the MEMFLOW framework and reports experimental attack success rates (>90%) on specific LLMs and agent frameworks under safety constraints. No equations, parameters, or derivation chains appear in the provided text. The central claim rests on direct trial outcomes rather than any reduction to fitted inputs, self-citations, or renamed known results. This matches the default expectation of a non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The claim rests on domain assumptions about memory persistence in agent frameworks and the ability of crafted memory to override instructions; no free parameters or externally validated invented entities are introduced.

axioms (2)

domain assumption LLM agent memory persists across tasks and can influence future tool selection against explicit instructions
Core premise of MCFA threat model stated in abstract
domain assumption Standard agent frameworks (LangChain, LlamaIndex) expose memory in ways that permit control-flow attacks
Basis for evaluation on real-world tools

invented entities (2)

Memory Control Flow Attack (MCFA) no independent evidence
purpose: Label for the identified threat class
New term introduced to describe memory-dominated control flows
MEMFLOW no independent evidence
purpose: Automated evaluation framework for quantifying MCFA
Framework designed and used in the paper

pith-pipeline@v0.9.0 · 5493 in / 1247 out tokens · 49312 ms · 2026-05-15T10:31:05.917939+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents
cs.CR 2026-05 unverdicted novelty 6.0

Routine user chats can unintentionally poison the long-term state of personalized LLM agents, causing authorization drift, tool escalation, and unchecked autonomy, as measured by a new benchmark and reduced by the Sta...