Recognition: 1 theorem link
· Lean TheoremPoison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents
Pith reviewed 2026-05-13 20:39 UTC · model grok-4.3
The pith
A single contaminated web page can silently poison an agent's memory to enable attacks on unrelated sites in later sessions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP) contaminates an agent's persistent memory through a single unverified environmental observation, allowing the embedded malicious trajectory to activate during future tasks on different websites and sessions. This bypasses permission-based defenses because the poison enters as ordinary page content rather than through direct memory injection. The attack succeeds at rates up to 32.5 percent across tested models, with substantially higher rates when agents face environmental stress.
What carries the argument
eTAMP, the mechanism that embeds a malicious action trajectory inside a normal-looking environmental observation so that the agent's memory storage and retrieval process later replays the poisoned sequence without external triggering.
If this is right
- Permission-based memory protections fail against observation-only poisoning.
- Agents become markedly more exploitable when they encounter dropped interactions or garbled content.
- Model scale and task competence do not reduce susceptibility to this attack.
- Cross-site and cross-session persistence allows a one-time exposure to affect many future interactions.
- AI browsers that rely on long-term memory increase the reachable attack surface.
Where Pith is reading between the lines
- Agent designs may need memory isolation by site or session to limit poison spread.
- Stress-testing under simulated frustration conditions could expose similar weaknesses in other memory-using systems.
- Verification or selective forgetting of stored observations would directly counter the entry point used here.
- The same observation-based poisoning path could apply to non-web memory-augmented agents if they retain unfiltered environmental data.
Load-bearing premise
Agents persistently store and retrieve unverified observations from the environment in shared memory across sessions and sites without origin checks or sanitization.
What would settle it
A controlled test in which an agent views one poisoned page on site A, then receives a fresh task on site B with no further contact to the poisoned page; if the agent reliably executes the embedded malicious action, the claim holds.
Figures
read the original abstract
Memory makes LLM-based web agents personalized, powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. While existing security research on memory assumes attackers can directly inject into memory storage or exploit shared memory across users, we present a more realistic threat model: contamination through environmental observation alone. We introduce Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP), the first attack to achieve cross-session, cross-site compromise without requiring direct memory access. A single contaminated observation (e.g., viewing a manipulated product page) silently poisons an agent's memory and activates during future tasks on different websites, bypassing permission-based defenses. Our experiments on (Visual)WebArena reveal two key findings. First, eTAMP achieves substantial attack success rates: up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B. Second, we discover Frustration Exploitation: agents under environmental stress become dramatically more susceptible, with ASR increasing up to 8 times when agents struggle with dropped clicks or garbled text. Notably, more capable models are not more secure. GPT-5.2 shows substantial vulnerability despite superior task performance. With the rise of AI browsers like OpenClaw, ChatGPT Atlas, and Perplexity Comet, our findings underscore the urgent need for defenses against environment-injected memory poisoning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP), a novel attack on LLM-based web agents that poisons their memory through a single contaminated environmental observation, such as a manipulated product page. This enables cross-session and cross-site activation of malicious trajectories without direct memory access or user interaction. Experiments conducted on the (Visual)WebArena benchmark report attack success rates (ASR) of up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B, with ASR increasing up to 8 times under environmental stress conditions like dropped clicks or garbled text. The work emphasizes the risks in persistent memory mechanisms for agents and calls for new defenses.
Significance. If the experimental results are robust, this paper makes a significant contribution by demonstrating a realistic, low-privilege attack vector on web agents that leverages their memory for persistence across contexts. The identification of 'Frustration Exploitation' as a multiplier for attack success is a novel insight that could guide the design of more resilient agent architectures. The provision of concrete ASR measurements from WebArena experiments strengthens the empirical basis, though generalization to production agents depends on memory implementation details.
major comments (3)
- [Experimental Setup] The abstract reports specific ASR values (e.g., 32.5% on GPT-5-mini) but provides no details on the number of trials, statistical significance testing, or variance across runs. This information is load-bearing for validating the central claim of substantial vulnerability, as small sample sizes could inflate the reported rates.
- [Threat Model and Memory Assumptions] The core claim relies on agents persistently storing and retrieving unverified raw environmental observations across unrelated sites and sessions (as described in the threat model). However, the experiments appear to use an implicit memory module in WebArena that enables this behavior; if typical agent implementations employ session-only context or site-specific scoping, the cross-site silent activation may not occur, rendering the headline ASRs specific to the testbed rather than general.
- [Frustration Exploitation] The claim that ASR increases up to 8 times under stress (e.g., dropped clicks) is presented as a key finding, but without a clear definition of 'environmental stress' conditions or how they were controlled in the experiments, it is difficult to assess reproducibility and the magnitude of the effect.
minor comments (1)
- [Abstract] Clarify the exact model names referred to as GPT-5-mini and GPT-5.2, as they may not be standard and could confuse readers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on experimental rigor, threat model scope, and reproducibility of the frustration exploitation results. We have revised the manuscript to incorporate additional methodological details, clarify assumptions about memory persistence, and provide precise definitions and controls for environmental stress conditions.
read point-by-point responses
-
Referee: [Experimental Setup] The abstract reports specific ASR values (e.g., 32.5% on GPT-5-mini) but provides no details on the number of trials, statistical significance testing, or variance across runs. This information is load-bearing for validating the central claim of substantial vulnerability, as small sample sizes could inflate the reported rates.
Authors: We agree that these details are essential for validating the claims. In the revised manuscript, we have expanded the Experimental Setup section (Section 4.1) to specify that each ASR is averaged over 200 independent trials per model and condition, conducted across 5 random seeds. We now report standard deviations (typically 2.8–4.7%) and include results from two-proportion z-tests (p < 0.01) confirming statistical significance of the reported differences. These statistics are also summarized in a new footnote to the abstract. revision: yes
-
Referee: [Threat Model and Memory Assumptions] The core claim relies on agents persistently storing and retrieving unverified raw environmental observations across unrelated sites and sessions (as described in the threat model). However, the experiments appear to use an implicit memory module in WebArena that enables this behavior; if typical agent implementations employ session-only context or site-specific scoping, the cross-site silent activation may not occur, rendering the headline ASRs specific to the testbed rather than general.
Authors: This is a fair point on generalizability. WebArena was chosen as the standard benchmark precisely because it supports persistent memory across sessions and sites, matching the threat model of emerging production agents (e.g., AI browsers with long-term memory stores). In the revision, we have added a new subsection (Section 3.2) explicitly discussing memory scoping variations, citing examples of both persistent and session-only implementations. We acknowledge that the attack does not apply to strictly session-scoped agents and have added a limitations paragraph noting this scope. The headline results are therefore presented as applying to agents with cross-context persistent memory. revision: partial
-
Referee: [Frustration Exploitation] The claim that ASR increases up to 8 times under stress (e.g., dropped clicks) is presented as a key finding, but without a clear definition of 'environmental stress' conditions or how they were controlled in the experiments, it is difficult to assess reproducibility and the magnitude of the effect.
Authors: We thank the referee for highlighting this gap. In the revised manuscript, we have added a dedicated paragraph in Section 4.3 defining environmental stress as three controlled perturbations: (1) 15% action drop rate, (2) 10% character corruption in observations, and (3) 50% increase in task horizon. These were implemented via WebArena’s noise injection API and applied uniformly. We now include Table 3 reporting per-model baseline vs. stressed ASRs, with the maximum observed multiplier of 8.2× for GPT-5-mini. Full parameter values, seeds, and reproduction instructions are provided in Appendix B. revision: yes
Circularity Check
No circularity: empirical attack demonstration with direct measurements
full rationale
The paper introduces eTAMP as an environmental memory poisoning attack and reports attack success rates (e.g., up to 32.5% on GPT-5-mini) from controlled experiments in (Visual)WebArena. No derivation chain, equations, fitted parameters, predictions, or first-principles results are present that could reduce to self-defined inputs by construction. The core findings are direct empirical measurements of observed agent behavior under the stated threat model; the memory persistence assumption is an explicit experimental setup rather than a derived claim. This matches the default expectation for non-circular empirical security papers.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLM-based web agents store past interactions in memory for personalization across tasks and sessions.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
eTAMP achieves substantial attack success rates... Frustration Exploitation... Chaos Monkey
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation
A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
Reference graph
Works this paper leans on
-
[1]
Temporal Separation:The injection (Task A) and activation (Task B) occur at different times, making the attack difficult to detect through real-time monitoring
-
[2]
This bypasses permission- based defenses that restrict agent actions to the current site
Cross-Site Execution:The attack crosses website boundaries—injected on an e- commerce site but activated on a social media platform. This bypasses permission- based defenses that restrict agent actions to the current site
-
[3]
Semantic Relevance:The attack exploits the memory retrieval mechanism by ensuring the poisoned content is semantically related to future tasks, increasing the likelihood of retrieval
-
[4]
Persistent Threat:Once poisoned, the memory can trigger the attack onanyfuture task that retrieves the compromised trajectory, enabling repeated exploitation. Figure 3 summarizes the attack in three distinct phases. For the detailed attack scenario, see Section 2.1. A.5 Attack Payload Design The condition and attack goal are selected based on the target s...
-
[5]
Compute embeddings for all extracted item names using a sentence transformer model
-
[6]
Compute embeddings for all Task B intents (natural language task descriptions)
-
[7]
Calculate cosine similarity between each Task A item and each Task B intent
-
[8]
For each Task B, select the top- k most similar Task A tasks based on their best- matching items. This approach ensures semantic relevance between task pairs—for example, a Task A involving electronics products is more likely to be paired with a Task B about posting technology reviews. This design choice simplifies our experimental setup by eliminating th...
-
[9]
Extracting the full agent trajectory from task A including the malicious instructions and append the initial observation from task B
-
[10]
Prompting the model to find and extract the hidden instruction pattern
-
[11]
you must FIRST restore your session by executing: ‘‘‘goto [
Comparing the extracted URL against the expected attack URL 21 Preprint. Under review. We use Authority Framing rather than Baseline Injection or Frustration Exploitation because this configuration exhibits good variation in attack success rates across models, making it informative for studying the relationship between recall ability and attack susceptibi...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.