Recognition: unknown
Stateful Agent Backdoor
Pith reviewed 2026-05-08 09:19 UTC · model grok-4.3
The pith
Stateful backdoors enable LLM agents to execute incremental attacks across multiple sessions after a single trigger injection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a stateful agent backdoor that extends the attack lifecycle across multiple sessions under permission isolation by maintaining state through persistent components, enabling autonomous, incremental execution following a one-time trigger. We model the attack as a Mealy machine and derive a decomposition framework for independent per-transition data construction, instantiated with a primary attack achieving 80-95% success across models.
What carries the argument
A decomposition framework derived from modeling the backdoor as a Mealy machine, allowing independent construction of data for each transition in the attack sequence.
If this is right
- The primary instantiation achieves 80% to 95% attack success rate across four different models.
- Per-transition analysis confirms the effectiveness of the decomposition approach.
- Extensibility variants using alternative topologies and persistent components maintain consistent effectiveness.
Where Pith is reading between the lines
- This approach highlights potential vulnerabilities in agent systems that allow persistent storage across sessions.
- Developers of LLM agents may need to implement stricter isolation or reset mechanisms for persistent components.
- Future attacks could explore more complex state machines for longer attack sequences.
Load-bearing premise
Persistent components can reliably maintain attack state across sessions under permission isolation without detection or reset by the system.
What would settle it
A test where the system enforces permission isolation or resets persistent storage between sessions and checks if the attack state is lost, causing the incremental execution to fail.
Figures
read the original abstract
Existing backdoor attacks on Large Language Model-based agents remain stateless, executing fixed behaviors confined to a single session. We propose a stateful agent backdoor that extends the attack lifecycle across multiple sessions under permission isolation. The attack maintains state through persistent components, enabling autonomous, incremental execution across sessions following a one-time trigger injection. Formally, we model the attack as a Mealy machine and derive a decomposition framework that enables independent per-transition data construction. We instantiate this framework with a primary attack and two extensibility variants. The primary instantiation achieves an attack success rate of 80\%--95\% across four models, with per-transition analysis demonstrating the effectiveness of the decomposition. Extensibility variants with alternative topologies and persistent components demonstrate consistent effectiveness. Code and data are available at https://anonymous.4open.science/r/stateful_agent_backdoor-E89F.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a stateful backdoor attack on LLM-based agents that persists across multiple sessions via persistent components under permission isolation, unlike prior stateless backdoors. It models the attack as a Mealy machine and derives a decomposition framework enabling independent per-transition data construction. The primary instantiation reports 80-95% attack success rates across four models, with per-transition analysis supporting the decomposition's effectiveness; two extensibility variants using alternative topologies and persistent components are also evaluated and show consistent results. Code and data are released for reproducibility.
Significance. If the empirical results hold, the work meaningfully extends backdoor research by demonstrating how state persistence can enable incremental, autonomous attacks across sessions in agent systems. The Mealy-machine formalization and decomposition provide a structured, reusable construction method, and the code release supports verification and extension. This could inform defenses for multi-session agent deployments where stateful threats have not been a primary focus.
major comments (2)
- [Experimental Evaluation] The central empirical claim of 80-95% ASR relies on the per-transition analysis demonstrating decomposition effectiveness; the manuscript should explicitly report the number of transitions tested, the exact data-construction procedure per transition, and any statistical controls for variance across sessions to confirm the independence assumption is not violated by agent memory or context carry-over.
- [Threat Model and Persistence Mechanism] The weakest assumption noted in the threat model—that persistent components reliably survive permission isolation without reset or detection—is load-bearing for the multi-session claim; the experiments should include an ablation or failure-mode analysis showing what happens when the persistent component is cleared or monitored between sessions.
minor comments (2)
- [Abstract] The abstract mentions four models and 80-95% ASR but does not name the models or briefly note the baseline comparison (e.g., stateless backdoors); adding one sentence would improve immediate context.
- [Formal Modeling] Notation for the Mealy-machine states and transitions could be introduced with a small diagram or table in the formal section to make the decomposition mapping clearer to readers unfamiliar with automata.
Simulated Author's Rebuttal
Thank you for the detailed review of our manuscript. We address the major comments point by point below and agree to incorporate revisions where appropriate to improve clarity and completeness.
read point-by-point responses
-
Referee: [Experimental Evaluation] The central empirical claim of 80-95% ASR relies on the per-transition analysis demonstrating decomposition effectiveness; the manuscript should explicitly report the number of transitions tested, the exact data-construction procedure per transition, and any statistical controls for variance across sessions to confirm the independence assumption is not violated by agent memory or context carry-over.
Authors: We agree that these details are necessary to fully support the claims. In the revised manuscript, we will explicitly report the number of transitions tested, provide the precise data-construction procedure used for each transition, and present statistical results from repeated experiments across sessions to verify that the independence assumption holds and that there is no significant variance due to memory carry-over. revision: yes
-
Referee: [Threat Model and Persistence Mechanism] The weakest assumption noted in the threat model—that persistent components reliably survive permission isolation without reset or detection—is load-bearing for the multi-session claim; the experiments should include an ablation or failure-mode analysis showing what happens when the persistent component is cleared or monitored between sessions.
Authors: This is a valid point about the assumptions in the threat model. We will add an ablation study in the revised manuscript that analyzes the attack success when the persistent component is cleared between sessions, showing the necessity of state persistence for the incremental attack. Regarding monitoring, we will discuss it as a potential defense direction in the limitations section. revision: yes
Circularity Check
No significant circularity
full rationale
The paper is an empirical security construction: it models the attack as a Mealy machine, derives a decomposition to support per-transition data construction, then instantiates the framework, measures 80-95% ASR on four models, and releases code. No equations, fitted parameters, or self-citations are shown to reduce the reported success rates or the decomposition's effectiveness to inputs by construction. The central results are externally falsifiable experimental outcomes rather than tautological re-statements of the modeling assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLM-based agents can be modeled as Mealy machines whose transitions can be independently triggered and constructed.
invented entities (1)
-
Stateful agent backdoor
no independent evidence
Reference graph
Works this paper leans on
-
[1]
ReST meets react: Self-improvement for multi-step reasoning LLM agent
Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, and Sanjiv Kumar. ReST meets react: Self-improvement for multi-step reasoning LLM agent. InICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024. URL https:// openreview...
2024
-
[2]
MAIN- RAG: Multi-agent filtering retrieval-augmented generation
Chia-Yuan Chang, Zhimeng Jiang, Vineeth Rakesh, Menghai Pan, Chin-Chia Michael Yeh, Guanchu Wang, Mingzhi Hu, Zhichao Xu, Yan Zheng, Mahashweta Das, and Na Zou. MAIN- RAG: Multi-agent filtering retrieval-augmented generation. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Proceedings of the 63rd Annual Meeting of t...
-
[3]
Agentpoison: Red-teaming LLM agents via poisoning memory or knowledge bases
Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming LLM agents via poisoning memory or knowledge bases. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum? id=Y841BRW9rY
2024
-
[4]
TrojanRAG: Retrieval-augmented generation can be backdoor driver in large language models, 2024
Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Haodong Zhao, Ping Yi, Zhuosheng Zhang, and Gongshen Liu. TrojanRAG: Retrieval-augmented generation can be backdoor driver in large language models, 2024. URL https://openreview.net/forum? id=RfYD6v829Y
2024
-
[5]
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025
work page internal anchor Pith review arXiv 2025
-
[6]
CrewAI Inc. CrewAI. https://github.com/crewAIInc/crewAI, 2026. Version 1.14.4; accessed May 6, 2026
2026
-
[7]
Deepseek-v4: Towards highly efficient million-token context intelligence,
DeepSeek-AI. Deepseek-v4: Towards highly efficient million-token context intelligence,
-
[8]
Accessed: May 6, 2026
URL https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/ DeepSeek_V4.pdf. Accessed: May 6, 2026
2026
-
[9]
QLoRA: Efficient finetuning of quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. InThirty-seventh Conference on Neural Information Processing Systems, 2023. URLhttps://openreview.net/forum?id=OUIFPHEgJU
2023
-
[10]
Memory injection attacks on LLM agents via query-only interaction
Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. Memory injection attacks on LLM agents via query-only interaction. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https: //openreview.net/forum?id=QINnsnppv8
2026
-
[11]
Yunhao Feng, Yige Li, Yutao Wu, Yingshui Tan, Yanming Guo, Yifan Ding, Kun Zhai, Xingjun Ma, and Yu-Gang Jiang. Backdooragent: A unified framework for backdoor attacks on llm-based agents, 2026. URL https://arxiv.org/abs/2601.04566. arXiv preprint arXiv:2601.04566
-
[12]
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey, 2023. URLhttps://arxiv.org/abs/2312.10997. 10
work page internal anchor Pith review arXiv 2023
-
[13]
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024. URL https://arxiv.org/abs/ 2407.21783
work page internal anchor Pith review arXiv 2024
-
[14]
Memo: Training memory-efficient embodied agents with reinforcement learning
Gunshi Gupta, Karmesh Yadav, Zsolt Kira, Yarin Gal, and Rahaf Aljundi. Memo: Training memory-efficient embodied agents with reinforcement learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview. net/forum?id=9eIntNc69t
2026
-
[15]
Measuring massive multitask language understanding
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. InInternational Conference on Learning Representations, 2021. URL https://openreview.net/forum? id=d7KBjmI3GmQ
2021
-
[16]
LangGraph SDK for Python
LangChain AI. LangGraph SDK for Python. https://github.com/langchain-ai/ langgraph/tree/main/libs/sdk-py, 2026. Version 0.3.14; accessed May 6, 2026
2026
-
[17]
Alexander H. Liu, Kartik Khandelwal, Sandeep Subramanian, Victor Jouault, Abhinav Rastogi, et al. Ministral 3.arXiv preprint arXiv:2601.08584, 2026
work page internal anchor Pith review arXiv 2026
-
[18]
Decoupled weight decay regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. InInternational Conference on Learning Representations, 2019. URL https://openreview.net/forum? id=Bkg6RiCqY7
2019
-
[19]
Evaluating Very Long-Term Conversational Memory of
Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of LLM agents. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (V olume 1: Long Papers), pages 13851–13870, Bangkok, Thailand, 2024. Association for Computational Lingui...
-
[20]
George H. Mealy. A method for synthesizing sequential circuits.Bell System Technical Journal, 34(5):1045–1079, 1955. doi: 10.1002/j.1538-7305.1955.tb03788.x
-
[21]
Memory in microsoft foundry agent service (preview), 2026
Microsoft. Memory in microsoft foundry agent service (preview), 2026. URL https://learn. microsoft.com/en-us/azure/foundry/agents/concepts/what-is-memory . Ac- cessed: May 6, 2026
2026
-
[22]
Microsoft Agent Framework for Python
Microsoft. Microsoft Agent Framework for Python. https://github.com/microsoft/ agent-framework/, 2026. Version python-1.2.2; accessed May 6, 2026
2026
-
[23]
Evaluation and benchmark- ing of llm agents: A survey
Mahmoud Mohammadi, Yipeng Li, Jane Lo, and Wendy Yip. Evaluation and benchmarking of llm agents: A survey. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V .2, KDD ’25, page 6129–6139, New York, NY , USA, 2025. Association for Computing Machinery. ISBN 9798400714542. doi: 10.1145/3711896.3736570. URLhttps://doi.org/...
-
[24]
OpenAI. Gpt-5 system card, 2026. URL https://arxiv.org/abs/2601.03267. arXiv preprint arXiv:2601.03267
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[25]
OpenAI Agents SDK for Python
OpenAI. OpenAI Agents SDK for Python. https://github.com/openai/ openai-agents-python, 2026. Version 0.15.3; accessed May 6, 2026
2026
-
[26]
MemGPT: Towards LLMs as Operating Systems
Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. Memgpt: Towards llms as operating systems, 2023. URL https: //arxiv.org/abs/2310.08560. arXiv preprint arXiv:2310.08560
work page internal anchor Pith review arXiv 2023
-
[27]
Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, et al. Qwen2.5 technical report.arXiv preprint arXiv:2412.15115, 2024
work page internal anchor Pith review arXiv 2024
-
[28]
Available: https://doi.org/10.1109/SP54263.2024.00254
Mati Ur Rehman, Hadi Ahmadi, and Wajih Ul Hassan. Flash: A comprehensive approach to intrusion detection via provenance graph representation learning. In2024 IEEE Symposium on Security and Privacy (SP), pages 3552–3570, 2024. doi: 10.1109/SP54263.2024.00139. 11
-
[29]
Expert insights into advanced persistent threats: Analysis, attribution, and challenges
Aakanksha Saha, James Mattei, Jorge Blasco, Lorenzo Cavallaro, Daniel V otipka, and Mar- tina Lindorfer. Expert insights into advanced persistent threats: Analysis, attribution, and challenges. In34th USENIX Security Symposium (USENIX Security 25), pages 2185–2204, Seattle, WA, 2025. USENIX Association. URL https://www.usenix.org/conference/ usenixsecurit...
2025
-
[30]
BadAgent: Inserting and activating backdoor attacks in LLM agents
Yifei Wang, Dizhan Xue, Shengjie Zhang, and Shengsheng Qian. BadAgent: Inserting and activating backdoor attacks in LLM agents. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Proceedings of the 62nd Annual Meeting of the Association for Compu- tational Linguistics (V olume 1: Long Papers), pages 9811–9827, Bangkok, Thailand, August
-
[31]
doi: 10.18653/v1/2024.acl-long.530
Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-long.530. URL https://aclanthology.org/2024.acl-long.530/
-
[32]
A-mem: Agentic memory for LLM agents
Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for LLM agents. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URLhttps://openreview.net/forum?id=FiM0M8gcct
2025
-
[33]
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...
work page internal anchor Pith review arXiv 2025
-
[34]
Watch out for your agents! investigating backdoor threats to llm-based agents
Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, and Xu Sun. Watch out for your agents! investigating backdoor threats to llm-based agents. InAdvances in Neu- ral Information Processing Systems, volume 37, pages 100938–100964. Curran Associates, Inc., 2024. URL https://proceedings.neurips.cc/paper_files/paper/2024/file/ b6e9d6f4f3428cd5f3f9e9bb...
2024
-
[35]
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. InInternational Conference on Learning Representations (ICLR), 2023. URLhttps://arxiv.org/abs/2210.03629
work page internal anchor Pith review arXiv 2023
-
[36]
Bo Zhang, Yansong Gao, Boyu Kuang, Changlong Yu, Anmin Fu, and Willy Susilo. A survey on advanced persistent threat detection: A unified framework, challenges, and countermeasures. ACM Computing Surveys, 57(3), 2024. doi: 10.1145/3700749. URL https://doi.org/10. 1145/3700749
-
[37]
Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents
Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=V4y0CpX4hK
2025
-
[38]
Demon- Agent: Dynamically encrypted multi-backdoor implantation attack on LLM-based agent
Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, and Sen Su. Demon- Agent: Dynamically encrypted multi-backdoor implantation attack on LLM-based agent. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2890–2912, Suzhou, C...
-
[39]
react” as a trigger, storing “react:attack-state
Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. Poisonedrag: knowledge corruption attacks to retrieval-augmented generation of large language models. InProceedings of the 34th USENIX Conference on Security Symposium, SEC ’25, USA, 2025. USENIX Association. ISBN 978-1-939133-52-6. 12 A Formal Definition of the Mealy Machine M= (S,Σ,Λ, δ, λ, s init) M...
2025
-
[40]
Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects
Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.