Recognition: unknown
Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain
Pith reviewed 2026-05-10 17:37 UTC · model grok-4.3
The pith
Third-party LLM API routers actively steal credentials and inject malicious code into agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Malicious LLM API routers exist in the supply chain and perform payload injection and secret exfiltration on agent requests, as demonstrated by active attacks on canary credentials and keys across tested routers.
What carries the argument
The threat model defining payload injection (AC-1) and secret exfiltration (AC-2) attacks, implemented via the Mine research proxy against agent frameworks.
Load-bearing premise
The malicious behaviors observed are caused by the routers under test rather than other network elements or setup artifacts.
What would settle it
A controlled experiment where canary credentials are used directly with providers without routers and show no access or injection would falsify the router-specific causation.
Figures
read the original abstract
Large language model (LLM) agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the first systematic empirical study of malicious intermediary attacks on LLM API routers, which act as plaintext proxies for tool-calling requests. It formalizes a threat model with two core attack classes—payload injection (AC-1) and secret exfiltration (AC-2)—plus adaptive variants (dependency-targeted injection AC-1.a and conditional delivery AC-1.b). Across 28 paid routers sourced from Taobao, Xianyu, and Shopify and 400 free routers from public communities, the authors report 1 paid and 8 free routers actively injecting malicious code, 2 using adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies demonstrate how leaked keys can lead to massive token consumption (100M GPT-5.4 tokens, 2B billed tokens) and credential theft (99 credentials across 440 sessions). The authors also introduce the Mine research proxy implementing all attack classes against four public agent frameworks and evaluate three client-side defenses: fail-closed policy gates, response-side anomaly screening, and append-only transparency logging.
Significance. If the router-specific attributions hold, this work provides the first concrete, large-scale measurements of a previously unexamined supply-chain risk in LLM agent ecosystems, with direct implications for how tool-calling traffic is secured. Strengths include the scale of router testing, the use of researcher-controlled canary accounts and keys to generate falsifiable detections, the demonstration of poisoning vectors that turn ostensibly benign routers into attack surfaces, and the release of the Mine proxy for reproducibility and defense evaluation. These elements offer actionable insights for both practitioners and future research on LLM infrastructure security.
major comments (2)
- [§4] §4 (Canary and Key Exfiltration Experiments): The headline counts of 17 routers touching AWS canaries and 1 draining ETH are load-bearing for the central empirical claim, yet the manuscript does not demonstrate isolation of these events to the tested routers. The traffic path includes the client's network, DNS, and upstream providers; without unique per-router canary values, request-level logging tied to specific router sessions, or out-of-band verification (e.g., timestamped access logs correlated with router queries), exfiltration could originate from other actors or setup artifacts. This must be addressed with additional controls or data before the counts can be confidently attributed.
- [§5.2] §5.2 (Poisoning Studies): The reported outcomes (100M tokens generated, 2B billed tokens, 99 credentials stolen across 440 sessions) are presented as direct results of the poisoning mechanism, but the section lacks full details on experimental controls, data exclusion rules, baseline comparisons without poisoning, and error analysis. These omissions make it impossible to assess whether the figures are robust or confounded by other variables, weakening the claim that benign routers can be pulled into the attack surface.
minor comments (3)
- [Abstract] Abstract: 'GPT-5.4' is used without clarification; specify whether this denotes a real model version or is illustrative.
- [§2] The threat-model diagram (likely Figure 1) would benefit from explicit labeling of AC-1.a and AC-1.b to improve readability.
- [§1.1] Related-work section should cite prior measurements of API proxy or intermediary attacks for context.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review, which highlights key areas for strengthening the empirical rigor of our measurements. We address each major comment below with clarifications on our methodology and commit to revisions that enhance transparency without altering the core findings.
read point-by-point responses
-
Referee: [§4] §4 (Canary and Key Exfiltration Experiments): The headline counts of 17 routers touching AWS canaries and 1 draining ETH are load-bearing for the central empirical claim, yet the manuscript does not demonstrate isolation of these events to the tested routers. The traffic path includes the client's network, DNS, and upstream providers; without unique per-router canary values, request-level logging tied to specific router sessions, or out-of-band verification (e.g., timestamped access logs correlated with router queries), exfiltration could originate from other actors or setup artifacts. This must be addressed with additional controls or data before the counts can be confidently attributed.
Authors: We appreciate this observation on the need for explicit attribution controls. In the experiments, we generated and assigned a unique AWS access key pair to each of the 428 tested routers, ensuring no credential reuse across sessions. All queries were issued sequentially in isolated test runs, with per-router request logs recording the exact timestamp, router identifier, and canary key used. AWS CloudTrail and IAM access logs for each unique key were then cross-referenced against these timestamps to confirm that observed accesses aligned exclusively with the corresponding router query windows. For the single ETH drain event, a distinct researcher-controlled private key was employed and monitored via public blockchain explorers, with the transaction timestamp matching the query to that specific router. We will revise §4 to include a dedicated subsection describing the unique canary assignment, logging protocol, and correlation procedure, thereby making the isolation methodology fully transparent and reproducible. revision: yes
-
Referee: [§5.2] §5.2 (Poisoning Studies): The reported outcomes (100M tokens generated, 2B billed tokens, 99 credentials stolen across 440 sessions) are presented as direct results of the poisoning mechanism, but the section lacks full details on experimental controls, data exclusion rules, baseline comparisons without poisoning, and error analysis. These omissions make it impossible to assess whether the figures are robust or confounded by other variables, weakening the claim that benign routers can be pulled into the attack surface.
Authors: We agree that the current presentation of the poisoning studies would benefit from expanded methodological detail to support the robustness of the reported figures. The experiments included explicit baseline phases in which each router was queried for multiple sessions using non-poisoned, researcher-controlled keys to establish normal behavior and token consumption rates. Poisoned keys were then introduced in subsequent phases, with all sessions logged for router identity, key usage, and outcome. Data exclusion rules removed sessions exhibiting network timeouts, router non-responses, or incomplete tool calls (affecting <4% of total runs); these were documented separately rather than included in the final counts. Error analysis comprised variance measurements across repeated trials and confirmation that elevated token usage and credential exfiltration occurred only post-poisoning. We will revise §5.2 by adding a new subsection that details the phased protocol, baseline comparisons, exclusion criteria, and error metrics, including summary statistics to allow readers to evaluate potential confounds. revision: yes
Circularity Check
No circularity: purely empirical measurement study with no derivations, fits, or self-referential claims.
full rationale
The paper reports direct observations from testing 428 routers using researcher-controlled canary accounts and keys. No equations, parameter fitting, predictions, or uniqueness theorems appear in the abstract or described methodology. Results (injection counts, canary touches, ETH drain) are presented as raw empirical findings rather than derived quantities. Attribution concerns raised by the skeptic are validity issues, not circularity in a derivation chain. The work is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLM API routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload
Forward citations
Cited by 2 Pith papers
-
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents
A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.
-
CoT-Guard: Small Models for Strong Monitoring
CoT-Guard is a 4B model using SFT and RL that achieves 75% G-mean^2 on hidden objective detection under prompt and code manipulation attacks, outperforming several larger models.
Reference graph
Works this paper leans on
-
[1]
Terminal-Bench
2025. Terminal-Bench. https://www.tbench.ai/. Benchmark for testing AI agents in terminal environments. Accessed: 2026-04-08
2025
-
[2]
Alibaba Group. 2026. Taobao. https://www.taobao.com. Chinese consumer-to- consumer marketplace. Accessed: 2026-04-07
2026
-
[3]
Alibaba Group. 2026. Xianyu (Idle Fish). https://www.goofish.com. Chinese second-hand marketplace. Accessed: 2026-04-07
2026
-
[4]
Amazon Web Services. 2026. Amazon Bedrock. https://aws.amazon.com/ bedrock/. Managed service providing access to foundation models from AI21, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon via a unified API. Accessed: 2026-04-08
2026
-
[5]
Anthropic. 2024. Tool use with Claude. https://platform.claude.com/docs/en/ agents-and-tools/tool-use/overview. Accessed: 2026-04-08
2024
- [6]
-
[7]
BerriAI. 2024. LiteLLM: Call 100+ LLM APIs in OpenAI Format. https://github. com/BerriAI/litellm. Accessed: 2026-03-15
2024
-
[8]
Brian Campbell, John Bradley, Nat Sakimura, and Torsten Lodderstedt. 2020. OAuth 2.0 Mutual-TLS Client Authentication and Certificate-Bound Access To- kens. RFC 8705. https://doi.org/10.17487/RFC8705
-
[9]
Codecov. 2021. Bash Uploader Security Update. https://about.codecov.io/security- update/. April 2021. CI/CD supply chain breach persisting January–April 2021. 12 Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain Accessed: 2026-03-20
2021
-
[10]
Dave Crocker, Tony Hansen, and Murray S. Kucherawy. 2011. DomainKeys Identified Mail (DKIM) Signatures. RFC 6376. https://doi.org/10.17487/RFC6376
-
[11]
Datadog Security Labs. 2026. LiteLLM and Telnyx compromised on PyPI: Trac- ing the TeamPCP supply chain campaign. https://securitylabs.datadoghq.com/ articles/litellm-compromised-pypi-teampcp-supply-chain-campaign/. March
2026
-
[12]
Accessed: 2026-04-08
2026
-
[13]
Xavier de Carné de Carnavalet and Mohammad Mannan. 2016. Killed by Proxy: Analyzing Client-end TLS Interception Software. InProceedings of the 2016 Network and Distributed System Security Symposium (NDSS). Internet Society. https://doi.org/10.14722/ndss.2016.23374
-
[14]
Ruian Duan, Omar Alrawi, Ranjita Pai Kasturi, Ryan Elder, Brendan Saltafor- maggio, and Wenke Lee. 2021. Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages. InProceedings of the 2021 Network and Distributed System Security Symposium (NDSS). Internet Society
2021
-
[15]
Alex Halderman, and Vern Paxson
Zakir Durumeric, Zane Ma, Drew Springall, Richard Barnes, Nick Sullivan, Elie Bursztein, Michael Bailey, J. Alex Halderman, and Vern Paxson. 2017. The Security Impact of HTTPS Interception. InProceedings of the 2017 Network and Distributed System Security Symposium (NDSS). Internet Society. https: //doi.org/10.14722/ndss.2017.23456
-
[16]
E2B. 2026. E2B Documentation. https://e2b.dev/docs. Accessed: 2026-04-07
2026
-
[17]
Google. 2024. Function calling with the Gemini API. https://ai.google.dev/gemini- api/docs/function-calling. Accessed: 2026-04-08
2024
-
[18]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec). ACM. https://doi.org/10.1145/3605764.3623985
-
[19]
Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2019. BadNets: Evaluating Backdooring Attacks on Deep Neural Networks.IEEE Access7 (2019), 47230–47244. https://doi.org/10.1109/ACCESS.2019.2909068
-
[20]
Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025. Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv preprint arXiv:2503.23278(2025)
work page internal anchor Pith review arXiv 2025
-
[21]
Kong. 2026. Kong AI Gateway. https://developer.konghq.com/ai-gateway/. Accessed: 2026-04-08
2026
-
[22]
Keita Kurita, Paul Michel, and Graham Neubig. 2020. Weight Poisoning Attacks on Pretrained Models. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). ACL. https://doi.org/10.18653/v1/2020.acl- main.249
-
[23]
Piergiorgio Ladisa, Henrik Plate, Matias Martinez, and Olivier Barais. 2023. SoK: Taxonomy of Attacks on Open-Source Software Supply Chains. InProceedings of the 2023 IEEE Symposium on Security and Privacy (S&P). IEEE. https://doi.org/ 10.1109/SP46215.2023.10179304
-
[24]
Ben Laurie, Adam Langley, and Emil Kasper. 2013. Certificate Transparency. RFC 6962. https://doi.org/10.17487/RFC6962
-
[25]
Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation Forest. In Proceedings of the 2008 IEEE International Conference on Data Mining (ICDM). IEEE. https://doi.org/10.1109/ICDM.2008.17
-
[26]
Yi Liu, Weizhe Wang, Ruitao Feng, Yao Zhang, Guangquan Xu, Gelei Deng, Yuekang Li, and Leo Zhang. 2026. Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale.arXiv preprint arXiv:2601.10338(2026). https://doi.org/10.48550/arXiv.2601.10338
work page internal anchor Pith review doi:10.48550/arxiv.2601.10338 2026
-
[27]
Keyu Man, Zhiyun Qian, Zhongjie Wang, Xiaofeng Zheng, Youjun Huang, and Haixin Duan. 2020. DNS Cache Poisoning Attack Reloaded. InProceedings of the 2020 ACM Conference on Computer and Communications Security (CCS). ACM. https://doi.org/10.1145/3372297.3417280
-
[28]
McNiece, and Bradley Reaves
Michael Meli, Matthew R. McNiece, and Bradley Reaves. 2019. How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Reposito- ries. https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git- characterizing-secret-leakage-in-public-github-repositories/. InProceedings of the 2019 Network and Distributed System Security Symposium (NDSS). Intern...
2019
-
[29]
Microsoft. 2026. Azure OpenAI in Foundry Models. https://azure.microsoft.com/ en-us/products/ai-foundry/models/openai/. Accessed: 2026-04-08
2026
-
[30]
Model Context Protocol. 2025. Security Best Practices - Model Context Pro- tocol. https://modelcontextprotocol.io/docs/tutorials/security/security_best_ practices. Accessed: 2026-04-08
2025
-
[31]
Marc Ohm, Henrik Plate, Arnold Sykosch, and Michael Meier. 2020. Backstabber’s Knife Collection: A Review of Open Source Software Supply Chain Attacks. In Proceedings of the 17th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMV A). Springer. https://doi.org/10.1007/978-3-030- 52683-2_2
-
[32]
one-api contributors. 2026. one-api: OpenAI API Management and Distribution System. https://github.com/songquanpeng/one-api. 30.5k GitHub stars, 1.19M Docker Hub pulls as of April 2026. Accessed: 2026-04-07
2026
-
[33]
OpenAI. 2023. Function calling and other API updates. https://openai.com/index/ function-calling-and-other-api-updates/. Accessed: 2026-04-08
2023
-
[34]
OpenClaw. 2026. OpenClaw Features Documentation. https://docs.openclaw. ai/concepts/features. Accessed: 2026-04-07. Documents support for 35+ model providers, including custom and self-hosted OpenAI-compatible and Anthropic- compatible endpoints
2026
-
[35]
OpenCode. 2026. OpenCode Providers Documentation. https://opencode.ai/docs/ providers. Accessed: 2026-04-07. Documents support for 75+ LLM providers and configurable base URLs for custom endpoints and proxy services
2026
-
[36]
OpenRouter. 2024. OpenRouter: A Unified Interface for LLMs. https://openrouter. ai. Accessed: 2026-03-15
2024
-
[37]
Lily Ottinger, Jordan Schneider, and Zilan Qian. 2025. How to Use Banned US Models in China. https://www.chinatalk.media/p/the-grey-market-for- american-llms. Investigation of Taobao and Xianyu LLM API reselling market. Accessed: 2026-04-08
2025
-
[38]
Gorilla: Large Language Model Connected with Massive APIs
Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. 2023. Go- rilla: Large Language Model Connected with Massive APIs.arXiv preprint arXiv:2305.15334(2023)
work page internal anchor Pith review arXiv 2023
-
[39]
Fábio Perez and Ian Ribeiro. 2022. Ignore Previous Prompt: Attack Techniques For Language Models.arXiv preprint arXiv:2211.09527(2022)
work page internal anchor Pith review arXiv 2022
-
[40]
Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun
-
[41]
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs.arXiv preprint arXiv:2307.16789(2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[42]
QuantumNous. 2026. new-api. https://github.com/QuantumNous/new-api. Open-source multi-provider API management and distribution platform. Ac- cessed: 2026-04-08
2026
-
[43]
Maddison, and Tatsunori Hashimoto
Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. 2024. Identifying the Risks of LM Agents with an LM-Emulated Sandbox. InProceedings of the 12th International Conference on Learning Representations (ICLR)
2024
-
[44]
Anders Rundgren, Benjamin Jordan, and Samuel Erdtman. 2020. JSON Canoni- calization Scheme (JCS). RFC 8785. https://doi.org/10.17487/RFC8785
-
[45]
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. InAdvances in Neural Information Processing Systems (NeurIPS), Vol. 36
2023
-
[46]
Shopify. 2026. Shopify. https://www.shopify.com. Global e-commerce platform hosting independent storefronts. Accessed: 2026-04-07
2026
-
[47]
Sigstore. 2026. Sigstore Documentation. https://docs.sigstore.dev/. Accessed: 2026-04-07
2026
-
[48]
SLSA. 2026. SLSA Specification. https://slsa.dev/spec/v1.2/. Accessed: 2026-04-07
2026
-
[49]
sub2api. 2026. sub2api. https://github.com/Wei-Shaw/sub2api. Open-source OpenAI-compatible API router template. Accessed: 2026-04-08
2026
-
[50]
W3C. 2016. Subresource Integrity. https://www.w3.org/TR/SRI/. W3C Recom- mendation. Accessed: 2026-04-07
2016
- [51]
-
[52]
Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Ji-Rong Wen. 2023. A Survey on Large Language Model based Autonomous Agents.arXiv preprint arXiv:2308.11432(2023)
work page internal anchor Pith review arXiv 2023
-
[53]
Jeffrey Yasskin. 2020. Signed HTTP Exchanges. Internet-Draft draft-yasskin-http- origin-signed-responses-09. https://datatracker.ietf.org/doc/html/draft-yasskin- http-origin-signed-responses-09 Work in progress. Accessed: 2026-04-07
2020
-
[54]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. 2023. Universal and Transferable Adversarial Attacks on Aligned Language Models.arXiv preprint arXiv:2307.15043(2023). A Ethical Considerations This appendix describes the ethical framework governing our re- search, including data handling, measurement constraints, an...
work page internal anchor Pith review Pith/arXiv arXiv 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.