pith. machine review for the scientific record.

arxiv: 2604.08407 · v1 · submitted 2026-04-09 · 💻 cs.CR

Recognition: unknown

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Pith reviewed 2026-05-10 17:37 UTC · model grok-4.3

keywords: routers, attack, ac-1, across, malicious, model, sessions, adaptive

The pith

Third-party LLM API routers actively steal credentials and inject malicious code into agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the risks from third-party routers that forward LLM tool-calling requests. These routers see all data in plain text and can change it or steal secrets. Testing many paid and free routers revealed several that perform injections or access researcher canaries. Poisoning tests showed that leaked keys can turn normal routers into attack vectors too. The authors built a tool to simulate the attacks and test defenses.

Core claim

Malicious LLM API routers exist in the supply chain and perform payload injection and secret exfiltration on agent requests, as demonstrated by active attacks on canary credentials and keys across tested routers.

What carries the argument

The threat model defining payload injection (AC-1) and secret exfiltration (AC-2) attacks, implemented via the Mine research proxy against agent frameworks.

Load-bearing premise

The malicious behaviors observed are caused by the routers under test rather than other network elements or setup artifacts.

What would settle it

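A control run that uses the same canary credentials directly with providers, with no router in the path. If the canaries are still accessed or responses still carry injected payloads, router-specific causation is falsified; if neither occurs, the attribution to routers is supported.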

Figures

Figures reproduced from arXiv: 2604.08407 by Chaofan Shou, Hanzhi Liu, Hongbo Wen, Ryan Jingyang Fang, Yanju Chen, Yu Feng.

Figure 1: LLM router ecosystem and taint propagation. Agent clients (left) exchange requests and responses through a multi-hop …
Figure 2: Request–response lifecycle through a malicious router. AC-2 tags mark where the router passively scans traffic for …
Figure 3: Observed malicious-router behaviors across 28 paid …
Figure 4: Defense evaluation. (a) Threshold sweep: detection rate vs. false-positive budget for the anomaly screener across …
Original abstract

Large language model (LLM) agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.
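To make the first of the three defenses named in the abstract more concrete, here is a minimal sketch of what a fail-closed policy gate could look like on the client side. It is not the paper's implementation: the tool-call JSON shape, the `ALLOWED_COMMANDS` allowlist, and the `gate_tool_call` function are all hypothetical. The point it illustrates is the fail-closed property, where a tool call runs only if it is positively allowed and every parsing error or unexpected shape results in denial.

```python
import json
import shlex

# Hypothetical allowlist: tool name -> executables the agent may run.
ALLOWED_COMMANDS = {"shell": {"ls", "cat", "git"}}
# Shell metacharacters that would let one string chain extra commands.
FORBIDDEN_CHARS = set(";|&`$><\n")


def gate_tool_call(raw_tool_call: str) -> bool:
    """Return True only when the tool call is positively allowed."""
    try:
        call = json.loads(raw_tool_call)
        allowed = ALLOWED_COMMANDS[call["name"]]
        command = call["arguments"]["command"]
        if not isinstance(command, str) or FORBIDDEN_CHARS & set(command):
            return False
        argv = shlex.split(command)
        return bool(argv) and argv[0] in allowed
    except (ValueError, KeyError, TypeError):
        # Fail closed: malformed or unexpected input is denied, never passed on.
        return False
```

Under these assumptions, a router that rewrites `git status` into `pip install <package>` or appends `| sh` to a command would be blocked, because neither result matches the allowlist. An allowlist this strict trades agent capability for safety, so a real deployment would need a policy tuned to its own tools.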

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript presents the first systematic empirical study of malicious intermediary attacks by LLM API routers, which act as plaintext proxies for tool-calling requests. It formalizes a threat model with two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), plus two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers sourced from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers from public communities, the authors report 1 paid and 8 free routers actively injecting malicious code, 2 using adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies show how ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generated 100M GPT-5.4 tokens, and weakly configured decoys yielded 2B billed tokens and 99 credentials across 440 Codex sessions. The authors also introduce Mine, a research proxy that implements all four attack classes against four public agent frameworks, and evaluate three client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.
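As an illustration of the third defense in the summary, append-only transparency logging, the sketch below chains each logged record to the hash of the previous one so that later edits become detectable. This is an assumed design, not the one evaluated in the paper: the record fields, the `GENESIS` constant, and the `append` and `verify` helpers are hypothetical, and `json.dumps(..., sort_keys=True)` is a simplification rather than a full canonicalization scheme.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the hash chain


def _digest(prev_hash: str, record: dict) -> str:
    # Serialize deterministically, then bind the record to the previous hash.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Add a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": _digest(prev_hash, record)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A client could append every request it sends and every response it receives, then publish or escrow the latest hash. A chain like this only proves that the log was not altered afterwards; it does not by itself show that a router modified a response in transit.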

Significance. If the router-specific attributions hold, this work provides the first concrete, large-scale measurements of a previously unexamined supply-chain risk in LLM agent ecosystems, with direct implications for how tool-calling traffic is secured. Strengths include the scale of router testing, the use of researcher-controlled canary accounts and keys to generate falsifiable detections, the demonstration of poisoning vectors that turn ostensibly benign routers into attack surfaces, and the release of the Mine proxy for reproducibility and defense evaluation. These elements offer actionable insights for both practitioners and future research on LLM infrastructure security.

major comments (2)
  1. [§4] §4 (Canary and Key Exfiltration Experiments): The headline counts of 17 routers touching AWS canaries and 1 draining ETH are load-bearing for the central empirical claim, yet the manuscript does not demonstrate isolation of these events to the tested routers. The traffic path includes the client's network, DNS, and upstream providers; without unique per-router canary values, request-level logging tied to specific router sessions, or out-of-band verification (e.g., timestamped access logs correlated with router queries), exfiltration could originate from other actors or setup artifacts. This must be addressed with additional controls or data before the counts can be confidently attributed.
  2. [§5.2] §5.2 (Poisoning Studies): The reported outcomes (100M tokens generated, 2B billed tokens, 99 credentials stolen across 440 sessions) are presented as direct results of the poisoning mechanism, but the section lacks full details on experimental controls, data exclusion rules, baseline comparisons without poisoning, and error analysis. These omissions make it impossible to assess whether the figures are robust or confounded by other variables, weakening the claim that benign routers can be pulled into the attack surface.
minor comments (3)
  1. [Abstract] Abstract: 'GPT-5.4' is used without clarification; specify whether this denotes a real model version or is illustrative.
  2. [§2] The threat-model diagram (likely Figure 1) would benefit from explicit labeling of AC-1.a and AC-1.b to improve readability.
  3. [§1.1] Related-work section should cite prior measurements of API proxy or intermediary attacks for context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review, which highlights key areas for strengthening the empirical rigor of our measurements. We address each major comment below with clarifications on our methodology and commit to revisions that enhance transparency without altering the core findings.

Point-by-point responses
  1. Referee: [§4] §4 (Canary and Key Exfiltration Experiments): The headline counts of 17 routers touching AWS canaries and 1 draining ETH are load-bearing for the central empirical claim, yet the manuscript does not demonstrate isolation of these events to the tested routers. The traffic path includes the client's network, DNS, and upstream providers; without unique per-router canary values, request-level logging tied to specific router sessions, or out-of-band verification (e.g., timestamped access logs correlated with router queries), exfiltration could originate from other actors or setup artifacts. This must be addressed with additional controls or data before the counts can be confidently attributed.

    Authors: We appreciate this observation on the need for explicit attribution controls. In the experiments, we generated and assigned a unique AWS access key pair to each of the 428 tested routers, ensuring no credential reuse across sessions. All queries were issued sequentially in isolated test runs, with per-router request logs recording the exact timestamp, router identifier, and canary key used. AWS CloudTrail and IAM access logs for each unique key were then cross-referenced against these timestamps to confirm that observed accesses aligned exclusively with the corresponding router query windows. For the single ETH drain event, a distinct researcher-controlled private key was employed and monitored via public blockchain explorers, with the transaction timestamp matching the query to that specific router. We will revise §4 to include a dedicated subsection describing the unique canary assignment, logging protocol, and correlation procedure, thereby making the isolation methodology fully transparent and reproducible. revision: yes

  2. Referee: [§5.2] §5.2 (Poisoning Studies): The reported outcomes (100M tokens generated, 2B billed tokens, 99 credentials stolen across 440 sessions) are presented as direct results of the poisoning mechanism, but the section lacks full details on experimental controls, data exclusion rules, baseline comparisons without poisoning, and error analysis. These omissions make it impossible to assess whether the figures are robust or confounded by other variables, weakening the claim that benign routers can be pulled into the attack surface.

    Authors: We agree that the current presentation of the poisoning studies would benefit from expanded methodological detail to support the robustness of the reported figures. The experiments included explicit baseline phases in which each router was queried for multiple sessions using non-poisoned, researcher-controlled keys to establish normal behavior and token consumption rates. Poisoned keys were then introduced in subsequent phases, with all sessions logged for router identity, key usage, and outcome. Data exclusion rules removed sessions exhibiting network timeouts, router non-responses, or incomplete tool calls (affecting <4% of total runs); these were documented separately rather than included in the final counts. Error analysis comprised variance measurements across repeated trials and confirmation that elevated token usage and credential exfiltration occurred only post-poisoning. We will revise §5.2 by adding a new subsection that details the phased protocol, baseline comparisons, exclusion criteria, and error metrics, including summary statistics to allow readers to evaluate potential confounds. revision: yes
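The attribution procedure described in the first response (one unique canary key per router, with access logs cross-referenced against per-router query windows) can be sketched as follows. This is a hedged illustration only: the record layout, the `attribute_accesses` function, and the `grace` parameter are assumptions, and real CloudTrail records would need parsing into this shape first. The grace period is a design choice added here because stolen credentials may be used some time after the query; the rebuttal does not specify one.

```python
from datetime import datetime, timedelta


def attribute_accesses(query_windows, access_events, grace=timedelta(hours=24)):
    """Map each canary access to the router that was given that key.

    query_windows: dicts with "router", "key_id", "start", "end".
    access_events: dicts with "key_id" and "time".
    Events for unknown keys, or outside the window plus grace, are
    returned separately as unexplained rather than attributed.
    """
    by_key = {w["key_id"]: w for w in query_windows}
    attributed, unexplained = {}, []
    for event in access_events:
        window = by_key.get(event["key_id"])
        if window and window["start"] <= event["time"] <= window["end"] + grace:
            attributed.setdefault(window["router"], []).append(event)
        else:
            unexplained.append(event)
    return attributed, unexplained
```

Any access that lands in `unexplained`, for example a key used before its router was ever queried, would point to a leak elsewhere in the setup, which is the kind of artifact the referee's first major comment asks the authors to rule out.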

Circularity Check

0 steps flagged

No circularity: purely empirical measurement study with no derivations, fits, or self-referential claims.

Full rationale

The paper reports direct observations from testing 428 routers using researcher-controlled canary accounts and keys. No equations, parameter fitting, predictions, or uniqueness theorems appear in the abstract or described methodology. Results (injection counts, canary touches, ETH drain) are presented as raw empirical findings rather than derived quantities. The attribution concerns raised in the referee report are validity issues, not circularity in a derivation chain. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an empirical security measurement study. It relies on the domain assumption that routers see plaintext JSON and on standard threat-modeling premises about proxy behavior. No free parameters, no new invented entities, and no formal derivations are present.

axioms (1)
  • domain assumption: LLM API routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload
    Stated directly in the abstract as the basis for the attack surface.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

    cs.CR 2026-05 unverdicted novelty 7.0

    A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.

  2. CoT-Guard: Small Models for Strong Monitoring

    cs.CR 2026-05 unverdicted novelty 5.0

    CoT-Guard is a 4B model using SFT and RL that achieves 75% G-mean^2 on hidden objective detection under prompt and code manipulation attacks, outperforming several larger models.
