Formal Security Analysis of Agent Protocol Composition
Pith reviewed 2026-06-30 10:08 UTC · model grok-4.3
The pith
Security problems in AI agent protocols stem from unassigned responsibilities across specs, SDKs, and deployments rather than isolated errors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across five agent protocols, AgentThread records 35 specification-level findings and 80 implementation tests, then uncovers thirty additional failures that appear solely when protocols are composed; only one protocol enforces any security-relevant control in practice and none assign responsibility for cross-protocol behavior, so insecurity arises from responsibility gaps rather than specification or implementation problems alone.
What carries the argument
AgentThread, a source-linked framework that defines a layered security scope, encodes protocol-derived checks as TLA+ invariants, and uses a two-phase checker to compile specifications into models and replay executable counterexamples on real SDKs through protocol adapters.
If this is right
- Protocol specifications must separate requirements from recommendations and record cross-protocol duties.
- SDKs and reference servers must be tested together rather than in isolation to expose composition failures.
- Only one of the five studied protocols currently implements a security control that the analysis checks.
- No studied protocol assigns enforcement for behavior that crosses protocol boundaries.
- Findings are supported by source-text links so each violation can be traced to its originating requirement.
Where Pith is reading between the lines
- Teams building agent systems would need a shared responsibility matrix before combining protocols instead of fixing issues after deployment.
- The same two-phase method could be applied to other multi-protocol domains such as cloud service meshes or distributed ledgers.
- If responsibility gaps remain unaddressed, security tooling will continue to focus on single-protocol bugs while composition attacks increase.
Load-bearing premise
Protocol adapters replay model-checker counterexamples against real SDKs without introducing artifacts or missing relevant behaviors.
What would settle it
A protocol document or SDK that explicitly assigns enforcement responsibility for security behavior spanning multiple protocols would contradict the central claim.
Figures
read the original abstract
AI agent protocols define how agents use tools, delegate work, and coordinate across software systems, but their security requirements remain incomplete and inconsistently enforced across deployments. We present AgentThread, a source-linked framework for security assurance analysis of agent protocols, from specification text to running SDKs. AgentThread contributes a layered security scope, protocol-derived checks formalized as TLA+ invariants, and a two-phase checker that compiles protocol specifications into model-checkable models and replays executable counterexamples against real SDKs through protocol adapters. For each finding, AgentThread records the source text behind the check and separates violated protocol requirements from missing recommendations, hardening gaps, and unassigned cross-protocol responsibilities. Across five emerging agent protocols, AgentThread identifies 35 specification-level findings, supports them with 80 implementation tests against production SDKs and reference servers, and finds 30 additional failures that emerge only under protocol composition. We further show that only one protocol enforces a security-relevant control in practice and no protocol assigns enforcement for cross-protocol behavior. Insecurity in agent protocols is therefore not only a specification or implementation problem, but also a responsibility gap across protocols, SDKs, and deployments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AgentThread, a source-linked framework that formalizes security checks for AI agent protocols as TLA+ invariants, compiles specifications into model-checkable models, and uses a two-phase checker to replay counterexamples against production SDKs via protocol adapters. Across five protocols it reports 35 specification-level findings, 80 implementation tests, and 30 additional failures that appear only under composition; it concludes that only one protocol enforces a relevant control in practice and that no protocol assigns responsibility for cross-protocol behavior, establishing a responsibility gap across protocols, SDKs, and deployments.
Significance. If the adapter replay step is shown to be faithful, the work supplies concrete, source-linked evidence that security shortfalls in agent systems arise not only from individual specifications or implementations but from unassigned cross-protocol responsibilities. The combination of TLA+ invariants with executable replay on real SDKs and the separation of violated requirements from missing recommendations constitute a useful methodological contribution for protocol analysis.
major comments (3)
- [Abstract / two-phase checker] Abstract and two-phase checker description: the 30 composition failures (and therefore the responsibility-gap claim) rest on the assertion that protocol adapters replay TLA+ counterexamples against production SDKs without introducing artifacts or omitting behaviors. No completeness argument, construction details, or validation that adapters preserve the model’s observable behaviors is supplied; this is load-bearing for the central conclusion.
- [Abstract] Abstract: the counts (35 findings, 80 tests, 30 composition failures) are presented without derivation details, exclusion criteria, or a table mapping each failure to the specific invariant violated and the source text that generated it. This prevents independent assessment of whether the 30 failures are genuinely emergent under composition rather than artifacts of the adapter layer.
- [Abstract] Abstract claim that “only one protocol enforces a security-relevant control in practice”: the supporting evidence is the 80 implementation tests, yet no section enumerates which control was tested, which SDKs were used, or the pass/fail criteria applied to each of the five protocols.
minor comments (1)
- Notation for the layered security scope and the distinction between “violated protocol requirements,” “missing recommendations,” “hardening gaps,” and “unassigned cross-protocol responsibilities” is introduced in the abstract but never given explicit definitions or an example mapping in the provided text.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of presentation and evidence that we will address through targeted revisions. Below we respond point-by-point to the major comments.
read point-by-point responses
-
Referee: [Abstract / two-phase checker] Abstract and two-phase checker description: the 30 composition failures (and therefore the responsibility-gap claim) rest on the assertion that protocol adapters replay TLA+ counterexamples against production SDKs without introducing artifacts or omitting behaviors. No completeness argument, construction details, or validation that adapters preserve the model’s observable behaviors is supplied; this is load-bearing for the central conclusion.
Authors: We agree that adapter fidelity is load-bearing for the composition-only failure claims and the responsibility-gap conclusion. Section 4 of the manuscript describes the two-phase checker and the construction of protocol adapters that map TLA+ actions to SDK API calls while preserving message formats and state transitions. However, we acknowledge the absence of an explicit completeness argument or dedicated validation experiments. We will add a new subsection (4.3) providing construction details, a formal argument that adapters preserve observable behaviors (by construction from the protocol specification), and validation results showing that counterexamples replay identically on reference servers. This revision will directly support the 30 composition failures. revision: yes
-
Referee: [Abstract] Abstract: the counts (35 findings, 80 tests, 30 composition failures) are presented without derivation details, exclusion criteria, or a table mapping each failure to the specific invariant violated and the source text that generated it. This prevents independent assessment of whether the 30 failures are genuinely emergent under composition rather than artifacts of the adapter layer.
Authors: The counts are obtained by applying the layered security scope (Section 3) to the specification text of each of the five protocols, producing 35 invariant violations, then executing the corresponding checks via the two-phase checker to yield 80 implementation tests and 30 composition-only failures. Exclusion criteria (security-relevant invariants only; non-security recommendations separated) are stated in Section 3.2. To enable independent assessment we will add a summary table (new Table 2) and an appendix that maps every finding to the violated invariant, the exact source-text excerpt, the protocol, and whether the failure appears only under composition. The abstract will be updated with a brief derivation sentence. revision: yes
-
Referee: [Abstract] Abstract claim that “only one protocol enforces a security-relevant control in practice”: the supporting evidence is the 80 implementation tests, yet no section enumerates which control was tested, which SDKs were used, or the pass/fail criteria applied to each of the five protocols.
Authors: Section 6 enumerates the security-relevant controls tested (authentication, authorization, and session-integrity invariants), lists the five protocols together with the production SDKs and reference servers used for each, and defines pass/fail criteria as whether the running implementation satisfies the TLA+ invariant under the replayed counterexample. The claim that only one protocol passes follows directly from these results. To improve clarity we will insert a compact summary table (new Table 3) in Section 6 that lists, for each protocol, the control tested, the SDKs, and the pass/fail outcome. The abstract wording will remain but will now reference this table. revision: partial
Circularity Check
No significant circularity; external formal methods applied to independent specs and SDKs
full rationale
The derivation relies on TLA+ invariants checked against protocol specifications, followed by adapter-based replay of counterexamples on production SDKs. The 35 specification findings and 30 composition failures are outputs of this external model-checking process and 80 implementation tests, not reductions of fitted parameters or self-citations. No self-definitional equations, uniqueness theorems from prior author work, or ansatzes smuggled via citation appear in the abstract or described method. The central responsibility-gap claim follows from the independent checker results rather than by construction from its inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption TLA+ models derived from protocol text accurately capture security-relevant behaviors for invariant checking.
- domain assumption The five selected emerging agent protocols are representative of the broader class for drawing general conclusions about responsibility gaps.
invented entities (1)
-
AgentThread two-phase checker with protocol adapters
no independent evidence
Reference graph
Works this paper leans on
-
[1]
A2A Project. 2025. Agent-to-Agent Protocol (A2A). https://github.com/a2aproj ect/A2A. Announced April 2025; now under Linux Foundation. Retrieved March 2026
2025
-
[2]
ACP Contributors. 2025. Agent Communication Protocol. https://agentcommu nicationprotocol.dev/introduction/welcome. Under active development via RFDs. Retrieved March 2026
2025
-
[3]
William Adamson and Nicolas Williams. 2016. Remote Procedure Call (RPC) Security Version 3. RFC 7861
2016
-
[4]
Agent Client Protocol Contributors. 2025. Agent Client Protocol (ACP). https: //github.com/agentclientprotocol/agent-client-protocol. retrieved March 2026
2025
-
[5]
Agent Network Protocol Team. 2025. AgentNetworkProtocol (ANP): An Open Protocol for Agent Communication. https://github.com/agent- network- protocol/AgentNetworkProtocol. Retrieved March 2026
2025
-
[6]
Abdullah Al Ishtiaq, Sarkar Snigdha Sarathi Das, Syed Md Mukit Rashid, Ali Ran- jbar, Kai Tu, Tianwei Wu, Zhezheng Song, Weixuan Wang, Mujtahid Akon, Rui Zhang, et al. 2024. Hermes: Unlocking security analysis of cellular network pro- tocols by synthesizing finite state machines from natural language specifications. In33rd USENIX Security Symposium (USENI...
2024
-
[7]
Zeynab Anbiaee, Mahdi Rabbani, Mansur Mirani, Gunjan Piya, Igor Opushnyev, Ali Ghorbani, and Sajjad Dadkhah. 2026. Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP. arXiv preprint arXiv: 2602.11327(2026)
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[8]
Anthropic. 2024. Model Context Protocol Specification. https://modelcontextpr otocol.io/specification/2025-11-25. retrieved March 2026
2024
-
[9]
Linard Arquint, Malte Schwerhoff, Vaibhav Mehta, and Peter Müller. 2023. A Generic Methodology for the Modular Verification of Security Protocol Imple- mentations. InProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security(Copenhagen, Denmark)(CCS ’23). Association for Computing Machinery, New York, NY, USA, 1377–1391. doi:10....
-
[10]
David Basin, Jannik Dreier, Lucca Hirschi, Saša Radomirovic, Ralf Sasse, and Vincent Stettler. 2018. A Formal Analysis of 5G Authentication. InProceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (Toronto, Canada)(CCS ’18). Association for Computing Machinery, New York, NY, USA, 1383–1396. doi:10.1145/3243734.3243846
-
[11]
David Basin, Nate Foster, Kenneth L. McMillan, Kedar S. Namjoshi, Cristina Nita-Rotaru, Jonathan M. Smith, Pamela Zave, and Lenore D. Zuck. 2025. It Takes a Village: Bridging the Gaps between Current and Formal Specifications for Protocols.Commun. ACM68, 8 (July 2025), 50–61. doi:10.1145/3706572
-
[12]
Karthikeyan Bhargavan, Davor Obradovic, and Carl A. Gunter. 2002. Formal verification of standards for distance vector routing protocols.J. ACM49, 4 (July 2002), 538–576. doi:10.1145/581771.581775
-
[13]
Steve Bishop, Matthew Fairbairn, Michael Norrish, Peter Sewell, Michael Smith, and Keith Wansbrough. 2005. Rigorous specification and conformance testing techniques for network protocols, as applied to TCP, UDP, and sockets, Vol. 35. Association for Computing Machinery, New York, NY, USA, 265–276. doi:10.114 5/1090191.1080123
-
[14]
B Blanchet et al. [n. d.]. An efficient cryptographic protocol verifier based on prolog rules. 2014.doi10, 82–96
2014
-
[15]
Bochmann and Alexandre Petrenko
Gregor V. Bochmann and Alexandre Petrenko. 1994. Protocol testing: review of methods and relevance for software testing. InProceedings of the 1994 ACM SIGSOFT International Symposium on Software Testing and Analysis(Seattle, Washington, USA)(ISSTA ’94). Association for Computing Machinery, New York, NY, USA, 109–124. doi:10.1145/186258.187153
-
[16]
Scott Bradner. 1997. Key words for use in RFCs to Indicate Requirement Levels. RFC 2119
1997
-
[17]
Moshe Siman Tov Bustan, Mustafa Naamnih, Nir Zadok, and Roni Bar. 2026. The Mother of All AI Supply Chains: Critical, Systemic Vulnerability at the Core of Anthropic’s MCP. OX Security Blog, https://www.ox.security/blog/the-mother- of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the- mcp/. Accessed June 2026
2026
-
[18]
Yi Chen, Di Tang, Yepeng Yao, Mingming Zha, XiaoFeng Wang, Xiaozhong Liu, Haixu Tang, and Dongfang Zhao. 2022. Seeing the Forest for the Trees: Understanding Security Hazards in the 3GPP Ecosystem through Intelligent Analysis on Change Requests. In31st USENIX Security Symposium (USENIX Security 22). USENIX Association, Boston, MA, 17–34. https://www.useni...
2022
-
[19]
Cloud Security Alliance AI Safety Initiative. 2026. MCP by Design: RCE Across the AI Agent Ecosystem. Cloud Security Alliance Lab Space, https://labs.clo udsecurityalliance.org/research/csa-research-note-mcp-by-design-rce-ox- security-20260420-csa/. Accessed June 2026
2026
-
[20]
Cas Cremers, Alexander Dax, and Aurora Naska. 2023. Formal Analysis of SPDM: Security Protocol and Data Model version 1.2. In32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 6611–6628. https://www.usenix.org/conference/usenixsecurity23/presentation/cremers- spdm
2023
-
[21]
Cas Cremers, Benjamin Kiesl, and Niklas Medinger. 2020. A Formal Analysis of IEEE 802.11’s WPA2: Countering the Kracks Caused by Cracking the Counters. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 1–
2020
-
[22]
https://www.usenix.org/conference/usenixsecurity20/presentation/cremers
-
[23]
Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. Agentdojo: A dynamic environment to eval- uate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems37, 82895–82920
2024
-
[24]
Vincent Diemunsch, Lucca Hirschi, and Steve Kremer. 2025. A comprehensive formal security analysis of OPC UA. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 363, 20 pages
2025
-
[25]
Danny Dolev and Andrew Yao. 2003. On the security of public key protocols. IEEE Transactions on information theory29, 2 (2003), 198–208
2003
- [26]
-
[27]
Jacob Ginesin, Max von Hippel, Evan Defloor, Cristina Nita-Rotaru, and Michael Tüxen. 2024. A Formal Analysis of SCTP: Attack Synthesis and Patch Verification. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA, 3099–3116. https://www.usenix.org/conference/usenixsecuri ty24/presentation/ginesin
2024
-
[28]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security. 79–90
2023
-
[29]
Allenoush Hayrapetian and Rajeev Raje. 2018. Empirically Analyzing and Eval- uating Security Features in Software Requirements. InProceedings of the 11th Innovations in Software Engineering Conference(Hyderabad, India)(ISEC ’18). Association for Computing Machinery, New York, NY, USA, Article 9, 11 pages. doi:10.1145/3172871.3172879
-
[30]
Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S. Yu. 2026. The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies. ACM Comput. Surv.58, 6 (2026), 162:1–162:36. doi:10.1145/3773080
-
[31]
Charlie Jacomme, Elise Klein, Steve Kremer, and Maïwenn Racouchot. 2023. A comprehensive, formal and automated analysis of the EDHOC protocol. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 5881–5898. https://www.usenix.org/conference/usenixsecurity 23/presentation/jacomme
2023
-
[32]
Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Hujin Peng, Xiang Chen, Zeyang Sha, Yuyuan Li, Changting Lin, Xun Wang, Xuan Liu, Ningyu Zhang, Chaochao Chen, Chunming Wu, Muham- mad Khurram Khan, and Meng Han. 2025. A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures.arX...
-
[33]
Tomas Kulik, Brijesh Dongol, Peter Gorm Larsen, Hugo Daniel Macedo, Steve Schneider, Peter W. V. Tran-Jørgensen, and James Woodcock. 2022. A Survey of Practical Formal Methods for Security.Form. Asp. Comput.34, 1, Article 5 (July 2022), 39 pages. doi:10.1145/3522582
-
[34]
2002.Specifying systems
Leslie Lamport. 2002.Specifying systems. Vol. 388. Addison-Wesley Boston
2002
-
[35]
Christine P Lee, David Porfirio, Xinyu Jessica Wang, Kevin Chenkai Zhao, and Bilge Mutlu. 2025. Veriplan: Integrating formal verification and llms into end- user planning. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–19
2025
-
[36]
Hui Li, Zhen Dong, Siao Wang, Hui Zhang, Liwei Shen, Xin Peng, and Dongdong She. 2025. Extracting Formal Specifications From Documents Using LLMS for Test Automation. In2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC). IEEE Computer Society, 1–12
2025
- [37]
- [38]
-
[39]
Felix Linker, Ralf Sasse, and David Basin. 2025. A formal analysis of apple’s iMessage PQ3 protocol. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 258, 20 pages
2025
-
[40]
R. J. Linn. 2006. Conformance evaluation methodology and protocol testing. IEEE J.Sel. A. Commun.7, 7 (Sept. 2006), 1143–1158. doi:10.1109/49.44561
-
[41]
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and Benchmarking Prompt Injection Attacks and Defenses. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadel- phia, PA, 1831–1847. https://www.usenix.org/conference/usenixsecurity24/pre sentation/liu-yupei
2024
- [42]
-
[43]
Gavin Lowe. 1996. Breaking and fixing the Needham-Schroeder public-key protocol using FDR. InInternational Workshop on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 147–166
1996
-
[44]
Simon Meier, Benedikt Schmidt, Cas Cremers, and David Basin. 2013. The TAMARIN prover for the symbolic analysis of security protocols. InInternational conference on computer aided verification. Springer, 696–701
2013
-
[45]
National Institute of Standards and Technology. 2024. Open Security Controls Assessment Language (OSCAL). https://pages.nist.gov/OSCAL/. Accessed April 2026
2024
-
[46]
National Vulnerability Database. 2025. CVE-2025-49596: MCP Inspector remote code execution vulnerability. https://nvd.nist.gov/vuln/detail/CVE-2025-49596
2025
-
[47]
National Vulnerability Database. 2025. CVE-2025-68143: Anthropic MCP Git server path handling vulnerability. https://nvd.nist.gov/vuln/detail/CVE-2025- 68143
2025
-
[48]
National Vulnerability Database. 2025. CVE-2025-68144: Anthropic MCP Git server argument injection vulnerability. https://nvd.nist.gov/vuln/detail/CVE- 2025-68144
2025
-
[49]
National Vulnerability Database. 2025. CVE-2025-68145: Anthropic MCP Git server repository boundary vulnerability. https://nvd.nist.gov/vuln/detail/CVE- 2025-68145
2025
-
[50]
Raul Onitza-Klugman. 2025. Prompt Injection Meets MCP: A New Exploitation Vector Emerging? Snyk Labs, https://labs.snyk.io/resources/prompt-injection- mcp/. Accessed June 2026
2025
-
[51]
OpenAI. 2026. Introducing the OpenAI Safety Bug Bounty Program. https: //openai.com/index/safety-bug-bounty/. Accessed June 2026
2026
-
[52]
Maria Leonor Pacheco, Max von Hippel, Ben Weintraub, Dan Goldwasser, and Cristina Nita-Rotaru. 2022. Automated attack synthesis by extracting finite state machines from protocol specification documents. In2022 IEEE Symposium on Security and Privacy (SP). IEEE, 51–68
2022
-
[53]
Eric Rescorla and Brian Korver. 2003. Guidelines for Writing RFC Text on Security Considerations. RFC 3552
2003
-
[54]
Shubham Saboo and Kristopher Overholt. 2026. A Developers’ Guide to AI Agent Protocols. https://developers.googleblog.com/developers-guide-to-ai-agent- protocols/. Accessed April 2026
2026
-
[55]
B Sarikaya. 1986. Formal specification-based conformance testing. InProceedings of the ACM SIGCOMM Conference on Communications Architectures & Protocols (Stowe, Vermont, USA)(SIGCOMM ’86). Association for Computing Machinery, New York, NY, USA, 236–240. doi:10.1145/18172.18199
-
[56]
Prakhar Sharma and Vinod Yegneswaran. 2023. Prosper: Extracting protocol specifications using large language models. InProceedings of the 22nd ACM workshop on hot topics in networks. 41–47
2023
-
[57]
Min Shi, Jing Chen, Kun He, Haoran Zhao, Meng Jia, and Ruiying Du. 2023. Formal Analysis and Patching of BLE-SC Pairing. In32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 37–52. https://www.usenix.org/conference/usenixsecurity23/presentation/shi-min
2023
-
[58]
Robert Thurlow. 2009. RPC: Remote Procedure Call Protocol Specification Version
2009
-
[59]
Shuren Xia, Qiwei Li, Taqiya Ehsan, and Jorge Ortiz. 2026. TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples. InProceedings of the ACM Conference on AI and Agentic Systems. 181–196
2026
-
[60]
Jane Yen, Tamás Lévai, Qinyuan Ye, Xiang Ren, Ramesh Govindan, and Barath Raghavan. 2021. Semi-automated protocol disambiguation and code generation (SIGCOMM ’21). Association for Computing Machinery, New York, NY, USA, 272–286. doi:10.1145/3452296.3472910
-
[61]
Yuan Yu, Panagiotis Manolios, and Leslie Lamport. 1999. Model checking TLA+ specifications. InAdvanced research working conference on correct hardware design and verification methods. Springer, 54–66
1999
-
[62]
Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. InFindings of the Association for Computational Linguistics: ACL 2024. 10471–10506
2024
-
[63]
Mingwei Zheng, Danning Xie, Qingkai Shi, Chengpeng Wang, and Xiangyu Zhang. 2025. Validating Network Protocol Parsers with Traceable RFC Document Interpretation.Proc. ACM Softw. Eng.2, ISSTA, Article ISSTA078, 23 pages. doi:10.1145/3728955 11
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.