Formal Security Analysis of Agent Protocol Composition

Christophe Hauser; Haonan Li; Qifan Zhang; Shenghan Zheng; Zheng Zhang

arxiv: 2606.28690 · v1 · pith:WATDE7ASnew · submitted 2026-06-27 · 💻 cs.CR

Formal Security Analysis of Agent Protocol Composition

Shenghan Zheng , Qifan Zhang , Zheng Zhang , Haonan Li , Christophe Hauser This is my paper

Pith reviewed 2026-06-30 10:08 UTC · model grok-4.3

classification 💻 cs.CR

keywords agent protocolssecurity analysisprotocol compositionTLA+responsibility gapsformal verificationAI agentsSDK testing

0 comments

The pith

Security problems in AI agent protocols stem from unassigned responsibilities across specs, SDKs, and deployments rather than isolated errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that agent protocols suffer from gaps in who must enforce security when multiple protocols interact, and it introduces AgentThread to detect these gaps systematically. AgentThread turns specification text into TLA+ invariants, checks them in a model checker, then replays counterexamples on real SDKs via adapters while tracing each result back to the original source text. A sympathetic reader would care because current agent deployments combine protocols without any single document saying who handles cross-protocol threats, so the same failures keep appearing in production. The work separates violated requirements from missing recommendations and unassigned duties, showing that thirty failures surface only under composition and that only one of five protocols actually enforces a relevant control in code.

Core claim

Across five agent protocols, AgentThread records 35 specification-level findings and 80 implementation tests, then uncovers thirty additional failures that appear solely when protocols are composed; only one protocol enforces any security-relevant control in practice and none assign responsibility for cross-protocol behavior, so insecurity arises from responsibility gaps rather than specification or implementation problems alone.

What carries the argument

AgentThread, a source-linked framework that defines a layered security scope, encodes protocol-derived checks as TLA+ invariants, and uses a two-phase checker to compile specifications into models and replay executable counterexamples on real SDKs through protocol adapters.

If this is right

Protocol specifications must separate requirements from recommendations and record cross-protocol duties.
SDKs and reference servers must be tested together rather than in isolation to expose composition failures.
Only one of the five studied protocols currently implements a security control that the analysis checks.
No studied protocol assigns enforcement for behavior that crosses protocol boundaries.
Findings are supported by source-text links so each violation can be traced to its originating requirement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Teams building agent systems would need a shared responsibility matrix before combining protocols instead of fixing issues after deployment.
The same two-phase method could be applied to other multi-protocol domains such as cloud service meshes or distributed ledgers.
If responsibility gaps remain unaddressed, security tooling will continue to focus on single-protocol bugs while composition attacks increase.

Load-bearing premise

Protocol adapters replay model-checker counterexamples against real SDKs without introducing artifacts or missing relevant behaviors.

What would settle it

A protocol document or SDK that explicitly assigns enforcement responsibility for security behavior spanning multiple protocols would contradict the central claim.

Figures

Figures reproduced from arXiv: 2606.28690 by Christophe Hauser, Haonan Li, Qifan Zhang, Shenghan Zheng, Zheng Zhang.

**Figure 2.** Figure 2: Layered security scope used by AgentThread. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: AgentThread workflow. Protocol artifacts are nor [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Cross-protocol attack chain: MCP prompt injection [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Responsibility gap heatmap across layer-derived [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

read the original abstract

AI agent protocols define how agents use tools, delegate work, and coordinate across software systems, but their security requirements remain incomplete and inconsistently enforced across deployments. We present AgentThread, a source-linked framework for security assurance analysis of agent protocols, from specification text to running SDKs. AgentThread contributes a layered security scope, protocol-derived checks formalized as TLA+ invariants, and a two-phase checker that compiles protocol specifications into model-checkable models and replays executable counterexamples against real SDKs through protocol adapters. For each finding, AgentThread records the source text behind the check and separates violated protocol requirements from missing recommendations, hardening gaps, and unassigned cross-protocol responsibilities. Across five emerging agent protocols, AgentThread identifies 35 specification-level findings, supports them with 80 implementation tests against production SDKs and reference servers, and finds 30 additional failures that emerge only under protocol composition. We further show that only one protocol enforces a security-relevant control in practice and no protocol assigns enforcement for cross-protocol behavior. Insecurity in agent protocols is therefore not only a specification or implementation problem, but also a responsibility gap across protocols, SDKs, and deployments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AgentThread combines TLA+ with SDK replay to surface 30 composition-only failures across five agent protocols and argues for a responsibility gap, but the adapter step that turns model counterexamples into executable tests is the part that needs the most scrutiny.

read the letter

The paper's real contribution is taking five emerging agent protocols, formalizing their security invariants in TLA+, and then attempting to replay the counterexamples through adapters against actual production SDKs. They report 35 specification findings plus 30 failures that only appear under composition, and they note that only one protocol actually enforces a relevant control in practice. The source-linked approach and the separation of requirements from recommendations are useful; it makes the findings traceable rather than just abstract claims.

What stands out is the focus on composition. Single-protocol analysis can miss interactions that only show up when agents switch between protocols, and the numbers (80 tests, 30 composition failures) give a concrete sense of scale. The work also engages directly with deployed code rather than staying in the model.

The soft spot is exactly the replay step. The adapters have to translate TLA+ counterexamples into calls that the real SDKs will accept without dropping behaviors or adding artifacts. The abstract and stress-test note both flag that there is little detail on how the adapters were built or validated. If that mapping is incomplete, the 30 composition failures and the responsibility-gap conclusion do not follow cleanly from the model-checker output. The rest of the evidence (spec findings and single-protocol tests) looks more solid.

This is for researchers working on AI agent security or anyone trying to apply formal methods to protocol stacks that are still being written. A reader who wants to see TLA+ used on live SDKs will find the method worth looking at, even if the specific counts need independent checking.

It deserves peer review. The topic is timely, the method is a reasonable attempt to bridge model and implementation, and the composition angle is worth referee attention even if the adapter validation needs strengthening.

Referee Report

3 major / 1 minor

Summary. The paper introduces AgentThread, a source-linked framework that formalizes security checks for AI agent protocols as TLA+ invariants, compiles specifications into model-checkable models, and uses a two-phase checker to replay counterexamples against production SDKs via protocol adapters. Across five protocols it reports 35 specification-level findings, 80 implementation tests, and 30 additional failures that appear only under composition; it concludes that only one protocol enforces a relevant control in practice and that no protocol assigns responsibility for cross-protocol behavior, establishing a responsibility gap across protocols, SDKs, and deployments.

Significance. If the adapter replay step is shown to be faithful, the work supplies concrete, source-linked evidence that security shortfalls in agent systems arise not only from individual specifications or implementations but from unassigned cross-protocol responsibilities. The combination of TLA+ invariants with executable replay on real SDKs and the separation of violated requirements from missing recommendations constitute a useful methodological contribution for protocol analysis.

major comments (3)

[Abstract / two-phase checker] Abstract and two-phase checker description: the 30 composition failures (and therefore the responsibility-gap claim) rest on the assertion that protocol adapters replay TLA+ counterexamples against production SDKs without introducing artifacts or omitting behaviors. No completeness argument, construction details, or validation that adapters preserve the model’s observable behaviors is supplied; this is load-bearing for the central conclusion.
[Abstract] Abstract: the counts (35 findings, 80 tests, 30 composition failures) are presented without derivation details, exclusion criteria, or a table mapping each failure to the specific invariant violated and the source text that generated it. This prevents independent assessment of whether the 30 failures are genuinely emergent under composition rather than artifacts of the adapter layer.
[Abstract] Abstract claim that “only one protocol enforces a security-relevant control in practice”: the supporting evidence is the 80 implementation tests, yet no section enumerates which control was tested, which SDKs were used, or the pass/fail criteria applied to each of the five protocols.

minor comments (1)

Notation for the layered security scope and the distinction between “violated protocol requirements,” “missing recommendations,” “hardening gaps,” and “unassigned cross-protocol responsibilities” is introduced in the abstract but never given explicit definitions or an example mapping in the provided text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of presentation and evidence that we will address through targeted revisions. Below we respond point-by-point to the major comments.

read point-by-point responses

Referee: [Abstract / two-phase checker] Abstract and two-phase checker description: the 30 composition failures (and therefore the responsibility-gap claim) rest on the assertion that protocol adapters replay TLA+ counterexamples against production SDKs without introducing artifacts or omitting behaviors. No completeness argument, construction details, or validation that adapters preserve the model’s observable behaviors is supplied; this is load-bearing for the central conclusion.

Authors: We agree that adapter fidelity is load-bearing for the composition-only failure claims and the responsibility-gap conclusion. Section 4 of the manuscript describes the two-phase checker and the construction of protocol adapters that map TLA+ actions to SDK API calls while preserving message formats and state transitions. However, we acknowledge the absence of an explicit completeness argument or dedicated validation experiments. We will add a new subsection (4.3) providing construction details, a formal argument that adapters preserve observable behaviors (by construction from the protocol specification), and validation results showing that counterexamples replay identically on reference servers. This revision will directly support the 30 composition failures. revision: yes
Referee: [Abstract] Abstract: the counts (35 findings, 80 tests, 30 composition failures) are presented without derivation details, exclusion criteria, or a table mapping each failure to the specific invariant violated and the source text that generated it. This prevents independent assessment of whether the 30 failures are genuinely emergent under composition rather than artifacts of the adapter layer.

Authors: The counts are obtained by applying the layered security scope (Section 3) to the specification text of each of the five protocols, producing 35 invariant violations, then executing the corresponding checks via the two-phase checker to yield 80 implementation tests and 30 composition-only failures. Exclusion criteria (security-relevant invariants only; non-security recommendations separated) are stated in Section 3.2. To enable independent assessment we will add a summary table (new Table 2) and an appendix that maps every finding to the violated invariant, the exact source-text excerpt, the protocol, and whether the failure appears only under composition. The abstract will be updated with a brief derivation sentence. revision: yes
Referee: [Abstract] Abstract claim that “only one protocol enforces a security-relevant control in practice”: the supporting evidence is the 80 implementation tests, yet no section enumerates which control was tested, which SDKs were used, or the pass/fail criteria applied to each of the five protocols.

Authors: Section 6 enumerates the security-relevant controls tested (authentication, authorization, and session-integrity invariants), lists the five protocols together with the production SDKs and reference servers used for each, and defines pass/fail criteria as whether the running implementation satisfies the TLA+ invariant under the replayed counterexample. The claim that only one protocol passes follows directly from these results. To improve clarity we will insert a compact summary table (new Table 3) in Section 6 that lists, for each protocol, the control tested, the SDKs, and the pass/fail outcome. The abstract wording will remain but will now reference this table. revision: partial

Circularity Check

0 steps flagged

No significant circularity; external formal methods applied to independent specs and SDKs

full rationale

The derivation relies on TLA+ invariants checked against protocol specifications, followed by adapter-based replay of counterexamples on production SDKs. The 35 specification findings and 30 composition failures are outputs of this external model-checking process and 80 implementation tests, not reductions of fitted parameters or self-citations. No self-definitional equations, uniqueness theorems from prior author work, or ansatzes smuggled via citation appear in the abstract or described method. The central responsibility-gap claim follows from the independent checker results rather than by construction from its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim depends on domain assumptions about model fidelity and protocol representativeness rather than free parameters or new physical entities; the framework itself is an invented analysis method without independent falsifiable evidence outside the paper.

axioms (2)

domain assumption TLA+ models derived from protocol text accurately capture security-relevant behaviors for invariant checking.
Invoked in the description of compiling specifications into model-checkable models.
domain assumption The five selected emerging agent protocols are representative of the broader class for drawing general conclusions about responsibility gaps.
Basis for generalizing from the reported findings to all agent protocols.

invented entities (1)

AgentThread two-phase checker with protocol adapters no independent evidence
purpose: To link formal counterexamples back to executable SDK behavior
New component introduced to bridge specification analysis and implementation testing.

pith-pipeline@v0.9.1-grok · 5734 in / 1501 out tokens · 42821 ms · 2026-06-30T10:08:01.488972+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

63 extracted references · 19 canonical work pages · 1 internal anchor

[1]

A2A Project. 2025. Agent-to-Agent Protocol (A2A). https://github.com/a2aproj ect/A2A. Announced April 2025; now under Linux Foundation. Retrieved March 2026

2025
[2]

ACP Contributors. 2025. Agent Communication Protocol. https://agentcommu nicationprotocol.dev/introduction/welcome. Under active development via RFDs. Retrieved March 2026

2025
[3]

William Adamson and Nicolas Williams. 2016. Remote Procedure Call (RPC) Security Version 3. RFC 7861

2016
[4]

Agent Client Protocol Contributors. 2025. Agent Client Protocol (ACP). https: //github.com/agentclientprotocol/agent-client-protocol. retrieved March 2026

2025
[5]

Agent Network Protocol Team. 2025. AgentNetworkProtocol (ANP): An Open Protocol for Agent Communication. https://github.com/agent- network- protocol/AgentNetworkProtocol. Retrieved March 2026

2025
[6]

Abdullah Al Ishtiaq, Sarkar Snigdha Sarathi Das, Syed Md Mukit Rashid, Ali Ran- jbar, Kai Tu, Tianwei Wu, Zhezheng Song, Weixuan Wang, Mujtahid Akon, Rui Zhang, et al. 2024. Hermes: Unlocking security analysis of cellular network pro- tocols by synthesizing finite state machines from natural language specifications. In33rd USENIX Security Symposium (USENI...

2024
[7]

Zeynab Anbiaee, Mahdi Rabbani, Mansur Mirani, Gunjan Piya, Igor Opushnyev, Ali Ghorbani, and Sajjad Dadkhah. 2026. Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP. arXiv preprint arXiv: 2602.11327(2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026
[8]

Anthropic. 2024. Model Context Protocol Specification. https://modelcontextpr otocol.io/specification/2025-11-25. retrieved March 2026

2024
[9]

Linard Arquint, Malte Schwerhoff, Vaibhav Mehta, and Peter Müller. 2023. A Generic Methodology for the Modular Verification of Security Protocol Imple- mentations. InProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security(Copenhagen, Denmark)(CCS ’23). Association for Computing Machinery, New York, NY, USA, 1377–1391. doi:10....

work page doi:10.1145/3576915.36 2023
[10]

David Basin, Jannik Dreier, Lucca Hirschi, Saša Radomirovic, Ralf Sasse, and Vincent Stettler. 2018. A Formal Analysis of 5G Authentication. InProceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (Toronto, Canada)(CCS ’18). Association for Computing Machinery, New York, NY, USA, 1383–1396. doi:10.1145/3243734.3243846

work page doi:10.1145/3243734.3243846 2018
[11]

McMillan, Kedar S

David Basin, Nate Foster, Kenneth L. McMillan, Kedar S. Namjoshi, Cristina Nita-Rotaru, Jonathan M. Smith, Pamela Zave, and Lenore D. Zuck. 2025. It Takes a Village: Bridging the Gaps between Current and Formal Specifications for Protocols.Commun. ACM68, 8 (July 2025), 50–61. doi:10.1145/3706572

work page doi:10.1145/3706572 2025
[12]

Karthikeyan Bhargavan, Davor Obradovic, and Carl A. Gunter. 2002. Formal verification of standards for distance vector routing protocols.J. ACM49, 4 (July 2002), 538–576. doi:10.1145/581771.581775

work page doi:10.1145/581771.581775 2002
[13]

Steve Bishop, Matthew Fairbairn, Michael Norrish, Peter Sewell, Michael Smith, and Keith Wansbrough. 2005. Rigorous specification and conformance testing techniques for network protocols, as applied to TCP, UDP, and sockets, Vol. 35. Association for Computing Machinery, New York, NY, USA, 265–276. doi:10.114 5/1090191.1080123

work page arXiv 2005
[14]

B Blanchet et al. [n. d.]. An efficient cryptographic protocol verifier based on prolog rules. 2014.doi10, 82–96

2014
[15]

Bochmann and Alexandre Petrenko

Gregor V. Bochmann and Alexandre Petrenko. 1994. Protocol testing: review of methods and relevance for software testing. InProceedings of the 1994 ACM SIGSOFT International Symposium on Software Testing and Analysis(Seattle, Washington, USA)(ISSTA ’94). Association for Computing Machinery, New York, NY, USA, 109–124. doi:10.1145/186258.187153

work page doi:10.1145/186258.187153 1994
[16]

Scott Bradner. 1997. Key words for use in RFCs to Indicate Requirement Levels. RFC 2119

1997
[17]

Moshe Siman Tov Bustan, Mustafa Naamnih, Nir Zadok, and Roni Bar. 2026. The Mother of All AI Supply Chains: Critical, Systemic Vulnerability at the Core of Anthropic’s MCP. OX Security Blog, https://www.ox.security/blog/the-mother- of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the- mcp/. Accessed June 2026

2026
[18]

Yi Chen, Di Tang, Yepeng Yao, Mingming Zha, XiaoFeng Wang, Xiaozhong Liu, Haixu Tang, and Dongfang Zhao. 2022. Seeing the Forest for the Trees: Understanding Security Hazards in the 3GPP Ecosystem through Intelligent Analysis on Change Requests. In31st USENIX Security Symposium (USENIX Security 22). USENIX Association, Boston, MA, 17–34. https://www.useni...

2022
[19]

Cloud Security Alliance AI Safety Initiative. 2026. MCP by Design: RCE Across the AI Agent Ecosystem. Cloud Security Alliance Lab Space, https://labs.clo udsecurityalliance.org/research/csa-research-note-mcp-by-design-rce-ox- security-20260420-csa/. Accessed June 2026

2026
[20]

Cas Cremers, Alexander Dax, and Aurora Naska. 2023. Formal Analysis of SPDM: Security Protocol and Data Model version 1.2. In32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 6611–6628. https://www.usenix.org/conference/usenixsecurity23/presentation/cremers- spdm

2023
[21]

Cas Cremers, Benjamin Kiesl, and Niklas Medinger. 2020. A Formal Analysis of IEEE 802.11’s WPA2: Countering the Kracks Caused by Cracking the Counters. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 1–

2020
[22]

https://www.usenix.org/conference/usenixsecurity20/presentation/cremers
[23]

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. Agentdojo: A dynamic environment to eval- uate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems37, 82895–82920

2024
[24]

Vincent Diemunsch, Lucca Hirschi, and Steve Kremer. 2025. A comprehensive formal security analysis of OPC UA. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 363, 20 pages

2025
[25]

Danny Dolev and Andrew Yao. 2003. On the security of public key protocols. IEEE Transactions on information theory29, 2 (2003), 198–208

2003
[26]

Pengfei Gao and Chao Peng. 2025. More with less: An empirical study of turn- control strategies for efficient coding agents.arXiv preprint arXiv:2510.16786 (2025). 10

work page arXiv 2025
[27]

Jacob Ginesin, Max von Hippel, Evan Defloor, Cristina Nita-Rotaru, and Michael Tüxen. 2024. A Formal Analysis of SCTP: Attack Synthesis and Patch Verification. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA, 3099–3116. https://www.usenix.org/conference/usenixsecuri ty24/presentation/ginesin

2024
[28]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security. 79–90

2023
[29]

Allenoush Hayrapetian and Rajeev Raje. 2018. Empirically Analyzing and Eval- uating Security Features in Software Requirements. InProceedings of the 11th Innovations in Software Engineering Conference(Hyderabad, India)(ISEC ’18). Association for Computing Machinery, New York, NY, USA, Article 9, 11 pages. doi:10.1145/3172871.3172879

work page doi:10.1145/3172871.3172879 2018
[30]

Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S. Yu. 2026. The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies. ACM Comput. Surv.58, 6 (2026), 162:1–162:36. doi:10.1145/3773080

work page doi:10.1145/3773080 2026
[31]

Charlie Jacomme, Elise Klein, Steve Kremer, and Maïwenn Racouchot. 2023. A comprehensive, formal and automated analysis of the EDHOC protocol. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 5881–5898. https://www.usenix.org/conference/usenixsecurity 23/presentation/jacomme

2023
[32]

Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Hujin Peng, Xiang Chen, Zeyang Sha, Yuyuan Li, Changting Lin, Xun Wang, Xuan Liu, Ningyu Zhang, Chaochao Chen, Chunming Wu, Muham- mad Khurram Khan, and Meng Han. 2025. A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures.arX...

work page arXiv 2025
[33]

Tomas Kulik, Brijesh Dongol, Peter Gorm Larsen, Hugo Daniel Macedo, Steve Schneider, Peter W. V. Tran-Jørgensen, and James Woodcock. 2022. A Survey of Practical Formal Methods for Security.Form. Asp. Comput.34, 1, Article 5 (July 2022), 39 pages. doi:10.1145/3522582

work page doi:10.1145/3522582 2022
[34]

2002.Specifying systems

Leslie Lamport. 2002.Specifying systems. Vol. 388. Addison-Wesley Boston

2002
[35]

Christine P Lee, David Porfirio, Xinyu Jessica Wang, Kevin Chenkai Zhao, and Bilge Mutlu. 2025. Veriplan: Integrating formal verification and llms into end- user planning. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–19

2025
[36]

Hui Li, Zhen Dong, Siao Wang, Hui Zhang, Liwei Shen, Xin Peng, and Dongdong She. 2025. Extracting Formal Specifications From Documents Using LLMS for Test Automation. In2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC). IEEE Computer Society, 1–12

2025
[37]

Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, and Qianxiang Wang. 2025. Swe-debate: Competitive multi-agent debate for software issue resolution.arXiv preprint arXiv:2507.23348(2025)

work page arXiv 2025
[38]

Qiaomu Li and Ying Xie. 2025. From glue-code to protocols: A critical anal- ysis of a2a and mcp integration for scalable agent systems.arXiv preprint arXiv:2505.03864(2025)

work page arXiv 2025
[39]

Felix Linker, Ralf Sasse, and David Basin. 2025. A formal analysis of apple’s iMessage PQ3 protocol. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 258, 20 pages

2025
[40]

R. J. Linn. 2006. Conformance evaluation methodology and protocol testing. IEEE J.Sel. A. Commun.7, 7 (Sept. 2006), 1143–1158. doi:10.1109/49.44561

work page doi:10.1109/49.44561 2006
[41]

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and Benchmarking Prompt Injection Attacks and Defenses. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadel- phia, PA, 1831–1847. https://www.usenix.org/conference/usenixsecurity24/pre sentation/liu-yupei

2024
[42]

Yinuo Liu, Ruohan Xu, Xilong Wang, Yuqi Jia, and Neil Zhenqiang Gong. 2025. WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents. arXiv preprint arXiv: 2510.01354(2025)

work page arXiv 2025
[43]

Gavin Lowe. 1996. Breaking and fixing the Needham-Schroeder public-key protocol using FDR. InInternational Workshop on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 147–166

1996
[44]

Simon Meier, Benedikt Schmidt, Cas Cremers, and David Basin. 2013. The TAMARIN prover for the symbolic analysis of security protocols. InInternational conference on computer aided verification. Springer, 696–701

2013
[45]

National Institute of Standards and Technology. 2024. Open Security Controls Assessment Language (OSCAL). https://pages.nist.gov/OSCAL/. Accessed April 2026

2024
[46]

National Vulnerability Database. 2025. CVE-2025-49596: MCP Inspector remote code execution vulnerability. https://nvd.nist.gov/vuln/detail/CVE-2025-49596

2025
[47]

National Vulnerability Database. 2025. CVE-2025-68143: Anthropic MCP Git server path handling vulnerability. https://nvd.nist.gov/vuln/detail/CVE-2025- 68143

2025
[48]

National Vulnerability Database. 2025. CVE-2025-68144: Anthropic MCP Git server argument injection vulnerability. https://nvd.nist.gov/vuln/detail/CVE- 2025-68144

2025
[49]

National Vulnerability Database. 2025. CVE-2025-68145: Anthropic MCP Git server repository boundary vulnerability. https://nvd.nist.gov/vuln/detail/CVE- 2025-68145

2025
[50]

Raul Onitza-Klugman. 2025. Prompt Injection Meets MCP: A New Exploitation Vector Emerging? Snyk Labs, https://labs.snyk.io/resources/prompt-injection- mcp/. Accessed June 2026

2025
[51]

OpenAI. 2026. Introducing the OpenAI Safety Bug Bounty Program. https: //openai.com/index/safety-bug-bounty/. Accessed June 2026

2026
[52]

Maria Leonor Pacheco, Max von Hippel, Ben Weintraub, Dan Goldwasser, and Cristina Nita-Rotaru. 2022. Automated attack synthesis by extracting finite state machines from protocol specification documents. In2022 IEEE Symposium on Security and Privacy (SP). IEEE, 51–68

2022
[53]

Eric Rescorla and Brian Korver. 2003. Guidelines for Writing RFC Text on Security Considerations. RFC 3552

2003
[54]

Shubham Saboo and Kristopher Overholt. 2026. A Developers’ Guide to AI Agent Protocols. https://developers.googleblog.com/developers-guide-to-ai-agent- protocols/. Accessed April 2026

2026
[55]

B Sarikaya. 1986. Formal specification-based conformance testing. InProceedings of the ACM SIGCOMM Conference on Communications Architectures & Protocols (Stowe, Vermont, USA)(SIGCOMM ’86). Association for Computing Machinery, New York, NY, USA, 236–240. doi:10.1145/18172.18199

work page doi:10.1145/18172.18199 1986
[56]

Prakhar Sharma and Vinod Yegneswaran. 2023. Prosper: Extracting protocol specifications using large language models. InProceedings of the 22nd ACM workshop on hot topics in networks. 41–47

2023
[57]

Min Shi, Jing Chen, Kun He, Haoran Zhao, Meng Jia, and Ruiying Du. 2023. Formal Analysis and Patching of BLE-SC Pairing. In32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 37–52. https://www.usenix.org/conference/usenixsecurity23/presentation/shi-min

2023
[58]

Robert Thurlow. 2009. RPC: Remote Procedure Call Protocol Specification Version

2009
[59]

Shuren Xia, Qiwei Li, Taqiya Ehsan, and Jorge Ortiz. 2026. TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples. InProceedings of the ACM Conference on AI and Agentic Systems. 181–196

2026
[60]

Jane Yen, Tamás Lévai, Qinyuan Ye, Xiang Ren, Ramesh Govindan, and Barath Raghavan. 2021. Semi-automated protocol disambiguation and code generation (SIGCOMM ’21). Association for Computing Machinery, New York, NY, USA, 272–286. doi:10.1145/3452296.3472910

work page doi:10.1145/3452296.3472910 2021
[61]

Yuan Yu, Panagiotis Manolios, and Leslie Lamport. 1999. Model checking TLA+ specifications. InAdvanced research working conference on correct hardware design and verification methods. Springer, 54–66

1999
[62]

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. InFindings of the Association for Computational Linguistics: ACL 2024. 10471–10506

2024
[63]

Mingwei Zheng, Danning Xie, Qingkai Shi, Chengpeng Wang, and Xiangyu Zhang. 2025. Validating Network Protocol Parsers with Traceable RFC Document Interpretation.Proc. ACM Softw. Eng.2, ISSTA, Article ISSTA078, 23 pages. doi:10.1145/3728955 11

work page doi:10.1145/3728955 2025

[1] [1]

A2A Project. 2025. Agent-to-Agent Protocol (A2A). https://github.com/a2aproj ect/A2A. Announced April 2025; now under Linux Foundation. Retrieved March 2026

2025

[2] [2]

ACP Contributors. 2025. Agent Communication Protocol. https://agentcommu nicationprotocol.dev/introduction/welcome. Under active development via RFDs. Retrieved March 2026

2025

[3] [3]

William Adamson and Nicolas Williams. 2016. Remote Procedure Call (RPC) Security Version 3. RFC 7861

2016

[4] [4]

Agent Client Protocol Contributors. 2025. Agent Client Protocol (ACP). https: //github.com/agentclientprotocol/agent-client-protocol. retrieved March 2026

2025

[5] [5]

Agent Network Protocol Team. 2025. AgentNetworkProtocol (ANP): An Open Protocol for Agent Communication. https://github.com/agent- network- protocol/AgentNetworkProtocol. Retrieved March 2026

2025

[6] [6]

Abdullah Al Ishtiaq, Sarkar Snigdha Sarathi Das, Syed Md Mukit Rashid, Ali Ran- jbar, Kai Tu, Tianwei Wu, Zhezheng Song, Weixuan Wang, Mujtahid Akon, Rui Zhang, et al. 2024. Hermes: Unlocking security analysis of cellular network pro- tocols by synthesizing finite state machines from natural language specifications. In33rd USENIX Security Symposium (USENI...

2024

[7] [7]

Zeynab Anbiaee, Mahdi Rabbani, Mansur Mirani, Gunjan Piya, Igor Opushnyev, Ali Ghorbani, and Sajjad Dadkhah. 2026. Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP. arXiv preprint arXiv: 2602.11327(2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026

[8] [8]

Anthropic. 2024. Model Context Protocol Specification. https://modelcontextpr otocol.io/specification/2025-11-25. retrieved March 2026

2024

[9] [9]

Linard Arquint, Malte Schwerhoff, Vaibhav Mehta, and Peter Müller. 2023. A Generic Methodology for the Modular Verification of Security Protocol Imple- mentations. InProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security(Copenhagen, Denmark)(CCS ’23). Association for Computing Machinery, New York, NY, USA, 1377–1391. doi:10....

work page doi:10.1145/3576915.36 2023

[10] [10]

David Basin, Jannik Dreier, Lucca Hirschi, Saša Radomirovic, Ralf Sasse, and Vincent Stettler. 2018. A Formal Analysis of 5G Authentication. InProceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (Toronto, Canada)(CCS ’18). Association for Computing Machinery, New York, NY, USA, 1383–1396. doi:10.1145/3243734.3243846

work page doi:10.1145/3243734.3243846 2018

[11] [11]

McMillan, Kedar S

David Basin, Nate Foster, Kenneth L. McMillan, Kedar S. Namjoshi, Cristina Nita-Rotaru, Jonathan M. Smith, Pamela Zave, and Lenore D. Zuck. 2025. It Takes a Village: Bridging the Gaps between Current and Formal Specifications for Protocols.Commun. ACM68, 8 (July 2025), 50–61. doi:10.1145/3706572

work page doi:10.1145/3706572 2025

[12] [12]

Karthikeyan Bhargavan, Davor Obradovic, and Carl A. Gunter. 2002. Formal verification of standards for distance vector routing protocols.J. ACM49, 4 (July 2002), 538–576. doi:10.1145/581771.581775

work page doi:10.1145/581771.581775 2002

[13] [13]

Steve Bishop, Matthew Fairbairn, Michael Norrish, Peter Sewell, Michael Smith, and Keith Wansbrough. 2005. Rigorous specification and conformance testing techniques for network protocols, as applied to TCP, UDP, and sockets, Vol. 35. Association for Computing Machinery, New York, NY, USA, 265–276. doi:10.114 5/1090191.1080123

work page arXiv 2005

[14] [14]

B Blanchet et al. [n. d.]. An efficient cryptographic protocol verifier based on prolog rules. 2014.doi10, 82–96

2014

[15] [15]

Bochmann and Alexandre Petrenko

Gregor V. Bochmann and Alexandre Petrenko. 1994. Protocol testing: review of methods and relevance for software testing. InProceedings of the 1994 ACM SIGSOFT International Symposium on Software Testing and Analysis(Seattle, Washington, USA)(ISSTA ’94). Association for Computing Machinery, New York, NY, USA, 109–124. doi:10.1145/186258.187153

work page doi:10.1145/186258.187153 1994

[16] [16]

Scott Bradner. 1997. Key words for use in RFCs to Indicate Requirement Levels. RFC 2119

1997

[17] [17]

Moshe Siman Tov Bustan, Mustafa Naamnih, Nir Zadok, and Roni Bar. 2026. The Mother of All AI Supply Chains: Critical, Systemic Vulnerability at the Core of Anthropic’s MCP. OX Security Blog, https://www.ox.security/blog/the-mother- of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the- mcp/. Accessed June 2026

2026

[18] [18]

Yi Chen, Di Tang, Yepeng Yao, Mingming Zha, XiaoFeng Wang, Xiaozhong Liu, Haixu Tang, and Dongfang Zhao. 2022. Seeing the Forest for the Trees: Understanding Security Hazards in the 3GPP Ecosystem through Intelligent Analysis on Change Requests. In31st USENIX Security Symposium (USENIX Security 22). USENIX Association, Boston, MA, 17–34. https://www.useni...

2022

[19] [19]

Cloud Security Alliance AI Safety Initiative. 2026. MCP by Design: RCE Across the AI Agent Ecosystem. Cloud Security Alliance Lab Space, https://labs.clo udsecurityalliance.org/research/csa-research-note-mcp-by-design-rce-ox- security-20260420-csa/. Accessed June 2026

2026

[20] [20]

Cas Cremers, Alexander Dax, and Aurora Naska. 2023. Formal Analysis of SPDM: Security Protocol and Data Model version 1.2. In32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 6611–6628. https://www.usenix.org/conference/usenixsecurity23/presentation/cremers- spdm

2023

[21] [21]

Cas Cremers, Benjamin Kiesl, and Niklas Medinger. 2020. A Formal Analysis of IEEE 802.11’s WPA2: Countering the Kracks Caused by Cracking the Counters. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 1–

2020

[22] [22]

https://www.usenix.org/conference/usenixsecurity20/presentation/cremers

[23] [23]

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. Agentdojo: A dynamic environment to eval- uate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems37, 82895–82920

2024

[24] [24]

Vincent Diemunsch, Lucca Hirschi, and Steve Kremer. 2025. A comprehensive formal security analysis of OPC UA. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 363, 20 pages

2025

[25] [25]

Danny Dolev and Andrew Yao. 2003. On the security of public key protocols. IEEE Transactions on information theory29, 2 (2003), 198–208

2003

[26] [26]

Pengfei Gao and Chao Peng. 2025. More with less: An empirical study of turn- control strategies for efficient coding agents.arXiv preprint arXiv:2510.16786 (2025). 10

work page arXiv 2025

[27] [27]

Jacob Ginesin, Max von Hippel, Evan Defloor, Cristina Nita-Rotaru, and Michael Tüxen. 2024. A Formal Analysis of SCTP: Attack Synthesis and Patch Verification. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA, 3099–3116. https://www.usenix.org/conference/usenixsecuri ty24/presentation/ginesin

2024

[28] [28]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security. 79–90

2023

[29] [29]

Allenoush Hayrapetian and Rajeev Raje. 2018. Empirically Analyzing and Eval- uating Security Features in Software Requirements. InProceedings of the 11th Innovations in Software Engineering Conference(Hyderabad, India)(ISEC ’18). Association for Computing Machinery, New York, NY, USA, Article 9, 11 pages. doi:10.1145/3172871.3172879

work page doi:10.1145/3172871.3172879 2018

[30] [30]

Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S. Yu. 2026. The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies. ACM Comput. Surv.58, 6 (2026), 162:1–162:36. doi:10.1145/3773080

work page doi:10.1145/3773080 2026

[31] [31]

Charlie Jacomme, Elise Klein, Steve Kremer, and Maïwenn Racouchot. 2023. A comprehensive, formal and automated analysis of the EDHOC protocol. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 5881–5898. https://www.usenix.org/conference/usenixsecurity 23/presentation/jacomme

2023

[32] [32]

Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Hujin Peng, Xiang Chen, Zeyang Sha, Yuyuan Li, Changting Lin, Xun Wang, Xuan Liu, Ningyu Zhang, Chaochao Chen, Chunming Wu, Muham- mad Khurram Khan, and Meng Han. 2025. A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures.arX...

work page arXiv 2025

[33] [33]

Tomas Kulik, Brijesh Dongol, Peter Gorm Larsen, Hugo Daniel Macedo, Steve Schneider, Peter W. V. Tran-Jørgensen, and James Woodcock. 2022. A Survey of Practical Formal Methods for Security.Form. Asp. Comput.34, 1, Article 5 (July 2022), 39 pages. doi:10.1145/3522582

work page doi:10.1145/3522582 2022

[34] [34]

2002.Specifying systems

Leslie Lamport. 2002.Specifying systems. Vol. 388. Addison-Wesley Boston

2002

[35] [35]

Christine P Lee, David Porfirio, Xinyu Jessica Wang, Kevin Chenkai Zhao, and Bilge Mutlu. 2025. Veriplan: Integrating formal verification and llms into end- user planning. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–19

2025

[36] [36]

Hui Li, Zhen Dong, Siao Wang, Hui Zhang, Liwei Shen, Xin Peng, and Dongdong She. 2025. Extracting Formal Specifications From Documents Using LLMS for Test Automation. In2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC). IEEE Computer Society, 1–12

2025

[37] [37]

Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, and Qianxiang Wang. 2025. Swe-debate: Competitive multi-agent debate for software issue resolution.arXiv preprint arXiv:2507.23348(2025)

work page arXiv 2025

[38] [38]

Qiaomu Li and Ying Xie. 2025. From glue-code to protocols: A critical anal- ysis of a2a and mcp integration for scalable agent systems.arXiv preprint arXiv:2505.03864(2025)

work page arXiv 2025

[39] [39]

Felix Linker, Ralf Sasse, and David Basin. 2025. A formal analysis of apple’s iMessage PQ3 protocol. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 258, 20 pages

2025

[40] [40]

R. J. Linn. 2006. Conformance evaluation methodology and protocol testing. IEEE J.Sel. A. Commun.7, 7 (Sept. 2006), 1143–1158. doi:10.1109/49.44561

work page doi:10.1109/49.44561 2006

[41] [41]

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and Benchmarking Prompt Injection Attacks and Defenses. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadel- phia, PA, 1831–1847. https://www.usenix.org/conference/usenixsecurity24/pre sentation/liu-yupei

2024

[42] [42]

Yinuo Liu, Ruohan Xu, Xilong Wang, Yuqi Jia, and Neil Zhenqiang Gong. 2025. WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents. arXiv preprint arXiv: 2510.01354(2025)

work page arXiv 2025

[43] [43]

Gavin Lowe. 1996. Breaking and fixing the Needham-Schroeder public-key protocol using FDR. InInternational Workshop on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 147–166

1996

[44] [44]

Simon Meier, Benedikt Schmidt, Cas Cremers, and David Basin. 2013. The TAMARIN prover for the symbolic analysis of security protocols. InInternational conference on computer aided verification. Springer, 696–701

2013

[45] [45]

National Institute of Standards and Technology. 2024. Open Security Controls Assessment Language (OSCAL). https://pages.nist.gov/OSCAL/. Accessed April 2026

2024

[46] [46]

National Vulnerability Database. 2025. CVE-2025-49596: MCP Inspector remote code execution vulnerability. https://nvd.nist.gov/vuln/detail/CVE-2025-49596

2025

[47] [47]

National Vulnerability Database. 2025. CVE-2025-68143: Anthropic MCP Git server path handling vulnerability. https://nvd.nist.gov/vuln/detail/CVE-2025- 68143

2025

[48] [48]

National Vulnerability Database. 2025. CVE-2025-68144: Anthropic MCP Git server argument injection vulnerability. https://nvd.nist.gov/vuln/detail/CVE- 2025-68144

2025

[49] [49]

National Vulnerability Database. 2025. CVE-2025-68145: Anthropic MCP Git server repository boundary vulnerability. https://nvd.nist.gov/vuln/detail/CVE- 2025-68145

2025

[50] [50]

Raul Onitza-Klugman. 2025. Prompt Injection Meets MCP: A New Exploitation Vector Emerging? Snyk Labs, https://labs.snyk.io/resources/prompt-injection- mcp/. Accessed June 2026

2025

[51] [51]

OpenAI. 2026. Introducing the OpenAI Safety Bug Bounty Program. https: //openai.com/index/safety-bug-bounty/. Accessed June 2026

2026

[52] [52]

Maria Leonor Pacheco, Max von Hippel, Ben Weintraub, Dan Goldwasser, and Cristina Nita-Rotaru. 2022. Automated attack synthesis by extracting finite state machines from protocol specification documents. In2022 IEEE Symposium on Security and Privacy (SP). IEEE, 51–68

2022

[53] [53]

Eric Rescorla and Brian Korver. 2003. Guidelines for Writing RFC Text on Security Considerations. RFC 3552

2003

[54] [54]

Shubham Saboo and Kristopher Overholt. 2026. A Developers’ Guide to AI Agent Protocols. https://developers.googleblog.com/developers-guide-to-ai-agent- protocols/. Accessed April 2026

2026

[55] [55]

B Sarikaya. 1986. Formal specification-based conformance testing. InProceedings of the ACM SIGCOMM Conference on Communications Architectures & Protocols (Stowe, Vermont, USA)(SIGCOMM ’86). Association for Computing Machinery, New York, NY, USA, 236–240. doi:10.1145/18172.18199

work page doi:10.1145/18172.18199 1986

[56] [56]

Prakhar Sharma and Vinod Yegneswaran. 2023. Prosper: Extracting protocol specifications using large language models. InProceedings of the 22nd ACM workshop on hot topics in networks. 41–47

2023

[57] [57]

Min Shi, Jing Chen, Kun He, Haoran Zhao, Meng Jia, and Ruiying Du. 2023. Formal Analysis and Patching of BLE-SC Pairing. In32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 37–52. https://www.usenix.org/conference/usenixsecurity23/presentation/shi-min

2023

[58] [58]

Robert Thurlow. 2009. RPC: Remote Procedure Call Protocol Specification Version

2009

[59] [59]

Shuren Xia, Qiwei Li, Taqiya Ehsan, and Jorge Ortiz. 2026. TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples. InProceedings of the ACM Conference on AI and Agentic Systems. 181–196

2026

[60] [60]

Jane Yen, Tamás Lévai, Qinyuan Ye, Xiang Ren, Ramesh Govindan, and Barath Raghavan. 2021. Semi-automated protocol disambiguation and code generation (SIGCOMM ’21). Association for Computing Machinery, New York, NY, USA, 272–286. doi:10.1145/3452296.3472910

work page doi:10.1145/3452296.3472910 2021

[61] [61]

Yuan Yu, Panagiotis Manolios, and Leslie Lamport. 1999. Model checking TLA+ specifications. InAdvanced research working conference on correct hardware design and verification methods. Springer, 54–66

1999

[62] [62]

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. InFindings of the Association for Computational Linguistics: ACL 2024. 10471–10506

2024

[63] [63]

Mingwei Zheng, Danning Xie, Qingkai Shi, Chengpeng Wang, and Xiangyu Zhang. 2025. Validating Network Protocol Parsers with Traceable RFC Document Interpretation.Proc. ACM Softw. Eng.2, ISSTA, Article ISSTA078, 23 pages. doi:10.1145/3728955 11

work page doi:10.1145/3728955 2025