Investigating Detection and Obfuscation of Prompt Injection Attacks Against Software Reverse Engineering AI Agents

Brian Crawford; Patrick McClure

arxiv: 2605.30677 · v1 · pith:4EZAM7R7new · submitted 2026-05-29 · 💻 cs.CR · cs.AI· cs.SE

Investigating Detection and Obfuscation of Prompt Injection Attacks Against Software Reverse Engineering AI Agents

Brian Crawford , Patrick McClure This is my paper

Pith reviewed 2026-06-28 22:30 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.SE

keywords prompt injectionAI agentssoftware reverse engineeringdecompiler outputobfuscationadversarial examplesbinary analysiscyber security

0 comments

The pith

Agentic software reverse engineering AI systems are vulnerable to prompt injection attacks embedded in binary source code, with detection possible via decompiler output.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that prompt injection attacks can be placed directly into the source code of executable binary files to target agentic software reverse engineering systems. It demonstrates that these attacks can be detected by searching for injection strings in the decompiler output produced from adversarial binaries. The work also explores techniques for obfuscating the injections and corresponding methods to defend against the obfuscated versions. This matters for production cyber workflows because agentic AI analysis tools may process untrusted binaries. The findings advance understanding of risks that must be addressed before such systems can be deployed securely.

Core claim

Agentic software reverse engineering systems are vulnerable to prompt injection attacks placed into the source code of executable binary files. This research demonstrates defensive tactics for detecting the presence of prompt injection strings in the decompiler output of adversarial example programs. Methods for obfuscating these attacks and subsequent methods for defending against these obfuscations are also explored.

What carries the argument

Prompt injection strings embedded in binary source code that survive compilation and appear in decompiler output to influence the AI agent's analysis.

If this is right

Detection of prompt injection strings in decompiler output provides a practical defense for AI reverse engineering agents.
Obfuscation methods can hide prompt injections, requiring updated detection approaches.
Defenses against obfuscated attacks can be developed once obfuscation techniques are understood.
Addressing these vulnerabilities is required before agentic systems can be used in production cyber workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Detection logic could be built into standard decompilers to automatically flag suspicious inputs before they reach the AI agent.
The same embedding approach might affect other AI tools that ingest decompiled or disassembled code from untrusted sources.
Attack surfaces may extend beyond decompiler output to other stages of the agent's input pipeline.

Load-bearing premise

Prompt injection strings embedded in binaries will reliably appear in decompiler output in a form that allows reliable detection without excessive false positives.

What would settle it

Running the proposed detection on a set of binaries with embedded prompt injections where the strings either fail to appear in decompiler output or the detector produces high false positives on clean binaries.

Figures

Figures reproduced from arXiv: 2605.30677 by Brian Crawford, Patrick McClure.

**Figure 2.** Figure 2: An example indirect prompt injection string for attacking an agentic [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Depiction of a NN classification head on top of an LLM. The LLM [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

read the original abstract

Agentic software reverse engineering systems are vulnerable to prompt injection attacks placed into the source code of executable binary files. This research demonstrates defensive tactics for detecting the presences of prompt injection strings in the decompiler output of adversarial example programs. Methods for obfuscating these attacks and subsequent methods for defending against these obfuscations are also explored. This research advances the understanding of risk and security of agentic software analysis systems necessary for their deployment into production-level cyber workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper flags prompt injection risks in AI reverse engineering agents via binaries but shows no experiments, data, or results to support any of the claims.

read the letter

The main takeaway is that this work points to a possible attack on agentic reverse engineering systems—embedding prompt injections in binary source so they survive into decompiler output—but the abstract makes those claims without any evidence, methods, or metrics.

What is new is the narrow application: taking established prompt injection ideas and testing them specifically against decompiler output fed to AI agents for binary analysis, plus some exploration of obfuscating the injections and then defending against the obfuscated versions. That pairing for this exact use case is a small step beyond general LLM security work.

It does identify why the issue could matter for production cyber tools, where these agents might handle real binaries.

The central problem is the complete lack of substance on the empirical side. The abstract says defensive tactics were demonstrated and obfuscation methods explored, yet nothing is shown—no experimental setup, no attack success rates, no detection accuracy numbers, no false positive data. You cannot tell whether the injections actually reach the agent in usable form or whether any defenses hold up. The assumption that strings will survive decompilation reliably enough for detection is left untested.

This is aimed at researchers working on AI for malware analysis or secure agent design. A reader wanting concrete techniques or validated risks will find little here. It is too preliminary to send for serious peer review.

Referee Report

2 major / 1 minor

Summary. The paper claims that agentic software reverse engineering systems are vulnerable to prompt injection attacks embedded in the source code of executable binaries. It asserts that defensive tactics for detecting prompt injection strings in decompiler output have been demonstrated on adversarial example programs, that methods for obfuscating such attacks have been explored, and that subsequent defenses against those obfuscations have also been developed. The work positions these findings as advancing risk and security understanding for deploying such AI agents in production cyber workflows.

Significance. If the claimed empirical demonstrations of detection, obfuscation, and counter-obfuscation techniques were supported by rigorous experimental validation, the results would be significant for the security of AI-driven reverse engineering tools, highlighting a novel attack surface in decompiler outputs and providing practical mitigations relevant to cyber defense applications.

major comments (2)

The manuscript asserts in the abstract that defensive tactics were demonstrated and obfuscation methods explored, yet provides no experimental setup, datasets, metrics, results, or evaluation sections to support any of these claims, making the central contributions impossible to assess or reproduce.
No methods, results, or discussion sections are present to substantiate the core claim that prompt injection strings embedded in binaries reliably survive into decompiler output in a detectable form, leaving the vulnerability assertion and all defensive tactics without evidence.

minor comments (1)

Abstract contains a typographical error: 'presences' should be 'presence'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive criticism. We acknowledge that the current manuscript version is incomplete in its presentation of experimental validation, and we agree that this prevents proper assessment of the claims. We will revise the paper substantially to include the missing sections.

read point-by-point responses

Referee: The manuscript asserts in the abstract that defensive tactics were demonstrated and obfuscation methods explored, yet provides no experimental setup, datasets, metrics, results, or evaluation sections to support any of these claims, making the central contributions impossible to assess or reproduce.

Authors: We agree with this assessment. The current draft does not contain the required experimental details. In the revised manuscript we will add a complete Methods section describing the experimental setup, the datasets of adversarial binaries, the detection metrics (e.g., precision, recall, F1), and an Evaluation section with quantitative results, tables, and figures that substantiate the detection, obfuscation, and counter-obfuscation claims. revision: yes
Referee: No methods, results, or discussion sections are present to substantiate the core claim that prompt injection strings embedded in binaries reliably survive into decompiler output in a detectable form, leaving the vulnerability assertion and all defensive tactics without evidence.

Authors: We accept this observation. The manuscript as submitted lacks these sections. The revision will introduce a Methods section that details how prompt injection strings were embedded and decompiled, a Results section reporting survival rates and detection performance across multiple decompilers, and a Discussion section addressing limitations and implications. This will provide the necessary evidence for the vulnerability and defense claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is an empirical exploration of prompt injection attacks and defenses in agentic reverse engineering systems, with no equations, derivations, fitted parameters, or self-referential definitions. Claims rest on experimental demonstration of attack presence in decompiler output and detection tactics rather than any chain that reduces to its own inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked. This matches the reader's assessment of score 0.0 and qualifies as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit parameters, axioms, or invented entities; all elements of the ledger are empty as no technical details are available.

pith-pipeline@v0.9.1-grok · 5595 in / 1061 out tokens · 22845 ms · 2026-06-28T22:30:59.775293+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 7 canonical work pages · 5 internal anchors

[1]

Prompt Injection attack against LLM-integrated Applications

Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zhenget al., “Prompt injection attack against LLM- integrated applications,”arXiv preprint arXiv:2306.05499, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

IDEsaster: A novel vulnerability class in AI IDEs,

A. Marzouk, “IDEsaster: A novel vulnerability class in AI IDEs,” MaccariTA, Dec. 6, 2025. [Online]. Available: https://maccarita.com/posts/idesaster

2025
[3]

Automatically attacking reverse engineering ai agents,

B. Crawford, J. Phillips, and P. McClure, “Automatically attacking reverse engineering ai agents,” inEuropean Conference on Cyber Warfare and Security. Academic Conferences International Limited, 2026, in press

2026
[4]

A critical evaluation of defenses against prompt injection attacks.arXiv preprint arXiv:2505.18333, 2025

Y . Jia, Z. Shao, Y . Liu, J. Jia, D. Song, and N. Z. Gong, “A critical evaluation of defenses against prompt injection attacks,”arXiv preprint arXiv:2505.18333, 2025

work page arXiv 2025
[5]

Promptarmor: Simple yet effective prompt injection defenses, 2025

T. Shi, K. Zhu, Z. Wang, Y . Jia, W. Cai, W. Liang, H. Wang, H. Alzahrani, J. Lu, K. Kawaguchiet al., “Promptarmor: Simple yet effective prompt injection defenses,”arXiv preprint arXiv:2507.15219, 2025

work page arXiv 2025
[6]

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

X. Liu, N. Xu, M. Chen, and C. Xiao, “AutoDAN: Generating stealthy jailbreak prompts on aligned large language models,”arXiv preprint arXiv:2310.04451, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[7]

Large language models: an overview of foundational architectures, recent trends, and a new taxonomy,

I. D. Mienye, N. Jere, G. Obaido, O. O. Ogunruku, E. Esenogho, and C. Modisane, “Large language models: an overview of foundational architectures, recent trends, and a new taxonomy,”Discover Applied Sciences, vol. 7, no. 9, p. 1027, 2025

2025
[8]

[Online]

National Security Agency, “Ghidra,” github, ac- cessed December 2, 2025. [Online]. Available: https://github.com/nationalsecurityagency/ghidra

2025
[9]

GhidraMCP,

L. Kirk, “GhidraMCP,” github, Jun. 22, 2025. [Online]. Available: https://github.com/lauriewired/ghidramcp

2025
[10]

Qwen3 Technical Report

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lvet al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[11]

Overview of SQL injection defense mechanisms,

I. Tasevski and K. Jakimoski, “Overview of SQL injection defense mechanisms,” in2020 28th Telecommunications Forum (TELFOR). IEEE, 2020, pp. 1–4

2020
[12]

Kilpatrick, “ChatML,” github, accessed April 12,

L. Kilpatrick, “ChatML,” github, accessed April 12,
[13]

Available: https://github.com/openai/openai- python/blob/120d225b91a8453e15240a49fb1c6794d8119326/chatml.md

[Online]. Available: https://github.com/openai/openai- python/blob/120d225b91a8453e15240a49fb1c6794d8119326/chatml.md
[14]

OpenAI harmony response format,

D. Kundel, “OpenAI harmony response format,” Ope- nAI Developers, Aug. 5, 2025. [Online]. Available: https://developers.openai.com/cookbook/articles/openai-harmony

2025
[15]

Beginners C program examples,

G. Thakur, “Beginners C program examples,” github, accessed March 24, 2026. [Online]. Available: https://github.com/gouravthakur39/beginners-C-program- examples/tree/373d27c131e35bacdbf5c500f79fd234c6d4ec9b

2026
[16]

C all-in-one source code files,

D. Gookin, “C all-in-one source code files,” The Unofficial C For Dummies Website, accessed March 23, 2026. [Online]. Available: https://c-for-dummies.com/caio/sourcecode

2026
[17]

Ultimate collection of C programs: Source code with outputs,

C. Singh, “Ultimate collection of C programs: Source code with outputs,” Beginner’s Book, Dec. 1, 2024. [Online]. Available: https://beginnersbook.com/2015/02/simple-c-programs

2024
[18]

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

A. Yousefiramandi and C. Cooney, “Fine-tuning causal llms for text clas- sification: Embedding-based vs. instruction-based approaches,”arXiv preprint arXiv:2512.12677, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[19]

To tune or not to tune? adapting pretrained representations to diverse tasks,

M. E. Peters, S. Ruder, and N. A. Smith, “To tune or not to tune? adapting pretrained representations to diverse tasks,” inProceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP- 2019), 2019, pp. 7–14

2019
[20]

Text classification with transformers,

A. Naidu, “Text classification with transformers,” Medium, May 23,
[21]

Available: https://medium.com/@ashwinnaidu1991/text- classification-with-transformers-70acaf65c4a4

[Online]. Available: https://medium.com/@ashwinnaidu1991/text- classification-with-transformers-70acaf65c4a4
[22]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[1] [1]

Prompt Injection attack against LLM-integrated Applications

Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zhenget al., “Prompt injection attack against LLM- integrated applications,”arXiv preprint arXiv:2306.05499, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[2] [2]

IDEsaster: A novel vulnerability class in AI IDEs,

A. Marzouk, “IDEsaster: A novel vulnerability class in AI IDEs,” MaccariTA, Dec. 6, 2025. [Online]. Available: https://maccarita.com/posts/idesaster

2025

[3] [3]

Automatically attacking reverse engineering ai agents,

B. Crawford, J. Phillips, and P. McClure, “Automatically attacking reverse engineering ai agents,” inEuropean Conference on Cyber Warfare and Security. Academic Conferences International Limited, 2026, in press

2026

[4] [4]

A critical evaluation of defenses against prompt injection attacks.arXiv preprint arXiv:2505.18333, 2025

Y . Jia, Z. Shao, Y . Liu, J. Jia, D. Song, and N. Z. Gong, “A critical evaluation of defenses against prompt injection attacks,”arXiv preprint arXiv:2505.18333, 2025

work page arXiv 2025

[5] [5]

Promptarmor: Simple yet effective prompt injection defenses, 2025

T. Shi, K. Zhu, Z. Wang, Y . Jia, W. Cai, W. Liang, H. Wang, H. Alzahrani, J. Lu, K. Kawaguchiet al., “Promptarmor: Simple yet effective prompt injection defenses,”arXiv preprint arXiv:2507.15219, 2025

work page arXiv 2025

[6] [6]

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

X. Liu, N. Xu, M. Chen, and C. Xiao, “AutoDAN: Generating stealthy jailbreak prompts on aligned large language models,”arXiv preprint arXiv:2310.04451, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[7] [7]

Large language models: an overview of foundational architectures, recent trends, and a new taxonomy,

I. D. Mienye, N. Jere, G. Obaido, O. O. Ogunruku, E. Esenogho, and C. Modisane, “Large language models: an overview of foundational architectures, recent trends, and a new taxonomy,”Discover Applied Sciences, vol. 7, no. 9, p. 1027, 2025

2025

[8] [8]

[Online]

National Security Agency, “Ghidra,” github, ac- cessed December 2, 2025. [Online]. Available: https://github.com/nationalsecurityagency/ghidra

2025

[9] [9]

GhidraMCP,

L. Kirk, “GhidraMCP,” github, Jun. 22, 2025. [Online]. Available: https://github.com/lauriewired/ghidramcp

2025

[10] [10]

Qwen3 Technical Report

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lvet al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[11] [11]

Overview of SQL injection defense mechanisms,

I. Tasevski and K. Jakimoski, “Overview of SQL injection defense mechanisms,” in2020 28th Telecommunications Forum (TELFOR). IEEE, 2020, pp. 1–4

2020

[12] [12]

Kilpatrick, “ChatML,” github, accessed April 12,

L. Kilpatrick, “ChatML,” github, accessed April 12,

[13] [13]

Available: https://github.com/openai/openai- python/blob/120d225b91a8453e15240a49fb1c6794d8119326/chatml.md

[Online]. Available: https://github.com/openai/openai- python/blob/120d225b91a8453e15240a49fb1c6794d8119326/chatml.md

[14] [14]

OpenAI harmony response format,

D. Kundel, “OpenAI harmony response format,” Ope- nAI Developers, Aug. 5, 2025. [Online]. Available: https://developers.openai.com/cookbook/articles/openai-harmony

2025

[15] [15]

Beginners C program examples,

G. Thakur, “Beginners C program examples,” github, accessed March 24, 2026. [Online]. Available: https://github.com/gouravthakur39/beginners-C-program- examples/tree/373d27c131e35bacdbf5c500f79fd234c6d4ec9b

2026

[16] [16]

C all-in-one source code files,

D. Gookin, “C all-in-one source code files,” The Unofficial C For Dummies Website, accessed March 23, 2026. [Online]. Available: https://c-for-dummies.com/caio/sourcecode

2026

[17] [17]

Ultimate collection of C programs: Source code with outputs,

C. Singh, “Ultimate collection of C programs: Source code with outputs,” Beginner’s Book, Dec. 1, 2024. [Online]. Available: https://beginnersbook.com/2015/02/simple-c-programs

2024

[18] [18]

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

A. Yousefiramandi and C. Cooney, “Fine-tuning causal llms for text clas- sification: Embedding-based vs. instruction-based approaches,”arXiv preprint arXiv:2512.12677, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[19] [19]

To tune or not to tune? adapting pretrained representations to diverse tasks,

M. E. Peters, S. Ruder, and N. A. Smith, “To tune or not to tune? adapting pretrained representations to diverse tasks,” inProceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP- 2019), 2019, pp. 7–14

2019

[20] [20]

Text classification with transformers,

A. Naidu, “Text classification with transformers,” Medium, May 23,

[21] [21]

Available: https://medium.com/@ashwinnaidu1991/text- classification-with-transformers-70acaf65c4a4

[Online]. Available: https://medium.com/@ashwinnaidu1991/text- classification-with-transformers-70acaf65c4a4

[22] [22]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014