pith. sign in

arxiv: 2605.30677 · v1 · pith:4EZAM7R7new · submitted 2026-05-29 · 💻 cs.CR · cs.AI· cs.SE

Investigating Detection and Obfuscation of Prompt Injection Attacks Against Software Reverse Engineering AI Agents

Pith reviewed 2026-06-28 22:30 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.SE
keywords prompt injectionAI agentssoftware reverse engineeringdecompiler outputobfuscationadversarial examplesbinary analysiscyber security
0
0 comments X

The pith

Agentic software reverse engineering AI systems are vulnerable to prompt injection attacks embedded in binary source code, with detection possible via decompiler output.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that prompt injection attacks can be placed directly into the source code of executable binary files to target agentic software reverse engineering systems. It demonstrates that these attacks can be detected by searching for injection strings in the decompiler output produced from adversarial binaries. The work also explores techniques for obfuscating the injections and corresponding methods to defend against the obfuscated versions. This matters for production cyber workflows because agentic AI analysis tools may process untrusted binaries. The findings advance understanding of risks that must be addressed before such systems can be deployed securely.

Core claim

Agentic software reverse engineering systems are vulnerable to prompt injection attacks placed into the source code of executable binary files. This research demonstrates defensive tactics for detecting the presence of prompt injection strings in the decompiler output of adversarial example programs. Methods for obfuscating these attacks and subsequent methods for defending against these obfuscations are also explored.

What carries the argument

Prompt injection strings embedded in binary source code that survive compilation and appear in decompiler output to influence the AI agent's analysis.

If this is right

  • Detection of prompt injection strings in decompiler output provides a practical defense for AI reverse engineering agents.
  • Obfuscation methods can hide prompt injections, requiring updated detection approaches.
  • Defenses against obfuscated attacks can be developed once obfuscation techniques are understood.
  • Addressing these vulnerabilities is required before agentic systems can be used in production cyber workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Detection logic could be built into standard decompilers to automatically flag suspicious inputs before they reach the AI agent.
  • The same embedding approach might affect other AI tools that ingest decompiled or disassembled code from untrusted sources.
  • Attack surfaces may extend beyond decompiler output to other stages of the agent's input pipeline.

Load-bearing premise

Prompt injection strings embedded in binaries will reliably appear in decompiler output in a form that allows reliable detection without excessive false positives.

What would settle it

Running the proposed detection on a set of binaries with embedded prompt injections where the strings either fail to appear in decompiler output or the detector produces high false positives on clean binaries.

Figures

Figures reproduced from arXiv: 2605.30677 by Brian Crawford, Patrick McClure.

Figure 1
Figure 1. Figure 1: Illustration of an indirect prompt injection attack against a software [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: An example indirect prompt injection string for attacking an agentic [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Depiction of a NN classification head on top of an LLM. The LLM [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
read the original abstract

Agentic software reverse engineering systems are vulnerable to prompt injection attacks placed into the source code of executable binary files. This research demonstrates defensive tactics for detecting the presences of prompt injection strings in the decompiler output of adversarial example programs. Methods for obfuscating these attacks and subsequent methods for defending against these obfuscations are also explored. This research advances the understanding of risk and security of agentic software analysis systems necessary for their deployment into production-level cyber workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that agentic software reverse engineering systems are vulnerable to prompt injection attacks embedded in the source code of executable binaries. It asserts that defensive tactics for detecting prompt injection strings in decompiler output have been demonstrated on adversarial example programs, that methods for obfuscating such attacks have been explored, and that subsequent defenses against those obfuscations have also been developed. The work positions these findings as advancing risk and security understanding for deploying such AI agents in production cyber workflows.

Significance. If the claimed empirical demonstrations of detection, obfuscation, and counter-obfuscation techniques were supported by rigorous experimental validation, the results would be significant for the security of AI-driven reverse engineering tools, highlighting a novel attack surface in decompiler outputs and providing practical mitigations relevant to cyber defense applications.

major comments (2)
  1. The manuscript asserts in the abstract that defensive tactics were demonstrated and obfuscation methods explored, yet provides no experimental setup, datasets, metrics, results, or evaluation sections to support any of these claims, making the central contributions impossible to assess or reproduce.
  2. No methods, results, or discussion sections are present to substantiate the core claim that prompt injection strings embedded in binaries reliably survive into decompiler output in a detectable form, leaving the vulnerability assertion and all defensive tactics without evidence.
minor comments (1)
  1. Abstract contains a typographical error: 'presences' should be 'presence'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive criticism. We acknowledge that the current manuscript version is incomplete in its presentation of experimental validation, and we agree that this prevents proper assessment of the claims. We will revise the paper substantially to include the missing sections.

read point-by-point responses
  1. Referee: The manuscript asserts in the abstract that defensive tactics were demonstrated and obfuscation methods explored, yet provides no experimental setup, datasets, metrics, results, or evaluation sections to support any of these claims, making the central contributions impossible to assess or reproduce.

    Authors: We agree with this assessment. The current draft does not contain the required experimental details. In the revised manuscript we will add a complete Methods section describing the experimental setup, the datasets of adversarial binaries, the detection metrics (e.g., precision, recall, F1), and an Evaluation section with quantitative results, tables, and figures that substantiate the detection, obfuscation, and counter-obfuscation claims. revision: yes

  2. Referee: No methods, results, or discussion sections are present to substantiate the core claim that prompt injection strings embedded in binaries reliably survive into decompiler output in a detectable form, leaving the vulnerability assertion and all defensive tactics without evidence.

    Authors: We accept this observation. The manuscript as submitted lacks these sections. The revision will introduce a Methods section that details how prompt injection strings were embedded and decompiled, a Results section reporting survival rates and detection performance across multiple decompilers, and a Discussion section addressing limitations and implications. This will provide the necessary evidence for the vulnerability and defense claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is an empirical exploration of prompt injection attacks and defenses in agentic reverse engineering systems, with no equations, derivations, fitted parameters, or self-referential definitions. Claims rest on experimental demonstration of attack presence in decompiler output and detection tactics rather than any chain that reduces to its own inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked. This matches the reader's assessment of score 0.0 and qualifies as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit parameters, axioms, or invented entities; all elements of the ledger are empty as no technical details are available.

pith-pipeline@v0.9.1-grok · 5595 in / 1061 out tokens · 22845 ms · 2026-06-28T22:30:59.775293+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 7 canonical work pages · 5 internal anchors

  1. [1]

    Prompt Injection attack against LLM-integrated Applications

    Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zhenget al., “Prompt injection attack against LLM- integrated applications,”arXiv preprint arXiv:2306.05499, 2023

  2. [2]

    IDEsaster: A novel vulnerability class in AI IDEs,

    A. Marzouk, “IDEsaster: A novel vulnerability class in AI IDEs,” MaccariTA, Dec. 6, 2025. [Online]. Available: https://maccarita.com/posts/idesaster

  3. [3]

    Automatically attacking reverse engineering ai agents,

    B. Crawford, J. Phillips, and P. McClure, “Automatically attacking reverse engineering ai agents,” inEuropean Conference on Cyber Warfare and Security. Academic Conferences International Limited, 2026, in press

  4. [4]

    A critical evaluation of defenses against prompt injection attacks.arXiv preprint arXiv:2505.18333, 2025

    Y . Jia, Z. Shao, Y . Liu, J. Jia, D. Song, and N. Z. Gong, “A critical evaluation of defenses against prompt injection attacks,”arXiv preprint arXiv:2505.18333, 2025

  5. [5]

    Promptarmor: Simple yet effective prompt injection defenses, 2025

    T. Shi, K. Zhu, Z. Wang, Y . Jia, W. Cai, W. Liang, H. Wang, H. Alzahrani, J. Lu, K. Kawaguchiet al., “Promptarmor: Simple yet effective prompt injection defenses,”arXiv preprint arXiv:2507.15219, 2025

  6. [6]

    AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

    X. Liu, N. Xu, M. Chen, and C. Xiao, “AutoDAN: Generating stealthy jailbreak prompts on aligned large language models,”arXiv preprint arXiv:2310.04451, 2023

  7. [7]

    Large language models: an overview of foundational architectures, recent trends, and a new taxonomy,

    I. D. Mienye, N. Jere, G. Obaido, O. O. Ogunruku, E. Esenogho, and C. Modisane, “Large language models: an overview of foundational architectures, recent trends, and a new taxonomy,”Discover Applied Sciences, vol. 7, no. 9, p. 1027, 2025

  8. [8]

    [Online]

    National Security Agency, “Ghidra,” github, ac- cessed December 2, 2025. [Online]. Available: https://github.com/nationalsecurityagency/ghidra

  9. [9]

    GhidraMCP,

    L. Kirk, “GhidraMCP,” github, Jun. 22, 2025. [Online]. Available: https://github.com/lauriewired/ghidramcp

  10. [10]

    Qwen3 Technical Report

    A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lvet al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

  11. [11]

    Overview of SQL injection defense mechanisms,

    I. Tasevski and K. Jakimoski, “Overview of SQL injection defense mechanisms,” in2020 28th Telecommunications Forum (TELFOR). IEEE, 2020, pp. 1–4

  12. [12]

    Kilpatrick, “ChatML,” github, accessed April 12,

    L. Kilpatrick, “ChatML,” github, accessed April 12,

  13. [13]

    Available: https://github.com/openai/openai- python/blob/120d225b91a8453e15240a49fb1c6794d8119326/chatml.md

    [Online]. Available: https://github.com/openai/openai- python/blob/120d225b91a8453e15240a49fb1c6794d8119326/chatml.md

  14. [14]

    OpenAI harmony response format,

    D. Kundel, “OpenAI harmony response format,” Ope- nAI Developers, Aug. 5, 2025. [Online]. Available: https://developers.openai.com/cookbook/articles/openai-harmony

  15. [15]

    Beginners C program examples,

    G. Thakur, “Beginners C program examples,” github, accessed March 24, 2026. [Online]. Available: https://github.com/gouravthakur39/beginners-C-program- examples/tree/373d27c131e35bacdbf5c500f79fd234c6d4ec9b

  16. [16]

    C all-in-one source code files,

    D. Gookin, “C all-in-one source code files,” The Unofficial C For Dummies Website, accessed March 23, 2026. [Online]. Available: https://c-for-dummies.com/caio/sourcecode

  17. [17]

    Ultimate collection of C programs: Source code with outputs,

    C. Singh, “Ultimate collection of C programs: Source code with outputs,” Beginner’s Book, Dec. 1, 2024. [Online]. Available: https://beginnersbook.com/2015/02/simple-c-programs

  18. [18]

    Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

    A. Yousefiramandi and C. Cooney, “Fine-tuning causal llms for text clas- sification: Embedding-based vs. instruction-based approaches,”arXiv preprint arXiv:2512.12677, 2025

  19. [19]

    To tune or not to tune? adapting pretrained representations to diverse tasks,

    M. E. Peters, S. Ruder, and N. A. Smith, “To tune or not to tune? adapting pretrained representations to diverse tasks,” inProceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP- 2019), 2019, pp. 7–14

  20. [20]

    Text classification with transformers,

    A. Naidu, “Text classification with transformers,” Medium, May 23,

  21. [21]

    Available: https://medium.com/@ashwinnaidu1991/text- classification-with-transformers-70acaf65c4a4

    [Online]. Available: https://medium.com/@ashwinnaidu1991/text- classification-with-transformers-70acaf65c4a4

  22. [22]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014