pith. sign in

arxiv: 2605.30667 · v1 · pith:6MX6IIXLnew · submitted 2026-05-28 · 💻 cs.CR · cs.AI

Automatically Attacking Software Reverse Engineering AI Agents

Pith reviewed 2026-06-29 06:12 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords prompt injectionLLM agentsreverse engineeringadversarial attackdecompilationmalware analysisgenetic algorithmbinary obfuscation
0
0 comments X

The pith

Prompt injections can be hidden in executable binaries to mislead LLM reverse engineering agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that attackers can use a genetic algorithm to insert prompt injections into source code as string variable assignments. These assignments do not change the program's behavior but survive compilation and decompilation to deliver instructions to LLM agents analyzing the binary. This reveals a vulnerability in automated tools that combine decompilers with LLMs for malware analysis. If the method works, it allows bypassing of LLM-based detection systems by corrupting their output. The approach adapts an existing adversarial technique to the domain of binary reverse engineering.

Core claim

By modifying the AutoDAN adversarial attack with a genetic algorithm, the authors generate string assignments that embed surreptitious instructions. When the binary is decompiled, the LLM receives these strings as part of the code and follows the hidden prompts, leading to misinterpretation of the executable's functionality without altering its actual behavior.

What carries the argument

Genetic algorithm search for prompt injections inserted as extraneous string variable assignments that carry instructions to the LLM without affecting executable functionality.

If this is right

  • LLM-powered disassembly and decompilation systems can be deceived into producing incorrect analytical output.
  • Automated detection systems relying on LLM analysis pipelines can be bypassed by attackers.
  • Insights can be gained on the security implications of integrating LLMs into cybersecurity toolchains.
  • More robust agentic code analysis systems are needed to resist such injections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Sanitizing or ignoring string literals in decompiled output could mitigate the attack.
  • The technique might extend to other LLM agents that process code or text from untrusted sources.
  • Empirical testing on various LLMs and decompilers would determine the attack's success rate across different models.

Load-bearing premise

LLM agents will treat the content of string variable assignments in decompiled code as actionable instructions instead of filtering or disregarding them.

What would settle it

Compile a program with the generated string assignments, decompile it, feed the output to an LLM agent, and observe if the agent follows the injected instructions or analyzes the code correctly.

read the original abstract

Software tools for reverse engineering executable binary files, such as Ghidra, enable malware analysts to safely conduct robust static analysis without having access to original source code. Coupled with the analytic power of large language models (LLM), agentic systems enabled with tools, such as GhidraMCP, can allow analysts to automate a previously human driven process. Although this automation can increase the productivity of a single malware analyst, it also introduces a new area of vulnerability for malware obfuscation. This paper presents an adversarial technique using genetic algorithm-based prompt generation, a modification of an adversarial attack known as AutoDAN, to demonstrate the ability to deceive LLM-powered disassembly and decompilation systems into misinterpreting binary executables, effectively corrupting their analytical output. This proof-of-concept methodology exploits inherent vulnerabilities in how LLMs process and interpret decompiled machine code via prompt injection by using extraneous string variable assignments to pass surreptitious instructions to the LLM while not impacting the functionality of the executable file. We demonstrate this capability through several concise examples. This approach could enable attackers to bypass automated detection systems that rely on LLM-driven analysis pipelines. By studying and understanding this attack, insights can be gained regarding the security implication of integrating LLMs into cybersecurity toolchains and building more robust agentic code analysis systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims to demonstrate a proof-of-concept adversarial attack on LLM-powered reverse engineering agents (e.g., GhidraMCP) by modifying the AutoDAN attack with a genetic algorithm. The attack inserts extraneous string variable assignments into source code; these survive compilation and decompilation as prompt injections that cause the LLM to misinterpret binary functionality without changing runtime behavior. The approach is presented as exploiting inherent LLM vulnerabilities in processing decompiled code and is illustrated via several concise examples, with discussion of implications for securing LLM-integrated analysis pipelines.

Significance. If the attack mechanism were shown to work reliably, the result would be significant for AI security and cybersecurity toolchains, as it identifies a concrete prompt-injection vector specific to decompiled code and agentic analysis systems. The work would usefully highlight risks in automated RE and motivate defenses. However, the current lack of systematic evaluation limits its contribution to a preliminary observation rather than a substantiated finding.

major comments (2)
  1. [Abstract] Abstract: the central claim that the method 'effectively corrupts their analytical output' and enables bypassing of LLM-driven detection rests entirely on 'several concise examples' with no reported success rates, number of trials, controls, failure modes, or comparison baselines. This evidentiary gap is load-bearing for the claim of a viable attack.
  2. The manuscript (stress-test assumption): the attack requires that injected string literals survive decompilation and are treated by the target LLM agent as actionable instructions rather than data or sanitized content. No validation against realistic agent prompting, output parsing, or sanitization behaviors is provided, leaving the core behavioral premise untested.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback highlighting the need for stronger empirical support. We address each major comment below and commit to revisions that will expand the evaluation while preserving the proof-of-concept focus of the work.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the method 'effectively corrupts their analytical output' and enables bypassing of LLM-driven detection rests entirely on 'several concise examples' with no reported success rates, number of trials, controls, failure modes, or comparison baselines. This evidentiary gap is load-bearing for the claim of a viable attack.

    Authors: We agree that the current evidentiary basis is limited to illustrative examples and that this weakens the central claims. In the revised manuscript we will update the abstract to describe the work more precisely as a proof-of-concept and add a dedicated evaluation section reporting success rates across repeated trials on multiple binaries, controls for code structure, observed failure modes, and simple baseline comparisons. revision: yes

  2. Referee: The manuscript (stress-test assumption): the attack requires that injected string literals survive decompilation and are treated by the target LLM agent as actionable instructions rather than data or sanitized content. No validation against realistic agent prompting, output parsing, or sanitization behaviors is provided, leaving the core behavioral premise untested.

    Authors: The examples demonstrate string survival through compilation and decompilation together with observable effects on the GhidraMCP agent. We acknowledge that these examples do not cover the full range of agent configurations. We will add experiments that vary prompting strategies, incorporate explicit output parsing steps, and test simple sanitization defenses to provide broader validation of the behavioral premise. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical proof-of-concept for a genetic-algorithm-modified AutoDAN attack that injects instructions via extraneous string assignments in binaries targeting LLM reverse-engineering agents. No equations, fitted parameters, or derivation chains appear; the central claim is an independent empirical demonstration of prompt injection surviving compilation/decompilation. No self-citations are load-bearing, no uniqueness theorems are imported, and no ansatzes or renamings reduce the result to its inputs by construction. The attack relies on external assumptions about LLM context processing that are tested via POC examples rather than derived circularly.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the work rests on the unstated premise that LLMs will execute injected instructions from decompiled strings.

pith-pipeline@v0.9.1-grok · 5752 in / 1056 out tokens · 28702 ms · 2026-06-29T06:12:14.755326+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

14 extracted references · 5 canonical work pages · 5 internal anchors

  1. [1]

    Ini0al dissembler tools allowed analysts to convert the machine code of an executable binary into the slightly less tedious assembly language presenta0on

    Introduc'on The rise of tools for reverse engineering executable binary files enhanced the ability of soFware analysts to determine the func0on of unknown programs through sta0c analysis. Ini0al dissembler tools allowed analysts to convert the machine code of an executable binary into the slightly less tedious assembly language presenta0on. More recently, ...

  2. [2]

    Methodology This aPack is based on ideas from what Thomas (2025) calls a transcript hack, where he shows, through a number guessing game illustra0on, that if a model is passed what appears to be the transcript for the current conversa0on’s history, it will believe that the provided informa0on is accurate to previous turns of the conversa0on. The format fo...

  3. [3]

    These strings serve to lead the LLM to act as though the decompile func0on data immediately before and aFer the adversarial string contains inaccuracies and that only the target code string should be evaluated in determining the purpose of the func0on. The target code string begins with the text that re-calls the decompile_func0on tool, a task originally ...

  4. [4]

    i'm sorry

    In AutoDAN, the gene0c algorithm samples from a seed list of prepend and append string pairs running each through the model and assessing the fitness of each pair. Similar to Equa0on 1, the fitness is determined by calcula0ng the loss, but instead of looking at the likelihood of a single token given the previous text, the AudoDAN fitness assessment calculate...

  5. [5]

    APack files were crea0ng using Qwen3-8B and tested against both Qwen3-8B and GPT-OSS-120B (Agarwal et al., 2025)

    Experiments and Results Experiments were conducted on four executable binary files: two containing a single main func0on and two containing a main func0on and another func0on. APack files were crea0ng using Qwen3-8B and tested against both Qwen3-8B and GPT-OSS-120B (Agarwal et al., 2025). 3.1 Single Func'on Files The AutoDAN algorithm using Qwen3-8B found a...

  6. [6]

    gpt-oss-120b & gpt-oss-20b Model Card

    gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925. Bandi, A., Kongari, B., Naguru, R., Pasnoor, S. and Vilipala, S.V.,

  7. [7]

    Future Internet, 17(9), p.404

    The rise of agen0c ai: A review of defini0ons, frameworks, architectures, applica0ons, evalua0on metrics, and challenges. Future Internet, 17(9), p.404. Chen, X., Zhou, A., Ye, C. and Zhang, C., 2025, October. ClearAgent: Agen0c Binary Analysis for Effec0ve Vulnerability Detec0on. In Proceedings of the 1st ACM SIGPLAN Interna6onal Workshop on Language Model...

  8. [8]

    (ed.) The 2026 Guide to Prompt Engineering [online] Available at: hPps://www.ibm.com/think/topics/prompt-injec0on (Accessed: 24 February 2026)

    ‘What is a Prompt Injec0on APack?’, Gadesha, V. (ed.) The 2026 Guide to Prompt Engineering [online] Available at: hPps://www.ibm.com/think/topics/prompt-injec0on (Accessed: 24 February 2026). Liu, Y ., Deng, G., Li, Y ., Wang, K., Wang, Z., Wang, X., Zhang, T., Liu, Y ., Wang, H., Zheng, Y . and Liu, Y .,

  9. [9]

    Prompt Injection attack against LLM-integrated Applications

    Prompt injec0on aPack against llm-integrated applica0ons. arXiv preprint arXiv:2306.05499. Liu, X., Xu, N., Chen, M. and Xiao, C.,

  10. [10]

    AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

    Autodan: Genera0ng stealthy jailbreak prompts on aligned large language models. arXiv preprint arXiv:2310.04451. Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P ., Neelakantan, A., Shyam, P ., Sastry, G., Askell, A. and Agarwal, S.,

  11. [11]

    Language Models are Few-Shot Learners

    Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 1(3), p.3. Marzouk, A.,

  12. [12]

    Ray, P .P .,

    ‘IDEsaster: A Novel Vulnerability Class in AI IDEs’, MaccariTA [online] Available at: hPps://maccarita.com/posts/idesaster (Accessed: 29 January 2026). Ray, P .P .,

  13. [13]

    Vassilev, A., Oprea, A., Fordyce, A

    ‘Why Smart Instruc0on-Following Makes Prompt Injec0on Easier’, Giles’ Blog [online] Available at: hPps://www.gilesthomas.com/2025/11 (Accessed: 17 December 2025). Vassilev, A., Oprea, A., Fordyce, A. and Andersen, H.,

  14. [14]

    Qwen3 Technical Report

    Qwen3 technical report. arXiv preprint arXiv:2505.09388