arxiv: 2605.21821 · v1 · pith:QUUVRVNUnew · submitted 2026-05-20 · 💻 cs.CR

A Large Language Model Approach to Generating Bypass Rules for Malware Evasion in Analysis Sandbox

Pith reviewed 2026-05-22 08:23 UTC · model grok-4.3

classification 💻 cs.CR
keywords malware analysis · sandbox evasion · large language models · YARA rules · bypass rules · automated analysis · evasion detection

The pith

Large language models can generate YARA rules that bypass malware evasion checks in sandboxes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that large language models can automatically produce YARA rules capable of bypassing the environment detection mechanisms that malware uses to evade sandbox analysis. A sympathetic reader would care because it offers a way to handle the rapid evolution of evasion techniques without relying on labor-intensive manual reverse engineering for every new method. The approach feeds execution traces from prematurely terminated malware samples into the models, which apply multiple reasoning strategies to craft targeted bypass rules; the rules are then refined through auto-sanitization and feedback-driven iteration. If successful, this enables the identification of additional malware families and the observation of behaviors that standard platforms miss.

Core claim

The authors claim that their ABLE system leverages large language models to analyze malware execution traces and generate bypass YARA rules. Evaluated on 334 real-world samples with four open-weight models, it achieves a 79% bypass success rate, with iterative refinement accounting for 29.5% of successful cases. The authors also report 47% more family classifications than existing platforms and the exposure of previously hidden behaviors.
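To give a feel for the scale behind these percentages, the short calculation below converts them into approximate sample counts. It is a back-of-the-envelope sketch and rests on two assumptions that the summary does not confirm: that the 79% rate is taken over all 334 samples (rather than per model), and that the 29.5% figure is a share of the successful cases. The variable names are illustrative only.

```python
# Reported figures from the paper's abstract.
TOTAL_SAMPLES = 334       # real-world malware samples evaluated
BYPASS_RATE = 0.79        # reported bypass success rate
REFINEMENT_SHARE = 0.295  # share of successful cases attributed to refinement

# Assumption: both percentages are per-sample, so counts are rounded estimates.
bypassed = round(TOTAL_SAMPLES * BYPASS_RATE)
via_refinement = round(bypassed * REFINEMENT_SHARE)
first_attempt = bypassed - via_refinement
still_evasive = TOTAL_SAMPLES - bypassed

print(bypassed, via_refinement, first_attempt, still_evasive)
# prints: 264 78 186 70
```

Under those assumptions, roughly 78 of about 264 successful bypasses would have depended on the refinement loop, and about 70 samples would have remained evasive.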

What carries the argument

The ABLE pipeline, which combines LLM reasoning on execution traces with an auto-sanitization pipeline and feedback-driven iterative refinement to produce functional bypass rules.
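The control flow of such a pipeline can be sketched as a generate, sanitize, test loop. The following is a minimal illustration, not the authors' implementation: `refine_rule`, `Attempt`, the three callables, and the `max_rounds` limit are all hypothetical names, and the LLM, sanitizer, and sandbox are replaced by stand-in functions so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    rule: str       # candidate rule text produced in this round
    bypassed: bool  # True if the sandbox run got past the evasion check
    feedback: str   # failure description handed to the next round


def refine_rule(
    trace: str,
    generate: Callable[[str, str], str],                # (trace, feedback) -> raw LLM output
    sanitize: Callable[[str], Optional[str]],           # raw output -> clean rule or None
    run_in_sandbox: Callable[[str], Tuple[bool, str]],  # rule -> (bypassed, feedback)
    max_rounds: int = 3,                                # hypothetical budget
) -> List[Attempt]:
    """Generate, sanitize, and test rules until one bypasses or rounds run out."""
    history: List[Attempt] = []
    feedback = ""
    for _ in range(max_rounds):
        raw = generate(trace, feedback)
        rule = sanitize(raw)
        if rule is None:
            # Unusable output: record it and ask the model to try again.
            feedback = "rule failed sanitization"
            history.append(Attempt(raw, False, feedback))
            continue
        bypassed, feedback = run_in_sandbox(rule)
        history.append(Attempt(rule, bypassed, feedback))
        if bypassed:
            break
    return history


# Stand-ins for the LLM and the sandbox, for illustration only.
def fake_generate(trace: str, feedback: str) -> str:
    return "rule_v2" if feedback else "rule_v1"


def fake_sandbox(rule: str) -> Tuple[bool, str]:
    ok = rule == "rule_v2"
    return ok, "" if ok else "sample still exits early"


history = refine_rule("trace text", fake_generate, lambda raw: raw, fake_sandbox)
print([(a.rule, a.bypassed) for a in history])
```

In this toy run the first rule fails, its failure message is fed back into generation, and the second rule succeeds, which mirrors the role the summary assigns to feedback-driven refinement.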

If this is right

  • Sandboxes can process more malware samples effectively without custom manual rules for each evasion technique.
  • Analysts gain visibility into malware families that were previously misclassified or undetected.
  • Hidden malicious behaviors become observable in sandbox reports for a larger portion of samples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be adapted to generate rules for other types of security analysis tools that face similar evasion issues.
  • Testing across additional model types might reveal whether success rates hold beyond the open-weight models evaluated here.
  • Combining this generation process with existing signature databases could reduce the need for repeated refinements over time.

Load-bearing premise

The generated rules from the language models will consistently and reliably bypass the specific evasion checks in actual sandbox environments without introducing errors or missing critical evasion mechanisms.

What would settle it

Observing whether applying the output YARA rules to the malware samples in a real analysis sandbox results in the malicious payloads executing as expected rather than being suppressed by the evasion detection.

Figures

Figures reproduced from arXiv: 2605.21821 by Aisha Ali-Gombe, Justin Woodring, Lamine Noureddine, Mst Eshita Khatun, Sideeq Bello, Zhiyong Sui.

Figure 1: Static analysis of StealC malware.
Figure 2: Dynamic analysis of StealC. The clean trace (c’) shows the call to …
Figure 3: Overview of ABLE.
Figure 4: Prompt template structure. Components 1, 2, 4 are …
Figure 5: YARA rule auto-sanitization with self-correction.
Figure 6: This figure illustrates an example workflow of ABLE on a StealC malware sample.
Original abstract

Sandbox evasion remains a critical challenge for automated malware analysis, as modern malware employs environment checks to detect analysis platforms and suppress malicious behavior. Existing approaches rely on manually crafted bypass rules that require deep reverse engineering of each evasion mechanism, an approach that cannot scale against rapidly evolving evasion techniques. In this paper, we leverage large language models (LLMs) to automatically generate YARA rules that bypass evasion checks in sandbox environments. We propose ABLE, which analyzes execution traces from malware terminated due to potentially evasive behavior and employs multiple reasoning strategies to generate targeted bypass rules. To address syntactic errors and improve the efficacy of the bypass rules in the LLM outputs, we introduce an auto-sanitization pipeline and feedback-driven iterative refinement. We evaluate ABLE on 334 real-world malware samples across four open-weight LLMs. ABLE achieves a 79% bypass success rate, with iterative refinement contributing 29.5% of successful cases. Compared to existing analysis platforms, ABLE identifies 47% more malware family classifications and exposes previously hidden behaviors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes ABLE, an LLM-based system that analyzes malware execution traces to automatically generate YARA rules for bypassing sandbox evasion checks. It introduces auto-sanitization and iterative refinement to improve rule quality, and evaluates the approach on 334 real-world malware samples using four open-weight LLMs, reporting a 79% bypass success rate (with 29.5% of successes attributed to refinement) along with 47% more family classifications and exposure of hidden behaviors compared to existing platforms.

Significance. If the empirical results hold under rigorous validation, the work could offer a scalable alternative to manual reverse engineering for generating evasion bypasses, potentially improving automated malware analysis throughput and behavioral visibility. The combination of LLMs with feedback-driven refinement represents a practical application of generative models to a domain traditionally reliant on expert-crafted rules.

major comments (2)
  1. [Evaluation] Evaluation section: The headline 79% bypass success rate (and the 29.5% contribution from iterative refinement) is defined as the malware samples exhibiting malicious behavior rather than terminating early after rule application. However, the manuscript does not specify the sandbox platform used, the precise integration mechanism for applying the generated YARA rules (a static pattern-matching tool) to neutralize runtime environment checks performed by the malware, or the concrete success criteria (e.g., increased API call volume, dropped files, or C2 traffic). This leaves open whether the metric reflects genuine functional bypass or measurement artifacts.
  2. [Methods] Methods, section on rule generation: The claim that ABLE produces rules that 'reliably bypass evasion checks' rests on the assumption that LLM-generated YARA patterns, after sanitization, correctly target and disable the specific environment-detection logic in the samples. No error analysis or case studies are provided showing that the rules address the actual evasion mechanisms (e.g., timing checks, hardware artifacts) rather than producing overly permissive or irrelevant patterns.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'exposes previously hidden behaviors' is not quantified; a concrete metric (e.g., additional API calls or network connections observed) would strengthen the comparison to existing platforms.
  2. [Evaluation] Sample selection: The criteria for choosing the 334 malware samples and the distribution across families or evasion techniques are not detailed, making it difficult to assess generalizability.
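The first major comment asks for concrete success criteria. One way such a criterion could be written down is sketched below: a run counts as a functional bypass only if the sample no longer exits early and shows more activity than its baseline run. This is an illustrative assumption, not the paper's metric; `RunReport`, `is_functional_bypass`, and the `min_api_gain` threshold are hypothetical, and the indicator fields simply mirror the examples the referee lists (API call volume, dropped files, C2 traffic).

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    api_calls: int       # number of API calls recorded in the sandbox report
    dropped_files: int   # files written by the sample
    c2_connections: int  # outbound connections attributed to C2 traffic
    exited_early: bool   # True if the sample terminated prematurely


def is_functional_bypass(
    baseline: RunReport,
    with_rule: RunReport,
    min_api_gain: float = 2.0,  # hypothetical threshold, not from the paper
) -> bool:
    """Compare a run with the bypass rule against the baseline run."""
    if with_rule.exited_early:
        return False
    more_api_activity = with_rule.api_calls >= min_api_gain * max(baseline.api_calls, 1)
    new_files = with_rule.dropped_files > baseline.dropped_files
    new_c2 = with_rule.c2_connections > baseline.c2_connections
    # Require API growth plus at least one concrete payload indicator, so a
    # sample that merely runs longer is not counted as bypassed.
    return more_api_activity and (new_files or new_c2)


baseline = RunReport(api_calls=40, dropped_files=0, c2_connections=0, exited_early=True)
after_rule = RunReport(api_calls=900, dropped_files=2, c2_connections=1, exited_early=False)
print(is_functional_bypass(baseline, after_rule))
```

Requiring a payload indicator in addition to API growth is one way to address the referee's concern about measurement artifacts; other combinations of indicators would be equally defensible.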

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. The comments highlight important aspects of clarity in evaluation and validation of rule quality. We respond to each major comment below and will incorporate revisions to address them in the next version of the paper.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: The headline 79% bypass success rate (and the 29.5% contribution from iterative refinement) is defined as the malware samples exhibiting malicious behavior rather than terminating early after rule application. However, the manuscript does not specify the sandbox platform used, the precise integration mechanism for applying the generated YARA rules (a static pattern-matching tool) to neutralize runtime environment checks performed by the malware, or the concrete success criteria (e.g., increased API call volume, dropped files, or C2 traffic). This leaves open whether the metric reflects genuine functional bypass or measurement artifacts.

    Authors: We agree that greater specificity is required to substantiate the evaluation metric. In the revised manuscript, we will explicitly describe the sandbox platform used for trace collection and rule testing, detail the integration process by which the generated YARA rules are applied within the sandbox to intercept environment checks, and clarify the success criteria, including observable indicators such as increased API call volume, dropped files, and C2 traffic. These additions will confirm that the 79% rate measures functional bypass rather than artifacts. revision: yes

  2. Referee: [Methods] Methods, section on rule generation: The claim that ABLE produces rules that 'reliably bypass evasion checks' rests on the assumption that LLM-generated YARA patterns, after sanitization, correctly target and disable the specific environment-detection logic in the samples. No error analysis or case studies are provided showing that the rules address the actual evasion mechanisms (e.g., timing checks, hardware artifacts) rather than producing overly permissive or irrelevant patterns.

    Authors: We acknowledge the value of qualitative validation alongside quantitative results. While the success rate across 334 samples supports the overall approach, we agree that error analysis and case studies would strengthen claims about rule targeting. In the revision, we will add a dedicated subsection with error analysis and representative case studies. These will show how specific sanitized and refined YARA rules address evasion mechanisms such as timing checks or hardware artifacts, including before-and-after behavioral comparisons. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical success rates measured on external malware samples

full rationale

The paper presents an empirical pipeline: execution traces from real malware samples are fed to LLMs to produce YARA rules, followed by sanitization and iterative refinement, then success is measured by whether the rules allow the 334 samples to exhibit malicious behavior in a sandbox. No equations, fitted parameters, or self-referential definitions appear in the abstract or described method. The 79% bypass rate and 29.5% refinement contribution are reported as direct experimental outcomes on external samples rather than quantities derived from the method itself by construction. No self-citation load-bearing steps or uniqueness theorems are invoked to justify core claims. The derivation chain remains independent of its measured outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach depends on the premise that LLMs can produce usable security rules from traces and that the sanitization/refinement loop corrects errors without systematic bias.

axioms (1)
  • [domain assumption] LLM-generated YARA rules can be made syntactically valid and functionally effective via post-processing and iteration.
    Invoked to justify the auto-sanitization pipeline and feedback loop as sufficient for practical use.
invented entities (1)
  • ABLE framework (no independent evidence)
    purpose: End-to-end system for trace-to-bypass-rule generation using LLMs.
    New named system introduced to organize the described components and evaluation.
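The axiom above depends on post-processing being able to turn raw LLM output into a syntactically usable rule. A minimal sketch of what one such step might look like follows; it is a stand-in, not the paper's auto-sanitization pipeline. It only strips a Markdown code fence and surrounding chatter and checks that the rule body's braces balance. The function name `sanitize_rule` is hypothetical, the example rule is plain YARA (any bypass-action syntax the paper uses is not shown in this summary and is not reproduced), and a real pipeline would presumably also hand the result to a YARA compiler.

```python
import re
from typing import Optional

# Matches a fenced block (three backticks, optional language tag) and captures its body.
FENCE = re.compile(r"`{3}[a-zA-Z]*\n(.*?)`{3}", re.DOTALL)
# Matches the start of a rule such as "rule name {"; rule tags are not handled.
RULE_HEADER = re.compile(r"\brule\s+[A-Za-z_]\w*\s*\{")


def sanitize_rule(llm_output: str) -> Optional[str]:
    """Return the first brace-balanced rule in the output, or None if absent."""
    fenced = FENCE.search(llm_output)
    text = fenced.group(1) if fenced else llm_output
    header = RULE_HEADER.search(text)
    if header is None:
        return None
    text = text[header.start():]
    depth = 0
    # Limitation of this sketch: braces inside quoted strings are counted too.
    for index, char in enumerate(text):
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return text[: index + 1]
    return None  # the rule was truncated before its closing brace


fence = "`" * 3
raw_output = (
    "Here is the rule:\n"
    + fence + "yara\n"
    + "rule bypass_check {\n"
    + "  strings:\n"
    + "    $a = { 6A 40 68 }\n"
    + "  condition:\n"
    + "    $a\n"
    + "}\n"
    + fence + "\nHope this helps."
)
clean = sanitize_rule(raw_output)
print(clean)
```

A rule rejected at this stage would be a natural trigger for another generation round, which is where the feedback loop named in the axiom comes in.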

pith-pipeline@v0.9.0 · 5733 in / 1262 out tokens · 41731 ms · 2026-05-22T08:23:39.894175+00:00 · methodology

Reference graph

Works this paper leans on

107 extracted references · 107 canonical work pages · 6 internal anchors

  1. [1]

    https://any.run,

    ANY .RUN: Interactive online malware sandbox. https://any.run,

  2. [2]

    Cloud-based interactive malware analysis platform

  3. [3]

    MalwareBazaar malware sam- ple

    abuse.ch. MalwareBazaar malware sam- ple. https : / / bazaar . abuse . ch/, 2023. SHA256: caf00150589120b59ea0145206e2aacad383d3cc18431674 fd58cc84f49b0e25

  4. [4]

    abuse.ch: Fighting malware and botnets

    abuse.ch. abuse.ch: Fighting malware and botnets. https://abuse.ch/,

  5. [5]

    Non-profit threat intelligence organization

  6. [6]

    Malware dynamic analysis evasion techniques: A survey

    Amir Afianian, Salman Niksefat, Babak Sadeghiyan, and David Baptiste. Malware dynamic analysis evasion techniques: A survey. ACM Computing Surveys (CSUR), 52(6):1–28, 2019

  7. [7]

    When malware is packin’heat; limits of machine learning classifiers based on static analysis features

    Hojjat Aghakhani, Fabio Gritti, Francesco Mecca, Martina Lin- dorfer, Stefano Ortolani, Davide Balzarotti, Giovanni Vigna, and Christopher Kruegel. When malware is packin’heat; limits of machine learning classifiers based on static analysis features. In Network and Distributed System Security Symposium. Internet So- ciety, 2020

  8. [8]

    Exploring llms for malware detection: Review, framework design, and countermeasure approaches.arXiv preprint arXiv:2409.07587, 2024

    Jamal Al-Karaki, Muhammad Al-Zafar Khan, and Marwan Omar. Exploring llms for malware detection: Review, framework design, and countermeasure approaches.arXiv preprint arXiv:2409.07587, 2024

  9. [9]

    Opseq: Android malware fingerprinting

    Aisha Ali-Gombe, Irfan Ahmed, Golden G Richard III, and Vassil Roussev. Opseq: Android malware fingerprinting. InProceedings of the 5th Program Protection and Reverse Engineering Workshop, pages 1–12, 2015

  10. [10]

    Aspectdroid: Android app analysis system

    Aisha Ali-Gombe, Irfan Ahmed, Golden G Richard III, and Vassil Roussev. Aspectdroid: Android app analysis system. InProceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pages 145–147, 2016

  11. [11]

    Toward a more dependable hybrid analysis of android malware using aspect-oriented programming

    Aisha I Ali-Gombe, Brendan Saltaformaggio, Dongyan Xu, Golden G Richard III, et al. Toward a more dependable hybrid analysis of android malware using aspect-oriented programming. computers & security, 73:235–248, 2018

  12. [12]

    Behavioral analysis of ai- generated malware: New frontiers in threat detection

    Ammar Almomani, Samer Aoudi, Ahmad Al-Qerem, Amjad Ald- weesh, and Mouhammd Alkasassbeh. Behavioral analysis of ai- generated malware: New frontiers in threat detection. InExamining Cybersecurity Risks Produced by Generative AI, pages 211–234. IGI Global Scientific Publishing, 2025

  13. [13]

    Evading machine learning malware detection.black Hat, 2017:1–6, 2017

    Hyrum S Anderson, Anant Kharkar, Bobby Filar, and Phil Roth. Evading machine learning malware detection.black Hat, 2017:1–6, 2017

  14. [14]

    Claude opus 4.1

    Anthropic. Claude opus 4.1. https://www.anthropic.com/news/ claude-opus-4-1, 2025

  15. [15]

    Hancitor (aka chanitor) observed using multiple attack approaches

    Ankit Anubhav and Dileep Jallepalli. Hancitor (aka chanitor) observed using multiple attack approaches. Mandiant, Google Cloud, 2016. URL: https://cloud.google.com/blog/topics/threat- intelligence/hancitor-aka-chanit/

  16. [16]

    FormBook malware trend analysis

    ANY .RUN. FormBook malware trend analysis. https://any.run/ malware-trends/formbook, 2023

  17. [17]

    Malware analysis report: e536afc7f63611d1bbea4305f958661e.exe (MD5: E536afc7f63611d1bbea4305f958661e), 2023

    ANY .RUN. Malware analysis report: e536afc7f63611d1bbea4305f958661e.exe (MD5: E536afc7f63611d1bbea4305f958661e), 2023. URL: https://app.any.run/tasks/de97abb5-3aaf-40cc-b4d5-2d4a78997f09/

  18. [18]

    The android malware static analysis: techniques, limitations, and open challenges

    Khaled Bakour, H Murat ¨Unver, and Razan Ghanem. The android malware static analysis: techniques, limitations, and open challenges. In2018 3rd International Conference on Computer Science and Engineering (UBMK), pages 586–593. Ieee, 2018

  19. [19]

    A reverse engineering education needs analysis survey.arXiv preprint arXiv:2212.07531, 2022

    Charles R Barone IV , Robert Serafin, Ilya Shavrov, Ibrahim Baggili, Aisha Ali-Gombe, Golden G Richard III, and Andrew Case. A reverse engineering education needs analysis survey.arXiv preprint arXiv:2212.07531, 2022

  20. [20]

    Stealc: A copycat of vidar and raccoon infostealers gaining in popularity – part 2

    Pierre Le Bourhis, Quentin Bourgue, and Sekoia TDR. Stealc: A copycat of vidar and raccoon infostealers gaining in popularity – part 2. Sekoia.io Blog, 2023. URL: https://blog.sekoia.io/stealc- a-copycat-of-vidar-and-raccoon-infostealers-gaining-in-popularity- part-2/

  21. [21]

    Language models are few- shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few- shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

  22. [22]

    A survey on automated dynamic malware analysis evasion and counter-evasion: Pc, mobile, and web

    Alexei Bulazel and B ¨ulent Yener. A survey on automated dynamic malware analysis evasion and counter-evasion: Pc, mobile, and web. InProceedings of the 1st Reversing and Offensive-oriented Trends Symposium, pages 1–21, 2017

  23. [23]

    Statos: A portable tool for secure malware analysis and sample acquisition in low resource environments.Ar- ray, 26:100391, 2025

    Alexander Cameron, Abu Alam, Nasreen Anjum, Javed Ali Khan, and Alexios Mylonas. Statos: A portable tool for secure malware analysis and sample acquisition in low resource environments.Ar- ray, 26:100391, 2025

  24. [24]

    Llm-cloudsec: Large language model empowered automatic and deep vulnerability analysis for intelligent clouds

    Daipeng Cao and W Jun. Llm-cloudsec: Large language model empowered automatic and deep vulnerability analysis for intelligent clouds. InIEEE INFOCOM 2024-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pages 1–6. IEEE, 2024

  25. [25]

    Advanced or not? a comparative study of the use of anti- debugging and anti-vm techniques in generic and targeted malware

    Ping Chen, Christophe Huygens, Lieven Desmet, and Wouter Joosen. Advanced or not? a comparative study of the use of anti- debugging and anti-vm techniques in generic and targeted malware. InIFIP International Conference on ICT Systems Security and Privacy Protection, pages 323–336. Springer, 2016

  26. [26]

    Droidhook: a novel api- hook based android malware dynamic analysis sandbox.Automated Software Engineering, 30(1):10, 2023

    Yuning Cui, Yi Sun, and Zhaowen Lin. Droidhook: a novel api- hook based android malware dynamic analysis sandbox.Automated Software Engineering, 30(1):10, 2023

  27. [27]

    Francisco Handrick da Costa, Ismael Medeiros, Thales Menezes, Jo˜ao Victor da Silva, Ingrid Lorraine da Silva, Rodrigo Bonif ´acio, Krishna Narasimhan, and M´arcio Ribeiro. Exploring the use of static and dynamic analysis to improve the performance of the mining sandbox approach for android malware identification.Journal of Systems and Software, 183:111092, 2022

  28. [28]

    A bazar of tricks: Following team9’s development cycles

    Daniel Frank, Mary Zhao and Assaf Dahan. A bazar of tricks: Following team9’s development cycles. Cybereason Nocturnus. URL: https://www.cybereason.com/blog/research/a-bazar-of-tricks- following-team9s-development-cycles

  29. [29]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    DeepSeek-AI, Daya Guo, Qihao Liu, Zhenda Fan, Borong Liang, Aixin Huang, Zhewen Ruan, Wangding Shang, Zhaowei Zhao, Wangsheng Ren, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025. URL: https://arxiv.org/abs/2501.12948

  30. [30]

    Egregor ransomware the raas successor to maze, 2021

    NHS England Digital. Egregor ransomware the raas successor to maze, 2021. URL: https://digital.nhs.uk/cyber-alerts/2020/cc-3681

  31. [31]

    Artificial intelligence-based malware detection, analysis, and mitigation.Symmetry, 15(3):677, 2023

    Amir Djenna, Ahmed Bouridane, Saddaf Rubab, and Ibrahim Moussa Marou. Artificial intelligence-based malware detection, analysis, and mitigation.Symmetry, 15(3):677, 2023

  32. [32]

    The Llama 3 Herd of Models

    Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024. URL: https: //arxiv.org/abs/2407.21783

  33. [33]

    A survey on automated dynamic malware-analysis tech- niques and tools.ACM Computing Surveys, 44(2):1–42, 2012

    Manuel Egele, Theodoor Scholte, Engin Kirda, and Christopher Kruegel. A survey on automated dynamic malware-analysis tech- niques and tools.ACM Computing Surveys, 44(2):1–42, 2012. doi:10.1145/2089125.2089126

  34. [34]

    Malware analysis: Raccoon stealer v2.0

    eSentire Threat Response Unit. Malware analysis: Raccoon stealer v2.0. eSentire Threat Intelligence, September 2022. URL: https: / / www. esentire . com / blog / esentire - threat - intelligence - malware - analysis-raccoon-stealer-v2-0

  35. [35]

    Llm-maldetect: A large language model-based method for android malware detection.IEEE Access, 2025

    Ruirui Feng, Hui Chen, Shuo Wang, Md Monjurul Karim, and Qingshan Jiang. Llm-maldetect: A large language model-based method for android malware detection.IEEE Access, 2025

  36. [36]

    Anastasia: Android malware detection using static analysis of applications

    Hossein Fereidooni, Mauro Conti, Danfeng Yao, and Alessandro Sperduti. Anastasia: Android malware detection using static analysis of applications. In2016 8th IFIP international conference on new technologies, mobility and security (NTMS), pages 1–5. IEEE, 2016

  37. [37]

    Matthew Gaber, Mohiuddin Ahmed, and Helge Janicke. Defeating evasive malware with peekaboo: Extracting authentic malware be- havior with dynamic binary instrumentation.Journal of Information Security and Applications, 95:104290, 2025

  38. [38]

    A systematical and longitudinal study of evasive behaviors in windows malware.Computers & security, 113:102550, 2022

    Nicola Galloro, Mario Polino, Michele Carminati, Andrea Con- tinella, and Stefano Zanero. A systematical and longitudinal study of evasive behaviors in windows malware.Computers & security, 113:102550, 2022

  39. [39]

    Gemma 2: Improving Open Language Models at a Practical Size

    Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, L ´eonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ram ´e, et al. Gemma 2: Improving open language models at a practical size.arXiv preprint arXiv:2408.00118, 2024. URL: https://arxiv.org/abs/2408. 00118

  40. [40]

    En- viral: Fuzzing the environment for evasive malware analysis

    Floris Gorter, Cristiano Giuffrida, and Erik Van Der Kouwe. En- viral: Fuzzing the environment for evasive malware analysis. In Proceedings of the 16th European Workshop on System Security, pages 8–14, 2023

  41. [41]

    Cuckoo sandbox: open source automated malware anal- ysis

    Claudio Guarnieri, Alessio Tanasi, Jurriaan Bremer, and Mark Schloesser. Cuckoo sandbox: open source automated malware anal- ysis. Black Hat USA, 2013. URL: https://media.blackhat.com/us- 13/US-13-Bremer-Mo-Malware-Mo-Problems-Cuckoo-Sandbox- WP.pdf

  42. [42]

    Triage: Automated malware analysis sandbox

    Hatching. Triage: Automated malware analysis sandbox. https: //tria.ge/, 2024. Cloud-based malware analysis platform

  43. [43]

    On benchmarking code llms for android malware analysis

    Yiling He, Hongyu She, Xingzhi Qian, Xinran Zheng, Zhuo Chen, Zhan Qin, and Lorenzo Cavallaro. On benchmarking code llms for android malware analysis. InProceedings of the 34th ACM SIG- SOFT International Symposium on Software Testing and Analysis, pages 153–160, 2025

  44. [44]

    Object allocation pattern as an indicator for maliciousness-an exploratory analysis

    Adamu Hussaini, Bassam Zahran, and Aisha Ali-Gombe. Object allocation pattern as an indicator for maliciousness-an exploratory analysis. InProceedings of the Eleventh ACM Conference on Data and Application Security and Privacy, pages 313–315, 2021

  45. [45]

    A method for automatic android malware detection based on static analysis and deep learning.IEEE Access, 10:117334–117352, 2022

    M ¨ulhem ˙Ibrahim, Bayan Issa, and Muhammed Basheer Jasser. A method for automatic android malware detection based on static analysis and deep learning.IEEE Access, 10:117334–117352, 2022

  46. [46]

    Intezer analyze: Genetic malware analysis

    Intezer. Intezer analyze: Genetic malware analysis. https://www. intezer.com/, 2024. Code similarity and malware analysis platform

  47. [47]

    Dynamic analysis for iot malware detection with convolution neural network model.Ieee Access, 8:96899–96911, 2020

    Jueun Jeon, Jong Hyuk Park, and Young-Sik Jeong. Dynamic analysis for iot malware detection with convolution neural network model.Ieee Access, 8:96899–96911, 2020

  48. [48]

    Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D

    Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, and Bernhard Sch ¨olkopf. Can large language models infer causation from correlation?arXiv preprint arXiv:2306.05836, 2023

  49. [49]

    Joe Sandbox: Deep malware analysis

    Joe Security LLC. Joe Sandbox: Deep malware analysis. https:// www.joesecurity.org, 2024. Commercial malware analysis sandbox

  50. [50]

    From shamoon to stonedrill: Wipers attacking saudi organizations and beyond

    Kaspersky Lab. From shamoon to stonedrill: Wipers attacking saudi organizations and beyond. Kaspersky, 2017. URL: https:// media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/ 03/07180722/Report Shamoon StoneDrill final.pdf

  51. [51]

    Sama: A comprehensive smart automated malware analyzer empowered by chatgpt integration

    Mahmoud A Khalifa, Iman Almomani, and Walid El-Shafai. Sama: A comprehensive smart automated malware analyzer empowered by chatgpt integration. In2024 IEEE 30th International Conference on Telecommunications (ICT), pages 1–6. IEEE, 2024

  52. [52]

    Androbyte: Llm-driven privacy analysis through byte- code summarization and dynamic dataflow call graph generation

    Mst Eshita Khatun, Lamine Noureddine, Zhiyong Sui, and Aisha Ali-Gombe. Androbyte: Llm-driven privacy analysis through byte- code summarization and dynamic dataflow call graph generation. arXiv preprint arXiv:2510.15112, 2025

  53. [53]

    Logs in, patches out: Automated vulnerability repair via{Tree-of- Thought}{LLM}analysis

    Youngjoon Kim, Sunguk Shin, Hyoungshick Kim, and Jiwon Yoon. Logs in, patches out: Automated vulnerability repair via{Tree-of- Thought}{LLM}analysis. In34th USENIX Security Symposium (USENIX Security 25), pages 4401–4419, 2025

  54. [54]

    Malgene: Automatic extraction of malware analysis evasion signature

    Dhilung Kirat and Giovanni Vigna. Malgene: Automatic extraction of malware analysis evasion signature. InProceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 769–780, 2015

  55. [55]

    Evasive malware exposed and deconstructed

    Christopher Kruegel. Evasive malware exposed and deconstructed. InRSA Conference USA, 2015. Session CRWD-T08

  56. [56]

    A case study of llm for automated vulnerability repair: Assessing impact of reasoning and patch validation feedback

    Ummay Kulsum, Haotian Zhu, Bowen Xu, and Marcelo d’Amorim. A case study of llm for automated vulnerability repair: Assessing impact of reasoning and patch validation feedback. InProceedings of the 1st ACM International Conference on AI-Powered Software, pages 103–111, 2024

  57. [57]

    Rex86: A local large language model for assisting in x86 assembly reverse engineering

    Darrin Lea, James Ghawaly, Golden Richard, Aisha Ali-Gombe, and Andrew Case. Rex86: A local large language model for assisting in x86 assembly reverse engineering. In2025 IEEE Annual Computer Security Applications Conference (ACSAC), pages 108–122. IEEE, 2025

  58. [58]

    Dmalnet: Dynamic malware analysis based on api feature engineering and graph learning.Computers & Security, 122:102872, 2022

    Ce Li, Zijun Cheng, He Zhu, Leiqi Wang, Qiujian Lv, Yan Wang, Ning Li, and Degang Sun. Dmalnet: Dynamic malware analysis based on api feature engineering and graph learning.Computers & Security, 122:102872, 2022

  59. [59]

    Llm-based vulnerability detection

    Hongping Li and Li Shan. Llm-based vulnerability detection. In2023 International Conference on Human-Centered Cognitive Systems (HCCS), pages 1–4. IEEE, 2023

  60. [60]

    Exploring and evaluating hallucinations in llm-powered code generation

    Fang Liu, Yang Liu, Lin Shi, Houkun Huang, Ruifeng Wang, Zhen Yang, Li Zhang, Zhongqi Li, and Yuchi Ma. Exploring and evaluating hallucinations in llm-powered code generation. arXiv preprint arXiv:2404.00971, 2024

  61. [61]

    Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

    Wei Ma, Shangqing Liu, Zhihao Lin, Wenhan Wang, Qiang Hu, Ye Liu, Cen Zhang, Liming Nie, Li Li, and Yang Liu. LMs: Understanding code syntax and semantics for code analysis. arXiv preprint arXiv:2305.12138, 2023

  62. [62]

    Redefining malware sandboxing: Enhancing analysis through sysmon and elk integration

    Rasmi-Vlad Mahmoud, Marios Anagnostopoulos, Sergio Pastrana, and Jens Myrup Pedersen. Redefining malware sandboxing: Enhancing analysis through sysmon and elk integration. IEEE Access, 12:68624–68636, 2024

  63. [63]

    capa: The FLARE team’s open-source tool to identify capabilities in executable files

    Mandiant. capa: The FLARE team’s open-source tool to identify capabilities in executable files. https://github.com/mandiant/capa,

  64. [64]

    Open-source malware capability detection tool

  65. [65]

    Spotless sandboxes: Evading malware analysis systems using wear-and-tear artifacts

    Najmeh Miramirkhani, Mahathi Priya Appini, Nick Nikiforakis, and Michalis Polychronakis. Spotless sandboxes: Evading malware analysis systems using wear-and-tear artifacts. In 2017 IEEE Symposium on Security and Privacy (SP), pages 1009–1024. IEEE, 2017

  66. [66]

    Borja Molina-Coronado, Antonio Ruggia, Usue Mori, Alessio Merlo, Alexander Mendiburu, and Jose Miguel-Alonso. Light up that droid! on the effectiveness of static analysis features against app obfuscation for android malware detection. Journal of Network and Computer Applications, 235:104094, 2025

  67. [67]

    Using an llm to help with code understanding

    Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, and Brad Myers. Using an llm to help with code understanding. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1–13, 2024

  68. [68]

    Obfuscated malware detection and classification in network traffic leveraging hybrid large language models and synthetic data

    Mehwish Naseer, Farhan Ullah, Samia Ijaz, Hamad Naeem, Amjad Alsirhani, Ghadah Naif Alwakid, and Abdullah Alomari. Obfuscated malware detection and classification in network traffic leveraging hybrid large language models and synthetic data. Sensors (Basel, Switzerland), 25(1):202, 2025

  69. [69]

    Muzzamil Noor, Haider Abbas, and Waleed Bin Shahid. Countering cyber threats for industrial applications: An automated approach for malware evasion detection and analysis. Journal of Network and Computer Applications, 103:249–261, 2018

  70. [70]

    al-khaser: Public malware techniques used in the wild: Virtual machine, emulation, debuggers, sandbox detection

    Lord Noteworthy. al-khaser: Public malware techniques used in the wild: Virtual machine, emulation, debuggers, sandbox detection. https://github.com/LordNoteworthy/al-khaser, 2016. Open-source evasion toolkit, accessed: 2026

  71. [71]

    capemon: The monitor DLL for CAPE

    Kevin O’Reilly. capemon: The monitor DLL for CAPE. https://github.com/kevoreilly/capemon, 2024. Accessed: 2026-02-05

  72. [72]

    CAPE Sandbox: Malware configuration and payload extraction

    Kevin O’Reilly and CAPE Contributors. CAPE Sandbox: Malware configuration and payload extraction. https://github.com/kevoreilly/CAPEv2, 2024. Open-source malware sandbox, accessed: 2026

  73. [73]

    Mars stealer: Exclusive new threat research

    Arnold Osipov. Mars stealer: Exclusive new threat research. Morphisec Labs, March 2022. URL: https://www.morphisec.com/blog/threat-research-mars-stealer

  74. [74]

    Spear phishing attacks target organizations in ukraine, payloads include the document stealer outsteel and the downloader saintbot

    Palo Alto Networks Unit 42. Spear phishing attacks target organizations in ukraine, payloads include the document stealer outsteel and the downloader saintbot. URL: https://unit42.paloaltonetworks.com/ukraine-targeted-outsteel-saintbot/

  75. [75]

    Automatic detection and bypassing of anti-debugging techniques for microsoft windows environments

    Juhyun Park, Yun-Hwan Jang, Soohwa Hong, and Yongsu Park. Automatic detection and bypassing of anti-debugging techniques for microsoft windows environments. Advances in Electrical and Computer Engineering, 19(2):23–28, 2019

  76. [76]

    Improving the robustness of ai-based malware detection using adversarial machine learning

    Shruti Patil, Vijayakumar Varadarajan, Devika Walimbe, Siddharth Gulechha, Sushant Shenoy, Aditya Raina, and Ketan Kotecha. Improving the robustness of ai-based malware detection using adversarial machine learning. Algorithms, 14(10):297, 2021

  77. [77]

    Assessing llms in malicious code deobfuscation of real-world malware campaigns

    Constantinos Patsakis, Fran Casino, and Nikolaos Lykousas. Assessing llms in malicious code deobfuscation of real-world malware campaigns. Expert Systems with Applications, 256:124912, 2024

  78. [78]

    Red Teaming Language Models with Language Models

    Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. Red teaming language models with language models, 2022. URL https://arxiv.org/abs/2202.03286, 15, 2022

  79. [79]

    Spvexec and spvluexec - a novel realtime defensive tool for stealthy malware infection

    Nicholas Phillips and Aisha Ali-Gombe. Spvexec and spvluexec - a novel realtime defensive tool for stealthy malware infection. International Journal On Advances in Security, pages 72–85, 2023

  80. [80]

    Nicholas Phillips and A Ali Gombe. Longitudinal study of persistence vectors (pvs) in windows malware: Evolution, complexity, and stealthiness. SECURWARE 2022, The Sixteenth International Conference on Emerging Security Information, Systems and Technologies, pages 28–34, 2022

Showing first 80 references.