SoK: AI-Augmented Binary Reversing

Dokyung Song; Hyungjoon Koo; Kexin Pei; Shakhzod Yuldoshkhujaev; Yiyue Zhang; Yujeong Kwon

arxiv: 2606.17398 · v1 · pith:XWX4T45Knew · submitted 2026-06-16 · 💻 cs.CR · cs.AI · cs.SE

Yujeong Kwon, Yiyue Zhang, Shakhzod Yuldoshkhujaev, Kexin Pei, Dokyung Song, Hyungjoon Koo

Pith reviewed 2026-06-27 00:58 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.SE

keywords AI-augmented binary reversing · systematization of knowledge · binary analysis · machine learning for reversing · large language models · malware investigation · vulnerability discovery · firmware auditing

The pith

A unified taxonomy organizes AI techniques for reversing binaries across 22 domains defined by inference task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reviews 144 studies published since 2015 that apply machine learning, large language models, and agentic AI to the task of binary reversing. It groups the work into 22 domains based on the specific inferences each approach attempts, such as recovering function boundaries or identifying vulnerabilities. The authors then build one taxonomy that links conventional static and dynamic analysis steps with newer AI pipelines, including how binaries are represented and what learning methods are used. A reader would care because compilation strips away semantic details, making reversing slow and error-prone, and a shared map could reduce duplication while exposing where AI still falls short on reliability and scale.

Core claim

By surveying 144 papers and sorting them into 22 domains according to inference tasks, the work introduces a single taxonomy that spans conventional reversing pipelines and AI-augmented ones. This taxonomy connects traditional analysis techniques, binary-derived artifacts, representation strategies, learning paradigms, and downstream tasks while clarifying the roles of large language models and agentic systems. The result supplies a common vocabulary, shows repeated structures across approaches, and points out ongoing challenges in evaluation and technical gaps.

What carries the argument

The unified taxonomy that links traditional analysis techniques, binary-derived artifacts, representation strategies, learning paradigms, and downstream inference tasks.

If this is right

The taxonomy supplies a shared vocabulary that connects work across reversing domains.
Common structures become visible across approaches that previously appeared unrelated.
Persistent technical challenges and evaluation gaps stand out for targeted improvement.
Promising research opportunities emerge for building more reliable and scalable systems.
The framework serves as a foundation for next-generation AI-augmented reversing tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future tool builders could adopt the taxonomy to design benchmarks that cover all 22 domains instead of isolated tasks.
The emphasis on agentic AI suggests experiments that chain multiple reversing steps into single automated workflows.
Gaps in evaluation practices could be addressed by creating public datasets tied directly to the taxonomy categories.
The organization may help identify which binary representations work best when paired with large language models versus traditional machine learning.

Load-bearing premise

The 144 selected papers since 2015 form a representative sample of the field and the taxonomy captures the main underlying structures without large omissions or bias in how the papers were chosen.

What would settle it

Finding dozens of additional papers on AI-augmented binary reversing from 2015 onward whose methods or tasks fall outside the 22 domains or break the connections drawn in the taxonomy.

Figures

Figures reproduced from arXiv: 2606.17398 by Dokyung Song, Hyungjoon Koo, Kexin Pei, Shakhzod Yuldoshkhujaev, Yiyue Zhang, Yujeong Kwon.

**Figure 1.** A taxonomy overview for binary reversing spanning conventional (§4) and AI-augmented (§5) pipelines. Drawing … [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]

**Figure 2.** Approximate composition of the binary corpus [PITH_FULL_IMAGE:figures/full_fig_p009_2.png]

**Figure 3.** Snapshot of our interactive visualization platform for AI-augmented binary reversing pipelines. The left panel … [PITH_FULL_IMAGE:figures/full_fig_p019_3.png]

Original abstract

Binary reversing is fundamental to software understanding, vulnerability discovery, malware investigation, and firmware auditing. However, it remains inherently challenging due to the irreversible loss of semantic information during compilation. Recent advances in machine learning, large language models (LLMs), and agentic AI systems have accelerated the adoption of AI-augmented binary reversing. Yet, the resulting body of work has become increasingly fragmented across reversing domains, artifact representations, learning approaches, and evaluation practices. This paper presents the first comprehensive systematization of knowledge on AI-augmented binary reversing. We analyze 144 research papers published since 2015, and organize them into 22 binary reversing domains according to the inference tasks. We further introduce a unified taxonomy spanning conventional and AI-augmented reversing pipelines. Our taxonomy connects traditional analysis techniques, binary-derived artifacts, representation strategies, learning paradigms, and downstream inference tasks, while clarifying the emerging roles of LLMs and agentic AI systems. By establishing a common vocabulary and structured framework, we provide a holistic view of the field's evolution over the past decade. Our study reveals common structures underlying seemingly disparate approaches, highlights persistent technical challenges and evaluation gaps, and identifies promising opportunities for future research. Collectively, these insights clarify the current state of the field and provide a foundation for the next generation of reliable and scalable AI-augmented binary reversing systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This SoK organizes 144 papers on AI for binary reversing into a taxonomy but leaves its own selection and construction methods unclear.

The paper's main move is to collect post-2015 work on AI-augmented binary reversing and sort it into 22 domains by inference task, then lay out a single taxonomy that runs from traditional static and dynamic analysis through binary artifacts, representations, learning methods, and on to LLM and agentic approaches.

It does the basic job of an SoK by giving names and connections to pieces that have been scattered across venues. That can save time for someone trying to place a new technique or spot where evaluation practices are thin.

The soft spot is exactly where the stress-test note flags it: the abstract states the counts and the taxonomy but supplies no search protocol, databases, inclusion rules, or process for building the 22 domains. Without those details the claim that the sample is representative and the taxonomy is unified stays untestable. If the full paper has a clear, reproducible methods section then the organizational work holds; if not, the framework rests on an opaque foundation.

This is for people already working in binary analysis or security who want a map rather than a new algorithm. A reader who needs a quick way to locate related papers on, say, decompilation or malware classification with LLMs will find it useful.

It deserves a serious referee. The topic is active and fragmented enough that a careful survey can reduce overlap, provided the selection and taxonomy steps are documented well enough to let others build on them. I would send it to review and ask the referees to focus first on the survey methodology.

Referee Report

1 major / 0 minor

Summary. This SoK paper surveys AI-augmented binary reversing. It reviews 144 papers published since 2015, organizes them into 22 domains based on inference tasks, and introduces a unified taxonomy connecting conventional and AI-augmented pipelines (analysis techniques, binary artifacts, representations, learning paradigms, and downstream tasks). The work highlights common structures, technical challenges, evaluation gaps, and future opportunities while establishing a shared vocabulary for the field.

Significance. If the paper selection is systematic and the taxonomy construction is transparent and reproducible, the work would provide a valuable structured overview of a fragmented area, clarifying the evolution of AI use in binary analysis and identifying research directions. The explicit connection between traditional and LLM/agentic approaches is a potential strength for guiding future systems.

major comments (1)

[Abstract] Abstract and (presumably) the methodology section: the central claim that this is the 'first comprehensive systematization' analyzing 144 papers and yielding a 'unified taxonomy' spanning 22 domains is load-bearing on the selection process, yet no search protocol, queried databases, keywords, inclusion/exclusion criteria, or inter-rater process for taxonomy assignment is described. Without these details the representativeness of the sample and absence of major omissions cannot be assessed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our SoK paper. We agree that explicit details on the paper selection process and taxonomy construction are essential to support the claims of comprehensiveness and to enable reproducibility assessment. We will revise the manuscript to address this concern.

Referee: [Abstract] Abstract and (presumably) the methodology section: the central claim that this is the 'first comprehensive systematization' analyzing 144 papers and yielding a 'unified taxonomy' spanning 22 domains is load-bearing on the selection process, yet no search protocol, queried databases, keywords, inclusion/exclusion criteria, or inter-rater process for taxonomy assignment is described. Without these details the representativeness of the sample and absence of major omissions cannot be assessed.

Authors: We agree that this is a valid and important point. The submitted manuscript did not include a dedicated methodology section describing the systematic review process, which is an oversight for an SoK paper. In the revised version, we will add a new Section 2 (Methodology) that explicitly details: the search protocol and databases queried (Google Scholar, arXiv, IEEE Xplore, ACM Digital Library, and proceedings from major venues including IEEE S&P, CCS, USENIX Security, NDSS); the keyword strings and combinations used; the inclusion criteria (peer-reviewed or archival papers from 2015 onward applying AI/ML/LLM techniques to binary reversing tasks) and exclusion criteria (non-AI papers, pure surveys without new contributions, non-English works); the multi-stage screening process that yielded the final set of 144 papers; and the taxonomy construction approach, including how the 22 domains were iteratively derived and the author consensus process used for paper classification. This addition will directly address the referee's concern regarding representativeness and reproducibility.

Revision: yes
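The inclusion and exclusion criteria named in the response above can be read as a simple screening filter. The Python sketch below is only an illustration of that idea, not the authors' actual procedure: the `Candidate` fields, the reason strings, and the sample records are all hypothetical, and the real screening is described as multi-stage with author consensus, which a single filter does not capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One search hit; every field is a hypothetical screening attribute."""
    title: str
    year: int
    language: str
    archival: bool      # peer-reviewed or archival
    applies_ai: bool    # applies AI/ML/LLM techniques to a binary reversing task
    pure_survey: bool   # survey without new contributions


def exclusion_reasons(c: Candidate, min_year: int = 2015) -> list[str]:
    """Return every criterion the candidate fails (empty list means include)."""
    reasons = []
    if c.year < min_year:
        reasons.append("published before cutoff")
    if not c.archival:
        reasons.append("not peer-reviewed or archival")
    if not c.applies_ai:
        reasons.append("non-AI paper")
    if c.pure_survey:
        reasons.append("pure survey")
    if c.language.lower() != "english":
        reasons.append("non-English")
    return reasons


def screen(candidates):
    """Split candidates into included papers and a title -> reasons log."""
    included, excluded = [], {}
    for c in candidates:
        reasons = exclusion_reasons(c)
        if reasons:
            excluded[c.title] = reasons
        else:
            included.append(c)
    return included, excluded


# Placeholder records, not real papers.
candidates = [
    Candidate("Candidate A", 2021, "English", True, True, False),
    Candidate("Candidate B", 2012, "English", True, True, False),
    Candidate("Candidate C", 2019, "German", True, False, False),
]

included, excluded = screen(candidates)
print([c.title for c in included])
print(excluded)
```

Logging the reason for each exclusion, rather than silently dropping records, is what would let a reader audit how a candidate pool was narrowed to the final set.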

Circularity Check

0 steps flagged

No circularity: purely descriptive SoK survey with no derivations or self-referential reductions

This is a systematization of knowledge paper that surveys 144 external papers since 2015, organizes them into 22 domains, and proposes a taxonomy connecting conventional and AI-augmented pipelines. It contains no equations, predictions, fitted parameters, or first-principles derivations that could reduce to the paper's own inputs by construction. The central claims are descriptive categorizations of external literature rather than self-definitional, fitted-input, or self-citation-load-bearing steps. No patterns from the enumerated circularity kinds apply, as the work is self-contained against external benchmarks (the cited papers) without renaming results or smuggling ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a survey paper the work introduces no free parameters, mathematical axioms beyond standard literature-review practices, or new postulated entities.

axioms (1)

Domain assumption: The 144 papers selected since 2015 comprehensively represent the relevant literature on AI-augmented binary reversing.
The SoK's claims of providing a holistic view rest on the assumption of representative coverage.


Reference graph

Works this paper leans on

191 extracted references · 3 linked inside Pith

[1]

Firmalice - automatic detection of authentication bypass vulnerabil- ities in binary firmware,

Y . Shoshitaishvili, R. Wang, C. Hauser, C. Kruegel, and G. Vigna, “Firmalice - automatic detection of authentication bypass vulnerabil- ities in binary firmware,” inProceedings of the 22nd Annual Network and Distributed System Security Symposium (NDSS), 2015

2015
[2]

Leveraging semantic relations in code and data to enhance taint analysis of embedded systems,

J. Zhao, Y . Li, Y . Zou, Z. Liang, Y . Xiao, Y . Li, B. Peng, N. Zhong, X. Wang, W. Wanget al., “Leveraging semantic relations in code and data to enhance taint analysis of embedded systems,” inProceedings of the 33rd USENIX Security Symposium (Security), 2024

2024
[3]

Finding bugs using your own code: detecting functionally-similar yet inconsistent code,

M. Ahmadi, R. M. Farkhani, R. Williams, and L. Lu, “Finding bugs using your own code: detecting functionally-similar yet inconsistent code,” inProceedings of the 30th USENIX Security Symposium (Security), 2021

2021
[4]

Evaluating and improving neural program- smoothing-based fuzzing,

M. Wu, L. Jiang, J. Xiang, Y . Zhang, G. Yang, H. Ma, S. Nie, S. Wu, H. Cui, and L. Zhang, “Evaluating and improving neural program- smoothing-based fuzzing,” inProceedings of the 44th IEEE/ACM International Conference on Software Engineering (ICSE), 2022

2022
[5]

Mtfuzz: fuzzing with a multi-task neural network,

D. She, R. Krishna, L. Yan, S. Jana, and B. Ray, “Mtfuzz: fuzzing with a multi-task neural network,” inProceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2020

2020
[6]

Inspector gadget: Automated extraction of proprietary gadgets from malware binaries,

C. Kolbitsch, T. Holz, C. Kruegel, and E. Kirda, “Inspector gadget: Automated extraction of proprietary gadgets from malware binaries,” inProceedings of the 31st IEEE Symposium on Security and Privacy (SP), 2010

2010
[7]

Identifying dormant functionality in malware programs,

P. M. Comparetti, G. Salvaneschi, E. Kirda, C. Kolbitsch, C. Kruegel, and S. Zanero, “Identifying dormant functionality in malware programs,” inProceedings of the 31st IEEE Symposium on Security and Privacy (SP), 2010

2010
[8]

Autoprobe: Towards automatic active malicious server probing using dynamic binary analysis,

Z. Xu, A. Nappa, R. Baykov, G. Yang, J. Caballero, and G. Gu, “Autoprobe: Towards automatic active malicious server probing using dynamic binary analysis,” inProceedings of the 21st ACM SIGSAC Conference on Computer and Communications Security (CCS), 2014

2014
[9]

Dispatcher: Enabling active botnet infiltration using automatic protocol reverse- engineering,

J. Caballero, P. Poosankam, C. Kreibich, and D. Song, “Dispatcher: Enabling active botnet infiltration using automatic protocol reverse- engineering,” inProceedings of the 16th ACM SIGSAC Conference on Computer and Communications Security (CCS), 2009

2009
[10]

Binary sight-seeing: Accelerating reverse engineering via point-of-interest- beacons,

R. A. See, M. Gehring, M. Fischer, and S. Karuppayah, “Binary sight-seeing: Accelerating reverse engineering via point-of-interest- beacons,” inProceedings of the 39th Annual Computer Security Applications Conference (ACSAC), 2023

2023
[11]

One bad apple spoils the barrel: Under- standing the security risks introduced by third-party components in iot firmware,

B. Zhao, S. Ji, J. Xu, Y . Tian, Q. Wei, Q. Wang, C. Lyu, X. Zhang, C. Lin, J. Wuet al., “One bad apple spoils the barrel: Under- standing the security risks introduced by third-party components in iot firmware,”IEEE Transactions on Dependable and Secure Computing (TDSC), 2023

2023
[12]

Binaryai: Binary software composition analysis via intelligent bi- nary source code matching,

L. Jiang, J. An, H. Huang, Q. Tang, S. Nie, S. Wu, and Y . Zhang, “Binaryai: Binary software composition analysis via intelligent bi- nary source code matching,” inProceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE), 2024

2024
[13]

Veribin: Adaptive verification of patches at the binary level

H. Wu, J. Wu, R. Wu, A. Sharma, A. Machiry, and A. Bianchi, “Veribin: Adaptive verification of patches at the binary level.” in Proceedings of the 32nd Annual Network and Distributed System Security Symposium (NDSS), 2025

2025
[14]

Precise and accurate patch presence test for binaries,

H. Zhang and Z. Qian, “Precise and accurate patch presence test for binaries,” inProceedings of the 27th USENIX Security Symposium (Security), 2018

2018
[15]

Patchdiscovery: Patch presence test for identifying binary vulnerabilities based on key basic blocks,

X. Xu, Q. Zheng, Z. Yan, M. Fan, A. Jia, Z. Zhou, H. Wang, and T. Liu, “Patchdiscovery: Patch presence test for identifying binary vulnerabilities based on key basic blocks,”IEEE Transactions on Software Engineering (TSE), 2023

2023
[16]

Automating patching of vulnerable open- source software versions in application binaries

R. Duan, A. Bijlani, Y . Ji, O. Alrawi, Y . Xiong, M. Ike, B. Saltafor- maggio, and W. Lee, “Automating patching of vulnerable open- source software versions in application binaries.” inProceedings of the 26th Annual Network and Distributed System Security Sympo- sium (NDSS), 2019

2019
[17]

An infrastructure to support interoperability in reverse engineering,

N. A. Kraft, B. A. Malloy, and J. F. Power, “An infrastructure to support interoperability in reverse engineering,”Information and Software Technology, 2007

2007
[18]

Toward an infrastructure to support interoperability in reverse engineering,

——, “Toward an infrastructure to support interoperability in reverse engineering,” inProceedings of the 12th Working Conference on Reverse Engineering (WCRE), 2005

2005
[19]

Memory forensics and the windows subsystem for linux,

N. Lewis, A. Case, A. Ali-Gombe, and G. G. Richard III, “Memory forensics and the windows subsystem for linux,”Digital Investiga- tion, 2018

2018
[20]

Seance: Divination of tool-breaking changes in forensically impor- tant binaries,

R. D. Maggio, A. Case, A. Ali-Gombe, and G. G. Richard III, “Seance: Divination of tool-breaking changes in forensically impor- tant binaries,”Forensic Science International: Digital Investigation, 2021

2021
[21]

Characterization of the windows kernel version vari- ability for accurate memory analysis,

M. I. Cohen, “Characterization of the windows kernel version vari- ability for accurate memory analysis,”Digital Investigation, 2015

2015
[22]

Bin-carver: Au- tomatic recovery of binary executable files,

S. Hand, Z. Lin, G. Gu, and B. Thuraisingham, “Bin-carver: Au- tomatic recovery of binary executable files,”Digital Investigation, 2012

2012
[23]

Who wrote this code? identifying the authors of program binaries,

N. Rosenblum, X. Zhu, and B. P. Miller, “Who wrote this code? identifying the authors of program binaries,” inProceedings of the 16th European Symposium on Research in Computer Security (ESORICS), 2011

2011
[24]

When coding style survives com- pilation: De-anonymizing programmers from executable binaries,

A. Caliskan, F. Yamaguchi, E. Dauber, R. Harang, K. Rieck, R. Greenstadt, and A. Narayanan, “When coding style survives com- pilation: De-anonymizing programmers from executable binaries,” inProceedings of the 25th Annual Network and Distributed System Security Symposium (NDSS), 2018

2018
[25]

Soft- ware plagiarism detection with birthmarks based on dynamic key instruction sequences,

Z. Tian, Q. Zheng, T. Liu, M. Fan, E. Zhuang, and Z. Yang, “Soft- ware plagiarism detection with birthmarks based on dynamic key instruction sequences,”IEEE Transactions on Software Engineering (TSE), 2015

2015
[26]

Identifying open- source license violation and 1-day security risk at large scale,

R. Duan, A. Bijlani, M. Xu, T. Kim, and W. Lee, “Identifying open- source license violation and 1-day security risk at large scale,” in Proceedings of the 24th ACM SIGSAC Conference on Computer and Communications Security (CCS), 2017

2017
[27]

Binary translation: Static, dynamic, retargetable?

C. Cifuentes and V . Malhotra, “Binary translation: Static, dynamic, retargetable?” inProceedings of the IEEE International Conference on Software Maintenance (ICSM), 1996

1996
[28]

De- composing legacy programs: A first step towards migrating to client– server platforms,

G. Canfora, A. Cimitile, A. De Lucia, and G. A. Di Lucca, “De- composing legacy programs: A first step towards migrating to client– server platforms,”Journal of Systems and Software (JSS), 2000

2000
[29]

Re- verse engineering from mainframe assembly to c codes in legacy migration,

D. Fujiwara, N. Ishiura, R. Sakai, R. Aoki, and T. Ogawara, “Re- verse engineering from mainframe assembly to c codes in legacy migration,” inProceedings of the 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), 2016

2016
[30]

Structural analysis of binary executable headers for malware detection optimization,

B. David, E. Filiol, and K. Gallienne, “Structural analysis of binary executable headers for malware detection optimization,”Journal of Computer Virology and Hacking Techniques, 2017

2017
[31]

Automating the detection of evasive windows malware: An evaluated yara rule library for anti-vm and anti-sandbox techniques,

S. Kanj, G. Vila, and J. Pegueroles, “Automating the detection of evasive windows malware: An evaluated yara rule library for anti-vm and anti-sandbox techniques,”Journal of Cybersecurity and Privacy (JCP), 2026

2026
[32]

Detection of malware by using yara rules,

R. H. Mahdi and H. Trabelsi, “Detection of malware by using yara rules,” inProceedings of the 21st International Multi-Conference on Systems, Signals & Devices (SSD), 2024

2024
[33]

Malware detection based on multiple pe headers identification and optimization for specific types of files (jaec),

F. Zatloukal and J. Znoj, “Malware detection based on multiple pe headers identification and optimization for specific types of files (jaec),”Journal of Advanced Engineering and Computation, 2017

2017
[34]

Analyzing memory accesses in x86 executables,

G. Balakrishnan and T. Reps, “Analyzing memory accesses in x86 executables,” inProceedings of the 13th International Conference on Compiler Construction (CC), 2004

2004
[35]

When function signature recovery meets compiler optimization,

Y . Lin and D. Gao, “When function signature recovery meets compiler optimization,” inProceedings of the 42nd IEEE Symposium on Security and Privacy (SP), 2021

2021
[36]

cfi: Type-assisted control flow integrity for x86-64 binaries,

P. Muntean, M. Fischer, G. Tan, Z. Lin, J. Grossklags, and C. Eckert, “cfi: Type-assisted control flow integrity for x86-64 binaries,” in Proceedings of the 21th International Symposium on Research in Attacks, Intrusions, and Defenses (RAID), 2018

2018
[37]

Scalable data structure de- tection and classification for c/c++ binaries,

I. Haller, A. Slowinska, and H. Bos, “Scalable data structure de- tection and classification for c/c++ binaries,”Empirical Software Engineering, 2016

2016
[38]

Airtaint: Making dynamic taint analysis faster and easier,

Q. Sang, Y . Wang, Y . Liu, X. Jia, T. Bao, and P. Su, “Airtaint: Making dynamic taint analysis faster and easier,” inProceedings of the 45th IEEE Symposium on Security and Privacy (SP), 2024

2024
[39]

Hardtaint: production-run dynamic taint analysis via selec- tive hardware tracing,

Y . Zhang, T. Liu, Y . Wang, Y . Qi, K. Ji, J. Tang, X. Wang, X. Li, and Z. Zuo, “Hardtaint: production-run dynamic taint analysis via selec- tive hardware tracing,”Proceedings of the ACM on Programming Languages (PACMPL), 2024

2024
[40]

Detecting malware activities with malpminer: A dynamic analysis approach,

M. F. Abdelwahed, M. M. Kamal, and S. G. Sayed, “Detecting malware activities with malpminer: A dynamic analysis approach,” IEEE Access, 2023

2023
[41]

Unleashing mayhem on binary code,

S. K. Cha, T. Avgerinos, A. Rebert, and D. Brumley, “Unleashing mayhem on binary code,” inProceedings of the 33rd IEEE Sympo- sium on Security and Privacy (SP), 2012

2012
[42]

Cryptographic function detection in obfuscated binaries via bit-precise symbolic loop mapping,

D. Xu, J. Ming, and D. Wu, “Cryptographic function detection in obfuscated binaries via bit-precise symbolic loop mapping,” in Proceedings of the 38th IEEE Symposium on Security and Privacy (SP), 2017

2017
[43]

Symbolic execution with symcc: Don’t interpret, compile!

S. Poeplau and A. Francillon, “Symbolic execution with symcc: Don’t interpret, compile!” inProceedings of the 29th USENIX Security Symposium (Security), 2020

2020
[44]

uth, C. Dietrich, and R. Drechsler, “Accurate and extensible symbolic execution of binary code based on formal isa semantics,

S. Tempel, T. Brandt, C. L ¨"uth, C. Dietrich, and R. Drechsler, “Accurate and extensible symbolic execution of binary code based on formal isa semantics,” inDesign, Automation, and Test in Europe (DATE), 2025

2025
[45]

Mopt: Optimized mutation scheduling for fuzzers,

C. Lyu, S. Ji, C. Zhang, Y . Li, W.-H. Lee, Y . Song, and R. Beyah, “Mopt: Optimized mutation scheduling for fuzzers,” inProceedings of the 28th USENIX Security Symposium (Security), 2019

2019
[46]

Redqueen: Fuzzing with input-to-state correspondence

C. Aschermann, S. Schumilo, T. Blazytko, R. Gawlik, and T. Holz, “Redqueen: Fuzzing with input-to-state correspondence.” inPro- ceedings of the 26th Network and Distributed System Security Symposium (NDSS), 2019

2019
[47]

ohme, and A. Roychoudhury, “Model-based whitebox fuzzing for program binaries,

V .-T. Pham, M. B ¨"ohme, and A. Roychoudhury, “Model-based whitebox fuzzing for program binaries,” inProceedings of the 31st International Conference on Automated Software Engineering (ASE), 2016

2016
[48]

Pangolin: Incremental hybrid fuzzing with polyhedral path abstraction,

H. Huang, P. Yao, R. Wu, Q. Shi, and C. Zhang, “Pangolin: Incremental hybrid fuzzing with polyhedral path abstraction,” in Proceedings of the 41th IEEE Symposium on Security and Privacy (SP), 2020

2020
[49]

Auto- mated vulnerability discovery system based on hybrid execution,

T. Liu, Z. Wang, Y . Zhang, Z. Liu, B. Fang, and Z. Pang, “Auto- mated vulnerability discovery system based on hybrid execution,” inProceedings of the 7th International Conference on Data Science in Cyberspace (DSC), 2022

2022
[50]

Ghidra: Software reverse engineering (sre) suite of tools,

N. S. A. (NSA), “Ghidra: Software reverse engineering (sre) suite of tools,” https://ghidra-sre.org/, 2019

2019
[51]

Ida pro disassembler,

Hex-Rays, “Ida pro disassembler,” https://www.hex-rays.com/ products/ida/, 2022

2022
[52]

Codesurfer/x86—a platform for analyzing x86 executables,

G. Balakrishnan, R. Gruian, T. Reps, and T. Teitelbaum, “Codesurfer/x86—a platform for analyzing x86 executables,” in Proceedings of the 14th International Conference on Compiler Construction (CC), 2005

2005
[53]

Sok:(state of) the art of war: Offensive techniques in binary anal- ysis,

Y . Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegelet al., “Sok:(state of) the art of war: Offensive techniques in binary anal- ysis,” inProceedings of the 37th IEEE Symposium on Security and Privacy (SP), 2016

2016
[54]

Tie: Principled reverse engineering of types in bi- nary programs,

T. Avgerinos, “Tie: Principled reverse engineering of types in bi- nary programs,” inProceedings of the 18th Annual Network and Distributed System Security Symposium (NDSS), 2011

2011
[55]

Howard: A dynamic exca- vator for reverse engineering data structures

A. Slowinska, T. Stancescu, and H. Bos, “Howard: A dynamic exca- vator for reverse engineering data structures.” inProceedings of the 18th Annual Network and Distributed System Security Symposium (NDSS), 2011

2011
[56]

Automatic reverse engineering of data structures from binary execution,

Z. Lin, X. Zhang, and D. Xu, “Automatic reverse engineering of data structures from binary execution,” inProceedings of the 11th Annual Information Security Symposium, 2010

2010
[57]

Google scholar,

Google, “Google scholar,” https://scholar.google.com/intl/en/scholar/ about.html, 2026

2026
[58]

Recognizing functions in binaries with neural networks,

E. C. R. Shin, D. Song, and R. Moazzezi, “Recognizing functions in binaries with neural networks,” inProceedings of the 24th USENIX Security Symposium (Security), 2015

2015
[59]

A survey of binary code similarity,

I. U. Haq and J. Caballero, “A survey of binary code similarity,” ACM Computing Surveys (CSUR), 2021

2021
[60]

A survey of available information recovery of binary programs based on machine learn- ing,

W. Shao, Q. Yang, X. Guo, and R. Cai, “A survey of available information recovery of binary programs based on machine learn- ing,” inProceedings of the 5th International Conference on Artificial Intelligence and Big Data (ICAIBD), 2022

2022
[61]

A survey on machine learning-based malware detection in executable files,

J. Singh and J. Singh, “A survey on machine learning-based malware detection in executable files,”Journal of Systems Architecture (JSA), 2021

2021
[62]

Application of deep learning in malware detection: a review,

Y . Song, D. Zhang, J. Wang, Y . Wang, Y . Wang, and P. Ding, “Application of deep learning in malware detection: a review,” Journal of Big Data, 2025

2025
[63]

Survey of techniques to detect com- mon weaknesses in program binaries,

A. Adhikari and P. Kulkarni, “Survey of techniques to detect com- mon weaknesses in program binaries,”Cyber Security and Applica- tions (CSA), 2025

2025
[64]

A survey of automatic software vulnerability detection, program repair, and defect prediction techniques,

Z. Shen and S. Chen, “A survey of automatic software vulnerability detection, program repair, and defect prediction techniques,”Security and Communication Networks, 2020

2020
[65]

Finer: Enhancing state-of-the- art classifiers with feature attribution to facilitate security analysis,

Y . He, J. Lou, Z. Qin, and K. Ren, “Finer: Enhancing state-of-the- art classifiers with feature attribution to facilitate security analysis,” inProceedings of the 30th ACM SIGSAC Conference on Computer and Communications Security (CCS), 2023

2023
[66] W. Guo, D. Mu, J. Xu, P. Su, G. Wang, and X. Xing, “Lemna: Explaining deep learning based security applications,” in Proceedings of the 25th ACM SIGSAC Conference on Computer and Communications Security (CCS), 2018

[67] X. Li, Y. Qu, and H. Yin, “Palmtree: Learning an assembly language model for instruction embedding,” in Proceedings of the 28th ACM SIGSAC Conference on Computer and Communications Security (CCS), 2021

[68] Z. Su, X. Xu, Z. Huang, Z. Zhang, Y. Ye, J. Huang, and X. Zhang, “Codeart: Better code models by attention regularization when symbols are lacking,” in Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE), 2024

[69] H. Wang, Z. Gao, C. Zhang, Z. Sha, M. Sun, Y. Zhou, W. Zhu, W. Sun, H. Qiu, and X. Xiao, “Clap: learning transferable binary code representations with natural language supervision,” in Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2024

[70] J. Xiong, G. Chen, K. Chen, H. Gao, S. Cheng, and W. Zhang, “Hext5: Unified pre-training for stripped binary code information inference,” in Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023

[71] N. Jiang, C. Wang, K. Liu, X. Xu, L. Tan, X. Zhang, and P. Babkin, “Nova: Generative language models for assembly code with hierarchical attention and contrastive learning,” in Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025

[72] A. T. Nguyen, F. Lu, G. L. Munoz, E. Raff, C. Nicholas, and J. Holt, “Out of distribution data detection using dropout bayesian neural networks,” in Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), 2022

[73] S. Thirumuruganathan, F. Deniz, I. Khalil, T. Yu, M. Nabeel, and M. Ouzzani, “Detecting and mitigating sampling bias in cybersecurity with unlabeled data,” in Proceedings of the 33rd USENIX Security Symposium (Security), 2024

[74] L. Yang, W. Guo, Q. Hao, A. Ciptadi, A. Ahmadzadeh, X. Xing, and G. Wang, “Cade: Detecting and explaining concept drift samples for security applications,” in Proceedings of the 30th USENIX Security Symposium (Security), 2021

[75] S. Mohseni, S. Mohammadi, D. Tilwani, Y. Saxena, G. K. Ndawula, S. Vema, E. Raff, and M. Gaur, “Can llms obfuscate code? a systematic analysis of large language models into assembly code obfuscation,” in Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI), 2025

[76] S. Yu, Y. Qu, X. Hu, and H. Yin, “Deepdi: Learning a relational graph convolutional network model on instructions for fast and accurate disassembly,” in Proceedings of the 31st USENIX Security Symposium (Security), 2022

[77] M. Chandramohan, Y. Xue, Z. Xu, Y. Liu, C. Y. Cho, and H. B. K. Tan, “Bingo: Cross-architecture cross-os binary search,” in Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), 2016

[78] J. Patrick-Evans, M. Dannehl, and J. Kinder, “Xfl: Naming functions in binaries with extreme multi-label learning,” in Proceedings of the 44th IEEE Symposium on Security and Privacy (SP), 2023

[79] T. Benoit, Y. Wang, M. Dannehl, and J. Kinder, “Blens: Contrastive captioning of binary functions using ensemble embedding,” in Proceedings of the 34th USENIX Security Symposium (Security), 2025

[80] A. Caliskan, F. Yamaguchi, E. Dauber, R. E. Harang, K. Rieck, R. Greenstadt, and A. Narayanan, “When coding style survives compilation: De-anonymizing programmers from executable binaries,” in Proceedings of the 25th Annual Network and Distributed System Security Symposium (NDSS), 2018
