OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

Gadi Evron; Nahum Korda

arxiv: 2606.19149 · v2 · pith:CEEPKL22 · submitted 2026-06-17 · 💻 cs.CR · cs.LG


Pith reviewed 2026-06-26 20:19 UTC · model grok-4.3


keywords: vulnerability discovery · LLM reasoning · code decomposition · adversarial verification · dynamic testing · static analysis · false positive reduction · security automation


The pith

OpenAnt decomposes codebases and uses LLM reasoning plus sandbox testing to find unknown vulnerabilities with fewer false positives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OpenAnt, a system that breaks large codebases into smaller analysis units reachable from external entry points, uses large language models to simulate constrained attacker scenarios against candidate issues, and then generates and runs exploit code in temporary sandboxed environments to confirm findings. Traditional static tools raise too many false alarms, while dynamic fuzzing needs heavy infrastructure and often targets narrow bug classes, so the goal is a hybrid pipeline whose cost stays reasonable at repository scale. Evaluation on projects including OpenSSL, WordPress, and Flowise indicates that the method can surface real, previously unknown problems while cutting the code under review by up to 97 percent and lowering false positives. If the approach holds, it points toward automated security checks that combine semantic understanding with concrete validation rather than relying on either alone.

Core claim

OpenAnt integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. Codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Candidate vulnerabilities then undergo adversarial verification through constrained attacker simulation. Findings are validated through dynamic verification in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects shows the architecture identifies previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives.
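The dynamic-verification step can be pictured with a short sketch. The abstract does not say which container runtime OpenAnt uses, so the code below assumes Docker; `sandbox_command`, `run_exploit`, the image name, and the rule that exit status 0 means "exploit reproduced" are all hypothetical. The container is started with `--rm` so it is discarded after use, and with `--network none` so the exploit cannot reach the outside world.

```python
import subprocess


def sandbox_command(image: str, exploit_cmd: list[str], memory: str = "512m") -> list[str]:
    """Build a `docker run` invocation for a throwaway, network-isolated container."""
    return [
        "docker", "run",
        "--rm",                 # delete the container once it exits
        "--network", "none",    # no network access from inside the sandbox
        "--memory", memory,     # cap memory use
        "--pids-limit", "256",  # cap the number of processes
        image,
        *exploit_cmd,
    ]


def run_exploit(image: str, exploit_cmd: list[str], timeout_s: int = 60) -> bool:
    """Return True if the exploit command exits with status 0 before the timeout.

    Assumption: the generated exploit signals success via its exit status.
    On timeout, subprocess kills only the local docker client process; a real
    harness would also stop the container itself (e.g. with `docker kill`).
    """
    try:
        result = subprocess.run(
            sandbox_command(image, exploit_cmd),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Hypothetical image and exploit path; building the command does not start Docker.
print(" ".join(sandbox_command("example/poc-env:latest", ["python3", "/poc/exploit.py"])))
```

Because the container is created fresh for each candidate and removed afterward, one exploit attempt cannot leave state behind that makes a later candidate look reproducible.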

What carries the argument

OpenAnt's three-stage pipeline, which decomposes the codebase into reachable analysis units, verifies candidates through constrained attacker simulation, and confirms them by executing generated exploits in disposable sandboxes.
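How the three stages compose can be sketched as a filter chain. This is a minimal illustration, not OpenAnt's code: `Candidate`, `run_pipeline`, and the three injected callables are hypothetical stand-ins for candidate generation, the LLM attacker simulation, and the sandbox run. The input units are assumed to be already reachability-filtered (stage one), and a finding is reported only if it survives both later stages.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Candidate:
    """A candidate finding tied to one analysis unit (hypothetical structure)."""
    unit: str
    description: str


def run_pipeline(
    units: Iterable[str],
    find_candidates: Callable[[str], list[Candidate]],
    adversarial_check: Callable[[Candidate], bool],
    dynamic_check: Callable[[Candidate], bool],
) -> list[Candidate]:
    """Return only the candidates that pass adversarial and dynamic verification."""
    confirmed: list[Candidate] = []
    for unit in units:                        # stage 1 output: reachable units
        for cand in find_candidates(unit):
            if not adversarial_check(cand):   # stage 2: constrained attacker simulation
                continue
            if dynamic_check(cand):           # stage 3: sandboxed exploit run
                confirmed.append(cand)
    return confirmed


# Stub stages for illustration only.
found = run_pipeline(
    ["parse_header", "render_page"],
    find_candidates=lambda u: [Candidate(u, f"possible overflow in {u}")],
    adversarial_check=lambda c: c.unit == "parse_header",
    dynamic_check=lambda c: True,
)
print([c.unit for c in found])  # ['parse_header']
```

Ordering the checks from cheap to expensive means sandbox runs are spent only on candidates the attacker simulation already considers plausible.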

If this is right

Previously unknown vulnerabilities can be identified in widely used projects such as OpenSSL, WordPress, and Flowise.
Analysis surface is reduced by up to 97 percent while attack-relevant code is retained.
False positives are substantially lower than those produced by traditional static analysis alone.
Analysis cost remains manageable for repository-scale security work.
Closed-loop pipelines that combine semantic reasoning with exploit validation offer a route to scalable automated security analysis.
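Why the surface reduction matters for cost can be shown with back-of-envelope arithmetic. Every number below (repository size, tokens per unit, price per million tokens) is a hypothetical placeholder rather than a figure from the paper, and the 97% figure is the paper's best case, not a typical value.

```python
def llm_cost_usd(units: int, tokens_per_unit: int, usd_per_million_tokens: float, passes: int = 1) -> float:
    """Rough LLM spend: units x tokens per unit x passes, priced per million tokens."""
    return units * tokens_per_unit * passes * usd_per_million_tokens / 1_000_000


total_units = 20_000                              # hypothetical repository size, in units
kept_units = round(total_units * (1 - 0.97))      # best-case 97% reduction
full = llm_cost_usd(total_units, tokens_per_unit=4_000, usd_per_million_tokens=3.0)
reduced = llm_cost_usd(kept_units, tokens_per_unit=4_000, usd_per_million_tokens=3.0)
print(kept_units, full, reduced)  # 600 240.0 7.2
```

The estimate scales linearly with the number of units, so a 97% cut in units is a 97% cut in this term; it ignores the extra model calls made by the verification stages and the compute spent on sandbox runs.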

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition and validation loop could be applied to detect configuration or logic errors that are not classic memory-safety bugs.
Pairing the initial decomposition step with existing static analyzers might further trim the set of units sent to the LLM stage.
The sandbox generation process suggests a route for embedding continuous vulnerability checks inside developer build pipelines.
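The pairing with existing static analyzers suggested above could look like the sketch below: keep only reachable units whose source file received at least one result in a SARIF 2.1.0 log (the interchange format that tools such as CodeQL and Semgrep can emit). The unit-to-file mapping and the function names are hypothetical, the match is file-level rather than function-level, and the code assumes SARIF URIs are relative paths that match the unit map exactly.

```python
import json


def flagged_files(sarif_text: str) -> set[str]:
    """Collect file URIs that have at least one result in a SARIF 2.1.0 log."""
    log = json.loads(sarif_text)
    files: set[str] = set()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                uri = loc.get("physicalLocation", {}).get("artifactLocation", {}).get("uri")
                if uri:
                    files.add(uri)
    return files


def prefilter(unit_files: dict[str, str], sarif_text: str) -> set[str]:
    """Keep reachable units (name -> file path) whose file the analyzer flagged."""
    hot = flagged_files(sarif_text)
    return {unit for unit, path in unit_files.items() if path in hot}


sarif = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-analyzer"}},
        "results": [{
            "ruleId": "EX001",
            "message": {"text": "possible unchecked length"},
            "locations": [{"physicalLocation": {"artifactLocation": {"uri": "src/parser.c"}}}],
        }],
    }],
})
reachable = {"parse_request": "src/parser.c", "render": "src/render.c"}
print(prefilter(reachable, sarif))  # {'parse_request'}
```

Static analyzers miss bugs too, so this trim trades recall for cost and would need the same coverage check as the reachability filter itself.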

Load-bearing premise

Filtering code into self-contained units by reachability from external entry points keeps every attack-relevant path without dropping exploitable code.
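The premise is easiest to probe on a toy call graph. The sketch below is a plain breadth-first search from the declared entry points, not OpenAnt's decomposition code: it keeps every function an HTTP handler can reach and drops a background job's call chain. If `import_feed` parses attacker-influenced data, a bug in `parse_xml` would be filtered out before any model sees it. The graph, the function names, and the function-count measure of "reduction" are all hypothetical.

```python
from collections import deque


def reachable_from(call_graph: dict[str, list[str]], entry_points: set[str]) -> set[str]:
    """Breadth-first search over a caller -> callees graph from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical program: a request handler plus an internally scheduled job.
call_graph = {
    "http_handler": ["parse_request", "render"],
    "parse_request": ["decode_base64"],
    "nightly_job": ["import_feed"],            # triggered internally, not by a request
    "import_feed": ["decode_base64", "parse_xml"],
}
all_functions = set(call_graph) | {c for callees in call_graph.values() for c in callees}
kept = reachable_from(call_graph, {"http_handler"})
dropped = all_functions - kept
reduction = 1 - len(kept) / len(all_functions)
print(sorted(dropped))     # ['import_feed', 'nightly_job', 'parse_xml']
print(f"{reduction:.0%}")  # 43%
```

Whether `nightly_job` counts as an entry point is a threat-model decision; the filter preserves attack-relevant code only if every such trigger is declared.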

What would settle it

Run OpenAnt on a codebase containing several known, documented vulnerabilities and measure whether it reports them while keeping the number of candidates that fail sandbox verification low.
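The proposed test reduces to two numbers: recall over the seeded or documented vulnerabilities, and the share of candidates that fail to reproduce in the sandbox. A minimal sketch, with hypothetical identifiers and counts and a hypothetical `settle` helper:

```python
from dataclasses import dataclass


@dataclass
class SettleResult:
    recall: float                # share of known vulnerabilities that were reported
    sandbox_failure_rate: float  # share of candidates that did not reproduce
    unmatched: list[str]         # reports outside the known set, for manual triage


def settle(known: set[str], reported: set[str], candidates: int, failed_sandbox: int) -> SettleResult:
    if not known or candidates <= 0:
        raise ValueError("need at least one known vulnerability and one candidate")
    return SettleResult(
        recall=len(known & reported) / len(known),
        sandbox_failure_rate=failed_sandbox / candidates,
        unmatched=sorted(reported - known),
    )


result = settle(
    known={"KNOWN-1", "KNOWN-2", "KNOWN-3", "KNOWN-4"},
    reported={"KNOWN-1", "KNOWN-3", "EXTRA-1"},
    candidates=40,
    failed_sandbox=30,
)
print(result)  # SettleResult(recall=0.5, sandbox_failure_rate=0.75, unmatched=['EXTRA-1'])
```

Reports outside the known set are not automatically false positives; they may be genuine new findings, which is why the sketch lists them separately instead of counting them against the tool.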

Original abstract

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.
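The abstract says the model evaluates exploitability "under realistic attacker capabilities" but does not give the prompt. One way to impose such constraints is to state them explicitly and demand a structured verdict. In the sketch below, `AttackerModel`, its capability fields, the prompt wording, and the JSON answer format are assumptions, and the model call itself is left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackerModel:
    """Hypothetical capability constraints handed to the model."""
    remote_only: bool = True
    authenticated: bool = False
    controls: tuple[str, ...] = ("HTTP request body", "HTTP headers")


def build_prompt(finding: str, code: str, attacker: AttackerModel) -> str:
    """Spell out the attacker's limits, then ask for a machine-readable verdict."""
    return (
        "Act as an attacker limited to exactly these capabilities:\n"
        f"- remote only: {attacker.remote_only}\n"
        f"- authenticated: {attacker.authenticated}\n"
        f"- controls: {'; '.join(attacker.controls)}\n"
        "Decide whether the finding is exploitable within these limits.\n"
        'Reply only with JSON: {"exploitable": true or false, "path": "<steps>"}\n\n'
        f"Finding: {finding}\n\nCode:\n{code}\n"
    )


def parse_verdict(reply: str) -> bool:
    """Fail closed: anything but a well-formed positive verdict counts as rejection."""
    try:
        verdict = json.loads(reply)
    except json.JSONDecodeError:
        return False
    return isinstance(verdict, dict) and verdict.get("exploitable") is True


prompt = build_prompt("length field trusted in parse_header()", "/* unit source */", AttackerModel())
print(parse_verdict('{"exploitable": false, "path": ""}'))  # False
print(parse_verdict("The bug looks exploitable."))           # False
```

Rejecting malformed or free-text replies trades some recall for fewer false positives, which matches the false-positive emphasis of the abstract.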

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

OpenAnt describes a three-stage LLM pipeline for vuln discovery with open-source release, but the abstract gives no metrics or evaluation details to assess whether it works.


The main thing to know is that this paper introduces OpenAnt as a pipeline that decomposes codebases by reachability from external entry points, runs LLM-based adversarial verification on candidates, and then generates and executes exploits in sandboxes. The abstract claims this finds unknown vulnerabilities in projects like OpenSSL while cutting analysis surface by up to 97 percent, but supplies no counts, false-positive rates, or methodology.

What is actually new is the concrete combination of reachability filtering to create self-contained units, constrained attacker simulation inside the model, and automated containerized dynamic validation. Releasing the full system under Apache 2.0 at the linked GitHub repo is useful, since it lets others run the same pipeline on the same targets.

The paper does a straightforward job laying out the practical problems with scaling LLMs to repository analysis—context limits, cost, and verification—and positions the three stages as a way to address them. That framing is clear.

The soft spots are the missing data. No results table, no comparison to prior tools, and no check that the reachability filter actually kept every known CVE in the evaluated projects. The stress-test concern about internally triggered or chained surfaces is reasonable here; the abstract asserts preservation but offers no soundness argument or empirical test against it. Without those pieces the central claim stays untestable.

This paper is for researchers building automated security analysis systems who want an architecture sketch and runnable code. A reader looking for quantified evidence or reproducible findings will come away empty. It deserves peer review so referees can see whether the full manuscript supplies the missing evaluation and addresses the preservation assumption.

Referee Report

2 major / 0 minor

Summary. The paper presents OpenAnt, an open-source system integrating static analysis and LLM reasoning for vulnerability discovery. It decomposes codebases into self-contained units filtered by reachability from external entry points (claimed to reduce analysis surface by up to 97% while preserving attack-relevant code), applies adversarial verification via constrained attacker simulation, and validates findings through automated dynamic testing in sandboxed containers. Evaluation on OpenSSL, WordPress, and Flowise is claimed to identify previously unknown vulnerabilities at manageable cost with substantially reduced false positives.

Significance. If the empirical claims are substantiated with quantitative data and validation of modeling assumptions, the work could advance practical automated security analysis by combining semantic LLM reasoning with verification stages to address scalability and false-positive issues in large codebases. The open-source release under Apache 2.0 supports reproducibility and extension.

major comments (2)

[Abstract] Abstract (evaluation paragraph): The central claim that the architecture identifies previously unknown vulnerabilities while reducing false positives is stated without any supporting metrics, counts of vulnerabilities found, false-positive rates, analysis costs, or evaluation methodology. The soundness of the result cannot be assessed.
[Code decomposition] Code decomposition (first key technique): The assumption that reachability filtering from external entry points preserves all attack-relevant code is presented without a soundness argument or empirical validation (e.g., checking whether any known CVEs in the evaluated projects fall outside the retained units). This is load-bearing for the 97% reduction claim, as under-approximation could exclude internally triggered or chained attack surfaces.
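The check the referee asks for can be automated once known vulnerabilities are mapped to the functions their fixes touched. In this sketch, the mapping, the identifiers, the retained-unit set, and the `filter_coverage` helper are hypothetical, and a vulnerability counts as covered if at least one of its patched functions survives the filter; requiring all of them would be the stricter choice.

```python
def filter_coverage(
    known_locations: dict[str, set[str]], retained_units: set[str]
) -> tuple[float, list[str]]:
    """Return (share of known vulns still inside retained units, ids filtered out)."""
    if not known_locations:
        raise ValueError("no known vulnerabilities to check against")
    excluded = sorted(
        vuln_id
        for vuln_id, functions in known_locations.items()
        if not (functions & retained_units)   # none of the patched functions survived
    )
    coverage = 1 - len(excluded) / len(known_locations)
    return coverage, excluded


known = {
    "KNOWN-1": {"parse_request"},
    "KNOWN-2": {"parse_xml"},
    "KNOWN-3": {"render", "import_feed"},
}
retained = {"http_handler", "parse_request", "render", "decode_base64"}
coverage, excluded = filter_coverage(known, retained)
print(f"{coverage:.2f}", excluded)  # 0.67 ['KNOWN-2']
```

Any identifier in `excluded` is direct evidence against the preservation claim and would mark a place where the 97% figure needs qualifying.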

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to strengthen the presentation of our empirical claims and the justification for our core techniques. We address each major comment below and commit to revisions that improve clarity and substantiation without altering the underlying contributions.


Referee: [Abstract] Abstract (evaluation paragraph): The central claim that the architecture identifies previously unknown vulnerabilities while reducing false positives is stated without any supporting metrics, counts of vulnerabilities found, false-positive rates, analysis costs, or evaluation methodology. The soundness of the result cannot be assessed.

Authors: We agree that the abstract's evaluation paragraph would be strengthened by including concrete quantitative metrics. In the revised manuscript we will update the abstract to reference key results from the evaluation section, including the number of previously unknown vulnerabilities identified across the three projects, measured false-positive rates before and after the verification stages, and per-project analysis costs (in LLM tokens and wall-clock time). This will allow readers to assess the claims directly from the abstract while preserving its high-level nature. revision: yes
Referee: [Code decomposition] Code decomposition (first key technique): The assumption that reachability filtering from external entry points preserves all attack-relevant code is presented without a soundness argument or empirical validation (e.g., checking whether any known CVEs in the evaluated projects fall outside the retained units). This is load-bearing for the 97% reduction claim, as under-approximation could exclude internally triggered or chained attack surfaces.

Authors: Reachability filtering from external entry points follows standard practice in security-oriented static analysis to focus on externally triggerable code. We will add an explicit soundness discussion and empirical validation in the revised manuscript: we will report the fraction of known CVEs from the evaluated projects (OpenSSL, WordPress) that remain inside the retained units after filtering, and we will discuss the threat model under which internally triggered or chained surfaces are considered out of scope. If the validation reveals any excluded CVEs, we will qualify the 97% reduction claim accordingly. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system description with no derivations or fitted parameters


The paper presents OpenAnt as an engineering system combining static analysis, LLM reasoning, adversarial verification, and dynamic testing. The abstract and described techniques rely on empirical evaluation on OpenSSL, WordPress, and Flowise rather than any mathematical derivation chain, equations, or parameter fitting. The reachability filtering claim is stated as a design choice that reduces surface while preserving attack-relevant code, without reduction to a self-referential definition or prior self-citation that bears the central result. No load-bearing step reduces by construction to its own inputs; the work is self-contained as an applied pipeline description.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.1-grok · 5796 in / 1098 out tokens · 23621 ms · 2026-06-26T20:19:35.178142+00:00 · methodology


Reference graph

Works this paper leans on

35 extracted references · 7 canonical work pages · 2 internal anchors

[1] Shen, M., Li, Z., Xu, W., & Chen, Y. (2023). An empirical study on the use of static analysis tools in open source embedded software. arXiv preprint arXiv:2305.07023.

[2] Kuszczyński, K., & Walkowski, M. (2023). Comparative analysis of open-source tools for conducting static code analysis. Sensors, 23(18), 7753.

[3] Johnson, B., Song, Y., Murphy-Hill, E., & Bowdidge, R. (2013). Why don’t software developers use static analysis tools to find bugs? In Proceedings of the 2013 International Conference on Software Engineering (ICSE) (pp. 672–681). IEEE.

[4] Christakis, M., & Bird, C. (2016). What developers want and need from program analysis: An empirical study. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE) (pp. 332–343). IEEE.

[5] Bennett, G., et al. (2024). Do developers use static application security testing tools? ACM Computing Surveys.

[6] Aloraini, B., et al. (2019). An empirical study of security warnings from static application security testing tools. Journal of Systems and Software, 148, 230–245.

[7] Okutan, A., Grichi, M., Dwyer, M. B., et al. (2024). An empirical study of static analysis tools for secure code review. arXiv preprint arXiv:2407.12241.

[8] Ma, W., Liu, S., Lin, Z., et al. (2023). LLMs: Understanding code syntax and semantics for code analysis. arXiv preprint arXiv:2305.12138.

[9] Pearce, H., Ahmad, A., Tan, B., et al. (2022). Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions. In Proceedings of the IEEE Symposium on Security and Privacy (S&P) (pp. 754–768).

[10] Fried, D., Chan, A., Darrell, T., & Klein, D. (2023). Code as policies: Language model programs for embodied control. In Proceedings of Robotics: Science and Systems (RSS).

[11] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173.

[12] National Institute of Standards and Technology. (n.d.). Software Assurance Reference Dataset (SARD): Juliet test suite. Retrieved June 2026, from https://samate.nist.gov/SARD/

[13] OWASP Foundation. (n.d.). OWASP Benchmark project. Retrieved June 2026, from https://owasp.org/www-project-benchmark/

[14] Riddell, M., Ni, A., & Cohan, A. (2024). Quantifying contamination in evaluating code generation capabilities of language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 14116–14137).

[15] Guo, D., et al. (2025). LessLeak-Bench: A first investigation of data leakage in LLMs across 83 software engineering benchmarks. arXiv preprint arXiv:2502.06215.

[16] Sainz, O., Campos, J. A., Garcia-Ferrero, I., et al. (2023). NLP evaluation in trouble: On the need to measure LLM data contamination for each benchmark. In Findings of the Association for Computational Linguistics: EMNLP 2023.

[17] Li, Y., et al. (2024). Unveiling the spectrum of data contamination in language models: A survey from detection to remediation. In Findings of the Association for Computational Linguistics (ACL 2024).

[18] Croft, R., Babar, M. A., & Kholoosi, M. M. (2023). Data quality for software vulnerability datasets. In Proceedings of the 45th International Conference on Software Engineering (ICSE).

[19] Ding, Y., Fu, Y., Ibrahim, O., et al. (2025). Vulnerability detection with code language models: How far are we? In Proceedings of the 47th International Conference on Software Engineering (ICSE).

[20] Yang, X., et al. (2023). Understanding the effectiveness of large language models in detecting security vulnerabilities. arXiv preprint arXiv:2311.16169.

[21] Semgrep. (n.d.). Semgrep: Static analysis for finding bugs and enforcing code standards. Retrieved June 2026, from https://semgrep.dev

[22] GitHub. (n.d.). CodeQL: Semantic code analysis engine. Retrieved June 2026, from https://codeql.github.com

[23] Black, P. E., Koo, H., & Okun, V. (2013). Report on the Static Analysis Tool Exposition (SATE) IV. NIST Special Publication 500-297.

[24] Google Project Zero & Google DeepMind. (2024). From naptime to big sleep: Using large language models to catch vulnerabilities in real-world code. Google Project Zero Blog. https://projectzero.google/2024/10/from-naptime-to-big-sleep.html

[25] Anthropic. (2026). Claude Security. Retrieved June 2026, from https://www.anthropic.com/product/security

[26] OpenAI. (2025). Introducing Aardvark: OpenAI’s agentic security researcher. OpenAI Research Blog. https://openai.com/index/introducing-aardvark/

[27] Fang, R., Bindu, R., Gupta, A., & Kang, D. (2024). LLM agents can autonomously exploit one-day vulnerabilities. arXiv preprint arXiv:2404.08144.

[28] Fang, R., Bindu, R., Gupta, A., Zhan, Q., & Kang, D. (2024). LLM agents can autonomously hack websites. arXiv preprint arXiv:2402.06664.

[29] Happe, A., & Cito, J. (2023). Getting pwn’d by AI: Penetration testing with large language models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

[30] Deng, G., Liu, Y., et al. (2024). PentestGPT: Evaluating and harnessing large language models for automated penetration testing. In Proceedings of the 33rd USENIX Security Symposium.

[31] Zalewski, M. (2014). American fuzzy lop (AFL). https://lcamtuf.coredump.cx/afl/

[32] Google. (n.d.). OSS-Fuzz: Continuous fuzzing for open source software. Retrieved June 2026, from https://github.com/google/oss-fuzz

[33] Cha, S. K., Avgerinos, T., Rebert, A., & Brumley, D. (2012). Unleashing Mayhem on binary code. In Proceedings of the IEEE Symposium on Security and Privacy (S&P) (pp. 380–394).

[34] Shoshitaishvili, Y., Wang, R., Salls, C., et al. (2016). SoK: (State of) the art of war: Offensive techniques in binary analysis. In Proceedings of the IEEE Symposium on Security and Privacy (S&P) (pp. 138–157).

[35] Defense Advanced Research Projects Agency. (2016). Cyber Grand Challenge. https://www.darpa.mil/program/cyber-grand-challenge
