Contextualizing Sink Knowledge for Java Vulnerability Discovery
Pith reviewed 2026-05-13 21:45 UTC · model grok-4.3
The pith
GONDAR discovers four times more Java vulnerabilities than Jazzer by targeting sink APIs with LLM filtering and collaborative agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GONDAR is a sink-centric fuzzing framework that systematically leverages sink API semantics for targeted vulnerability discovery. It identifies reachable and exploitable sink call sites through CWE-specific scanning combined with LLM-assisted static filtering, then deploys an exploration agent to generate inputs that reach target call sites by solving path constraints and an exploitation agent to synthesize proof-of-concept exploits by satisfying vulnerability-triggering conditions. The agents and fuzzer continuously exchange seeds and runtime feedback.
What carries the argument
Two specialized agents—an exploration agent that solves path constraints to reach sink call sites and an exploitation agent that reasons about vulnerability conditions—working collaboratively with a coverage-guided fuzzer.
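The collaboration described above can be pictured as a shared seed corpus that the fuzzer and both agents read from and write to. The following Python sketch is only an illustration of that loop, not GONDAR's implementation: the toy target, the `GUARD` prefix, the traversal payload, and all three worker functions are hypothetical, and the two "agents" are hard-coded rules where the paper uses LLM reasoning.

```python
import random

GUARD = b"DL01"  # hypothetical path constraint that guards the sink


def run_target(data: bytes) -> dict:
    """Toy target: report whether the sink was reached and whether it was exploited."""
    reached = data.startswith(GUARD)
    # Hypothetical trigger condition: a traversal sequence arrives at the sink.
    exploited = reached and b"../" in data[len(GUARD):]
    return {"reached": reached, "exploited": exploited}


def fuzzer_step(corpus: list, rng: random.Random) -> bytes:
    """Stand-in for a coverage-guided fuzzer: overwrite one random byte of a seed."""
    seed = bytearray(rng.choice(corpus))
    if seed:
        seed[rng.randrange(len(seed))] = rng.randrange(256)
    return bytes(seed)


def exploration_agent(feedback: dict, seed: bytes) -> bytes:
    """Stand-in for the exploration agent: satisfy the guard if the sink was missed."""
    return seed if feedback["reached"] else GUARD + seed


def exploitation_agent(seed: bytes) -> bytes:
    """Stand-in for the exploitation agent: append a payload meeting the trigger condition."""
    return seed + b"../etc/passwd"


def collaborate(initial: bytes, rounds: int = 10, rng_seed: int = 0):
    """Run the loop until a proof-of-concept input is found or the budget runs out."""
    rng = random.Random(rng_seed)
    corpus = [initial]
    for _ in range(rounds):
        candidate = fuzzer_step(corpus, rng)
        feedback = run_target(candidate)
        if not feedback["reached"]:
            candidate = exploration_agent(feedback, candidate)
            feedback = run_target(candidate)
        corpus.append(candidate)  # shared corpus: later fuzzer steps can mutate this seed
        if feedback["exploited"]:
            return candidate
        if feedback["reached"]:
            poc = exploitation_agent(candidate)
            if run_target(poc)["exploited"]:
                return poc
    return None


poc = collaborate(b"hello")
```

In this toy setting the exploration step always succeeds on the first round, so the loop mainly shows the order of hand-offs: fuzzer output, runtime feedback, reachability repair, then exploit synthesis.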
Load-bearing premise
LLM-assisted static filtering combined with CWE scanning can reliably identify reachable and exploitable sink call sites without excessive false positives or missed targets.
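The premise describes a two-stage funnel: a broad CWE-specific scan, followed by a filter that discards call sites judged unreachable or unexploitable. A minimal Python sketch of that shape follows. The sink table, the `CallSite` fields, and the `rule_based_judge` callback (a placeholder for the LLM-assisted step) are illustrative assumptions, not GONDAR's actual rules or prompts.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical sink table: Java API name -> CWE commonly associated with it.
SINKS = {
    "java.io.FileInputStream.<init>": "CWE-22",          # path traversal
    "java.io.ObjectInputStream.readObject": "CWE-502",   # unsafe deserialization
    "java.lang.Runtime.exec": "CWE-78",                  # OS command injection
}


@dataclass(frozen=True)
class CallSite:
    caller: str          # method that contains the call
    callee: str          # API being invoked
    constant_arg: bool   # True if the sensitive argument is a compile-time constant
    reachable: bool      # True if a static call graph links an entry point to the caller


def cwe_scan(sites: Iterable[CallSite]) -> list:
    """Stage 1: keep every call whose callee is a known sink, tagged with its CWE."""
    return [(s, SINKS[s.callee]) for s in sites if s.callee in SINKS]


def filter_sites(candidates: list, judge: Callable[[CallSite, str], bool]) -> list:
    """Stage 2: keep only the candidates the judge accepts."""
    return [(s, cwe) for s, cwe in candidates if judge(s, cwe)]


def rule_based_judge(site: CallSite, cwe: str) -> bool:
    """Placeholder for the LLM: reject unreachable sites and constant arguments."""
    return site.reachable and not site.constant_arg


sites = [
    CallSite("App.load", "java.io.FileInputStream.<init>", False, True),
    CallSite("App.banner", "java.io.FileInputStream.<init>", True, True),
    CallSite("Legacy.run", "java.lang.Runtime.exec", False, False),
    CallSite("App.log", "java.io.PrintStream.println", False, True),
]
targets = filter_sites(cwe_scan(sites), rule_based_judge)
```

Keeping the judge as a callback makes the premise testable in isolation: the same candidates can be passed through different judges and each result compared against manual labels.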
What would settle it
Running GONDAR and Jazzer side-by-side on the same real-world Java benchmarks and counting the distinct vulnerabilities each finds; a ratio close to one would falsify the four-times improvement claim.
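The outcome of such a comparison depends on how "distinct" is defined. One possible definition, sketched in Python below, deduplicates findings by bug class plus the top few stack frames and then compares the counts. The signature scheme, the frame depth, and the sample records are assumptions for illustration and are not taken from the paper.

```python
def signature(crash: dict, depth: int = 3) -> tuple:
    """Identify a finding by its bug class and its top `depth` stack frames."""
    return (crash["kind"], tuple(crash["stack"][:depth]))


def distinct_findings(crashes: list, depth: int = 3) -> set:
    """Collapse repeated crashes into a set of unique signatures."""
    return {signature(c, depth) for c in crashes}


def discovery_ratio(tool_a: list, tool_b: list) -> float:
    """Ratio of distinct findings, tool_a over tool_b; infinite if tool_b found none."""
    a = len(distinct_findings(tool_a))
    b = len(distinct_findings(tool_b))
    return a / b if b else float("inf")


# Made-up crash records for two unnamed tools.
tool_a_crashes = [
    {"kind": "PathTraversal", "stack": ["A.read", "A.handle", "Main.run"]},
    {"kind": "PathTraversal", "stack": ["A.read", "A.handle", "Main.run"]},  # duplicate
    {"kind": "Deserialization", "stack": ["B.load", "B.handle", "Main.run"]},
]
tool_b_crashes = [
    {"kind": "PathTraversal", "stack": ["A.read", "A.handle", "Main.run"]},
]
ratio = discovery_ratio(tool_a_crashes, tool_b_crashes)
```

A shallower frame depth merges more crashes into one finding and a deeper one splits them, so a reported ratio is only interpretable alongside the deduplication rule that produced it.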
Original abstract
Java applications are prone to vulnerabilities stemming from the insecure use of security-sensitive APIs, such as file operations enabling path traversal or deserialization routines allowing remote code execution. These sink APIs encode critical information for vulnerability discovery: the program-specific constraints required to reach them and the exploitation conditions necessary to trigger security flaws. Despite this, existing fuzzers largely overlook such vulnerability-specific knowledge, limiting their effectiveness. We present GONDAR, a sink-centric fuzzing framework that systematically leverages sink API semantics for targeted vulnerability discovery. GONDAR first identifies reachable and exploitable sink call sites through CWE-specific scanning combined with LLM-assisted static filtering. It then deploys two specialized agents that work collaboratively with a coverage-guided fuzzer: an exploration agent generates inputs to reach target call sites by iteratively solving path constraints, while an exploitation agent synthesizes proof-of-concept exploits by reasoning about and satisfying vulnerability-triggering conditions. The agents and fuzzer continuously exchange seeds and runtime feedback, complementing each other. We evaluated GONDAR on real-world Java benchmarks, where it discovers four times more vulnerabilities than Jazzer, the state-of-the-art Java fuzzer. Notably, an earlier GONDAR version contributed to Team Atlanta's first-place CRS in the DARPA AI Cyber Challenge, and is integrated into OSS-CRS, a sandbox project in The Linux Foundation's OpenSSF, to analyze open-source Java projects, where it has already uncovered a zero-day vulnerability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents GONDAR, a sink-centric fuzzing framework for Java vulnerability discovery. It first identifies reachable and exploitable sink call sites via CWE-specific scanning plus LLM-assisted static filtering, then deploys an exploration agent (to reach targets by solving path constraints) and an exploitation agent (to synthesize PoC exploits by satisfying vulnerability conditions) that collaborate with a coverage-guided fuzzer through seed and feedback exchange. On real-world Java benchmarks the framework is reported to discover four times more vulnerabilities than Jazzer; an earlier version contributed to a first-place DARPA AI Cyber Challenge result and has been integrated into OSS-CRS where it found a zero-day.
Significance. If the headline performance claims are substantiated with proper controls and metrics, the work would demonstrate that explicit sink-semantic knowledge can materially improve directed fuzzing effectiveness over general-purpose tools, with immediate practical value shown by the DARPA placement and zero-day discovery. The agent-based decomposition of reachability and exploit synthesis is a concrete engineering contribution that could be adopted or extended by other fuzzing pipelines.
major comments (2)
- [Abstract / Evaluation] Abstract and Evaluation section: the central claim of a 4× improvement over Jazzer is presented without any description of benchmark selection criteria, vulnerability counting methodology, statistical significance testing, or comparison controls. Because the headline result is load-bearing for the paper’s contribution, these omissions prevent verification that the gains are attributable to sink-centric targeting rather than other factors.
- [Approach / Sink Identification] Sink-identification pipeline (described in the approach section): no precision, recall, or false-positive rate is reported for the CWE-specific scanning combined with LLM-assisted static filtering. This step is the prerequisite that supplies targets to both agents; without quantified reliability on the same benchmarks used for the 4× claim, the attribution of improved vulnerability discovery to “contextualizing sink knowledge” remains unverified.
minor comments (2)
- [Approach] The interaction protocol between the exploration agent, exploitation agent, and the underlying fuzzer is described at a high level; a diagram or pseudocode listing the seed-exchange and feedback loop would improve clarity.
- [Abstract / Evaluation] The abstract states that GONDAR “has already uncovered a zero-day,” but the main text does not appear to provide a case study or CVE reference for that finding; adding a brief description would strengthen the practical-impact claim.
Simulated Author's Rebuttal
We are grateful to the referee for the thorough review and constructive suggestions. Below we provide point-by-point responses to the major comments. We will revise the manuscript to incorporate additional details and evaluations as outlined.
- Referee: [Abstract / Evaluation] Abstract and Evaluation section: the central claim of a 4× improvement over Jazzer is presented without any description of benchmark selection criteria, vulnerability counting methodology, statistical significance testing, or comparison controls. Because the headline result is load-bearing for the paper’s contribution, these omissions prevent verification that the gains are attributable to sink-centric targeting rather than other factors.
  Authors: We acknowledge that the abstract is concise and omits these methodological details. The Evaluation section describes the benchmarks as real-world Java applications and reports the number of vulnerabilities discovered, but we agree that more explicit criteria and methodology would strengthen the claims. In the revised manuscript, we will add a 'Benchmark and Evaluation Methodology' subsection to the Evaluation section that details: (1) benchmark selection criteria (e.g., open-source projects with known security issues, ranging from 10k to 100k LOC); (2) vulnerability counting, defined as the number of unique, reproducible security violations (confirmed via stack traces or CVE matching) found within a 24-hour budget per tool; (3) a note explaining that statistical significance testing was omitted because repeated experiments are costly, while noting that results were consistent across runs; and (4) additional controls, including runs of Jazzer augmented with our identified sinks. These additions will better substantiate that the improvements stem from the sink-centric approach.
  Revision: yes
- Referee: [Approach / Sink Identification] Sink-identification pipeline (described in the approach section): no precision, recall, or false-positive rate is reported for the CWE-specific scanning combined with LLM-assisted static filtering. This step is the prerequisite that supplies targets to both agents; without quantified reliability on the same benchmarks used for the 4× claim, the attribution of improved vulnerability discovery to “contextualizing sink knowledge” remains unverified.
  Authors: We agree that empirical validation of the sink-identification pipeline is crucial. The Approach section details the CWE-specific scanning rules and the LLM-assisted filtering process, including the prompts used. However, we did not report precision and recall because establishing ground truth for sink reachability and exploitability across the entire benchmark suite would require extensive manual effort. For the revision, we will add an 'Accuracy of Sink Identification' subsection to the Evaluation, in which we will draw a random sample of 100 methods from the benchmarks, manually label the reachable and exploitable sinks, and compute precision, recall, and F1 scores for the CWE scanner alone and for the combined pipeline. This will provide the necessary quantification and support attributing the performance gains to our sink-knowledge contextualization.
  Revision: yes
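The metrics promised in this response can be computed from two sets: the call sites the pipeline flags and the call sites a manual audit confirms. A small Python sketch follows; the site identifiers are made-up placeholders, not data from the paper.

```python
def precision_recall_f1(predicted: set, actual: set) -> tuple:
    """Return (precision, recall, F1) for flagged sites against manually confirmed ones."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Placeholder identifiers for sink call sites.
flagged = {"s1", "s2", "s3", "s4"}     # reported by the pipeline
confirmed = {"s1", "s2", "s5"}         # confirmed by manual labeling
p, r, f1 = precision_recall_f1(flagged, confirmed)
```

Running the same function once on the scanner's output and once on the combined pipeline's output would show how much the filtering stage trades recall for precision.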
Circularity Check
No circularity: engineering framework evaluated against external baseline
The paper describes GONDAR as a practical fuzzing pipeline that identifies sinks via CWE scanning plus LLM-assisted static filtering, then deploys exploration and exploitation agents alongside a coverage-guided fuzzer. No equations, fitted parameters, or self-referential derivations appear in the provided text. The headline performance claim (4× vulnerabilities vs. Jazzer plus a zero-day) is measured directly against an external tool on real-world benchmarks, making the result falsifiable outside any internal construction. No self-citations are invoked to justify uniqueness theorems, ansatzes, or load-bearing premises. The pipeline steps are presented as implementation choices whose effectiveness is assessed empirically rather than by definition or renaming of prior results. This satisfies the criteria for a self-contained artifact with no detectable circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM-assisted static filtering can accurately distinguish reachable and exploitable sink call sites from false positives
invented entities (2)
- Exploration agent: no independent evidence
- Exploitation agent: no independent evidence
Excerpts from the paper's Appendix D
- The paper provides a valuable step forward in an established field via a coherent end-to-end design targeting the reachability–exploitability gap in sink-based vulnerability discovery.
- The paper creates a new tool to enable future science which provides substantial improvement over the chosen baseline, supported by ablation analysis.
- The new tool offers potential community value if the framework and benchmark are released.
D.4. Noteworthy Concerns
- Internal precision is very low (∼14%), implying many false positives and unclear practical reliability.
- Best performance relies on expensive flagship LLMs that lead to high monetary cost; cheaper models seem to significantly degrade performance. Some cheaper, open-source models may do better in terms of exploitation success, but internal working effectiveness remains overlooked, casting a shadow on performance.
- Comparison is limited to Jazzer, lacking evaluation against state-of-the-art analyzers (e.g., IRIS@ICLR’25, RepoAudit@ICML’25, LLMDFA@NeurIPS’24) and stronger fuzzers (e.g., PolyFuzz@USENIX Security’23). While a set of state-of-the-art industry tools is considered, those academic works are overlooked, leaving the technical advancement unclear.