pith. machine review for the scientific record.

arxiv: 2603.27224 · v3 · submitted 2026-03-28 · 💻 cs.SE · cs.CR

Recognition: no theorem link

Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:30 UTC · model grok-4.3

classification 💻 cs.SE cs.CR
keywords memory leaks · static analysis · C/C++ · LLM · Z3 · CodeQL · bug detection · neuro-symbolic

The pith

MemHint augments static analyzers with LLMs and Z3 to detect 52 memory leaks in 3.4 million lines of C/C++ code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MemHint, a neuro-symbolic system that improves static analysis for memory leaks in C and C++ programs. It uses large language models to classify custom memory allocation and deallocation functions beyond standard primitives and to record ownership details for arguments or return values. Z3 then validates each classification against the function's control-flow graph and later filters out warnings that lie on infeasible paths. The validated summaries are injected into CodeQL and Infer via their extension points, after which an LLM confirms which remaining warnings represent genuine leaks. On seven real-world projects totaling over 3.4 million lines, the pipeline produced 52 unique detections, 49 of which were confirmed and fixed, at roughly $1.7 per bug.

Core claim

MemHint parses a target codebase; applies an LLM to label each function as allocator, deallocator, or neither, while producing ownership summaries; discards any summary whose claimed operation cannot occur on a feasible path according to Z3; injects the surviving summaries into CodeQL and Infer; uses Z3 to drop warnings on infeasible paths; and runs a final LLM check to retain only genuine bugs.

What carries the argument

Neuro-symbolic pipeline that uses LLM classification of custom memory functions to produce ownership summaries and Z3 to validate reachability and filter infeasible paths.

Load-bearing premise

The LLM correctly identifies which functions perform memory operations and what carries ownership, and Z3 accurately determines which paths are feasible.

What would settle it

A codebase with many project-specific allocators that the LLM systematically misclassifies would be decisive: there, MemHint should report no more leaks than unaugmented CodeQL or Infer.

Figures

Figures reproduced from arXiv: 2603.27224 by Bo Wang, David Lo, Huihui Huang, Jieke Shi, Zhou Yang.

Figure 1
Figure 1. A memory leak in OpenSSL (v3.6.1) [5]. view at source ↗
Figure 3
Figure 3. Overview of the MemHint pipeline. Stage 1 (Summary Generation): extract metadata, generate summaries with an LLM, and validate with the Z3 SMT solver. Stage 2 (Summary-Augmented Analysis): inject validated summaries into static analyzers (e.g., CodeQL, Infer). Stage 3 (Warning Validation): filter infeasible warnings with Z3 and validate remaining ones using an LLM. view at source ↗
Figure 4
Figure 4. Z3-based Summary Validation; panels (a)–(b) cover the allocator case. view at source ↗
Figure 5
Figure 5. Z3-based memory-leak feasibility checking. view at source ↗
read the original abstract

Memory leaks remain prevalent in real-world C/C++ software. Static analyzers such as CodeQL provide scalable program analysis but frequently miss such bugs because they cannot recognize project-specific custom memory-management functions and lack path-sensitive control-flow modeling. We present MemHint, a neuro-symbolic pipeline that addresses both limitations by combining LLMs' semantic understanding of code with Z3-based symbolic reasoning. MemHint parses the target codebase and applies an LLM to classify each function as a memory allocator, deallocator, or neither, producing function summaries that record which argument or return value carries memory ownership, extending the analyzer's built-in knowledge beyond standard primitives such as malloc and free. A Z3-based validation step checks each summary against the function's control-flow graph, discarding those whose claimed memory operation is unreachable on any feasible path. The validated summaries are injected into CodeQL and Infer via their respective extension mechanisms. Z3 path feasibility filtering then eliminates warnings on infeasible paths, and a final LLM-based validation step confirms whether each remaining warning is a genuine bug. On seven real-world C/C++ projects totaling over 3.4M lines of code, MemHint detects 52 unique memory leaks (49 confirmed/fixed, 4 CVEs submitted) at approximately $1.7 per detected bug, compared to 19 by vanilla CodeQL and 3 by vanilla Infer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces MemHint, a neuro-symbolic pipeline that uses an LLM to classify custom C/C++ memory-management functions (allocators/deallocators) and produce ownership summaries, validates them with Z3 reachability checks on the CFG, injects the summaries into CodeQL and Infer, applies Z3 path-feasibility filtering, and uses a final LLM step to confirm warnings. On seven real-world projects (>3.4 MLOC) it reports 52 unique leaks (49 confirmed/fixed, 4 CVEs) versus 19 for vanilla CodeQL and 3 for vanilla Infer, at roughly $1.7 per detected bug.

Significance. If the LLM classifications prove reliable, the work shows a practical, low-cost route to extending scalable static analyzers to project-specific APIs by combining semantic understanding with symbolic validation. The scale of the evaluation corpus, the concrete bug counts, the independent confirmations, and the cost metric are strengths that would be of interest to the static-analysis and software-engineering communities.

major comments (3)
  1. [§3] LLM-based classification pipeline: No precision, recall, or sampled manual-audit numbers are reported for the LLM's labeling of custom allocators, deallocators, and ownership semantics on the 3.4 MLOC corpus. Because the generated summaries are injected directly into CodeQL and Infer, any systematic misclassification would inflate the reported 52 detections; the Z3 step only checks reachability of the claimed operation, not semantic correctness of the label.
  2. [§4] Experimental evaluation: The comparison with vanilla CodeQL and Infer lacks an ablation that removes either the LLM summaries or the Z3 filtering steps, so the individual contribution of each component to the jump from 19/3 to 52 leaks cannot be quantified. In addition, no false-negative analysis or sampling of missed leaks is provided.
  3. [§4] Bug confirmation: The claim that 49 of the 52 leaks were independently confirmed/fixed is load-bearing for the central improvement claim, yet the manuscript gives no protocol for the confirmation process, inter-annotator agreement, or how false positives were ruled out.
minor comments (3)
  1. [§4] The cost figure of approximately $1.7 per bug should be accompanied by an explicit breakdown (LLM API calls, Z3 queries, etc.) in the evaluation section.
  2. Clarify the exact extension mechanisms used to inject the validated summaries into CodeQL and Infer (e.g., which predicates or taint rules are overridden).
  3. A small number of typos and inconsistent capitalization appear in the abstract and section headings; a light copy-edit pass is recommended.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the planned revisions.

read point-by-point responses
  1. Referee: [§3] LLM-based classification pipeline: No precision, recall, or sampled manual-audit numbers are reported for the LLM's labeling of custom allocators, deallocators, and ownership semantics on the 3.4 MLOC corpus. Because the generated summaries are injected directly into CodeQL and Infer, any systematic misclassification would inflate the reported 52 detections; the Z3 step only checks reachability of the claimed operation, not semantic correctness of the label.

    Authors: We agree that quantitative metrics for the LLM classification step are missing and would strengthen the evaluation. In the revision we will add a manual audit of a random sample of 100 functions drawn from the corpus, reporting precision, recall, and F1 scores separately for allocator/deallocator identification and for ownership-summary accuracy. While the Z3 reachability check discards summaries whose claimed operations are unreachable, we acknowledge it does not verify semantic correctness; the added audit will quantify any misclassification rate. The high rate of independently confirmed bugs (49/52) supplies supporting end-to-end evidence, but we will make the classification reliability explicit. revision: yes

  2. Referee: [§4] Experimental evaluation: The comparison with vanilla CodeQL and Infer lacks an ablation that removes either the LLM summaries or the Z3 filtering steps, so the individual contribution of each component to the jump from 19/3 to 52 leaks cannot be quantified. In addition, no false-negative analysis or sampling of missed leaks is provided.

    Authors: We concur that an ablation study is needed to isolate component contributions. We will add results in §4 for three configurations on the same seven projects: (i) vanilla CodeQL/Infer, (ii) augmented with LLM summaries only, and (iii) full pipeline with both LLM summaries and Z3 filtering. This will quantify the incremental gains. For false negatives we will sample 10% of functions in the largest project, manually inspect for missed leaks, and report an estimated recall; we note that exhaustive ground truth is unavailable, but sampling provides a practical bound. revision: yes

  3. Referee: [§4] Bug confirmation: The claim that 49 of the 52 leaks were independently confirmed/fixed is load-bearing for the central improvement claim, yet the manuscript gives no protocol for the confirmation process, inter-annotator agreement, or how false positives were ruled out.

    Authors: We will expand §4 with a precise confirmation protocol: each of the 52 warnings was reviewed independently by two authors; disagreements were resolved by joint discussion until consensus. We will report the resulting inter-annotator agreement percentage. False positives were ruled out by (a) confirming via Z3 that no deallocation occurs on any feasible path and (b) manual inspection of the call graph and ownership flow. Where possible, confirmation was further supported by submitted patches or CVEs. These details will be added to the revised manuscript. revision: yes
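The sample-based recall estimate proposed in response 2 can be sized with the standard bound from Cochran's Sampling Techniques (reference [7]). A sketch, where $\hat{p}$ is an assumed estimate of the leak rate, $z$ the normal quantile for the desired confidence, $e$ the margin of error, and $N$ the number of functions in the project:

```latex
n_0 = \frac{z^2 \, \hat{p}\,(1-\hat{p})}{e^2},
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}
```

For example, at 95% confidence ($z = 1.96$), margin $e = 0.05$, and the worst-case $\hat{p} = 0.5$, this gives $n_0 \approx 384$ functions before the finite-population correction, a useful sanity check on whether a flat "10% of functions" sample is larger or smaller than needed.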

Circularity Check

0 steps flagged

No significant circularity; empirical results on external projects with independent validation

full rationale

The paper describes a neuro-symbolic pipeline evaluated on seven external real-world C/C++ codebases (3.4M LOC total). Claims rest on direct comparisons to unmodified CodeQL and Infer baselines plus manual confirmation of detected bugs (49/52 fixed, CVEs submitted). No mathematical derivation chain, no fitted parameters renamed as predictions, no self-citations invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results. The LLM classification and Z3 filtering steps are tool components whose correctness is assessed via external outcomes rather than by construction from the same inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions about LLM semantic understanding and solver correctness rather than new fitted parameters or invented entities.

axioms (2)
  • standard math Z3 solver accurately determines path feasibility in the control-flow graph
    Invoked in the validation step to discard unreachable summaries.
  • domain assumption LLM can reliably classify functions as allocators, deallocators, or neither based on code semantics
    Core step that extends the analyzer beyond built-in primitives.

pith-pipeline@v0.9.0 · 5552 in / 1392 out tokens · 44422 ms · 2026-05-14T22:30:44.761213+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages

  1. [1]

    Jinsheng Ba, Gregory J. Duck, and Abhik Roychoudhury. 2023. Efficient Greybox Fuzzing to Detect Memory Errors. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (Rochester, MI, USA) (ASE ’22). Association for Computing Machinery, New York, NY, USA, Article 37, 12 pages. doi:10.1145/3551349.3561161

  2. [2]

    Max Brunsfeld and Tree-sitter contributors. [n. d.]. Tree-sitter: An incremental parsing system for programming tools. https://tree-sitter.github.io/tree-sitter/. Accessed: 2026

  3. [3]

    Cristian Cadar, Daniel Dunbar, Dawson R. Engler, et al. 2008. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, Vol. 8. 209–224

  4. [4]

    Xi Chen, Asia Slowinska, and Herbert Bos. 2013. MemBrush: A practical tool to detect custom memory allocators in C binaries. In 2013 20th Working Conference on Reverse Engineering (WCRE). 477–478. doi:10.1109/WCRE.2013.6671326

  5. [5]

    Sigmund Cherem, Lonnie Princehouse, and Radu Rugina. 2007. Practical memory leak detection using guarded value-flow analysis. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation. 480–491

  6. [6]

    Clang. Accessed 2026. LeakSanitizer: a run-time memory leak detector. https://clang.llvm.org/docs/LeakSanitizer.html

  7. [7]

    William G. Cochran. 1977. Sampling Techniques, 3rd Edition. John Wiley

  8. [8]

    Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: an efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (Budapest, Hungary) (TACAS’08/ETAPS’08). Springer-Verlag, Berlin, Heidelberg, 337–340

  9. [9]

    Guilherme Otávio de Sena and Rivalino Matias. 2018. A Systematic Mapping Review of Memory Leak Detection Techniques. In 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). 264–270. doi:10.1109/ISSREW.2018.00017

  10. [10]

    Dino Distefano, Manuel Fähndrich, Francesco Logozzo, and Peter W. O’Hearn. 2019. Scaling static analyses at Facebook. Commun. ACM 62, 8 (July 2019), 62–70. doi:10.1145/3338112

  12. [12]

    Xueying Du, Jiayi Feng, Yi Zou, Wei Xu, Jie Ma, Wei Zhang, Sisi Liu, Xin Peng, and Yiling Lou. 2026. Reducing False Positives in Static Bug Detection with LLMs: An Empirical Study in Industry. arXiv preprint arXiv:2601.18844 (2026)

  13. [13]

    Navid Emamdoost, Qiushi Wu, Kangjie Lu, and Stephen McCamant. 2021. Detecting kernel memory leaks in specialized modules with ownership reasoning. In The 2021 Annual Network and Distributed System Security Symposium (NDSS’21)

  14. [14]

    Facebook. 2026. Infer Static Analyzer. https://fbinfer.com. [Accessed 23-03-2026]

  15. [15]

    Gang Fan, Rongxin Wu, Qingkai Shi, Xiao Xiao, Jinguo Zhou, and Charles Zhang. 2019. SMOKE: Scalable Path-Sensitive Memory Leak Detection for Millions of Lines of Code. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). 72–82. doi:10.1109/ICSE.2019.00025

  17. [17]

    Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, and Shin Hwei Tan. 2023. Automated repair of programs from large language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1469–1481

  18. [18]

    Qing Gao, Yingfei Xiong, Yaqing Mi, Lu Zhang, Weikun Yang, Zhaoping Zhou, Bing Xie, and Hong Mei. 2015. Safe memory-leak fixing for C programs. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1. IEEE, 459–470

  19. [19]

    GitHub. 2026. CodeQL. https://codeql.github.com/. [Accessed 23-03-2026]

  20. [20]

    GitHub. 2026. GitHub Actions workflow security analysis with CodeQL is now generally available. https://github.blog/changelog/2025-04-22-github-actions-workflow-security-analysis-with-codeql-is-now-generally-available/. [Accessed 23-03-2026]

  21. [21]

    Jinyao Guo, Chengpeng Wang, Xiangzhe Xu, Zian Su, and Xiangyu Zhang. 2025. RepoAudit: An autonomous LLM-agent for repository-level code auditing. arXiv preprint arXiv:2501.18160 (2025)

  22. [22]

    Zhaoqiang Guo, Tingting Tan, Shiran Liu, Xutong Liu, Wei Lai, Yibiao Yang, Yanhui Li, Lin Chen, Wei Dong, and Yuming Zhou. 2023. Mitigating false positive static analysis warnings: Progress, challenges, and opportunities. IEEE Transactions on Software Engineering 49, 12 (2023), 5154–5188

  23. [23]

    Wookhyun Han, Byunggill Joe, Byoungyoung Lee, Chengyu Song, and Insik Shin. 2018. Enhancing memory error detection for large-scale applications and fuzz testing. In Network and Distributed Systems Security (NDSS) Symposium 2018

  25. [25]

    Reed Hastings. 1992. Purify: Fast detection of memory leaks and access errors. In Proceedings of the USENIX Winter ’92 Conference. 125–136

  26. [26]

    David L. Heine and Monica S. Lam. 2003. A practical flow-sensitive and context-sensitive C and C++ memory leak detector. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation. 168–181

  27. [27]

    Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large Language Models for Software Engineering: A Systematic Literature Review. ACM Trans. Softw. Eng. Methodol. 33, 8, Article 220 (Dec. 2024), 79 pages. doi:10.1145/3695988

  28. [28]

    Huimin Hu, Yingying Wang, Julia Rubin, and Michael Pradel. 2025. An Empirical Study of Suppressed Static Analysis Warnings. Proc. ACM Softw. Eng. 2, FSE, Article FSE014 (June 2025), 22 pages. doi:10.1145/3715729

  29. [29]

    Mohsen Iranmanesh, Sina Moradi Sabet, Sina Marefat, Ali Javidi Ghasr, Allison Wilson, Iman Sharafaldin, and Mohammad A. Tayebi. 2025. ZeroFalse: Improving Precision in Static Analysis with LLMs. arXiv preprint arXiv:2510.02534 (2025)

  30. [30]

    Nihal Jain, Robert Kwiatkowski, Baishakhi Ray, Murali Krishna Ramanathan, and Varun Kumar. 2025. On Mitigating Code LLM Hallucinations with API Documentation. In 2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 237–248. doi:10.1109/ICSE-SEIP66354.2025.00027

  31. [31]

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. 2026. A Survey on Large Language Models for Code Generation. ACM Trans. Softw. Eng. Methodol. 35, 2, Article 58 (Jan. 2026), 72 pages. doi:10.1145/3747588

  32. [32]

    Changhee Jung, Sangho Lee, Easwaran Raman, and Santosh Pande. 2014. Automated memory leak detection for production use. In Proceedings of the 36th International Conference on Software Engineering. 825–836

  33. [33]

    Yungbum Jung and Kwangkeun Yi. 2008. Practical memory leak detector based on parameterized procedural summaries. In Proceedings of the 7th International Symposium on Memory Management. 131–140

  34. [34]

    Hong Jin Kang, Khai Loong Aw, and David Lo. 2022. Detecting false alarms from automatic static analysis tools: how far are we?. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 698–709. doi:10.1145/3510003.3510214

  35. [35]

    Giyeol Kim, Dohyun Ryu, Seungjin Bae, Changyul Lee, and Taegyu Kim. 2025. Fuzzing Acceleration for Memory Safety Bug Discovery with Slicer. In 2025 IEEE Annual Computer Security Applications Conference (ACSAC). 46–59. doi:10.1109/ACSAC67867.2025.00020

  36. [36]

    Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated program repair. Commun. ACM 62, 12 (2019), 56–65

  37. [37]

    Haonan Li, Hang Zhang, Kexin Pei, and Zhiyun Qian. 2025. Towards More Accurate Static Analysis for Taint-Style Bug Detection in Linux Kernel. In 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 380–392

  38. [38]

    Kaixuan Li, Sen Chen, Lingling Fan, Ruitao Feng, Han Liu, Chengwei Liu, Yang Liu, and Yixiang Chen. 2023. Comparison and evaluation on static application security testing (SAST) tools for Java. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 921–933

  39. [39]

    Wen Li, Haipeng Cai, Yulei Sui, and David Manz. 2020. PCA: memory leak detection using partial call-path analysis. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1621–1625

  40. [40]

    Ziyang Li, Saikat Dutta, and Mayur Naik. 2024. IRIS: LLM-assisted static analysis for detecting security vulnerabilities. arXiv preprint arXiv:2405.17238 (2024)

  41. [41]

    Hongliang Liang, Luming Yin, Guohao Wu, Yuxiang Li, Qiuping Yi, and Lei Wang. 2025. LeakGuard: Detecting Memory Leaks Accurately and Scalably. arXiv preprint arXiv:2504.04422 (2025)

  42. [42]

    Huqiu Liu, Yuping Wang, Lingbo Jiang, and Shimin Hu. 2014. PF-Miner: A new paired functions mining method for Android kernel in error paths. In 2014 IEEE 38th Annual Computer Software and Applications Conference. IEEE, 33–42

  43. [43]

    Hu-Qiu Liu, Jia-Ju Bai, Yu-Ping Wang, Zhe Bian, and Shi-Min Hu. 2015. PairMiner: mining for paired functions in kernel extensions. In 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 93–101

  44. [44]

    LLVM Project. 2026. Clang Static Analyzer. https://clang-analyzer.llvm.org/. [Accessed 23-03-2026]

  45. [45]

    Yunlong Lyu, Yi Fang, Yiwei Zhang, Qibin Sun, Siqi Ma, Elisa Bertino, Kangjie Lu, and Juanru Li. 2022. Goshawk: Hunting Memory Corruptions via Structure-Aware and Object-Centric Memory Operation Synopsis. In 43rd IEEE Symposium on Security and Privacy, SP 2022, San Francisco, CA, USA, May 22–26, 2022. IEEE, 2096–2113. doi:10.1109/SP46214.2022.9833613

  46. [46]

    Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, and Lei Bu. 2025. SpecGen: Automated generation of formal program specifications via large language models. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 16–28

  47. [47]

    MITRE. 2026. CWE-401: Missing Release of Memory after Effective Lifetime (4.19.1). https://cwe.mitre.org/data/definitions/401.html. [Accessed 23-03-2026]

  48. [48]

    Aniruddhan Murali, Mahmoud Alfadel, Meiyappan Nagappan, Meng Xu, and Chengnian Sun. 2024. AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes. IEEE Transactions on Software Engineering 50, 9 (2024), 2398–2411. doi:10.1109/TSE.2024.3438119

  49. [49]

    Darragh Murphy. 2025. This hidden Windows 11 setting might be quietly draining your RAM. https://tech.yahoo.com/computing/articles/hidden-windows-11-setting-might-121538216.html. [Accessed 23-03-2026]

  50. [50]

    Tukaram Muske and Alexander Serebrenik. 2016. Survey of approaches for handling static analysis alarms. In 2016 IEEE 16th International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 157–166

  51. [51]

    Tukaram Muske and Alexander Serebrenik. 2022. Survey of Approaches for Postprocessing of Static Analysis Alarms. ACM Comput. Surv. 55, 3, Article 48 (Feb. 2022), 39 pages. doi:10.1145/3494521

  52. [52]

    Nicholas Nethercote and Julian Seward. 2007. Valgrind: a framework for heavyweight dynamic binary instrumentation. ACM SIGPLAN Notices 42, 6 (2007), 89–100

  53. [53]

    OpenCVE. 2026. CVEs and Security Vulnerabilities. https://app.opencve.io/cve/?weakness=CWE-401. [Accessed 23-03-2026]

  54. [54]

    OpenSSL. 2026. openssl/openssl: TLS/SSL and crypto library. https://github.com/openssl/openssl. [Accessed 23-03-2026]

  55. [55]

    FreeRDP Project. Accessed 2026. FreeRDP: A Remote Desktop Protocol Implementation. https://www.freerdp.com

  56. [56]

    LLVM Project. Accessed 2026. libFuzzer: a library for coverage-guided fuzz testing. https://llvm.org/docs/LibFuzzer.html

  57. [57]

    Rapid7. 2025. MongoBleed CVE-2025-1484: Critical Memory Leak in MongoDB Allowing Attackers to Extract Sensitive Data. https://www.rapid7.com/blog/post/etr-mongobleed-cve-2025-1484-critical-memory-leak-in-mongodb-allowing-attackers-to-extract-sensitive-data/. [Accessed 23-03-2026]

  58. [58]

    Suman Saha, Jean-Pierre Lozi, Gaël Thomas, Julia L. Lawall, and Gilles Muller. 2013. Hector: Detecting resource-release omission faults in error-handling code for systems software. In 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 1–12

  60. [60]

    Semgrep Inc. Accessed 2026. Semgrep. https://semgrep.dev

  61. [61]

    Kostya Serebryany. 2017. OSS-Fuzz - Google’s continuous fuzzing service for open source software. USENIX Association, Vancouver, BC

  62. [62]

    Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. 2012. AddressSanitizer: a fast address sanity checker. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (Boston, MA) (USENIX ATC ’12). USENIX Association, USA, 28

  63. [63]

    Ekaterina Shemetova, Ivan Smirnov, Anton Alekseev, Ilya Shenbin, Alexey Rukhovich, Sergey Nikolenko, Vadim Lomshakov, and Irina Piontkovskaya. 2025. LAMeD: LLM-generated Annotations for Memory Leak Detection. In Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering (EASE ’25). Association for Computing Machinery...

  64. [64]

    Jieke Shi, Zhou Yang, and David Lo. 2025. Efficient and Green Large Language Models for Software Engineering: Literature Review, Vision, and the Road Ahead. ACM Trans. Softw. Eng. Methodol. 34, 5, Article 137 (May 2025), 22 pages. doi:10.1145/3708525

  65. [65]

    Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. 2018. Pinpoint: Fast and precise sparse value flow analysis for million lines of code. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. 693–706

  67. [67]

    Yulei Sui, Ding Ye, and Jingling Xue. 2014. Detecting Memory Leaks Statically with Full-Sparse Value-Flow Analysis. IEEE Trans. Software Eng. 40, 2 (2014), 107–122. doi:10.1109/TSE.2014.2302311

  68. [68]

    George Tsigkourakos and Constantinos Patsakis. 2026. QRS: A Rule-Synthesizing Neuro-Symbolic Triad for Autonomous Vulnerability Discovery. arXiv preprint arXiv:2602.09774 (2026)

  69. [69]

    Anthony J. Viera, Joanne M. Garrett, et al. 2005. Understanding interobserver agreement: the kappa statistic. Fam Med 37, 5 (2005), 360–363

  70. [70]

    Jianqiang Wang, Siqi Ma, Yuanyuan Zhang, Juanru Li, Zheyu Ma, Long Mai, Tiancheng Chen, and Dawu Gu. 2019. NLP-EYE: Detecting Memory Corruptions via Semantic-Aware Memory Operation Function Identification. In 22nd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2019). USENIX Association, Chaoyang District, Beijing, 309–321

  71. [71]

    Cheng Wen, Jialun Cao, Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, and Cong Tian. 2024. Enchanting program specification synthesis by large language models using static analysis and program verification. In International Conference on Computer Aided Verification. Springer, 302–328

  72. [72]

    Cheng Wen, Haijun Wang, Yuekang Li, Shengchao Qin, Yang Liu, Zhiwu Xu, Hongxu Chen, Xiaofei Xie, Geguang Pu, and Ting Liu. 2020. MemLock: memory usage guided fuzzing. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 765–777. doi:10...

  73. [73]

    Yichen Xie and Alex Aiken. 2005. Context- and path-sensitive memory leak detection. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 115–125

  74. [74]

    Wen Xu, Sanidhya Kashyap, Changwoo Min, and Taesoo Kim. 2017. Designing New Operating Primitives to Improve Fuzzing Performance. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (Dallas, Texas, USA) (CCS ’17). Association for Computing Machinery, New York, NY, USA, 2313–2328. doi:10.1145/3133956.3134046

  75. [75]

    Duo Zhang, Om Rameshwar Gatla, Wei Xu, and Mai Zheng. 2021. A study of persistent memory bugs in the Linux kernel. In Proceedings of the 14th ACM International Conference on Systems and Storage (Haifa, Israel) (SYSTOR ’21). Association for Computing Machinery, New York, NY, USA, Article 6, 6 pages. doi:10.1145/3456727.3463783

  76. [76]

    Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024. AutoCodeRover: Autonomous program improvement. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1592–1604

  77. [77]

    Ziyao Zhang, Chong Wang, Yanlin Wang, Ensheng Shi, Yuchi Ma, Wanjun Zhong, Jiachi Chen, Mingzhi Mao, and Zibin Zheng. 2025. LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation. Proc. ACM Softw. Eng. 2, ISSTA, Article ISSTA022 (June 2025), 23 pages. doi:10.1145/3728894