pith. machine review for the scientific record.

arxiv: 2603.27224 · v3 · submitted 2026-03-28 · 💻 cs.SE · cs.CR

Recognition: no theorem link

Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:30 UTC · model grok-4.3

classification 💻 cs.SE cs.CR
keywords memory leaks · static analysis · C/C++ · LLM · Z3 · CodeQL · bug detection · neuro-symbolic

The pith

MemHint augments static analyzers with LLMs and Z3 to detect 52 memory leaks in 3.4 million lines of C/C++ code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MemHint, a neuro-symbolic system that improves static analysis for memory leaks in C and C++ programs. It uses large language models to classify custom memory allocation and deallocation functions beyond standard primitives and to record ownership details for arguments or return values. Z3 then validates each classification against the function's control-flow graph and later filters out warnings that lie on infeasible paths. The validated summaries are injected into CodeQL and Infer via their extension points, after which an LLM confirms which remaining warnings represent genuine leaks. On seven real-world projects totaling over 3.4 million lines, the pipeline produced 52 unique detections, 49 of which were confirmed and fixed, at roughly $1.7 per bug.

Core claim

MemHint parses a target codebase; applies an LLM to label each function as allocator, deallocator, or neither, while producing ownership summaries; discards any summary whose claimed operation cannot occur on a feasible path according to Z3; injects the surviving summaries into CodeQL and Infer; uses Z3 to drop warnings on infeasible paths; and runs a final LLM check to retain only genuine bugs.

What carries the argument

Neuro-symbolic pipeline that uses LLM classification of custom memory functions to produce ownership summaries and Z3 to validate reachability and filter infeasible paths.

Load-bearing premise

The LLM correctly identifies which functions perform memory operations and what carries ownership, and Z3 accurately determines which paths are feasible.

What would settle it

A codebase with many project-specific allocators that the LLM systematically misclassifies would be decisive: there, MemHint should report no more leaks than unaugmented CodeQL or Infer.

Figures

Figures reproduced from arXiv: 2603.27224 by Bo Wang, David Lo, Huihui Huang, Jieke Shi, Zhou Yang.

Figure 1
Figure 1. A memory leak in OpenSSL (v3.6.1) [5]. view at source ↗
Figure 3
Figure 3. Overview of the MemHint pipeline. Stage 1 (Summary Generation): extract metadata, generate summaries with an LLM, and validate with the Z3 SMT solver. Stage 2 (Summary-Augmented Analysis): inject validated summaries into static analyzers (e.g., CodeQL, Infer). Stage 3 (Warning Validation): filter infeasible warnings with Z3 and validate remaining ones using an LLM. view at source ↗
Figure 4
Figure 4. Z3-based Summary Validation; panels (a)–(b) cover the allocator case. view at source ↗
Figure 5
Figure 5. Z3-based memory-leak feasibility checking. view at source ↗
read the original abstract

Memory leaks remain prevalent in real-world C/C++ software. Static analyzers such as CodeQL provide scalable program analysis but frequently miss such bugs because they cannot recognize project-specific custom memory-management functions and lack path-sensitive control-flow modeling. We present MemHint, a neuro-symbolic pipeline that addresses both limitations by combining LLMs' semantic understanding of code with Z3-based symbolic reasoning. MemHint parses the target codebase and applies an LLM to classify each function as a memory allocator, deallocator, or neither, producing function summaries that record which argument or return value carries memory ownership, extending the analyzer's built-in knowledge beyond standard primitives such as malloc and free. A Z3-based validation step checks each summary against the function's control-flow graph, discarding those whose claimed memory operation is unreachable on any feasible path. The validated summaries are injected into CodeQL and Infer via their respective extension mechanisms. Z3 path feasibility filtering then eliminates warnings on infeasible paths, and a final LLM-based validation step confirms whether each remaining warning is a genuine bug. On seven real-world C/C++ projects totaling over 3.4M lines of code, MemHint detects 52 unique memory leaks (49 confirmed/fixed, 4 CVEs submitted) at approximately $1.7 per detected bug, compared to 19 by vanilla CodeQL and 3 by vanilla Infer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces MemHint, a neuro-symbolic pipeline that uses an LLM to classify custom C/C++ memory-management functions (allocators/deallocators) and produce ownership summaries, validates them with Z3 reachability checks on the CFG, injects the summaries into CodeQL and Infer, applies Z3 path-feasibility filtering, and uses a final LLM step to confirm warnings. On seven real-world projects (>3.4 MLOC) it reports 52 unique leaks (49 confirmed/fixed, 4 CVEs) versus 19 for vanilla CodeQL and 3 for vanilla Infer, at roughly $1.7 per detected bug.

Significance. If the LLM classifications prove reliable, the work shows a practical, low-cost route to extending scalable static analyzers to project-specific APIs by combining semantic understanding with symbolic validation. The scale of the evaluation corpus, the concrete bug counts, the independent confirmations, and the cost metric are strengths that would be of interest to the static-analysis and software-engineering communities.

major comments (3)
  1. [§3] LLM-based classification pipeline: No precision, recall, or sampled manual-audit numbers are reported for the LLM's labeling of custom allocators, deallocators, and ownership semantics on the 3.4 MLOC corpus. Because the generated summaries are injected directly into CodeQL and Infer, any systematic misclassification would inflate the reported 52 detections; the Z3 step only checks reachability of the claimed operation, not semantic correctness of the label.
  2. [§4] Experimental evaluation: The comparison with vanilla CodeQL and Infer lacks an ablation that removes either the LLM summaries or the Z3 filtering steps, so the individual contribution of each component to the jump from 19/3 to 52 leaks cannot be quantified. In addition, no false-negative analysis or sampling of missed leaks is provided.
  3. [§4] Bug confirmation: The claim that 49 of the 52 leaks were independently confirmed/fixed is load-bearing for the central improvement claim, yet the manuscript gives no protocol for the confirmation process, inter-annotator agreement, or how false positives were ruled out.
minor comments (3)
  1. [§4] The cost figure of approximately $1.7 per bug should be accompanied by an explicit breakdown (LLM API calls, Z3 queries, etc.) in the evaluation section.
  2. Clarify the exact extension mechanisms used to inject the validated summaries into CodeQL and Infer (e.g., which predicates or taint rules are overridden).
  3. A small number of typos and inconsistent capitalization appear in the abstract and section headings; a light copy-edit pass is recommended.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the planned revisions.

read point-by-point responses
  1. Referee: [§3] LLM-based classification pipeline: No precision, recall, or sampled manual-audit numbers are reported for the LLM's labeling of custom allocators, deallocators, and ownership semantics on the 3.4 MLOC corpus. Because the generated summaries are injected directly into CodeQL and Infer, any systematic misclassification would inflate the reported 52 detections; the Z3 step only checks reachability of the claimed operation, not semantic correctness of the label.

    Authors: We agree that quantitative metrics for the LLM classification step are missing and would strengthen the evaluation. In the revision we will add a manual audit of a random sample of 100 functions drawn from the corpus, reporting precision, recall, and F1 scores separately for allocator/deallocator identification and for ownership-summary accuracy. While the Z3 reachability check discards summaries whose claimed operations are unreachable, we acknowledge it does not verify semantic correctness; the added audit will quantify any misclassification rate. The high rate of independently confirmed bugs (49/52) supplies supporting end-to-end evidence, but we will make the classification reliability explicit. revision: yes

  2. Referee: [§4] Experimental evaluation: The comparison with vanilla CodeQL and Infer lacks an ablation that removes either the LLM summaries or the Z3 filtering steps, so the individual contribution of each component to the jump from 19/3 to 52 leaks cannot be quantified. In addition, no false-negative analysis or sampling of missed leaks is provided.

    Authors: We concur that an ablation study is needed to isolate component contributions. We will add results in §4 for three configurations on the same seven projects: (i) vanilla CodeQL/Infer, (ii) augmented with LLM summaries only, and (iii) full pipeline with both LLM summaries and Z3 filtering. This will quantify the incremental gains. For false negatives we will sample 10% of functions in the largest project, manually inspect for missed leaks, and report an estimated recall; we note that exhaustive ground truth is unavailable, but sampling provides a practical bound. revision: yes

  3. Referee: [§4] Bug confirmation: The claim that 49 of the 52 leaks were independently confirmed/fixed is load-bearing for the central improvement claim, yet the manuscript gives no protocol for the confirmation process, inter-annotator agreement, or how false positives were ruled out.

    Authors: We will expand §4 with a precise confirmation protocol: each of the 52 warnings was reviewed independently by two authors; disagreements were resolved by joint discussion until consensus. We will report the resulting inter-annotator agreement percentage. False positives were ruled out by (a) confirming via Z3 that no deallocation occurs on any feasible path and (b) manual inspection of the call graph and ownership flow. Where possible, confirmation was further supported by submitted patches or CVEs. These details will be added to the revised manuscript. revision: yes
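The sample-based recall estimate proposed in response 2 can be sized with the standard bound from Cochran's Sampling Techniques (reference [7]). A sketch, where $\hat{p}$ is an assumed estimate of the leak rate, $z$ the normal quantile for the desired confidence, $e$ the margin of error, and $N$ the number of functions in the project:

```latex
n_0 = \frac{z^2 \, \hat{p}\,(1-\hat{p})}{e^2},
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}
```

For example, at 95% confidence ($z = 1.96$), margin $e = 0.05$, and the worst-case $\hat{p} = 0.5$, this gives $n_0 \approx 384$ functions before the finite-population correction, a useful sanity check on whether a flat "10% of functions" sample is larger or smaller than needed.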

Circularity Check

0 steps flagged

No significant circularity; empirical results on external projects with independent validation

full rationale

The paper describes a neuro-symbolic pipeline evaluated on seven external real-world C/C++ codebases (3.4M LOC total). Claims rest on direct comparisons to unmodified CodeQL and Infer baselines plus manual confirmation of detected bugs (49/52 fixed, CVEs submitted). No mathematical derivation chain, no fitted parameters renamed as predictions, no self-citations invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results. The LLM classification and Z3 filtering steps are tool components whose correctness is assessed via external outcomes rather than by construction from the same inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions about LLM semantic understanding and solver correctness rather than new fitted parameters or invented entities.

axioms (2)
  • standard math Z3 solver accurately determines path feasibility in the control-flow graph
    Invoked in the validation step to discard unreachable summaries.
  • domain assumption LLM can reliably classify functions as allocators, deallocators, or neither based on code semantics
    Core step that extends the analyzer beyond built-in primitives.

pith-pipeline@v0.9.0 · 5552 in / 1392 out tokens · 44422 ms · 2026-05-14T22:30:44.761213+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages

  1. [1]

    Jinsheng Ba, Gregory J. Duck, and Abhik Roychoudhury. 2023. Efficient Greybox Fuzzing to Detect Memory Errors. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (Rochester, MI, USA) (ASE ’22). Association for Computing Machinery, New York, NY, USA, Article 37, 12 pages. doi:10.1145/3551349.3561161

  2. [2]

    Max Brunsfeld and Tree-sitter contributors. [n. d.]. Tree-sitter: An incremental parsing system for programming tools. https://tree-sitter.github.io/tree-sitter/. Accessed: 2026

  3. [3]

    Cristian Cadar, Daniel Dunbar, Dawson R. Engler, et al. 2008. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, Vol. 8. 209–224

  4. [4]

    Xi Chen, Asia Slowinska, and Herbert Bos. 2013. MemBrush: A practical tool to detect custom memory allocators in C binaries. In 2013 20th Working Conference on Reverse Engineering (WCRE). 477–478. doi:10.1109/WCRE.2013.6671326

  5. [5]

    Sigmund Cherem, Lonnie Princehouse, and Radu Rugina. 2007. Practical memory leak detection using guarded value-flow analysis. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation. 480–491

  6. [6]

    Clang. Accessed 2026. LeakSanitizer: a run-time memory leak detector. https://clang.llvm.org/docs/LeakSanitizer.html

  7. [7]

    William G. Cochran. 1977. Sampling Techniques, 3rd Edition. John Wiley

  8. [8]

    Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: an efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (Budapest, Hungary) (TACAS’08/ETAPS’08). Springer-Verlag, Berlin, Heidelberg, 337–340

  9. [9]

    Guilherme Otávio de Sena and Rivalino Matias. 2018. A Systematic Mapping Review of Memory Leak Detection Techniques. In 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). 264–270. doi:10.1109/ISSREW.2018.00017

  10. [10]

    Dino Distefano, Manuel Fähndrich, Francesco Logozzo, and Peter W. O’Hearn. 2019. Scaling static analyses at Facebook. Commun. ACM 62, 8 (July 2019), 62–70. doi:10.1145/3338112

  12. [12]

    Xueying Du, Jiayi Feng, Yi Zou, Wei Xu, Jie Ma, Wei Zhang, Sisi Liu, Xin Peng, and Yiling Lou. 2026. Reducing False Positives in Static Bug Detection with LLMs: An Empirical Study in Industry. arXiv preprint arXiv:2601.18844 (2026)

  13. [13]

    Navid Emamdoost, Qiushi Wu, Kangjie Lu, and Stephen McCamant. 2021. Detecting kernel memory leaks in specialized modules with ownership reasoning. In The 2021 Annual Network and Distributed System Security Symposium (NDSS’21)

  14. [14]

    Facebook. 2026. Infer Static Analyzer. https://fbinfer.com. [Accessed 23-03-2026]

  15. [15]

    Gang Fan, Rongxin Wu, Qingkai Shi, Xiao Xiao, Jinguo Zhou, and Charles Zhang. 2019. SMOKE: Scalable Path-Sensitive Memory Leak Detection for Millions of Lines of Code. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). 72–82. doi:10.1109/ICSE.2019.00025

  17. [17]

    Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, and Shin Hwei Tan. 2023. Automated repair of programs from large language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1469–1481

  18. [18]

    Qing Gao, Yingfei Xiong, Yaqing Mi, Lu Zhang, Weikun Yang, Zhaoping Zhou, Bing Xie, and Hong Mei. 2015. Safe memory-leak fixing for C programs. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1. IEEE, 459–470

  19. [19]

    GitHub. 2026. CodeQL. https://codeql.github.com/. [Accessed 23-03-2026]

  20. [20]

    GitHub. 2026. GitHub Actions workflow security analysis with CodeQL is now generally available. https://github.blog/changelog/2025-04-22-github-actions-workflow-security-analysis-with-codeql-is-now-generally-available/. [Accessed 23-03-2026]

  21. [21]

    Jinyao Guo, Chengpeng Wang, Xiangzhe Xu, Zian Su, and Xiangyu Zhang. 2025. RepoAudit: An autonomous LLM-agent for repository-level code auditing. arXiv preprint arXiv:2501.18160 (2025)

  22. [22]

    Zhaoqiang Guo, Tingting Tan, Shiran Liu, Xutong Liu, Wei Lai, Yibiao Yang, Yanhui Li, Lin Chen, Wei Dong, and Yuming Zhou. 2023. Mitigating false positive static analysis warnings: Progress, challenges, and opportunities. IEEE Transactions on Software Engineering 49, 12 (2023), 5154–5188

  23. [23]

    Wookhyun Han, Byunggill Joe, Byoungyoung Lee, Chengyu Song, and Insik Shin. 2018. Enhancing memory error detection for large-scale applications and fuzz testing. In Network and Distributed Systems Security (NDSS) Symposium 2018

  25. [25]

    Reed Hastings. 1992. Purify: Fast detection of memory leaks and access errors. In Proceedings of the USENIX Winter ’92 Conference. 125–136

  26. [26]

    David L. Heine and Monica S. Lam. 2003. A practical flow-sensitive and context-sensitive C and C++ memory leak detector. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation. 168–181

  27. [27]

    Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large Language Models for Software Engineering: A Systematic Literature Review. ACM Trans. Softw. Eng. Methodol. 33, 8, Article 220 (Dec. 2024), 79 pages. doi:10.1145/3695988

  28. [28]

    Huimin Hu, Yingying Wang, Julia Rubin, and Michael Pradel. 2025. An Empirical Study of Suppressed Static Analysis Warnings. Proc. ACM Softw. Eng. 2, FSE, Article FSE014 (June 2025), 22 pages. doi:10.1145/3715729

  29. [29]

    Mohsen Iranmanesh, Sina Moradi Sabet, Sina Marefat, Ali Javidi Ghasr, Allison Wilson, Iman Sharafaldin, and Mohammad A. Tayebi. 2025. ZeroFalse: Improving Precision in Static Analysis with LLMs. arXiv preprint arXiv:2510.02534 (2025)

  30. [30]

    Nihal Jain, Robert Kwiatkowski, Baishakhi Ray, Murali Krishna Ramanathan, and Varun Kumar. 2025. On Mitigating Code LLM Hallucinations with API Documentation. In 2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 237–248. doi:10.1109/ICSE-SEIP66354.2025.00027

  31. [31]

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. 2026. A Survey on Large Language Models for Code Generation. ACM Trans. Softw. Eng. Methodol. 35, 2, Article 58 (Jan. 2026), 72 pages. doi:10.1145/3747588

  32. [32]

    Changhee Jung, Sangho Lee, Easwaran Raman, and Santosh Pande. 2014. Automated memory leak detection for production use. In Proceedings of the 36th International Conference on Software Engineering. 825–836

  33. [33]

    Yungbum Jung and Kwangkeun Yi. 2008. Practical memory leak detector based on parameterized procedural summaries. In Proceedings of the 7th International Symposium on Memory Management. 131–140

  34. [34]

    Hong Jin Kang, Khai Loong Aw, and David Lo. 2022. Detecting false alarms from automatic static analysis tools: how far are we?. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 698–709. doi:10.1145/3510003.3510214

  35. [35]

    Giyeol Kim, Dohyun Ryu, Seungjin Bae, Changyul Lee, and Taegyu Kim. 2025. Fuzzing Acceleration for Memory Safety Bug Discovery with Slicer. In 2025 IEEE Annual Computer Security Applications Conference (ACSAC). 46–59. doi:10.1109/ACSAC67867.2025.00020

  36. [36]

    Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated program repair. Commun. ACM 62, 12 (2019), 56–65

  37. [37]

    Haonan Li, Hang Zhang, Kexin Pei, and Zhiyun Qian. 2025. Towards More Accurate Static Analysis for Taint-Style Bug Detection in Linux Kernel. In 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 380–392

  38. [38]

    Kaixuan Li, Sen Chen, Lingling Fan, Ruitao Feng, Han Liu, Chengwei Liu, Yang Liu, and Yixiang Chen. 2023. Comparison and evaluation on static application security testing (SAST) tools for Java. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 921–933

  39. [39]

    Wen Li, Haipeng Cai, Yulei Sui, and David Manz. 2020. PCA: memory leak detection using partial call-path analysis. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1621–1625

  40. [40]

    Ziyang Li, Saikat Dutta, and Mayur Naik. 2024. IRIS: LLM-assisted static analysis for detecting security vulnerabilities. arXiv preprint arXiv:2405.17238 (2024)

  41. [41]

    Hongliang Liang, Luming Yin, Guohao Wu, Yuxiang Li, Qiuping Yi, and Lei Wang. 2025. LeakGuard: Detecting Memory Leaks Accurately and Scalably. arXiv preprint arXiv:2504.04422 (2025)

  42. [42]

    Huqiu Liu, Yuping Wang, Lingbo Jiang, and Shimin Hu. 2014. PF-Miner: A new paired functions mining method for Android kernel in error paths. In 2014 IEEE 38th Annual Computer Software and Applications Conference. IEEE, 33–42

  43. [43]

    Hu-Qiu Liu, Jia-Ju Bai, Yu-Ping Wang, Zhe Bian, and Shi-Min Hu. 2015. PairMiner: mining for paired functions in kernel extensions. In 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 93–101

  44. [44]

    LLVM Project. 2026. Clang Static Analyzer. https://clang-analyzer.llvm.org/. [Accessed 23-03-2026]

  45. [45]

    Yunlong Lyu, Yi Fang, Yiwei Zhang, Qibin Sun, Siqi Ma, Elisa Bertino, Kangjie Lu, and Juanru Li. 2022. Goshawk: Hunting Memory Corruptions via Structure-Aware and Object-Centric Memory Operation Synopsis. In 43rd IEEE Symposium on Security and Privacy, SP 2022, San Francisco, CA, USA, May 22–26, 2022. IEEE, 2096–2113. doi:10.1109/SP46214.2022.9833613

  46. [46]

    Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, and Lei Bu. 2025. SpecGen: Automated generation of formal program specifications via large language models. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 16–28

  47. [47]

    MITRE. 2026. CWE-401: Missing Release of Memory after Effective Lifetime (4.19.1). https://cwe.mitre.org/data/definitions/401.html. [Accessed 23-03-2026]

  48. [48]

    Aniruddhan Murali, Mahmoud Alfadel, Meiyappan Nagappan, Meng Xu, and Chengnian Sun. 2024. AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes. IEEE Transactions on Software Engineering 50, 9 (2024), 2398–2411. doi:10.1109/TSE.2024.3438119

  49. [49]

    Darragh Murphy. 2025. This hidden Windows 11 setting might be quietly draining your RAM. https://tech.yahoo.com/computing/articles/hidden-windows-11-setting-might-121538216.html. [Accessed 23-03-2026]

  50. [50]

    Tukaram Muske and Alexander Serebrenik. 2016. Survey of approaches for handling static analysis alarms. In 2016 IEEE 16th International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 157–166

  51. [51]

    Tukaram Muske and Alexander Serebrenik. 2022. Survey of Approaches for Postprocessing of Static Analysis Alarms. ACM Comput. Surv. 55, 3, Article 48 (Feb. 2022), 39 pages. doi:10.1145/3494521

  52. [52]

    Nicholas Nethercote and Julian Seward. 2007. Valgrind: a framework for heavyweight dynamic binary instrumentation. ACM SIGPLAN Notices 42, 6 (2007), 89–100

  53. [53]

    OpenCVE. 2026. CVEs and Security Vulnerabilities. https://app.opencve.io/cve/?weakness=CWE-401. [Accessed 23-03-2026]

  54. [54]

    OpenSSL. 2026. openssl/openssl: TLS/SSL and crypto library. https://github.com/openssl/openssl. [Accessed 23-03-2026]

  55. [55]

    FreeRDP Project. Accessed 2026. FreeRDP: A Remote Desktop Protocol Implementation. https://www.freerdp.com

  56. [56]

    LLVM Project. Accessed 2026. libFuzzer: a library for coverage-guided fuzz testing. https://llvm.org/docs/LibFuzzer.html

  57. [57]

    Rapid7. 2025. MongoBleed CVE-2025-1484: Critical Memory Leak in MongoDB Allowing Attackers to Extract Sensitive Data. https://www.rapid7.com/blog/post/etr-mongobleed-cve-2025-1484-critical-memory-leak-in-mongodb-allowing-attackers-to-extract-sensitive-data/. [Accessed 23-03-2026]

  58. [58]

    Suman Saha, Jean-Pierre Lozi, Gaël Thomas, Julia L. Lawall, and Gilles Muller. 2013. Hector: Detecting resource-release omission faults in error-handling code for systems software. In 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 1–12

  60. [60]

    Semgrep Inc. Accessed 2026. Semgrep. https://semgrep.dev

  61. [61]

    Kostya Serebryany. 2017. OSS-Fuzz - Google’s continuous fuzzing service for open source software. USENIX Association, Vancouver, BC

  62. [62]

    Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. 2012. AddressSanitizer: a fast address sanity checker. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (Boston, MA) (USENIX ATC ’12). USENIX Association, USA, 28

  63. [63]

    Ekaterina Shemetova, Ivan Smirnov, Anton Alekseev, Ilya Shenbin, Alexey Rukhovich, Sergey Nikolenko, Vadim Lomshakov, and Irina Piontkovskaya. 2025. LAMeD: LLM-generated Annotations for Memory Leak Detection. In Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering (EASE ’25). Association for Computing Machinery...

  64. [64]

    Jieke Shi, Zhou Yang, and David Lo. 2025. Efficient and Green Large Language Models for Software Engineering: Literature Review, Vision, and the Road Ahead. ACM Trans. Softw. Eng. Methodol. 34, 5, Article 137 (May 2025), 22 pages. doi:10.1145/3708525

  65. [65]

    Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. 2018. Pinpoint: Fast and precise sparse value flow analysis for million lines of code. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. 693–706

  67. [67]

    Yulei Sui, Ding Ye, and Jingling Xue. 2014. Detecting Memory Leaks Statically with Full-Sparse Value-Flow Analysis. IEEE Trans. Software Eng. 40, 2 (2014), 107–122. doi:10.1109/TSE.2014.2302311

  68. [68]

    George Tsigkourakos and Constantinos Patsakis. 2026. QRS: A Rule-Synthesizing Neuro-Symbolic Triad for Autonomous Vulnerability Discovery. arXiv preprint arXiv:2602.09774 (2026)

  69. [69]

    Anthony J. Viera, Joanne M. Garrett, et al. 2005. Understanding interobserver agreement: the kappa statistic. Fam Med 37, 5 (2005), 360–363

  70. [70]

    Jianqiang Wang, Siqi Ma, Yuanyuan Zhang, Juanru Li, Zheyu Ma, Long Mai, Tiancheng Chen, and Dawu Gu. 2019. NLP-EYE: Detecting Memory Corruptions via Semantic-Aware Memory Operation Function Identification. In 22nd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2019). USENIX Association, Chaoyang District, Beijing, 309–321

  71. [71]

    Cheng Wen, Jialun Cao, Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, and Cong Tian. 2024. Enchanting program specification synthesis by large language models using static analysis and program verification. In International Conference on Computer Aided Verification. Springer, 302–328

  72. [72]

    Cheng Wen, Haijun Wang, Yuekang Li, Shengchao Qin, Yang Liu, Zhiwu Xu, Hongxu Chen, Xiaofei Xie, Geguang Pu, and Ting Liu. 2020. MemLock: memory usage guided fuzzing. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 765–777. doi:10...

  73. [73]

    Yichen Xie and Alex Aiken. 2005. Context- and path-sensitive memory leak detection. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 115–125

  74. [74]

    Wen Xu, Sanidhya Kashyap, Changwoo Min, and Taesoo Kim. 2017. Designing New Operating Primitives to Improve Fuzzing Performance. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (Dallas, Texas, USA) (CCS ’17). Association for Computing Machinery, New York, NY, USA, 2313–2328. doi:10.1145/3133956.3134046

  75. [75]

    Duo Zhang, Om Rameshwar Gatla, Wei Xu, and Mai Zheng. 2021. A study of persistent memory bugs in the Linux kernel. In Proceedings of the 14th ACM International Conference on Systems and Storage (Haifa, Israel) (SYSTOR ’21). Association for Computing Machinery, New York, NY, USA, Article 6, 6 pages. doi:10.1145/3456727.3463783

  76. [76]

    Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024. AutoCodeRover: Autonomous program improvement. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1592–1604

  77. [77]

    Ziyao Zhang, Chong Wang, Yanlin Wang, Ensheng Shi, Yuchi Ma, Wanjun Zhong, Jiachi Chen, Mingzhi Mao, and Zibin Zheng. 2025. LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation. Proc. ACM Softw. Eng. 2, ISSTA, Article ISSTA022 (June 2025), 23 pages. doi:10.1145/3728894