NESA: Relational Neuro-Symbolic Static Program Analysis

Chengpeng Wang; Jinyao Guo; Mingwei Zheng; Qingkai Shi; Wuqi Zhang; Xiangyu Zhang; Xuwei Liu; Yifei Gao

arxiv: 2412.14399 · v2 · submitted 2024-12-18 · 💻 cs.PL · cs.SE

Pith reviewed 2026-05-23 07:01 UTC · model grok-4.3

keywords: static program analysis · neuro-symbolic analysis · large language models · taint detection · program slicing · bug detection · Datalog policy language

The pith

NESA decomposes static program analysis into syntactic and semantic sub-problems so LLMs handle only the semantic parts with fewer hallucinations while matching or exceeding traditional tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NESA as a way to perform static program analysis without compilation by letting users write policies that split a task into simpler pieces targeting small code snippets. Syntactic pieces are solved exactly by parsers; semantic pieces go to LLMs but only after the policy has narrowed the scope and after lazy incremental prompting has been applied. The claim is that this split keeps LLM errors low enough for the overall results to be at least as good as existing analyzers and sometimes better, as shown on program slicing, bug detection, and a taint task where F1 reached 0.72. The method also found thirteen real memory-leak bugs later fixed by developers. If the decomposition works as described, analysis becomes both customizable by non-experts and runnable directly on source without building the program first.

Core claim

NESA provides an analysis policy language, a restricted form of Datalog, that lets users decompose a static-analysis problem into sub-problems on smaller code snippets; syntactic sub-problems are solved by parsing and semantic sub-problems are solved by LLMs under lazy incremental prompting, yielding comparable or better precision and recall than conventional compiled analyzers on slicing and taint-detection benchmarks.

What carries the argument

The analysis policy language (restricted Datalog) together with lazy incremental prompting, which decomposes tasks so that LLMs receive only narrowed semantic questions on small snippets while syntactic facts are supplied exactly by parsers.
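As a rough illustration of this split, the Python sketch below evaluates a toy taint policy. Everything in it is an assumption made for illustration and is not NESA's policy language or implementation: Python's `ast` module stands in for the parser, `llm_is_source` and `llm_flows` are hypothetical stand-ins for LLM prompts (stubbed with deterministic string checks so the example runs offline), and `request_arg` and `render` are an invented source and sink. The toy rules also ignore statement order.

```python
import ast
import re

# Toy program under analysis; request_arg (source) and render (sink) are invented names.
SOURCE = '''
name = request_arg("user")
greeting = "Hello, " + name
render(greeting)
'''

def symbolic_assignments(source):
    """Symbolic relation Assign(var, line, expr): exact facts obtained by parsing."""
    facts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            facts.append((node.targets[0].id, node.lineno, ast.unparse(node.value)))
    return sorted(facts, key=lambda fact: fact[1])

def symbolic_sink_args(source, sink_names):
    """Symbolic relation SinkArg(var, line): variables passed directly to a sink call."""
    facts = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in sink_names):
            facts.extend((arg.id, node.lineno) for arg in node.args
                         if isinstance(arg, ast.Name))
    return sorted(facts, key=lambda fact: fact[1])

def llm_is_source(expr):
    """Hypothetical neural relation; a real system would prompt an LLM here."""
    return "request_arg(" in expr

def llm_flows(var, expr):
    """Hypothetical neural relation: does `var` flow into `expr`? Stubbed here."""
    return re.search(rf"\b{re.escape(var)}\b", expr) is not None

def find_tainted_sinks(source, sink_names):
    assigns = symbolic_assignments(source)
    # Rule 1: Taint(v) :- Assign(v, _, e), IsSource(e).
    tainted = {var for var, _, expr in assigns if llm_is_source(expr)}
    # Rule 2: Taint(v) :- Assign(v, _, e), Taint(u), Flows(u, e). Iterate to a fixpoint.
    changed = True
    while changed:
        changed = False
        for var, _, expr in assigns:
            if var not in tainted and any(llm_flows(u, expr) for u in tainted):
                tainted.add(var)
                changed = True
    # Rule 3: Bug(v, line) :- SinkArg(v, line), Taint(v).
    return [(var, line) for var, line in symbolic_sink_args(source, sink_names)
            if var in tainted]

if __name__ == "__main__":
    print(find_tainted_sinks(SOURCE, {"render"}))
```

The point of the design is that the two stubbed functions only ever see a single expression, while the facts about which assignments and sink calls exist come from the parser and cannot be hallucinated. Requires Python 3.9 or later for `ast.unparse`.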

If this is right

Custom taint and slicing policies can be written and executed without any compiler or build system.
On the TaintBench benchmark the reported F1 of 0.72 exceeds the industrial baseline by 0.20.
Thirteen memory-leak bugs in real programs were located and later confirmed and fixed by their developers.
The same policy mechanism supports both program slicing and bug detection on standard benchmarks and real-world code.
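The F1 figure can be checked against the precision and recall given in the abstract. The short Python sketch below does that arithmetic; the baseline value it derives is only what the "0.20 higher" statement implies, since the abstract does not state the baseline's F1 directly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract for TaintBench.
NESA_PRECISION = 0.6627
NESA_RECALL = 0.7857

nesa_f1 = f1_score(NESA_PRECISION, NESA_RECALL)

# Implied by "surpassing an industrial approach by 0.20 in F1 score";
# the baseline F1 itself is an inference, not a reported number.
implied_baseline_f1 = round(nesa_f1, 2) - 0.20

if __name__ == "__main__":
    print(f"NESA F1: {nesa_f1:.2f}")
    print(f"Implied baseline F1: {implied_baseline_f1:.2f}")
```

The reported 0.72 is consistent with the reported precision and recall after rounding to two decimals.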

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition pattern could be tried on other LLM-based code tasks such as refactoring or test generation where hallucinations are currently high.
If policies can be written once and reused across projects, the cost of maintaining many specialized analyzers might drop.
Performance would likely improve further if stronger LLMs are substituted, because the policy already limits the scope of each LLM call.

Load-bearing premise

That splitting an analysis into smaller syntactic or semantic properties on limited code snippets will keep LLM hallucinations low enough for the combined results to stay accurate.

What would settle it

Run the taint-detection task on TaintBench with both NESA and the industrial baseline and check whether NESA's F1 stays at least 0.20 higher. Separately, ask the same LLM the semantic sub-questions with and without the policy decomposition and check whether the decomposed answers show markedly fewer contradictions.

Figures

Figures reproduced from arXiv: 2412.14399 by Chengpeng Wang, Jinyao Guo, Mingwei Zheng, Qingkai Shi, Wuqi Zhang, Xiangyu Zhang, Xuwei Liu, Yifei Gao.

**Figure 2.** The workflow of LLMSA. Accompanying text (2.2.1 Static Analysis via Prompting): Large language models (LLMs), advanced neural networks pre-trained upon a huge amount of data, have demonstrated remarkable performance in understanding program semantics [26–29], which suggests that they can be treated as a compiler or a static analyzer to interpret program semantics. For example, the users can provide the definitions of the progr…

**Figure 3.** The examples of analysis policy and neural relation specification. In the sub-figure (a), the symbolic, neural, and …

**Figure 4.** The syntax of the analysis policy language.

**Figure 5.** An analysis policy of intra-procedural XSS detection. The neural relations are in …

**Figure 6.** The instantiation of rule semantics. Accompanying text: … natural join on the non-neural relations and further projects the joined result upon the bounded terms in $R_n$, i.e., $\mathrm{term}(R_n) \cap A$, which yields a set of tuples $T$. To form the tuples in $R_I$, we only need to apply the constrained neural constructor $\tau(R_n, A)$ to the tuples in $T$, as any other tuples would make $S(R_1) \bowtie \cdots \bowtie S(R_{n-1}) \bowtie S(R_n)$ empty. The strategy of lazy promp…
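The lazy strategy described in this excerpt can be sketched in a few lines of Python. This is a simplified reading and not the paper's implementation: relations are lists of dicts, every term of the neural relation is assumed to be bound by the symbolic join, the constrained neural constructor is modeled by a plain callable, and `CountingOracle` is a hypothetical stand-in for an LLM that only counts how often it is asked. The eager variant is included purely as a point of comparison and is not taken from the paper.

```python
from itertools import product

def natural_join(left, right):
    """Natural join of two relations, each a list of dicts keyed by term name."""
    joined = []
    for a, b in product(left, right):
        if all(a[k] == b[k] for k in a.keys() & b.keys()):
            joined.append({**a, **b})
    return joined

def join_all(relations):
    joined = relations[0]
    for relation in relations[1:]:
        joined = natural_join(joined, relation)
    return joined

class CountingOracle:
    """Hypothetical stand-in for an LLM-backed neural relation; counts prompts."""
    def __init__(self, true_tuples):
        self.true_tuples = set(true_tuples)
        self.calls = 0

    def __call__(self, tup):
        self.calls += 1
        return tup in self.true_tuples

def evaluate_lazily(symbolic_relations, neural_terms, oracle):
    """Join the symbolic relations first, then prompt only on the projected tuples."""
    joined = join_all(symbolic_relations)
    candidates = {tuple(row[t] for t in neural_terms) for row in joined}
    accepted = {tup for tup in sorted(candidates) if oracle(tup)}
    return [row for row in joined if tuple(row[t] for t in neural_terms) in accepted]

def evaluate_eagerly(symbolic_relations, neural_terms, oracle):
    """Comparison only: prompt on every combination of observed values, then join."""
    domains = []
    for term in neural_terms:
        values = {row[term] for rel in symbolic_relations for row in rel if term in row}
        domains.append(sorted(values))
    accepted = {tup for tup in product(*domains) if oracle(tup)}
    joined = join_all(symbolic_relations)
    return [row for row in joined if tuple(row[t] for t in neural_terms) in accepted]

# Invented example data: two symbolic relations and the tuples the oracle accepts.
R1 = [{"x": "a", "y": 1}, {"x": "b", "y": 2}, {"x": "c", "y": 3}]
R2 = [{"y": 1, "z": "p"}, {"y": 2, "z": "q"}]
TRUTH = {("a", "p"), ("c", "q")}

if __name__ == "__main__":
    lazy, eager = CountingOracle(TRUTH), CountingOracle(TRUTH)
    print(evaluate_lazily([R1, R2], ("x", "z"), lazy), lazy.calls)
    print(evaluate_eagerly([R1, R2], ("x", "z"), eager), eager.calls)
```

Both variants derive the same tuples, but the lazy one prompts only on tuples that survive the symbolic join, which is the reason given in the excerpt: any other tuple would make the overall join empty anyway.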

**Figure 7.** The improved inference rule for a neural relation.

**Figure 8.** An example of incremental prompting in the evaluation of rule (4) of the analysis policy in Figure …

**Figure 9.** The average rounds of conducted and skipped prompting in program slicing and bug detection.

Original abstract

Static program analysis plays an essential role in program optimization, bug detection, and debugging. However, reliance on compilation and limited customization hinder its adoption in the real world. This paper presents a compositional neuro-symbolic approach named NESA that facilitates compilation-free and customizable static program analysis using large language models (LLMs) with mitigated hallucinations. Specifically, we propose an analysis policy language, a restricted form of Datalog, to support users decomposing a static program analysis problem into several sub-problems that target simpler syntactic or semantic properties upon smaller code snippets. The problem decomposition enables the LLMs to target more manageable semantic-related sub-problems with reduced hallucinations, while the syntactic ones are resolved by parsing-based analysis without hallucinations. An analysis policy then is evaluated with lazy and incremental prompting, which significantly mitigates the hallucinations and improves the performance. We evaluate NESA for program slicing and bug detection upon benchmark and real-world programs. Evaluation results show that while NESA supports compilation-free and customizable analysis, it can still achieve comparable and even better performance than existing techniques. In a customized taint vulnerability detection upon TaintBench, for example, NESA achieves a precision of 66.27%, a recall of 78.57%, and an F1 score of 0.72, surpassing an industrial approach by 0.20 in F1 score. NESA also detects 13 real-world memory leak bugs, which have been fixed by developers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

NESA's combination of a Datalog policy and lazy prompting is a practical engineering step for LLM-based static analysis, but the abstract gives no ablation showing that the decomposition actually cuts hallucinations.

The main takeaway is that NESA lets users write a restricted Datalog policy to split an analysis into syntactic parsing steps and smaller semantic LLM queries, then runs those queries lazily and incrementally. This is meant to keep the whole thing compilation-free while still hitting usable accuracy on taint detection and slicing. The reported numbers on TaintBench (F1 0.72, 0.20 above the industrial baseline) and the 13 real memory-leak bugs are the concrete evidence offered.

Referee Report

2 major / 1 minor

Summary. The paper proposes NESA, a compositional neuro-symbolic static analysis framework that introduces a restricted Datalog-style policy language allowing users to decompose analysis tasks into syntactic sub-problems (handled by parsers) and semantic sub-problems (handled by LLMs), evaluated via lazy incremental prompting. It claims this mitigates hallucinations, enables compilation-free customizable analysis, and yields strong results on program slicing and bug detection, including 66.27% precision / 78.57% recall / 0.72 F1 on customized taint detection on TaintBench (surpassing an industrial baseline by 0.20 F1) plus detection of 13 fixed real-world memory leaks.

Significance. If the decomposition mechanism reliably reduces hallucinations relative to direct LLM use, the work would offer a practical route to customizable, compilation-free analyses that combine symbolic precision on syntax with neural flexibility on semantics, with direct applicability to taint analysis and slicing.

major comments (2)

[Evaluation / abstract performance claims] The central claim that policy decomposition plus lazy incremental prompting 'significantly mitigates the hallucinations' (abstract) is load-bearing for both the novelty and the reported performance gains, yet the manuscript provides no ablation, no hallucination-rate measurements, and no head-to-head comparison of the same LLM on identical tasks with vs. without the Datalog policy decomposition.
[Evaluation section] Reported metrics (TaintBench F1 0.72, 13 real-world bugs) are presented without accompanying experimental protocol details, data splits, error analysis, or controls for post-hoc prompt tuning or benchmark idiosyncrasies, undermining confidence that the gains are attributable to the neuro-symbolic composition rather than external factors.

minor comments (1)

Notation for the policy language and the precise semantics of 'lazy incremental prompting' could be formalized more explicitly to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our evaluation claims and experimental details. We address each major comment below and outline planned revisions.

Referee: [Evaluation / abstract performance claims] The central claim that policy decomposition plus lazy incremental prompting 'significantly mitigates the hallucinations' (abstract) is load-bearing for both the novelty and the reported performance gains, yet the manuscript provides no ablation, no hallucination-rate measurements, and no head-to-head comparison of the same LLM on identical tasks with vs. without the Datalog policy decomposition.

Authors: We acknowledge that the manuscript lacks direct ablations, quantitative hallucination-rate measurements, and head-to-head comparisons of the same LLM with versus without the policy decomposition. The reported gains are shown via end-to-end comparisons against industrial baselines and real-world bug detection, with the design rationale that restricting each LLM sub-problem to a smaller syntactic or semantic scope limits where errors can arise. We will revise to add an explicit limitations subsection on hallucination evidence and include qualitative examples of decomposition effects where space allows.

Revision: partial

Referee: [Evaluation section] Reported metrics (TaintBench F1 0.72, 13 real-world bugs) are presented without accompanying experimental protocol details, data splits, error analysis, or controls for post-hoc prompt tuning or benchmark idiosyncrasies, undermining confidence that the gains are attributable to the neuro-symbolic composition rather than external factors.

Authors: We agree that additional protocol details would increase confidence in the results. The current evaluation section references the benchmarks (TaintBench, real-world programs) and metrics but omits full data splits, prompt templates, and systematic error analysis. We will expand the evaluation section in revision to include these elements, along with controls for prompt variations.

Revision: yes

Circularity Check

0 steps flagged

No circularity; results measured on external benchmarks

The paper describes an empirical neuro-symbolic system whose core claims are performance numbers (F1 0.72 on TaintBench, 13 real-world bugs) obtained by running the implemented analyzer on independent benchmark suites and real programs. No equations, fitted parameters, or first-principles derivations appear in the provided text; the policy language and prompting strategy are design choices whose effectiveness is asserted via external evaluation rather than reduced to quantities defined inside the method. The absence of ablations on hallucination reduction is a question of evidence strength, not a self-referential reduction of the reported results to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that LLMs can be made reliable for semantic properties once problems are decomposed; no free parameters or invented physical entities are introduced in the abstract.

axioms (1)

Domain assumption: Decomposition of analysis tasks into smaller syntactic or semantic properties on smaller code snippets will allow LLMs to solve the semantic sub-problems with reduced hallucinations.
Invoked in: the description of the compositional neuro-symbolic approach and the role of the analysis policy language.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis
cs.SE · 2026-05 · unverdicted · novelty 7.0

Agentic interpretation uses lattices to track LLM judgments on decomposed program claims during analysis.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Thomas W. Reps. Program analysis via graph reachability. Inf. Softw. Technol., 40(11-12):701–726, 1998. doi: 10.1016/S0950-5849(98)00093-7. URL https://doi.org/10.1016/S0950-5849(98)00093-7

[2] Yuka Ikarashi, Gilbert Louis Bernstein, Alex Reinking, Hasan Genc, and Jonathan Ragan-Kelley. Exocompilation for productive programming of hardware accelerators. In Ranjit Jhala and Isil Dillig, editors, PLDI ’22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13 - 17, 2022, pages 703... [work page doi: 10.1145/3519939.3523446]

[3] Cristiano Calcagno, Dino Distefano, Peter W. O’Hearn, and Hongseok Yang. Compositional shape analysis by means of bi-abduction. In Zhong Shao and Benjamin C. Pierce, editors, Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, Savannah, GA, USA, January 21-23, 2009, pages 289–300. ACM, 2009. doi: 10.114... [work page doi: 10.1145/1480881.1480917]

[4] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick D. McDaniel. Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In Michael F. P. O’Boyle and Keshav Pingali, editors, ACM SIGPLAN Conference on Programming Lan... [work page doi: 10.1145/2594291.2594299]

[5] Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. Semfix: program repair via semantic analysis. In David Notkin, Betty H. C. Cheng, and Klaus Pohl, editors, 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013, pages 772–781. IEEE Computer Society, 2013. doi: 10.1109/ICSE.2013.... [work page doi: 10.1109/icse.2013.6606623]

[6] Yahui Song, Xiang Gao, Wenhua Li, Wei-Ngan Chin, and Abhik Roychoudhury. Provenfix: Temporal property-guided program repair. Proc. ACM Softw. Eng., 1(FSE):226–248, 2024. doi: 10.1145/3643737. URL https://doi.org/10.1145/3643737

[7] Yulei Sui and Jingling Xue. SVF: interprocedural static value-flow analysis in LLVM. In Ayal Zaks and Manuel V. Hermenegildo, editors, Proceedings of the 25th International Conference on Compiler Construction, CC 2016, Barcelona, Spain, March 12-18, 2016, pages 265–266. ACM, 2016. doi: 10.1145/2892208.2892235

[8] Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. Pinpoint: fast and precise sparse value flow analysis for million lines of code. In Jeffrey S. Foster and Dan Grossman, editors, Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018,... [work page doi: 10.1145/3192366.3192418]

[9] Brittany Johnson, Yoonki Song, Emerson R. Murphy-Hill, and Robert W. Bowdidge. Why don’t software developers use static analysis tools to find bugs? In David Notkin, Betty H. C. Cheng, and Klaus Pohl, editors, 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013, pages 672–681. IEEE Computer Society, 20... [work page doi: 10.1109/icse.2013.6606613]

[10] Maria Christakis and Christian Bird. What developers want and need from program analysis: an empirical study. In David Lo, Sven Apel, and Sarfraz Khurshid, editors, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, Singapore, September 3-7, 2016, pages 332–343. ACM, 2016. doi: 10.1145/2970276.2970347

[11] Bowen Zhang, Wei Chen, Peisen Yao, Chengpeng Wang, Wensheng Tang, and Charles Zhang. SIRO: empowering version compatibility in intermediate representations via program synthesis. In Rajiv Gupta, Nael B. Abu-Ghazaleh, Madan Musuvathi, and Dan Tsafrir, editors, Proceedings of the 29th ACM International Conference on Architectural Support for Programming Lan... [work page doi: 10.1145/3620666.3651366]

[12] Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. Repocoder: Repository-level code completion through iterative retrieval and generation. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Si... [work page doi: 10.18653/v1/2023.emnlp-main.151]

[13] Yuxiang Wei, Chunqiu Steven Xia, and Lingming Zhang. Copiloting the copilots: Fusing large language models with completion engines for automated program repair. In Satish Chandra, Kelly Blincoe, and Paolo Tonella, editors, Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, E... [work page doi: 10.1145/3611643.3616271]

[14] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. Swe-bench: Can language models resolve real-world github issues? In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. URL https://openreview.net/forum?id=VTF8yNQM66

[15] Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, and Lingming Zhang. Universal fuzzing via large language models. CoRR, abs/2308.04748, 2023. doi: 10.48550/ARXIV.2308.04748

[16] Ke Yang, Jiateng Liu, John Wu, Chaoqi Yang, Yi R. Fung, Sha Li, Zixuan Huang, Xu Cao, Xingyao Wang, Yiquan Wang, Heng Ji, and Chengxiang Zhai. If LLM is the wizard, then code is the wand: A survey on how code empowers large language models to serve as intelligent agents. CoRR, abs/2401.00812, 2024. doi: 10.48550/ARXIV.2401.00812. URL https://doi.org/10.48...

[17] Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, and Shuming Shi. Siren’s song in the AI ocean: A survey on hallucination in large language models. CoRR, abs/2309.01219, 2023. doi: 10.48550/ARXIV.2309.01219

[18] Henry Gordon Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical society, 74(2):358–366, 1953

[19] Tim Boland and Paul E. Black. Juliet 1.1 C/C++ and java test suite. Computer, 45(10):88–90, 2012. doi: 10.1109/MC.2012.345

[20] Aashish Yadavally, Yi Li, Shaohua Wang, and Tien N. Nguyen. A learning-based approach to static program slicing. Proc. ACM Program. Lang., 8(OOPSLA1):83–109, 2024. doi: 10.1145/3649814. URL https://doi.org/10.1145/3649814

[21] Linghui Luo, Felix Pauck, Goran Piskachev, Manuel Benz, Ivan Pashchenko, Martin Mory, Eric Bodden, Ben Hermann, and Fabio Massacci. Taintbench: Automatic real-world malware benchmarking of android taint analyses. Empir. Softw. Eng., 27(1):16, 2022. doi: 10.1007/S10664-021-10013-5. URL https://doi.org/10.1007/s10664-021-10013-5

[22] Manu Sridharan, Stephen J. Fink, and Rastislav Bodík. Thin slicing. In Jeanne Ferrante and Kathryn S. McKinley, editors, Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007, pages 112–122. ACM, 2007. doi: 10.1145/1250734.1250748. URL https://doi.org/10.1145/1250734.1250748

[23] Anshunkang Zhou, Chengfeng Ye, Heqing Huang, Yuandao Cai, and Charles Zhang. Plankton: Reconciling binary code and debug information. In Rajiv Gupta, Nael B. Abu-Ghazaleh, Madan Musuvathi, and Dan Tsafrir, editors, Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS... [work page doi: 10.1145/3620665.3640382]

[24] WLLVM. A wrapper script to build whole-program LLVM bitcode files. https://github.com/travitch/whole-program-llvm, 2024. [Online; accessed 3-Dec-2024]

[25] Pavel Avgustinov, Oege de Moor, Michael Peyton Jones, and Max Schäfer. QL: object-oriented queries on relational data. In Shriram Krishnamurthi and Benjamin S. Lerner, editors, 30th European Conference on Object-Oriented Programming, ECOOP 2016, July 18-22, 2016, Rome, Italy, volume 56 of LIPIcs, pages 2:1–2:25. Schloss Dagstuhl - Leibniz-Zentrum für Info... [work page doi: 10.4230/lipics.ecoop]

[26] Kexin Pei, David Bieber, Kensen Shi, Charles Sutton, and Pengcheng Yin. Can large language models reason about program invariants? In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 o...

[27] Yu Hao, Weiteng Chen, Ziqiao Zhou, and Weidong Cui. E&v: Prompting large language models to perform static analysis by pseudo-code execution and verification. CoRR, abs/2312.08477, 2023. doi: 10.48550/ARXIV.2312.08477. URL https://doi.org/10.48550/arXiv.2312.08477

[28] Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, and Suman Jana. Symmetry-preserving program representations for learning code semantics. CoRR, abs/2308.03312, 2023. doi: 10.48550/ARXIV.2308.03312. URL https://doi.org/10.48550/arXiv.2308.03312

[29] Lexi Brent, Neville Grech, Sifis Lagouvardos, Bernhard Scholz, and Yannis Smaragdakis. Ethainter: a smart contract security analyzer for composite vulnerabilities. In Alastair F. Donaldson and Emina Torlak, editors, Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 1... [work page doi: 10.1145/3385412.3385990]

[30] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS, 2022

[31] Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, and Pascale Fung. Towards mitigating LLM hallucination via self reflection. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 1827–1843. Association for Computational Linguistics, 2023

[32] Max Brunsfeld. Tree-sitter-a new parsing system for programming tools. In Strange Loop Conference, 2018. URL: https://www.thestrangeloop.com//tree-sitter—a-new-parsing-system-for-programming-tools.html

[33] PointerBench. PointerBench - A Points-to and Alias Analysis Benchmark Suite. https://github.com/secure-software-engineering/PointerBench, 2024. [Online; accessed 3-Dec-2024]

[34] Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir R. Choudhury, Lindsey Decker, Veronika Thost, Luca Buratti, Saurabh Pujar, and Ulrich Finkler. Project codenet: A large-scale AI for code dataset for learning a diversity of coding tasks. CoRR, abs/2105.12655, 2021. URL https://arxiv.o...

[35] Carlos Galindo, Sergio Pérez, and Josep Silva. A program slicer for java (tool paper). In Bernd-Holger Schlingloff and Ming Chai, editors, Software Engineering and Formal Methods - 20th International Conference, SEFM 2022, Berlin, Germany, September 26-30, 2022, Proceedings, volume 13550 of Lecture Notes in Computer Science, pages 146–151. Springer, 2022... [work page doi: 10.1007/978-3-031-17108-6]

[36] LLMSA. The analysis policies of different clients. https://anonymous.4open.science/r/LLMSA-54FE/src/acsa/analysis/, 2024. [Online; accessed 3-Dec-2024]

[37] Tony Antoniadis, Konstantinos Triantafyllou, and Yannis Smaragdakis. Porting doop to soufflé: a tale of inter-engine portability for datalog-based analyses. In Karim Ali and Cristina Cifuentes, editors, Proceedings of the 6th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis, SOAP@PLDI 2017, Barcelona, Spain, June 18, 2017, pages ... [work page doi: 10.1145/3088515.3088522]

[38] Stefano Ceri, Georg Gottlob, Letizia Tanca, et al. What you always wanted to know about datalog (and never dared to ask). IEEE transactions on knowledge and data engineering, 1(1):146–166, 1989

[39]

Port, Kotagiri Ramamohanarao, and Krishnamurthy Meenakshi

Isaac Balbin, Graeme S. Port, Kotagiri Ramamohanarao, and Krishnamurthy Meenakshi. Efficient bottom-up computation of queries on stratified databases. The Journal of logic programming , 11(3-4):295–344, 1991

work page 1991
[40]

Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Richard Draves and Robbert van Renesse, editors, 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008, December 8-10, 2008, San Diego, California, USA, Proceedings , pages 209–224....

work page 2008
[41]

Semgrep*: Improving the limited performance of static application security testing (SAST) tools

Gareth Bennett, Tracy Hall, Emily Winter, and Steve Counsell. Semgrep*: Improving the limited performance of static application security testing (SAST) tools. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, EASE 2024, Salerno, Italy, June 18-21, 2024 , pages 614–623. ACM, 2024. doi: 10.1145/3661167...

work page doi:10.1145/3661167.3661262 2024
[42]

Clang-Tidy

Clang-Tidy. Clang-Tidy. https://clang.llvm.org/extra/clang-tidy/, 2024. [Online; accessed 3-Dec-2024]

work page 2024
[43]

Elnar Hajiyev, Mathieu Verbaere, and Oege de Moor.codeQuest: scalable source code queries with datalog. In Dave Thomas, editor, ECOOP 2006 - Object-Oriented Programming, 20th European Conference, Nantes, France, July 3-7, 2006, Proceedings , volume 4067 of Lecture Notes in Computer Science , pages 2–27. Springer, 2006. doi: 10.1007/11785477\_2. URL https:...

work page doi:10.1007/11785477 2006
[44]

Xiuheng Wu, Chenguang Zhu, and Yi Li. DIFFBASE: a differential factbase for effective software evolution management. In Diomidis Spinellis, Georgios Gousios, Marsha Chechik, and Massimiliano Di Penta, editors, ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, Aug...

[45]

Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. Modeling and discovering vulnerabilities with code property graphs. In 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18-21, 2014 , pages 590–604. IEEE Computer Society, 2014. doi: 10.1109/SP.2014.44. URL https://doi.org/10.1109/SP.2014.44

[46]

Yannis Smaragdakis and Martin Bravenboer. Using datalog for fast and easy program analysis. In Oege de Moor, Georg Gottlob, Tim Furche, and Andrew Jon Sellers, editors, Datalog Reloaded - First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Revised Selected Papers , volume 6702 of Lecture Notes in Computer Science , pages 245–251. Sp...

[47]

Martin Bravenboer and Yannis Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In Shail Arora and Gary T. Leavens, editors, Proceedings of the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2009, October 25-29, 2009, Orlando, Florida, USA , pages 243–262. A...

[48]

Herbert Jordan, Bernhard Scholz, and Pavle Subotic. Soufflé: On synthesis of program analyzers. In Swarat Chaudhuri and Azadeh Farzan, editors, Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II, volume 9780 of Lecture Notes in Computer Science, pages 422–430. Springer, 2016...

[49]

Wuqi Zhang, Zhuo Zhang, Qingkai Shi, Lu Liu, Lili Wei, Yepang Liu, Xiangyu Zhang, and Shing-Chi Cheung. Nyx: Detecting exploitable front-running vulnerabilities in smart contracts. In 2024 IEEE Symposium on Security and Privacy (SP) , pages 146–146. IEEE Computer Society, 2024

[50]

Hongbo Wen, Jon Stephens, Yanju Chen, Kostas Ferles, Shankara Pailoor, Kyle Charbonnet, Isil Dillig, and Yu Feng. Practical security analysis of zero-knowledge proof circuits. In Davide Balzarotti and Wenyuan Xu, editors, 33rd USENIX Security Symposium, USENIX Security 2024, Philadelphia, PA, USA, August 14-16, 2024. USENIX Association, 2024. URL https...

[51]

Ziyang Li, Aravind Machiry, Binghong Chen, Mayur Naik, Ke Wang, and Le Song. ARBITRAR: user-guided API misuse detection. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021 , pages 1400–1415. IEEE, 2021. doi: 10.1109/SP40001.2021.00090. URL https://doi.org/10.1109/SP40001.2021.00090

[52]

Benjamin Steenhoek, Hongyang Gao, and Wei Le. Dataflow analysis-inspired deep learning for efficient vulnerability detection. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024 , pages 16:1–16:13. ACM, 2024. doi: 10.1145/3597503.3623345. URL https://doi.org/10.1145/3597503.3623345

[53]

Hazim Hanif and Sergio Maffeis. Vulberta: Simplified source code pre-training for vulnerability detection. In International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022 , pages 1–8. IEEE, 2022. doi: 10.1109/IJCNN55064.2022.9892280. URL https://doi.org/10.1109/IJCNN55064.2022.9892280

[54]

Elizabeth Dinella, Hanjun Dai, Ziyang Li, Mayur Naik, Le Song, and Ke Wang. Hoppity: Learning graph transformations to detect and fix bugs in programs. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020 . OpenReview.net, 2020. URL https://openreview.net/forum?id=SJeqs6EFvB

[55]

Qian Chen, Chenyang Yu, Ruyan Liu, Chi Zhang, Yu Wang, Ke Wang, Ting Su, and Linzhang Wang. Evaluating the effectiveness of deep learning models for foundational program analysis tasks. Proc. ACM Program. Lang., 8(OOPSLA1):500–528, 2024. doi: 10.1145/3649829. URL https://doi.org/10.1145/3649829

[56]

Aashish Yadavally, Tien N. Nguyen, Wenbo Wang, and Shaohua Wang. (partial) program dependence learning. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023 , pages 2501–2513. IEEE, 2023. doi: 10.1109/ICSE48619.2023.00209. URL https://doi.org/10.1109/ICSE48619.2023.00209

[57]

Haonan Li, Yu Hao, Yizhuo Zhai, and Zhiyun Qian. Enhancing static analysis for practical bug detection: An llm-integrated approach. Proc. ACM Program. Lang., 8(OOPSLA1):474–499, 2024. doi: 10.1145/3649828. URL https://doi.org/10.1145/3649828. ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: December 2024

