
arxiv: 2412.14399 · v2 · submitted 2024-12-18 · 💻 cs.PL · cs.SE

NESA: Relational Neuro-Symbolic Static Program Analysis

Pith reviewed 2026-05-23 07:01 UTC · model grok-4.3

classification 💻 cs.PL cs.SE
keywords: static program analysis, neuro-symbolic analysis, large language models, taint detection, program slicing, bug detection, Datalog policy language

The pith

NESA decomposes static program analysis into syntactic and semantic sub-problems so that LLMs handle only the semantic parts, with fewer hallucinations, while the overall analysis matches or exceeds traditional tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NESA as a way to perform static program analysis without compilation by letting users write policies that split a task into simpler pieces targeting small code snippets. Syntactic pieces are solved exactly by parsers; semantic pieces go to LLMs but only after the policy has narrowed the scope and after lazy incremental prompting has been applied. The claim is that this split keeps LLM errors low enough for the overall results to be at least as good as existing analyzers and sometimes better, as shown on program slicing, bug detection, and a taint task where F1 reached 0.72. The method also found thirteen real memory-leak bugs later fixed by developers. If the decomposition works as described, analysis becomes both customizable by non-experts and runnable directly on source without building the program first.
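To make the syntactic half of the split concrete, here is a minimal sketch of "syntactic pieces are solved exactly by parsers". It uses Python's standard `ast` module purely as a stand-in parser (the paper's reference list includes tree-sitter [32]; we do not claim this is how NESA extracts facts), and the relation names `call` and `assign` are hypothetical. No LLM is involved: every fact comes directly from the parse tree.

```python
import ast

def syntactic_facts(source: str) -> dict:
    """Extract exact, parser-derived facts as sets of tuples (Datalog-style)."""
    tree = ast.parse(source)
    calls, assigns = set(), set()
    for node in ast.walk(tree):
        # call(callee, line): direct calls to a plainly named function
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls.add((node.func.id, node.lineno))
        # assign(target, line): simple variable assignments
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assigns.add((target.id, node.lineno))
    return {"call": calls, "assign": assigns}

snippet = """\
data = read_input()
clean = sanitize(data)
run_query(clean)
"""
facts = syntactic_facts(snippet)
print(sorted(facts["call"]))
print(sorted(facts["assign"]))
```

Whether `sanitize` really neutralizes `data` is a semantic question the parse tree cannot answer; under the decomposition described above, that is the kind of narrowed question that would be handed to the LLM.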

Core claim

NESA provides an analysis policy language, a restricted form of Datalog, that lets users decompose a static-analysis problem into sub-problems on smaller code snippets; syntactic sub-problems are solved by parsing and semantic sub-problems are solved by LLMs under lazy incremental prompting, yielding comparable or better precision and recall than conventional compiled analyzers on slicing and taint-detection benchmarks.
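The core claim rests on evaluating a policy whose relations are either symbolic (parser facts) or neural (LLM answers). The sketch below is a toy, hypothetical rendering of one such rule, `bug(x, y) :- source(x), flows(x, y), sink_arg(y)`, in which `flows` is the neural relation. A dictionary-backed stub stands in for the LLM, and all relation names and facts are invented for illustration. It contrasts materializing the neural relation over every variable pair with first joining the symbolic relations and prompting only for the pairs that survive, which is our reading of the "lazy" part of the evaluation, not NESA's actual code.

```python
from itertools import product

# Hypothetical symbolic facts; in NESA these would come from parsing.
SOURCE = {"a", "b"}                  # variables assigned from a source API
SINK_ARG = {"q", "r"}                # variables passed to a sink API
VARIABLES = {"a", "b", "c", "q", "r"}

# Stub standing in for an LLM asked "does the value of x flow into y?"
ORACLE = {("a", "q"): True, ("b", "r"): False}
prompts = []                         # log of every question "sent" to the LLM

def flows(x, y):
    """Neural relation: every call counts as one prompt."""
    prompts.append((x, y))
    return ORACLE.get((x, y), False)

def eager_bugs():
    # Materialize the neural relation over all pairs, then join.
    flow = {(x, y) for x, y in product(VARIABLES, VARIABLES) if flows(x, y)}
    return {(x, y) for x, y in flow if x in SOURCE and y in SINK_ARG}

def lazy_bugs():
    # Join the symbolic relations first; prompt only for surviving pairs.
    return {(x, y) for x, y in product(SOURCE, SINK_ARG) if flows(x, y)}

prompts.clear()
eager = eager_bugs()
eager_prompts = len(prompts)

prompts.clear()
lazy = lazy_bugs()
lazy_prompts = len(prompts)

print(eager, eager_prompts)  # {('a', 'q')} 25
print(lazy, lazy_prompts)    # {('a', 'q')} 4
```

Both strategies report the same bug; only the number of LLM calls differs. On real code, the symbolic join is what keeps each neural question both rare and small.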

What carries the argument

The analysis policy language (restricted Datalog), which decomposes tasks so that LLMs receive only narrowed semantic questions on small snippets while parsers supply syntactic facts exactly, together with lazy incremental prompting, the strategy used to evaluate the resulting policy.
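The "incremental" half can be read in the style of semi-naive Datalog evaluation: each round expands only the facts derived in the previous round, and an answer already obtained from the LLM is never requested again. The sketch below illustrates that idea under our own assumptions; it is not NESA's implementation. `MENTIONS`, `TRUTH`, and `flows` are hypothetical, and a dictionary replaces the real LLM call. It counts prompts actually issued versus answers reused from the cache, loosely mirroring the "conducted" and "skipped" prompting the paper reports in Figure 9.

```python
# Hypothetical syntactic facts: (y, z) means z's defining statement mentions y.
MENTIONS = {("src", "a"), ("src", "b"), ("a", "b"), ("a", "c"),
            ("b", "d"), ("c", "d"), ("d", "sink")}
# Stub for the LLM's semantic judgment, e.g. c = len(a) does not carry taint.
TRUTH = {("a", "c"): False}

cache = {}                        # (y, z) -> bool, shared across queries
stats = {"asked": 0, "reused": 0}

def flows(y, z):
    """Neural relation: prompt at most once per (y, z) pair."""
    if (y, z) in cache:
        stats["reused"] += 1
    else:
        stats["asked"] += 1
        cache[(y, z)] = TRUTH.get((y, z), True)
    return cache[(y, z)]

def taint_closure(seed):
    """Semi-naive fixpoint of tainted(z) :- tainted(y), mentions(y, z), flows(y, z)."""
    tainted, delta = {seed}, {seed}
    while delta:
        # Expand only last round's new facts; pairs whose target is already
        # tainted are skipped before any prompt is issued.
        new = {z for (y, z) in MENTIONS
               if y in delta and z not in tainted and flows(y, z)}
        tainted |= new
        delta = new
    return tainted

first = taint_closure("src")
after_first = dict(stats)
second = taint_closure("a")       # reuses cached answers from the first query

print(sorted(first), after_first)
print(sorted(second), stats)
```

The second query issues only one new prompt, for the pair `(a, b)`, which the first query never needed because `b` was already tainted; its other three questions are answered from the cache.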

If this is right

  • Custom taint and slicing policies can be written and executed without any compiler or build system.
  • On the TaintBench benchmark the reported F1 of 0.72 exceeds the industrial baseline by 0.20.
  • Thirteen memory-leak bugs in real programs were located and later confirmed and fixed by their developers.
  • The same policy mechanism supports both program slicing and bug detection on standard benchmarks and real-world code.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decomposition pattern could be tried on other LLM-based code tasks such as refactoring or test generation where hallucinations are currently high.
  • If policies can be written once and reused across projects, the cost of maintaining many specialized analyzers might drop.
  • Performance would likely improve further if stronger LLMs are substituted, because the policy already limits the scope of each LLM call.

Load-bearing premise

That splitting an analysis into smaller syntactic or semantic properties on limited code snippets will keep LLM hallucinations low enough for the combined results to stay accurate.

What would settle it

Run the same taint-detection policy on TaintBench with both NESA and the industrial baseline and measure whether NESA's F1 remains at least 0.20 higher and whether the LLM answers on the semantic sub-problems show markedly fewer contradictions than when the same questions are asked without the policy decomposition.
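The proposed test needs two measurements. The minimal sketch below makes them concrete under assumptions of our own: a "contradiction" is counted when the model gives the same yes/no answer to a question and to its negation, and "markedly fewer" becomes a relative drop of at least 50% (`min_rate_drop`, an arbitrary placeholder). The 0.52 baseline F1 is simply the reported 0.72 minus the reported 0.20 gap. The answer lists are toy inputs, not data from the paper.

```python
from typing import Sequence, Tuple

def contradiction_rate(pairs: Sequence[Tuple[bool, bool]]) -> float:
    """Fraction of (answer to q, answer to not-q) pairs that agree,
    i.e. where the model contradicts itself."""
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)

def claim_survives(f1_nesa: float, f1_baseline: float,
                   rate_with_policy: float, rate_without_policy: float,
                   min_f1_gap: float = 0.20, min_rate_drop: float = 0.5,
                   tol: float = 1e-9) -> bool:
    """Hypothetical decision rule for the experiment described above."""
    # tol absorbs float error: 0.72 - 0.52 evaluates to 0.19999999999999996.
    f1_ok = f1_nesa - f1_baseline >= min_f1_gap - tol
    rate_ok = rate_with_policy <= (1 - min_rate_drop) * rate_without_policy
    return f1_ok and rate_ok

# Toy answer pairs for the same semantic questions, with and without the policy.
with_policy = [(True, False)] * 9 + [(True, True)]
without_policy = [(True, False)] * 6 + [(False, False)] * 4

r_with = contradiction_rate(with_policy)        # 0.1
r_without = contradiction_rate(without_policy)  # 0.4
print(claim_survives(0.72, 0.52, r_with, r_without))
```

Keeping the thresholds as parameters makes it explicit that "markedly fewer" must be fixed before the experiment is run, not after.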

Figures

Figures reproduced from arXiv: 2412.14399 by Chengpeng Wang, Jinyao Guo, Mingwei Zheng, Qingkai Shi, Wuqi Zhang, Xiangyu Zhang, Xuwei Liu, Yifei Gao.

Figure 1: Two motivating examples of compilation-free and customizable static analysis.
Figure 2: The workflow of LLMSA.
Figure 3: The examples of analysis policy and neural relation specification. In the sub-figure (a), the symbolic, neural, and …
Figure 4: The syntax of the analysis policy language.
Figure 5: An analysis policy of intra-procedural XSS detection. The neural relations are in …
Figure 6: The instantiation of rule semantics.
Figure 7: The improved inference rule for a neural relation.
Figure 8: An example of incremental prompting in the evaluation of rule (4) of the analysis policy in Figure …
Figure 9: The average rounds of conducted and skipped prompting in program slicing and bug detection.
Original abstract

Static program analysis plays an essential role in program optimization, bug detection, and debugging. However, reliance on compilation and limited customization hinder its adoption in the real world. This paper presents a compositional neuro-symbolic approach named NESA that facilitates compilation-free and customizable static program analysis using large language models (LLMs) with mitigated hallucinations. Specifically, we propose an analysis policy language, a restricted form of Datalog, to support users in decomposing a static program analysis problem into several sub-problems that target simpler syntactic or semantic properties on smaller code snippets. The problem decomposition enables the LLMs to target more manageable semantic-related sub-problems with reduced hallucinations, while the syntactic ones are resolved by parsing-based analysis without hallucinations. An analysis policy is then evaluated with lazy and incremental prompting, which significantly mitigates the hallucinations and improves the performance. We evaluate NESA for program slicing and bug detection on benchmark and real-world programs. Evaluation results show that while NESA supports compilation-free and customizable analysis, it can still achieve comparable or even better performance than existing techniques. In a customized taint vulnerability detection task on TaintBench, for example, NESA achieves a precision of 66.27%, a recall of 78.57%, and an F1 score of 0.72, surpassing an industrial approach by 0.20 in F1 score. NESA also detects 13 real-world memory leak bugs, which have been fixed by developers.
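The reported precision and recall are consistent with the stated F1. A quick check using only the numbers in the abstract; the industrial baseline's F1 is not given directly, so the second value is merely what the 0.20 gap implies, assuming the gap was computed from values close to these rounded ones.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures reported in the abstract for the customized TaintBench taint task.
nesa_f1 = f1_score(0.6627, 0.7857)
implied_baseline_f1 = nesa_f1 - 0.20   # reported gap of 0.20 in F1

print(f"NESA F1 = {nesa_f1:.3f}")
print(f"implied baseline F1 = {implied_baseline_f1:.2f}")
```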

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes NESA, a compositional neuro-symbolic static analysis framework that introduces a restricted Datalog-style policy language allowing users to decompose analysis tasks into syntactic sub-problems (handled by parsers) and semantic sub-problems (handled by LLMs), evaluated via lazy incremental prompting. It claims this mitigates hallucinations, enables compilation-free customizable analysis, and yields strong results on program slicing and bug detection, including 66.27% precision / 78.57% recall / 0.72 F1 on customized taint detection on TaintBench (surpassing an industrial baseline by 0.20 F1) plus detection of 13 fixed real-world memory leaks.

Significance. If the decomposition mechanism reliably reduces hallucinations relative to direct LLM use, the work would offer a practical route to customizable, compilation-free analyses that combine symbolic precision on syntax with neural flexibility on semantics, with direct applicability to taint analysis and slicing.

major comments (2)
  1. [Evaluation / abstract performance claims] The central claim that policy decomposition plus lazy incremental prompting 'significantly mitigates the hallucinations' (abstract) is load-bearing for both the novelty and the reported performance gains, yet the manuscript provides no ablation, no hallucination-rate measurements, and no head-to-head comparison of the same LLM on identical tasks with vs. without the Datalog policy decomposition.
  2. [Evaluation section] Reported metrics (TaintBench F1 0.72, 13 real-world bugs) are presented without accompanying experimental protocol details, data splits, error analysis, or controls for post-hoc prompt tuning or benchmark idiosyncrasies, undermining confidence that the gains are attributable to the neuro-symbolic composition rather than external factors.
minor comments (1)
  1. Notation for the policy language and the precise semantics of 'lazy incremental prompting' could be formalized more explicitly to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our evaluation claims and experimental details. We address each major comment below and outline planned revisions.

Point-by-point responses
  1. Referee: [Evaluation / abstract performance claims] The central claim that policy decomposition plus lazy incremental prompting 'significantly mitigates the hallucinations' (abstract) is load-bearing for both the novelty and the reported performance gains, yet the manuscript provides no ablation, no hallucination-rate measurements, and no head-to-head comparison of the same LLM on identical tasks with vs. without the Datalog policy decomposition.

    Authors: We acknowledge that the manuscript lacks direct ablations, quantitative hallucination-rate measurements, and head-to-head LLM comparisons with versus without the policy decomposition. The reported gains are shown via end-to-end comparisons against industrial baselines and real-world bug detection, with the design rationale that confining the LLM to smaller semantic sub-problems, while parsers handle the syntactic ones, narrows the room for error. We will revise to add an explicit limitations subsection on hallucination evidence and include qualitative examples of decomposition effects where space allows. revision: partial

  2. Referee: [Evaluation section] Reported metrics (TaintBench F1 0.72, 13 real-world bugs) are presented without accompanying experimental protocol details, data splits, error analysis, or controls for post-hoc prompt tuning or benchmark idiosyncrasies, undermining confidence that the gains are attributable to the neuro-symbolic composition rather than external factors.

    Authors: We agree that additional protocol details would increase confidence in the results. The current evaluation section references the benchmarks (TaintBench, real-world programs) and metrics but omits full data splits, prompt templates, and systematic error analysis. We will expand the evaluation section in revision to include these elements, along with controls for prompt variations. revision: yes

Circularity Check

0 steps flagged

No circularity; results measured on external benchmarks

full rationale

The paper describes an empirical neuro-symbolic system whose core claims are performance numbers (F1 0.72 on TaintBench, 13 real-world bugs) obtained by running the implemented analyzer on independent benchmark suites and real programs. No equations, fitted parameters, or first-principles derivations appear in the provided text; the policy language and prompting strategy are design choices whose effectiveness is asserted via external evaluation rather than reduced to quantities defined inside the method. The absence of ablations on hallucination reduction is a question of evidence strength, not a self-referential reduction of the reported results to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that LLMs can be made reliable for semantic properties once problems are decomposed; no free parameters or invented physical entities are introduced in the abstract.

axioms (1)
  • domain assumption: Decomposition of analysis tasks into smaller syntactic or semantic properties on smaller code snippets will allow LLMs to solve the semantic sub-problems with reduced hallucinations.
    Invoked in the description of the compositional neuro-symbolic approach and the role of the analysis policy language.

pith-pipeline@v0.9.0 · 5809 in / 1378 out tokens · 56336 ms · 2026-05-23T07:01:34.328442+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis

    cs.SE 2026-05 unverdicted novelty 7.0

    Agentic interpretation uses lattices to track LLM judgments on decomposed program claims during analysis.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Thomas W. Reps. Program analysis via graph reachability. Inf. Softw. Technol., 40(11-12):701–726, 1998. doi: 10.1016/S0950-5849(98)00093-7. URL https://doi.org/10.1016/S0950-5849(98)00093-7

  2. [2]

    Exocompilation for productive programming of hardware accelerators

    Yuka Ikarashi, Gilbert Louis Bernstein, Alex Reinking, Hasan Genc, and Jonathan Ragan-Kelley. Exocompilation for productive programming of hardware accelerators. In Ranjit Jhala and Isil Dillig, editors, PLDI ’22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13 - 17, 2022 , pages 703...

  3. [3]

    Compositional shape analysis by means of bi-abduction

    Cristiano Calcagno, Dino Distefano, Peter W. O’Hearn, and Hongseok Yang. Compositional shape analysis by means of bi-abduction. In Zhong Shao and Benjamin C. Pierce, editors, Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, Savannah, GA, USA, January 21-23, 2009 , pages 289–300. ACM, 2009. doi: 10.114...

  4. [4]

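    Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps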

    Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick D. McDaniel. Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In Michael F. P. O’Boyle and Keshav Pingali, editors, ACM SIGPLAN Conference on Programming Lan...

  5. [5]

    Semfix: program repair via semantic analysis

    Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. Semfix: program repair via semantic analysis. In David Notkin, Betty H. C. Cheng, and Klaus Pohl, editors, 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013 , pages 772–781. IEEE Computer Society, 2013. doi: 10.1109/ICSE.2013....

  6. [6]

    Provenfix: Temporal property-guided program repair

    Yahui Song, Xiang Gao, Wenhua Li, Wei-Ngan Chin, and Abhik Roychoudhury. Provenfix: Temporal property-guided program repair. Proc. ACM Softw. Eng., 1(FSE):226–248, 2024. doi: 10.1145/3643737. URL https://doi.org/10.1145/3643737

  7. [7]

    SVF: interprocedural static value-flow analysis in LLVM

    Yulei Sui and Jingling Xue. SVF: interprocedural static value-flow analysis in LLVM. In Ayal Zaks and Manuel V. Hermenegildo, editors, Proceedings of the 25th International Conference on Compiler Construction, CC 2016, Barcelona, Spain, March 12-18, 2016 , pages 265–266. ACM, 2016. doi: 10.1145/2892208.2892235

  8. [8]

    Pinpoint: fast and precise sparse value flow analysis for million lines of code

    Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. Pinpoint: fast and precise sparse value flow analysis for million lines of code. In Jeffrey S. Foster and Dan Grossman, editors, Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018 ,...

  9. [9]

    Why don’t software developers use static analysis tools to find bugs?

    Brittany Johnson, Yoonki Song, Emerson R. Murphy-Hill, and Robert W. Bowdidge. Why don’t software developers use static analysis tools to find bugs? In David Notkin, Betty H. C. Cheng, and Klaus Pohl, editors, 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013 , pages 672–681. IEEE Computer Society, 20...

  10. [10]

    What developers want and need from program analysis: an empirical study

    Maria Christakis and Christian Bird. What developers want and need from program analysis: an empirical study. In David Lo, Sven Apel, and Sarfraz Khurshid, editors, Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, Singapore, September 3-7, 2016 , pages 332–343. ACM, 2016. doi: 10.1145/2970276.2970347

  11. [11]

    SIRO: empowering version compatibility in intermediate representations via program synthesis

    Bowen Zhang, Wei Chen, Peisen Yao, Chengpeng Wang, Wensheng Tang, and Charles Zhang. SIRO: empowering version compatibility in intermediate representations via program synthesis. In Rajiv Gupta, Nael B. Abu-Ghazaleh, Madan Musuvathi, and Dan Tsafrir, editors, Proceedings of the 29th ACM International Conference on Architectural Support for Programming Lan...

  12. [12]

    Repocoder: Repository-level code completion through iterative retrieval and generation

    Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. Repocoder: Repository-level code completion through iterative retrieval and generation. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Si...

  13. [13]

    Copiloting the copilots: Fusing large language models with completion engines for automated program repair

    Yuxiang Wei, Chunqiu Steven Xia, and Lingming Zhang. Copiloting the copilots: Fusing large language models with completion engines for automated program repair. In Satish Chandra, Kelly Blincoe, and Paolo Tonella, editors, Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, E...

  14. [14]

    Swe-bench: Can language models resolve real-world github issues?

    Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. Swe-bench: Can language models resolve real-world github issues? In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 . OpenReview.net, 2024. URL https://openreview.net/forum?id=VTF8yNQM66

  15. [15]

    Universal fuzzing via large language models

    Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, and Lingming Zhang. Universal fuzzing via large language models. CoRR, abs/2308.04748, 2023. doi: 10.48550/ARXIV.2308.04748

  16. [16]

    If LLM is the wizard, then code is the wand: A survey on how code empowers large language models to serve as intelligent agents

    Ke Yang, Jiateng Liu, John Wu, Chaoqi Yang, Yi R. Fung, Sha Li, Zixuan Huang, Xu Cao, Xingyao Wang, Yiquan Wang, Heng Ji, and Chengxiang Zhai. If LLM is the wizard, then code is the wand: A survey on how code empowers large language models to serve as intelligent agents. CoRR, abs/2401.00812, 2024. doi: 10.48550/ARXIV.2401.00812. URL https://doi.org/10.48...

  17. [17]

    Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

    Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, and Shuming Shi. Siren’s song in the AI ocean: A survey on hallucination in large language models. CoRR, abs/2309.01219, 2023. doi: 10.48550/ARXIV.2309.01219

  18. [18]

    Classes of recursively enumerable sets and their decision problems

    Henry Gordon Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical society, 74(2):358–366, 1953

  19. [19]

    Tim Boland and Paul E. Black. Juliet 1.1 C/C++ and java test suite. Computer, 45(10):88–90, 2012. doi: 10.1109/MC.2012.345

  20. [20]

    Aashish Yadavally, Yi Li, Shaohua Wang, and Tien N. Nguyen. A learning-based approach to static program slicing. Proc. ACM Program. Lang., 8(OOPSLA1):83–109, 2024. doi: 10.1145/3649814. URL https://doi.org/10.1145/3649814

  21. [21]

    Taintbench: Automatic real-world malware benchmarking of android taint analyses

    Linghui Luo, Felix Pauck, Goran Piskachev, Manuel Benz, Ivan Pashchenko, Martin Mory, Eric Bodden, Ben Hermann, and Fabio Massacci. Taintbench: Automatic real-world malware benchmarking of android taint analyses. Empir. Softw. Eng., 27(1):16, 2022. doi: 10.1007/S10664-021-10013-5. URL https://doi.org/10.1007/s10664-021-10013-5

  22. [22]

    Thin slicing

    Manu Sridharan, Stephen J. Fink, and Rastislav Bodík. Thin slicing. In Jeanne Ferrante and Kathryn S. McKinley, editors, Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007 , pages 112–122. ACM, 2007. doi: 10.1145/1250734.1250748. URL https://doi.org/10.1145/1250734.1250748

  23. [23]

    Plankton: Reconciling binary code and debug information

    Anshunkang Zhou, Chengfeng Ye, Heqing Huang, Yuandao Cai, and Charles Zhang. Plankton: Reconciling binary code and debug information. In Rajiv Gupta, Nael B. Abu-Ghazaleh, Madan Musuvathi, and Dan Tsafrir, editors, Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS...

  24. [24]

    A wrapper script to build whole-program LLVM bitcode files

    WLLVM. A wrapper script to build whole-program LLVM bitcode files. https://github.com/travitch/whole-program-llvm, 2024. [Online; accessed 3-Dec-2024]

  25. [25]

    QL: object-oriented queries on relational data

    Pavel Avgustinov, Oege de Moor, Michael Peyton Jones, and Max Schäfer. QL: object-oriented queries on relational data. In Shriram Krishnamurthi and Benjamin S. Lerner, editors, 30th European Conference on Object-Oriented Programming, ECOOP 2016, July 18-22, 2016, Rome, Italy, volume 56 of LIPIcs, pages 2:1–2:25. Schloss Dagstuhl - Leibniz-Zentrum für Info...

  26. [26]

    Kexin Pei, David Bieber, Kensen Shi, Charles Sutton, and Pengcheng Yin. Can large language models reason about program invariants? In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA , volume 202 o...

  27. [27]

    E&v: Prompting large language models to perform static analysis by pseudo-code execution and verification

    Yu Hao, Weiteng Chen, Ziqiao Zhou, and Weidong Cui. E&v: Prompting large language models to perform static analysis by pseudo-code execution and verification. CoRR, abs/2312.08477, 2023. doi: 10.48550/ARXIV.2312.08477. URL https://doi.org/10.48550/arXiv.2312.08477

  28. [28]

    Symmetry-preserving program representations for learning code semantics

    Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, and Suman Jana. Symmetry-preserving program representations for learning code semantics. CoRR, abs/2308.03312, 2023. doi: 10.48550/ARXIV.2308.03312. URL https://doi.org/10.48550/arXiv.2308.03312

  29. [29]

    Ethainter: a smart contract security analyzer for composite vulnerabilities

    Lexi Brent, Neville Grech, Sifis Lagouvardos, Bernhard Scholz, and Yannis Smaragdakis. Ethainter: a smart contract security analyzer for composite vulnerabilities. In Alastair F. Donaldson and Emina Torlak, editors, Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 1...

  30. [30]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS, 2022

  31. [31]

    Towards mitigating LLM hallucination via self reflection

    Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, and Pascale Fung. Towards mitigating LLM hallucination via self reflection. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 1827–1843. Association for Computational Linguistics, 2023

  32. [32]

    Tree-sitter - a new parsing system for programming tools

    Max Brunsfeld. Tree-sitter - a new parsing system for programming tools. In Strange Loop Conference, 2018. URL: https://www.thestrangeloop.com//tree-sitter---a-new-parsing-system-for-programming-tools.html

  33. [33]

    PointerBench - A Points-to and Alias Analysis Benchmark Suite

    PointerBench. PointerBench - A Points-to and Alias Analysis Benchmark Suite. https://github.com/secure-software-engineering/PointerBench, 2024. [Online; accessed 3-Dec-2024]. ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: December 2024. LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Stat...

  34. [34]

    Project codenet: A large-scale AI for code dataset for learning a diversity of coding tasks

    Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir R. Choudhury, Lindsey Decker, Veronika Thost, Luca Buratti, Saurabh Pujar, and Ulrich Finkler. Project codenet: A large-scale AI for code dataset for learning a diversity of coding tasks. CoRR, abs/2105.12655, 2021. URL https://arxiv.o...

  35. [35]

    A program slicer for java (tool paper)

    Carlos Galindo, Sergio Pérez, and Josep Silva. A program slicer for java (tool paper). In Bernd-Holger Schlingloff and Ming Chai, editors, Software Engineering and Formal Methods - 20th International Conference, SEFM 2022, Berlin, Germany, September 26-30, 2022, Proceedings, volume 13550 of Lecture Notes in Computer Science , pages 146–151. Springer, 2022...

  36. [36]

    The analysis policies of different clients

    LLMSA. The analysis policies of different clients. https://anonymous.4open.science/r/LLMSA-54FE/src/acsa/analysis/, 2024. [Online; accessed 3-Dec-2024]

  37. [37]

    Porting doop to soufflé: a tale of inter-engine portability for datalog-based analyses

    Tony Antoniadis, Konstantinos Triantafyllou, and Yannis Smaragdakis. Porting doop to soufflé: a tale of inter-engine portability for datalog-based analyses. In Karim Ali and Cristina Cifuentes, editors, Proceedings of the 6th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis, SOAP@PLDI 2017, Barcelona, Spain, June 18, 2017 , pages ...

  38. [38]

    What you always wanted to know about datalog (and never dared to ask)

    Stefano Ceri, Georg Gottlob, Letizia Tanca, et al. What you always wanted to know about datalog (and never dared to ask). IEEE transactions on knowledge and data engineering , 1(1):146–166, 1989

  39. [39]

    Efficient bottom-up computation of queries on stratified databases

    Isaac Balbin, Graeme S. Port, Kotagiri Ramamohanarao, and Krishnamurthy Meenakshi. Efficient bottom-up computation of queries on stratified databases. The Journal of logic programming , 11(3-4):295–344, 1991

  40. [40]

    Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Richard Draves and Robbert van Renesse, editors, 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008, December 8-10, 2008, San Diego, California, USA, Proceedings , pages 209–224....

  41. [41]

    Semgrep*: Improving the limited performance of static application security testing (SAST) tools

    Gareth Bennett, Tracy Hall, Emily Winter, and Steve Counsell. Semgrep*: Improving the limited performance of static application security testing (SAST) tools. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, EASE 2024, Salerno, Italy, June 18-21, 2024 , pages 614–623. ACM, 2024. doi: 10.1145/3661167...

  42. [42]

    Clang-Tidy

    Clang-Tidy. Clang-Tidy. https://clang.llvm.org/extra/clang-tidy/, 2024. [Online; accessed 3-Dec-2024]

  43. [43]

    Elnar Hajiyev, Mathieu Verbaere, and Oege de Moor. codeQuest: scalable source code queries with datalog. In Dave Thomas, editor, ECOOP 2006 - Object-Oriented Programming, 20th European Conference, Nantes, France, July 3-7, 2006, Proceedings , volume 4067 of Lecture Notes in Computer Science , pages 2–27. Springer, 2006. doi: 10.1007/11785477_2. URL https:...

  44. [44]

    DIFFBASE: a differential factbase for effective software evolution management

    Xiuheng Wu, Chenguang Zhu, and Yi Li. DIFFBASE: a differential factbase for effective software evolution management. In Diomidis Spinellis, Georgios Gousios, Marsha Chechik, and Massimiliano Di Penta, editors, ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, Aug...

  45. [45]

    Modeling and discovering vulnerabilities with code property graphs

    Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. Modeling and discovering vulnerabilities with code property graphs. In 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18-21, 2014 , pages 590–604. IEEE Computer Society, 2014. doi: 10.1109/SP.2014.44. URL https://doi.org/10.1109/SP.2014.44

  46. [46]

    Using datalog for fast and easy program analysis

    Yannis Smaragdakis and Martin Bravenboer. Using datalog for fast and easy program analysis. In Oege de Moor, Georg Gottlob, Tim Furche, and Andrew Jon Sellers, editors, Datalog Reloaded - First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Revised Selected Papers , volume 6702 of Lecture Notes in Computer Science , pages 245–251. Sp...

  47. [47]

    Strictly declarative specification of sophisticated points-to analyses

    Martin Bravenboer and Yannis Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In Shail Arora and Gary T. Leavens, editors, Proceedings of the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2009, October 25-29, 2009, Orlando, Florida, USA , pages 243–262. A...

  48. [48]

    Soufflé: On synthesis of program analyzers

    Herbert Jordan, Bernhard Scholz, and Pavle Subotic. Soufflé: On synthesis of program analyzers. In Swarat Chaudhuri and Azadeh Farzan, editors, Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II, volume 9780 of Lecture Notes in Computer Science , pages 422–430. Springer, 2016...

  49. [49]

    Nyx: Detecting exploitable front-running vulnerabilities in smart contracts

    Wuqi Zhang, Zhuo Zhang, Qingkai Shi, Lu Liu, Lili Wei, Yepang Liu, Xiangyu Zhang, and Shing-Chi Cheung. Nyx: Detecting exploitable front-running vulnerabilities in smart contracts. In 2024 IEEE Symposium on Security and Privacy (SP) , pages 146–146. IEEE Computer Society, 2024

  50. [50]

    Practical security analysis of zero-knowledge proof circuits

    Hongbo Wen, Jon Stephens, Yanju Chen, Kostas Ferles, Shankara Pailoor, Kyle Charbonnet, Isil Dillig, and Yu Feng. Practical security analysis of zero-knowledge proof circuits. In Davide Balzarotti and Wenyuan Xu, editors, 33rd USENIX Security Symposium, USENIX Security 2024, Philadelphia, PA, USA, August 14-16, 2024 . USENIX Association, 2024. URL https...

  51. [51]

    ARBITRAR: user-guided API misuse detection

    Ziyang Li, Aravind Machiry, Binghong Chen, Mayur Naik, Ke Wang, and Le Song. ARBITRAR: user-guided API misuse detection. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021 , pages 1400–1415. IEEE, 2021. doi: 10.1109/SP40001.2021.00090. URL https://doi.org/10.1109/SP40001.2021.00090

  52. [52]

    Dataflow analysis-inspired deep learning for efficient vulnerability detection

    Benjamin Steenhoek, Hongyang Gao, and Wei Le. Dataflow analysis-inspired deep learning for efficient vulnerability detection. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024 , pages 16:1–16:13. ACM, 2024. doi: 10.1145/3597503.3623345. URL https://doi.org/10.1145/3597503.3623345

  53. [53]

    VulBERTa: Simplified source code pre-training for vulnerability detection

    Hazim Hanif and Sergio Maffeis. VulBERTa: Simplified source code pre-training for vulnerability detection. In International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022 , pages 1–8. IEEE, 2022. doi: 10.1109/IJCNN55064.2022.9892280. URL https://doi.org/10.1109/IJCNN55064.2022.9892280

  54. [54]

    Hoppity: Learning graph transformations to detect and fix bugs in programs

    Elizabeth Dinella, Hanjun Dai, Ziyang Li, Mayur Naik, Le Song, and Ke Wang. Hoppity: Learning graph transformations to detect and fix bugs in programs. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020 . OpenReview.net, 2020. URL https://openreview.net/forum?id=SJeqs6EFvB

  55. [55]

    Evaluating the effectiveness of deep learning models for foundational program analysis tasks

    Qian Chen, Chenyang Yu, Ruyan Liu, Chi Zhang, Yu Wang, Ke Wang, Ting Su, and Linzhang Wang. Evaluating the effectiveness of deep learning models for foundational program analysis tasks. Proc. ACM Program. Lang., 8(OOPSLA1):500–528, 2024. doi: 10.1145/3649829. URL https://doi.org/10.1145/3649829

  56. [56]

    (Partial) program dependence learning

    Aashish Yadavally, Tien N. Nguyen, Wenbo Wang, and Shaohua Wang. (Partial) program dependence learning. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023 , pages 2501–2513. IEEE, 2023. doi: 10.1109/ICSE48619.2023.00209. URL https://doi.org/10.1109/ICSE48619.2023.00209

  57. [57]

    Enhancing static analysis for practical bug detection: An LLM-integrated approach

    Haonan Li, Yu Hao, Yizhuo Zhai, and Zhiyun Qian. Enhancing static analysis for practical bug detection: An LLM-integrated approach. Proc. ACM Program. Lang., 8(OOPSLA1):474–499, 2024. doi: 10.1145/3649828. URL https://doi.org/10.1145/3649828

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: December 2024