pith. machine review for the scientific record.

arxiv: 2604.24354 · v1 · submitted 2026-04-27 · 💻 cs.LO

Recognition: unknown

Understanding and Improving Automated Proof Synthesis for Interactive Theorem Provers

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 17:46 UTC · model grok-4.3

classification 💻 cs.LO
keywords automated theorem proving · proof synthesis · interactive theorem provers · tactic search · human expert patterns · formal verification · deep learning

The pith

Guiding proof search with human expert tactic patterns lets tools prove 8 percent more theorems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies theorems that current deep-learning proof synthesizers still cannot prove in interactive theorem provers. Analysis of proof scripts and search traces shows that successful proofs tend to follow sequences of tactics that human experts use. The authors therefore introduce a lightweight addition called pattern-guided tactic search that steers existing synthesizers toward those human-like choices during search. Experiments on standard benchmarks show the change raises the total number of proved theorems by an average of 8.05 percent and newly proves 20 percent more theorems that had remained out of reach. The same guidance also yields shorter proof scripts and works better on complex statements.

Core claim

Tactic selections that conform to human expert proof patterns succeed more often than those chosen by purely statistical search. By adding a heuristic that prefers such patterns, the pattern-guided tactic search method improves existing proof synthesis tools without retraining their models.

What carries the argument

Pattern-guided tactic search (PGTS), a heuristic that reorders or filters candidate tactics during search to favor sequences matching observed human expert patterns from proof corpora.
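As a rough illustration of this mechanism, the sketch below reranks model-proposed candidate tactics by adding a bonus when a tactic continues a human-expert sequence. All names here (`PATTERN_BONUS`, the specific tactic pairs, the bonus weights) are hypothetical stand-ins, not details taken from the paper.

```python
# Illustrative sketch of pattern-guided tactic reordering. The pattern
# table and weights are invented for this example; a real system would
# mine them from proof corpora.
PATTERN_BONUS = {
    ("intros", "induction"): 0.5,
    ("induction", "simpl"): 0.4,
    ("destruct", "auto"): 0.3,
}

def reorder(candidates, history):
    """Rerank (tactic, model_score) pairs so tactics that continue a
    known human-expert pattern are tried first during search."""
    prev = history[-1] if history else None

    def guided_score(item):
        tactic, model_score = item
        # Add a bonus when (previous tactic, candidate) matches a pattern.
        return model_score + PATTERN_BONUS.get((prev, tactic), 0.0)

    return sorted(candidates, key=guided_score, reverse=True)

# After `intros`, the model slightly prefers `auto`, but the pattern bonus
# promotes `induction`, matching the expert sequence.
ranked = reorder([("auto", 0.45), ("induction", 0.40)], ["intros"])
```

Because the guidance only reorders candidates already proposed by the underlying model, it can sit on top of an existing synthesizer without retraining, which is what makes the approach lightweight.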

If this is right

  • Existing proof synthesizers can be improved by adding the guidance layer without retraining their neural models.
  • Roughly 20 percent more previously unprovable theorems become reachable.
  • Generated proofs for complex theorems become shorter and more readable.
  • The approach scales to harder statements where pure search currently fails.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The finding implies that explicit structural knowledge from human proofs can usefully complement statistical learning in automated reasoning.
  • Similar pattern guidance might improve other search-based tasks such as program synthesis or hardware verification.
  • If the human patterns prove stable across corpora, future tools could learn or mine them automatically instead of using a fixed heuristic.
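A minimal sketch of what such automatic pattern mining could look like: counting adjacent tactic pairs across a corpus of proof scripts and keeping the frequent ones. The corpus, the `min_support` threshold, and the bigram granularity are assumptions for illustration; the paper's related work cites sequential-pattern miners such as PrefixSpan for the general task.

```python
# Hypothetical sketch: mine frequent tactic bigrams from proof scripts.
from collections import Counter

def mine_bigrams(proof_scripts, min_support=2):
    """Count adjacent tactic pairs across scripts and keep those that
    appear at least `min_support` times."""
    counts = Counter()
    for script in proof_scripts:
        counts.update(zip(script, script[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Toy corpus of tactic sequences (invented for this example).
corpus = [
    ["intros", "induction", "simpl", "auto"],
    ["intros", "induction", "reflexivity"],
    ["destruct", "simpl", "auto"],
]
patterns = mine_bigrams(corpus)
# ("intros", "induction") occurs in two scripts, so it survives the cut.
```

Patterns mined this way could replace a fixed hand-written heuristic, at the cost of depending on the corpus being representative.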

Load-bearing premise

The measured gains come specifically from directing search toward human expert patterns rather than from other implementation details or the choice of test theorems.

What would settle it

Applying the same PGTS heuristic to a fresh set of theorems or a different base proof synthesizer and finding no average increase (or a decrease) in proved theorems would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.24354 by Bingxu Xiao, Lingru Zhou, Manqing Zhang, Yepang Liu, Yunwei Dong.

Figure 2
Figure 2: An example of a proof script in the Verdi project. view at source ↗
Figure 3
Figure 3: An example workflow of step-by-step proof search and tactic selection. view at source ↗
Figure 5
Figure 5: Relationship between the number of logical-mathematical symbols in theorems and proof success rate. view at source ↗
Figure 4
Figure 4: The proportion of theorem types proven by proof synthesis tools. view at source ↗
Figure 6
Figure 6: Proof success rates of theorems involving different types of logical-mathematical symbols. view at source ↗
Figure 8
Figure 8: Proof success rates associated with different human expert proof patterns observed in the proof scripts. view at source ↗
Figure 7
Figure 7: Proof success rates with and without the introduction of auxiliary lemmas. view at source ↗
Figure 9
Figure 9: Relationship between proof search depth and number of tactic attempts for successful and failed theorems. view at source ↗
Figure 10
Figure 10: Correlation of human expert proof pattern alignment with tactic effectiveness in proof search. view at source ↗
Figure 11
Figure 11: Number of theorems proved by the studied baselines under the original depth-first search (DFS), the proposed Pattern-Guided Tactic Search (PGTS), and a variant, RPGTS. view at source ↗
Figure 13
Figure 13: Average reduction in proof steps by four proof synthesis tools using different proof search methods, compared to human-written proofs. view at source ↗
Figure 12
Figure 12: Heatmap of average proof length generated by different proof search strategies across four foundational proof synthesis tools. view at source ↗
Figure 14
Figure 14: Average number of tactic attempts per proof step under different search strategies. view at source ↗
read the original abstract

Formal verification using interactive theorem provers ensures high-quality software. However, writing proof scripts for interactive theorem provers is labor-intensive and requires deep expertise. Recent studies have leveraged deep learning to automate theorem proving by learning from manually written proof corpora. Nevertheless, these techniques still achieve limited success rates in proof synthesis. This paper investigates the theorems that current proof synthesis techniques are unable to prove and analyzes their characteristics. Through an in-depth analysis of the proven theorems, proof scripts, and the proof search process, we identify the limitations of existing tools and summarize the key factors influencing proof success rates. Our research provides valuable insights for the future optimization of automated proof tools. Based on our empirical study, we discover that tactic selections conforming to human expert proof patterns are more likely to lead to successful proofs. Inspired by this finding, we propose a pattern-guided tactic search (PGTS) method to heuristically enhance the search process of existing proof synthesis tools. Our evaluation experiments demonstrate that PGTS improves the number of theorems proved by existing proof synthesis tools by an average of 8.05%, while also achieving an average 20% increase in previously unproven theorems. Furthermore, PGTS enhances the capability of proof synthesis tools to prove complex theorems and generates more concise proof scripts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes limitations of existing deep learning-based proof synthesis tools for interactive theorem provers by examining unproven theorems, proof scripts, and search processes. It identifies that tactic selections conforming to human expert proof patterns are more likely to succeed, proposes a Pattern-Guided Tactic Search (PGTS) heuristic to enhance existing tools, and reports that PGTS yields an average 8.05% increase in proved theorems plus a 20% increase in previously unproven theorems, while also improving performance on complex theorems and producing more concise proofs.

Significance. If the empirical claims hold under detailed scrutiny, the work's analysis of proof patterns offers a useful lens for tool improvement in automated formal verification. The identification of human-pattern alignment as a success factor could guide future heuristics even if the specific PGTS gains require further validation.

major comments (2)
  1. [Abstract and Evaluation section] The central quantitative claims (an average 8.05% improvement in theorems proved and 20% more previously unproven theorems) are stated without any description of the underlying dataset(s), the specific interactive theorem prover, the baseline proof synthesis tools, the total number of theorems, the number of experimental runs, or any statistical tests. This absence prevents verification of the reported gains and their attribution to PGTS.
  2. [Section on PGTS proposal and evaluation] The claim that gains arise specifically from guiding toward human expert proof patterns is not supported by an ablation that disables only the pattern heuristic while holding search budget, infrastructure, and other implementation details fixed. Without such isolation, the improvements cannot be confidently attributed to the identified patterns rather than to incidental changes in the overall procedure.
minor comments (2)
  1. [Introduction] The manuscript should define 'human expert proof patterns' and 'pattern-guided tactic search' with concrete examples in the early sections to improve accessibility.
  2. [Evaluation] Any tables or figures presenting success rates should include standard deviations or confidence intervals and clearly label all baselines and metrics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will incorporate revisions to improve clarity and rigor.

read point-by-point responses
  1. Referee: [Abstract and Evaluation section] The central quantitative claims (an average 8.05% improvement in theorems proved and 20% more previously unproven theorems) are stated without any description of the underlying dataset(s), the specific interactive theorem prover, the baseline proof synthesis tools, the total number of theorems, the number of experimental runs, or any statistical tests. This absence prevents verification of the reported gains and their attribution to PGTS.

    Authors: The Evaluation section of the manuscript provides the dataset details, the interactive theorem prover, baseline tools, theorem counts, run numbers, and experimental protocol. However, we agree the abstract should be self-contained. We will revise the abstract to briefly state the experimental setup, including the prover, dataset size, baselines, number of theorems, runs performed, and any statistical tests. This will facilitate verification without altering the reported results. revision: yes

  2. Referee: [Section on PGTS proposal and evaluation] The claim that gains arise specifically from guiding toward human expert proof patterns is not supported by an ablation that disables only the pattern heuristic while holding search budget, infrastructure, and other implementation details fixed. Without such isolation, the improvements cannot be confidently attributed to the identified patterns rather than to incidental changes in the overall procedure.

    Authors: We acknowledge that the current evaluation compares PGTS-augmented tools against unmodified baselines but does not isolate the pattern heuristic via a controlled ablation that holds search budget, infrastructure, and all other factors fixed. To strengthen attribution of gains specifically to the human-pattern guidance, we will add such an ablation study to the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical analysis and heuristic evaluation

full rationale

The paper conducts an independent empirical analysis of proof scripts, search processes, and unproven theorems to identify human-expert pattern correlations, then implements PGTS as a heuristic inspired by that observation and measures success-rate gains directly against baselines. No equations, self-definitions, fitted parameters presented as predictions, or load-bearing self-citations appear in the derivation; the reported 8.05% and 20% improvements are external experimental outcomes, not reductions to the input analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical study on machine learning for theorem proving; the abstract introduces no new mathematical axioms, free parameters, or invented entities.

pith-pipeline@v0.9.0 · 5529 in / 1012 out tokens · 50302 ms · 2026-05-07T17:46:17.967942+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1]

    Reliability in software-intensive systems: challenges, solutions, and future perspectives

    Francisco Henrique Ferreira, Elisa Yumi Nakagawa, and Rodrigo Pereira dos Santos. Reliability in software-intensive systems: challenges, solutions, and future perspectives. In 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pages 54–61. IEEE, 2021

  2. [2]

    Towards an understanding of reliability of software-intensive systems-of-systems

    Francisco Henrique Cerdeira Ferreira, Elisa Yumi Nakagawa, and Rodrigo Pereira dos Santos. Towards an understanding of reliability of software-intensive systems-of-systems. Information and Software Technology, 158:107186, 2023

  3. [3]

    Software reliability engineering: A roadmap

    Michael R Lyu. Software reliability engineering: A roadmap. In Future of Software Engineering (FOSE’07), pages 153–170. IEEE, 2007

  4. [4]

    Software reliability assessment with OR applications, volume 364

    PK Kapur, Hoang Pham, Anshu Gupta, PC Jha, et al. Software reliability assessment with OR applications, volume 364. Springer, 2011

  5. [5]

    seL4: Formal verification of an OS kernel

    Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, et al. seL4: Formal verification of an OS kernel. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pages 207–220, 2009

  6. [6]

    Formal verification of a realistic compiler

    Xavier Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107–115, 2009

  7. [7]

    Productivity for proof engineering

    Mark Staples, Ross Jeffery, June Andronick, Toby Murray, Gerwin Klein, and Rafal Kolanski. Productivity for proof engineering. In Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pages 1–4, 2014

  8. [8]

    A survey of formal verification approaches for practical systems

    Qiao Zhang, Danyang Zhuo, and James Wilcox. A survey of formal verification approaches for practical systems

  9. [9]

    CompCert - a formally verified optimizing compiler

    Xavier Leroy, Sandrine Blazy, Daniel Kästner, Bernhard Schommer, Markus Pister, and Christian Ferdinand. CompCert - a formally verified optimizing compiler. In ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, 2016

  10. [10]

    Formal verification of a constant-time preserving C compiler

    Gilles Barthe, Sandrine Blazy, Benjamin Grégoire, Rémi Hutin, Vincent Laporte, David Pichardie, and Alix Trieu. Formal verification of a constant-time preserving C compiler. Proceedings of the ACM on Programming Languages, 4(POPL):1–30, 2019

  11. [11]

    Replication package for “Understanding and improving automated proof synthesis for interactive theorem provers”

    Manqing Zhang, Lingru Zhou, Bingxu Xiao, Yunwei Dong, and Yepang Liu. Replication package for “Understanding and improving automated proof synthesis for interactive theorem provers”. https://github.com/zmqgeek/PGTS, 2025

  12. [12]

    Learning to prove theorems via interacting with proof assistants

    Kaiyu Yang and Jia Deng. Learning to prove theorems via interacting with proof assistants. In International Conference on Machine Learning, pages 6984–6994. PMLR, 2019

  13. [13]

    TacTok: Semantics-aware proof synthesis

    Emily First, Yuriy Brun, and Arjun Guha. TacTok: Semantics-aware proof synthesis. Proceedings of the ACM on Programming Languages, 4(OOPSLA):1–31, 2020

  14. [14]

    Passport: Improving automated formal verification using identifiers

    Alex Sanchez-Stern, Emily First, Timothy Zhou, Zhanna Kaufman, Yuriy Brun, and Talia Ringer. Passport: Improving automated formal verification using identifiers. ACM Transactions on Programming Languages and Systems, 45(2):1–30, 2023

  15. [15]

    Generating correctness proofs with neural networks

    Alex Sanchez-Stern, Yousef Alhessi, Lawrence Saul, and Sorin Lerner. Generating correctness proofs with neural networks. In Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pages 1–10, 2020

  16. [16]

    Proof automation with large language models

    Minghai Lu, Benjamin Delaware, and Tianyi Zhang. Proof automation with large language models. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 1509–1520, 2024

  17. [17]

    PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth

    Jiawei Han, Jian Pei, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, and Meichun Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th International Conference on Data Engineering, pages 215–224. IEEE Piscataway, NJ, USA, 2001

  18. [18]

    Mining sequential patterns by pattern-growth: The PrefixSpan approach

    Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal, and Mei-Chun Hsu. Mining sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11):1424–1440, 2004

  19. [19]

    Mining comparative sentences from social media text

    Fabíola S. F. Pereira. Mining comparative sentences from social media text. In Workshop on Interactions between Data Mining and Natural Language Processing (DMNLP), co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), pages 41–48, 2015

  20. [20]

    Diversity-driven automated formal verification

    Emily First and Yuriy Brun. Diversity-driven automated formal verification. In Proceedings of the 44th International Conference on Software Engineering, pages 749–761, 2022

  21. [21]

    Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

    Saketh Ram Kasibatla, Arpan Agarwal, Yuriy Brun, Sorin Lerner, Talia Ringer, and Emily First. Cobblestone: Iterative automation for formal verification.arXiv preprint arXiv:2410.19940, 2024

  22. [22]

    Thor: Wielding hammers to integrate language models and automated theorem provers

    Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, and Mateja Jamnik. Thor: Wielding hammers to integrate language models and automated theorem provers. Advances in Neural Information Processing Systems, 35:8360–8373, 2022

  23. [23]

    Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers

    Lawrence C Paulson and Jasmin C Blanchette. Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In Proceedings of the 8th International Workshop on the Implementation of Logics (IWIL-2010), Yogyakarta, Indonesia. EPiC, volume 2, 2012

  24. [24]

    Learning-assisted automated reasoning with Flyspeck

    Cezary Kaliszyk and Josef Urban. Learning-assisted automated reasoning with Flyspeck. Journal of Automated Reasoning, 53:173–213, 2014

  25. [25]

    Hammer for Coq: Automation for dependent type theory

    Łukasz Czajka and Cezary Kaliszyk. Hammer for Coq: Automation for dependent type theory. Journal of Automated Reasoning, 61:423–453, 2018

  26. [26]

    Baldur: Whole-proof generation and repair with large language models

    Emily First, Markus N Rabe, Talia Ringer, and Yuriy Brun. Baldur: Whole-proof generation and repair with large language models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1229–1241, 2023

  27. [27]

    LeanDojo: Theorem proving with retrieval-augmented language models

    Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. LeanDojo: Theorem proving with retrieval-augmented language models. Advances in Neural Information Processing Systems, 36:21573–21612, 2023

  28. [28]

    Zhongxin Liu, Xin Xia, Ahmed E Hassan, David Lo, Zhenchang Xing, and Xinyu Wang. Neural-machine-translation-based commit message generation: how far are we? In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pages 373–384, 2018

  29. [29]

    Data quality matters: A case study on data label correctness for security bug report prediction

    Xiaoxue Wu, Wei Zheng, Xin Xia, and David Lo. Data quality matters: A case study on data label correctness for security bug report prediction. IEEE Transactions on Software Engineering, 48(7):2541–2556, 2021

  30. [30]

    Deep learning based vulnerability detection: Are we there yet?

    Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, and Baishakhi Ray. Deep learning based vulnerability detection: Are we there yet? IEEE Transactions on Software Engineering, 48(9):3280–3296, 2021

  31. [31]

    What makes a good commit message?

    Yingchen Tian, Yuxia Zhang, Klaas-Jan Stol, Lin Jiang, and Hui Liu. What makes a good commit message? In Proceedings of the 44th International Conference on Software Engineering, pages 2389–2401, 2022

  32. [32]

    Revisiting, benchmarking and exploring API recommendation: How far are we?

    Yun Peng, Shuqing Li, Wenwei Gu, Yichen Li, Wenxuan Wang, Cuiyun Gao, and Michael R Lyu. Revisiting, benchmarking and exploring API recommendation: How far are we? IEEE Transactions on Software Engineering, 49(4):1876–1897, 2022

  33. [33]

    Revisiting learning-based commit message generation

    Jinhao Dong, Yiling Lou, Dan Hao, and Lin Tan. Revisiting learning-based commit message generation. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 794–805. IEEE, 2023

  34. [34]

    An empirical study of deep learning models for vulnerability detection

    Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, and Wei Le. An empirical study of deep learning models for vulnerability detection. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 2237–2248. IEEE, 2023

  35. [35]

    Automatic commit message generation: A critical review and directions for future work

    Yuxia Zhang, Zhiqing Qiu, Klaas-Jan Stol, Wenhui Zhu, Jiaxin Zhu, Yingchen Tian, and Hui Liu. Automatic commit message generation: A critical review and directions for future work. IEEE Transactions on Software Engineering, 50(4):816–835, 2024

  36. [36]

    Learning-based models for vulnerability detection: An extensive study

    Chao Ni, Liyu Shen, Xiaodan Xu, Xin Yin, and Shaohua Wang. Learning-based models for vulnerability detection: An extensive study. arXiv preprint arXiv:2408.07526, 2024