ClassInvGen: Class Invariant Synthesis using Large Language Models

Chuyue Sun; Clark Barrett; David Dill; Jubi Taneja; Saikat Chakraborty; Shuvendu K. Lahiri; Viraj Agashe; Xiaokang Qiu

arxiv: 2502.18917 · v2 · pith:ILVUCNJQnew · submitted 2025-02-26 · 💻 cs.AI · cs.PL· cs.SE

ClassInvGen: Class Invariant Synthesis using Large Language Models

Chuyue Sun , Viraj Agashe , Saikat Chakraborty , Jubi Taneja , Clark Barrett , David Dill , Xiaokang Qiu , Shuvendu K. Lahiri This is my paper

Pith reviewed 2026-05-23 02:40 UTC · model grok-4.3

classification 💻 cs.AI cs.PLcs.SE

keywords class invariant synthesislarge language modelsC++ specificationsinvariant inferenceexecutable invariantsprogram verificationDaikon comparisonspecification generation

0 comments

The pith

ClassInvGen uses LLMs to co-generate executable C++ class invariants and tests that outperform direct prompting and Daikon.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ClassInvGen to synthesize class invariants for C++ by prompting large language models to produce both the invariant as a pure function and matching test inputs. This joint generation aims to yield invariants that are correct when executed and complete enough to distinguish program behaviors. On a contributed benchmark of standard C++ data structures the method produces invariants that pass more tests and kill more mutants than baselines. The same approach is shown to scale to classes in a high-integrity production codebase. If the claim holds, developers gain a practical way to obtain enforceable specifications without manual annotation or limited dynamic analysis.

Core claim

ClassInvGen is a method for co-generating executable class invariants and test inputs to produce high-quality class invariants for C++, leveraging LLMs' ability to synthesize pure functions. It outperforms a pure LLM-based technique to generate specifications from code as well as prior data-driven invariant inference techniques such as Daikon on a benchmark of standard C++ data structures, and demonstrates applicability through a case study on classes in a widely used high-integrity C++ codebase.

What carries the argument

ClassInvGen, the co-generation procedure that prompts an LLM to emit both a pure-function class invariant and test inputs used to validate it.

If this is right

Generated invariants can be compiled and checked at runtime or fed to static verifiers when the language supports it.
The new benchmark and mutant harness supply a concrete way to quantify both correctness and completeness of any invariant generator for C++.
The case-study results indicate the method can be applied directly to classes inside existing high-integrity codebases without language changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same co-generation pattern could be tried on other object-oriented languages that support pure functions for invariants.
Test inputs produced alongside invariants might serve as an automatic feedback loop to refine the LLM output in subsequent prompts.
If the invariants prove stable across versions, they could be stored as living documentation that is mechanically checked on every build.

Load-bearing premise

Large language models can reliably output correct, executable pure functions that serve as class invariants in C++ and exceed the quality of both direct prompting and dynamic tools like Daikon without systematic logical errors.

What would settle it

On the provided C++ data-structure benchmark, if the invariants produced by ClassInvGen do not pass a higher fraction of tests or kill a higher fraction of mutants than those from pure LLM prompting or from Daikon, the performance claim is false.

Figures

Figures reproduced from arXiv: 2502.18917 by Chuyue Sun, Clark Barrett, David Dill, Jubi Taneja, Saikat Chakraborty, Shuvendu K. Lahiri, Viraj Agashe, Xiaokang Qiu.

**Figure 2.** Figure 2: Overview of ClassInvGen. An overview of the ClassInvGen framework is shown in [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Header file of AvlTree. The algorithm then generates invariants using generateInvariantWithLLM and stores them in invariants_dict (Line 12). The helper function getCodeForClass constructs class representations by combining the declaration text with any existing invariants from invariants_dict (Lines 15–18). The algorithm concludes by returning final_invariants for the target class, effectively managing the… view at source ↗

**Figure 4.** Figure 4: Evaluation of ClassInvGen generated invariants ClassInvGen produces filtered invariants, which we evaluate using an automated pipeline ( [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Completeness Experiment Result. The 3 bars from left to right are Tests, [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

**Figure 6.** Figure 6: Mutant that survived unit tests but was killed by ClassInvGen [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

**Figure 7.** Figure 7: Another Mutant that survived unit tests but was killed by ClassInvGen [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Daikon vs. ClassInvGen Kills KLEE [6], Dafny [26] and Frama-C [20]. We selected the Z3 codebase due to its stringent correctness requirements; as an SMT solver, Z3 is employed in applications demanding high reliability. This high-stakes environment makes Z3 an ideal testbed for assessing the effectiveness of synthesized invariants. The bdd_manager class5 is particularly noteworthy. It was chosen because i… view at source ↗

**Figure 9.** Figure 9: Z3 developer-written class invariants for [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗

**Figure 10.** Figure 10: Correct and useful invariant for bdd_manager class 1 // Cache consistency: Entries in the operation cache should be valid 2 for (const auto* e : m_op_cache ) { 3 assert(e != nullptr); 4 assert(e->m_result != null_bdd); 5 } [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗

**Figure 12.** Figure 12: Correct and useless invariant for bdd_manager class 1 // The number of nodes should not exceed the maximum number of BDD nodes 2 assert(m_nodes.size () <= m_max_num_bdd_nodes ); [PITH_FULL_IMAGE:figures/full_fig_p017_12.png] view at source ↗

**Figure 14.** Figure 14: ClassInvGen Generation system prompt: instruction and task descrip [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗

**Figure 15.** Figure 15: ClassInvGen Generation system prompt: input-output format. [PITH_FULL_IMAGE:figures/full_fig_p025_15.png] view at source ↗

**Figure 17.** Figure 17: ClassInvGen Refinement system prompt: instruction and task descrip [PITH_FULL_IMAGE:figures/full_fig_p026_17.png] view at source ↗

**Figure 18.** Figure 18: ClassInvGen Refinement user prompt template. [PITH_FULL_IMAGE:figures/full_fig_p027_18.png] view at source ↗

**Figure 19.** Figure 19: AvlTree instrumentation and invariant example [PITH_FULL_IMAGE:figures/full_fig_p033_19.png] view at source ↗

**Figure 20.** Figure 20: Example of an incorrect invariant of AvlTree because there is no [PITH_FULL_IMAGE:figures/full_fig_p033_20.png] view at source ↗

**Figure 21.** Figure 21: A test suite generated for AvlTree 1 avl_tree.cpp: In lambda function: 2 avl_tree.cpp :21:16: error: use of 3 ’is_balanced’ before deduction of ’ auto’ (a) gcc compiler error messages 1 void AvlTree :: check_invariant () { 2 auto is_bst = [&](const std :: unique_ptr <Node >& node , 3 const T& min , const T& max) -> bool { 4 if (! node) return true; 5 if (node ->data <= min || 6 node ->data >= max) return … view at source ↗

**Figure 23.** Figure 23: BST property after refinement: auto is changed to explicit declarations [PITH_FULL_IMAGE:figures/full_fig_p034_23.png] view at source ↗

read the original abstract

Formal program specifications in the form of preconditions, postconditions, and class invariants have several benefits for the construction and maintenance of programs. They not only aid in program understanding due to their unambiguous semantics but can also be enforced dynamically (or even statically when the language supports a formal verifier). However, synthesizing high-quality specifications in an underlying programming language is limited by the expressivity of the specifications or the need to express them in a declarative manner. Prior work has demonstrated the potential of large language models (LLMs) for synthesizing high-quality method pre/postconditions for Python and Java, but does not consider class invariants. In this work, we describe ClassInvGen, a method for co-generating executable class invariants and test inputs to produce high-quality class invariants for a mainstream language such as C++, leveraging LLMs' ability to synthesize pure functions. We show that ClassInvGen outperforms a pure LLM-based technique to generate specifications (from code) as well as prior data-driven invariant inference techniques such as Daikon. We contribute a benchmark of standard C++ data structures along with a harness that can help measure both the correctness and completeness of generated specifications using tests and mutants. We also demonstrate its applicability to real-world code by performing a case study on several classes within a widely used and high-integrity C++ codebase.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ClassInvGen gets LLMs to produce executable C++ class invariants by co-generating test inputs too, and it beats direct prompting plus Daikon on their new benchmark, but the shared harness is the part that needs checking.

read the letter

The core advance is taking the LLM approach that already worked for method pre- and postconditions and extending it to class invariants in C++, with the added step of producing test inputs at the same time. They also ship a benchmark of standard data structures and a harness that scores both correctness and completeness through tests and mutants, plus a case study on real high-integrity code. That combination is new relative to the Daikon-style and pure-LLM baselines they cite, and the executable pure-function form is a practical choice for C++ where declarative specs are harder to enforce. The benchmark and harness themselves are the most reusable pieces here; anyone working on invariant tools will probably grab those. The evaluation setup does carry the risk the stress-test flagged: when the same LLM run produces both the invariant and the tests that later judge it, the reported gains in completeness could partly reflect alignment between the two rather than independent strength. Mutant killing also tends to miss certain logical gaps that static checks or manual review would catch. The abstract gives no breakdown of how many invariants were hand-checked or how the mutant operators were chosen, so the outperformance numbers will need the full tables and any ablation on prompt or harness design to land cleanly. This paper is aimed at people building spec generators or LLM-assisted verification tools rather than pure theory audiences. The benchmark and the C++ focus make it worth a referee's time even if the evaluation needs tightening on independence of oracles.

Referee Report

2 major / 2 minor

Summary. The paper introduces ClassInvGen, an LLM-based method for co-generating executable class invariants (as pure C++ functions) and associated test inputs. It claims this approach outperforms both direct LLM prompting for specifications and prior dynamic invariant inference tools such as Daikon, evaluated on a contributed benchmark of standard C++ data structures using a harness that assesses correctness and completeness via tests and mutants. A case study applying the method to classes in a high-integrity real-world C++ codebase is also presented.

Significance. If the empirical claims hold under scrutiny, the work advances automated synthesis of class invariants for C++, supporting program understanding, dynamic enforcement, and verification in a mainstream language. The contributed benchmark and harness represent a concrete resource for the community. The emphasis on executable pure-function invariants and the real-world case study are positive elements.

major comments (2)

[Section 4] Section 4 (Evaluation harness): the co-generation of test inputs together with the invariants creates a potential circularity risk for the central outperformance claim. The manuscript must clarify whether test inputs are produced independently of the synthesized invariants (e.g., via a fixed oracle or exhaustive enumeration) or demonstrate that results remain stable when evaluated against externally supplied test suites.
[Section 4.2–4.3] Section 4.2–4.3 (Mutant analysis and results): mutant killing is used to assess completeness, yet the paper provides no explicit justification that the chosen mutation operators target the classes of faults typical for class invariants (e.g., missed state transitions or incorrect boundary conditions). Without this or supplementary manual validation counts, the superiority claim over Daikon rests on an incomplete fault model.

minor comments (2)

[Abstract] Abstract: the claim of outperformance is stated without any numerical results or effect sizes; including at least the headline correctness/completeness deltas would make the summary self-contained.
[Section 3] Notation: the distinction between “pure LLM-based technique” and ClassInvGen should be made explicit in the first paragraph of Section 3 so readers immediately see the precise difference in prompting strategy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and detailed review of our manuscript. We address each major comment point by point below.

read point-by-point responses

Referee: [Section 4] Section 4 (Evaluation harness): the co-generation of test inputs together with the invariants creates a potential circularity risk for the central outperformance claim. The manuscript must clarify whether test inputs are produced independently of the synthesized invariants (e.g., via a fixed oracle or exhaustive enumeration) or demonstrate that results remain stable when evaluated against externally supplied test suites.

Authors: We acknowledge the potential circularity concern arising from co-generating invariants and test inputs. The current approach has the LLM produce test inputs from the class interface and code to exercise the invariants, which may not be fully independent. In the revised manuscript, we will clarify the generation process to show that test inputs are derived from the class definition without reference to the specific invariant being synthesized. We will also add results evaluating the generated invariants against an externally supplied test suite (drawn from standard C++ data structure tests) to confirm that the outperformance claims remain stable. revision: yes
Referee: [Section 4.2–4.3] Section 4.2–4.3 (Mutant analysis and results): mutant killing is used to assess completeness, yet the paper provides no explicit justification that the chosen mutation operators target the classes of faults typical for class invariants (e.g., missed state transitions or incorrect boundary conditions). Without this or supplementary manual validation counts, the superiority claim over Daikon rests on an incomplete fault model.

Authors: We agree that an explicit justification for the mutation operators relative to typical class invariant faults is needed to strengthen the completeness evaluation. In the revised manuscript, we will expand the discussion in Sections 4.2–4.3 to explain how the chosen operators (including those for state transitions and boundary conditions) target common invariant faults. We will also include results from manual validation of a sample of mutants to further support the fault model and the comparison against Daikon. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method evaluated against external baselines

full rationale

The paper presents ClassInvGen as an LLM-based technique for synthesizing executable C++ class invariants, with evaluation against pure LLM prompting and Daikon on a contributed benchmark using tests and mutants. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the abstract or described claims. The central outperformance result is framed as an empirical comparison rather than a derivation that reduces to its own inputs by construction. The contributed harness is presented as an independent measurement tool, not a self-referential fit.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5794 in / 986 out tokens · 25509 ms · 2026-05-23T02:40:32.183198+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?
cs.LO 2025-11 conditional novelty 6.0

A neurosymbolic method using two LLM prompting frameworks generates provably correct inductive arguments for 84% of a set of mid-size open-source RTL hardware designs.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

In: Proceedings ofthe29thACMSIGPLAN-SIGACTSymposiumonPrinciplesofProgram- ming Languages

Ammons, G., Bodík, R., Larus, J.R.: Mining specifications. In: Proceedings ofthe29thACMSIGPLAN-SIGACTSymposiumonPrinciplesofProgram- ming Languages. p. 4–16. POPL ’02, Association for Computing Machin- ery, New York, NY, USA (2002). https://doi.org/10.1145/503272.503275, https://doi.org/10.1145/503272.503275

work page doi:10.1145/503272.503275 2002
[2]

In: Beyer, D., Cavalcanti, A

Boockmann, J.H., Lüttgen, G.: Comprehending object state via dynamic class invariant learning. In: Beyer, D., Cavalcanti, A. (eds.) Fundamental Approaches to Software Engineering. pp. 143–164. Springer Nature Switzer- land, Cham (2024)

work page 2024
[3]

https://doi.org/10.5281/zenodo.14061403,https:// doi.org/10.5281/zenodo.14061403

Brunsfeld, M., Qureshi, A., Hlynskyi, A., Thomson, P., ObserverOfTime, Vera, J., dundargoc, Turnbull, P., Clem, T., Creager, D., Helwer, A., Rix, R., Kavolis, D., van Antwerpen, H., Davis, M., Lillis, W., Ika, Yahyaabadi, A., Nguy˜ en, T.A., bfredl, Massicotte, M., Brunk, S., Clason, C., Hasabnis, N., Dong, M., Moelius, S., Kalt, S., Finer, S., Kolja: tre...

work page doi:10.5281/zenodo.14061403 2024
[4]

IEEE Transactions on ComputersC-35(8), 677–691 (1986)

Bryant, R.: Graph-based algorithms for boolean function manipu- lation. IEEE Transactions on ComputersC-35(8), 677–691 (1986). https://doi.org/10.1109/TC.1986.1676819

work page doi:10.1109/tc.1986.1676819 1986
[5]

Interna- tional journal on software tools for technology transfer7, 212–232 (2005)

Burdy, L., Cheon, Y., Cok, D.R., Ernst, M.D., Kiniry, J.R., Leavens, G.T., Leino, K.R.M., Poll, E.: An overview of jml tools and applications. Interna- tional journal on software tools for technology transfer7, 212–232 (2005)

work page 2005
[6]

In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Imple- mentation

Cadar, C., Dunbar, D., Engler, D.: Klee: unassisted and automatic genera- tion of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Imple- mentation. p. 209–224. OSDI’08, USENIX Association, USA (2008)

work page 2008
[7]

Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for staticanalysisofprogramsbyconstructionorapproximationoffixpoints.In: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages. pp. 238–252 (1977)

work page 1977
[8]

In: Proceedings of the 30th International Conference on Software Engineering

Csallner, C., Tillmann, N., Smaragdakis, Y.: Dysy: dynamic sym- bolic execution for invariant inference. In: Proceedings of the 30th International Conference on Software Engineering. p. 281–290. ICSE ’08, Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1368088.1368127,https://doi.org/10. 1145/1368088.1368127

work page doi:10.1145/1368088.1368127 2008
[9]

Dimitrios, A.: Algorithms and data structures (2023),https: //github.com/djeada/Algorithms-And-Data-Structures/tree/ master/src/collections_and_containers/cpp

work page 2023
[10]

Sun et al

Endres, M., Fakhoury, S., Chakraborty, S., Lahiri, S.K.: Can large language models transform natural language intent into formal 20 C. Sun et al. method postconditions? Proc. ACM Softw. Eng.1(FSE) (Jul 2024). https://doi.org/10.1145/3660791,https://doi.org/10.1145/3660791

work page doi:10.1145/3660791 2024
[11]

Science of Computer Programming69(1–3), 35–45 (Dec 2007)

Ernst,M.D.,Perkins,J.H.,Guo,P.J.,McCamant,S.,Pacheco,C.,Tschantz, M.S., Xiao, C.: The Daikon system for dynamic detection of likely invari- ants. Science of Computer Programming69(1–3), 35–45 (Dec 2007)

work page 2007
[12]

In: Static Analysis: 17th International Symposium, SAS 2010, Perpignan, France, September 14-16, 2010

Fähndrich, M.: Static verification for code contracts. In: Static Analysis: 17th International Symposium, SAS 2010, Perpignan, France, September 14-16, 2010. Proceedings 17. pp. 2–5. Springer (2010)

work page 2010
[13]

Retrieval-Augmented Generation for Large Language Models: A Survey

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., Wang, H.: Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[14]

In: Biere, A., Bloem, R

Garg, P., Löding, C., Madhusudan, P., Neider, D.: Ice: a robust frame- work for learning invariants. In: Biere, A., Bloem, R. (eds.) Computer Aided Verification. pp. 69–87. Springer International Publishing, Cham (2014)

work page 2014
[15]

Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program- mingLanguages.p.499–512.POPL’16,AssociationforComputingMachin- ery, New York, NY, USA (2016). https://doi.org/10.1145/2837614.2837664, https://do...

work page doi:10.1145/2837614.2837664 2016
[16]

Greiner, S., Bühlmann, N., Ohrndorf, M., Tsigkanos, C., Nierstrasz, O., Kehrer, T.: Automated generation of code contracts: Generative ai to the rescue? In: Proceedings of the 23rd ACM SIGPLAN International Con- ference on Generative Programming: Concepts and Experiences. pp. 1–14 (2024)

work page 2024
[17]

arXiv preprint (March 2019),https://www.microsoft.com/en-us/research/publication/ are-my-invariants-valid-a-learning-approach/

Hellendoorn, V.J., Devanbu, P.T., Polozov, A., Marron, M.: Are my invariants valid? a learning approach. arXiv preprint (March 2019),https://www.microsoft.com/en-us/research/publication/ are-my-invariants-valid-a-learning-approach/

work page 2019
[18]

In: Proceedings of the 31st ACM SIGPLAN-SIGACT sympo- sium on Principles of programming languages

Henzinger, T.A., Jhala, R., Majumdar, R., McMillan, K.L.: Abstractions from proofs. In: Proceedings of the 31st ACM SIGPLAN-SIGACT sympo- sium on Principles of programming languages. pp. 232–244 (2004)

work page 2004
[19]

Kamath, A., Senthilnathan, A., Chakraborty, S., Deligiannis, P., Lahiri, S.K., Lal, A., Rastogi, A., Roy, S., Sharma, R.: Finding inductive loop invariants using large language models (2023),https://arxiv.org/abs/ 2311.07948

work page arXiv 2023
[20]

Formal aspects of computing 27(3), 573–609 (2015)

Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-c: A software analysis perspective. Formal aspects of computing 27(3), 573–609 (2015)

work page 2015
[21]

In: Formal Methods in Computer-Aided Design (FM- CAD’24) (July 2024)

Lahiri, S.: Evaluating llm-driven user-intent formalization for verification- aware languages. In: Formal Methods in Computer-Aided Design (FM- CAD’24) (July 2024)

work page 2024
[22]

In: Bouajjani, A., Maler, O

Lahiri, S.K., Qadeer, S., Galeotti, J.P., Voung, J.W., Wies, T.: Intra-module inference. In: Bouajjani, A., Maler, O. (eds.) Computer Aided Verification. pp. 493–508. Springer Berlin Heidelberg, Berlin, Heidelberg (2009) ClassInvGen: Class Invariant Synthesis using Large Language Models 21

work page 2009
[23]

In: International Symposium on Code Generation and Optimization, 2004

Lattner, C., Adve, V.: Llvm: a compilation framework for lifelong pro- gram analysis & transformation. In: International Symposium on Code Generation and Optimization, 2004. CGO 2004. pp. 75–86 (2004). https://doi.org/10.1109/CGO.2004.1281665

work page doi:10.1109/cgo.2004.1281665 2004
[24]

Proceedings of the ACM on Programming Languages 7(OOPSLA1), 286–315 (2023)

Lattuada, A., Hance, T., Cho, C., Brun, M., Subasinghe, I., Zhou, Y., Howell, J., Parno, B., Hawblitzel, C.: Verus: Verifying rust programs using linear ghost types. Proceedings of the ACM on Programming Languages 7(OOPSLA1), 286–315 (2023)

work page 2023
[25]

In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Im- plementation

Le, T.C., Zheng, G., Nguyen, T.: Sling: using dynamic analysis to in- fer program invariants in separation logic. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Im- plementation. p. 788–801. PLDI 2019, Association for Computing Machin- ery, New York, NY, USA (2019). https://doi.org/10.1145/3314221.3314634, https://do...

work page doi:10.1145/3314221.3314634 2019
[26]

In: International conference on logic for programming artificial intelligence and reasoning

Leino, K.R.M.: Dafny: An automatic program verifier for functional cor- rectness. In: International conference on logic for programming artificial intelligence and reasoning. pp. 348–370. Springer (2010)

work page 2010
[27]

Nguyen, Wenbo Wang, and Shaohua Wang

Lemieux, C., Inala, J.P., Lahiri, S.K., Sen, S.: Codamosa: Es- caping coverage plateaus in test generation with pre-trained large language models. In: Proceedings of the 45th Interna- tional Conference on Software Engineering. p. 919–931. ICSE ’23, IEEE Press (2023). https://doi.org/10.1109/ICSE48619.2023.00085, https://doi.org/10.1109/ICSE48619.2023.00085

work page doi:10.1109/icse48619.2023.00085 2023
[28]

Transactions of the Association for Computational Linguistics12, 157–173 (2024)

Liu, N.F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., Liang, P.: Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics12, 157–173 (2024)

work page 2024
[29]

Lohmann, N., other contributors: mutate_cpp - mutation testing tool for c++.https://github.com/nlohmann/mutate_cpp(2017)

work page 2017
[30]

CoRR abs/2401.08807(2024)

Ma, L., Liu, S., Li, Y., Xie, X., Bu, L.: Specgen: Automated genera- tion of formal program specifications via large language models. CoRR abs/2401.08807(2024). https://doi.org/10.48550/ARXIV.2401.08807, https://doi.org/10.48550/arXiv.2401.08807

work page doi:10.48550/arxiv.2401.08807 2024
[31]

In: Jensen, K., Podel- ski, A

McMillan, K.L.: An interpolating theorem prover. In: Jensen, K., Podel- ski, A. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 16–30. Springer Berlin Heidelberg, Berlin, Heidelberg (2004)

work page 2004
[32]

See http://www

Meyer, B.: The eiffel programming language. See http://www. eiffel. com (1992)

work page 1992
[33]

In: Proceedings of the 44th International Conference on Software Engineering

Molina, F., d’Amorim, M., Aguirre, N.: Fuzzing class specifications. In: Proceedings of the 44th International Conference on Software Engineering. p. 1008–1020. ICSE ’22, Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3510003.3510120,https://doi. org/10.1145/3510003.3510120

work page doi:10.1145/3510003.3510120 2022
[34]

In: 2008 Tools and Algorithms for Construction and Analysis of Systems

de Moura, L., Bjørner, N.: Z3: an efficient smt solver. In: 2008 Tools and Algorithms for Construction and Analysis of Systems. pp. 337–340. 22 C. Sun et al. Springer, Berlin, Heidelberg (March 2008),https://www.microsoft.com/ en-us/research/publication/z3-an-efficient-smt-solver/

work page 2008
[35]

Newcomb, J.L., Bodik, R.: Using human-in-the-loop synthesis to author functional reactive programs (2019),https://arxiv.org/abs/1909.11206

work page arXiv 2019
[36]

In: 2012 34th Interna- tional Conference on Software Engineering (ICSE)

Nguyen, T., Kapur, D., Weimer, W., Forrest, S.: Using dynamic analy- sis to discover polynomial and array invariants. In: 2012 34th Interna- tional Conference on Software Engineering (ICSE). pp. 683–693 (2012). https://doi.org/10.1109/ICSE.2012.6227149

work page doi:10.1109/icse.2012.6227149 2012
[37]

In: Proceedings of the 37th ACM SIG- PLAN Conference on Programming Language Design and Implementa- tion, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016

Padhi, S., Sharma, R., Millstein, T.D.: Data-driven precondition infer- ence with learned features. In: Proceedings of the 37th ACM SIG- PLAN Conference on Programming Language Design and Implementa- tion, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016. pp. 42–56 (2016). https://doi.org/10.1145/2908080.2908099,http://doi.acm.org/ 10.1145/2908080.2908099

work page doi:10.1145/2908080.2908099 2016
[38]

(eds.) Proceedings of the 40th International Conference on Machine Learning

Pei, K., Bieber, D., Shi, K., Sutton, C., Yin, P.: Can large language mod- els reason about program invariants? In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 27496–27520. PMLR (23–29 Jul 20...

work page 2023
[39]

In: Proceedings of the eigh- teenth international symposium on Software testing and analysis

Polikarpova, N., Ciupa, I., Meyer, B.: A comparative study of programmer- written and automatically inferred contracts. In: Proceedings of the eigh- teenth international symposium on Software testing and analysis. pp. 93– 104 (2009)

work page 2009
[40]

Schäfer, M., Nadi, S., Eghbali, A., Tip, F.: An empirical evaluation of using large language models for automated unit test generation (2023),https: //arxiv.org/abs/2302.06527

work page arXiv 2023
[41]

ACM SIGPLAN Notices46(9), 266–278 (2011)

Swamy, N., Chen, J., Fournet, C., Strub, P.Y., Bhargavan, K., Yang, J.: Se- cure distributed programming with value-dependent types. ACM SIGPLAN Notices46(9), 266–278 (2011)

work page 2011
[42]

In: Gurfinkel, A., Ganesh, V

Wen, C., Cao, J., Su, J., Xu, Z., Qin, S., He, M., Li, H., Cheung, S.C., Tian, C.: Enchanting program specification synthesis by large language models us- ing static analysis and program verification. In: Gurfinkel, A., Ganesh, V. (eds.) Computer Aided Verification. pp. 302–328. Springer Nature Switzer- land, Cham (2024)

work page 2024
[43]

Wikipedia contributors: Avl tree — Wikipedia, the free encyclope- dia (2024),https://en.wikipedia.org/w/index.php?title=AVL_tree& oldid=1255561501, [Online; accessed 15-November-2024]

work page 2024
[44]

In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

Wu, G., Cao, W., Yao, Y., Wei, H., Chen, T., Ma, X.: Llm meets bounded model checking: Neuro-symbolic loop invariant inference. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. p. 406–417. ASE ’24, Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3691620.3695014, https...

work page doi:10.1145/3691620.3695014 2024
[45]

ClassInvGen: Class Invariant Synthesis using Large Language Models 23 Proceedings of the ACM on Programming Languages8(OOPSLA2), 709– 735 (2024)

Yang, C., Deng, Y., Lu, R., Yao, J., Liu, J., Jabbarvand, R., Zhang, L.: Whitefox:White-boxcompilerfuzzingempoweredbylargelanguagemodels. ClassInvGen: Class Invariant Synthesis using Large Language Models 23 Proceedings of the ACM on Programming Languages8(OOPSLA2), 709– 735 (2024)

work page 2024
[46]

REASONING :

z3Prover: z3prover util classes.https://github.com/Z3Prover/z3/tree/ master/src/util(2024) 24 C. Sun et al. A Prompt 1You are an expert in creating program invariants from code and natural language . 2Invariants are assertions on the variables in scope that hold true at different program points 3We are interested in finding invariants that hold at both st...

work page 2024
[50]

REASONING :

If you can decompose a single invariant into smaller ones , try to output multiple invariants . Fig.15: ClassInvGen Generation system prompt: input-output format. 1Name of Data Structure to Annotate : { struct } 2Code : 3‘‘‘ 4{ code } 5‘‘‘ Fig.16: ClassInvGen Generation user prompt template. 26 C. Sun et al. 1You are an expert in repairing program invaria...

work page
[51]

Follow the output format strictly , particularly enclosing each invariant in triple - ticks (‘‘‘), and enclosing the reasoning in $$$

work page
[52]

Only find object invariants for the target class provided to you , do not infer invariants for any other class

work page
[53]

Make sure the invariant is a statement in the same underlying programming language as the source program

work page
[54]

000102...6979899

If you can decompose a single invariant into smaller ones , try to output multiple invariants . Fig.17: ClassInvGen Refinement system prompt: instruction and task descrip- tion. ClassInvGen: Class Invariant Synthesis using Large Language Models 27 1Please fix the failed invariants given the feedback , tests and the original source code . 2 3Failed Invaria...

work page

[1] [1]

In: Proceedings ofthe29thACMSIGPLAN-SIGACTSymposiumonPrinciplesofProgram- ming Languages

Ammons, G., Bodík, R., Larus, J.R.: Mining specifications. In: Proceedings ofthe29thACMSIGPLAN-SIGACTSymposiumonPrinciplesofProgram- ming Languages. p. 4–16. POPL ’02, Association for Computing Machin- ery, New York, NY, USA (2002). https://doi.org/10.1145/503272.503275, https://doi.org/10.1145/503272.503275

work page doi:10.1145/503272.503275 2002

[2] [2]

In: Beyer, D., Cavalcanti, A

Boockmann, J.H., Lüttgen, G.: Comprehending object state via dynamic class invariant learning. In: Beyer, D., Cavalcanti, A. (eds.) Fundamental Approaches to Software Engineering. pp. 143–164. Springer Nature Switzer- land, Cham (2024)

work page 2024

[3] [3]

https://doi.org/10.5281/zenodo.14061403,https:// doi.org/10.5281/zenodo.14061403

Brunsfeld, M., Qureshi, A., Hlynskyi, A., Thomson, P., ObserverOfTime, Vera, J., dundargoc, Turnbull, P., Clem, T., Creager, D., Helwer, A., Rix, R., Kavolis, D., van Antwerpen, H., Davis, M., Lillis, W., Ika, Yahyaabadi, A., Nguy˜ en, T.A., bfredl, Massicotte, M., Brunk, S., Clason, C., Hasabnis, N., Dong, M., Moelius, S., Kalt, S., Finer, S., Kolja: tre...

work page doi:10.5281/zenodo.14061403 2024

[4] [4]

IEEE Transactions on ComputersC-35(8), 677–691 (1986)

Bryant, R.: Graph-based algorithms for boolean function manipu- lation. IEEE Transactions on ComputersC-35(8), 677–691 (1986). https://doi.org/10.1109/TC.1986.1676819

work page doi:10.1109/tc.1986.1676819 1986

[5] [5]

Interna- tional journal on software tools for technology transfer7, 212–232 (2005)

Burdy, L., Cheon, Y., Cok, D.R., Ernst, M.D., Kiniry, J.R., Leavens, G.T., Leino, K.R.M., Poll, E.: An overview of jml tools and applications. Interna- tional journal on software tools for technology transfer7, 212–232 (2005)

work page 2005

[6] [6]

In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Imple- mentation

Cadar, C., Dunbar, D., Engler, D.: Klee: unassisted and automatic genera- tion of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Imple- mentation. p. 209–224. OSDI’08, USENIX Association, USA (2008)

work page 2008

[7] [7]

Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for staticanalysisofprogramsbyconstructionorapproximationoffixpoints.In: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages. pp. 238–252 (1977)

work page 1977

[8] [8]

In: Proceedings of the 30th International Conference on Software Engineering

Csallner, C., Tillmann, N., Smaragdakis, Y.: Dysy: dynamic sym- bolic execution for invariant inference. In: Proceedings of the 30th International Conference on Software Engineering. p. 281–290. ICSE ’08, Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1368088.1368127,https://doi.org/10. 1145/1368088.1368127

work page doi:10.1145/1368088.1368127 2008

[9] [9]

Dimitrios, A.: Algorithms and data structures (2023),https: //github.com/djeada/Algorithms-And-Data-Structures/tree/ master/src/collections_and_containers/cpp

work page 2023

[10] [10]

Sun et al

Endres, M., Fakhoury, S., Chakraborty, S., Lahiri, S.K.: Can large language models transform natural language intent into formal 20 C. Sun et al. method postconditions? Proc. ACM Softw. Eng.1(FSE) (Jul 2024). https://doi.org/10.1145/3660791,https://doi.org/10.1145/3660791

work page doi:10.1145/3660791 2024

[11] [11]

Science of Computer Programming69(1–3), 35–45 (Dec 2007)

Ernst,M.D.,Perkins,J.H.,Guo,P.J.,McCamant,S.,Pacheco,C.,Tschantz, M.S., Xiao, C.: The Daikon system for dynamic detection of likely invari- ants. Science of Computer Programming69(1–3), 35–45 (Dec 2007)

work page 2007

[12] [12]

In: Static Analysis: 17th International Symposium, SAS 2010, Perpignan, France, September 14-16, 2010

Fähndrich, M.: Static verification for code contracts. In: Static Analysis: 17th International Symposium, SAS 2010, Perpignan, France, September 14-16, 2010. Proceedings 17. pp. 2–5. Springer (2010)

work page 2010

[13] [13]

Retrieval-Augmented Generation for Large Language Models: A Survey

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., Wang, H.: Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[14] [14]

In: Biere, A., Bloem, R

Garg, P., Löding, C., Madhusudan, P., Neider, D.: Ice: a robust frame- work for learning invariants. In: Biere, A., Bloem, R. (eds.) Computer Aided Verification. pp. 69–87. Springer International Publishing, Cham (2014)

work page 2014

[15] [15]

Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program- mingLanguages.p.499–512.POPL’16,AssociationforComputingMachin- ery, New York, NY, USA (2016). https://doi.org/10.1145/2837614.2837664, https://do...

work page doi:10.1145/2837614.2837664 2016

[16] [16]

Greiner, S., Bühlmann, N., Ohrndorf, M., Tsigkanos, C., Nierstrasz, O., Kehrer, T.: Automated generation of code contracts: Generative ai to the rescue? In: Proceedings of the 23rd ACM SIGPLAN International Con- ference on Generative Programming: Concepts and Experiences. pp. 1–14 (2024)

work page 2024

[17] [17]

arXiv preprint (March 2019),https://www.microsoft.com/en-us/research/publication/ are-my-invariants-valid-a-learning-approach/

Hellendoorn, V.J., Devanbu, P.T., Polozov, A., Marron, M.: Are my invariants valid? a learning approach. arXiv preprint (March 2019),https://www.microsoft.com/en-us/research/publication/ are-my-invariants-valid-a-learning-approach/

work page 2019

[18] [18]

In: Proceedings of the 31st ACM SIGPLAN-SIGACT sympo- sium on Principles of programming languages

Henzinger, T.A., Jhala, R., Majumdar, R., McMillan, K.L.: Abstractions from proofs. In: Proceedings of the 31st ACM SIGPLAN-SIGACT sympo- sium on Principles of programming languages. pp. 232–244 (2004)

work page 2004

[19] [19]

Kamath, A., Senthilnathan, A., Chakraborty, S., Deligiannis, P., Lahiri, S.K., Lal, A., Rastogi, A., Roy, S., Sharma, R.: Finding inductive loop invariants using large language models (2023),https://arxiv.org/abs/ 2311.07948

work page arXiv 2023

[20] [20]

Formal aspects of computing 27(3), 573–609 (2015)

Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-c: A software analysis perspective. Formal aspects of computing 27(3), 573–609 (2015)

work page 2015

[21] [21]

In: Formal Methods in Computer-Aided Design (FM- CAD’24) (July 2024)

Lahiri, S.: Evaluating llm-driven user-intent formalization for verification- aware languages. In: Formal Methods in Computer-Aided Design (FM- CAD’24) (July 2024)

work page 2024

[22] [22]

In: Bouajjani, A., Maler, O

Lahiri, S.K., Qadeer, S., Galeotti, J.P., Voung, J.W., Wies, T.: Intra-module inference. In: Bouajjani, A., Maler, O. (eds.) Computer Aided Verification. pp. 493–508. Springer Berlin Heidelberg, Berlin, Heidelberg (2009) ClassInvGen: Class Invariant Synthesis using Large Language Models 21

work page 2009

[23] [23]

In: International Symposium on Code Generation and Optimization, 2004

Lattner, C., Adve, V.: Llvm: a compilation framework for lifelong pro- gram analysis & transformation. In: International Symposium on Code Generation and Optimization, 2004. CGO 2004. pp. 75–86 (2004). https://doi.org/10.1109/CGO.2004.1281665

work page doi:10.1109/cgo.2004.1281665 2004

[24] [24]

Proceedings of the ACM on Programming Languages 7(OOPSLA1), 286–315 (2023)

Lattuada, A., Hance, T., Cho, C., Brun, M., Subasinghe, I., Zhou, Y., Howell, J., Parno, B., Hawblitzel, C.: Verus: Verifying rust programs using linear ghost types. Proceedings of the ACM on Programming Languages 7(OOPSLA1), 286–315 (2023)

work page 2023

[25] [25]

In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Im- plementation

Le, T.C., Zheng, G., Nguyen, T.: Sling: using dynamic analysis to in- fer program invariants in separation logic. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Im- plementation. p. 788–801. PLDI 2019, Association for Computing Machin- ery, New York, NY, USA (2019). https://doi.org/10.1145/3314221.3314634, https://do...

work page doi:10.1145/3314221.3314634 2019

[26] [26]

In: International conference on logic for programming artificial intelligence and reasoning

Leino, K.R.M.: Dafny: An automatic program verifier for functional cor- rectness. In: International conference on logic for programming artificial intelligence and reasoning. pp. 348–370. Springer (2010)

work page 2010

[27] [27]

Nguyen, Wenbo Wang, and Shaohua Wang

Lemieux, C., Inala, J.P., Lahiri, S.K., Sen, S.: Codamosa: Es- caping coverage plateaus in test generation with pre-trained large language models. In: Proceedings of the 45th Interna- tional Conference on Software Engineering. p. 919–931. ICSE ’23, IEEE Press (2023). https://doi.org/10.1109/ICSE48619.2023.00085, https://doi.org/10.1109/ICSE48619.2023.00085

work page doi:10.1109/icse48619.2023.00085 2023

[28] [28]

Transactions of the Association for Computational Linguistics12, 157–173 (2024)

Liu, N.F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., Liang, P.: Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics12, 157–173 (2024)

work page 2024

[29] [29]

Lohmann, N., other contributors: mutate_cpp - mutation testing tool for c++.https://github.com/nlohmann/mutate_cpp(2017)

work page 2017

[30] [30]

CoRR abs/2401.08807(2024)

Ma, L., Liu, S., Li, Y., Xie, X., Bu, L.: Specgen: Automated genera- tion of formal program specifications via large language models. CoRR abs/2401.08807(2024). https://doi.org/10.48550/ARXIV.2401.08807, https://doi.org/10.48550/arXiv.2401.08807

work page doi:10.48550/arxiv.2401.08807 2024

[31] [31]

In: Jensen, K., Podel- ski, A

McMillan, K.L.: An interpolating theorem prover. In: Jensen, K., Podel- ski, A. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 16–30. Springer Berlin Heidelberg, Berlin, Heidelberg (2004)

work page 2004

[32] [32]

See http://www

Meyer, B.: The eiffel programming language. See http://www. eiffel. com (1992)

work page 1992

[33] [33]

In: Proceedings of the 44th International Conference on Software Engineering

Molina, F., d’Amorim, M., Aguirre, N.: Fuzzing class specifications. In: Proceedings of the 44th International Conference on Software Engineering. p. 1008–1020. ICSE ’22, Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3510003.3510120,https://doi. org/10.1145/3510003.3510120

work page doi:10.1145/3510003.3510120 2022

[34] [34]

In: 2008 Tools and Algorithms for Construction and Analysis of Systems

de Moura, L., Bjørner, N.: Z3: an efficient smt solver. In: 2008 Tools and Algorithms for Construction and Analysis of Systems. pp. 337–340. 22 C. Sun et al. Springer, Berlin, Heidelberg (March 2008),https://www.microsoft.com/ en-us/research/publication/z3-an-efficient-smt-solver/

work page 2008

[35] [35]

Newcomb, J.L., Bodik, R.: Using human-in-the-loop synthesis to author functional reactive programs (2019),https://arxiv.org/abs/1909.11206

work page arXiv 2019

[36] [36]

In: 2012 34th Interna- tional Conference on Software Engineering (ICSE)

Nguyen, T., Kapur, D., Weimer, W., Forrest, S.: Using dynamic analy- sis to discover polynomial and array invariants. In: 2012 34th Interna- tional Conference on Software Engineering (ICSE). pp. 683–693 (2012). https://doi.org/10.1109/ICSE.2012.6227149

work page doi:10.1109/icse.2012.6227149 2012

[37] [37]

In: Proceedings of the 37th ACM SIG- PLAN Conference on Programming Language Design and Implementa- tion, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016

Padhi, S., Sharma, R., Millstein, T.D.: Data-driven precondition infer- ence with learned features. In: Proceedings of the 37th ACM SIG- PLAN Conference on Programming Language Design and Implementa- tion, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016. pp. 42–56 (2016). https://doi.org/10.1145/2908080.2908099,http://doi.acm.org/ 10.1145/2908080.2908099

work page doi:10.1145/2908080.2908099 2016

[38] [38]

(eds.) Proceedings of the 40th International Conference on Machine Learning

Pei, K., Bieber, D., Shi, K., Sutton, C., Yin, P.: Can large language mod- els reason about program invariants? In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 27496–27520. PMLR (23–29 Jul 20...

work page 2023

[39] [39]

In: Proceedings of the eigh- teenth international symposium on Software testing and analysis

Polikarpova, N., Ciupa, I., Meyer, B.: A comparative study of programmer- written and automatically inferred contracts. In: Proceedings of the eigh- teenth international symposium on Software testing and analysis. pp. 93– 104 (2009)

work page 2009

[40] [40]

Schäfer, M., Nadi, S., Eghbali, A., Tip, F.: An empirical evaluation of using large language models for automated unit test generation (2023),https: //arxiv.org/abs/2302.06527

work page arXiv 2023

[41] [41]

ACM SIGPLAN Notices46(9), 266–278 (2011)

Swamy, N., Chen, J., Fournet, C., Strub, P.Y., Bhargavan, K., Yang, J.: Se- cure distributed programming with value-dependent types. ACM SIGPLAN Notices46(9), 266–278 (2011)

work page 2011

[42] [42]

In: Gurfinkel, A., Ganesh, V

Wen, C., Cao, J., Su, J., Xu, Z., Qin, S., He, M., Li, H., Cheung, S.C., Tian, C.: Enchanting program specification synthesis by large language models us- ing static analysis and program verification. In: Gurfinkel, A., Ganesh, V. (eds.) Computer Aided Verification. pp. 302–328. Springer Nature Switzer- land, Cham (2024)

work page 2024

[43] [43]

Wikipedia contributors: Avl tree — Wikipedia, the free encyclope- dia (2024),https://en.wikipedia.org/w/index.php?title=AVL_tree& oldid=1255561501, [Online; accessed 15-November-2024]

work page 2024

[44] [44]

In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

Wu, G., Cao, W., Yao, Y., Wei, H., Chen, T., Ma, X.: Llm meets bounded model checking: Neuro-symbolic loop invariant inference. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. p. 406–417. ASE ’24, Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3691620.3695014, https...

work page doi:10.1145/3691620.3695014 2024

[45] [45]

ClassInvGen: Class Invariant Synthesis using Large Language Models 23 Proceedings of the ACM on Programming Languages8(OOPSLA2), 709– 735 (2024)

Yang, C., Deng, Y., Lu, R., Yao, J., Liu, J., Jabbarvand, R., Zhang, L.: Whitefox:White-boxcompilerfuzzingempoweredbylargelanguagemodels. ClassInvGen: Class Invariant Synthesis using Large Language Models 23 Proceedings of the ACM on Programming Languages8(OOPSLA2), 709– 735 (2024)

work page 2024

[46] [46]

REASONING :

z3Prover: z3prover util classes.https://github.com/Z3Prover/z3/tree/ master/src/util(2024) 24 C. Sun et al. A Prompt 1You are an expert in creating program invariants from code and natural language . 2Invariants are assertions on the variables in scope that hold true at different program points 3We are interested in finding invariants that hold at both st...

work page 2024

[47] [50]

REASONING :

If you can decompose a single invariant into smaller ones , try to output multiple invariants . Fig.15: ClassInvGen Generation system prompt: input-output format. 1Name of Data Structure to Annotate : { struct } 2Code : 3‘‘‘ 4{ code } 5‘‘‘ Fig.16: ClassInvGen Generation user prompt template. 26 C. Sun et al. 1You are an expert in repairing program invaria...

work page

[48] [51]

Follow the output format strictly , particularly enclosing each invariant in triple - ticks (‘‘‘), and enclosing the reasoning in $$$

work page

[49] [52]

Only find object invariants for the target class provided to you , do not infer invariants for any other class

work page

[50] [53]

Make sure the invariant is a statement in the same underlying programming language as the source program

work page

[51] [54]

000102...6979899

If you can decompose a single invariant into smaller ones , try to output multiple invariants . Fig.17: ClassInvGen Refinement system prompt: instruction and task descrip- tion. ClassInvGen: Class Invariant Synthesis using Large Language Models 27 1Please fix the failed invariants given the feedback , tests and the original source code . 2 3Failed Invaria...

work page