pith. sign in

arxiv: 2502.18917 · v2 · pith:ILVUCNJQnew · submitted 2025-02-26 · 💻 cs.AI · cs.PL· cs.SE

ClassInvGen: Class Invariant Synthesis using Large Language Models

Pith reviewed 2026-05-23 02:40 UTC · model grok-4.3

classification 💻 cs.AI cs.PLcs.SE
keywords class invariant synthesislarge language modelsC++ specificationsinvariant inferenceexecutable invariantsprogram verificationDaikon comparisonspecification generation
0
0 comments X

The pith

ClassInvGen uses LLMs to co-generate executable C++ class invariants and tests that outperform direct prompting and Daikon.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ClassInvGen to synthesize class invariants for C++ by prompting large language models to produce both the invariant as a pure function and matching test inputs. This joint generation aims to yield invariants that are correct when executed and complete enough to distinguish program behaviors. On a contributed benchmark of standard C++ data structures the method produces invariants that pass more tests and kill more mutants than baselines. The same approach is shown to scale to classes in a high-integrity production codebase. If the claim holds, developers gain a practical way to obtain enforceable specifications without manual annotation or limited dynamic analysis.

Core claim

ClassInvGen is a method for co-generating executable class invariants and test inputs to produce high-quality class invariants for C++, leveraging LLMs' ability to synthesize pure functions. It outperforms a pure LLM-based technique to generate specifications from code as well as prior data-driven invariant inference techniques such as Daikon on a benchmark of standard C++ data structures, and demonstrates applicability through a case study on classes in a widely used high-integrity C++ codebase.

What carries the argument

ClassInvGen, the co-generation procedure that prompts an LLM to emit both a pure-function class invariant and test inputs used to validate it.

If this is right

  • Generated invariants can be compiled and checked at runtime or fed to static verifiers when the language supports it.
  • The new benchmark and mutant harness supply a concrete way to quantify both correctness and completeness of any invariant generator for C++.
  • The case-study results indicate the method can be applied directly to classes inside existing high-integrity codebases without language changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same co-generation pattern could be tried on other object-oriented languages that support pure functions for invariants.
  • Test inputs produced alongside invariants might serve as an automatic feedback loop to refine the LLM output in subsequent prompts.
  • If the invariants prove stable across versions, they could be stored as living documentation that is mechanically checked on every build.

Load-bearing premise

Large language models can reliably output correct, executable pure functions that serve as class invariants in C++ and exceed the quality of both direct prompting and dynamic tools like Daikon without systematic logical errors.

What would settle it

On the provided C++ data-structure benchmark, if the invariants produced by ClassInvGen do not pass a higher fraction of tests or kill a higher fraction of mutants than those from pure LLM prompting or from Daikon, the performance claim is false.

Figures

Figures reproduced from arXiv: 2502.18917 by Chuyue Sun, Clark Barrett, David Dill, Jubi Taneja, Saikat Chakraborty, Shuvendu K. Lahiri, Viraj Agashe, Xiaokang Qiu.

Figure 1
Figure 1. Figure 1: An invariant in the doubly linked list class in Z3. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of ClassInvGen. An overview of the ClassInvGen framework is shown in [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Header file of AvlTree. The algorithm then generates invariants using generateInvariantWithLLM and stores them in invariants_dict (Line 12). The helper function getCodeForClass constructs class representations by combining the declaration text with any existing invariants from invariants_dict (Lines 15–18). The algorithm concludes by returning final_invariants for the target class, effectively managing the… view at source ↗
Figure 4
Figure 4. Figure 4: Evaluation of ClassInvGen gener￾ated invariants ClassInvGen produces filtered invari￾ants, which we evaluate using an auto￾mated pipeline ( [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Completeness Experiment Result. The 3 bars from left to right are Tests, [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Mutant that survived unit tests but was killed by ClassInvGen [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Another Mutant that survived unit tests but was killed by ClassInvGen [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Daikon vs. ClassInvGen Kills KLEE [6], Dafny [26] and Frama-C [20]. We selected the Z3 codebase due to its stringent correctness requirements; as an SMT solver, Z3 is employed in appli￾cations demanding high reliability. This high-stakes environment makes Z3 an ideal testbed for assessing the effectiveness of synthesized invariants. The bdd_manager class5 is particularly noteworthy. It was chosen because i… view at source ↗
Figure 9
Figure 9. Figure 9: Z3 developer-written class invariants for [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Correct and useful invariant for bdd_manager class 1 // Cache consistency: Entries in the operation cache should be valid 2 for (const auto* e : m_op_cache ) { 3 assert(e != nullptr); 4 assert(e->m_result != null_bdd); 5 } [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗
Figure 12
Figure 12. Figure 12: Correct and useless invariant for bdd_manager class 1 // The number of nodes should not exceed the maximum number of BDD nodes 2 assert(m_nodes.size () <= m_max_num_bdd_nodes ); [PITH_FULL_IMAGE:figures/full_fig_p017_12.png] view at source ↗
Figure 14
Figure 14. Figure 14: ClassInvGen Generation system prompt: instruction and task descrip [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: ClassInvGen Generation system prompt: input-output format. [PITH_FULL_IMAGE:figures/full_fig_p025_15.png] view at source ↗
Figure 17
Figure 17. Figure 17: ClassInvGen Refinement system prompt: instruction and task descrip [PITH_FULL_IMAGE:figures/full_fig_p026_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: ClassInvGen Refinement user prompt template. [PITH_FULL_IMAGE:figures/full_fig_p027_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: AvlTree instrumentation and invariant example [PITH_FULL_IMAGE:figures/full_fig_p033_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Example of an incorrect invariant of AvlTree because there is no [PITH_FULL_IMAGE:figures/full_fig_p033_20.png] view at source ↗
Figure 21
Figure 21. Figure 21: A test suite generated for AvlTree 1 avl_tree.cpp: In lambda function: 2 avl_tree.cpp :21:16: error: use of 3 ’is_balanced’ before deduction of ’ auto’ (a) gcc compiler error messages 1 void AvlTree :: check_invariant () { 2 auto is_bst = [&](const std :: unique_ptr <Node >& node , 3 const T& min , const T& max) -> bool { 4 if (! node) return true; 5 if (node ->data <= min || 6 node ->data >= max) return … view at source ↗
Figure 23
Figure 23. Figure 23: BST property after refinement: auto is changed to explicit declarations [PITH_FULL_IMAGE:figures/full_fig_p034_23.png] view at source ↗
read the original abstract

Formal program specifications in the form of preconditions, postconditions, and class invariants have several benefits for the construction and maintenance of programs. They not only aid in program understanding due to their unambiguous semantics but can also be enforced dynamically (or even statically when the language supports a formal verifier). However, synthesizing high-quality specifications in an underlying programming language is limited by the expressivity of the specifications or the need to express them in a declarative manner. Prior work has demonstrated the potential of large language models (LLMs) for synthesizing high-quality method pre/postconditions for Python and Java, but does not consider class invariants. In this work, we describe ClassInvGen, a method for co-generating executable class invariants and test inputs to produce high-quality class invariants for a mainstream language such as C++, leveraging LLMs' ability to synthesize pure functions. We show that ClassInvGen outperforms a pure LLM-based technique to generate specifications (from code) as well as prior data-driven invariant inference techniques such as Daikon. We contribute a benchmark of standard C++ data structures along with a harness that can help measure both the correctness and completeness of generated specifications using tests and mutants. We also demonstrate its applicability to real-world code by performing a case study on several classes within a widely used and high-integrity C++ codebase.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ClassInvGen, an LLM-based method for co-generating executable class invariants (as pure C++ functions) and associated test inputs. It claims this approach outperforms both direct LLM prompting for specifications and prior dynamic invariant inference tools such as Daikon, evaluated on a contributed benchmark of standard C++ data structures using a harness that assesses correctness and completeness via tests and mutants. A case study applying the method to classes in a high-integrity real-world C++ codebase is also presented.

Significance. If the empirical claims hold under scrutiny, the work advances automated synthesis of class invariants for C++, supporting program understanding, dynamic enforcement, and verification in a mainstream language. The contributed benchmark and harness represent a concrete resource for the community. The emphasis on executable pure-function invariants and the real-world case study are positive elements.

major comments (2)
  1. [Section 4] Section 4 (Evaluation harness): the co-generation of test inputs together with the invariants creates a potential circularity risk for the central outperformance claim. The manuscript must clarify whether test inputs are produced independently of the synthesized invariants (e.g., via a fixed oracle or exhaustive enumeration) or demonstrate that results remain stable when evaluated against externally supplied test suites.
  2. [Section 4.2–4.3] Section 4.2–4.3 (Mutant analysis and results): mutant killing is used to assess completeness, yet the paper provides no explicit justification that the chosen mutation operators target the classes of faults typical for class invariants (e.g., missed state transitions or incorrect boundary conditions). Without this or supplementary manual validation counts, the superiority claim over Daikon rests on an incomplete fault model.
minor comments (2)
  1. [Abstract] Abstract: the claim of outperformance is stated without any numerical results or effect sizes; including at least the headline correctness/completeness deltas would make the summary self-contained.
  2. [Section 3] Notation: the distinction between “pure LLM-based technique” and ClassInvGen should be made explicit in the first paragraph of Section 3 so readers immediately see the precise difference in prompting strategy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and detailed review of our manuscript. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [Section 4] Section 4 (Evaluation harness): the co-generation of test inputs together with the invariants creates a potential circularity risk for the central outperformance claim. The manuscript must clarify whether test inputs are produced independently of the synthesized invariants (e.g., via a fixed oracle or exhaustive enumeration) or demonstrate that results remain stable when evaluated against externally supplied test suites.

    Authors: We acknowledge the potential circularity concern arising from co-generating invariants and test inputs. The current approach has the LLM produce test inputs from the class interface and code to exercise the invariants, which may not be fully independent. In the revised manuscript, we will clarify the generation process to show that test inputs are derived from the class definition without reference to the specific invariant being synthesized. We will also add results evaluating the generated invariants against an externally supplied test suite (drawn from standard C++ data structure tests) to confirm that the outperformance claims remain stable. revision: yes

  2. Referee: [Section 4.2–4.3] Section 4.2–4.3 (Mutant analysis and results): mutant killing is used to assess completeness, yet the paper provides no explicit justification that the chosen mutation operators target the classes of faults typical for class invariants (e.g., missed state transitions or incorrect boundary conditions). Without this or supplementary manual validation counts, the superiority claim over Daikon rests on an incomplete fault model.

    Authors: We agree that an explicit justification for the mutation operators relative to typical class invariant faults is needed to strengthen the completeness evaluation. In the revised manuscript, we will expand the discussion in Sections 4.2–4.3 to explain how the chosen operators (including those for state transitions and boundary conditions) target common invariant faults. We will also include results from manual validation of a sample of mutants to further support the fault model and the comparison against Daikon. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method evaluated against external baselines

full rationale

The paper presents ClassInvGen as an LLM-based technique for synthesizing executable C++ class invariants, with evaluation against pure LLM prompting and Daikon on a contributed benchmark using tests and mutants. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the abstract or described claims. The central outperformance result is framed as an empirical comparison rather than a derivation that reduces to its own inputs by construction. The contributed harness is presented as an independent measurement tool, not a self-referential fit.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5794 in / 986 out tokens · 25509 ms · 2026-05-23T02:40:32.183198+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?

    cs.LO 2025-11 conditional novelty 6.0

    A neurosymbolic method using two LLM prompting frameworks generates provably correct inductive arguments for 84% of a set of mid-size open-source RTL hardware designs.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    In: Proceedings ofthe29thACMSIGPLAN-SIGACTSymposiumonPrinciplesofProgram- ming Languages

    Ammons, G., Bodík, R., Larus, J.R.: Mining specifications. In: Proceedings ofthe29thACMSIGPLAN-SIGACTSymposiumonPrinciplesofProgram- ming Languages. p. 4–16. POPL ’02, Association for Computing Machin- ery, New York, NY, USA (2002). https://doi.org/10.1145/503272.503275, https://doi.org/10.1145/503272.503275

  2. [2]

    In: Beyer, D., Cavalcanti, A

    Boockmann, J.H., Lüttgen, G.: Comprehending object state via dynamic class invariant learning. In: Beyer, D., Cavalcanti, A. (eds.) Fundamental Approaches to Software Engineering. pp. 143–164. Springer Nature Switzer- land, Cham (2024)

  3. [3]

    https://doi.org/10.5281/zenodo.14061403,https:// doi.org/10.5281/zenodo.14061403

    Brunsfeld, M., Qureshi, A., Hlynskyi, A., Thomson, P., ObserverOfTime, Vera, J., dundargoc, Turnbull, P., Clem, T., Creager, D., Helwer, A., Rix, R., Kavolis, D., van Antwerpen, H., Davis, M., Lillis, W., Ika, Yahyaabadi, A., Nguy˜ en, T.A., bfredl, Massicotte, M., Brunk, S., Clason, C., Hasabnis, N., Dong, M., Moelius, S., Kalt, S., Finer, S., Kolja: tre...

  4. [4]

    IEEE Transactions on ComputersC-35(8), 677–691 (1986)

    Bryant, R.: Graph-based algorithms for boolean function manipu- lation. IEEE Transactions on ComputersC-35(8), 677–691 (1986). https://doi.org/10.1109/TC.1986.1676819

  5. [5]

    Interna- tional journal on software tools for technology transfer7, 212–232 (2005)

    Burdy, L., Cheon, Y., Cok, D.R., Ernst, M.D., Kiniry, J.R., Leavens, G.T., Leino, K.R.M., Poll, E.: An overview of jml tools and applications. Interna- tional journal on software tools for technology transfer7, 212–232 (2005)

  6. [6]

    In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Imple- mentation

    Cadar, C., Dunbar, D., Engler, D.: Klee: unassisted and automatic genera- tion of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Imple- mentation. p. 209–224. OSDI’08, USENIX Association, USA (2008)

  7. [7]

    Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for staticanalysisofprogramsbyconstructionorapproximationoffixpoints.In: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages. pp. 238–252 (1977)

  8. [8]

    In: Proceedings of the 30th International Conference on Software Engineering

    Csallner, C., Tillmann, N., Smaragdakis, Y.: Dysy: dynamic sym- bolic execution for invariant inference. In: Proceedings of the 30th International Conference on Software Engineering. p. 281–290. ICSE ’08, Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1368088.1368127,https://doi.org/10. 1145/1368088.1368127

  9. [9]

    Dimitrios, A.: Algorithms and data structures (2023),https: //github.com/djeada/Algorithms-And-Data-Structures/tree/ master/src/collections_and_containers/cpp

  10. [10]

    Sun et al

    Endres, M., Fakhoury, S., Chakraborty, S., Lahiri, S.K.: Can large language models transform natural language intent into formal 20 C. Sun et al. method postconditions? Proc. ACM Softw. Eng.1(FSE) (Jul 2024). https://doi.org/10.1145/3660791,https://doi.org/10.1145/3660791

  11. [11]

    Science of Computer Programming69(1–3), 35–45 (Dec 2007)

    Ernst,M.D.,Perkins,J.H.,Guo,P.J.,McCamant,S.,Pacheco,C.,Tschantz, M.S., Xiao, C.: The Daikon system for dynamic detection of likely invari- ants. Science of Computer Programming69(1–3), 35–45 (Dec 2007)

  12. [12]

    In: Static Analysis: 17th International Symposium, SAS 2010, Perpignan, France, September 14-16, 2010

    Fähndrich, M.: Static verification for code contracts. In: Static Analysis: 17th International Symposium, SAS 2010, Perpignan, France, September 14-16, 2010. Proceedings 17. pp. 2–5. Springer (2010)

  13. [13]

    Retrieval-Augmented Generation for Large Language Models: A Survey

    Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., Wang, H.: Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 (2023)

  14. [14]

    In: Biere, A., Bloem, R

    Garg, P., Löding, C., Madhusudan, P., Neider, D.: Ice: a robust frame- work for learning invariants. In: Biere, A., Bloem, R. (eds.) Computer Aided Verification. pp. 69–87. Springer International Publishing, Cham (2014)

  15. [15]

    Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program- mingLanguages.p.499–512.POPL’16,AssociationforComputingMachin- ery, New York, NY, USA (2016). https://doi.org/10.1145/2837614.2837664, https://do...

  16. [16]

    Greiner, S., Bühlmann, N., Ohrndorf, M., Tsigkanos, C., Nierstrasz, O., Kehrer, T.: Automated generation of code contracts: Generative ai to the rescue? In: Proceedings of the 23rd ACM SIGPLAN International Con- ference on Generative Programming: Concepts and Experiences. pp. 1–14 (2024)

  17. [17]

    arXiv preprint (March 2019),https://www.microsoft.com/en-us/research/publication/ are-my-invariants-valid-a-learning-approach/

    Hellendoorn, V.J., Devanbu, P.T., Polozov, A., Marron, M.: Are my invariants valid? a learning approach. arXiv preprint (March 2019),https://www.microsoft.com/en-us/research/publication/ are-my-invariants-valid-a-learning-approach/

  18. [18]

    In: Proceedings of the 31st ACM SIGPLAN-SIGACT sympo- sium on Principles of programming languages

    Henzinger, T.A., Jhala, R., Majumdar, R., McMillan, K.L.: Abstractions from proofs. In: Proceedings of the 31st ACM SIGPLAN-SIGACT sympo- sium on Principles of programming languages. pp. 232–244 (2004)

  19. [19]

    Kamath, A., Senthilnathan, A., Chakraborty, S., Deligiannis, P., Lahiri, S.K., Lal, A., Rastogi, A., Roy, S., Sharma, R.: Finding inductive loop invariants using large language models (2023),https://arxiv.org/abs/ 2311.07948

  20. [20]

    Formal aspects of computing 27(3), 573–609 (2015)

    Kirchner, F., Kosmatov, N., Prevosto, V., Signoles, J., Yakobowski, B.: Frama-c: A software analysis perspective. Formal aspects of computing 27(3), 573–609 (2015)

  21. [21]

    In: Formal Methods in Computer-Aided Design (FM- CAD’24) (July 2024)

    Lahiri, S.: Evaluating llm-driven user-intent formalization for verification- aware languages. In: Formal Methods in Computer-Aided Design (FM- CAD’24) (July 2024)

  22. [22]

    In: Bouajjani, A., Maler, O

    Lahiri, S.K., Qadeer, S., Galeotti, J.P., Voung, J.W., Wies, T.: Intra-module inference. In: Bouajjani, A., Maler, O. (eds.) Computer Aided Verification. pp. 493–508. Springer Berlin Heidelberg, Berlin, Heidelberg (2009) ClassInvGen: Class Invariant Synthesis using Large Language Models 21

  23. [23]

    In: International Symposium on Code Generation and Optimization, 2004

    Lattner, C., Adve, V.: Llvm: a compilation framework for lifelong pro- gram analysis & transformation. In: International Symposium on Code Generation and Optimization, 2004. CGO 2004. pp. 75–86 (2004). https://doi.org/10.1109/CGO.2004.1281665

  24. [24]

    Proceedings of the ACM on Programming Languages 7(OOPSLA1), 286–315 (2023)

    Lattuada, A., Hance, T., Cho, C., Brun, M., Subasinghe, I., Zhou, Y., Howell, J., Parno, B., Hawblitzel, C.: Verus: Verifying rust programs using linear ghost types. Proceedings of the ACM on Programming Languages 7(OOPSLA1), 286–315 (2023)

  25. [25]

    In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Im- plementation

    Le, T.C., Zheng, G., Nguyen, T.: Sling: using dynamic analysis to in- fer program invariants in separation logic. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Im- plementation. p. 788–801. PLDI 2019, Association for Computing Machin- ery, New York, NY, USA (2019). https://doi.org/10.1145/3314221.3314634, https://do...

  26. [26]

    In: International conference on logic for programming artificial intelligence and reasoning

    Leino, K.R.M.: Dafny: An automatic program verifier for functional cor- rectness. In: International conference on logic for programming artificial intelligence and reasoning. pp. 348–370. Springer (2010)

  27. [27]

    Nguyen, Wenbo Wang, and Shaohua Wang

    Lemieux, C., Inala, J.P., Lahiri, S.K., Sen, S.: Codamosa: Es- caping coverage plateaus in test generation with pre-trained large language models. In: Proceedings of the 45th Interna- tional Conference on Software Engineering. p. 919–931. ICSE ’23, IEEE Press (2023). https://doi.org/10.1109/ICSE48619.2023.00085, https://doi.org/10.1109/ICSE48619.2023.00085

  28. [28]

    Transactions of the Association for Computational Linguistics12, 157–173 (2024)

    Liu, N.F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., Liang, P.: Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics12, 157–173 (2024)

  29. [29]

    Lohmann, N., other contributors: mutate_cpp - mutation testing tool for c++.https://github.com/nlohmann/mutate_cpp(2017)

  30. [30]

    CoRR abs/2401.08807(2024)

    Ma, L., Liu, S., Li, Y., Xie, X., Bu, L.: Specgen: Automated genera- tion of formal program specifications via large language models. CoRR abs/2401.08807(2024). https://doi.org/10.48550/ARXIV.2401.08807, https://doi.org/10.48550/arXiv.2401.08807

  31. [31]

    In: Jensen, K., Podel- ski, A

    McMillan, K.L.: An interpolating theorem prover. In: Jensen, K., Podel- ski, A. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 16–30. Springer Berlin Heidelberg, Berlin, Heidelberg (2004)

  32. [32]

    See http://www

    Meyer, B.: The eiffel programming language. See http://www. eiffel. com (1992)

  33. [33]

    In: Proceedings of the 44th International Conference on Software Engineering

    Molina, F., d’Amorim, M., Aguirre, N.: Fuzzing class specifications. In: Proceedings of the 44th International Conference on Software Engineering. p. 1008–1020. ICSE ’22, Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3510003.3510120,https://doi. org/10.1145/3510003.3510120

  34. [34]

    In: 2008 Tools and Algorithms for Construction and Analysis of Systems

    de Moura, L., Bjørner, N.: Z3: an efficient smt solver. In: 2008 Tools and Algorithms for Construction and Analysis of Systems. pp. 337–340. 22 C. Sun et al. Springer, Berlin, Heidelberg (March 2008),https://www.microsoft.com/ en-us/research/publication/z3-an-efficient-smt-solver/

  35. [35]

    Newcomb, J.L., Bodik, R.: Using human-in-the-loop synthesis to author functional reactive programs (2019),https://arxiv.org/abs/1909.11206

  36. [36]

    In: 2012 34th Interna- tional Conference on Software Engineering (ICSE)

    Nguyen, T., Kapur, D., Weimer, W., Forrest, S.: Using dynamic analy- sis to discover polynomial and array invariants. In: 2012 34th Interna- tional Conference on Software Engineering (ICSE). pp. 683–693 (2012). https://doi.org/10.1109/ICSE.2012.6227149

  37. [37]

    In: Proceedings of the 37th ACM SIG- PLAN Conference on Programming Language Design and Implementa- tion, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016

    Padhi, S., Sharma, R., Millstein, T.D.: Data-driven precondition infer- ence with learned features. In: Proceedings of the 37th ACM SIG- PLAN Conference on Programming Language Design and Implementa- tion, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016. pp. 42–56 (2016). https://doi.org/10.1145/2908080.2908099,http://doi.acm.org/ 10.1145/2908080.2908099

  38. [38]

    (eds.) Proceedings of the 40th International Conference on Machine Learning

    Pei, K., Bieber, D., Shi, K., Sutton, C., Yin, P.: Can large language mod- els reason about program invariants? In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 27496–27520. PMLR (23–29 Jul 20...

  39. [39]

    In: Proceedings of the eigh- teenth international symposium on Software testing and analysis

    Polikarpova, N., Ciupa, I., Meyer, B.: A comparative study of programmer- written and automatically inferred contracts. In: Proceedings of the eigh- teenth international symposium on Software testing and analysis. pp. 93– 104 (2009)

  40. [40]

    Schäfer, M., Nadi, S., Eghbali, A., Tip, F.: An empirical evaluation of using large language models for automated unit test generation (2023),https: //arxiv.org/abs/2302.06527

  41. [41]

    ACM SIGPLAN Notices46(9), 266–278 (2011)

    Swamy, N., Chen, J., Fournet, C., Strub, P.Y., Bhargavan, K., Yang, J.: Se- cure distributed programming with value-dependent types. ACM SIGPLAN Notices46(9), 266–278 (2011)

  42. [42]

    In: Gurfinkel, A., Ganesh, V

    Wen, C., Cao, J., Su, J., Xu, Z., Qin, S., He, M., Li, H., Cheung, S.C., Tian, C.: Enchanting program specification synthesis by large language models us- ing static analysis and program verification. In: Gurfinkel, A., Ganesh, V. (eds.) Computer Aided Verification. pp. 302–328. Springer Nature Switzer- land, Cham (2024)

  43. [43]

    Wikipedia contributors: Avl tree — Wikipedia, the free encyclope- dia (2024),https://en.wikipedia.org/w/index.php?title=AVL_tree& oldid=1255561501, [Online; accessed 15-November-2024]

  44. [44]

    In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

    Wu, G., Cao, W., Yao, Y., Wei, H., Chen, T., Ma, X.: Llm meets bounded model checking: Neuro-symbolic loop invariant inference. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. p. 406–417. ASE ’24, Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3691620.3695014, https...

  45. [45]

    ClassInvGen: Class Invariant Synthesis using Large Language Models 23 Proceedings of the ACM on Programming Languages8(OOPSLA2), 709– 735 (2024)

    Yang, C., Deng, Y., Lu, R., Yao, J., Liu, J., Jabbarvand, R., Zhang, L.: Whitefox:White-boxcompilerfuzzingempoweredbylargelanguagemodels. ClassInvGen: Class Invariant Synthesis using Large Language Models 23 Proceedings of the ACM on Programming Languages8(OOPSLA2), 709– 735 (2024)

  46. [46]

    REASONING :

    z3Prover: z3prover util classes.https://github.com/Z3Prover/z3/tree/ master/src/util(2024) 24 C. Sun et al. A Prompt 1You are an expert in creating program invariants from code and natural language . 2Invariants are assertions on the variables in scope that hold true at different program points 3We are interested in finding invariants that hold at both st...

  47. [50]

    REASONING :

    If you can decompose a single invariant into smaller ones , try to output multiple invariants . Fig.15: ClassInvGen Generation system prompt: input-output format. 1Name of Data Structure to Annotate : { struct } 2Code : 3‘‘‘ 4{ code } 5‘‘‘ Fig.16: ClassInvGen Generation user prompt template. 26 C. Sun et al. 1You are an expert in repairing program invaria...

  48. [51]

    Follow the output format strictly , particularly enclosing each invariant in triple - ticks (‘‘‘), and enclosing the reasoning in $$$

  49. [52]

    Only find object invariants for the target class provided to you , do not infer invariants for any other class

  50. [53]

    Make sure the invariant is a statement in the same underlying programming language as the source program

  51. [54]

    000102...6979899

    If you can decompose a single invariant into smaller ones , try to output multiple invariants . Fig.17: ClassInvGen Refinement system prompt: instruction and task descrip- tion. ClassInvGen: Class Invariant Synthesis using Large Language Models 27 1Please fix the failed invariants given the feedback , tests and the original source code . 2 3Failed Invaria...