An Empirical Study of LLM-Generated Specifications for VeriFast
Pith reviewed 2026-06-26 04:46 UTC · model grok-4.3
The pith
LLMs generate functionally preserving specifications for VeriFast over 91 percent of the time but verify successfully only 31.4 percent of the time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through systematic testing, the authors establish that LLMs preserve functional behavior in both the source code and the generated specifications at rates exceeding 91 percent, yet only 31.4 percent of the cases lead to successful verification by VeriFast. Models such as Gemini 2.5 Pro and the inclusion of formal contracts as prompts yield higher success rates. The dominant source of errors, accounting for 94 percent, is the LLMs' incorrect handling of the domain-specific knowledge required by separation logic verifiers.
What carries the argument
The multi-stage empirical evaluation using eight prompting approaches on ten LLMs with three input types to generate and check specifications for 303 C functions, focusing on functional equivalence and verifiability.
If this is right
- Higher success rates occur when using Gemini 2.5 Pro compared to other models.
- Including formal contracts in the prompt improves verification outcomes.
- The bulk of verification failures stem from errors in applying separation logic concepts rather than from altering program semantics.
- Functional preservation remains high across most configurations, indicating LLMs reliably maintain observable behavior.
Where Pith is reading between the lines
- Fine-tuning or retrieval-augmented generation focused on separation logic rules could raise verification success rates.
- The same limitations may affect LLM use with other separation logic tools beyond VeriFast.
- Scaling the method to full programs rather than individual functions might expose additional challenges in maintaining consistency across specifications.
Load-bearing premise
The 303 C functions along with the eight prompting approaches and three input types adequately represent practical VeriFast usage scenarios, and the automated checks for functional preservation and verifiability measure real-world utility without systematic bias.
What would settle it
Applying the identical prompting and evaluation protocol to a fresh collection of C functions from industrial codebases or to a different separation logic verifier would produce markedly different functional preservation or verification success percentages.
Figures
read the original abstract
Static verification tools can assure industrial scale software, but require significant human labor to write specifications. This is particularly true of static verifiers based on separation logic (SL verifiers), which excel at verifying heapmanipulating programs, but require many complex auxiliary specifications to reason about heap structure. Recent work applies large language models (LLMs) to generate code, tests, and proofs, including specifications for verifiers, but mostly targeting non-SL verifiers. To address this gap, this paper thoroughly evaluates how well LLMs perform when prompted to generate specifications for verifying 303 C functions with the SL verifier VeriFast. We explored eight prompting approaches, ten LLMs, and three input types in two stages. Quantitative and qualitative analyses are used to assess the LLM-generated code and specifications for functional behavior, verifiability and errors. The results show that LLMs preserve functional behavior in source code and specifications (both over 91%), but achieve modest verification success (31.4%). Using Gemini 2.5 Pro and providing formal contracts lead to higher success rates in our setting. Moreover, most errors (94%) come from LLMs' mistakes in the domainspecific knowledge of SL verifiers such as VeriFast. These findings provide guidance for optimizing LLM-generated specifications for SL verifiers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports an empirical evaluation of LLMs for generating specifications for the separation logic verifier VeriFast on 303 C functions. It examines eight prompting approaches, ten LLMs, and three input types across two stages, finding that generated code and specifications preserve functional behavior in over 91% of cases, achieve 31.4% verification success (higher with Gemini 2.5 Pro and formal contracts), and that 94% of errors arise from domain-specific knowledge gaps in SL verifiers.
Significance. If the measurements prove robust, the work fills a gap in LLM-assisted specification generation for heap-manipulating programs under separation logic, offering quantitative guidance on prompting strategies and highlighting domain knowledge as the dominant failure mode. The scale of the study (303 functions, multiple models and inputs) strengthens its potential utility for tool builders and practitioners.
major comments (3)
- [§4] §4 (Experimental Setup): The selection criteria and sampling method for the 303 C functions are not described in sufficient detail to evaluate selection bias or representativeness, which directly affects the reliability of the reported 31.4% verification success rate and the generalizability claim.
- [§5] §5 (Results, functional preservation): The automated procedure used to measure functional behavior preservation (claimed >91% for both code and specifications) is not specified, including whether it relies on test suites, differential execution, or manual review; without this, the central quantitative claims cannot be verified.
- [§5.3] §5.3 (Error Analysis): The classification process that attributes 94% of errors to domain-specific knowledge of VeriFast/SL verifiers lacks an explicit coding scheme, inter-rater reliability measure, or example annotations, making the dominant-error conclusion load-bearing yet unverifiable from the reported methodology.
minor comments (2)
- [Abstract] Abstract: The phrase 'two stages' is used without defining what the stages consist of or how they differ in prompting or evaluation.
- [Figures/Tables] Table captions and axis labels in the results figures should explicitly state the exact success metric (e.g., 'VeriFast verification success rate') to avoid ambiguity with functional-preservation percentages.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving methodological transparency, and we will revise the paper accordingly to address them.
read point-by-point responses
-
Referee: [§4] §4 (Experimental Setup): The selection criteria and sampling method for the 303 C functions are not described in sufficient detail to evaluate selection bias or representativeness, which directly affects the reliability of the reported 31.4% verification success rate and the generalizability claim.
Authors: We agree that §4 requires additional detail on function selection. The 303 functions were drawn from open-source C repositories and prior VeriFast benchmarks, with a focus on heap-manipulating code; we will expand the section to explicitly describe the sources, inclusion criteria, sampling procedure, and any steps taken to mitigate bias. revision: yes
-
Referee: [§5] §5 (Results, functional preservation): The automated procedure used to measure functional behavior preservation (claimed >91% for both code and specifications) is not specified, including whether it relies on test suites, differential execution, or manual review; without this, the central quantitative claims cannot be verified.
Authors: The preservation metric combined differential execution against available test suites with manual behavioral comparison for functions lacking tests. We will add a precise description of this procedure, including any automation scripts and decision rules, to the revised §5. revision: yes
-
Referee: [§5.3] §5.3 (Error Analysis): The classification process that attributes 94% of errors to domain-specific knowledge of VeriFast/SL verifiers lacks an explicit coding scheme, inter-rater reliability measure, or example annotations, making the dominant-error conclusion load-bearing yet unverifiable from the reported methodology.
Authors: We will augment §5.3 with the full coding scheme (distinguishing functional, syntactic, and domain-knowledge errors), inter-rater agreement statistics, and additional annotated examples to make the 94% attribution fully reproducible. revision: yes
Circularity Check
No significant circularity
full rationale
This paper is an empirical evaluation reporting measured outcomes (functional preservation rates, verification success, error distributions) from direct experiments on 303 C functions across LLMs, prompting strategies, and input types. No derivation chain, equations, fitted parameters presented as predictions, or self-referential definitions exist in the work. All reported metrics are computed from the experimental runs themselves without reduction to prior fitted values or self-citations that bear the central claim. The study is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Chaff: Engineering an efficient sat solver,
M. W. Moskewiczet al., “Chaff: Engineering an efficient sat solver,” in Proc. of the 38th annual Design Automation Conf., 2001, pp. 530–535
2001
-
[2]
An axiomatic basis for computer programming,
C. A. R. Hoare, “An axiomatic basis for computer programming,” Communications of the ACM, vol. 12, no. 10, pp. 576–580, 1969
1969
-
[3]
Dafny: An automatic program verifier for functional correctness,
K. R. M. Leino, “Dafny: An automatic program verifier for functional correctness,” inInternational conference on logic for programming artificial intelligence and reasoning. Springer, 2010, pp. 348–370
2010
-
[4]
Verus: Verifying rust progs. using linear ghost types,
A. Lattuadaet al., “Verus: Verifying rust progs. using linear ghost types,” Proc. of the ACM on PLs., vol. 7, no. OOPSLA1, pp. 286–315, 2023
2023
-
[5]
Formally verified cloud-scale authorization,
A. Chakarovet al., “Formally verified cloud-scale authorization,” in 2025 IEEE/ACM 47th Int’l. Conf. on Softw. Eng. (ICSE). IEEE Computer Society, 2025, pp. 703–703
2025
-
[6]
Software verification with verifast: Industrial case studies,
P. Philippaertset al., “Software verification with verifast: Industrial case studies,”Science of Computer Programming, vol. 82, pp. 77–97, 2014
2014
-
[7]
Codet: Code generation with generated tests,
B. Chenet al., “Codet: Code generation with generated tests,”arXiv preprint arXiv:2207.10397, 2022
Pith/arXiv arXiv 2022
-
[8]
Automatic generation of programming exercises and code explanations using large language models,
S. Sarsaet al., “Automatic generation of programming exercises and code explanations using large language models,” inProc. of the 2022 ACM Conf. on Int’l. Computing Educ. Research-Vol. 1, 2022, pp. 27–43
2022
-
[9]
Software testing with large language models: Survey, landscape, and vision,
J. Wanget al., “Software testing with large language models: Survey, landscape, and vision,”IEEE Trans. Softw. Eng., 2024
2024
-
[10]
Fuzz4all: Universal fuzzing with llms,
C. S. Xiaet al., “Fuzz4all: Universal fuzzing with llms,” inProc. of the IEEE/ACM 46th Int’l Conf. on Softw. Eng., 2024, pp. 1–13
2024
-
[11]
The daikon system for dynamic detection of likely invariants,
M. D. Ernstet al., “The daikon system for dynamic detection of likely invariants,”Science of computer prog., vol. 69, no. 1-3, pp. 35–45, 2007
2007
-
[12]
Can large language models reason about program invariants?
K. Peiet al., “Can large language models reason about program invariants?” inInt’l Conf. on ML. PMLR, 2023, pp. 27 496–27 520
2023
-
[13]
Impact of large language models on generating software specifications,
D. Xieet al., “Impact of large language models on generating software specifications,”arXiv preprint arXiv:2306.03324, 2023
arXiv 2023
-
[14]
Classinvgen: Class invariant synthesis using large language models,
C. Sunet al., “Classinvgen: Class invariant synthesis using large language models,” inInt’l Symp on AI Verif.Springer, 2025, pp. 64–96
2025
-
[15]
Clause2inv: A generate-combine-check framework for loop invariant inference,
W. Caoet al., “Clause2inv: A generate-combine-check framework for loop invariant inference,”Proceedings of the ACM on Software Engineering, vol. 2, no. ISSTA, pp. 1009–1030, 2025
2025
-
[16]
Lyra: Orchestrating dual correction in automated theorem proving,
C. Zhenget al., “Lyra: Orchestrating dual correction in automated theorem proving,”arXiv preprint arXiv:2309.15806, 2023
arXiv 2023
-
[17]
Leandojo: Theorem proving with retrieval-augmented language models,
K. Yanget al., “Leandojo: Theorem proving with retrieval-augmented language models,”Adv. in Neural Info. Processing Syst., vol. 36, 2024
2024
-
[18]
Lisa: Language models of isabelle proofs,
A. Q. Jianget al., “Lisa: Language models of isabelle proofs,” in6th Conf. on AI and Theorem Proving, 2021, pp. 378–392
2021
-
[19]
Llmstep: Llm proofstep suggestions in lean,
S. Wellecket al., “Llmstep: Llm proofstep suggestions in lean,”arXiv preprint arXiv:2310.18457, 2023
arXiv 2023
-
[20]
Baldur: Whole-proof generation and repair with large language models,
E. Firstet al., “Baldur: Whole-proof generation and repair with large language models,” inProc. of the 31st ACM Joint European Softw. Eng. Conf. and Symp. on the Found. of Softw. Eng., 2023, pp. 1229–1241
2023
-
[21]
Vecogen: Automating generation of formally verified c code with large language models,
M. Sevenhuijsenet al., “Vecogen: Automating generation of formally verified c code with large language models,” in2025 IEEE/ACM 13th Int’l Conf. on FM in SE (FormaliSE). IEEE, 2025, pp. 101–112
2025
-
[22]
Enchanting program specification synthesis by large language models using static analysis and program verification,
C. Wenet al., “Enchanting program specification synthesis by large language models using static analysis and program verification,” inInt’l Conf. on Computer Aided Verification. Springer, 2024, pp. 302–328
2024
-
[23]
Can chatgpt support software verification?
C. Janßenet al., “Can chatgpt support software verification?” inInt’l Conf. on Fundam. Approaches to SE. Springer, 2024, pp. 266–279
2024
-
[24]
Leveraging llms for program verification,
A. Kamathet al., “Leveraging llms for program verification,” in2024 FM in Computer-Aided Design (FMCAD). IEEE, 2024, pp. 107–118
2024
-
[25]
L. Maet al.,SpecGen: Automated Generation of Formal Program Specifications via Large Language Models. IEEE Press, 2025, p. 16–28. [Online]. Available: https://doi.org/10.1109/ICSE55347.2025.00129
-
[26]
Towards ai-assisted synthesis of verified dafny methods,
M. R. H. Misuet al., “Towards ai-assisted synthesis of verified dafny methods,”Proc. of the ACM on SE, vol. 1, no. FSE, pp. 812–835, 2024
2024
-
[27]
Laurel: Unblocking automated verification with large language models,
E. Mugnieret al., “Laurel: Unblocking automated verification with large language models,”Proc. ACM Program. Lang., vol. 9, no. OOPSLA1, Apr. 2025. [Online]. Available: https://doi.org/10.1145/3720499
-
[28]
Automatic generation of loop invariants in dafny with large language models,
J. Pascoal Fariaet al., “Automatic generation of loop invariants in dafny with large language models,” inInternational Conference on Fundamentals of Software Engineering. Springer, 2025, pp. 138–154
2025
-
[29]
Leveraging large language models to boost dafny’s developers productivity,
´A. F. Silvaet al., “Leveraging large language models to boost dafny’s developers productivity,” inProc. of the 2024 IEEE/ACM 12th Int’l Conf. on FM in SE (FormaliSE), 2024, pp. 138–142
2024
-
[30]
Evaluating llm-driven user-intent formalization for verification-aware languages,
S. K. Lahiri, “Evaluating llm-driven user-intent formalization for verification-aware languages,” inCONFERENCE ON FORMAL METH- ODS IN COMPUTER-AIDED DESIGN–FMCAD 2024, 2024, p. 142
2024
-
[31]
dafny-annotator: Ai-assisted verification of dafny programs,
G. Poesiaet al., “dafny-annotator: Ai-assisted verification of dafny programs,”arXiv preprint arXiv:2411.15143, 2024
arXiv 2024
-
[32]
Leveraging large language models for automated proof synthesis in rust,
J. Yaoet al., “Leveraging large language models for automated proof synthesis in rust,”arXiv preprint arXiv:2311.03739, 2023
arXiv 2023
-
[33]
Automated proof generation for rust code via self- evolution,
T. Chenet al., “Automated proof generation for rust code via self- evolution,” inThe 13th Int’l Conf. on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=2NqssmiXLu
2025
-
[34]
Autoverus: Automated proof generation for rust code,
C. Yanget al., “Autoverus: Automated proof generation for rust code,” Proc. ACM Program. Lang., vol. 9, no. OOPSLA2, Oct. 2025. [Online]. Available: https://doi.org/10.1145/3763174
-
[35]
Evaluating the ability of gpt-4o to generate verifiable specifications in verifast,
M. Regoet al., “Evaluating the ability of gpt-4o to generate verifiable specifications in verifast,” in2025 IEEE/ACM 2nd Int’l Conf on AI Foundation Models and Softw. Eng. (Forge), 2025, pp. 246–251
2025
-
[36]
Dafnypro: Llm-assisted automated verification for dafny programs,
D. Banerjeeet al., “Dafnypro: Llm-assisted automated verification for dafny programs,”arXiv preprint arXiv:2601.05385, 2026
arXiv 2026
-
[37]
Inferring multiple helper dafny assertions with llms,
´A. Silvaet al., “Inferring multiple helper dafny assertions with llms,” arXiv preprint arXiv:2511.00125, 2025
arXiv 2025
-
[38]
Sep. logic: A logic for shared mutable data structs
J. C. Reynolds, “Sep. logic: A logic for shared mutable data structs.” in Proc. 17th Ann. IEEE Symp. on Logic in CS. IEEE, 2002, pp. 55–74
2002
-
[39]
Sep. logic and abstraction,
M. Parkinsonet al., “Sep. logic and abstraction,” inProc. of the 32nd ACM SIGPLAN-SIGACT Symp. on Princ. of PLs., 2005, pp. 247–258
2005
-
[40]
Viper: A verification infrastructure for permission- based reasoning,
P. M ¨ulleret al., “Viper: A verification infrastructure for permission- based reasoning,” inVerification, Model Checking, and Abstract In- terpretation: 17th Int’l Conf., VMCAI 2016, St. Petersburg, FL, USA, January 17-19, 2016. Proceedings 17. Springer, 2016, pp. 41–62
2016
-
[41]
Gillian, part i: a multi-language platform for symbolic execution,
J. Fragoso Santoset al., “Gillian, part i: a multi-language platform for symbolic execution,” inProc. of the 41st ACM SIGPLAN Conf. on Prog. Lang. Design and Implementation, 2020, pp. 927–942
2020
-
[42]
Verifast: A powerful, sound, predictable, fast verifier for c and java,
B. Jacobset al., “Verifast: A powerful, sound, predictable, fast verifier for c and java,” inNASA FM Symp.Springer, 2011, pp. 41–55
2011
-
[43]
Implicit dynamic frames,
J. Smanset al., “Implicit dynamic frames,”ACM Transactions on Prog. Langs. and Syst. (TOPLAS), vol. 34, no. 1, pp. 1–58, 2012
2012
-
[44]
Attention is all you need,
A. Vaswaniet al., “Attention is all you need,”Advances in neural information processing systems, vol. 30, 2017
2017
-
[45]
Llama: Open and efficient foundation language models,
H. Touvronet al., “Llama: Open and efficient foundation language models,”arXiv preprint arXiv:2302.13971, 2023
Pith/arXiv arXiv 2023
-
[46]
A. Liuet al., “Deepseek-v3 technical report,”arXiv preprint arXiv:2412.19437, 2024
Pith/arXiv arXiv 2024
-
[47]
Codegen: An open large language model for code with multi-turn program synth
E. Nijkampet al., “Codegen: An open large language model for code with multi-turn program synth.”arXiv preprint arXiv:2203.13474, 2022
Pith/arXiv arXiv 2022
-
[48]
Starcoder 2 and the stack v2: The next generation,
A. Lozhkovet al., “Starcoder 2 and the stack v2: The next generation,” arXiv preprint arXiv:2402.19173, 2024
Pith/arXiv arXiv 2024
-
[49]
Language models are few-shot learners,
T. Brownet al., “Language models are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020
1901
-
[50]
Palm: Scaling language modeling with pathways,
A. Chowdheryet al., “Palm: Scaling language modeling with pathways,” J. of Machine Learning Research, vol. 24, no. 240, pp. 1–113, 2023
2023
-
[51]
A. Grattafioriet al., “The llama 3 herd of models,”arXiv preprint arXiv:2407.21783, 2024
Pith/arXiv arXiv 2024
-
[52]
An explanation of in-context learning as implicit bayesian inference,
S. M. Xieet al., “An explanation of in-context learning as implicit bayesian inference,”arXiv preprint arXiv:2111.02080, 2021
Pith/arXiv arXiv 2021
-
[53]
Chain-of-thought prompting elicits reasoning in llms,
J. Weiet al., “Chain-of-thought prompting elicits reasoning in llms,” Adv. in neural info. proc. syst., vol. 35, pp. 24 824–24 837, 2022
2022
-
[54]
Retrieval-augmented generation for knowledge-intensive nlp tasks,
P. Lewiset al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,”Adv. in neural info. proc. syst., vol. 33, pp. 9459–9474, 2020
2020
-
[55]
Solving quantitative reasoning probls. with lang. models,
A. Lewkowyczet al., “Solving quantitative reasoning probls. with lang. models,”Adv. in neural info. proc. syst., vol. 35, pp. 3843–3857, 2022
2022
-
[56]
Star: Bootstrapping reasoning with reasoning,
E. Zelikmanet al., “Star: Bootstrapping reasoning with reasoning,”Adv. in Neural Info. Proc. Syst., vol. 35, pp. 15 476–15 488, 2022
2022
-
[57]
Swe-agent: Agent-comptr. interfaces enable autom. softw. eng
J. Yanget al., “Swe-agent: Agent-comptr. interfaces enable autom. softw. eng.”Adv. in Neural Info. Proc. Syst., vol. 37, pp. 50 528–50 652, 2024
2024
-
[58]
Repocoder: Repository-level code completion through iterative retrieval and gen
F. Zhanget al., “Repocoder: Repository-level code completion through iterative retrieval and gen.”arXiv preprint arXiv:2303.12570, 2023
arXiv 2023
-
[59]
Repoagent: An llm-powered open-source framework for repository-level code documentation generation,
Q. Luoet al., “Repoagent: An llm-powered open-source framework for repository-level code documentation generation,”arXiv preprint arXiv:2402.16667, 2024
arXiv 2024
-
[60]
Llms for validating network protocol parsers,
M. Zhenget al., “Llms for validating network protocol parsers,” in2025 IEEE Secur. and Priv. Wkshps. (SPW). IEEE, 2025, pp. 56–64
2025
-
[61]
Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models,
Y . Denget al., “Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models,” inProc. of the 32nd ACM SIGSOFT int’l symp. on softw. test. and anal., 2023, pp. 423–435
2023
-
[62]
Codamosa: Escaping coverage plateaus in test generation with pre-trained large language models,
C. Lemieuxet al., “Codamosa: Escaping coverage plateaus in test generation with pre-trained large language models,” in2023 IEEE/ACM 45th Int’l Conf. on Softw. Eng. (ICSE). IEEE, 2023, pp. 919–931
2023
-
[63]
Autocoderover: Autonomous program improvement,
Y . Zhanget al., “Autocoderover: Autonomous program improvement,” inProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024, pp. 1592–1604
2024
-
[64]
Swe-bench: Can language models resolve real- world github issues?
C. E. Jimenezet al., “Swe-bench: Can language models resolve real- world github issues?”arXiv preprint arXiv:2310.06770, 2023
Pith/arXiv arXiv 2023
-
[65]
Cocomic: Code completion by jointly modeling in-file and cross-file context,
Y . Dinget al., “Cocomic: Code completion by jointly modeling in-file and cross-file context,” inProceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024, pp. 3433–3445
2024
-
[66]
K. Thompsonet al., “Rango: Adaptive retrieval-augmented proving for automated software verification,” inProceedings of the IEEE/ACM 47th International Conference on Software Engineering, ser. ICSE ’25. IEEE Press, 2025, p. 347–359. [Online]. Available: https: //doi.org/10.1109/ICSE55347.2025.00161
-
[67]
Adaptive proof refinement with llm-guided strategy selection,
M. Luet al., “Adaptive proof refinement with llm-guided strategy selection,”arXiv preprint arXiv:2510.25103, 2025
arXiv 2025
-
[68]
Magnushammer: A transformer-based approach to premise selection,
M. Mikułaet al., “Magnushammer: A transformer-based approach to premise selection,” inThe 12th Int’l Conf. on Learning Rep., 2024. [Online]. Available: https://openreview.net/forum?id=oYjPk8mqA V
2024
-
[69]
Robertsonet al.,The probabilistic relevance framework: BM25 and beyond
S. Robertsonet al.,The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc, 2009, vol. 4
2009
-
[70]
Dense passage retrieval for open-domain question answering
V . Karpukhinet al., “Dense passage retrieval for open-domain question answering.” inEMNLP (1), 2020, pp. 6769–6781
2020
-
[71]
Verifast 25.11,
B. Jacobset al., “Verifast 25.11,” https://github.com/verifast/verifast, 2025
2025
-
[72]
Annotation inference for separation logic based verifiers,
F. V ogelset al., “Annotation inference for separation logic based verifiers,” inInternational Conference on Formal Methods for Open Object-Based Distributed Systems. Springer, 2011, pp. 319–333
2011
-
[73]
Z3: An efficient smt solver,
L. De Mouraet al., “Z3: An efficient smt solver,” inInt’l Conf. on Tools & Algo. for the Const. and Anal. of Syst.Springer, 2008, pp. 337–340
2008
-
[74]
Evaluating large language models in class-level code generation,
X. Duet al., “Evaluating large language models in class-level code generation,” inProceedings of the IEEE/ACM 46th International Conference on Software Engineering, ser. ICSE ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Available: https://doi.org/10.1145/3597503.3639219
-
[75]
Understanding interobserver agreement: the kappa statistic,
A. J. Vieraet al., “Understanding interobserver agreement: the kappa statistic,”Fam med, vol. 37, no. 5, pp. 360–363, 2005
2005
-
[76]
Can large language models transform natural language intent into formal method postconditions?
M. Endreset al., “Can large language models transform natural language intent into formal method postconditions?”Proceedings of the ACM on Software Engineering, vol. 1, no. FSE, pp. 1889–1912, 2024
1912
-
[77]
Automated lemma synth. in symbolic-heap sep. logic,
Q.-T. Taet al., “Automated lemma synth. in symbolic-heap sep. logic,” Proc. of the ACM on Prog. Langs., vol. 2, no. POPL, pp. 1–29, 2017
2017
-
[78]
Compositional shape analysis by means of bi- abduction,
C. Calcagnoet al., “Compositional shape analysis by means of bi- abduction,” inProc. of the 36th annual ACM SIGPLAN-SIGACT symp. on Princ. of prog. langs., 2009, pp. 289–300
2009
-
[79]
Compositional shape analysis by means of bi-abduction,
——, “Compositional shape analysis by means of bi-abduction,”Journal of the ACM (JACM), vol. 58, no. 6, pp. 1–66, 2011
2011
-
[80]
Inferring loop invariants using postconditions
C. A. Furiaet al., “Inferring loop invariants using postconditions.”Fields of logic and computation, vol. 6300, pp. 277–300, 2010
2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.