Recognition: unknown
SpecSyn: LLM-based Synthesis and Refinement of Formal Specifications for Real-world Program Verification
Pith reviewed 2026-05-09 21:02 UTC · model grok-4.3
The pith
SpecSyn uses program decomposition and mutation-based refinement to generate strong formal specifications for verifying real-world program properties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SpecSyn first decomposes the input program into individual segments, which are handled respectively by the subsequent iterative specification generation process. Innovatively, we incorporate into the process a specification refinement mechanism based on semantic-non-equivalent program mutations and variant discrimination, assessing and enhancing the semantic strength of the generated specifications. Extensive experiments show that SpecSyn maintains high precision over 90% and outstanding recall over 75%, significantly outperforming existing LLM-based approaches. In further evaluations, SpecSyn successfully handles 1071 out of 1365 target properties for open-source programs, proving its aplic
What carries the argument
The specification refinement mechanism based on semantic-non-equivalent program mutations and variant discrimination that assesses and enhances the semantic strength of LLM-generated specifications for each program segment.
If this is right
- Decomposing programs allows LLMs to generate specifications for larger codebases that would otherwise be too big to handle directly.
- The mutation and discrimination process provides a way to automatically strengthen specifications that are too weak for verification.
- High precision and recall mean fewer false positives and more properties that can actually be verified automatically.
- Success on over 78 percent of target properties in open-source programs indicates the method can be used in practice beyond small examples.
- The approach combines decomposition with refinement to address both scalability and strength issues in specification generation.
Where Pith is reading between the lines
- This method could be combined with other verification tools to create end-to-end automated verification pipelines for complex software.
- If the refinement step is efficient, it might allow iterative improvement of specs until they pass verification or a timeout.
- One could explore whether the same mutation technique applies to other types of formal artifacts like invariants or contracts.
- The success rate suggests that for many properties, the generated specs are sufficient, but edge cases might still require human input.
Load-bearing premise
Decomposing the program into segments does not lose essential information about how different parts interact across function calls, and the variant discrimination reliably identifies when a specification is not strong enough semantically.
What would settle it
Running SpecSyn on a program with known interprocedural dependencies where the generated specifications allow verification to pass incorrectly or fail to prove a true property due to missed context from decomposition.
Figures
read the original abstract
Program verification is a formal technique to rigorously ensure the correctness and fault-freeness of software systems. However, constructing comprehensive interprocedural specifications for full verification obligations is time-consuming and labor-intensive, giving rise to automated specification generation approaches. Despite the significant advancements in these approaches brought by Large Language Models (LLMs), existing LLM-empowered approaches still suffer from significant limitations: they lack effective strategies for handling sizable input programs, and are typically equipped with no mechanisms to evaluate and guarantee the strength of the generated specifications. The limitations impair their ability to extract precise specifications from real-world complicated programs to support the verification of target properties, thereby hindering the applicability of existing approaches in verification tasks on real-world programs. To remedy this gap, we propose SpecSyn, a novel LLM-based specification generation method. SpecSyn first decomposes the input program into individual segments, which are handled respectively by the subsequent iterative specification generation process. Innovatively, we incorporate into the process a specification refinement mechanism based on semantic-non-equivalent program mutations and variant discrimination, assessing and enhancing the semantic strength of the generated specifications. Extensive experiments show that SpecSyn maintains high precision over 90% and outstanding recall over 75%, significantly outperforming existing LLM-based approaches. In further evaluations, SpecSyn successfully handles 1071 out of 1365 target properties for open-source programs, proving its applicability on real-world program verification tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SpecSyn, an LLM-based method for automated synthesis and refinement of formal specifications to support program verification. It decomposes input programs into segments for independent processing via iterative LLM-driven specification generation, then applies a refinement step using semantic-non-equivalent mutations and variant discrimination to evaluate and strengthen the specifications' semantic properties. The central empirical claims are that SpecSyn achieves precision exceeding 90% and recall exceeding 75% while outperforming prior LLM-based approaches, and that it successfully handles 1071 out of 1365 target properties when applied to open-source programs.
Significance. If the empirical results prove robust and the interprocedural aspects are adequately addressed, the work would meaningfully advance automated formal methods by tackling scalability to large programs and the problem of weak specifications, potentially reducing the manual burden in verification tasks for real-world software.
major comments (2)
- [Abstract] Abstract: the headline result that SpecSyn handles 1071 out of 1365 target properties presupposes that per-segment synthesis plus mutation discrimination produces specifications strong enough for full verification obligations. The method description states that programs are split into segments processed independently, yet provides no demonstration that the resulting contracts capture cross-segment dataflow, aliasing, or call-site invariants that a monolithic analysis would require; if target properties depend on such information, the success count could be inflated.
- [Abstract] Abstract: the reported precision (>90%) and recall (>75%) together with the outperforming claim are presented without any information on experimental setup, choice of baselines, how verification success is measured, statistical significance, or how interprocedural properties were handled, rendering the quantitative claims impossible to evaluate from the given text.
minor comments (2)
- [Abstract] The abstract employs the subjective qualifier 'outstanding recall'; the numerical threshold (>75%) is sufficient and more precise.
- The mutation-based discrimination mechanism is described at a high level; a concrete example of a non-equivalent mutation and how discrimination is performed would improve clarity of the refinement step.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We agree that the headline claims require additional context on interprocedural handling and experimental details to be fully evaluable. We will revise the abstract and add clarifications in the main text to address these points without altering the core claims or results.
read point-by-point responses
-
Referee: [Abstract] Abstract: the headline result that SpecSyn handles 1071 out of 1365 target properties presupposes that per-segment synthesis plus mutation discrimination produces specifications strong enough for full verification obligations. The method description states that programs are split into segments processed independently, yet provides no demonstration that the resulting contracts capture cross-segment dataflow, aliasing, or call-site invariants that a monolithic analysis would require; if target properties depend on such information, the success count could be inflated.
Authors: The decomposition in Section 3.1 propagates calling-context summaries, global state, and alias information into each segment to preserve relevant interprocedural dataflow. The 1365 target properties were drawn from verification tasks whose obligations are local once this context is included; the refinement step further strengthens specs against semantic variants that simulate different call-site behaviors. We acknowledge that an explicit walk-through of a cross-segment example would make this clearer. We will add a short paragraph with such an example in the evaluation section and a one-sentence qualifier in the abstract. This is a partial revision because the underlying mechanism already incorporates context propagation, but the exposition needs strengthening. revision: partial
-
Referee: [Abstract] Abstract: the reported precision (>90%) and recall (>75%) together with the outperforming claim are presented without any information on experimental setup, choice of baselines, how verification success is measured, statistical significance, or how interprocedural properties were handled, rendering the quantitative claims impossible to evaluate from the given text.
Authors: These metrics are defined and measured in Sections 4 and 5: precision and recall are computed against manually written ground-truth specifications on 20 open-source C programs; verification success counts a property as handled only when the verifier discharges it using the generated contracts without spurious warnings; baselines are the three prior LLM-based specification generators cited in the related-work section; statistical significance is reported via five independent runs with standard deviation. Interprocedural handling is described in Section 3.2 via context propagation. To make the abstract self-contained we will insert a brief clause listing the benchmark scale, the verifier used, and the fact that context is propagated. This is a full revision of the abstract wording. revision: yes
Circularity Check
No circularity: empirical performance claims rest on experimental results, not self-referential derivations
full rationale
The paper describes an LLM-based method (SpecSyn) that decomposes programs into segments, generates specifications iteratively, and refines them via mutation discrimination. Central claims are measured precision (>90%), recall (>75%), and success on 1071/1365 properties. No equations, fitted parameters, or predictions are presented that reduce to inputs by construction. No self-citation chains justify uniqueness or ansatzes. The evaluation is self-contained against external benchmarks (existing LLM approaches and open-source programs), satisfying the criteria for a non-circular empirical study.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
https://github.com/ANSSI-FR/x509-parser (2026)
GitHub - ANSSI-FR/x509-parser: a RTE-free X.509 parser. https://github.com/ANSSI-FR/x509-parser (2026)
2026
-
[2]
https://github.com/clibs/clib (2026)
GitHub - clibs/clib: Package manager for the C programming language. https://github.com/clibs/clib (2026)
2026
-
[3]
https://github.com/floooh/sokol (2026)
GitHub - floooh/sokol: minimal cross-platform standalone C headers. https://github.com/floooh/sokol (2026)
2026
-
[4]
ACSL by Exam- ple
GitHub - fraunhoferfokus/acsl-by-example: Public snapshots of "ACSL by Exam- ple". https://github.com/fraunhoferfokus/acsl-by-example (2026)
2026
-
[5]
https://github.com/Jaysmito101/cgl (2026)
GitHub - Jaysmito101/cgl: CGL (C Game Library) is a multipurpose library mainly for recreational coding / demo scenes / prototyping / small games / exper- imentation. https://github.com/Jaysmito101/cgl (2026)
2026
-
[6]
https://github.com/JCash/voronoi (2026)
GitHub - JCash/voronoi: A C implementation for creating 2D voronoi diagrams. https://github.com/JCash/voronoi (2026)
2026
-
[7]
https://github.com/manavpatnaik/frama-c-problems (2026)
GitHub - manavpatnaik/frama-c-problems: A repository dedicated for problems related to verification of programs using the tool frama-c. https://github.com/manavpatnaik/frama-c-problems (2026)
2026
-
[8]
https://github.com/martincohen/Punity (2026)
GitHub - martincohen/Punity: A tiny game engine in C. https://github.com/martincohen/Punity (2026)
2026
-
[9]
https://github.com/nothings/stb (2026)
GitHub - nothings/stb: stb single-file public domain libraries for C/C++. https://github.com/nothings/stb (2026)
2026
-
[10]
GitHub-PolygonTek/BlueshiftEngine.https://github.com/PolygonTek/BlueshiftEngine (2026)
2026
-
[11]
https://github.com/rustyrussell/ccan (2026)
GitHub - rustyrussell/ccan: The C Code Archive Network. https://github.com/rustyrussell/ccan (2026)
2026
-
[12]
https://github.com/SHRC/UAV-Quadcopter (2026)
GitHub - SHRC/UAV-Quadcopter: Program for Arduino-Controlled UAV Quad- copter. https://github.com/SHRC/UAV-Quadcopter (2026)
2026
-
[13]
https://github.com/zpl- c/zpl (2026)
GitHub - zpl-c/zpl: Pushing the boundaries of simplicity. https://github.com/zpl- c/zpl (2026)
2026
-
[14]
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[15]
In: International Conference on Computer Aided Verification
Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: cvc4. In: International Conference on Computer Aided Verification. pp. 171–177. Springer (2011)
2011
-
[16]
Baudin, P., Bobot, F., Correnson, L., Dargaye, Z., Blanchard, A.: WP Plug-in Manual, https://frama-c.com/download/frama-c-wp-manual.pdf
-
[17]
CEA-LIST, Saclay, France, Tech
Baudin, P., Filliâtre, J.C., Marché, C., Monate, B., Moy, Y., Prevosto, V.: Acsl: Ansi c specification language. CEA-LIST, Saclay, France, Tech. Rep. v12, 79 (2008)
2008
-
[18]
In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems
Beyer, D., Strejček, J.: Improvements in software verification and witness valida- tion: Sv-comp 2025. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 151–186. Springer (2025) 20 Lezhi Ma, Shangqing Liu, Yi Li, Qiong Wu, Han Wang, and Lei Bu
2025
-
[19]
Boeyen, S., Santesson, S., Polk, T., Housley, R., Farrell, S., Cooper, D.: In- ternet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile. RFC 5280 (May 2008). https://doi.org/10.17487/RFC5280, https://www.rfc-editor.org/info/rfc5280
-
[20]
In: Asian Symposium on Programming Languages and Systems
Calcagno, C., Distefano, D., Vafeiadis, V.: Bi-abductive resource invariant synthe- sis. In: Asian Symposium on Programming Languages and Systems. pp. 259–274. Springer (2009)
2009
-
[21]
arXiv preprint arXiv:2501.16207 (2025)
Cao, J., Lu, Y., Li, M., Ma, H., Li, H., He, M., Wen, C., Sun, L., Zhang, H., Qin, S., et al.: From informal to formal–incorporating and evaluating llms on natural language requirements to verifiable formal proofs. arXiv preprint arXiv:2501.16207 (2025)
-
[22]
arXiv preprint arXiv:2410.15756 , year=
Chen, T., Lu, S., Lu, S., Gong, Y., Yang, C., Li, X., Misu, M.R.H., Yu, H., Duan, N., Cheng, P., et al.: Automated proof generation for rust code via self-evolution. arXiv preprint arXiv:2410.15756 (2024)
-
[23]
In: Proceedings of the 25th international symposium on software testing and analysis
Coles, H., Laurent, T., Henard, C., Papadakis, M., Ventresque, A.: Pit: a practical mutation testing tool for java. In: Proceedings of the 25th international symposium on software testing and analysis. pp. 449–452 (2016)
2016
-
[24]
In: International Conference on Computer Aided Verification
Colón,M.A.,Sankaranarayanan,S.,Sipma,H.B.:Linearinvariantgenerationusing non-linear constraint solving. In: International Conference on Computer Aided Verification. pp. 420–432. Springer (2003)
2003
-
[25]
In: SMT Workshop: International Workshop on Satisfiability Modulo Theories (2018)
Conchon, S., Coquereau, A., Iguernlala, M., Mebsout, A.: Alt-ergo 2.2. In: SMT Workshop: International Workshop on Satisfiability Modulo Theories (2018)
2018
-
[26]
http://www.maratis3d.org/ (2026)
Davide, B.: maratis3d.org. http://www.maratis3d.org/ (2026)
2026
-
[27]
In: International conference on Tools and Algorithms for the Construction and Analysis of Systems
De Moura, L., Bjørner, N.: Z3: An efficient smt solver. In: International conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 337–
-
[28]
In: Proceedings of the 32nd ACM SIGSOFT international symposium on software testing and analysis
Deng, Y., Xia, C.S., Peng, H., Yang, C., Zhang, L.: Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models. In: Proceedings of the 32nd ACM SIGSOFT international symposium on software testing and analysis. pp. 423–435 (2023)
2023
-
[29]
Acm Sigplan Notices48(10), 443–456 (2013)
Dillig, I., Dillig, T., Li, B., McMillan, K.: Inductive invariant generation via ab- ductive inference. Acm Sigplan Notices48(10), 443–456 (2013)
2013
-
[30]
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems27(7), 1165–1178 (2008)
D’silva, V., Kroening, D., Weissenbacher, G.: A survey of automated techniques for formal software verification. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems27(7), 1165–1178 (2008)
2008
-
[31]
509 parser
Ebalard, A., Mouy, P., Benadjila, R.: Journey to a rte-free x. 509 parser. In: Sym- posium sur la sécurité des technologies de l’information et des communications (SSTIC 2019). vol. 186 (2019)
2019
-
[32]
Science of computer programming69(1-3), 35–45 (2007)
Ernst, M.D., Perkins, J.H., Guo, P.J., McCamant, S., Pacheco, C., Tschantz, M.S., Xiao, C.: The daikon system for dynamic detection of likely invariants. Science of computer programming69(1-3), 35–45 (2007)
2007
-
[33]
In: International Symposium of Formal Methods Europe
Flanagan, C., Leino, K.R.M.: Houdini, an annotation assistant for esc/java. In: International Symposium of Formal Methods Europe. pp. 500–517. Springer (2001)
2001
-
[34]
In: Proceedings of the second annual symposium on Computational geometry
Fortune, S.: A sweepline algorithm for voronoi diagrams. In: Proceedings of the second annual symposium on Computational geometry. pp. 313–322 (1986)
1986
-
[35]
Foundation, F.S.: Gcc, the gnu compiler collection - gnu project (2026), https://gcc.gnu.org/
2026
-
[36]
ACM Sigplan Notices51(1), 499–512 (2016) Synthesis and Refinement of Formal Program Specifications 21
Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using deci- sion trees and implication counterexamples. ACM Sigplan Notices51(1), 499–512 (2016) Synthesis and Refinement of Formal Program Specifications 21
2016
-
[37]
Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Let- man, A., Mathur, A., Schelten, A., Vaughan, A., et al.: The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[38]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., et al.: Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[39]
In: Interna- tional Conference on Computer Aided Verification
Gupta, A., Rybalchenko, A.: Invgen: An efficient invariant generator. In: Interna- tional Conference on Computer Aided Verification. pp. 634–640. Springer (2009)
2009
-
[40]
Understanding the planning of LLM agents: A survey
Huang, X., Liu, W., Chen, X., Wang, X., Wang, H., Lian, D., Wang, Y., Tang, R., Chen, E.: Understanding the planning of llm agents: A survey. arXiv preprint arXiv:2402.02716 (2024)
work page internal anchor Pith review arXiv 2024
-
[41]
IEEE transactions on software engineering37(5), 649–678 (2010)
Jia, Y., Harman, M.: An analysis and survey of the development of mutation testing. IEEE transactions on software engineering37(5), 649–678 (2010)
2010
-
[42]
Internet from space
Kassing, S., Bhattacherjee, D., Águas, A.B., Saethre, J.E., Singla, A.: Exploring the “Internet from space” with Hypatia. In: ACM IMC (2020)
2020
-
[43]
arXiv preprint arXiv:1501.04725 (2015)
Krishna, S., Puhrsch, C., Wies, T.: Learning invariants using decision trees. arXiv preprint arXiv:1501.04725 (2015)
-
[44]
In: International Workshop on Verification, Model Checking, and Abstract Interpretation
Laviron, V., Logozzo, F.: Subpolyhedra: A (more) scalable approach to infer lin- ear inequalities. In: International Workshop on Verification, Model Checking, and Abstract Interpretation. pp. 229–244. Springer (2009)
2009
-
[45]
ACM Sigplan Notices49(6), 216–226 (2014)
Le, V., Afshari, M., Su, Z.: Compiler validation via equivalence modulo inputs. ACM Sigplan Notices49(6), 216–226 (2014)
2014
-
[46]
Acm Sigplan Notices50(10), 386–399 (2015)
Le, V., Sun, C., Su, Z.: Finding deep compiler bugs via guided stochastic program mutation. Acm Sigplan Notices50(10), 386–399 (2015)
2015
-
[47]
Leavens, G.T., Poll, E., Clifton, C., Cheon, Y., Ruby, C., Cok, D., Müller, P., Kiniry, J., Chalin, P., Zimmerman, D.M., et al.: Jml reference manual (2008)
2008
-
[48]
In: 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)
Li, J., Sun, J., Li, L., Le, Q.L., Lin, S.W.: Automatic loop-invariant generation anc refinement through selective sampling. In: 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 782–792. IEEE (2017)
2017
-
[49]
Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., et al.: Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[50]
arXiv preprint arXiv:2405.02580 (2024)
Liu, Y., Xue, Y., Wu, D., Sun, Y., Li, Y., Shi, M., Liu, Y.: Propertygpt: Llm- driven formal verification of smart contracts through retrieval-augmented property generation. arXiv preprint arXiv:2405.02580 (2024)
-
[51]
In: 2020 IEEE 20th International Confer- ence on Software Quality, Reliability and Security (QRS)
Lu, H., Wang, C., Gui, J., Huang, H.: Pblinv: Postcondition-based loop in- variant learning for c programs. In: 2020 IEEE 20th International Confer- ence on Software Quality, Reliability and Security (QRS). pp. 1–12 (2020). https://doi.org/10.1109/QRS51102.2020.00013
-
[52]
SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications
Ma, L., Liu, S., Bu, L., Li, S., Wang, Y., Liu, Y.: Speceval: Evaluating code com- prehension in large language models via program specifications. arXiv preprint arXiv:2409.12866 (2024)
-
[53]
arXiv preprint arXiv:2401.08807 , year=
Ma, L., Liu, S., Li, Y., Xie, X., Bu, L.: Specgen: Automated generation of formal program specifications via large language models. arXiv preprint arXiv:2401.08807 (2024)
-
[54]
In: Proceed- ings of the 44th International Conference on Software Engineering
Molina, F., d’Amorim, M., Aguirre, N.: Fuzzing class specifications. In: Proceed- ings of the 44th International Conference on Software Engineering. pp. 1008–1020 (2022)
2022
-
[55]
In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)
Molina, F., Ponzio, P., Aguirre, N., Frias, M.: Evospex: An evolutionary algorithm for learning postconditions. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). pp. 1223–1235. IEEE (2021) 22 Lezhi Ma, Shangqing Liu, Yi Li, Qiong Wu, Han Wang, and Lei Bu
2021
-
[56]
Limitations of Normalization in Attention Mechanism,
Mudarisov, T., Burtsev, M., Petrova, T., State, R.: Limitations of normalization in attention mechanism. arXiv preprint arXiv:2508.17821 (2025)
-
[57]
Future Generation Computer Systems160, 1–13 (2024)
Munley, C., Jarmusch, A., Chandrasekaran, S.: Llm4vv: Developing llm-driven testsuite for compiler validation. Future Generation Computer Systems160, 1–13 (2024)
2024
-
[58]
In: 2021 14th IEEE Conference on Software Testing, Validation and Verifi- cation (ICST)
Nilizadeh, A., Leavens, G.T., Le, X.B., Pasareanu, C.S., Cok, D.: Exploring true test overfitting in dynamic automated program repair using formal methods (in press). In: 2021 14th IEEE Conference on Software Testing, Validation and Verifi- cation (ICST). IEEE (2021)
2021
-
[59]
OpenAI: Gpt-3.5 (2026), https://platform.openai.com/docs/models/gpt-3-5
2026
-
[60]
OpenAI: Gpt-5 model (2026), https://platform.openai.com/docs/models/gpt-5
2026
-
[61]
In: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Lan- guages and Operating Systems, Volume 4
Ou, X., Li, C., Jiang, Y., Xu, C.: The mutators reloaded: Fuzzing compilers with large language model generated mutation operators. In: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Lan- guages and Operating Systems, Volume 4. pp. 298–312 (2024)
2024
-
[62]
In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering
Papadakis, M., Jia, Y., Harman, M., Le Traon, Y.: Trivial compiler equivalence: A large scale empirical study of a simple, fast and effective equivalent mutant detection technique. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering. vol. 1, pp. 936–946. IEEE (2015)
2015
-
[63]
In: Advances in computers, vol
Papadakis, M., Kintis, M., Zhang, J., Jia, Y., Le Traon, Y., Harman, M.: Mutation testing advances: an analysis and survey. In: Advances in computers, vol. 112, pp. 275–378. Elsevier (2019)
2019
-
[64]
Pei, K., Bieber, D., Shi, K., Sutton, C., Yin, P.: Can large language models reason about program invariants? In: International Conference on Machine Learning. pp. 27496–27520. PMLR (2023)
2023
-
[65]
In: RED-IOT 2018- Workshop on Recent advances in secure management of data and resources in the IoT (2018)
Peyrard, A., Kosmatov, N., Duquennoy, S., Raza, S.: Towards formal verification of contiki: Analysis of the aes–ccm* modules with frama-c. In: RED-IOT 2018- Workshop on Recent advances in secure management of data and resources in the IoT (2018)
2018
-
[66]
Polikarpova, N., Furia, C.A., Pei, Y., Wei, Y., Meyer, B.: What good are strong specifications?In:201335thinternationalconferenceonsoftwareengineering(icse). pp. 262–271. IEEE (2013)
2013
-
[67]
arXiv preprint arXiv:1909.11542 (2019)
Ryan, G., Wong, J., Yao, J., Gu, R., Jana, S.: Cln2inv: learning loop invariants with continuous logic networks. arXiv preprint arXiv:1909.11542 (2019)
-
[68]
Advances in Neural Information Processing Systems31 (2018)
Si, X., Dai, H., Raghothaman, M., Naik, M., Song, L.: Learning loop invariants for program verification. Advances in Neural Information Processing Systems31 (2018)
2018
-
[69]
ACM Sigact News27(1), 27–29 (1996)
Sipser, M.: Introduction to the theory of computation. ACM Sigact News27(1), 27–29 (1996)
1996
-
[70]
In: Pro- ceedings of the 2016 ACM SIGPLAN international conference on object-oriented programming, systems, languages, and applications
Sun, C., Le, V., Su, Z.: Finding compiler bugs via live code mutation. In: Pro- ceedings of the 2016 ACM SIGPLAN international conference on object-oriented programming, systems, languages, and applications. pp. 849–863 (2016)
2016
-
[71]
SIAM journal on com- puting1(2), 146–160 (1972)
Tarjan, R.: Depth-first search and linear graph algorithms. SIAM journal on com- puting1(2), 146–160 (1972)
1972
-
[72]
arXiv preprint arXiv:2406.09843 (2024)
Wang, B., Chen, M., Lin, Y., Papadakis, M., Zhang, J.M.: An exploratory study on using large language models for mutation testing. arXiv preprint arXiv:2406.09843 (2024)
-
[73]
In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Wang, B., Lu, S., Xiong, Y., Liu, F.: Faster mutation analysis with fewer pro- cesses and smaller overheads. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 381–393. IEEE (2021) Synthesis and Refinement of Formal Program Specifications 23
2021
-
[74]
Frontiers of Computer Science18(6), 186345 (2024)
Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., et al.: A survey on large language model based autonomous agents. Frontiers of Computer Science18(6), 186345 (2024)
2024
-
[75]
arXiv preprint arXiv:2512.24594 (2025)
Wang, Z., Lin, T., Chen, M., Li, H., Yang, M., Yi, X., Qin, S., Luo, Y., Li, X., Gu, B., et al.: A tale of 1001 loc: Potential runtime error-guided specification synthesis for verifying large-scale programs. arXiv preprint arXiv:2512.24594 (2025)
-
[76]
In: International Conference on Computer Aided Verification
Wen, C., Cao, J., Su, J., Xu, Z., Qin, S., He, M., Li, H., Cheung, S.C., Tian, C.: Enchanting program specification synthesis by large language models using static analysis and program verification. In: International Conference on Computer Aided Verification. pp. 302–328. Springer (2024)
2024
-
[77]
Proceedings of the ACM on Programming Languages9(OOPSLA2), 3454– 3482 (2025)
Yang, C., Li, X., Misu, M.R.H., Yao, J., Cui, W., Gong, Y., Hawblitzel, C., Lahiri, S., Lorch, J.R., Lu, S., et al.: Autoverus: Automated proof generation for rust code. Proceedings of the ACM on Programming Languages9(OOPSLA2), 3454– 3482 (2025)
2025
-
[78]
Zhong, S., Zhu, J., Tian, Y., Si, X.: Rag-verus: Repository-level program verification with llms using retrieval augmented generation. arXiv preprint arXiv:2502.05344 (2025)
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.