Towards an Agentic LLM-based Approach to Requirement Formalization from Unstructured Specifications

Alberto Tagliaferro; Bruno Guindani; Livia Lestingi; Matteo Rossi

arxiv: 2604.18228 · v1 · submitted 2026-04-20 · 💻 cs.SE

Towards an Agentic LLM-based Approach to Requirement Formalization from Unstructured Specifications

Alberto Tagliaferro , Bruno Guindani , Livia Lestingi , Matteo Rossi This is my paper

Pith reviewed 2026-05-10 04:32 UTC · model grok-4.3

classification 💻 cs.SE

keywords agentic LLMrequirement formalizationformal verificationnatural language specificationssafety-critical systemssemantic alignmentcompatibility filtering

0 comments

The pith

A modular agentic LLM pipeline extracts formal verification properties from unstructured natural language specifications with 77.8% accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an agentic methodology to automatically derive formal properties suitable for verification from early-stage natural language specifications of safety-critical systems. The method relies on a pipeline that extracts requirements, filters them for compatibility with a chosen formal language, and uses an LLM to translate them into precise formal statements. This addresses the difficulty of ensuring that generated formal artifacts match the meaning of the original informal requirements. Experiments in three scenarios indicate the pipeline succeeds in producing both syntactically correct and semantically aligned properties in 77.8 percent of cases. If the method holds, it would allow more reliable use of formal verification tools early in the design of systems where safety depends on correct requirement interpretation.

Core claim

The authors claim that their agentic LLM-based pipeline, which integrates requirement extraction, compatibility filtering with respect to a target formalism, and translation into formal properties, generates syntactically and semantically aligned formal properties from unstructured specifications, as demonstrated by 77.8% accuracy in three experimental scenarios. By explicitly incorporating modeling and verification constraints, the approach helps connect informal system descriptions to semantically meaningful formal verification.

What carries the argument

The modular pipeline performing requirement extraction, compatibility filtering, and LLM-based translation into formal properties.

If this is right

Enables derivation of verification-ready properties directly from natural language at the start of safety-critical system development.
Reduces the manual effort and potential errors in translating informal requirements to formal artifacts.
Produces properties that account for both modeling and verification constraints of the target formalism.
Supports bridging the gap between unstructured specifications and formal verification through AI assistance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The pipeline could be combined with automated model checkers to provide immediate feedback on potential safety issues in initial designs.
It may generalize to additional formalisms or application domains not tested in the three scenarios.
Future work could add targeted human review at the filtering step to further improve reliability in high-stakes contexts.

Load-bearing premise

The steps of compatibility filtering and LLM translation together maintain the semantic alignment between the informal requirements and the generated formal properties without requiring further human validation.

What would settle it

A case where the generated formal property allows execution traces or states that contradict the safety or functional intent stated in the original natural language specification.

Figures

Figures reproduced from arXiv: 2604.18228 by Alberto Tagliaferro, Bruno Guindani, Livia Lestingi, Matteo Rossi.

read the original abstract

Early-stage specifications of safety-critical systems are typically expressed in natural language, making it difficult to derive formal properties suitable for verification and needed to guarantee safety. While recent Large Language Model (LLM)-based approaches can generate formal artifacts from text, they mainly focus on syntactic correctness and do not ensure semantic alignment between informal requirements and formally verifiable properties. We propose an agentic methodology that automatically extracts verification-ready properties from unstructured specifications. The modular pipeline combines requirement extraction, compatibility filtering with respect to a target formalism, and translation into formal properties. Experimental results across three scenarios show that the pipeline generates syntactically and semantically aligned formal properties with a 77.8% accuracy. By explicitly accounting for modeling and verification constraints, the approach is a paving step towards exploiting Artificial Intelligence (AI) to bridge the gap between informal descriptions and semantically meaningful formal verification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The agentic pipeline adds a useful modular structure for turning specs into formal properties, but the 77.8% accuracy claim rests on evaluation details that are missing from the abstract.

read the letter

The paper's main takeaway is an agentic LLM pipeline that extracts, filters, and translates unstructured specs into formal properties, with a claimed 77.8% accuracy on alignment. But the lack of evaluation details makes that number hard to assess right now. The new part is the explicit modular structure: requirement extraction followed by compatibility filtering against the target formalism and then translation. This goes beyond earlier LLM efforts that stopped at syntactic output. The filtering step is a practical touch to make sure the results can actually be used for verification. It does well in framing the problem for safety-critical systems where natural language specs are common. The approach tries to account for modeling constraints upfront, which is a step in the right direction for real-world use. The soft spots are in the results section. The abstract gives the accuracy figure but no breakdown on how semantic alignment was measured, what the three scenarios were, or any comparison to baselines. If semantic alignment relies on the same LLM or unblinded review without a clear ground truth, the metric could be overstated. This matches the concern that the claim rests on an unclear reference. The paper seems aimed at researchers in software engineering and formal methods who want to apply LLMs to requirements. Someone looking for a pipeline idea rather than a fully validated method might find it worth reading. It deserves a serious referee because the topic is relevant and the structure is thoughtful, even if the evidence needs bolstering. I would recommend sending it for peer review, with the expectation that the authors provide more on the evaluation protocol.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes an agentic LLM-based pipeline to extract verification-ready formal properties from unstructured natural-language specifications of safety-critical systems. The modular approach consists of requirement extraction, compatibility filtering against a target formalism, and LLM translation to formal properties. Experimental results on three scenarios are reported to produce syntactically and semantically aligned formal properties at 77.8% accuracy, with the goal of bridging informal descriptions to semantically meaningful formal verification.

Significance. If the accuracy claim can be substantiated with a reproducible evaluation protocol, the work would offer a practical step toward automating requirement formalization in safety-critical domains, where manual translation from natural language to verifiable properties remains a major bottleneck.

major comments (2)

[Abstract] Abstract: The headline result of 77.8% accuracy for both syntactic and semantic alignment is stated without any description of the evaluation metric for semantic alignment, the ground-truth reference (expert annotation, formal equivalence check, or otherwise), baseline comparisons, scenario selection criteria, or statistical significance. This directly undermines assessment of the central empirical claim.
[Pipeline description] Pipeline description (requirement extraction + compatibility filtering + LLM translation): No quantitative breakdown is provided on the contribution of the compatibility filtering step to semantic preservation, nor is there an independent validation method (e.g., blinded expert review or automated equivalence checking) that would distinguish true semantic fidelity from plausible but drifted output.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment below and will revise the manuscript to improve the clarity and substantiation of our empirical claims where feasible.

read point-by-point responses

Referee: [Abstract] Abstract: The headline result of 77.8% accuracy for both syntactic and semantic alignment is stated without any description of the evaluation metric for semantic alignment, the ground-truth reference (expert annotation, formal equivalence check, or otherwise), baseline comparisons, scenario selection criteria, or statistical significance. This directly undermines assessment of the central empirical claim.

Authors: We agree that the abstract is too concise on evaluation details. The full manuscript (Section 4) specifies that syntactic correctness is checked via automated parsing against the target formalism's grammar, while semantic alignment is evaluated by direct expert comparison of intent between the original natural-language requirement and the generated formal property. Ground truth is provided by manually formalized properties created by the authors with domain knowledge. The three scenarios were selected as representative cases from safety-critical domains with publicly available unstructured specs. We performed no baseline comparisons because the work focuses on the novel agentic pipeline rather than a comparative benchmark, and statistical significance is not reported given the small number of scenarios. We will revise the abstract to briefly describe the evaluation approach, ground truth, and scenario criteria while noting these scope limitations. revision: yes
Referee: [Pipeline description] Pipeline description (requirement extraction + compatibility filtering + LLM translation): No quantitative breakdown is provided on the contribution of the compatibility filtering step to semantic preservation, nor is there an independent validation method (e.g., blinded expert review or automated equivalence checking) that would distinguish true semantic fidelity from plausible but drifted output.

Authors: The compatibility filtering step explicitly checks extracted requirements against the syntactic and semantic constraints of the target formalism before translation, which reduces the risk of semantic drift by discarding incompatible cases early. The manuscript describes this qualitatively in the pipeline section. We acknowledge that a quantitative breakdown would strengthen the presentation and will add an ablation study in the revision showing end-to-end accuracy with and without the filtering stage. Semantic validation in the current work relies on post-hoc expert review rather than blinded protocols or automated equivalence checkers; the latter remains an open research problem for natural-language-to-formal translations. We will expand the text to clarify the validation procedure and explicitly discuss it as a limitation. revision: partial

Circularity Check

0 steps flagged

No circularity: accuracy is empirical outcome, not derived by construction

full rationale

The paper describes an agentic LLM pipeline (requirement extraction + compatibility filtering + translation) and reports 77.8% syntactic/semantic alignment accuracy as the result of running the pipeline on three scenarios. No equations, parameters, or derivations are present. The accuracy figure is presented as an independent experimental measurement rather than a quantity obtained by fitting, self-definition, or reduction to the method's own inputs. No load-bearing self-citations or uniqueness claims appear in the abstract or description that would create circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical methodology proposal with no mathematical derivations, so the ledger contains no free parameters, axioms, or invented entities beyond standard LLM usage.

pith-pipeline@v0.9.0 · 5449 in / 1043 out tokens · 38775 ms · 2026-05-10T04:32:00.314107+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

[1]

A survey of statistical model checking

Gul Agha and Karl Palmskog. “A survey of statistical model checking”. In:ACM Trans- actions on Modeling and Computer Simulation (TOMACS)28.1 (2018), pp. 1–39

work page 2018
[2]

The benefits of relaxing punctuality

Rajeev Alur et al. “The benefits of relaxing punctuality”. In:Journal of the ACM (JACM) 43.1 (1996), pp. 116–146

work page 1996
[3]

Results of Field Trials with a Mobile Service Robot for Older Adults in 16 Private Households

Markus Bajones et al. “Results of Field Trials with a Mobile Service Robot for Older Adults in 16 Private Households”. In:ACM Transactions on Human-Robot Interaction (THRI)9.2 (2019), pp. 1–27

work page 2019
[4]

Leveraging LLMs for Formal Software Requirements–Challenges and Prospects

Arshad Beg et al. “Leveraging LLMs for Formal Software Requirements–Challenges and Prospects”. In:International Workshop on Artificial Intelligence and fOrmal VERifica- tion, Logic, Automata, and sYnthesis. ceur-ws.org. 2025, pp. 95–105

work page 2025
[5]

Leveraging natural language processing for a consistency check- ing toolchain of automotive requirements

Vincent Bertram et al. “Leveraging natural language processing for a consistency check- ing toolchain of automotive requirements”. In:International Requirements Engineering Conference. IEEE. 2023, pp. 212–222

work page 2023
[6]

Language Models are Few-Shot Learners

Tom B. Brown et al. “Language Models are Few-Shot Learners”. In:Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. 2020. [7]Case studies - Service robots. 2018.url:https://ifr.org/case- studies/service- robots-case-studies

work page 2020
[7]

LLM-basedSatisfiabilityCheckingofStringRequirementsbyConsistent Data and Checker Generation

BoqiChenetal.“LLM-basedSatisfiabilityCheckingofStringRequirementsbyConsistent Data and Checker Generation”. In:International Requirements Engineering Conference. 2025, pp. 231–243. 11 Tagliaferro, Guindani, Lestingi, and Rossi

work page 2025
[8]

nl2spec: Interactively translating unstructured natural language to temporal logics with large language models

Matthias Cosler et al. “nl2spec: Interactively translating unstructured natural language to temporal logics with large language models”. In:International Conference on Computer Aided Verification. Springer. 2023, pp. 383–396

work page 2023
[9]

Uppaal SMC tutorial

Alexandre David et al. “Uppaal SMC tutorial”. In:International journal on software tools for technology transfer17.4 (2015), pp. 397–415

work page 2015
[10]

Automatic Generation of Formal Specification and Verification Annotations Using LLMs and Test Oracles

Joao Pascoal Faria et al. “Automatic Generation of Formal Specification and Verification Annotations Using LLMs and Test Oracles”. In:arXiv preprint arXiv:2601.12845(2026)

work page arXiv 2026
[11]

Lessons from the use of natural language inference (nli) in requirements engineering tasks

Mohamad Fazelnia et al. “Lessons from the use of natural language inference (nli) in requirements engineering tasks”. In:International Requirements Engineering Conference. IEEE. 2024, pp. 103–115

work page 2024
[12]

Automated requirement contradiction detection through formal logic and LLMs

Alexander Elenga Gärtner and Dietmar Göhlich. “Automated requirement contradiction detection through formal logic and LLMs”. In:Automated Software Engineering31.2 (2024), p. 49

work page 2024
[13]

Translating Requirements in Property Specification Patterns using LLMs

Dario Guidotti et al. “Translating Requirements in Property Specification Patterns using LLMs”. In:31st RCRA Workshop. ceur-ws.org. 2024, pp. 68–80

work page 2024
[14]

State of Practice: LLMs in Software Engineering and Software Architecture

Jasmin Jahić and Ashkan Sami. “State of Practice: LLMs in Software Engineering and Software Architecture”. In:International Conference on Software Architecture Compan- ion. 2024, pp. 311–318

work page 2024
[15]

Extracting Formal Specifications From Documents Using LLMS for Test Au- tomation

Hui Li et al. “Extracting Formal Specifications From Documents Using LLMS for Test Au- tomation”. In:International Conference on Program Comprehension. IEEE. 2025, pp. 1– 12

work page 2025
[16]

Data-driven requirements elicitation: A systematic literature review

Sachiko Lim et al. “Data-driven requirements elicitation: A systematic literature review”. In:SN Computer Science2.1 (2021), p. 16

work page 2021
[17]

Improving requirements completeness: Automated assistance through large language models

Dipeeka Luitel et al. “Improving requirements completeness: Automated assistance through large language models”. In:Requirements Engineering29.1 (2024), pp. 73–95

work page 2024
[18]

URL https://arxiv.org/abs/2512

Zhi Ma et al. “Bridging Natural Language and Formal Specification–Automated Trans- lation of Software Requirements to LTL via Hierarchical Semantics Decomposition Using LLMs”. In:arXiv preprint arXiv:2512.17334(2025)

work page arXiv 2025
[19]

Can GPT-4 aid in detecting ambiguities, inconsistencies, and incompleteness in requirements analysis? A comprehensive case study

Taslim Mahbub et al. “Can GPT-4 aid in detecting ambiguities, inconsistencies, and incompleteness in requirements analysis? A comprehensive case study.” In:IEEE Access (2024)

work page 2024
[20]

Specification patterns for robotic missions

Claudio Menghi et al. “Specification patterns for robotic missions”. In:IEEE Transactions on Software Engineering(2019)

work page 2019
[21]

The state of the art in automated requirements elicitation

Hendrik Meth et al. “The state of the art in automated requirements elicitation”. In: Information and Software Technology55.10 (2013), pp. 1695–1709

work page 2013
[22]

Towards verifiable multi-agent interaction pattern specifica- tion

Alberto Tagliaferro et al. “Towards verifiable multi-agent interaction pattern specifica- tion”. In:Proceedings of the 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE). 2024, pp. 122–126

work page 2024
[23]

Verification-Oriented Specification of Multi-agent Interaction Patterns

Alberto Tagliaferro et al. “Verification-Oriented Specification of Multi-agent Interaction Patterns”.In:Workshop on Agents and Robots for reliable Engineered Autonomy.Springer. 2024, pp. 38–53

work page 2024
[24]

Attention is All you Need

Ashish Vaswani et al. “Attention is All you Need”. In:Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 2017, pp. 5998–6008. 12 Tagliaferro, Guindani, Lestingi, and Rossi

work page 2017
[25]

Enchanting program specification synthesis by large language models using static analysis and program verification

Cheng Wen et al. “Enchanting program specification synthesis by large language models using static analysis and program verification”. In:International Conference on Computer Aided Verification. Springer. 2024, pp. 302–328

work page 2024
[26]

Autoformalization with large language models

Yuhuai Wu et al. “Autoformalization with large language models”. In:Advances in neural information processing systems35 (2022), pp. 32353–32368

work page 2022
[27]

Formal consistency checking over specifications in natural languages

Rongjie Yan et al. “Formal consistency checking over specifications in natural languages”. In:Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE. 2015, pp. 1677–1682

work page 2015
[28]

A., D˛ abrowski, J., Alhoshan, W., Zhao, L., and Ferrari, A., 2025

Mohammad Amin Zadenoori et al. “Large language models (LLMs) for requirements engineering (RE): A systematic literature review”. In:arXiv preprint arXiv:2509.11446 (2025)

work page arXiv 2025
[29]

NL2CTL: automatic generation of formal requirements specifica- tions via large language models

Mengyan Zhao et al. “NL2CTL: automatic generation of formal requirements specifica- tions via large language models”. In:International Conference on Formal Engineering Methods. Springer. 2024, pp. 1–17. 13

work page 2024

[1] [1]

A survey of statistical model checking

Gul Agha and Karl Palmskog. “A survey of statistical model checking”. In:ACM Trans- actions on Modeling and Computer Simulation (TOMACS)28.1 (2018), pp. 1–39

work page 2018

[2] [2]

The benefits of relaxing punctuality

Rajeev Alur et al. “The benefits of relaxing punctuality”. In:Journal of the ACM (JACM) 43.1 (1996), pp. 116–146

work page 1996

[3] [3]

Results of Field Trials with a Mobile Service Robot for Older Adults in 16 Private Households

Markus Bajones et al. “Results of Field Trials with a Mobile Service Robot for Older Adults in 16 Private Households”. In:ACM Transactions on Human-Robot Interaction (THRI)9.2 (2019), pp. 1–27

work page 2019

[4] [4]

Leveraging LLMs for Formal Software Requirements–Challenges and Prospects

Arshad Beg et al. “Leveraging LLMs for Formal Software Requirements–Challenges and Prospects”. In:International Workshop on Artificial Intelligence and fOrmal VERifica- tion, Logic, Automata, and sYnthesis. ceur-ws.org. 2025, pp. 95–105

work page 2025

[5] [5]

Leveraging natural language processing for a consistency check- ing toolchain of automotive requirements

Vincent Bertram et al. “Leveraging natural language processing for a consistency check- ing toolchain of automotive requirements”. In:International Requirements Engineering Conference. IEEE. 2023, pp. 212–222

work page 2023

[6] [6]

Language Models are Few-Shot Learners

Tom B. Brown et al. “Language Models are Few-Shot Learners”. In:Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. 2020. [7]Case studies - Service robots. 2018.url:https://ifr.org/case- studies/service- robots-case-studies

work page 2020

[7] [7]

LLM-basedSatisfiabilityCheckingofStringRequirementsbyConsistent Data and Checker Generation

BoqiChenetal.“LLM-basedSatisfiabilityCheckingofStringRequirementsbyConsistent Data and Checker Generation”. In:International Requirements Engineering Conference. 2025, pp. 231–243. 11 Tagliaferro, Guindani, Lestingi, and Rossi

work page 2025

[8] [8]

nl2spec: Interactively translating unstructured natural language to temporal logics with large language models

Matthias Cosler et al. “nl2spec: Interactively translating unstructured natural language to temporal logics with large language models”. In:International Conference on Computer Aided Verification. Springer. 2023, pp. 383–396

work page 2023

[9] [9]

Uppaal SMC tutorial

Alexandre David et al. “Uppaal SMC tutorial”. In:International journal on software tools for technology transfer17.4 (2015), pp. 397–415

work page 2015

[10] [10]

Automatic Generation of Formal Specification and Verification Annotations Using LLMs and Test Oracles

Joao Pascoal Faria et al. “Automatic Generation of Formal Specification and Verification Annotations Using LLMs and Test Oracles”. In:arXiv preprint arXiv:2601.12845(2026)

work page arXiv 2026

[11] [11]

Lessons from the use of natural language inference (nli) in requirements engineering tasks

Mohamad Fazelnia et al. “Lessons from the use of natural language inference (nli) in requirements engineering tasks”. In:International Requirements Engineering Conference. IEEE. 2024, pp. 103–115

work page 2024

[12] [12]

Automated requirement contradiction detection through formal logic and LLMs

Alexander Elenga Gärtner and Dietmar Göhlich. “Automated requirement contradiction detection through formal logic and LLMs”. In:Automated Software Engineering31.2 (2024), p. 49

work page 2024

[13] [13]

Translating Requirements in Property Specification Patterns using LLMs

Dario Guidotti et al. “Translating Requirements in Property Specification Patterns using LLMs”. In:31st RCRA Workshop. ceur-ws.org. 2024, pp. 68–80

work page 2024

[14] [14]

State of Practice: LLMs in Software Engineering and Software Architecture

Jasmin Jahić and Ashkan Sami. “State of Practice: LLMs in Software Engineering and Software Architecture”. In:International Conference on Software Architecture Compan- ion. 2024, pp. 311–318

work page 2024

[15] [15]

Extracting Formal Specifications From Documents Using LLMS for Test Au- tomation

Hui Li et al. “Extracting Formal Specifications From Documents Using LLMS for Test Au- tomation”. In:International Conference on Program Comprehension. IEEE. 2025, pp. 1– 12

work page 2025

[16] [16]

Data-driven requirements elicitation: A systematic literature review

Sachiko Lim et al. “Data-driven requirements elicitation: A systematic literature review”. In:SN Computer Science2.1 (2021), p. 16

work page 2021

[17] [17]

Improving requirements completeness: Automated assistance through large language models

Dipeeka Luitel et al. “Improving requirements completeness: Automated assistance through large language models”. In:Requirements Engineering29.1 (2024), pp. 73–95

work page 2024

[18] [18]

URL https://arxiv.org/abs/2512

Zhi Ma et al. “Bridging Natural Language and Formal Specification–Automated Trans- lation of Software Requirements to LTL via Hierarchical Semantics Decomposition Using LLMs”. In:arXiv preprint arXiv:2512.17334(2025)

work page arXiv 2025

[19] [19]

Can GPT-4 aid in detecting ambiguities, inconsistencies, and incompleteness in requirements analysis? A comprehensive case study

Taslim Mahbub et al. “Can GPT-4 aid in detecting ambiguities, inconsistencies, and incompleteness in requirements analysis? A comprehensive case study.” In:IEEE Access (2024)

work page 2024

[20] [20]

Specification patterns for robotic missions

Claudio Menghi et al. “Specification patterns for robotic missions”. In:IEEE Transactions on Software Engineering(2019)

work page 2019

[21] [21]

The state of the art in automated requirements elicitation

Hendrik Meth et al. “The state of the art in automated requirements elicitation”. In: Information and Software Technology55.10 (2013), pp. 1695–1709

work page 2013

[22] [22]

Towards verifiable multi-agent interaction pattern specifica- tion

Alberto Tagliaferro et al. “Towards verifiable multi-agent interaction pattern specifica- tion”. In:Proceedings of the 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE). 2024, pp. 122–126

work page 2024

[23] [23]

Verification-Oriented Specification of Multi-agent Interaction Patterns

Alberto Tagliaferro et al. “Verification-Oriented Specification of Multi-agent Interaction Patterns”.In:Workshop on Agents and Robots for reliable Engineered Autonomy.Springer. 2024, pp. 38–53

work page 2024

[24] [24]

Attention is All you Need

Ashish Vaswani et al. “Attention is All you Need”. In:Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 2017, pp. 5998–6008. 12 Tagliaferro, Guindani, Lestingi, and Rossi

work page 2017

[25] [25]

Enchanting program specification synthesis by large language models using static analysis and program verification

Cheng Wen et al. “Enchanting program specification synthesis by large language models using static analysis and program verification”. In:International Conference on Computer Aided Verification. Springer. 2024, pp. 302–328

work page 2024

[26] [26]

Autoformalization with large language models

Yuhuai Wu et al. “Autoformalization with large language models”. In:Advances in neural information processing systems35 (2022), pp. 32353–32368

work page 2022

[27] [27]

Formal consistency checking over specifications in natural languages

Rongjie Yan et al. “Formal consistency checking over specifications in natural languages”. In:Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE. 2015, pp. 1677–1682

work page 2015

[28] [28]

A., D˛ abrowski, J., Alhoshan, W., Zhao, L., and Ferrari, A., 2025

Mohammad Amin Zadenoori et al. “Large language models (LLMs) for requirements engineering (RE): A systematic literature review”. In:arXiv preprint arXiv:2509.11446 (2025)

work page arXiv 2025

[29] [29]

NL2CTL: automatic generation of formal requirements specifica- tions via large language models

Mengyan Zhao et al. “NL2CTL: automatic generation of formal requirements specifica- tions via large language models”. In:International Conference on Formal Engineering Methods. Springer. 2024, pp. 1–17. 13

work page 2024