Recognition: no theorem link
Coherency through formalisations of Structured Natural Language, A case study on FRETish
Pith reviewed 2026-05-12 04:20 UTC · model grok-4.3
The pith
Coherency requires that requirement descriptions at every formalization level follow roughly the same logical structure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the coherency guideline produces a new automated translation from FRETish to MTL that is logically equivalent to the original FRET translation (verified by model checking) yet statistically preferable, while the comparison process reveals inconsistencies in the existing mapping.
What carries the argument
The coherency guideline, which requires that natural-language, technical, and formal versions of a requirement follow approximately the same logical structure and is used here to construct an alternative translation function from FRETish sentences to MTL formulas.
If this is right
- The new translation can be substituted for the original inside FRET without changing any verification results.
- Structural mismatches between FRETish text and MTL formulas become detectable by direct comparison rather than by exhaustive checking.
- Structured natural language can function as a stable intermediate layer when LLMs are asked to produce or reason about requirements that must later be verified formally.
- Statistical measures can be used to choose among logically equivalent formalizations.
Where Pith is reading between the lines
- The same coherency test could be applied to other requirement tools that move between multiple description layers.
- Requiring structural preservation might reduce the rate at which LLM-generated requirements fail formal verification.
- The guideline could be turned into an automated check that flags translations whose logical shape diverges from the source text.
Load-bearing premise
That keeping the logical structure constant across formalization levels produces translations that are not only equivalent but measurably better in practice.
What would settle it
A concrete counter-example in which the original FRET translation and the new coherent translation differ in the verdicts they return on some model-checking instance, or a larger set of requirements where the original version outperforms the new one on the reported statistics.
Figures
read the original abstract
Formalisation is the process of writing system requirements in a formal language. These requirements mostly originate in Natural Language. In the field of Formal Methods, formalisation is often identified as one of the most delicate and complicated steps in the verification process. Not seldomly, formalisation tools and environments choose various levels of requirement descriptions: Natural Language, Technical Language, Diagram Representations and Formal Language, to mention a few. In the literature, there are various maxims and principles of good practice to guide the process of requirement formalisation. In this paper we propose a new guideline: Coherency through Formalisations. The guideline states that the different levels of formalisation mentioned above should roughly follow the same logical structure. The principle seems particularly relevant in the setting where LLMs are prompted to perform reasoning tasks that can be checked by formal tools using Structured Natural Language to act as an intermediate layer bridging both paradigms. In the light of coherency, we analyze NASA's Formal Requirement Elicitation Tool FRET and propose an alternative automated translation of the Controlled Natural Language FRETish to the formal language of MTL. We compare our translation to the original translation and prove equivalence using model checking. Some statistics are performed which seem to favor the new translation. As expected, the translation process yielded interesting reflections and revealed inconsistencies which we present and discuss.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the guideline 'Coherency through Formalisations,' which requires that different levels of requirement formalization (natural language, structured language, diagrams, formal language) preserve roughly the same logical structure. It analyzes NASA's FRET tool, proposes an alternative automated translation from the controlled natural language FRETish to Metric Temporal Logic (MTL), claims to prove equivalence to FRET's original translation via model checking, and reports statistics that appear to favor the new translation. The work also discusses inconsistencies revealed during the translation process.
Significance. If the claimed equivalence holds generally and the statistics are robust, the coherency guideline and improved FRETish-to-MTL translation could strengthen multi-level formalization pipelines, especially those using structured natural language as an intermediate layer for LLM-assisted reasoning. The approach provides a concrete case study that could inform tool design in formal methods.
major comments (1)
- [Abstract] Abstract: The claim that equivalence is 'proven using model checking' only establishes agreement on the finite set of traces or models actually checked. Without an inductive argument over the translation rules or a bisimulation relating the two translation functions, this does not establish general semantic equivalence for arbitrary well-formed FRETish sentences (e.g., differences in scoping of 'until', 'globally', or 'eventually' operators). The reported statistics therefore compare procedures whose relationship is only partially verified.
minor comments (1)
- [Abstract] Abstract: 'Not seldomly' is nonstandard; 'Not seldom' or 'Frequently' would be clearer.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the distinction between verification on finite instances and a general semantic proof. We address the single major comment below and will revise the manuscript to avoid overstating the strength of the model-checking evidence.
read point-by-point responses
-
Referee: [Abstract] Abstract: The claim that equivalence is 'proven using model checking' only establishes agreement on the finite set of traces or models actually checked. Without an inductive argument over the translation rules or a bisimulation relating the two translation functions, this does not establish general semantic equivalence for arbitrary well-formed FRETish sentences (e.g., differences in scoping of 'until', 'globally', or 'eventually' operators). The reported statistics therefore compare procedures whose relationship is only partially verified.
Authors: We agree that the wording 'prove equivalence using model checking' is imprecise and risks suggesting a general result. In the manuscript we translated every syntactic pattern defined in the FRETish grammar (including all combinations of temporal operators and their scoping) to both the original FRET translation and our alternative, then used model checking to confirm that the resulting MTL formulas agree on all traces up to a sufficient bound. This covers the finite set of patterns that FRETish actually permits, but it does not constitute an inductive or bisimulation proof over the translation functions themselves. We will revise the abstract, the relevant section on the comparison, and the conclusion to state that equivalence was verified by exhaustive model checking over the FRETish pattern catalogue rather than claiming a general semantic proof. The statistics will be presented as comparing two procedures whose agreement has been confirmed on the complete set of FRETish constructs. revision: yes
Circularity Check
No circularity; derivation relies on external model checking and comparative statistics
full rationale
The paper introduces coherency as a proposed guideline, then constructs an alternative FRETish-to-MTL translation motivated by it. Equivalence is verified separately via model checking (an external tool applied to concrete traces) and statistics are computed on the outputs of the two independent translations. No step reduces a result to its own definition by construction, no parameters are fitted and relabeled as predictions, and no load-bearing claims rest on self-citations or imported uniqueness theorems. The chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Model checking can establish equivalence between two translations of requirements
Reference graph
Works this paper leans on
-
[1]
Henzinger, Logics and models of real time: A survey
Rajeev Alur and Thomas A. Henzinger, Logics and models of real time: A survey. In Workshop/School/Symposium of the REX Project (Research and Education in Concurrent Systems), Springer, pp. 74–106, 1991. 15
work page 1991
-
[2]
Ezio Bartocci and Yli` es Falcone, Lectures on runtime verification. Springer, 2018
work page 2018
-
[3]
In International Conference on Formal Engineering Methods, Springer, pp
Andreas Bauer, Martin Leucker, and Jonathan Streit, SALT—structured assertion language for temporal logic. In International Conference on Formal Engineering Methods, Springer, pp. 757–775, 2006
work page 2006
-
[4]
In NASA Formal Methods Symposium, Springer, pp
Andreas Bauer and Martin Leucker, The theory and practice of SALT. In NASA Formal Methods Symposium, Springer, pp. 13–40, 2011
work page 2011
-
[5]
In International Conference on Computer Aided Verification, Springer, pp
Roderick Bloem, Roberto Cavada, Ingo Pill, Marco Roveri, and Andrei Tchaltsev, RAT: A Tool for the Formal Analysis of Requirements: (Tool Paper). In International Conference on Computer Aided Verification, Springer, pp. 263–267, 2007
work page 2007
-
[6]
Roman B¨ ogli, Atefeh Rohani, Thomas Studer, Christos Tsigkanos, and Timo Kehrer, Tem- poral logics meet real-world software requirements: a reality check. In 2025 IEEE/ACM 13th International Conference on Formal Methods in Software Engineering (FormaliSE), IEEE, pp. 74–85, 2025
work page 2025
-
[7]
Achim D. Brucker, Teddy Cameron-Burke, and Amy Stell, Formally verified interval arith- metic and its application to program verification. In Proceedings of the 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE), pp. 111–121, 2024
work page 2024
-
[8]
Andrea Brunello, Angelo Montanari, and Mark Reynolds, Synthesis of LTL formulas from natural language texts: State of the art and research directions. In 26th International sym- posium on temporal representation and reasoning (TIME 2019), Schloss Dagstuhl–Leibniz- Zentrum f¨ ur Informatik, pp. 17–1, 2019
work page 2019
-
[9]
In 2019 IEEE 17th international conference on industrial informatics (INDIN), IEEE, pp
Igor Buzhinsky, Formalization of natural language requirements into temporal logics: a survey. In 2019 IEEE 17th international conference on industrial informatics (INDIN), IEEE, pp. 400–406, 2019
work page 2019
-
[10]
In International Conference on Computer Aided Verification, Springer, pp
Adrien Champion, Alain Mebsout, Christoph Sticksel, and Cesare Tinelli, The Kind 2 model checker. In International Conference on Computer Aided Verification, Springer, pp. 510–517, 2016
work page 2016
-
[11]
Esther Conrad, Laura Titolo, Dimitra Giannakopoulou, Thomas Pressburger, and Aaron Dutle, A compositional proof framework for FRETish requirements. In Proceedings of the 11th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP 2022), Springer, doi:10.1145/3497775.3503685, 2022
-
[12]
In International Conference on Computer Aided Verification, Springer, pp
Matthias Cosler, Christopher Hahn, Daniel Mendoza, Frederik Schmitt, and Caroline Trip- pel, nl2spec: Interactively translating unstructured natural language to temporal logics with large language models. In International Conference on Computer Aided Verification, Springer, pp. 383–396, 2023
work page 2023
-
[13]
Matthew B. Dwyer, George S. Avrunin, and James C. Corbett, Patterns in property specifications for finite-state verification. In Proceedings of the 21st international conference on Software engineering, pp. 411–420, 1999
work page 1999
-
[14]
Aaron W. Fifarek, Lucas G. Wagner, Jonathan A. Hoffman, Benjamin D. Rodes, M. An- thony Aiello, and Jennifer A. Davis, SpeAR v2.0: Formalized past LTL specification and analysis of requirements. In NASA Formal Methods Symposium, Springer, pp. 420–426, 2017. 16
work page 2017
-
[15]
In Proceedings of the 7th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp
Dov Gabbay, Amir Pnueli, Saharon Shelah, and Jonathan Stavi, On the temporal analysis of fairness. In Proceedings of the 7th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 163–173, 1980
work page 1980
-
[16]
In International Conference on Computer Aided Verification, Springer, pp
Andrew Gacek, John Backes, Mike Whalen, Lucas Wagner, and Elaheh Ghassabani, The JKind model checker. In International Conference on Computer Aided Verification, Springer, pp. 20–27, 2018
work page 2018
-
[17]
arXiv preprint arXiv:1905.04422, 2019
Tiantian Gao, Controlled natural languages and default reasoning. arXiv preprint arXiv:1905.04422, 2019
-
[18]
Dimitra Giannakopoulou, Thomas Pressburger, Anastasia Mavridou, Julian Rhein, Johann Schumann, and Nija Shi, Formal Requirements Elicitation with FRET. REFSQ Workshops, 2020
work page 2020
-
[19]
Dimitra Giannakopoulou, Thomas Pressburger, Anastasia Mavridou, and Johann Schu- mann, Generation of formal requirements from structured natural language. In Interna- tional working conference on requirements engineering: Foundation for software quality, Springer, pp. 19–35, 2020
work page 2020
-
[20]
Dimitra Giannakopoulou, Andreas Katis, Anastasia Mavridou, and Thomas Pressburger, Compositional realizability checking within FRET. 2021
work page 2021
-
[21]
Information and Software Technology 137:106590, 2021
Dimitra Giannakopoulou, Thomas Pressburger, Anastasia Mavridou, and Johann Schu- mann, Automated formalization of structured natural language requirements. Information and Software Technology 137:106590, 2021
work page 2021
-
[22]
Artificial Intelligence and Law 33(2):405–435, 2025
Clement Guitton, Aurelia Tam` o-Larrieux, Simon Mayer, and Gijs van Dijck, The challenge of open-texture in law. Artificial Intelligence and Law 33(2):405–435, 2025
work page 2025
-
[23]
Christopher Hahn, Frederik Schmitt, Julia J. Tillman, Niklas Metzger, Julian Siber, and Bernd Finkbeiner, Formal specifications from natural language. arXiv preprint arXiv:2206.01962, 2022
-
[24]
Halpern and Yoav Shoham, A propositional modal logic of time intervals
Joseph Y. Halpern and Yoav Shoham, A propositional modal logic of time intervals. Journal of the ACM (JACM) 38(4):935–962, 1991
work page 1991
-
[25]
Artificial Intelligence and Law, 2022
Liane Huttner and Denis Merigoux, Catala: Moving Towards the Future of Legal Expert Systems. Artificial Intelligence and Law, 2022
work page 2022
-
[26]
Journal of Cross-disciplinary Research in Computational Law 2(1), 2023
Florian Idelberger, The Uncanny Valley of Computable Contracts: Assessing Controlled Natural Languages for Computable Contracts. Journal of Cross-disciplinary Research in Computational Law 2(1), 2023
work page 2023
-
[27]
In NASA Formal Methods Symposium, Springer, pp
Andreas Katis, Anastasia Mavridou, and Thomas Pressburger, A Streamlined, Formal Approach to Requirements-Based Testing. In NASA Formal Methods Symposium, Springer, pp. 159–179, 2025
work page 2025
-
[28]
In International Conference on Computer Aided Verification, Springer, pp
Andreas Katis, Anastasia Mavridou, Dimitra Giannakopoulou, Thomas Pressburger, and Johann Schumann, Capture, analyze, diagnose: Realizability checking of requirements in FRET. In International Conference on Computer Aided Verification, Springer, pp. 490–504, 2022
work page 2022
-
[29]
David Kooi and Anastasia Mavridou, Integrating realizability checking in FRET. Technical report, 2019. 17
work page 2019
-
[30]
Real-time sys- tems 2(4):255–299, 1990
Ron Koymans, Specifying real-time properties with metric temporal logic. Real-time sys- tems 2(4):255–299, 1990
work page 1990
-
[31]
Computational linguistics 40(1):121–170, 2014
Tobias Kuhn, A survey and classification of controlled natural languages. Computational linguistics 40(1):121–170, 2014
work page 2014
-
[32]
Workshop on Language and Robotics at CoRL 2022, 2022
Jason Xinyu Liu, Ziyi Yang, Benjamin Schornstein, Sam Liang, Ifrah Idrees, Stefanie Tellex, and Ankit Shah, Lang2ltl: Translating natural language commands to temporal specifi- cation with large language models. Workshop on Language and Robotics at CoRL 2022, 2022
work page 2022
-
[33]
Bulletin-European Association for Theoretical Computer Science 79:122–128, 2003
Nicolas Markey, Temporal logic with past is exponentially more succinct. Bulletin-European Association for Theoretical Computer Science 79:122–128, 2003
work page 2003
-
[34]
IEEE software 29(2):17–18, 2012
Alistair Mavin, Listen, then use EARS. IEEE software 29(2):17–18, 2012
work page 2012
-
[35]
Daniel Mendoza, Christopher Hahn, and Caroline Trippel, Translating natural language to temporal logics with large language models and model checkers. FMCAD, 2024
work page 2024
-
[36]
IEEE Transactions on Software Engi- neering 47(10):2208–2224, 2019
Claudio Menghi, Christos Tsigkanos, Patrizio Pelliccione, Carlo Ghezzi, and Thorsten Berger, Specification patterns for robotic missions. IEEE Transactions on Software Engi- neering 47(10):2208–2224, 2019
work page 2019
- [37]
-
[38]
Ben Moszkowski, A temporal logic for multi-level reasoning about hardware. Technical report, 1982
work page 1982
-
[39]
In Proceedings of the 1985 ACM thirteenth annual conference on Computer Science, pp
Eva-Martin Mueckstein, Controlled natural language interfaces (extended abstract) the best of three worlds. In Proceedings of the 1985 ACM thirteenth annual conference on Computer Science, pp. 176–178, 1985
work page 1985
-
[40]
Joosten, Model-checking in the Foundations of Algorithmic Law and the Case of Regulation 561
Moritz M¨ uller and Joost J. Joosten, Model-checking in the Foundations of Algorithmic Law and the Case of Regulation 561. arXiv:2307.05658 [cs.LO], 2023
-
[41]
Jo¨ el Ouaknine and James Worrell, On metric temporal logic and faulty Turing machines. In International Conference on Foundations of Software Science and Computation Structures, Springer, pp. 217–230, 2006
work page 2006
-
[42]
In Proceedings of the eighteenth international symposium on Software testing and analysis, pp
Charles Pecheur, Franco Raimondi, and Guillaume Brat, A formal analysis of requirements- based testing. In Proceedings of the eighteenth international symposium on Software testing and analysis, pp. 47–56, 2009
work page 2009
-
[43]
Synthese 190(14):2897–2924, 2013
Jaroslav Peregrin and Vladim´ ır Svoboda, Criteria for logical formalization. Synthese 190(14):2897–2924, 2013
work page 2013
-
[44]
Amir Pnueli, The temporal logic of programs. In 18th annual symposium on foundations of computer science (sfcs 1977), IEEE, Computer Society Press, pp. 46–57, 1977
work page 1977
-
[45]
Arthur N. Prior, Time and modality. Oxford University Press, 1957
work page 1957
-
[46]
In Working Conference on Verified Software: Theories, Tools, and Experiments, Springer, pp
Kristin Yvonne Rozier, Specification: The biggest bottleneck in formal methods and au- tonomy. In Working Conference on Verified Software: Theories, Tools, and Experiments, Springer, pp. 8–26, 2016. 18
work page 2016
-
[47]
In Proceedings of CAV 2014, LNCS 8559, Springer, pp
Alessandro Cimatti, Alberto Griggio, Bastiaan Schaafsma, and Roberto Sebastiani, The nuXmv Symbolic Model Checker. In Proceedings of CAV 2014, LNCS 8559, Springer, pp. 334–342, 2014
work page 2014
-
[48]
Seminar at Technical University of Denmark, 2023
Anastasia Mavridou, Formal Requirements Elicitation With Fret. Seminar at Technical University of Denmark, 2023
work page 2023
-
[49]
GitHub repository, ac- cessed 2026-04-22
NASA-SW-VnV, FRET: Formal Requirements Elicitation Tool. GitHub repository, ac- cessed 2026-04-22. 19
work page 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.