Formal Specifications from Natural Language
read the original abstract
We study the generalization abilities of language models when translating natural language into formal specifications with complex semantics. In particular, we fine-tune language models on three datasets consisting of English sentences and their corresponding formal representation: 1) regular expressions (regex), frequently used in programming and search; 2) First-order logic (FOL), commonly used in software verification and theorem proving; and 3) linear-time temporal logic (LTL), which forms the basis for industrial hardware specification languages. Our experiments show that, in these diverse domains, the language models maintain their generalization capabilities from pre-trained knowledge of natural language to generalize, e.g., to new variable names or operator descriptions. Additionally, they achieve competitive performance, and even outperform the state-of-the-art for translating into regular expressions, with the benefits of being easy to access, efficient to fine-tune, and without a particular need for domain-specific reasoning.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation
Across 30 LLMs and 205 TLA+ tasks, syntactic correctness reaches at most 26.6% and semantic correctness 8.6%, with all successes limited to progressive prompting and no advantage from larger models.
-
Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling
Audit finds 36-39% incorrect FOL labels in FOLIO and MALLS; corrections raise LLM accuracy 9-22 points and an LLM-guided review framework achieves 90% dataset quality after checking fewer than 24% of examples.
-
POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference
POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.
-
Coherency through formalisations of Structured Natural Language, A case study on FRETish
A coherency guideline for multi-level requirement formalization is introduced, along with an equivalent but statistically improved FRETish-to-MTL translation for the FRET tool.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.