pith. sign in

arxiv: 2206.01962 · v2 · pith:DDRJVLMInew · submitted 2022-06-04 · 💻 cs.SE · cs.LG· cs.PL

Formal Specifications from Natural Language

classification 💻 cs.SE cs.LGcs.PL
keywords languageformalmodelsnaturalexpressionsfine-tunegeneralizationlogic
0
0 comments X
read the original abstract

We study the generalization abilities of language models when translating natural language into formal specifications with complex semantics. In particular, we fine-tune language models on three datasets consisting of English sentences and their corresponding formal representation: 1) regular expressions (regex), frequently used in programming and search; 2) First-order logic (FOL), commonly used in software verification and theorem proving; and 3) linear-time temporal logic (LTL), which forms the basis for industrial hardware specification languages. Our experiments show that, in these diverse domains, the language models maintain their generalization capabilities from pre-trained knowledge of natural language to generalize, e.g., to new variable names or operator descriptions. Additionally, they achieve competitive performance, and even outperform the state-of-the-art for translating into regular expressions, with the benefits of being easy to access, efficient to fine-tune, and without a particular need for domain-specific reasoning.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation

    cs.AI 2026-06 accept novelty 8.0

    Across 30 LLMs and 205 TLA+ tasks, syntactic correctness reaches at most 26.6% and semantic correctness 8.6%, with all successes limited to progressive prompting and no advantage from larger models.

  2. Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling

    cs.CL 2026-06 unverdicted novelty 7.0

    Audit finds 36-39% incorrect FOL labels in FOLIO and MALLS; corrections raise LLM accuracy 9-22 points and an LLM-guided review framework achieves 90% dataset quality after checking fewer than 24% of examples.

  3. POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference

    cs.SE 2026-05 unverdicted novelty 7.0

    POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.

  4. Coherency through formalisations of Structured Natural Language, A case study on FRETish

    cs.CL 2026-05 unverdicted novelty 5.0

    A coherency guideline for multi-level requirement formalization is introduced, along with an equivalent but statistically improved FRETish-to-MTL translation for the FRET tool.