pith. sign in

arxiv: 2310.04870 · v5 · pith:5SYYQOBLnew · submitted 2023-10-07 · 💻 cs.FL · cs.AI· cs.LG· cs.LO

Lemur: Integrating Large Language Models in Automated Program Verification

classification 💻 cs.FL cs.AIcs.LGcs.LO
keywords automatedverificationprogramllmsmethodologyabstractbenchmarkscalculus
0
0 comments X
read the original abstract

The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of transition rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

    cs.AI 2026-05 unverdicted novelty 7.0

    A matched-pair protocol and Accurate Differentiation Rate metric reveal that conventional LLM accuracy on SAT problems is often inflated by over-predicting satisfiability, while cross-representation agreement exceeds ...

  2. Event-B Agent: Towards LLM Agent for Formal Model Synthesis and Repair

    cs.SE 2026-05 unverdicted novelty 7.0

    Event-B Agent is an LLM agent that synthesizes, refines, and repairs Event-B formal models from natural language requirements via iterative verification feedback loops.

  3. Guiding Human Validation of LLM-Generated Code via Verifiable Literate Programming

    cs.SE 2026-07 unverdicted novelty 6.0

    VLP adds an NL documentation layer with trace-linked mismatch detection and derived formal checks to make human validation of LLM code feasible, lifting pass@1 from 28.7-73.2% to 65.4-93.5%.

  4. Evaluating LLM-Generated ACSL Annotations for Formal Verification

    cs.SE 2026-02 unverdicted novelty 4.0

    Rule-based annotation generation for ACSL outperforms LLM-based methods in achieving successful formal verification of C programs.