Lemur: Integrating Large Language Models in Automated Program Verification

Clark Barrett; Haoze Wu; Nina Narodytska

arxiv: 2310.04870 · v5 · pith:5SYYQOBLnew · submitted 2023-10-07 · 💻 cs.FL · cs.AI· cs.LG· cs.LO

Lemur: Integrating Large Language Models in Automated Program Verification

Haoze Wu , Clark Barrett , Nina Narodytska This is my paper

classification 💻 cs.FL cs.AIcs.LGcs.LO

keywords automatedverificationprogramllmsmethodologyabstractbenchmarkscalculus

0 comments

read the original abstract

The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of transition rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability
cs.AI 2026-05 unverdicted novelty 7.0

A matched-pair protocol and Accurate Differentiation Rate metric reveal that conventional LLM accuracy on SAT problems is often inflated by over-predicting satisfiability, while cross-representation agreement exceeds ...
Event-B Agent: Towards LLM Agent for Formal Model Synthesis and Repair
cs.SE 2026-05 unverdicted novelty 7.0

Event-B Agent is an LLM agent that synthesizes, refines, and repairs Event-B formal models from natural language requirements via iterative verification feedback loops.
Guiding Human Validation of LLM-Generated Code via Verifiable Literate Programming
cs.SE 2026-07 unverdicted novelty 6.0

VLP adds an NL documentation layer with trace-linked mismatch detection and derived formal checks to make human validation of LLM code feasible, lifting pass@1 from 28.7-73.2% to 65.4-93.5%.
Evaluating LLM-Generated ACSL Annotations for Formal Verification
cs.SE 2026-02 unverdicted novelty 4.0

Rule-based annotation generation for ACSL outperforms LLM-based methods in achieving successful formal verification of C programs.