Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

· 2024 · cs.LO · arXiv 2410.19940

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

open full Pith review browse 8 citing papers arXiv PDF

abstract

Formal verification using proof assistants, such as Coq, is an effective way of improving software quality, but requires significant effort and expertise. Machine learning can automatically synthesize proofs, but such tools are able to prove only a fraction of desired software properties. We introduce Cobblestone, a divide-and-conquer approach for proof synthesis. Cobblestone uses a large language model (LLM) to generate potential proofs, uses those proofs to break the problem into simpler parts, automatically identifies which of those parts were successfully proven, and iterates on the remaining parts to build a correct proof that is guaranteed to be sound, despite the reliance on unsound LLMs. We evaluate Cobblestone on four benchmarks of open-source Coq projects, controlling for training data leakage. Fully automatically, Cobblestone outperforms state-of-the-art non-LLM tools, and proves many theorems that other LLM-based tools cannot, and on many benchmarks, outperforms them. Each Cobblestone run costs only $1.25 and takes 14.7 minutes, on average. Cobblestone can also be used with external input, from a user or another tool, providing a proof structure or relevant lemmas. Evaluated with such an oracle, Cobblestone proves up to 58% of theorems. Overall, our research shows that tools can make use of partial progress and external input to more effectively automate formal verification.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Trustworthy Software Project Generation : a Case Study with an Interactive Theorem Prover

cs.SE · 2026-05-25 · conditional · novelty 7.0

An LLM agent with Rocq backend automatically builds a verified RISC-V RV32I interpreter (1859 lines Rocq, 2848 lines extracted C++) that passes 265 tests and 12-hour fuzzing, while a Dafny backend fails.

A Learning Method for Symbolic Systems Using Large Language Models

cs.SE · 2026-05-09 · unverdicted · novelty 7.0

LLM2Ltac mines symbolic tactics from 11,725 Coq theorems using LLMs and integrates them into CoqHammer, improving proof rates by 23.87% on 6,199 theorems from four large verification projects.

Certified Program Synthesis with a Multi-Modal Verifier

cs.SE · 2026-04-17 · unverdicted · novelty 7.0

LeetProof achieves higher rates of fully certified program synthesis from natural language by using a multi-modal verifier in Lean to validate specifications via randomized testing and delegate proofs to AI tools, outperforming single-mode baselines on benchmarks while uncovering defects in prior参考.

The Search for Constrained Random Generators

cs.PL · 2025-11-15 · unverdicted · novelty 7.0

A Lean library called Palamedes uses synthesis rules from generator semantics and catamorphism-anamorphism rewriting to automatically produce correct constrained random generators.

AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification

cs.SE · 2026-05-11 · unverdicted · novelty 6.0

AutoSOUP automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs via three techniques and a hybrid LLM-plus-program-synthesis architecture called LLM-As-Function-Call.

PROMISE: Proof Automation as Structural Imitation of Human Reasoning

cs.LO · 2026-04-07 · unverdicted · novelty 6.0

PROMISE reframes automated proof generation as stateful search over structural embeddings of proof states, outperforming prior LLM-based systems by up to 26 points on the seL4 benchmark.

Understanding and Improving Automated Proof Synthesis for Interactive Theorem Provers

cs.LO · 2026-04-27 · unverdicted · novelty 5.0

A pattern-guided tactic search method improves automated proof synthesis success rates by an average of 8.05% and achieves a 20% increase on previously unproven theorems.

On Reasoning-Centric LLM-based Automated Theorem Proving

cs.SE · 2026-04-21 · unverdicted · novelty 5.0

ReCent-Prover achieves a 22.58% relative improvement over prior state-of-the-art in proved theorems on the CoqStoq benchmark by using reasoning-centric techniques under a fixed LLM invocation budget.

citing papers explorer

Showing 2 of 2 citing papers after filters.

PROMISE: Proof Automation as Structural Imitation of Human Reasoning cs.LO · 2026-04-07 · unverdicted · none · ref 14 · internal anchor
PROMISE reframes automated proof generation as stateful search over structural embeddings of proof states, outperforming prior LLM-based systems by up to 26 points on the seL4 benchmark.
Understanding and Improving Automated Proof Synthesis for Interactive Theorem Provers cs.LO · 2026-04-27 · unverdicted · none · ref 21 · internal anchor
A pattern-guided tactic search method improves automated proof synthesis success rates by an average of 8.05% and achieves a 20% increase on previously unproven theorems.

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer