miniCodeProps: A minimal benchmark for proving code properties
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4
roles
dataset 1
polarities
background 1
representative citing papers
s2n-bignum-bench is a new benchmark requiring LLMs to synthesize HOL Light proofs for real-world low-level cryptographic assembly code.
VeriContest supplies 946 problems with specs, code, proofs, and tests to benchmark verifiable code generation in Rust/Verus, showing models reach 92% on code but only 5% end-to-end on full verifiable synthesis.
BRIDGE improves Lean executable correctness up to 1.5x and sample efficiency roughly 2x over direct prompting by using domain-guided intermediate representations across code, specs, and proofs.
citing papers explorer
- Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.
- s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs
s2n-bignum-bench is a new benchmark requiring LLMs to synthesize HOL Light proofs for real-world low-level cryptographic assembly code.
- VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation
VeriContest supplies 946 problems with specs, code, proofs, and tests to benchmark verifiable code generation in Rust/Verus, showing models reach 92% on code but only 5% end-to-end on full verifiable synthesis.
- BRIDGE: Building Representations In Domain Guided Program Synthesis
BRIDGE improves Lean executable correctness up to 1.5x and sample efficiency roughly 2x over direct prompting by using domain-guided intermediate representations across code, specs, and proofs.