miniCodeProps: A minimal benchmark for proving code properties

Evan Lohn, Sean Welleck · 2024 · arXiv 2406.11915

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

background 1

representative citing papers

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.

s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

cs.PL · 2026-03-15 · unverdicted · novelty 7.0

s2n-bignum-bench is a new benchmark requiring LLMs to synthesize HOL Light proofs for real-world low-level cryptographic assembly code.

VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

VeriContest supplies 946 problems with specs, code, proofs, and tests to benchmark verifiable code generation in Rust/Verus, showing models reach 92% on code but only 5% end-to-end on full verifiable synthesis.

BRIDGE: Building Representations In Domain Guided Program Synthesis

cs.LG · 2025-11-26 · unverdicted · novelty 5.0

BRIDGE improves Lean executable correctness up to 1.5x and sample efficiency roughly 2x over direct prompting by using domain-guided intermediate representations across code, specs, and proofs.

citing papers explorer

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

cs.AI · 2026-05-22 · unverdicted · none · ref 29

s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

cs.PL · 2026-03-15 · unverdicted · none · ref 11

VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

cs.SE · 2026-05-08 · unverdicted · none · ref 29

BRIDGE: Building Representations In Domain Guided Program Synthesis

cs.LG · 2025-11-26 · unverdicted · none · ref 21
