IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.
url: https://arxiv.org/abs/2306.15626
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
polarities
background 2representative citing papers
LiveFMBench shows that direct LLM prompting for C program formal specs overestimates accuracy by ~20% due to unfaithful behaviors like deceiving provers, while agentic workflows help under low sampling but overall performance remains far below human-authored specs.
LLMs display clear performance stratification on formal language tasks aligned with Chomsky hierarchy complexity levels, limited by severe efficiency barriers rather than absolute capability.
Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
A Lean library called Palamedes uses synthesis rules from generator semantics and catamorphism-anamorphism rewriting to automatically produce correct constrained random generators.
Agent-directed tree search improves LLM performance on Lean formal verification tasks, with context-based orchestration solving more intermediate specs at lower token cost than baseline agents.
A hybrid pipeline lets an LLM write high-level proof sketches in a compact DSL that a lightweight kernel then expands into explicit, checkable obligations for reliable math and logic reasoning.
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
RLVR training raises verified Dafny pass rates from 9.7% to 31.1% on a filtered benchmark while a Lean proof scaffold lifts success from 46.2% to 69.2% on a pilot set and solves 7 of 42 prior unsolved tasks.
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.
citing papers explorer
-
Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.
-
LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation
LiveFMBench shows that direct LLM prompting for C program formal specs overestimates accuracy by ~20% due to unfaithful behaviors like deceiving provers, while agentic workflows help under low sampling but overall performance remains far below human-authored specs.
-
Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy
LLMs display clear performance stratification on formal language tasks aligned with Chomsky hierarchy complexity levels, limited by severe efficiency barriers rather than absolute capability.
-
Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization
Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
-
The Search for Constrained Random Generators
A Lean library called Palamedes uses synthesis rules from generator semantics and catamorphism-anamorphism rewriting to automatically produce correct constrained random generators.
-
Automating Formal Verification with Agent-Guided Tree Search
Agent-directed tree search improves LLM performance on Lean formal verification tasks, with context-based orchestration solving more intermediate specs at lower token cost than baseline agents.
-
ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning
A hybrid pipeline lets an LLM write high-level proof sketches in a compact DSL that a lightweight kernel then expands into explicit, checkable obligations for reliable math and logic reasoning.
-
A Minimal Agent for Automated Theorem Proving
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
-
Automating Formal Verification with Reinforcement Learning and Recursive Inference
RLVR training raises verified Dafny pass rates from 9.7% to 31.1% on a filtered benchmark while a Lean proof scaffold lifts success from 46.2% to 69.2% on a pilot set and solves 7 of 42 prior unsolved tasks.
-
pAI/MSc: ML Theory Research with Humans on the Loop
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.