Towards large language models as copilots for theorem proving in Lean

Peiyang Song, Kaiyu Yang, Anima Anandkumar · 2024 · arXiv 2404.12534

9 Pith papers cite this work.

representative citing papers

AI co-mathematician: Accelerating mathematicians with agentic AI

cs.AI · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.

LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation

cs.SE · 2026-05-02 · conditional · novelty 7.0

LiveFMBench shows that direct LLM prompting for C program formal specs overestimates accuracy by ~20% due to unfaithful behaviors like deceiving provers, while agentic workflows help under low sampling but overall performance remains far below human-authored specs.

Yanasse: Finding New Proofs from Deep Vision's Analogies, Part 1

cs.AI · 2026-04-19 · unverdicted · novelty 7.0

A domain-independent analogy engine transfers Lean tactic patterns from probability to representation theory, producing four new machine-verified proofs.

Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization

cs.HC · 2026-03-16 · conditional · novelty 7.0

Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

cs.AI · 2025-10-14 · unverdicted · novelty 6.0

Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.

ACE: A Security Architecture for LLM-Integrated App Systems

cs.CR · 2025-04-29 · unverdicted · novelty 6.0

ACE decouples planning into abstract and concrete phases with static information-flow verification and enforces execution barriers to secure LLM app systems against prompt injection and related attacks.

Interactive Evaluation Requires a Design Science

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

Interactive evaluation of AI must be reframed as a distinct paradigm that maps interaction trajectories to judgments on process, recoverability, coordination, robustness, and system performance, supported by a two-axis taxonomy and design principles.

Deep Vision: A Formal Proof of Wolstenholmes Theorem in Lean 4

cs.LO · 2026-04-14 · accept · novelty 5.0

Wolstenholme's theorem is formally verified in Lean 4 via expansion of a shifted factorial product and vanishing power sums modulo p.

Riemann-Bench: A Benchmark for Moonshot Mathematics

cs.AI · 2026-04-08 · conditional · novelty 5.0

Riemann-Bench is a private benchmark of 25 research-level math problems on which all tested frontier AI models score below 10%.

citing papers explorer

AI co-mathematician: Accelerating mathematicians with agentic AI

cs.AI · 2026-05-07 · unverdicted · none · ref 24 · 2 links

An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.

LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation

cs.SE · 2026-05-02 · conditional · none · ref 28

LiveFMBench shows that direct LLM prompting for C program formal specs overestimates accuracy by ~20% due to unfaithful behaviors like deceiving provers, while agentic workflows help under low sampling but overall performance remains far below human-authored specs.

Yanasse: Finding New Proofs from Deep Vision's Analogies, Part 1

cs.AI · 2026-04-19 · unverdicted · none · ref 20

A domain-independent analogy engine transfers Lean tactic patterns from probability to representation theory, producing four new machine-verified proofs.

Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization

cs.HC · 2026-03-16 · conditional · none · ref 15

Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

cs.AI · 2025-10-14 · unverdicted · none · ref 60

Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.

ACE: A Security Architecture for LLM-Integrated App Systems

cs.CR · 2025-04-29 · unverdicted · none · ref 34

ACE decouples planning into abstract and concrete phases with static information-flow verification and enforces execution barriers to secure LLM app systems against prompt injection and related attacks.

Interactive Evaluation Requires a Design Science

cs.AI · 2026-05-18 · unverdicted · none · ref 52

Interactive evaluation of AI must be reframed as a distinct paradigm that maps interaction trajectories to judgments on process, recoverability, coordination, robustness, and system performance, supported by a two-axis taxonomy and design principles.

Deep Vision: A Formal Proof of Wolstenholmes Theorem in Lean 4

cs.LO · 2026-04-14 · accept · none · ref 39

Wolstenholme's theorem is formally verified in Lean 4 via expansion of a shifted factorial product and vanishing power sums modulo p.

Riemann-Bench: A Benchmark for Moonshot Mathematics

cs.AI · 2026-04-08 · conditional · none · ref 14

Riemann-Bench is a private benchmark of 25 research-level math problems on which all tested frontier AI models score below 10%.
