
Hilbert: Recursively building formal proofs with informal reasoning. CoRR, abs/2509.22819

Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, Ke Ye · 2025 · arXiv 2509.22819

11 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Advancing Mathematics Research with AI-Driven Formal Proof Search

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

LLM-based agents in Lean solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures at a few hundred dollars each.

Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

LLM proofs for hard math problems show large differences in quality metrics like conciseness and cognitive simplicity that correctness-only tests miss, along with trade-offs between quality and correctness.

Certified Program Synthesis with a Multi-Modal Verifier

cs.SE · 2026-04-17 · unverdicted · novelty 7.0

LeetProof achieves higher rates of fully certified program synthesis from natural language by using a multi-modal verifier in Lean that validates specifications via randomized testing and delegates proofs to AI tools; it outperforms single-mode baselines on benchmarks and uncovers defects in prior references.

Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization

cs.HC · 2026-03-16 · conditional · novelty 7.0

Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.

Scaling Self-Play with Self-Guidance

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

SGS adds self-guidance to LLM self-play for Lean 4 theorem proving, surpassing RL baselines and enabling a 7B model to outperform a 671B model after 200 rounds.

A Minimal Agent for Automated Theorem Proving

cs.AI · 2026-02-27 · unverdicted · novelty 6.0

A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.

ContextPilot: Fast Long-Context Inference via Context Reuse

cs.LG · 2025-11-05 · unverdicted · novelty 6.0

ContextPilot reduces LLM prefill latency by up to 3x via context indexing, ordering, de-duplication, and succinct annotations that maximize KV-cache reuse while preserving or improving reasoning quality.

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

cs.AI · 2025-10-14 · unverdicted · novelty 6.0

Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.

STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision

cs.MA · 2026-05-19 · unverdicted · novelty 5.0

STAR-PólyaMath introduces a multi-agent framework with meta-strategic supervision and state-machine orchestration that reports state-of-the-art and perfect scores on eight top math competition benchmarks.

Evaluating LLM-Generated ACSL Annotations for Formal Verification

cs.SE · 2026-02-14 · unverdicted · novelty 4.0

Rule-based annotation generation for ACSL outperforms LLM-based methods in achieving successful formal verification of C programs.

AI for Mathematics: Progress, Challenges, and Prospects

math.HO · 2026-01-19 · unverdicted · novelty 4.0

AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.

citing papers explorer

Showing 11 of 11 citing papers.

Advancing Mathematics Research with AI-Driven Formal Proof Search cs.AI · 2026-05-21 · unverdicted · none · ref 63
LLM-based agents in Lean solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures at a few hundred dollars each.
Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness cs.CL · 2026-05-11 · unverdicted · none · ref 31
LLM proofs for hard math problems show large differences in quality metrics like conciseness and cognitive simplicity that correctness-only tests miss, along with trade-offs between quality and correctness.
Certified Program Synthesis with a Multi-Modal Verifier cs.SE · 2026-04-17 · unverdicted · none · ref 66
LeetProof achieves higher rates of fully certified program synthesis from natural language by using a multi-modal verifier in Lean that validates specifications via randomized testing and delegates proofs to AI tools; it outperforms single-mode baselines on benchmarks and uncovers defects in prior references.
Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization cs.HC · 2026-03-16 · conditional · none · ref 18
Lean Atlas visualizes Lean 4 dependency graphs and applies Lean Compass to reduce the nodes needing human semantic review by 27-99% across six evaluated projects.
Scaling Self-Play with Self-Guidance cs.LG · 2026-04-22 · unverdicted · none · ref 38
SGS adds self-guidance to LLM self-play for Lean 4 theorem proving, surpassing RL baselines and enabling a 7B model to outperform a 671B model after 200 rounds.
A Minimal Agent for Automated Theorem Proving cs.AI · 2026-02-27 · unverdicted · none · ref 15
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
ContextPilot: Fast Long-Context Inference via Context Reuse cs.LG · 2025-11-05 · unverdicted · none · ref 3
ContextPilot reduces LLM prefill latency by up to 3x via context indexing, ordering, de-duplication, and succinct annotations that maximize KV-cache reuse while preserving or improving reasoning quality.
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics cs.AI · 2025-10-14 · unverdicted · none · ref 65
Ax-Prover is a tool-using multi-agent LLM system that matches state-of-the-art provers on public math benchmarks and outperforms them on new abstract-algebra and quantum-theory benchmarks while also assisting an expert with a cryptography proof.
STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision cs.MA · 2026-05-19 · unverdicted · none · ref 17
STAR-PólyaMath introduces a multi-agent framework with meta-strategic supervision and state-machine orchestration that reports state-of-the-art and perfect scores on eight top math competition benchmarks.
Evaluating LLM-Generated ACSL Annotations for Formal Verification cs.SE · 2026-02-14 · unverdicted · none · ref 18
Rule-based annotation generation for ACSL outperforms LLM-based methods in achieving successful formal verification of C programs.
AI for Mathematics: Progress, Challenges, and Prospects math.HO · 2026-01-19 · unverdicted · none · ref 138
AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.