arXiv preprint arXiv:2407.03203

Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang · 2024 · arXiv 2407.03203

5 Pith papers cite this work.

representative citing papers

TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics

cs.AI · 2026-06-08 · unverdicted · novelty 8.0

TheoremBench is a Lean4 benchmark of classical theorems in main and premised forms that evaluates LLM provers on partial progress, coverage, and token efficiency rather than binary success on competition problems.

Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics

cs.AI · 2026-06-30 · accept · novelty 7.0 · 2 refs

An orchestrator-driven agentic pipeline using general coding LLMs autoformalizes 32 PutnamBench problems and the main theorems plus proofs from five STOC papers into Lean 4, with two proofs using only the kernel.

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

cs.AI · 2026-06-04 · conditional · novelty 7.0

Goedel-Architect introduces blueprint generation and iterative refinement for Lean 4 theorem proving, reaching 99.2% on MiniF2F-test and 75.6% on PutnamBench with DeepSeek-V4-Flash.

RAG over Thinking Traces Can Improve Reasoning Tasks

cs.IR · 2026-05-05 · unverdicted · novelty 7.0

Retrieving structured thinking traces as a corpus improves reasoning performance on AIME, LiveCodeBench, and GPQA over standard RAG or no retrieval.

LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation

cs.SE · 2026-05-02 · conditional · novelty 7.0

LiveFMBench shows that direct LLM prompting for C program formal specs overestimates accuracy by ~20% due to unfaithful behaviors like deceiving provers, while agentic workflows help under low sampling but overall performance remains far below human-authored specs.
