A new benchmark of 9,415 Lean 4 specifications derived from 2,772 scraped Python property-based tests, plus a three-agent LLM transpilation pipeline and proof-generation baselines.
Lean-github: Compiling github lean repositories for a versatile lean prover
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
Proposes a three-layer framework using formal AI reasoning for verification, derivation, and discovery in wireless communications theory.
citing papers explorer
-
FVSpec: Real-World Property-Based Tests as Lean Challenges
A new benchmark of 9,415 Lean 4 specifications derived from 2,772 scraped Python property-based tests, plus a three-agent LLM transpilation pipeline and proof-generation baselines.
-
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
-
Rethinking Wireless Communications through Formal Mathematical AI Reasoning
Proposes a three-layer framework using formal AI reasoning for verification, derivation, and discovery in wireless communications theory.