A new benchmark of 9,415 Lean 4 specifications derived from 2,772 scraped Python property-based tests, plus a three-agent LLM transpilation pipeline and proof-generation baselines.
2 Pith papers cite this work.
Years: 2026 (2)
Representative citing papers
FormalScience provides a scalable human-in-the-loop system for autoformalising scientific reasoning into Lean, demonstrated on a new 200-problem physics dataset with perfect formal validity.
Citing papers explorer
- FVSpec: Real-World Property-Based Tests as Lean Challenges
A new benchmark of 9,415 Lean 4 specifications derived from 2,772 scraped Python property-based tests, plus a three-agent LLM transpilation pipeline and proof-generation baselines.
- FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean
FormalScience provides a scalable human-in-the-loop system for autoformalising scientific reasoning into Lean, demonstrated on a new 200-problem physics dataset with perfect formal validity.