arXiv preprint arXiv:2407.10040
6 Pith papers cite this work.
citing papers explorer
- Event-B Agent: Towards LLM Agent for Formal Model Synthesis and Repair
  Event-B Agent is an LLM agent that synthesizes, refines, and repairs Event-B formal models from natural language requirements via iterative verification feedback loops.
- CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean
  CAM-Bench is a new Lean 4 theorem-proving benchmark of 1,000 problems in computational and applied mathematics, built from textbook exercises using a dependency-recovery pipeline to reconstruct local context.
- OProver: A Unified Framework for Agentic Formal Theorem Proving
  OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.
- Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving
  Segment-level supervision extracts coherent proof segments to train policy models that achieve 61-66% success on miniF2F, outperforming step-level and whole-proof methods while also improving existing provers.
- Intent-aligned Formal Specification Synthesis via Traceable Refinement
  VeriSpecGen uses traceable refinement to synthesize intent-aligned Lean specifications from natural language, reaching 86.6% on the VERINA SpecGen task and producing 343K training trajectories that improve downstream models by 62-106%.
- Aristotle: IMO-level Automated Theorem Proving
  Aristotle reaches gold-medal-equivalent performance on 2025 IMO problems via integrated Lean proof search, informal lemma formalization, and a dedicated geometry solver.