Title resolution pending

URLhttps://doi · 2023 · arXiv 2210.12283

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.

Advancing Mathematics Research with AI-Driven Formal Proof Search

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

LLM-based agents in Lean solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures at a few hundred dollars each.

Fidelity Probes for Specification--Code Alignment

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

Fidelity probes from code raise specification fidelity from 0.63 to 0.94 on a 12k-line COBOL benchmark over eight iterations, with convergence predicted by a two-state Markov fixed point from four iterations of rate data.

Neurosymbolic Auditing of Natural-Language Software Requirements

cs.SE · 2026-05-13 · unverdicted · novelty 7.0

VERIMED translates natural-language requirements to formal logic via LLMs, detects ambiguity from stochastic formalization differences, and audits for inconsistency and safety violations using SMT queries.

Automatic Textbook Formalization

cs.AI · 2026-04-03 · accept · novelty 7.0

Multi-agent AI system formalizes entire 500-page graduate algebraic combinatorics textbook into Lean, creating 130K lines of code in one week at human-expert cost.

ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

ImProver 2 combines a data-efficient expert-iteration pipeline with a neurosymbolic scaffold to train a 7B model that outperforms larger models in Lean 4 proof optimization across structural metrics.

OProver: A Unified Framework for Agentic Formal Theorem Proving

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.

Viverra: Text-to-Code with Guarantees

cs.SE · 2026-05-14 · unverdicted · novelty 6.0

Viverra generates C code from text descriptions together with assertions that are verified by model checkers, and a user study with over 400 participants shows the verified assertions improve code comprehension.

Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Segment-level supervision extracts coherent proof segments to train policy models that achieve 61-66% success on miniF2F, outperforming step-level and whole-proof methods while also improving existing provers.

A Minimal Agent for Automated Theorem Proving

cs.AI · 2026-02-27 · unverdicted · novelty 6.0

A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

cs.CL · 2026-01-27 · unverdicted · novelty 6.0

VERGE decomposes LLM outputs into atomic claims, autoformalizes them to first-order logic, verifies with SMT solvers and consensus, and refines via minimal correction subsets, yielding 18.7% average uplift on reasoning benchmarks.

Aristotle: IMO-level Automated Theorem Proving

cs.AI · 2025-10-01 · unverdicted · novelty 6.0

Aristotle reaches gold-medal-equivalent performance on 2025 IMO problems via integrated Lean proof search, informal lemma formalization, and a dedicated geometry solver.

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

cs.CL · 2024-02-05 · unverdicted · novelty 6.0

DeepSeekMath 7B reaches 51.7% on MATH via continued pretraining on curated web math data and Group Relative Policy Optimization.

ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models

cs.CL · 2023-05-23 · conditional · novelty 6.0

ReWOO decouples reasoning from tool observations in augmented language models, delivering 5x token efficiency and 4% higher accuracy on multi-step reasoning benchmarks like HotpotQA.

On Reasoning-Centric LLM-based Automated Theorem Proving

cs.SE · 2026-04-21 · unverdicted · novelty 5.0

ReCent-Prover achieves a 22.58% relative improvement over prior state-of-the-art in proved theorems on the CoqStoq benchmark by using reasoning-centric techniques under a fixed LLM invocation budget.

Rethinking Wireless Communications through Formal Mathematical AI Reasoning

eess.SP · 2026-04-28 · unverdicted · novelty 4.0

Proposes a three-layer framework using formal AI reasoning for verification, derivation, and discovery in wireless communications theory.

citing papers explorer

Showing 16 of 16 citing papers.

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems cs.AI · 2026-05-22 · unverdicted · none · ref 19
IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.
Advancing Mathematics Research with AI-Driven Formal Proof Search cs.AI · 2026-05-21 · unverdicted · none · ref 34
LLM-based agents in Lean solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures at a few hundred dollars each.
Fidelity Probes for Specification--Code Alignment cs.LG · 2026-05-17 · unverdicted · none · ref 2
Fidelity probes from code raise specification fidelity from 0.63 to 0.94 on a 12k-line COBOL benchmark over eight iterations, with convergence predicted by a two-state Markov fixed point from four iterations of rate data.
Neurosymbolic Auditing of Natural-Language Software Requirements cs.SE · 2026-05-13 · unverdicted · none · ref 14
VERIMED translates natural-language requirements to formal logic via LLMs, detects ambiguity from stochastic formalization differences, and audits for inconsistency and safety violations using SMT queries.
Automatic Textbook Formalization cs.AI · 2026-04-03 · accept · none · ref 10
Multi-agent AI system formalizes entire 500-page graduate algebraic combinatorics textbook into Lean, creating 130K lines of code in one week at human-expert cost.
ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization cs.AI · 2026-05-21 · unverdicted · none · ref 3
ImProver 2 combines a data-efficient expert-iteration pipeline with a neurosymbolic scaffold to train a 7B model that outperforms larger models in Lean 4 proof optimization across structural metrics.
OProver: A Unified Framework for Agentic Formal Theorem Proving cs.CL · 2026-05-17 · unverdicted · none · ref 132
OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.
Viverra: Text-to-Code with Guarantees cs.SE · 2026-05-14 · unverdicted · none · ref 8
Viverra generates C code from text descriptions together with assertions that are verified by model checkers, and a user study with over 400 participants shows the verified assertions improve code comprehension.
Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving cs.AI · 2026-05-12 · unverdicted · none · ref 9
Segment-level supervision extracts coherent proof segments to train policy models that achieve 61-66% success on miniF2F, outperforming step-level and whole-proof methods while also improving existing provers.
A Minimal Agent for Automated Theorem Proving cs.AI · 2026-02-27 · unverdicted · none · ref 41
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning cs.CL · 2026-01-27 · unverdicted · none · ref 2
VERGE decomposes LLM outputs into atomic claims, autoformalizes them to first-order logic, verifies with SMT solvers and consensus, and refines via minimal correction subsets, yielding 18.7% average uplift on reasoning benchmarks.
Aristotle: IMO-level Automated Theorem Proving cs.AI · 2025-10-01 · unverdicted · none · ref 21
Aristotle reaches gold-medal-equivalent performance on 2025 IMO problems via integrated Lean proof search, informal lemma formalization, and a dedicated geometry solver.
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models cs.CL · 2024-02-05 · unverdicted · none · ref 20
DeepSeekMath 7B reaches 51.7% on MATH via continued pretraining on curated web math data and Group Relative Policy Optimization.
ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models cs.CL · 2023-05-23 · conditional · none · ref 33
ReWOO decouples reasoning from tool observations in augmented language models, delivering 5x token efficiency and 4% higher accuracy on multi-step reasoning benchmarks like HotpotQA.
On Reasoning-Centric LLM-based Automated Theorem Proving cs.SE · 2026-04-21 · unverdicted · none · ref 10
ReCent-Prover achieves a 22.58% relative improvement over prior state-of-the-art in proved theorems on the CoqStoq benchmark by using reasoning-centric techniques under a fixed LLM invocation budget.
Rethinking Wireless Communications through Formal Mathematical AI Reasoning eess.SP · 2026-04-28 · unverdicted · none · ref 69
Proposes a three-layer framework using formal AI reasoning for verification, derivation, and discovery in wireless communications theory.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer