pith. sign in

arxiv: 2605.15445 · v1 · pith:HYDIVZQ5new · submitted 2026-05-14 · 💻 cs.AI

From LLM-Generated Conjectures to Lean Formalizations: Automated Polynomial Inequality Proving via Sum-of-Squares Certificates

Pith reviewed 2026-05-19 14:36 UTC · model grok-4.3

classification 💻 cs.AI
keywords polynomial inequalitiessum-of-squares certificateslarge language modelsLean theorem proverneuro-symbolic methodsautomated theorem provingformal verification
0
0 comments X

The pith

LLM-suggested approximate sum-of-squares decompositions can be refined symbolically into exact Lean-verified proofs for polynomial inequalities with up to 10 variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NSPI, a neuro-symbolic system that lets an LLM first propose an approximate sum-of-squares decomposition for a given polynomial inequality. Symbolic computation then tightens this guess into an exact algebraic identity that certifies the inequality holds. The resulting certificate is automatically translated into a machine-checked proof inside the Lean theorem prover. By linking heuristic generation to precise algebraic refinement and formal verification, the approach seeks to prove inequalities whose variable count or degree exceeds the practical reach of either pure symbolic solvers or standalone language-model attempts.

Core claim

NSPI supplies an end-to-end pipeline in which an LLM generates a candidate approximate polynomial sum-of-squares decomposition; symbolic refinement converts the candidate into an exact SOS representation that proves the target inequality; the proof is then certified inside Lean, yielding a fully machine-checked formalization. Experiments show the pipeline succeeds on challenging benchmark inequalities involving polynomials with as many as 10 variables.

What carries the argument

The neuro-symbolic refinement loop that converts an LLM-proposed approximate sum-of-squares decomposition into an exact algebraic certificate via symbolic computation.

If this is right

  • Purely symbolic solvers can be reserved for the final tightening step rather than the entire search.
  • Machine-checked Lean proofs become a routine byproduct of the inequality-proving process.
  • The method extends the reach of automated proving beyond the small-variable regime typical of earlier LLM-only attempts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hybrid pipelines might be tested on other algebraic tasks such as proving identities or solving systems of polynomial equations.
  • If refinement cost remains modest, the same pattern could reduce the need for hand-crafted heuristics in formal mathematics libraries.

Load-bearing premise

LLM-generated approximate SOS decompositions are close enough to exact identities that symbolic refinement succeeds without prohibitive cost or repeated failure on the target polynomials.

What would settle it

A benchmark set of 8-to-10-variable polynomial inequalities on which the symbolic refiner fails to produce an exact SOS decomposition for a majority of LLM proposals would falsify the claimed scalability.

Figures

Figures reproduced from arXiv: 2605.15445 by Gaolei He, Hanrui Zhao, Jianlin Wang, Ruobing Zuo, Zhengfeng Yang.

Figure 1
Figure 1. Figure 1: Overview of Neuro-Symbolic SOS-based Polynomial Inequality Proving (NSPI). (1) Neural Conjecture Module: Non￾negative polynomial-SOS representation pairs are constructed using computation-driven and structure-driven approaches. A Large Language Model (LLM) is trained on the constructed data to function as an SOS structure conjecturer, which generates corresponding SOS representations based on the non-negat… view at source ↗
Figure 2
Figure 2. Figure 2: Two-stage training of the SOS conjecturer: (1) Cold Start: supervised fine-tuning (SFT) on large-scale synthetic polyno￾mial–SOS pairs; (2) Progressive reinforcement learning: curriculum-based GRPO on challenging training data. most notably diagonally dominant (dd) matrices and scaled diagonally dominant (sdd) matrices. Specifically, a matrix G ∈ R m×m is defined as diag￾onally dominant if it satisfies the… view at source ↗
Figure 3
Figure 3. Figure 3: Distribution of sample counts across data source cate￾gories in PolyIneq-Real. 2014; Tung, 2012; Riasat, 2008; Manfrino et al., 2010; Mil￾dorf, 2005; Parvardi, 2011; Tung & Zhe, 2014; Lee, 2005; Andreescu, 2019), forming PolyIneq-Real [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Performance on PolyIneqBench across different training stages. Left: Success rate heatmap showing capability boundary migration. Middle: Overall performance gains from base model through RL stages. Right: Scalability trends across curriculum groups. work outperforms symbol-based methods, pure LLM-based approaches, and other hybrid systems, achieving best per￾formance. Notably, while baseline methods strugg… view at source ↗
Figure 6
Figure 6. Figure 6: Framework of SOS data construction methods in Section 4.1.1. The generation of polynomial-SOS pairs is based on the quadratic form f(x) = v(x) TGev(x), where the Gram matrix Ge is synthesized through either computation-driven or structure-driven approaches. B.1. Spectral Foundations for Computation-Driven Methods The computation-driven methods rely on the relationship between eigenvalues and the PSD proper… view at source ↗
Figure 7
Figure 7. Figure 7: Statistical distribution of degrees for synthetic polynomial data. As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Statistical distribution of SOS term counts in synthetic polynomials. generating rewards {r1, . . . , rG}, and the advantage is computed as follows: Ai ˆ = ri − mean(r1, . . . , rG) std(r1, . . . , rG) (34) The policy model πθ is then optimized by maximizing the following objective: JGRPO(θ) = Eq∼Q,{oi}i=1G∼πold(O|q) 1 G X G i=1 1 |oi | X |oi| t=1 {min h γi,t(θ)Aˆ i,t, clip (γi,t(θ), 1 − ϵ, 1 + ϵ) Aˆ i,ti … view at source ↗
Figure 9
Figure 9. Figure 9: Automated Lean Proof Generation Example. left: The pre-defined Lean Proof Template. right: An Example of automated proof generation within Lean4, where the formal verification is completed using the integrated tactic “LLM Ineq”. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Lean template for inequality proof based on SOS representation. The template uses the following placeholders: • <Theorem Name>: the customized theorem name; • <Polynomial>: the target polynomial expression (often an expanded or normalized form equivalent to the original); • <Exact SOS Expression>: a list of exact square terms constituting the SOS certificate, encoded as a Lean list of pairs (e.g., (p, k))… view at source ↗
Figure 11
Figure 11. Figure 11: Detailed workflow of the NSPI method applied to the Peyrl-Parrilo polynomial from PolyIneq-Real (Case 1).This example illustrates the complete pipeline from initial conjecture to formal certification. LLM Conjecture: <SOS Expression>: Newton-refined Approximations Exact SOS Representation via Rational Recovery Conjecture Newton Iteration Rational Recovery Case 2: A polynomial with 10 variables from PolyIn… view at source ↗
Figure 12
Figure 12. Figure 12: Case study of a challenging synthetic polynomial from PolyIneq-Synth. This 10-variable instance represents a significant challenge in high-dimensional formal reasoning; notably, NSPI is the sole method among all evaluated baselines (Case 2) capable of successfully generating a verified formal proof. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Comparative analysis of different polynomial solving methods: (a) average number of terms; (b) average total degree [PITH_FULL_IMAGE:figures/full_fig_p028_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Average number of variables in polynomial inequalities successfully verified by each method. This metric illustrates the scalability of various provers, where NSPI (ours) demonstrates a superior capability in handling higher-dimensional problems compared to both symbolic baselines and LLM-based provers. solved instances [PITH_FULL_IMAGE:figures/full_fig_p029_14.png] view at source ↗
read the original abstract

Automated proving of polynomial inequalities is a fundamental challenge in automated mathematical reasoning, where rich algebraic structure and a rapidly growing certificate search space hinder scalability. Purely symbolic approaches provide strong guarantees but often scale poorly as the number of variables or the degree increases, due to expensive algebraic manipulations and rapidly growing intermediate expressions. In parallel, LLM-guided methods have made notable progress, particularly on competition-style inequalities with a small number of variables. To address the remaining scalability challenges, we propose NSPI, a neuro-symbolic framework that combines the complementary strengths of LLMs and symbolic computation for polynomial-inequality proving. Concretely, an LLM proposes a conjecture in the form of an approximate polynomial Sum-Of-Squares (SOS) decomposition; we refine it via symbolic computation to obtain an exact polynomial SOS representation, which directly proves the target inequality, and we further certify the proof in Lean, yielding an end-to-end pipeline from heuristic discovery to machine-checked proof. Experiments on challenging benchmarks involving polynomials with up to 10 variables demonstrate the effectiveness and scalability of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes NSPI, a neuro-symbolic pipeline for automated proving of polynomial inequalities. An LLM generates an approximate Sum-of-Squares (SOS) decomposition as a conjecture; this is refined by symbolic computation into an exact SOS certificate that proves the inequality; the certificate is then formalized and checked in Lean. The authors claim this yields an end-to-end workflow from heuristic discovery to machine-checked proof and that experiments on benchmarks with polynomials of up to 10 variables demonstrate effectiveness and scalability.

Significance. If the empirical claims are substantiated, the work would be significant for automated mathematical reasoning: it offers a concrete way to leverage LLM heuristics while retaining the strong guarantees of SOS certificates and Lean formalization. The integration of approximate LLM output with exact symbolic refinement and external verification is a clear strength, and the focus on higher-variable cases addresses a recognized scalability bottleneck in purely symbolic inequality proving.

major comments (2)
  1. [Experiments section] Abstract and Experiments section: the claim that experiments 'demonstrate the effectiveness and scalability' of NSPI on polynomials with up to 10 variables is load-bearing for the central contribution, yet the manuscript supplies no quantitative success rates for the symbolic refinement step, no failure-mode analysis, no baseline comparisons, and no runtime scaling data as the number of variables or degree increases. Without these metrics it is impossible to assess whether LLM-generated approximations reliably lie inside the basin of attraction of the refinement procedure.
  2. [Method section] Method section (pipeline description): the correctness of the end-to-end claim rests on the unexamined assumption that LLM approximate SOS decompositions are sufficiently close to exact rational SOS certificates for symbolic refinement (e.g., Newton iteration or exact SDP over rationals) to succeed at acceptable cost. No analysis or bounds are given on the approximation error or on how this error grows with dimension, leaving the scalability assertion for 10-variable instances unsupported.
minor comments (1)
  1. [Method and Lean sections] The notation for the approximate versus exact SOS decompositions could be made more consistent across the method description and the Lean formalization section to improve readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment below, indicating planned revisions where appropriate to strengthen the presentation of our experimental results and methodological assumptions.

read point-by-point responses
  1. Referee: [Experiments section] Abstract and Experiments section: the claim that experiments 'demonstrate the effectiveness and scalability' of NSPI on polynomials with up to 10 variables is load-bearing for the central contribution, yet the manuscript supplies no quantitative success rates for the symbolic refinement step, no failure-mode analysis, no baseline comparisons, and no runtime scaling data as the number of variables or degree increases. Without these metrics it is impossible to assess whether LLM-generated approximations reliably lie inside the basin of attraction of the refinement procedure.

    Authors: We acknowledge that the Experiments section primarily reports successful end-to-end proofs on the selected benchmarks with up to 10 variables but lacks the detailed quantitative breakdown requested. To address this directly, the revised manuscript will expand the Experiments section with: (i) success rates specifically for the symbolic refinement step (e.g., percentage of LLM conjectures that converge to exact rational SOS certificates), (ii) a failure-mode analysis categorizing cases where refinement fails (such as when the initial approximation error exceeds the basin of attraction), (iii) baseline comparisons against purely symbolic SOS solvers, and (iv) runtime scaling data and plots as the number of variables and degree increase. These additions will provide concrete evidence for the reliability of the LLM-generated approximations and the overall scalability claims. revision: yes

  2. Referee: [Method section] Method section (pipeline description): the correctness of the end-to-end claim rests on the unexamined assumption that LLM approximate SOS decompositions are sufficiently close to exact rational SOS certificates for symbolic refinement (e.g., Newton iteration or exact SDP over rationals) to succeed at acceptable cost. No analysis or bounds are given on the approximation error or on how this error grows with dimension, leaving the scalability assertion for 10-variable instances unsupported.

    Authors: The Method section describes the neuro-symbolic pipeline in which the LLM provides an approximate SOS conjecture that is subsequently refined symbolically. We agree that no explicit error bounds or growth analysis with dimension are provided. In revision we will add an empirical analysis subsection that reports observed approximation errors from the LLM outputs on the benchmark set and demonstrates the convergence radius of the refinement procedures (Newton iteration and rational SDP) used. This will substantiate that the approximations lie within the basin of attraction for the instances considered. However, deriving general theoretical bounds on error growth with dimension would require new analysis of LLM behavior in algebraic domains and lies beyond the scope of the current empirical study; our 10-variable results remain supported by the observed practical success of the refinement step. revision: partial

standing simulated objections not resolved
  • Deriving theoretical bounds on LLM approximation error and its scaling with dimension, which would necessitate substantial new theoretical research outside the present empirical contribution.

Circularity Check

0 steps flagged

NSPI is a procedural pipeline with external verification; no derivation reduces to inputs by construction

full rationale

The paper presents NSPI as a sequence of distinct stages: LLM heuristic generation of an approximate SOS decomposition, followed by symbolic refinement to an exact SOS certificate, followed by Lean formalization. Each stage is described as a computational procedure whose output is checked by independent tools (symbolic solvers and the Lean theorem prover). No equation or central claim is shown to be equivalent to its own inputs, no parameter is fitted on a subset and then relabeled as a prediction, and no load-bearing premise rests solely on a self-citation whose content is unverified outside the present work. The reported effectiveness on up to 10-variable benchmarks is offered as empirical evidence rather than a mathematical identity derived from the method itself. Consequently the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review is based on abstract only; concrete free parameters, implementation axioms, and any invented entities are not visible. The approach implicitly rests on the classical fact that a sum-of-squares representation certifies non-negativity.

axioms (1)
  • standard math A polynomial that admits an exact sum-of-squares decomposition is non-negative and thereby proves the target inequality.
    This is the foundational certificate used by the symbolic refinement step.
invented entities (1)
  • NSPI framework no independent evidence
    purpose: Hybrid pipeline that routes LLM output through symbolic refinement and Lean certification.
    New procedural combination introduced by the paper.

pith-pipeline@v0.9.0 · 5733 in / 1238 out tokens · 57954 ms · 2026-05-19T14:36:59.475822+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    9 From LLM Conjectures to Lean Formalizations: Automated Inequality Proving via Sum-of-Squares Certificates Heule, M

    Springer, 1993. 9 From LLM Conjectures to Lean Formalizations: Automated Inequality Proving via Sum-of-Squares Certificates Heule, M. J. H., Kullmann, O., and Marek, V . W. Solving and verifying the boolean pythagorean triples problem via cube-and-conquer. In Creignou, N. and Le Berre, D. (eds.),Theory and Applications of Satisfiability Testing – SAT 2016...

  2. [2]

    URL https://www.sciencedirect.com/ science/article/pii/S0304397508006452

    doi: https://doi.org/10.1016/j.tcs.2008.09.025. URL https://www.sciencedirect.com/ science/article/pii/S0304397508006452. Symbolic-Numerical Computations. Peyrl, H. and Parrilo, P. A. Computing sum of squares decompositions with rational coefficients.Theor. Com- put. Sci., 409:269–281, 2008b. URL https://api. semanticscholar.org/CorpusID:2370460. Polu, S....

  3. [3]

    The polynomialf(x)admits a rational SOS decomposition, that is, there exist polynomialsf i(x)∈Q[x]such that f(x) = rX i=1 fi(x)2

  4. [4]

    There exists a symmetric positive semidefinite Gram matrixG∈S m ∩Q m×m such that f(x) =v(x) ⊤Gv(x), G⪰0. For the numerical solution GN obtained after the Gauss–Newton refinement, when the backward errorθ is sufficiently small, one can recover from GN an exact rational PSD matrix eG, which satisfies the polynomial identity exactly, thereby yielding a certi...

  5. [5]

    Analyze the input polynomial carefully, focusing on the coefficients and the combinations of variables involved

  6. [6]

    Infer possible linear or polynomial terms inside each square in the sum-of-squares expression

  7. [7]

    Keep the output compact and well-structured

    Construct a sum of square terms without expanding the squares. Keep the output compact and well-structured

  8. [8]

    Minor numerical deviations are acceptable

    Aim for the expanded form of your SOS expression to closely approximate the coefficients in the original polynomial. Minor numerical deviations are acceptable

  9. [9]

    If multiple valid SOS decompositions exist, prefer one that is simple, symmetric, and easy to interpret

  10. [10]

    Include variables and constants inside parentheses when appropriate to match constant terms in the input polynomial. Output format: Please provide your response in the following structure: <SOS Expression>: <sum of squares expression> Where<sum of squares expression>is a sum-of-squares term, expressed compactly. For example: (x1 + 1)ˆ2 + (2 *x1 + 3 *x2)ˆ2...

  11. [11]

    The output must be a sum-of-squares expression

    Do not simply rewrite the input polynomial or output expanded terms. The output must be a sum-of-squares expression

  12. [12]

    Constants inside square terms can be fractional or decimal, as needed, to best approximate the original polynomial

  13. [13]

    Avoid expanding the squares in the output; always keep terms inside parentheses squared

  14. [14]

    Symmetry and simplicity are preferred if multiple answers fit

    Aim for clear and interpretable variable groupings. Symmetry and simplicity are preferred if multiple answers fit

  15. [15]

    LLM Ineq

    Your SOS expression should fully explain the input polynomial’s structure and coefficients to the best achievable extent. Example: Input polynomial: x1ˆ2 + 2 *x1 + 1 Output: (SOS Expression): (x1 + 1)ˆ2 Explanation: The polynomial is a perfect square trinomial; the SOS reconstruction is exact. Input polynomial: 5*x1ˆ2 + 12 *x1*x2 + 6 *x1 + 9 *x2ˆ2 + 9 Out...