pith. machine review for the scientific record. sign in

arxiv: 2604.28172 · v1 · submitted 2026-04-30 · 💻 cs.CC

Recognition: unknown

Superpolynomial Length Lower Bounds for Tree-Like Semantic Proof Systems with Bounded Line Size

Authors on Pith no claims yet

Pith reviewed 2026-05-07 04:54 UTC · model grok-4.3

classification 💻 cs.CC
keywords proof complexityFrege proof systemstree-like refutationslower boundssemantic proofsCNF formulasthreshold logic
0
0 comments X

The pith

Explicit families of CNF formulas almost surely require superpolynomial length tree-like semantic Frege refutations with bounded line size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes superpolynomial length lower bounds for tree-like semantic Frege systems when each line has bounded size s(n). For s(n) ranging from n to the power 2 minus epsilon to 2 to the power n to the 1 minus epsilon, it gives explicit families of CNF formulas of size at most s(n) to the 1 plus epsilon such that a random one from the family has no tree-like refutation of polynomial length using lines of that size. The bounds also apply to related systems like threshold logic with appropriate degree. A reader should care because this shows that even with powerful semantic inferences, the combination of tree structure and line size limit forces long proofs for most formulas in these families.

Core claim

We prove that for any n^{2-ε} ≤ s(n) ≤ 2^{n^{1-ε}} there is an explicit family A of n-variate CNF formulas with |A| ≤ s(n)^{1+ε} such that a random A from A has the property that every tree-like Frege refutation using lines of size s(n) has superpolynomial length in |A|. This holds for the semantic versions of Frege and threshold systems and more generally for any semantic tree-like proof system with at most exp(s(n)) possible lines.

What carries the argument

The explicit family of CNF formulas together with the counting argument that bounds the number of possible lines of size s(n) by exp(s(n)).

If this is right

  • Short proofs in these systems can refute only a vanishing fraction of the formulas in the family.
  • The lower bounds extend to tree-like degree-d threshold systems for d ≈ log s(n).
  • Any semantic tree-like system limited to exp(s(n)) lines requires superpolynomial length for almost all formulas in the family.
  • Tree-like structure imposes length lower bounds independent of allowing semantic lines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The explicitness of the family makes these formulas candidates for hard examples in proof complexity experiments or solver benchmarks.
  • The counting method may generalize to prove length lower bounds in other restricted proof systems.
  • If s(n) could be increased to 2^{n^δ} for larger δ, it would get closer to general lower bounds for Frege systems.
  • These results highlight the importance of allowing dag-like or non-tree proofs to achieve short refutations for random CNFs.

Load-bearing premise

That the number of distinct lines possible with size s(n) is at most exponential in s(n), so that short tree-like proofs using them cannot hit most formulas in the large explicit family.

What would settle it

Constructing or finding a polynomial-length tree-like semantic Frege refutation with line size s(n) for some formula chosen randomly from the family A would falsify the claim.

read the original abstract

We prove superpolynomial length lower bounds for the semantic tree-like Frege refutation system with bounded line size. Concretely, for any function $n^{2-\varepsilon} \leq s(n) \leq 2^{n^{1-\varepsilon}}$ we exhibit an explicit family $\mathcal{A}$ of $n$-variate CNF formulas $A$, each of size $|A| \le s(n)^{1+\varepsilon}$, such that if $A$ is chosen uniformly from $\mathcal{A}$, then asymptotically almost surely any tree-like Frege refutation of $A$ in line-size $s(n)$ is of length super-polynomial in $|A|$. Our lower bounds apply also to tree-like degree-$d$ threshold systems, for $d \approx \log\bigl(s(n)\bigr)$, that is, for $d$ up to $n^{1-\varepsilon}$. More generally, our lower bounds apply to the semantic version of these systems and to any semantic tree-like proof system where the number of distinct lines is bounded by $\exp\bigl(s(n)\bigr)$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper proves superpolynomial length lower bounds for tree-like semantic Frege refutations (and generalizations to threshold and other semantic systems) with line size s(n) for n^{2-ε} ≤ s(n) ≤ 2^{n^{1-ε}}. It constructs an explicit family 𝒜 of n-variate CNF formulas A with |A| ≤ s(n)^{1+ε} clauses such that a uniformly random A ∈ 𝒜 requires refutation length superpolynomial in |A| with high probability. The argument relies on a counting bound over the at most exp(O(s(n))) possible lines combined with properties of the explicit family 𝒜 to show that short proofs cannot refute most members of 𝒜.

Significance. If correct, the result is significant because it establishes strong lower bounds for semantic (rather than syntactic) tree-like proof systems with bounded line size, a setting where standard counting arguments are weaker. The explicit construction of 𝒜 and the broad range of s(n) (including superpolynomial and subexponential regimes) are strengths, as is the extension to degree-d threshold systems for d ≈ log s(n). The probabilistic method on an explicit family yields falsifiable predictions about concrete formula families and avoids parameter-fitting.

major comments (3)
  1. [§3] §3 (Construction of 𝒜): The explicit family 𝒜 must be defined so that, for any fixed tree-like proof using k ≤ L lines, the fraction of A ∈ 𝒜 refuted by that proof is at most exp(−Ω(s(n)·L)). The standard union bound over all CNFs only yields L = o(n/s(n)), which is sub-polynomial in |A| ≈ s^{1+ε} and insufficient for the claimed superpolynomial lower bound. Please state the precise combinatorial property of 𝒜 (e.g., limited intersections with small clause sets) that improves the fraction by the required exponential factor, and verify it holds for the given construction.
  2. [§4] §4, the main counting argument (around the union bound over proofs): With at most exp(O(s(n))) lines, there are at most exp(O(s(n)·L)) candidate proofs of length L. The manuscript must show explicitly that the measure of A ∈ 𝒜 refuted by any such proof is small enough that the total measure is o(1) whenever L is only polynomial in |A|. Cite the equation that converts the per-proof fraction into the final superpolynomial bound on L.
  3. [§5] §5 (Extension to threshold systems): The adaptation from Frege to degree-d threshold systems with d ≈ log s(n) requires that the line-size bound and the number of possible lines remain exp(O(s(n))). Confirm that the same counting argument applies verbatim and that the explicit family 𝒜 remains hard for the threshold variant; any additional combinatorial property needed for threshold lines should be stated.
minor comments (3)
  1. The notation distinguishing the family 𝒜 from a formula A is slightly overloaded; a different symbol for the family (e.g., ℱ) would improve readability.
  2. [§1] In the abstract and §1, the phrase “asymptotically almost surely” should be accompanied by the precise probability bound (e.g., 1−o(1) as n→∞) that is proved for the random choice of A.
  3. The manuscript would benefit from a short table comparing the new lower bounds with prior results for syntactic vs. semantic tree-like systems across the same range of s(n).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments correctly identify places where additional explicit statements and citations would improve clarity. We address each major comment below and will revise the manuscript to incorporate the requested details and verifications.

read point-by-point responses
  1. Referee: [§3] §3 (Construction of 𝒜): The explicit family 𝒜 must be defined so that, for any fixed tree-like proof using k ≤ L lines, the fraction of A ∈ 𝒜 refuted by that proof is at most exp(−Ω(s(n)·L)). The standard union bound over all CNFs only yields L = o(n/s(n)), which is sub-polynomial in |A| ≈ s^{1+ε} and insufficient for the claimed superpolynomial lower bound. Please state the precise combinatorial property of 𝒜 (e.g., limited intersections with small clause sets) that improves the fraction by the required exponential factor, and verify it holds for the given construction.

    Authors: The family 𝒜 is constructed in Section 3 so that any fixed collection of at most s(n) clauses intersects only an exp(−Ω(s(n))) fraction of the formulas in 𝒜. This limited-intersection property (arising from the explicit disperser-based selection of clauses) directly yields the stronger per-proof bound of exp(−Ω(s(n)·L)) for any tree-like proof of length L. We will insert a new lemma (Lemma 3.4) that isolates this combinatorial property, prove it for the given explicit construction, and show how it upgrades the union bound beyond the generic CNF case. revision: yes

  2. Referee: [§4] §4, the main counting argument (around the union bound over proofs): With at most exp(O(s(n))) lines, there are at most exp(O(s(n)·L)) candidate proofs of length L. The manuscript must show explicitly that the measure of A ∈ 𝒜 refuted by any such proof is small enough that the total measure is o(1) whenever L is only polynomial in |A|. Cite the equation that converts the per-proof fraction into the final superpolynomial bound on L.

    Authors: Section 4 bounds the number of possible lines by exp(O(s(n))), hence at most exp(O(s(n)·L)) candidate proofs. Each such proof refutes at most an exp(−Ω(s(n)·L)) fraction of 𝒜 by the property of §3. The union bound therefore gives total measure at most exp(O(s L) − Ω(s L)) = o(1) whenever L = |A|^{o(1)}. Equation (4.3) performs exactly this conversion; we will expand the paragraph containing (4.3) to display the arithmetic explicitly and to cite the per-proof fraction from the new Lemma 3.4. revision: yes

  3. Referee: [§5] §5 (Extension to threshold systems): The adaptation from Frege to degree-d threshold systems with d ≈ log s(n) requires that the line-size bound and the number of possible lines remain exp(O(s(n))). Confirm that the same counting argument applies verbatim and that the explicit family 𝒜 remains hard for the threshold variant; any additional combinatorial property needed for threshold lines should be stated.

    Authors: Each degree-d threshold line (d ≈ log s(n)) admits a description of size O(s(n)), so the total number of distinct lines remains exp(O(s(n))). The counting argument of §4 therefore carries over verbatim. The same limited-intersection property of 𝒜 already established for ordinary clauses continues to hold when the lines are threshold gates, because the proof only uses the semantic semantics and the cardinality bound on the line set; no further combinatorial hypothesis is required. We will add a short paragraph at the end of §5 confirming these facts and noting that the bound on the number of lines is unchanged. revision: yes

Circularity Check

0 steps flagged

No circularity: explicit family and direct counting argument are self-contained

full rationale

The paper constructs an explicit family A of n-variate CNF formulas of size at most s(n)^{1+ε} and applies a union bound over the at most exp(O(s(n)·L)) candidate tree-like proofs of length L with line size s(n). For a random A drawn from the family, short proofs refute only a negligible fraction, establishing the superpolynomial length lower bound directly. This is a standard probabilistic-method argument with no self-definitional equations, fitted parameters renamed as predictions, load-bearing self-citations, or ansatzes smuggled via prior work. The derivation is independent of its own outputs and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proof relies on standard mathematical tools in complexity theory without introducing new free parameters or invented entities.

axioms (2)
  • standard math The probabilistic method applies to show that a random formula from the family has the desired property with high probability.
    Invoked to transfer the existence to almost sure properties.
  • domain assumption Semantic proof systems can use any valid logical inference as a rule.
    This allows the lower bound to apply to the semantic version as stated.

pith-pipeline@v0.9.0 · 5511 in / 1392 out tokens · 159804 ms · 2026-05-07T04:54:11.591804+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 20 canonical work pages

  1. [1]

    Neural network learning: Theoretical foundations

    Martin Anthony and Peter L Bartlett. Neural network learning: Theoretical foundations . cambridge university press, 2009. https://doi.org/10.1017/CBO9780511624216 doi:10.1017/CBO9780511624216

  2. [2]

    Proof complexity meets algebra

    Albert Atserias and Joanna Ochremiak. Proof complexity meets algebra. 20(1), December 2018. https://doi.org/10.1145/3265985 doi:10.1145/3265985

  3. [3]

    VC dimension and distribution-free sample-based testing

    Eric Blais, Renato Ferreira Pinto Jr., and Nathaniel Harms. VC dimension and distribution-free sample-based testing. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , page 504–517. Association for Computing Machinery, 2021. https://doi.org/10.1145/3406325.3451104 doi:10.1145/3406325.3451104

  4. [4]

    The resolution complexity of independent sets and vertex covers in random graphs

    Paul Beame, Russell Impagliazzo, and Ashish Sabharwal. The resolution complexity of independent sets and vertex covers in random graphs. Computational Complexity , 16(3):245 --297, October 2007. https://doi.org/10.1007/s00037-007-0230-0 doi:10.1007/s00037-007-0230-0

  5. [5]

    Lower bounds for Lov´ asz–Schrijver systems and beyond follow from multiparty communication complexity.SIAM Journal on Computing, 37(3):845–869, 2007

    Paul Beame, Toniann Pitassi, and Nathan Segerlind. Lower bounds for L ovász– S chrijver systems and beyond follow from multiparty communication complexity. SIAM Journal on Computing , 37(3):845--869, 2007. https://doi.org/10.1137/060654645 doi:10.1137/060654645

  6. [6]

    Free resolutions of function classes via order complexes

    Justin Chen, Christopher Eur, Greg Yang, and Mengyuan Zhang. Free resolutions of function classes via order complexes. Advances in Applied Mathematics , 120:102074, 2020. https://doi.org/10.1016/j.aam.2020.102074 doi:10.1016/j.aam.2020.102074

  7. [7]

    The relative efficiency of propositional proof systems

    Stephen A. Cook and Robert A. Reckhow. The relative efficiency of propositional proof systems. Journal of Symbolic Logic , 44(1):36 --50, March 1979. Preliminary version in STOC '74. https://doi.org/10.2307/2273702 doi:10.2307/2273702

  8. [8]

    de Rezende, Aaron Potechin, and Kilian Risse

    Susanna F. de Rezende, Aaron Potechin, and Kilian Risse. Clique is hard on average for unary S herali-- A dams. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS) , pages 12--25, 2023. https://doi.org/10.1109/FOCS57990.2023.00008 doi:10.1109/FOCS57990.2023.00008

  9. [9]

    de Rezende, Aaron Potechin, and Kilian Risse

    Susanna F. de Rezende, Aaron Potechin, and Kilian Risse. Clique is hard on average for S herali-- A dams with bounded coefficients. CoRR , abs/2404.16722, 2024. https://doi.org/10.48550/ARXIV.2404.16722 doi:10.48550/ARXIV.2404.16722

  10. [10]

    Separations in proof complexity and TFNP

    Mika G\" o \" o s, Alexandros Hollender, Siddhartha Jain, Gilbert Maystre, William Pires, Robert Robere, and Ran Tao. Separations in proof complexity and TFNP . J. ACM , 71(4), August 2024. https://doi.org/10.1145/3663758 doi:10.1145/3663758

  11. [11]

    Communication lower bounds via critical block sensitivity

    Mika G\" o \" o s and Toniann Pitassi. Communication lower bounds via critical block sensitivity. SIAM Journal on Computing , 47(5):1778--1806, 2018. https://doi.org/10.1137/16M1082007 doi:10.1137/16M1082007

  12. [12]

    TFNP Intersections Through the Lens of Feasible Disjunction

    Pavel Hub\' a c ek, Erfan Khaniki, and Neil Thapen. TFNP Intersections Through the Lens of Feasible Disjunction . In Venkatesan Guruswami, editor, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024) , volume 287 of Leibniz International Proceedings in Informatics (LIPIcs) , pages 63:1--63:24, Dagstuhl, Germany, 2024. Schloss Dagstuhl -...

  13. [13]

    Proof complexity of natural formulas via communication arguments

    Dmitry Itsykson and Artur Riazanov. Proof Complexity of Natural Formulas via Communication Arguments . In Valentine Kabanets, editor, 36th Computational Complexity Conference (CCC 2021) , volume 200 of Leibniz International Proceedings in Informatics (LIPIcs) , pages 3:1--3:34, Dagstuhl, Germany, 2021. Schloss Dagstuhl -- Leibniz-Zentrum f \"u r Informati...

  14. [14]

    Resolution over linear equations modulo two

    Dmitry Itsykson and Dmitry Sokolov. Resolution over linear equations modulo two. Annals of Pure and Applied Logic , 171(1):102722, 2020. https://doi.org/10.1016/j.apal.2019.102722 doi:10.1016/j.apal.2019.102722

  15. [15]

    Kothari and Peter Manohar

    Pravesh K. Kothari and Peter Manohar. An exponential lower bound for linear 3-query locally correctable codes. In Bojan Mohar, Igor Shinkar, and Ryan O'Donnell, editors, Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024 , pages 776--787. ACM , 2024. https://doi.org/10.1145/3618260.36496...

  16. [16]

    Improved Distance (Sensitivity) Oracles with Subquadratic Space

    Pravesh K. Kothari and Peter Manohar. Exponential lower bounds for smooth 3- LCC s and sharp bounds for designs. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024 , pages 1802--1845. IEEE , 2024. https://doi.org/10.1109/FOCS61266.2024.00110 doi:10.1109/FOCS61266.2024.00110

  17. [17]

    Bounds on the Total Coefficient Size of Nullstellensatz Proofs of the Pigeonhole Principle

    Aaron Potechin and Aaron Zhang. Bounds on the Total Coefficient Size of Nullstellensatz Proofs of the Pigeonhole Principle . In Karl Bringmann, Martin Grohe, Gabriele Puppis, and Ola Svensson, editors, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024) , volume 297 of Leibniz International Proceedings in Informatics (LIPIcs...

  18. [18]

    Robert A. Reckhow. On the Lengths of Proofs in the Propositional Calculus . PhD thesis, University of Toronto, 1975. https://doi.org/1807/100390 doi:1807/100390

  19. [19]

    1972 , _month = jul, journal =

    N Sauer. On the density of families of sets. Journal of Combinatorial Theory, Series A , 13(1):145--147, 1972. https://doi.org/10.1016/0097-3165(72)90019-2 doi:10.1016/0097-3165(72)90019-2

  20. [20]

    A combinatorial problem; stability and order for models and theories in infinitary languages

    Saharon Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages. Pacific Journal of Mathematics , 41(1):247 -- 261, 1972. https://doi.org/10.2140/pjm.1972.41.247 doi:10.2140/pjm.1972.41.247

  21. [21]

    Understanding Machine Learning

    Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms . Cambridge University Press, 2014. https://doi.org/10.1017/CBO9781107298019 doi:10.1017/CBO9781107298019