arxiv: 2604.28172 · v1 · submitted 2026-04-30 · 💻 cs.CC

Recognition: unknown

Superpolynomial Length Lower Bounds for Tree-Like Semantic Proof Systems with Bounded Line Size

Susanna F. de Rezende , David Engstr\"om , Yassine Ghannane , Kilian Risse

Authors on Pith no claims yet

Pith reviewed 2026-05-07 04:54 UTC · model grok-4.3

classification 💻 cs.CC

keywords proof complexityFrege proof systemstree-like refutationslower boundssemantic proofsCNF formulasthreshold logic

0 comments

The pith

Explicit families of CNF formulas almost surely require superpolynomial length tree-like semantic Frege refutations with bounded line size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes superpolynomial length lower bounds for tree-like semantic Frege systems when each line has bounded size s(n). For s(n) ranging from n to the power 2 minus epsilon to 2 to the power n to the 1 minus epsilon, it gives explicit families of CNF formulas of size at most s(n) to the 1 plus epsilon such that a random one from the family has no tree-like refutation of polynomial length using lines of that size. The bounds also apply to related systems like threshold logic with appropriate degree. A reader should care because this shows that even with powerful semantic inferences, the combination of tree structure and line size limit forces long proofs for most formulas in these families.

Core claim

We prove that for any n^{2-ε} ≤ s(n) ≤ 2^{n^{1-ε}} there is an explicit family A of n-variate CNF formulas with |A| ≤ s(n)^{1+ε} such that a random A from A has the property that every tree-like Frege refutation using lines of size s(n) has superpolynomial length in |A|. This holds for the semantic versions of Frege and threshold systems and more generally for any semantic tree-like proof system with at most exp(s(n)) possible lines.

What carries the argument

The explicit family of CNF formulas together with the counting argument that bounds the number of possible lines of size s(n) by exp(s(n)).

If this is right

Short proofs in these systems can refute only a vanishing fraction of the formulas in the family.
The lower bounds extend to tree-like degree-d threshold systems for d ≈ log s(n).
Any semantic tree-like system limited to exp(s(n)) lines requires superpolynomial length for almost all formulas in the family.
Tree-like structure imposes length lower bounds independent of allowing semantic lines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicitness of the family makes these formulas candidates for hard examples in proof complexity experiments or solver benchmarks.
The counting method may generalize to prove length lower bounds in other restricted proof systems.
If s(n) could be increased to 2^{n^δ} for larger δ, it would get closer to general lower bounds for Frege systems.
These results highlight the importance of allowing dag-like or non-tree proofs to achieve short refutations for random CNFs.

Load-bearing premise

That the number of distinct lines possible with size s(n) is at most exponential in s(n), so that short tree-like proofs using them cannot hit most formulas in the large explicit family.

What would settle it

Constructing or finding a polynomial-length tree-like semantic Frege refutation with line size s(n) for some formula chosen randomly from the family A would falsify the claim.

read the original abstract

We prove superpolynomial length lower bounds for the semantic tree-like Frege refutation system with bounded line size. Concretely, for any function $n^{2-\varepsilon} \leq s(n) \leq 2^{n^{1-\varepsilon}}$ we exhibit an explicit family $\mathcal{A}$ of $n$-variate CNF formulas $A$, each of size $|A| \le s(n)^{1+\varepsilon}$, such that if $A$ is chosen uniformly from $\mathcal{A}$, then asymptotically almost surely any tree-like Frege refutation of $A$ in line-size $s(n)$ is of length super-polynomial in $|A|$. Our lower bounds apply also to tree-like degree-$d$ threshold systems, for $d \approx \log\bigl(s(n)\bigr)$, that is, for $d$ up to $n^{1-\varepsilon}$. More generally, our lower bounds apply to the semantic version of these systems and to any semantic tree-like proof system where the number of distinct lines is bounded by $\exp\bigl(s(n)\bigr)$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives superpolynomial lower bounds for semantic tree-like Frege and threshold systems with line size s(n) in the stated range, via an explicit family of CNF formulas where almost all members require long refutations.

read the letter

The core result is a superpolynomial length lower bound for tree-like semantic Frege refutations when line size sits between n to the 2-ε and 2 to the n to the 1-ε. They build an explicit family of n-variable CNFs, each with at most s^{1+ε} clauses, such that a random member of the family almost surely has no short refutation in these systems. The same argument covers tree-like degree-d threshold systems for d up to n^{1-ε} and, more generally, any semantic tree-like system whose lines are at most exp(s) many. That is the new piece: previous lower bounds handled ordinary Frege or unrestricted semantic systems, but this pins down the bounded-line tree-like case over a wide parameter range with an explicit construction rather than a non-constructive existence argument. The probabilistic counting over the family is the main tool, and the explicitness is a genuine plus because it keeps the result usable for further work on proof systems. The extension to threshold systems is a clean bonus that broadens the reach without extra machinery. The soft spot is that the lower bound holds only for almost all formulas in the family, not for every formula, which is standard but limits how far one can push the statement. The counting argument must rely on the family being sparse enough that small unsatisfiable clause sets appear in very few members; the abstract indicates they achieve this, but the details of the construction and the exact probability bounds would need verification to rule out any slippage in the exponents. No obvious circularity or parameter fitting appears. This is a paper for people already working in proof complexity who care about restricted systems and explicit lower bounds. It is not going to reset the field, but it fills a specific gap cleanly enough that a serious referee should see it. I would send it to review.

Referee Report

3 major / 3 minor

Summary. The paper proves superpolynomial length lower bounds for tree-like semantic Frege refutations (and generalizations to threshold and other semantic systems) with line size s(n) for n^{2-ε} ≤ s(n) ≤ 2^{n^{1-ε}}. It constructs an explicit family 𝒜 of n-variate CNF formulas A with |A| ≤ s(n)^{1+ε} clauses such that a uniformly random A ∈ 𝒜 requires refutation length superpolynomial in |A| with high probability. The argument relies on a counting bound over the at most exp(O(s(n))) possible lines combined with properties of the explicit family 𝒜 to show that short proofs cannot refute most members of 𝒜.

Significance. If correct, the result is significant because it establishes strong lower bounds for semantic (rather than syntactic) tree-like proof systems with bounded line size, a setting where standard counting arguments are weaker. The explicit construction of 𝒜 and the broad range of s(n) (including superpolynomial and subexponential regimes) are strengths, as is the extension to degree-d threshold systems for d ≈ log s(n). The probabilistic method on an explicit family yields falsifiable predictions about concrete formula families and avoids parameter-fitting.

major comments (3)

[§3] §3 (Construction of 𝒜): The explicit family 𝒜 must be defined so that, for any fixed tree-like proof using k ≤ L lines, the fraction of A ∈ 𝒜 refuted by that proof is at most exp(−Ω(s(n)·L)). The standard union bound over all CNFs only yields L = o(n/s(n)), which is sub-polynomial in |A| ≈ s^{1+ε} and insufficient for the claimed superpolynomial lower bound. Please state the precise combinatorial property of 𝒜 (e.g., limited intersections with small clause sets) that improves the fraction by the required exponential factor, and verify it holds for the given construction.
[§4] §4, the main counting argument (around the union bound over proofs): With at most exp(O(s(n))) lines, there are at most exp(O(s(n)·L)) candidate proofs of length L. The manuscript must show explicitly that the measure of A ∈ 𝒜 refuted by any such proof is small enough that the total measure is o(1) whenever L is only polynomial in |A|. Cite the equation that converts the per-proof fraction into the final superpolynomial bound on L.
[§5] §5 (Extension to threshold systems): The adaptation from Frege to degree-d threshold systems with d ≈ log s(n) requires that the line-size bound and the number of possible lines remain exp(O(s(n))). Confirm that the same counting argument applies verbatim and that the explicit family 𝒜 remains hard for the threshold variant; any additional combinatorial property needed for threshold lines should be stated.

minor comments (3)

The notation distinguishing the family 𝒜 from a formula A is slightly overloaded; a different symbol for the family (e.g., ℱ) would improve readability.
[§1] In the abstract and §1, the phrase “asymptotically almost surely” should be accompanied by the precise probability bound (e.g., 1−o(1) as n→∞) that is proved for the random choice of A.
The manuscript would benefit from a short table comparing the new lower bounds with prior results for syntactic vs. semantic tree-like systems across the same range of s(n).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments correctly identify places where additional explicit statements and citations would improve clarity. We address each major comment below and will revise the manuscript to incorporate the requested details and verifications.

read point-by-point responses

Referee: [§3] §3 (Construction of 𝒜): The explicit family 𝒜 must be defined so that, for any fixed tree-like proof using k ≤ L lines, the fraction of A ∈ 𝒜 refuted by that proof is at most exp(−Ω(s(n)·L)). The standard union bound over all CNFs only yields L = o(n/s(n)), which is sub-polynomial in |A| ≈ s^{1+ε} and insufficient for the claimed superpolynomial lower bound. Please state the precise combinatorial property of 𝒜 (e.g., limited intersections with small clause sets) that improves the fraction by the required exponential factor, and verify it holds for the given construction.

Authors: The family 𝒜 is constructed in Section 3 so that any fixed collection of at most s(n) clauses intersects only an exp(−Ω(s(n))) fraction of the formulas in 𝒜. This limited-intersection property (arising from the explicit disperser-based selection of clauses) directly yields the stronger per-proof bound of exp(−Ω(s(n)·L)) for any tree-like proof of length L. We will insert a new lemma (Lemma 3.4) that isolates this combinatorial property, prove it for the given explicit construction, and show how it upgrades the union bound beyond the generic CNF case. revision: yes
Referee: [§4] §4, the main counting argument (around the union bound over proofs): With at most exp(O(s(n))) lines, there are at most exp(O(s(n)·L)) candidate proofs of length L. The manuscript must show explicitly that the measure of A ∈ 𝒜 refuted by any such proof is small enough that the total measure is o(1) whenever L is only polynomial in |A|. Cite the equation that converts the per-proof fraction into the final superpolynomial bound on L.

Authors: Section 4 bounds the number of possible lines by exp(O(s(n))), hence at most exp(O(s(n)·L)) candidate proofs. Each such proof refutes at most an exp(−Ω(s(n)·L)) fraction of 𝒜 by the property of §3. The union bound therefore gives total measure at most exp(O(s L) − Ω(s L)) = o(1) whenever L = |A|^{o(1)}. Equation (4.3) performs exactly this conversion; we will expand the paragraph containing (4.3) to display the arithmetic explicitly and to cite the per-proof fraction from the new Lemma 3.4. revision: yes
Referee: [§5] §5 (Extension to threshold systems): The adaptation from Frege to degree-d threshold systems with d ≈ log s(n) requires that the line-size bound and the number of possible lines remain exp(O(s(n))). Confirm that the same counting argument applies verbatim and that the explicit family 𝒜 remains hard for the threshold variant; any additional combinatorial property needed for threshold lines should be stated.

Authors: Each degree-d threshold line (d ≈ log s(n)) admits a description of size O(s(n)), so the total number of distinct lines remains exp(O(s(n))). The counting argument of §4 therefore carries over verbatim. The same limited-intersection property of 𝒜 already established for ordinary clauses continues to hold when the lines are threshold gates, because the proof only uses the semantic semantics and the cardinality bound on the line set; no further combinatorial hypothesis is required. We will add a short paragraph at the end of §5 confirming these facts and noting that the bound on the number of lines is unchanged. revision: yes

Circularity Check

0 steps flagged

No circularity: explicit family and direct counting argument are self-contained

full rationale

The paper constructs an explicit family A of n-variate CNF formulas of size at most s(n)^{1+ε} and applies a union bound over the at most exp(O(s(n)·L)) candidate tree-like proofs of length L with line size s(n). For a random A drawn from the family, short proofs refute only a negligible fraction, establishing the superpolynomial length lower bound directly. This is a standard probabilistic-method argument with no self-definitional equations, fitted parameters renamed as predictions, load-bearing self-citations, or ansatzes smuggled via prior work. The derivation is independent of its own outputs and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proof relies on standard mathematical tools in complexity theory without introducing new free parameters or invented entities.

axioms (2)

standard math The probabilistic method applies to show that a random formula from the family has the desired property with high probability.
Invoked to transfer the existence to almost sure properties.
domain assumption Semantic proof systems can use any valid logical inference as a rule.
This allows the lower bound to apply to the semantic version as stated.

pith-pipeline@v0.9.0 · 5511 in / 1392 out tokens · 159804 ms · 2026-05-07T04:54:11.591804+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 20 canonical work pages

[1]

Neural network learning: Theoretical foundations

Martin Anthony and Peter L Bartlett. Neural network learning: Theoretical foundations . cambridge university press, 2009. https://doi.org/10.1017/CBO9780511624216 doi:10.1017/CBO9780511624216

work page doi:10.1017/cbo9780511624216 2009
[2]

Proof complexity meets algebra

Albert Atserias and Joanna Ochremiak. Proof complexity meets algebra. 20(1), December 2018. https://doi.org/10.1145/3265985 doi:10.1145/3265985

work page doi:10.1145/3265985 2018
[3]

VC dimension and distribution-free sample-based testing

Eric Blais, Renato Ferreira Pinto Jr., and Nathaniel Harms. VC dimension and distribution-free sample-based testing. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , page 504–517. Association for Computing Machinery, 2021. https://doi.org/10.1145/3406325.3451104 doi:10.1145/3406325.3451104

work page doi:10.1145/3406325.3451104 2021
[4]

The resolution complexity of independent sets and vertex covers in random graphs

Paul Beame, Russell Impagliazzo, and Ashish Sabharwal. The resolution complexity of independent sets and vertex covers in random graphs. Computational Complexity , 16(3):245 --297, October 2007. https://doi.org/10.1007/s00037-007-0230-0 doi:10.1007/s00037-007-0230-0

work page doi:10.1007/s00037-007-0230-0 2007
[5]

Lower bounds for Lov´ asz–Schrijver systems and beyond follow from multiparty communication complexity.SIAM Journal on Computing, 37(3):845–869, 2007

Paul Beame, Toniann Pitassi, and Nathan Segerlind. Lower bounds for L ovász– S chrijver systems and beyond follow from multiparty communication complexity. SIAM Journal on Computing , 37(3):845--869, 2007. https://doi.org/10.1137/060654645 doi:10.1137/060654645

work page doi:10.1137/060654645 2007
[6]

Free resolutions of function classes via order complexes

Justin Chen, Christopher Eur, Greg Yang, and Mengyuan Zhang. Free resolutions of function classes via order complexes. Advances in Applied Mathematics , 120:102074, 2020. https://doi.org/10.1016/j.aam.2020.102074 doi:10.1016/j.aam.2020.102074

work page doi:10.1016/j.aam.2020.102074 2020
[7]

The relative efficiency of propositional proof systems

Stephen A. Cook and Robert A. Reckhow. The relative efficiency of propositional proof systems. Journal of Symbolic Logic , 44(1):36 --50, March 1979. Preliminary version in STOC '74. https://doi.org/10.2307/2273702 doi:10.2307/2273702

work page doi:10.2307/2273702 1979
[8]

de Rezende, Aaron Potechin, and Kilian Risse

Susanna F. de Rezende, Aaron Potechin, and Kilian Risse. Clique is hard on average for unary S herali-- A dams. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS) , pages 12--25, 2023. https://doi.org/10.1109/FOCS57990.2023.00008 doi:10.1109/FOCS57990.2023.00008

work page doi:10.1109/focs57990.2023.00008 2023
[9]

de Rezende, Aaron Potechin, and Kilian Risse

Susanna F. de Rezende, Aaron Potechin, and Kilian Risse. Clique is hard on average for S herali-- A dams with bounded coefficients. CoRR , abs/2404.16722, 2024. https://doi.org/10.48550/ARXIV.2404.16722 doi:10.48550/ARXIV.2404.16722

work page doi:10.48550/arxiv.2404.16722 2024
[10]

Separations in proof complexity and TFNP

Mika G\" o \" o s, Alexandros Hollender, Siddhartha Jain, Gilbert Maystre, William Pires, Robert Robere, and Ran Tao. Separations in proof complexity and TFNP . J. ACM , 71(4), August 2024. https://doi.org/10.1145/3663758 doi:10.1145/3663758

work page doi:10.1145/3663758 2024
[11]

Communication lower bounds via critical block sensitivity

Mika G\" o \" o s and Toniann Pitassi. Communication lower bounds via critical block sensitivity. SIAM Journal on Computing , 47(5):1778--1806, 2018. https://doi.org/10.1137/16M1082007 doi:10.1137/16M1082007

work page doi:10.1137/16m1082007 2018
[12]

TFNP Intersections Through the Lens of Feasible Disjunction

Pavel Hub\' a c ek, Erfan Khaniki, and Neil Thapen. TFNP Intersections Through the Lens of Feasible Disjunction . In Venkatesan Guruswami, editor, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024) , volume 287 of Leibniz International Proceedings in Informatics (LIPIcs) , pages 63:1--63:24, Dagstuhl, Germany, 2024. Schloss Dagstuhl -...

work page doi:10.4230/lipics.itcs.2024.63 2024
[13]

Proof complexity of natural formulas via communication arguments

Dmitry Itsykson and Artur Riazanov. Proof Complexity of Natural Formulas via Communication Arguments . In Valentine Kabanets, editor, 36th Computational Complexity Conference (CCC 2021) , volume 200 of Leibniz International Proceedings in Informatics (LIPIcs) , pages 3:1--3:34, Dagstuhl, Germany, 2021. Schloss Dagstuhl -- Leibniz-Zentrum f \"u r Informati...

work page doi:10.4230/lipics.ccc.2021.3 2021
[14]

Resolution over linear equations modulo two

Dmitry Itsykson and Dmitry Sokolov. Resolution over linear equations modulo two. Annals of Pure and Applied Logic , 171(1):102722, 2020. https://doi.org/10.1016/j.apal.2019.102722 doi:10.1016/j.apal.2019.102722

work page doi:10.1016/j.apal.2019.102722 2020
[15]

Kothari and Peter Manohar

Pravesh K. Kothari and Peter Manohar. An exponential lower bound for linear 3-query locally correctable codes. In Bojan Mohar, Igor Shinkar, and Ryan O'Donnell, editors, Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024 , pages 776--787. ACM , 2024. https://doi.org/10.1145/3618260.36496...

work page doi:10.1145/3618260.3649640 2024
[16]

Improved Distance (Sensitivity) Oracles with Subquadratic Space

Pravesh K. Kothari and Peter Manohar. Exponential lower bounds for smooth 3- LCC s and sharp bounds for designs. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024 , pages 1802--1845. IEEE , 2024. https://doi.org/10.1109/FOCS61266.2024.00110 doi:10.1109/FOCS61266.2024.00110

work page doi:10.1109/focs61266.2024.00110 2024
[17]

Bounds on the Total Coefficient Size of Nullstellensatz Proofs of the Pigeonhole Principle

Aaron Potechin and Aaron Zhang. Bounds on the Total Coefficient Size of Nullstellensatz Proofs of the Pigeonhole Principle . In Karl Bringmann, Martin Grohe, Gabriele Puppis, and Ola Svensson, editors, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024) , volume 297 of Leibniz International Proceedings in Informatics (LIPIcs...

work page doi:10.4230/lipics.icalp.2024.117 2024
[18]

Robert A. Reckhow. On the Lengths of Proofs in the Propositional Calculus . PhD thesis, University of Toronto, 1975. https://doi.org/1807/100390 doi:1807/100390

1975
[19]

1972 , _month = jul, journal =

N Sauer. On the density of families of sets. Journal of Combinatorial Theory, Series A , 13(1):145--147, 1972. https://doi.org/10.1016/0097-3165(72)90019-2 doi:10.1016/0097-3165(72)90019-2

work page doi:10.1016/0097-3165(72)90019-2 1972
[20]

A combinatorial problem; stability and order for models and theories in infinitary languages

Saharon Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages. Pacific Journal of Mathematics , 41(1):247 -- 261, 1972. https://doi.org/10.2140/pjm.1972.41.247 doi:10.2140/pjm.1972.41.247

work page doi:10.2140/pjm.1972.41.247 1972
[21]

Understanding Machine Learning

Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms . Cambridge University Press, 2014. https://doi.org/10.1017/CBO9781107298019 doi:10.1017/CBO9781107298019

work page doi:10.1017/cbo9781107298019 2014