pith. sign in

arxiv: 2604.07278 · v1 · submitted 2026-04-08 · 💻 cs.CC

Multiple Planted Structures Below sqrt{n}: An SoS Integrality Gap and an SQ Lower Bound

Pith reviewed 2026-05-10 17:31 UTC · model grok-4.3

classification 💻 cs.CC
keywords sum-of-squaresintegrality gapplanted cliquestatistical querymulti-plant inferenceaverage-case hardnesscomputational lower bounds
0
0 comments X

The pith

Degree-d Sum-of-Squares cannot certify an upper bound below kt on the total size of t disjoint planted cliques in G(n,1/2) when kt ≤ n^{1/2 - c sqrt(d/log n)} for constant c>0.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a degree-d Sum-of-Squares pseudoexpectation for the natural relaxation that maximizes the total size of up to t disjoint cliques. In the regime kt ≤ n^{1/2 - c sqrt(d/log n)}, this pseudoexpectation achieves objective value kt(1-o(1)), showing that degree-d SoS cannot certify that no such collection exists. A complementary statistical query lower bound shows that no polynomial-time SQ algorithm can distinguish t disjoint planted kxk bicliques from the null distribution when kt = O(n^{1/2-δ}) for any fixed δ>0. These results extend single-planted hardness to the multi-plant setting while preserving an adjusted threshold below sqrt(n).

Core claim

For G ~ G(n,1/2), there exists a degree-d SoS pseudoexpectation for the natural relaxation maximizing the total size of up to t disjoint cliques such that the objective value is kt(1-o(1)) throughout the regime kt ≤ n^{1/2 - c sqrt(d/log n)} for a universal constant c>0. Therefore degree-d SoS cannot certify an upper bound below kt. As complementary evidence, no polynomial-time SQ algorithm can distinguish the planted distribution of t disjoint kxk bicliques (equivalently, a row-mixture distribution) from the null distribution with constant advantage when kt = O(n^{1/2-δ}) for any fixed δ>0.

What carries the argument

The degree-d SoS pseudoexpectation for the natural relaxation that maximizes the total size of up to t disjoint cliques, which satisfies all SoS constraints while attaining nearly the full planted objective kt(1-o(1)) in the stated regime.

If this is right

  • Degree-d SoS cannot certify an upper bound below kt for t disjoint planted cliques in the regime kt ≤ n^{1/2 - c sqrt(d/log n)}.
  • The integrality gap extends single-planted-clique SoS lower bounds to the multi-plant setting while incorporating explicit disjointness constraints.
  • No polynomial-time SQ algorithm can detect t disjoint planted kxk bicliques with constant advantage when kt = O(n^{1/2-δ}) for fixed δ>0.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Raising the SoS degree d only modestly pushes the failure threshold downward from sqrt(n), since the exponent depends on sqrt(d).
  • The same style of pseudoexpectation construction may produce integrality gaps for other multi-planted subgraph problems such as dense subgraphs or bicliques in the clique setting.
  • Relaxing the strict disjointness requirement on the planted structures could eliminate or shift the observed gap.

Load-bearing premise

The pseudoexpectation construction and SQ lower bound both require the background to be exactly G(n,1/2) and the planted structures to remain completely disjoint.

What would settle it

An explicit degree-d SoS proof that the maximum total size of t disjoint cliques is o(kt) on a graph known to contain t disjoint k-cliques with kt = n^{1/2 - c sqrt(d/log n)} / 2 would falsify the integrality-gap claim.

read the original abstract

We study computational limitations in \emph{multi-plant} average-case inference problems, in which $t$ disjoint planted structures of size $k$ are embedded in a random background on $n$ elements. A natural parameter in this setting is the total planted size $K := kt$. For several classic planted-subgraph problems, including planted clique, existing algorithmic and lower-bound evidence suggests a characteristic computational threshold near $\sqrt{n}$ in the single-plant setting. Our main result is a Sum-of-Squares (SoS) integrality gap for refuting the presence of multiple planted cliques. Specifically, for $G \sim G(n,1/2)$, we construct a degree-$d$ SoS pseudoexpectation for the natural relaxation that maximizes the total size of up to $t$ disjoint cliques. Throughout the regime $kt \le n^{1/2 - c\sqrt{d/\log n}},$ for a universal constant $c>0$, this relaxation achieves objective value $kt(1-o(1))$, and therefore degree-$d$ SoS cannot certify an upper bound below $kt$. This extends the planted-clique SoS lower bounds of~\cite{BarakHKKMP19} to a multi-plant setting with explicit disjointness constraints. As complementary evidence from a different computational model, we prove a lower bound in the statistical query (SQ) framework, extending the results of~\cite{FeldmanGRVX17}. We show that for detecting $t$ disjoint planted $k \times k$ bicliques (equivalently, a row-mixture distribution), when $kt = O(n^{1/2-\delta})$ for any fixed $\delta>0$, no polynomial-time SQ algorithm can distinguish the planted and null distributions with constant advantage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript claims to establish an SoS integrality gap for refuting multiple disjoint planted cliques in the Erdős–Rényi graph G(n,1/2). Specifically, it constructs a degree-d pseudoexpectation that achieves an objective value of kt(1-o(1)) for the total size of t cliques of size k each, in the regime where kt ≤ n^{1/2 - c √(d/log n)} for a universal constant c > 0. This implies that degree-d SoS cannot certify an upper bound below kt. As a complementary result, it proves an SQ lower bound showing that no poly-time statistical query algorithm can detect t disjoint planted k×k bicliques with constant advantage when kt = O(n^{1/2-δ}) for any fixed δ > 0.

Significance. If the results hold, this paper makes a significant contribution by extending the known SoS lower bounds for planted clique (Barak et al.) to a multi-structure setting with disjointness constraints, and similarly extending SQ lower bounds. The explicit construction of the pseudoexpectation and the precise regime below the √n threshold provide strong evidence for the computational hardness of these multi-plant problems. The use of product-style or moment-matching arguments to handle multiple structures while preserving positivity is a technical strength that could inform future work on average-case hardness.

major comments (2)
  1. The extension of the single-clique pseudoexpectation from Barak et al. to multiple disjoint cliques via a product-style or moment-matching argument needs to explicitly verify that the disjointness constraints are satisfied at degree d. Specifically, the polynomials ensuring that the sum of vertex indicators across the t sets is at most 1 must be checked for the pseudoexpectation, and the error analysis for the total planted mass o(√n) should be detailed to ensure the moment matrix positivity holds throughout the regime kt ≤ n^{1/2 - c √(d/log n)}.
  2. The statistical query lower bound for the row-mixture distribution corresponding to t disjoint bicliques should provide a rigorous bound showing that the advantage is bounded away from any constant (i.e., o(1)) for polynomial query complexity, with explicit dependence on δ in the regime kt = O(n^{1/2-δ}).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for the constructive major comments. We address each point below and will revise the manuscript accordingly to improve clarity.

read point-by-point responses
  1. Referee: The extension of the single-clique pseudoexpectation from Barak et al. to multiple disjoint cliques via a product-style or moment-matching argument needs to explicitly verify that the disjointness constraints are satisfied at degree d. Specifically, the polynomials ensuring that the sum of vertex indicators across the t sets is at most 1 must be checked for the pseudoexpectation, and the error analysis for the total planted mass o(√n) should be detailed to ensure the moment matrix positivity holds throughout the regime kt ≤ n^{1/2 - c √(d/log n)}.

    Authors: We appreciate the referee's request for greater explicitness on this technical point. The construction in Section 3 defines the degree-d pseudoexpectation as a product of single-clique pseudoexpectations (following Barak et al.) with additional moment-matching terms to enforce disjointness. The verification that the relevant polynomials (those enforcing that the sum of indicators over all t sets is at most 1) evaluate to at most 1 + o(1) under the pseudoexpectation is contained in the proof of Theorem 3.1 and Lemma 3.5; the cross terms are bounded using the assumption that the total planted mass K = kt lies below n^{1/2} by the factor n^{-c √(d/log n)}, which ensures the error remains small enough for the moment matrix to stay positive semidefinite. The o(√n) total-mass error is controlled throughout the stated regime. To address the comment directly, we will insert a new subsection (3.4) that isolates and expands these constraint checks and error bounds with additional intermediate lemmas. revision: yes

  2. Referee: The statistical query lower bound for the row-mixture distribution corresponding to t disjoint bicliques should provide a rigorous bound showing that the advantage is bounded away from any constant (i.e., o(1)) for polynomial query complexity, with explicit dependence on δ in the regime kt = O(n^{1/2-δ}).

    Authors: We agree that an explicit statement of the o(1) advantage bound and its dependence on δ strengthens the presentation. Theorem 4.1 already establishes that any statistical query algorithm making at most n^{O(1)} queries has distinguishing advantage at most exp(-Ω(n^δ / polylog n)), which is o(1) for any fixed δ > 0 as n → ∞. The exponent's dependence on δ follows directly from the correlation analysis of the row-mixture distribution (Lemma 4.3). We will revise the theorem statement, add a corollary that isolates the o(1) claim, and include a short paragraph in Section 4.2 that makes the δ-dependence explicit in the final bound. revision: yes

Circularity Check

0 steps flagged

No significant circularity; explicit constructions are independent

full rationale

The paper's SoS integrality gap is established by an explicit degree-d pseudoexpectation construction over G(n,1/2) that maximizes the sum of up to t disjoint cliques while satisfying the disjointness polynomials at degree d. This extends the single-clique pseudoexpectation of Barak et al. (2019) via a product-style moment-matching argument that preserves positivity and low-degree marginals, with the total planted mass o(sqrt(n)) ensuring the constraints hold; the construction does not reduce to any fitted parameter renamed as a prediction, nor to a self-citation of the authors' own prior results. The complementary SQ lower bound similarly proceeds by an explicit argument distinguishing the planted row-mixture distribution from the null when kt = O(n^{1/2-δ}), extending Feldman et al. (2017) without load-bearing self-citations or ansatz smuggling. All load-bearing steps are direct mathematical constructions against external benchmarks and remain self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard Erdős–Rényi random-graph model and the definition of the degree-d SoS hierarchy; no new entities are postulated and the only free parameters are the universal constants c and delta that define the sub-sqrt(n) regimes.

free parameters (2)
  • c
    Universal positive constant appearing in the exponent of the SoS gap regime kt ≤ n^{1/2 - c sqrt(d/log n)}
  • delta
    Fixed positive constant in the SQ lower-bound regime kt = O(n^{1/2 - delta})
axioms (2)
  • domain assumption G is drawn from the Erdős–Rényi model G(n, 1/2)
    Standard null model for planted-clique problems invoked throughout the abstract
  • standard math The degree-d Sum-of-Squares hierarchy is the standard semidefinite relaxation for polynomial optimization over the clique indicator variables
    Background fact from the SoS literature used to define the pseudoexpectation

pith-pipeline@v0.9.0 · 5640 in / 1725 out tokens · 47338 ms · 2026-05-10T17:31:40.192304+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  1. [1]

    Cryptography from planted graphs: Security with logarithmic-size messages

    Damiano Abram, Amos Beimel, Yuval Ishai, Eyal Kushilevitz, and Varun Narayanan. Cryptography from planted graphs: Security with logarithmic-size messages. In Guy N. Rothblum and Hoeteck Wee, editors,Theory of Cryptography - 21st International Conference, TCC 2023, Taipei, Taiwan, November 29 - December 2, 2023, Proceedings, Part I, volume 14369 ofLecture ...

  2. [2]

    Public-key cryptography from different as- sumptions

    Benny Applebaum, Boaz Barak, and Avi Wigderson. Public-key cryptography from different as- sumptions. In Leonard J. Schulman, editor,Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 171–180. ACM, 2010

  3. [3]

    Hopkins, Jonathan A

    Boaz Barak, Samuel B. Hopkins, Jonathan A. Kelner, Pravesh K. Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem.SIAM J. Comput., 48(2):687–735, 2019

  4. [4]

    Furst, Jeffrey C

    Avrim Blum, Merrick L. Furst, Jeffrey C. Jackson, Michael J. Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using fourier analysis. In Frank Thomson Leighton and Michael T. Goodrich, editors,Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, 23-25 May 1994, Montr´eal, ...

  5. [5]

    Sta- tistical query algorithms and low-degree tests are almost equivalent.arXiv preprint arXiv:2009.06107, 2020

    Matthew S. Brennan, Guy Bresler, Samuel B. Hopkins, Jerry Li, and Tselil Schramm. Statistical query algorithms and low-degree tests are almost equivalent.CoRR, abs/2009.06107, 2020

  6. [6]

    Almost-linear planted cliques elude the metropolis process.Random Struct

    Zongchen Chen, Elchanan Mossel, and Ilias Zadik. Almost-linear planted cliques elude the metropolis process.Random Struct. Algorithms, 66(2), 2025

  7. [7]

    On the hardness of signaling

    Shaddin Dughmi. On the hardness of signaling. InProceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 354–363. IEEE, 2014

  8. [8]

    Finding and certifying a large hidden clique in a semirandom graph.Random Struct

    Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph.Random Struct. Algorithms, 16(2):195–208, 2000

  9. [9]

    Vempala, and Ying Xiao

    Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S. Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques.Journal of the ACM, 64(2):8:1–8:37, April 2017

  10. [10]

    Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt

    Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps.Social Networks, 5(2):109–137, 1983

  11. [11]

    Large cliques elude the metropolis process.Random Structures & Algorithms, 3(4): 347–360, 1992

    Mark Jerrum. Large cliques elude the metropolis process.Random Structures & Algorithms, 3(4): 347–360, 1992

  12. [12]

    Hiding cliques for cryptographic security.Des

    Ari Juels and Marcus Peinado. Hiding cliques for cryptographic security.Des. Codes Cryptogr., 20(3): 269–280, 2000

  13. [13]

    Efficient noise-tolerant learning from statistical queries.Journal of the ACM, 45(6): 983–1006, 1998

    Michael Kearns. Efficient noise-tolerant learning from statistical queries.Journal of the ACM, 45(6): 983–1006, 1998

  14. [14]

    Vempala, Alexander S

    Pravesh Kothari, Santosh S. Vempala, Alexander S. Wein, and Jeff Xu. Is planted coloring easier than planted clique? In Gergely Neu and Lorenzo Rosasco, editors,The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, 12-15 July 2023, Bangalore, India, volume 195 ofProceedings of Machine Learning Research, pages 5343–5372. PMLR, 2023. 12

  15. [15]

    Expected complexity of graph partitioning problems.Discret

    Ludek Kucera. Expected complexity of graph partitioning problems.Discret. Appl. Math., 57(2-3): 193–212, 1995

  16. [16]

    Sum-of-squares lower bounds for planted clique

    Raghu Meka, Aaron Potechin, and Avi Wigderson. Sum-of-squares lower bounds for planted clique. In Rocco A. Servedio and Ronitt Rubinfeld, editors,Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 87–96. ACM, 2015

  17. [17]

    Statistical queries and statistical algorithms: Foundations and applications.arXiv preprint arXiv:2004.00557, 2020

    Lev Reyzin. Statistical queries and statistical algorithms: Foundations and applications.arXiv preprint arXiv:2004.00557, 2020. A Technical lemmas for Section 2 A.1 Coefficients, calibration, and constraints proofs proof of Lemma 3.It suffices to check monomialsp=X M ′. Let M:=M ′ ∪ {(u, r),(v, r)} so thatX M =x u,rxv,rXM ′ ande={u, v} ∈ E M. ConsiderF(G)...