Multiple Planted Structures Below sqrt{n}: An SoS Integrality Gap and an SQ Lower Bound
Pith reviewed 2026-05-10 17:31 UTC · model grok-4.3
The pith
Degree-d Sum-of-Squares cannot certify an upper bound below kt on the total size of t disjoint planted cliques in G(n,1/2) when kt ≤ n^{1/2 - c sqrt(d/log n)} for constant c>0.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For G ~ G(n,1/2), there exists a degree-d SoS pseudoexpectation for the natural relaxation maximizing the total size of up to t disjoint cliques such that the objective value is kt(1-o(1)) throughout the regime kt ≤ n^{1/2 - c sqrt(d/log n)} for a universal constant c>0. Therefore degree-d SoS cannot certify an upper bound below kt. As complementary evidence, no polynomial-time SQ algorithm can distinguish the planted distribution of t disjoint kxk bicliques (equivalently, a row-mixture distribution) from the null distribution with constant advantage when kt = O(n^{1/2-δ}) for any fixed δ>0.
What carries the argument
The degree-d SoS pseudoexpectation for the natural relaxation that maximizes the total size of up to t disjoint cliques, which satisfies all SoS constraints while attaining nearly the full planted objective kt(1-o(1)) in the stated regime.
If this is right
- Degree-d SoS cannot certify an upper bound below kt for t disjoint planted cliques in the regime kt ≤ n^{1/2 - c sqrt(d/log n)}.
- The integrality gap extends single-planted-clique SoS lower bounds to the multi-plant setting while incorporating explicit disjointness constraints.
- No polynomial-time SQ algorithm can detect t disjoint planted kxk bicliques with constant advantage when kt = O(n^{1/2-δ}) for fixed δ>0.
Where Pith is reading between the lines
- Raising the SoS degree d only modestly pushes the failure threshold downward from sqrt(n), since the exponent depends on sqrt(d).
- The same style of pseudoexpectation construction may produce integrality gaps for other multi-planted subgraph problems such as dense subgraphs or bicliques in the clique setting.
- Relaxing the strict disjointness requirement on the planted structures could eliminate or shift the observed gap.
Load-bearing premise
The pseudoexpectation construction and SQ lower bound both require the background to be exactly G(n,1/2) and the planted structures to remain completely disjoint.
What would settle it
An explicit degree-d SoS proof that the maximum total size of t disjoint cliques is o(kt) on a graph known to contain t disjoint k-cliques with kt = n^{1/2 - c sqrt(d/log n)} / 2 would falsify the integrality-gap claim.
read the original abstract
We study computational limitations in \emph{multi-plant} average-case inference problems, in which $t$ disjoint planted structures of size $k$ are embedded in a random background on $n$ elements. A natural parameter in this setting is the total planted size $K := kt$. For several classic planted-subgraph problems, including planted clique, existing algorithmic and lower-bound evidence suggests a characteristic computational threshold near $\sqrt{n}$ in the single-plant setting. Our main result is a Sum-of-Squares (SoS) integrality gap for refuting the presence of multiple planted cliques. Specifically, for $G \sim G(n,1/2)$, we construct a degree-$d$ SoS pseudoexpectation for the natural relaxation that maximizes the total size of up to $t$ disjoint cliques. Throughout the regime $kt \le n^{1/2 - c\sqrt{d/\log n}},$ for a universal constant $c>0$, this relaxation achieves objective value $kt(1-o(1))$, and therefore degree-$d$ SoS cannot certify an upper bound below $kt$. This extends the planted-clique SoS lower bounds of~\cite{BarakHKKMP19} to a multi-plant setting with explicit disjointness constraints. As complementary evidence from a different computational model, we prove a lower bound in the statistical query (SQ) framework, extending the results of~\cite{FeldmanGRVX17}. We show that for detecting $t$ disjoint planted $k \times k$ bicliques (equivalently, a row-mixture distribution), when $kt = O(n^{1/2-\delta})$ for any fixed $\delta>0$, no polynomial-time SQ algorithm can distinguish the planted and null distributions with constant advantage.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to establish an SoS integrality gap for refuting multiple disjoint planted cliques in the Erdős–Rényi graph G(n,1/2). Specifically, it constructs a degree-d pseudoexpectation that achieves an objective value of kt(1-o(1)) for the total size of t cliques of size k each, in the regime where kt ≤ n^{1/2 - c √(d/log n)} for a universal constant c > 0. This implies that degree-d SoS cannot certify an upper bound below kt. As a complementary result, it proves an SQ lower bound showing that no poly-time statistical query algorithm can detect t disjoint planted k×k bicliques with constant advantage when kt = O(n^{1/2-δ}) for any fixed δ > 0.
Significance. If the results hold, this paper makes a significant contribution by extending the known SoS lower bounds for planted clique (Barak et al.) to a multi-structure setting with disjointness constraints, and similarly extending SQ lower bounds. The explicit construction of the pseudoexpectation and the precise regime below the √n threshold provide strong evidence for the computational hardness of these multi-plant problems. The use of product-style or moment-matching arguments to handle multiple structures while preserving positivity is a technical strength that could inform future work on average-case hardness.
major comments (2)
- The extension of the single-clique pseudoexpectation from Barak et al. to multiple disjoint cliques via a product-style or moment-matching argument needs to explicitly verify that the disjointness constraints are satisfied at degree d. Specifically, the polynomials ensuring that the sum of vertex indicators across the t sets is at most 1 must be checked for the pseudoexpectation, and the error analysis for the total planted mass o(√n) should be detailed to ensure the moment matrix positivity holds throughout the regime kt ≤ n^{1/2 - c √(d/log n)}.
- The statistical query lower bound for the row-mixture distribution corresponding to t disjoint bicliques should provide a rigorous bound showing that the advantage is bounded away from any constant (i.e., o(1)) for polynomial query complexity, with explicit dependence on δ in the regime kt = O(n^{1/2-δ}).
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for the constructive major comments. We address each point below and will revise the manuscript accordingly to improve clarity.
read point-by-point responses
-
Referee: The extension of the single-clique pseudoexpectation from Barak et al. to multiple disjoint cliques via a product-style or moment-matching argument needs to explicitly verify that the disjointness constraints are satisfied at degree d. Specifically, the polynomials ensuring that the sum of vertex indicators across the t sets is at most 1 must be checked for the pseudoexpectation, and the error analysis for the total planted mass o(√n) should be detailed to ensure the moment matrix positivity holds throughout the regime kt ≤ n^{1/2 - c √(d/log n)}.
Authors: We appreciate the referee's request for greater explicitness on this technical point. The construction in Section 3 defines the degree-d pseudoexpectation as a product of single-clique pseudoexpectations (following Barak et al.) with additional moment-matching terms to enforce disjointness. The verification that the relevant polynomials (those enforcing that the sum of indicators over all t sets is at most 1) evaluate to at most 1 + o(1) under the pseudoexpectation is contained in the proof of Theorem 3.1 and Lemma 3.5; the cross terms are bounded using the assumption that the total planted mass K = kt lies below n^{1/2} by the factor n^{-c √(d/log n)}, which ensures the error remains small enough for the moment matrix to stay positive semidefinite. The o(√n) total-mass error is controlled throughout the stated regime. To address the comment directly, we will insert a new subsection (3.4) that isolates and expands these constraint checks and error bounds with additional intermediate lemmas. revision: yes
-
Referee: The statistical query lower bound for the row-mixture distribution corresponding to t disjoint bicliques should provide a rigorous bound showing that the advantage is bounded away from any constant (i.e., o(1)) for polynomial query complexity, with explicit dependence on δ in the regime kt = O(n^{1/2-δ}).
Authors: We agree that an explicit statement of the o(1) advantage bound and its dependence on δ strengthens the presentation. Theorem 4.1 already establishes that any statistical query algorithm making at most n^{O(1)} queries has distinguishing advantage at most exp(-Ω(n^δ / polylog n)), which is o(1) for any fixed δ > 0 as n → ∞. The exponent's dependence on δ follows directly from the correlation analysis of the row-mixture distribution (Lemma 4.3). We will revise the theorem statement, add a corollary that isolates the o(1) claim, and include a short paragraph in Section 4.2 that makes the δ-dependence explicit in the final bound. revision: yes
Circularity Check
No significant circularity; explicit constructions are independent
full rationale
The paper's SoS integrality gap is established by an explicit degree-d pseudoexpectation construction over G(n,1/2) that maximizes the sum of up to t disjoint cliques while satisfying the disjointness polynomials at degree d. This extends the single-clique pseudoexpectation of Barak et al. (2019) via a product-style moment-matching argument that preserves positivity and low-degree marginals, with the total planted mass o(sqrt(n)) ensuring the constraints hold; the construction does not reduce to any fitted parameter renamed as a prediction, nor to a self-citation of the authors' own prior results. The complementary SQ lower bound similarly proceeds by an explicit argument distinguishing the planted row-mixture distribution from the null when kt = O(n^{1/2-δ}), extending Feldman et al. (2017) without load-bearing self-citations or ansatz smuggling. All load-bearing steps are direct mathematical constructions against external benchmarks and remain self-contained.
Axiom & Free-Parameter Ledger
free parameters (2)
- c
- delta
axioms (2)
- domain assumption G is drawn from the Erdős–Rényi model G(n, 1/2)
- standard math The degree-d Sum-of-Squares hierarchy is the standard semidefinite relaxation for polynomial optimization over the clique indicator variables
Reference graph
Works this paper leans on
-
[1]
Cryptography from planted graphs: Security with logarithmic-size messages
Damiano Abram, Amos Beimel, Yuval Ishai, Eyal Kushilevitz, and Varun Narayanan. Cryptography from planted graphs: Security with logarithmic-size messages. In Guy N. Rothblum and Hoeteck Wee, editors,Theory of Cryptography - 21st International Conference, TCC 2023, Taipei, Taiwan, November 29 - December 2, 2023, Proceedings, Part I, volume 14369 ofLecture ...
work page 2023
-
[2]
Public-key cryptography from different as- sumptions
Benny Applebaum, Boaz Barak, and Avi Wigderson. Public-key cryptography from different as- sumptions. In Leonard J. Schulman, editor,Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 171–180. ACM, 2010
work page 2010
-
[3]
Boaz Barak, Samuel B. Hopkins, Jonathan A. Kelner, Pravesh K. Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem.SIAM J. Comput., 48(2):687–735, 2019
work page 2019
-
[4]
Avrim Blum, Merrick L. Furst, Jeffrey C. Jackson, Michael J. Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using fourier analysis. In Frank Thomson Leighton and Michael T. Goodrich, editors,Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, 23-25 May 1994, Montr´eal, ...
work page 1994
-
[5]
Matthew S. Brennan, Guy Bresler, Samuel B. Hopkins, Jerry Li, and Tselil Schramm. Statistical query algorithms and low-degree tests are almost equivalent.CoRR, abs/2009.06107, 2020
-
[6]
Almost-linear planted cliques elude the metropolis process.Random Struct
Zongchen Chen, Elchanan Mossel, and Ilias Zadik. Almost-linear planted cliques elude the metropolis process.Random Struct. Algorithms, 66(2), 2025
work page 2025
-
[7]
Shaddin Dughmi. On the hardness of signaling. InProceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 354–363. IEEE, 2014
work page 2014
-
[8]
Finding and certifying a large hidden clique in a semirandom graph.Random Struct
Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph.Random Struct. Algorithms, 16(2):195–208, 2000
work page 2000
-
[9]
Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S. Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques.Journal of the ACM, 64(2):8:1–8:37, April 2017
work page 2017
-
[10]
Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt
Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps.Social Networks, 5(2):109–137, 1983
work page 1983
-
[11]
Large cliques elude the metropolis process.Random Structures & Algorithms, 3(4): 347–360, 1992
Mark Jerrum. Large cliques elude the metropolis process.Random Structures & Algorithms, 3(4): 347–360, 1992
work page 1992
-
[12]
Hiding cliques for cryptographic security.Des
Ari Juels and Marcus Peinado. Hiding cliques for cryptographic security.Des. Codes Cryptogr., 20(3): 269–280, 2000
work page 2000
-
[13]
Efficient noise-tolerant learning from statistical queries.Journal of the ACM, 45(6): 983–1006, 1998
Michael Kearns. Efficient noise-tolerant learning from statistical queries.Journal of the ACM, 45(6): 983–1006, 1998
work page 1998
-
[14]
Pravesh Kothari, Santosh S. Vempala, Alexander S. Wein, and Jeff Xu. Is planted coloring easier than planted clique? In Gergely Neu and Lorenzo Rosasco, editors,The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, 12-15 July 2023, Bangalore, India, volume 195 ofProceedings of Machine Learning Research, pages 5343–5372. PMLR, 2023. 12
work page 2023
-
[15]
Expected complexity of graph partitioning problems.Discret
Ludek Kucera. Expected complexity of graph partitioning problems.Discret. Appl. Math., 57(2-3): 193–212, 1995
work page 1995
-
[16]
Sum-of-squares lower bounds for planted clique
Raghu Meka, Aaron Potechin, and Avi Wigderson. Sum-of-squares lower bounds for planted clique. In Rocco A. Servedio and Ronitt Rubinfeld, editors,Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 87–96. ACM, 2015
work page 2015
-
[17]
Lev Reyzin. Statistical queries and statistical algorithms: Foundations and applications.arXiv preprint arXiv:2004.00557, 2020. A Technical lemmas for Section 2 A.1 Coefficients, calibration, and constraints proofs proof of Lemma 3.It suffices to check monomialsp=X M ′. Let M:=M ′ ∪ {(u, r),(v, r)} so thatX M =x u,rxv,rXM ′ ande={u, v} ∈ E M. ConsiderF(G)...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.