arxiv: 2604.08691 · v2 · submitted 2026-04-09 · 🧮 math.ST · cs.CC· math.PR· stat.TH

Recognition: unknown

Planted clique detection and recovery from the hypergraph adjacency matrix

Kalle Alaluusua , B. R. Vinay Kumar

Authors on Pith no claims yet

Pith reviewed 2026-05-10 16:47 UTC · model grok-4.3

classification 🧮 math.ST cs.CCmath.PRstat.TH

keywords planted cliquehypergraphadjacency matrixspectral detectionspectral recoveryleave-one-outrandom hypergraphs

0 comments

The pith

Spectral methods detect and recover planted cliques in hypergraphs from only the pairwise adjacency matrix at the √n scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the planted clique problem when the input is restricted to the adjacency matrix formed by counting how many hyperedges contain each pair of nodes. It establishes that a test based on the largest singular value of this matrix can asymptotically distinguish the presence of a planted clique of size roughly √n from a pure random hypergraph, with the threshold expressed explicitly in terms of the background hyperedge probability p. A polynomial-time algorithm that extracts the leading eigenvector of the same matrix is shown to recover the exact clique set above the same threshold. Both results extend to regimes in which p may decrease with n. The proofs adapt a leave-one-out eigenvector perturbation argument to the dependent entries that arise because each hyperedge increments multiple matrix entries.

Core claim

In a random hypergraph on n nodes containing a planted clique of size k, where each possible hyperedge appears independently with probability p outside the clique, and where the only observed data is the n-by-n matrix A whose (i,j) entry equals the number of hyperedges containing both i and j, the spectral norm of A yields an asymptotically powerful test for the presence of the clique whenever k exceeds a constant multiple of √n that depends on p; moreover, the leading eigenvector of A produces exact recovery of the clique vertices for the same scaling of k.

What carries the argument

The leading eigenvector of the projected adjacency matrix, whose concentration and separation from the bulk are controlled by a leave-one-out analysis that handles the linear dependence among matrix entries induced by shared hyperedges.

If this is right

Detection and recovery remain possible after the hypergraph is projected to its pairwise count matrix, so the full higher-order incidence information is not required.
The √n threshold for exact recovery is achievable by a polynomial-time method even when matrix entries are dependent.
The same guarantees continue to hold when the background probability p is allowed to depend on n, covering sparse hypergraph regimes.
The leave-one-out technique extends directly to other random matrices whose entries are sums over overlapping random sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar spectral guarantees may apply to community detection or other planted structures when only pairwise projections of hypergraph data are available.
Algorithms that deliberately discard higher-order hyperedge information could still succeed for clique recovery tasks at moderate densities.
The explicit dependence on p supplies a practical rule for setting detection thresholds once background density is estimated from data.

Load-bearing premise

Hyperedges outside the planted clique appear independently with probability p, and the analyst receives only the matrix of pairwise counts rather than the full list of hyperedges.

What would settle it

A simulation in which the clique size is set to (c-ε)√n for the explicit constant c derived in the paper and p fixed; the spectral-norm test should then have detection power approaching the false-alarm rate, and the eigenvector method should fail to output the exact planted set with high probability.

read the original abstract

Hypergraph data are often projected onto a weighted graph by constructing an adjacency matrix whose $(i,j)$ entry counts the number of hyperedges containing both nodes $i$ and $j$. This reduction is computationally convenient, but it can lose information: distinct hypergraphs may induce the same matrix, and the matrix entries are generally dependent because each hyperedge contributes to multiple pairs. We study the planted clique problem under this matrix-only observation model. For detection, we show that a spectral norm test is asymptotically powerful at the $\sqrt{n}$ scale, with explicit dependence on the background hyperedge probability $p$. For recovery, we analyze a polynomial-time spectral method based on the leading eigenvector and prove exact recovery at the canonical $\sqrt{n}$ scale, again with explicit dependence on $p$. We also extend both results to sparse regimes in which the hyperedge probability may depend on $n$. Our analysis adapts a leave--one--out eigenvector framework to this setting. These results provide rigorous detection and recovery guarantees when only the adjacency matrix is observed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper adapts leave-one-out eigenvector methods to deliver sqrt(n) detection and exact recovery guarantees for planted cliques when only the projected pairwise count matrix is observed, with explicit p dependence.

read the letter

The paper's main result is that a spectral norm test detects the planted clique at the sqrt(n) scale with power depending on p, and a leading eigenvector method achieves exact recovery at the same scale, all under the matrix observation model from the hypergraph. They also cover cases where p depends on n in sparse regimes. What is new is the adaptation of the leave-one-out eigenvector framework specifically to the dependence structure from hyperedges. In the standard graph planted clique, entries are independent, but here each hyperedge adds to several A_ij, creating correlations. The paper claims to handle this while keeping the canonical threshold. It does a good job providing explicit dependence on p, which is useful for understanding how the background noise affects the signal. The polynomial-time nature of the recovery method is a plus for practicality. The soft spots center on the technical handling of dependence. The concern about the leave-one-out perturbation not being a simple low-rank update is reasonable. When you remove node i, the changes to A are correlated across all pairs that were in hyperedges with i. If the bounds on the operator norm of the difference do not account for the hyperedge size or the expected number of such hyperedges, the proof that the eigenvector stays delocalized or separated might have gaps, particularly when p is small or hyperedges are large. The abstract does not show the derivation, so the strength rests on whether those extra correlation terms are controlled without worsening the threshold. Overall, this is aimed at theorists working on random hypergraph models and spectral algorithms for community detection. Someone looking for guarantees in projected data settings would get concrete thresholds to compare against. I would recommend sending this to peer review. The modeling choice is relevant, the methods are standard but extended, and the results are stated clearly enough for referees to assess the adaptation.

Referee Report

2 major / 2 minor

Summary. The manuscript studies the planted clique problem in hypergraphs under the observation model where only the pairwise count matrix A is available, with A_ij counting the number of hyperedges containing both i and j. It claims that a spectral norm test is asymptotically powerful for detection at the √n scale with explicit dependence on the background hyperedge probability p, and that a polynomial-time method based on the leading eigenvector of A achieves exact recovery at the same scale (again with explicit p-dependence). Both results are extended to sparse regimes p = p(n), using an adaptation of the leave-one-out eigenvector framework to handle the induced dependencies in A.

Significance. If the central claims hold, the results are significant because they provide the first rigorous detection and recovery guarantees for planted cliques when only the projected pairwise adjacency matrix is observed, a model that arises naturally in applications where full hypergraph data is unavailable or too large. The explicit p-dependence and sparse-regime extensions increase practical relevance, and the technical adaptation of leave-one-out analysis to this dependent-matrix setting would be a useful contribution to the literature on spectral methods for structured random matrices.

major comments (2)

[recovery analysis / leave-one-out framework] Abstract and recovery section: The exact-recovery claim for the leading-eigenvector method at the √n scale rests on an adapted leave-one-out analysis. However, because each hyperedge of size greater than 2 contributes to multiple entries of A, the difference A - A^{(-i)} induces correlated perturbations across all pairs sharing a hyperedge with i. Standard leave-one-out perturbation bounds (typically relying on near-independence or controlled Lipschitz constants) may not directly apply; if the paper's operator-norm or delocalization lemmas do not explicitly tighten for this correlation structure (scaling with hyperedge size and degree), the eigenvector separation from the bulk could fail to hold at the claimed threshold, particularly when p(n) is not vanishing fast enough.
[detection analysis / spectral norm test] Detection section: The spectral-norm test is asserted to be asymptotically powerful at the √n scale with explicit p-dependence. The proof must derive the limiting distribution or concentration of ||A|| under the null and alternative while accounting for the dependence among matrix entries induced by shared hyperedges; without a concrete bound on the variance or tail of the norm that incorporates the hyperedge size, the power claim at the information-theoretic threshold remains unverified.

minor comments (2)

[abstract] The abstract contains a typographical double dash in 'leave--one--out'; standardize to 'leave-one-out' throughout.
[introduction / model definition] Notation for the hyperedge size (presumably k) and the precise definition of the planted clique model should be introduced earlier and used consistently when stating the thresholds.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. The points raised concern the handling of entry-wise dependencies in A induced by hyperedges, which our analysis addresses through tailored concentration bounds. We provide clarifications below and will revise the manuscript to make these aspects more explicit.

read point-by-point responses

Referee: [recovery analysis / leave-one-out framework] Abstract and recovery section: The exact-recovery claim for the leading-eigenvector method at the √n scale rests on an adapted leave-one-out analysis. However, because each hyperedge of size greater than 2 contributes to multiple entries of A, the difference A - A^{(-i)} induces correlated perturbations across all pairs sharing a hyperedge with i. Standard leave-one-out perturbation bounds (typically relying on near-independence or controlled Lipschitz constants) may not directly apply; if the paper's operator-norm or delocalization lemmas do not explicitly tighten for this correlation structure (scaling with hyperedge size and degree), the eigenvector separation from the bulk could fail to hold at the claimed threshold, particularly when p(n) is not vanishing fast enough.

Authors: We thank the referee for highlighting the correlation structure. Our adapted leave-one-out analysis in Section 4 explicitly accounts for this by bounding the perturbation ||A - A^{(-i)}|| via a decomposition that groups entries affected by the same hyperedge. We apply a matrix Bernstein-type inequality to the sum of dependent rank-1 updates corresponding to hyperedges incident to i, with the variance proxy incorporating the maximum number of shared hyperedges per pair (controlled by the degree and hyperedge size k). This yields an operator-norm bound of order O(√(n p log n) + k p n) with high probability, which is o(1) relative to the √n signal strength at the detection threshold. The delocalization of the leading eigenvector then follows from a standard Davis-Kahan argument with this perturbation. We will add a dedicated paragraph in the proof of Lemma 4.3 explicitly stating the dependence on k and the correlation control. revision: partial
Referee: [detection analysis / spectral norm test] Detection section: The spectral-norm test is asserted to be asymptotically powerful at the √n scale with explicit p-dependence. The proof must derive the limiting distribution or concentration of ||A|| under the null and alternative while accounting for the dependence among matrix entries induced by shared hyperedges; without a concrete bound on the variance or tail of the norm that incorporates the hyperedge size, the power claim at the information-theoretic threshold remains unverified.

Authors: The detection proof in Section 3 derives the required concentration by viewing A as a sum over hyperedges of dependent matrix-valued random variables. We compute the variance of the quadratic form underlying ||A|| by enumerating the covariance contributions from pairs of pairs that share a hyperedge (probability scaling as p times a combinatorial factor in k). This produces an explicit variance bound of O(n^2 p + n^3 p^2 k), leading to sub-Gaussian tails via a matrix Freedman inequality adapted to the dependence. Under the null, ||A|| concentrates around its mean at scale o(√n), while under the alternative the mean shifts by Θ(√n), establishing asymptotic power with the stated p-dependence. The hyperedge size k enters the constants but does not alter the √n threshold. We will insert an explicit corollary after Theorem 3.1 stating the variance and tail bounds in terms of k and p. revision: partial

Circularity Check

0 steps flagged

No circularity; derivation adapts external leave-one-out framework to hypergraph dependence without self-referential reduction

full rationale

The paper states explicit detection and recovery thresholds derived from the random hypergraph generative model and an adapted leave-one-out eigenvector analysis on the observed pairwise count matrix. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the dependence structure is handled directly in the perturbation bounds rather than assumed away or renamed from prior results. The analysis is self-contained against the model assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on a standard random hypergraph generative model with a planted clique and on the construction of the adjacency matrix from hyperedge counts; no free parameters are fitted inside the proofs and no new entities are postulated.

axioms (1)

domain assumption Each possible hyperedge is included independently with probability p outside the planted clique.
This is the standard generative model for the planted clique problem adapted to hypergraphs.

pith-pipeline@v0.9.0 · 5488 in / 1245 out tokens · 64291 ms · 2026-05-10T16:47:54.437379+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

37 extracted references · 8 canonical work pages · 2 internal anchors

[1]

E. Abbe, J. Fan, K. Wang, and Y. Zhong. Entrywise eigenvector analysis of random matrices with low expected rank.The Annals of Statistics, 48(3):1452–1474, 2020

2020
[2]

Consistent line clustering using geometric hypergraphs

K. Alaluusua, K. Avrachenkov, B. R. Vinay Kumar, and L. Leskelä. Consistent line clustering using geometric hypergraphs, 2025. arXiv preprint arXiv:2505.24868

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3–4):457–466, 1998

1998
[4]

Planted clique recovery in random geometric graphs

K. Avrachenkov, A. Bobu, N. Litvak, and R. Michielan. Planted clique recovery in random geometric graphs, 2025. arXiv preprint arXiv:2510.12365

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

Battiston, E

F. Battiston, E. Amico, A. Barrat, G. Bianconi, G. Ferraz de Arruda, B. Franceschiello, I. Iacopini, S. Kéfi, V. Latora, Y. Moreno, et al. The physics of higher-order interactions in complex systems.Nature Physics, 17(10):1093–1098, 2021

2021
[6]

C. Bick, E. Gross, H. A. Harrington, and M. T. Schaub. What are higher-order networks? SIAM Review, 65(3):686–731, 2023

2023
[7]

Boccaletti, P

S. Boccaletti, P. De Lellis, C. I. Del Genio, K. Alfaro-Bittner, R. Criado, S. Jalan, and M. Romance. The structure and dynamics of networks with higher-order interactions. Physics Reports, 1018:1–64, 2023

2023
[8]

Bollobás and P

B. Bollobás and P. Erdős. Cliques in random graphs.Mathematical Proceedings of the Cambridge Philosophical Society, 80(3):419–427, 1976

1976
[9]

Bresler, C

G. Bresler, C. Guo, and Y. Polyanskiy. Thresholds for reconstruction of random hy- pergraphs from graph projections. InProceedings of the 37th Conference on Learning Theory, volume 247 ofProceedings of Machine Learning Research, pages 632–647, 2024. 37

2024
[10]

Partial and exact recovery of a random hypergraph from its graph projection.arXiv preprint arXiv:2502.14988, 2025

G.Bresler, C.Guo, Y.Polyanskiy, andA.Yao. Partialandexactrecoveryofarandom hypergraph from its graph projection, 2025. arXiv preprint arXiv:2502.14988

work page arXiv 2025
[11]

Chatterjee, K

U. Chatterjee, K. Huang, R. Karmakar, B. R. Vinay Kumar, G. Lugosi, N. Malhotra, A. Mandal, and M. A. Tarafdar. Detecting weighted hidden cliques, 2025. arXiv preprint arXiv:2506.21543

work page arXiv 2025
[12]

Z. Chen, E. Mossel, and I. Zadik. Almost-linear planted cliques elude the Metropolis process. Random Structures & Algorithms, 66(2):e21274, 2025

2025
[13]

Corinzia, P

L. Corinzia, P. Penna, W. Szpankowski, and J. Buhmann. Statistical and computa- tional thresholds for the plantedk-densest sub-hypergraph problem. InInternational Conference on Artificial Intelligence and Statistics, pages 11615–11640. PMLR, 2022

2022
[14]

S. Deng, S. Ling, and T. Strohmer. Strong consistency, graph laplacians, and the stochastic block model.Journal of Machine Learning Research, 22(117):1–44, 2021

2021
[15]

Dhawan, C

A. Dhawan, C. Mao, and A. S. Wein. Detection of dense subhypergraphs by low- degree polynomials. Random Structures & Algorithms, 66(1):e21279, 2025

2025
[16]

Dumitriu and H.-X

I. Dumitriu and H.-X. Wang. Optimal and exact recovery on the general nonuniform hypergraph stochastic block model.The Annals of Statistics, 54(1):48–73, 2026

2026
[17]

Dumitriu, H.-X

I. Dumitriu, H.-X. Wang, and Y. Zhu. Partial recovery and weak consistency in the non-uniform hypergraph stochastic block model.Combinatorics, Probability and Computing, 34(1):1–51, 2025

2025
[18]

Gamarnik and I

D. Gamarnik and I. Zadik. The landscape of the planted clique problem: dense sub- graphs and the overlap gap property.The Annals of Applied Probability, 34(4):3375– 3434, 2024

2024
[19]

Gaudio and N

J. Gaudio and N. Joshi. Community detection in the hypergraph SBM: exact recov- ery given the similarity matrix. InProceedings of the 36th Conference on Learning Theory, pages 469–510, 2023

2023
[20]

Guruswami, P

V. Guruswami, P. K. Kothari, and P. Manohar. Bypassing the XOR trick: stronger certificates for hypergraph clique number. In Approximation, Random- ization, and Combinatorial Optimization. Algorithms and Techniques (APPROX- /RANDOM 2022), volume 245 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 42:1–42:7. Schloss Dagstuhl – Leibniz-Ze...

2022
[21]

Hajek, Y

B. Hajek, Y. Wu, and J. Xu. Information limits for recovering a hidden community. IEEE Transactions on Information Theory, 63(8):4729–4745, 2017

2017
[22]

Hirahara and N

S. Hirahara and N. Shimizu. Planted clique conjectures are equivalent. InProceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 358–366, 2024

2024
[23]

Kothari, A

P. Kothari, A. Louis, R. Paul, and P. Raghavendra. Improved certificates for indepen- dence number in semirandom hypergraphs, 2026. arXiv preprint arXiv:2603.08693

work page arXiv 2026
[24]

LaRock and R

T. LaRock and R. Lambiotte. Exploring the non-uniqueness of node co-occurrence matrices of hypergraphs, 2025. arXiv preprint arXiv:2506.01479. 38

work page arXiv 2025
[25]

J. Lee, D. Kim, and H. W. Chung. Robust hypergraph clustering via convex relax- ation of truncated MLE. IEEE Journal on Selected Areas in Information Theory, 1(3):613–631, 2020

2020
[26]

Lucas, L

M. Lucas, L. Gallo, A. Ghavasieh, F. Battiston, and M. De Domenico. Reducibility of higher-order networks from dynamics.Nature Communications, 17:1551, 2026

2026
[27]

Luo and A

Y. Luo and A. R. Zhang. Open problem: average-case hardness of hypergraphic planted clique detection. InProceedings of the 33rd Conference on Learning Theory, pages 3852–3856, 2020

2020
[28]

Luo and A

Y. Luo and A. R. Zhang. Tensor clustering with planted structures: statistical optimality and computational limits.The Annals of Statistics, 50(1):584–613, 2022

2022
[29]

Mardia, K

J. Mardia, K. A. Verchand, and A. S. Wein. Low-degree phase transitions for detect- ing a planted clique in sublinear time. InConference on Learning Theory, 2024

2024
[30]

Morgan and C

A. Morgan and C. Guo. Achievability of heterogeneous hypergraph recovery from its graph projection, 2026. arXiv preprint arXiv:2603.01268

work page arXiv 2026
[31]

J. A. Tropp. User-friendly tail bounds for sums of random matrices.Foundations of Computational Mathematics, 12(4):389–434, 2012

2012
[32]

Välimaa and L

I. Välimaa and L. Leskelä. Consistent spectral clustering in sparse tensor block models, 2025. arXiv preprint arXiv:2501.13820

work page arXiv 2025
[33]

Vershynin.High-dimensional probability: an introduction with applications in data science, volume 47

R. Vershynin.High-dimensional probability: an introduction with applications in data science, volume 47. Cambridge University Press, 2018

2018
[34]

Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323, 2015

2015
[35]

Yuan and Z

M. Yuan and Z. Shang. Sharp detection boundaries on testing dense subhypergraph. Bernoulli, 28(4):2459–2491, 2022

2022
[36]

Zhang and D

A. Zhang and D. Xia. Tensor SVD: statistical and computational limits. IEEE Transactions on Information Theory, 64(11):7311–7338, 2018. 39 A Perturbation and concentration This appendix collects standard inequalities and auxiliary perturbation/concentration results used throughout the paper. A.1 Basic inequalities Cauchy–Schwarz inequality.Forx,y∈Rn, |⟨x,...

2018
[37]

SinceEout i ⊂Ei, it suffices to bound∑ e∈Eia2 e

(A.13) 44 Proof. SinceEout i ⊂Ei, it suffices to bound∑ e∈Eia2 e. For eache∈Ei, letTe :=e\{i}, so |Te|=d−1 and ae = ∑ j∈Tevj. By Cauchy–Schwarz (A.1),   ∑ j∈Te vj   2 ≤   ∑ j∈Te 12     ∑ j∈Te v2 j   = ( d−1) ∑ j∈Te v2 j. Summing over all(d−1)-subsetsTe⊂[n]\{i}and interchanging the order of summation, ∑ e∈Ei a2 e ≤(d−1) ∑ Te⊂[n]\{i} |Te|=d−1 ∑ ...