pith. machine review for the scientific record. sign in

arxiv: 2505.24868 · v3 · submitted 2025-05-30 · 🧮 math.ST · stat.ML· stat.TH

Recognition: unknown

Consistent line clustering using geometric hypergraphs

Authors on Pith no claims yet
classification 🧮 math.ST stat.MLstat.TH
keywords latentintersectionnearpointsboundsclusteringcollinearexact
0
0 comments X
read the original abstract

Subspace clustering becomes inherently difficult near intersections, where points from different subspaces are barely separated. Most existing theoretical results address this issue by imposing separation or sampling assumptions that limit the statistical effect of points near the intersection. We study a minimal setting of two intersecting lines in which the latent sampling law places polynomially large mass in small neighborhoods of the intersection. We derive information-theoretic lower bounds for exact and almost exact recovery under Gaussian noise. In particular, we show that the exact-recovery threshold is determined by the rate at which the latent law concentrates near the intersection. Since any two points are collinear, pairwise information alone does not reveal whether they are sampled from the same latent line. We therefore construct a hypergraph in which nearly collinear triples form hyperedges, and study the resulting hypergraph similarity matrix. Under a simple regularity condition on the latent distribution, we introduce a spectral algorithm that achieves the information-theoretic bounds up to polylogarithmic factors.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Planted clique detection and recovery from the hypergraph adjacency matrix

    math.ST 2026-04 unverdicted novelty 6.0

    Spectral norm test and leading-eigenvector method achieve detection and exact recovery of planted cliques from hypergraph adjacency matrices at the sqrt(n) scale.