Two provably consistent divide and conquer clustering algorithms for large networks

Peter J. Bickel; Purnamrita Sarkar; Soumendu Sundar Mukherjee

arXiv: 1708.05573 · v1 · submitted 2017-08-18 · stat.ML · math.ST · stat.CO · stat.ME · stat.TH

Abstract

In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on a number of small subgraphs and then patch the results into a single clustering. The main advantage of these algorithms is that they significantly reduce the computational cost of traditional algorithms, such as spectral clustering, semi-definite programs, modularity-based methods and likelihood-based methods, without sacrificing accuracy, and at times even improving it. These algorithms are also parallelizable by nature. Thus, by exploiting the facts that most traditional algorithms are accurate and that the corresponding optimization problems are much simpler for small problems, our divide-and-conquer methods provide an omnibus recipe for scaling traditional algorithms up to large networks. We prove consistency of these algorithms under various subgraph selection procedures and perform extensive simulations and real-data analyses to understand the advantages of the divide-and-conquer approach in various settings.
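The general idea of clustering small subgraphs and patching the results can be illustrated with a toy sketch. This is not either of the paper's two algorithms; it is a minimal pure-Python illustration under several assumptions. The helper names (`two_way_spectral`, `divide_and_conquer`, `two_block_sbm`, `accuracy`) are hypothetical; subgraphs are uniformly sampled induced subgraphs; the base method is a two-way spectral split by power iteration (standing in for any traditional algorithm); and the patching step averages, over all subgraphs, how often each pair of nodes received the same local label, then re-clusters that averaged co-membership matrix. Co-membership is used because local labels are only defined up to a swap, and pairwise agreement is unaffected by that ambiguity.

```python
import random


def two_way_spectral(W, iters=50, seed=0):
    """Split the nodes of a symmetric weight matrix W into two groups.

    Power iteration on B = W - mean(W) * J (J = all-ones matrix). For two
    assortative blocks the dominant eigenvector of B has opposite signs on
    the two blocks, so the sign of each entry is used as the label.
    """
    n = len(W)
    mean = sum(map(sum, W)) / (n * n)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        total = sum(v)
        # B v = W v - mean * (sum of v) * 1, computed without forming B
        u = [sum(w * x for w, x in zip(row, v)) - mean * total for row in W]
        scale = max(abs(x) for x in u) or 1.0
        v = [x / scale for x in u]
    return [1 if x >= 0 else 0 for x in v]


def divide_and_conquer(adj, num_subgraphs, size, seed=0):
    """Cluster random induced subgraphs, then patch via averaged co-membership."""
    n = len(adj)
    rng = random.Random(seed)
    agree = [[0] * n for _ in range(n)]  # times i, j got the same local label
    seen = [[0] * n for _ in range(n)]   # times i, j were sampled together
    for t in range(num_subgraphs):
        nodes = rng.sample(range(n), size)
        sub = [[1.0 if j in adj[i] else 0.0 for j in nodes] for i in nodes]
        local = two_way_spectral(sub, seed=t)  # labels only defined up to a swap
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                seen[i][j] += 1
                agree[i][j] += local[a] == local[b]
    # Pairs never sampled together get the neutral value 0.5.
    avg = [[agree[i][j] / seen[i][j] if seen[i][j] else 0.5 for j in range(n)]
           for i in range(n)]
    return two_way_spectral(avg, seed=num_subgraphs)


def two_block_sbm(n, p, q, seed=0):
    """Simulate a balanced two-block stochastic block model as adjacency sets."""
    rng = random.Random(seed)
    truth = [i % 2 for i in range(n)]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < (p if truth[i] == truth[j] else q):
                adj[i].add(j)
                adj[j].add(i)
    return adj, truth


def accuracy(labels, truth):
    """Fraction of correctly labelled nodes, up to swapping the two labels."""
    hits = sum(a == b for a, b in zip(labels, truth))
    return max(hits, len(truth) - hits) / len(truth)


adj, truth = two_block_sbm(200, 0.7, 0.02, seed=1)
labels = divide_and_conquer(adj, num_subgraphs=80, size=60, seed=2)
print(accuracy(labels, truth))
```

Each subgraph costs O(s²) per power-iteration step rather than O(n²), and the loop over subgraphs is independent across iterations, so it could be distributed; the sketch runs it serially. The final re-clustering here still operates on an n × n matrix, so this toy only illustrates the patching idea, not the computational savings analysed in the paper.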
