A Statistical Model for Motifs Detection

Andrea Montanari; Hamid Javadi


We consider a statistical model for the problem of finding subgraphs with specified topology in an otherwise random graph. This task plays an important role in the analysis of social and biological networks. In these types of networks, small subgraphs with a specific structure have important functional roles, and they are referred to as `motifs.' Within this model, one or multiple copies of a subgraph are added (`planted') in an Erd\H{o}s-R\'enyi random graph with $n$ vertices and edge probability $q_0$. We ask whether the resulting graph can be distinguished reliably from a pure Erd\H{o}s-R\'enyi random graph, and we present two types of results. First, we investigate the question from a purely statistical perspective, and ask whether there is any test that can distinguish between the two graph models. We provide necessary and sufficient conditions that are essentially tight for sufficiently small subgraphs. Next, we study two polynomial-time algorithms for solving the same problem: a spectral algorithm and a semidefinite programming (SDP) relaxation. For the spectral algorithm, we establish sufficient conditions under which it distinguishes the two graph models with high probability. Under the same conditions, the spectral algorithm also identifies the hidden subgraph. The spectral algorithm is substantially sub-optimal compared with the optimal test. We show that a similar gap is present for the more sophisticated SDP approach.
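As a rough illustration of the two graph models being compared, the following minimal sketch samples a pure Erd\H{o}s-R\'enyi graph and a graph with one planted copy of a motif. It is not taken from the paper: the function names `sample_er` and `plant_motif`, the choice of a triangle as the motif, and the decision to take the union of the motif's edges with the random edges already present are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random
from itertools import combinations


def sample_er(n, q0, rng):
    """Sample an Erdos-Renyi graph on vertices 0..n-1.

    Each of the n*(n-1)/2 possible edges is included independently
    with probability q0. Edges are stored as pairs (u, v) with u < v.
    """
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < q0}


def plant_motif(edges, n, motif_edges, motif_size, rng):
    """Plant one copy of a motif on a uniformly random vertex subset.

    motif_edges lists the motif's edges on vertices 0..motif_size-1.
    Assumption (not from the paper): planted edges are added on top of
    the random edges already present between the chosen vertices.
    """
    support = rng.sample(range(n), motif_size)  # distinct host vertices
    planted = set(edges)
    for a, b in motif_edges:
        u, v = support[a], support[b]
        planted.add((min(u, v), max(u, v)))
    return planted, support


rng = random.Random(0)
n, q0 = 50, 0.1
triangle = [(0, 1), (1, 2), (0, 2)]  # hypothetical example motif

null_graph = sample_er(n, q0, rng)  # pure random graph
planted_graph, support = plant_motif(sample_er(n, q0, rng), n, triangle, 3, rng)
```

A detection test in the sense of the abstract would take one such edge set as input and decide which of the two models produced it; the sketch covers only the sampling step.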

