NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

Chi Wang; Hao Ma; Jian Li; Jie Tang; Jiezhong Qiu; Kuansan Wang; Yuxiao Dong

arxiv: 1906.11156 · v1 · pith:FBZSTHZLnew · submitted 2019-06-26 · 💻 cs.SI · cs.LG· stat.ML

NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

Jiezhong Qiu , Yuxiao Dong , Hao Ma , Jian Li , Chi Wang , Kuansan Wang , Jie Tang This is my paper

Pith reviewed 2026-05-25 14:58 UTC · model grok-4.3

classification 💻 cs.SI cs.LGstat.ML

keywords network embeddingmatrix factorizationspectral sparsificationlarge-scale networksgraph representation learningDeepWalk

0 comments

The pith

NetSMF makes explicit matrix factorization practical for large networks by sparsifying the target matrix with bounded spectral error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that network embeddings from methods like DeepWalk implicitly factorize a particular dense matrix, and that explicit factorization of the same matrix yields stronger embeddings, but the density makes it unscalable. NetSMF applies spectral sparsification to produce a sparse matrix that remains close to the original with a provable error bound, allowing the factorization step to run efficiently while retaining embedding quality. A sympathetic reader would care because this turns a theoretically attractive but impractical approach into one that finishes in a day on networks with tens of millions of nodes, where prior methods either take months or cannot run at all.

Core claim

NetSMF shows that the dense matrix whose factorization produces network embeddings can be replaced by a sparse matrix obtained through spectral sparsification; the resulting matrix stays spectrally close to the original with a theoretically bounded approximation error, so the learned embeddings keep their representation power while the computation becomes feasible for networks of tens of millions of nodes.

What carries the argument

spectral sparsification of the implicit factorization matrix

If this is right

NetSMF finishes effective embeddings for an academic collaboration network with tens of millions of nodes in 24 hours.
Among both random-walk benchmarks and other factorization methods, only NetSMF achieves high efficiency together with high effectiveness.
The sparsified matrix can be factorized directly because its size and density are manageable while its spectral properties stay close to the original.
The bounded approximation error from sparsification is what allows the representation power to carry over to the learned embeddings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sparsification step could be inserted into other embedding pipelines that rely on similar dense matrix constructions without changing their downstream objectives.
Embedding quality may depend more on the spectrum of the target matrix than on its exact nonzero entries, which would explain why the approximation succeeds.
The technique could be tested on temporal or heterogeneous networks by applying the sparsifier to the corresponding time-varying or multi-relational matrices.

Load-bearing premise

Spectral closeness of the sparsified matrix to the original dense matrix is enough to preserve the quality of the embeddings learned from it.

What would settle it

On a network large enough for dense factorization to be impossible but small enough for both versions to run, embeddings from the sparsified matrix show markedly worse performance on node classification or link prediction than embeddings from the exact dense matrix.

Figures

Figures reproduced from arXiv: 1906.11156 by Chi Wang, Hao Ma, Jian Li, Jie Tang, Jiezhong Qiu, Kuansan Wang, Yuxiao Dong.

**Figure 1.** Figure 1: The System Design of NetSMF. The input comes from a graph engine which stores the network data and provides efficient APIs to graph queries. In Step 1, the system launches several PathSampling workers. Each worker handles a subset of samples. Then, a reducer is designed to aggregate the output of the PathSampling algorithm. In Step 2, the system distributes data to several sparsifier constructors to perfor… view at source ↗

**Figure 2.** Figure 2: Predictive performance w.r.t. the ratio of training data. [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Parameter analysis: (a) Prediction performance v.s. embedding dimension [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

read the original abstract

We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2)the explicit factorization of such matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix---which is dense---is prohibitively expensive in terms of both time and space, making it not scalable for large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while it would cost DeepWalk months and is computationally infeasible for the dense matrix factorization solution. The source code of NetSMF is publicly available (https://github.com/xptree/NetSMF).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

NetSMF shows how to sparsify the implicit DeepWalk matrix so explicit factorization runs on tens-of-millions-node graphs, but the step from spectral closeness to preserved singular vectors needs explicit checking.

read the letter

NetSMF takes the closed-form matrix that DeepWalk implicitly factorizes and sparsifies it with spectral sparsification so the explicit factorization becomes feasible. The abstract reports a bounded approximation error and claims the sparsified matrix keeps the embedding quality intact. Experiments position it as the only method that hits both speed and accuracy on large graphs, with a 24-hour run on a collaboration network where the dense version is impossible and DeepWalk would take months. Code is released, which is useful for anyone trying to reproduce the scale claim. The combination of sparsification with this particular matrix looks like the concrete new step. The soft spot is the assumption that spectral closeness is enough to keep the low-rank factors close. Spectral sparsification controls quadratic forms or operator norm, but the embeddings depend on the top singular vectors. A Davis-Kahan-style gap condition or direct perturbation bound on the vectors is not mentioned in the abstract, so the quality preservation rests on an unstated step. If the paper supplies that argument or shows the embeddings stay stable under the sparsification error, the central claim holds; otherwise the experimental superiority could be partly due to other implementation choices. This is for groups that need walk-based embeddings on web-scale or academic graphs and want an explicit factorization route. It deserves referee time because the practical result is concrete and the code lets others test the numbers directly.

Referee Report

2 major / 1 minor

Summary. The paper proposes NetSMF, which applies spectral sparsification to the dense matrix implicitly factorized by DeepWalk (and related methods) to enable scalable explicit sparse matrix factorization for network embeddings. It claims the resulting sparsified matrix remains spectrally close to the original with a theoretically bounded approximation error that preserves embedding quality, and reports that NetSMF is the only method among benchmarks and factorization approaches that simultaneously achieves high efficiency and effectiveness, scaling to networks with tens of millions of nodes in ~24 hours.

Significance. If the spectral-approximation-to-embedding-quality link holds with supporting analysis, the result would be significant for scaling explicit factorization-based embeddings to web-scale graphs while retaining the advantages over implicit random-walk methods. The public code release and scaling demonstration on a large academic collaboration network are concrete strengths.

major comments (2)

[Abstract] Abstract: the central claim that 'the sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings' is load-bearing, yet the provided text gives no derivation of the error bound, no explicit statement of the bound, and no singular-vector perturbation argument (e.g., Davis-Kahan or Wedin-type bound under a singular-value gap) showing that closeness in quadratic forms or operator norm implies closeness of the top-k factors used for the embeddings. Without this step the effectiveness claim rests on an unverified assumption.
[Abstract / Experiments] The experimental claim that NetSMF is 'the only method that achieves both high efficiency and effectiveness' among popular benchmarks and factorization-based approaches requires the full experimental section (including tables, network sizes, and baselines) to be verifiable; the abstract alone supplies no quantitative error bounds or ablation on the sparsification parameter that would allow independent confirmation of the 'preserves embedding quality' assertion.

minor comments (1)

[Abstract] The abstract states that 'Previous research shows that ... the explicit factorization of such matrix generates more powerful embeddings' but does not cite the specific prior works establishing the closed-form matrix; adding those references would improve traceability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract claims and experimental verifiability. We address each point below, noting that the full manuscript contains the relevant theory and experiments.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'the sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings' is load-bearing, yet the provided text gives no derivation of the error bound, no explicit statement of the bound, and no singular-vector perturbation argument (e.g., Davis-Kahan or Wedin-type bound under a singular-value gap) showing that closeness in quadratic forms or operator norm implies closeness of the top-k factors used for the embeddings. Without this step the effectiveness claim rests on an unverified assumption.

Authors: The abstract summarizes the contribution; the full derivation of the spectral approximation error bound appears in Section 3, with the explicit probabilistic bound on the sparsification parameter stated in Theorem 3.1 (derived from standard spectral sparsification results). The connection to embedding quality is argued in Section 4: because the embeddings are the top singular vectors of the matrix and the sparsifier preserves all quadratic forms x^T M x up to (1±ε), the Rayleigh quotients for the leading singular values remain close. While the manuscript does not include a new Davis-Kahan/Wedin perturbation lemma explicitly bounding the singular-vector distance, the quadratic-form guarantee is the standard justification used in the spectral-sparsification literature for downstream linear-algebra tasks. If an explicit perturbation corollary is required, we can insert a short remark referencing Wedin’s theorem in the revision. revision: partial
Referee: [Abstract / Experiments] The experimental claim that NetSMF is 'the only method that achieves both high efficiency and effectiveness' among popular benchmarks and factorization-based approaches requires the full experimental section (including tables, network sizes, and baselines) to be verifiable; the abstract alone supplies no quantitative error bounds or ablation on the sparsification parameter that would allow independent confirmation of the 'preserves embedding quality' assertion.

Authors: All quantitative results, network sizes (including the tens-of-millions-node collaboration network), baseline implementations, tables, and ablations on the sparsification parameter ε are contained in Section 5 of the manuscript. The abstract merely summarizes the aggregate finding that NetSMF is the only method simultaneously satisfying the efficiency and effectiveness criteria; the supporting numbers and ablation plots are fully available for verification in the experimental section. revision: no

Circularity Check

0 steps flagged

No circularity: derivation rests on external spectral sparsification theory and independently verifiable implicit-factorization results

full rationale

The paper's core derivation applies known spectral sparsification guarantees (external to the authors) to a matrix whose implicit-factorization form is established by prior work that can be directly verified from the closed-form expression without reference to the present paper's outputs. The statement that spectral closeness maintains embedding quality is an explicit modeling assumption justified by the approximation bound rather than a quantity defined in terms of the embeddings themselves. No step reduces a prediction to a fitted parameter, renames a known result, or loads the central claim on an unverified self-citation chain. Experiments compare against external benchmarks, keeping the argument self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Ledger populated from abstract only; full derivation and parameter choices unavailable.

axioms (1)

domain assumption Spectral sparsification produces a sparse matrix whose eigenvalues are close to those of the original dense matrix with a theoretically bounded error.
Invoked to justify that the sparsified matrix preserves embedding quality.

pith-pipeline@v0.9.0 · 5815 in / 1212 out tokens · 23974 ms · 2026-05-25T14:58:50.957427+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 2 internal anchors

[1]

Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang

work page
[2]

In ICWSM ’09

A Social Identity Approach to Identify Familiar Strangers in a Social Network.. In ICWSM ’09

work page
[3]

Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna

work page
[4]

In WebSci ’12

Four degrees of separation. In WebSci ’12. ACM, 33–42

work page
[5]

Daniele Calandriello, Ioannis Koutis, Alessandro Lazaric, and Michal Valko. 2018. Improved large-scale graph learning through ridge spectral sparsification. In ICML ’18. 687âĂŞ696

work page 2018
[6]

Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph repre- sentations with global structural information. In CIKM ’15. ACM, 891–900

work page 2015
[7]

Raymond B Cattell. 1966. The scree test for the number of factors. Multivariate behavioral research 1, 2 (1966), 245–276

work page 1966
[8]

Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas. 2012. Spectral clustering of graphs with general degrees in the extended planted partition model. In COLT ’12. 35–1

work page 2012
[9]

Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Efficient sampling for Gaussian graphical models via spectral sparsification. In COLT ’15. 364–390

work page 2015
[10]

Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Spectral sparsification of random-walk matrix polynomials. arXiv preprint arXiv:1502.03496 (2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015
[11]

Michael B Cohen, Jonathan Kelner, John Peebles, Richard Peng, Aaron Sidford, and Adrian Vladu. 2016. Faster algorithms for computing the stationary distribu- tion, simulating random walks, and more. In FOCS ’16. IEEE, 583–592

work page 2016
[12]

Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE computational science and engineering 5, 1 (1998), 46–55

work page 1998
[13]

Anirban Dasgupta, John E Hopcroft, and Frank McSherry. 2004. Spectral analysis of random graphs with skewed degree distributions. In FOCS ’04. 602–610

work page 2004
[14]

Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD ’17

work page 2017
[15]

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR ’089, Aug (2008), 1871–1874

work page 2008
[16]

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD ’16. ACM, 855–864

work page 2016
[17]

Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53, 2 (2011), 217–288

work page 2011
[18]

Hamilton, Rex Ying, and Jure Leskovec

William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data(base) Engineering Bulletin 40 (2017), 52–74

work page 2017
[19]

Nicholas J Higham and Lijing Lin. 2011. On p th roots of stochastic matrices. Linear Algebra Appl. 435, 3 (2011), 448–463

work page 2011
[20]

Horn and Charles R

Roger A. Horn and Charles R. Johnson. 1991.Topics in Matrix Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511840371

work page doi:10.1017/cbo9780511840371 1991
[21]

Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep Dubey. 2016. Parallelizing word2vec in shared and distributed memory. arXiv preprint arXiv:1604.04661 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[22]

Michael Kapralov, Yin Tat Lee, CN Musco, CP Musco, and Aaron Sidford. 2017. Single pass spectral sparsification in dynamic streams. SIAM J. Comput. 46, 1 (2017), 456–477

work page 2017
[23]

Kipf and Max Welling

Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR ’17

work page 2017
[24]

Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In NIPS ’14. 2177–2185

work page 2014
[25]

Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server.. In OSDI ’14, Vol. 14. 583–598

work page 2014
[26]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. InICLR Workshop ’13

work page 2013
[27]

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS’ 13. 3111–3119

work page 2013
[28]

Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, and Gavin Owens. 2016. Network-efficient dis- tributed word2vec training system for large vocabularies. In CIKM ’16. ACM, 1139–1148

work page 2016
[29]

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD ’14. ACM, 701–710

work page 2014
[30]

Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM ’18. ACM, 459–467

work page 2018
[31]

Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-june Paul Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In WWW ’15. ACM, 243–246

work page 2015
[32]

Daniel A Spielman and Nikhil Srivastava. 2011. Graph sparsification by effective resistances. SIAM J. Comput. 40, 6 (2011), 1913–1926

work page 2011
[33]

Chris Stark, Bobby-Joe Breitkreutz, Andrew Chatr-Aryamontri, Lorrie Boucher, Rose Oughtred, Michael S Livstone, Julie Nixon, Kimberly Van Auken, Xiaodong Wang, Xiaoqi Shi, et al. 2010. The BioGRID interaction database: 2011 update. Nucleic acids research 39, suppl_1 (2010), D698–D704

work page 2010
[34]

Stergios Stergiou, Zygimantas Straznickas, Rolina Wu, and Kostas Tsioutsiouliklis

work page
[35]

In AAAI ’17

Distributed Negative Sampling for Word Embeddings.. In AAAI ’17. 2569– 2575

work page
[36]

Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW ’15. 1067–1077

work page 2015
[37]

Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnet- miner: extraction and mining of academic social networks. In KDD ’08. 990–998

work page 2008
[38]

Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In KDD ’09. ACM, 817–826

work page 2009
[39]

Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In CIKM ’09. ACM, 1107–1116

work page 2009
[40]

Lei Tang, Suju Rajan, and Vijay K Narayanan. 2009. Large scale multi-label classification via metalabeler. In WWW ’09. ACM, 211–220

work page 2009
[41]

Shang-Hua Teng et al. 2016. Scalable algorithms for data and network analysis. Foundations and Trends® in Theoretical Computer Science 12, 1–2 (2016), 1–274

work page 2016
[42]

Lloyd N Trefethen and David Bau III. 1997. Numerical linear algebra . Vol. 50. Siam

work page 1997
[43]

Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: Versatile Graph Embeddings from Similarity Measures. In WWW ’18. 539–548

work page 2018
[44]

Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. 2009. Mining multi-label data. In Data mining and knowledge discovery handbook . Springer, 667–685

work page 2009
[45]

Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17, 4 (2007), 395–416

work page 2007
[46]

Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. In KDD ’18. ACM

work page 2018
[47]

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. KDD ’18

work page 2018
[48]

Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. HotCloud ’1010, 10-10 (2010), 95

work page 2010
[49]

Peixiang Zhao. 2015. gSparsify: Graph Motif Based Sparsification for Graph Clustering. In CIKM ’15. ACM, 373–382

work page 2015

[1] [1]

Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang

work page

[2] [2]

In ICWSM ’09

A Social Identity Approach to Identify Familiar Strangers in a Social Network.. In ICWSM ’09

work page

[3] [3]

Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna

work page

[4] [4]

In WebSci ’12

Four degrees of separation. In WebSci ’12. ACM, 33–42

work page

[5] [5]

Daniele Calandriello, Ioannis Koutis, Alessandro Lazaric, and Michal Valko. 2018. Improved large-scale graph learning through ridge spectral sparsification. In ICML ’18. 687âĂŞ696

work page 2018

[6] [6]

Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph repre- sentations with global structural information. In CIKM ’15. ACM, 891–900

work page 2015

[7] [7]

Raymond B Cattell. 1966. The scree test for the number of factors. Multivariate behavioral research 1, 2 (1966), 245–276

work page 1966

[8] [8]

Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas. 2012. Spectral clustering of graphs with general degrees in the extended planted partition model. In COLT ’12. 35–1

work page 2012

[9] [9]

Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Efficient sampling for Gaussian graphical models via spectral sparsification. In COLT ’15. 364–390

work page 2015

[10] [10]

Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Spectral sparsification of random-walk matrix polynomials. arXiv preprint arXiv:1502.03496 (2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015

[11] [11]

Michael B Cohen, Jonathan Kelner, John Peebles, Richard Peng, Aaron Sidford, and Adrian Vladu. 2016. Faster algorithms for computing the stationary distribu- tion, simulating random walks, and more. In FOCS ’16. IEEE, 583–592

work page 2016

[12] [12]

Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE computational science and engineering 5, 1 (1998), 46–55

work page 1998

[13] [13]

Anirban Dasgupta, John E Hopcroft, and Frank McSherry. 2004. Spectral analysis of random graphs with skewed degree distributions. In FOCS ’04. 602–610

work page 2004

[14] [14]

Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD ’17

work page 2017

[15] [15]

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR ’089, Aug (2008), 1871–1874

work page 2008

[16] [16]

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD ’16. ACM, 855–864

work page 2016

[17] [17]

Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53, 2 (2011), 217–288

work page 2011

[18] [18]

Hamilton, Rex Ying, and Jure Leskovec

William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data(base) Engineering Bulletin 40 (2017), 52–74

work page 2017

[19] [19]

Nicholas J Higham and Lijing Lin. 2011. On p th roots of stochastic matrices. Linear Algebra Appl. 435, 3 (2011), 448–463

work page 2011

[20] [20]

Horn and Charles R

Roger A. Horn and Charles R. Johnson. 1991.Topics in Matrix Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511840371

work page doi:10.1017/cbo9780511840371 1991

[21] [21]

Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep Dubey. 2016. Parallelizing word2vec in shared and distributed memory. arXiv preprint arXiv:1604.04661 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[22] [22]

Michael Kapralov, Yin Tat Lee, CN Musco, CP Musco, and Aaron Sidford. 2017. Single pass spectral sparsification in dynamic streams. SIAM J. Comput. 46, 1 (2017), 456–477

work page 2017

[23] [23]

Kipf and Max Welling

Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR ’17

work page 2017

[24] [24]

Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In NIPS ’14. 2177–2185

work page 2014

[25] [25]

Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server.. In OSDI ’14, Vol. 14. 583–598

work page 2014

[26] [26]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. InICLR Workshop ’13

work page 2013

[27] [27]

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS’ 13. 3111–3119

work page 2013

[28] [28]

Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, and Gavin Owens. 2016. Network-efficient dis- tributed word2vec training system for large vocabularies. In CIKM ’16. ACM, 1139–1148

work page 2016

[29] [29]

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD ’14. ACM, 701–710

work page 2014

[30] [30]

Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM ’18. ACM, 459–467

work page 2018

[31] [31]

Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-june Paul Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In WWW ’15. ACM, 243–246

work page 2015

[32] [32]

Daniel A Spielman and Nikhil Srivastava. 2011. Graph sparsification by effective resistances. SIAM J. Comput. 40, 6 (2011), 1913–1926

work page 2011

[33] [33]

Chris Stark, Bobby-Joe Breitkreutz, Andrew Chatr-Aryamontri, Lorrie Boucher, Rose Oughtred, Michael S Livstone, Julie Nixon, Kimberly Van Auken, Xiaodong Wang, Xiaoqi Shi, et al. 2010. The BioGRID interaction database: 2011 update. Nucleic acids research 39, suppl_1 (2010), D698–D704

work page 2010

[34] [34]

Stergios Stergiou, Zygimantas Straznickas, Rolina Wu, and Kostas Tsioutsiouliklis

work page

[35] [35]

In AAAI ’17

Distributed Negative Sampling for Word Embeddings.. In AAAI ’17. 2569– 2575

work page

[36] [36]

Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW ’15. 1067–1077

work page 2015

[37] [37]

Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnet- miner: extraction and mining of academic social networks. In KDD ’08. 990–998

work page 2008

[38] [38]

Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In KDD ’09. ACM, 817–826

work page 2009

[39] [39]

Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In CIKM ’09. ACM, 1107–1116

work page 2009

[40] [40]

Lei Tang, Suju Rajan, and Vijay K Narayanan. 2009. Large scale multi-label classification via metalabeler. In WWW ’09. ACM, 211–220

work page 2009

[41] [41]

Shang-Hua Teng et al. 2016. Scalable algorithms for data and network analysis. Foundations and Trends® in Theoretical Computer Science 12, 1–2 (2016), 1–274

work page 2016

[42] [42]

Lloyd N Trefethen and David Bau III. 1997. Numerical linear algebra . Vol. 50. Siam

work page 1997

[43] [43]

Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: Versatile Graph Embeddings from Similarity Measures. In WWW ’18. 539–548

work page 2018

[44] [44]

Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. 2009. Mining multi-label data. In Data mining and knowledge discovery handbook . Springer, 667–685

work page 2009

[45] [45]

Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17, 4 (2007), 395–416

work page 2007

[46] [46]

Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. In KDD ’18. ACM

work page 2018

[47] [47]

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. KDD ’18

work page 2018

[48] [48]

Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. HotCloud ’1010, 10-10 (2010), 95

work page 2010

[49] [49]

Peixiang Zhao. 2015. gSparsify: Graph Motif Based Sparsification for Graph Clustering. In CIKM ’15. ACM, 373–382

work page 2015