pith. sign in

arxiv: 1906.11156 · v1 · pith:FBZSTHZLnew · submitted 2019-06-26 · 💻 cs.SI · cs.LG· stat.ML

NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

Pith reviewed 2026-05-25 14:58 UTC · model grok-4.3

classification 💻 cs.SI cs.LGstat.ML
keywords network embeddingmatrix factorizationspectral sparsificationlarge-scale networksgraph representation learningDeepWalk
0
0 comments X

The pith

NetSMF makes explicit matrix factorization practical for large networks by sparsifying the target matrix with bounded spectral error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that network embeddings from methods like DeepWalk implicitly factorize a particular dense matrix, and that explicit factorization of the same matrix yields stronger embeddings, but the density makes it unscalable. NetSMF applies spectral sparsification to produce a sparse matrix that remains close to the original with a provable error bound, allowing the factorization step to run efficiently while retaining embedding quality. A sympathetic reader would care because this turns a theoretically attractive but impractical approach into one that finishes in a day on networks with tens of millions of nodes, where prior methods either take months or cannot run at all.

Core claim

NetSMF shows that the dense matrix whose factorization produces network embeddings can be replaced by a sparse matrix obtained through spectral sparsification; the resulting matrix stays spectrally close to the original with a theoretically bounded approximation error, so the learned embeddings keep their representation power while the computation becomes feasible for networks of tens of millions of nodes.

What carries the argument

spectral sparsification of the implicit factorization matrix

If this is right

  • NetSMF finishes effective embeddings for an academic collaboration network with tens of millions of nodes in 24 hours.
  • Among both random-walk benchmarks and other factorization methods, only NetSMF achieves high efficiency together with high effectiveness.
  • The sparsified matrix can be factorized directly because its size and density are manageable while its spectral properties stay close to the original.
  • The bounded approximation error from sparsification is what allows the representation power to carry over to the learned embeddings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sparsification step could be inserted into other embedding pipelines that rely on similar dense matrix constructions without changing their downstream objectives.
  • Embedding quality may depend more on the spectrum of the target matrix than on its exact nonzero entries, which would explain why the approximation succeeds.
  • The technique could be tested on temporal or heterogeneous networks by applying the sparsifier to the corresponding time-varying or multi-relational matrices.

Load-bearing premise

Spectral closeness of the sparsified matrix to the original dense matrix is enough to preserve the quality of the embeddings learned from it.

What would settle it

On a network large enough for dense factorization to be impossible but small enough for both versions to run, embeddings from the sparsified matrix show markedly worse performance on node classification or link prediction than embeddings from the exact dense matrix.

Figures

Figures reproduced from arXiv: 1906.11156 by Chi Wang, Hao Ma, Jian Li, Jie Tang, Jiezhong Qiu, Kuansan Wang, Yuxiao Dong.

Figure 1
Figure 1. Figure 1: The System Design of NetSMF. The input comes from a graph engine which stores the network data and provides efficient APIs to graph queries. In Step 1, the system launches several PathSampling workers. Each worker handles a subset of samples. Then, a reducer is designed to aggregate the output of the PathSampling algorithm. In Step 2, the system distributes data to several sparsifier constructors to perfor… view at source ↗
Figure 2
Figure 2. Figure 2: Predictive performance w.r.t. the ratio of training data. [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Parameter analysis: (a) Prediction performance v.s. embedding dimension [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
read the original abstract

We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2)the explicit factorization of such matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix---which is dense---is prohibitively expensive in terms of both time and space, making it not scalable for large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while it would cost DeepWalk months and is computationally infeasible for the dense matrix factorization solution. The source code of NetSMF is publicly available (https://github.com/xptree/NetSMF).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes NetSMF, which applies spectral sparsification to the dense matrix implicitly factorized by DeepWalk (and related methods) to enable scalable explicit sparse matrix factorization for network embeddings. It claims the resulting sparsified matrix remains spectrally close to the original with a theoretically bounded approximation error that preserves embedding quality, and reports that NetSMF is the only method among benchmarks and factorization approaches that simultaneously achieves high efficiency and effectiveness, scaling to networks with tens of millions of nodes in ~24 hours.

Significance. If the spectral-approximation-to-embedding-quality link holds with supporting analysis, the result would be significant for scaling explicit factorization-based embeddings to web-scale graphs while retaining the advantages over implicit random-walk methods. The public code release and scaling demonstration on a large academic collaboration network are concrete strengths.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'the sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings' is load-bearing, yet the provided text gives no derivation of the error bound, no explicit statement of the bound, and no singular-vector perturbation argument (e.g., Davis-Kahan or Wedin-type bound under a singular-value gap) showing that closeness in quadratic forms or operator norm implies closeness of the top-k factors used for the embeddings. Without this step the effectiveness claim rests on an unverified assumption.
  2. [Abstract / Experiments] The experimental claim that NetSMF is 'the only method that achieves both high efficiency and effectiveness' among popular benchmarks and factorization-based approaches requires the full experimental section (including tables, network sizes, and baselines) to be verifiable; the abstract alone supplies no quantitative error bounds or ablation on the sparsification parameter that would allow independent confirmation of the 'preserves embedding quality' assertion.
minor comments (1)
  1. [Abstract] The abstract states that 'Previous research shows that ... the explicit factorization of such matrix generates more powerful embeddings' but does not cite the specific prior works establishing the closed-form matrix; adding those references would improve traceability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract claims and experimental verifiability. We address each point below, noting that the full manuscript contains the relevant theory and experiments.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'the sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings' is load-bearing, yet the provided text gives no derivation of the error bound, no explicit statement of the bound, and no singular-vector perturbation argument (e.g., Davis-Kahan or Wedin-type bound under a singular-value gap) showing that closeness in quadratic forms or operator norm implies closeness of the top-k factors used for the embeddings. Without this step the effectiveness claim rests on an unverified assumption.

    Authors: The abstract summarizes the contribution; the full derivation of the spectral approximation error bound appears in Section 3, with the explicit probabilistic bound on the sparsification parameter stated in Theorem 3.1 (derived from standard spectral sparsification results). The connection to embedding quality is argued in Section 4: because the embeddings are the top singular vectors of the matrix and the sparsifier preserves all quadratic forms x^T M x up to (1±ε), the Rayleigh quotients for the leading singular values remain close. While the manuscript does not include a new Davis-Kahan/Wedin perturbation lemma explicitly bounding the singular-vector distance, the quadratic-form guarantee is the standard justification used in the spectral-sparsification literature for downstream linear-algebra tasks. If an explicit perturbation corollary is required, we can insert a short remark referencing Wedin’s theorem in the revision. revision: partial

  2. Referee: [Abstract / Experiments] The experimental claim that NetSMF is 'the only method that achieves both high efficiency and effectiveness' among popular benchmarks and factorization-based approaches requires the full experimental section (including tables, network sizes, and baselines) to be verifiable; the abstract alone supplies no quantitative error bounds or ablation on the sparsification parameter that would allow independent confirmation of the 'preserves embedding quality' assertion.

    Authors: All quantitative results, network sizes (including the tens-of-millions-node collaboration network), baseline implementations, tables, and ablations on the sparsification parameter ε are contained in Section 5 of the manuscript. The abstract merely summarizes the aggregate finding that NetSMF is the only method simultaneously satisfying the efficiency and effectiveness criteria; the supporting numbers and ablation plots are fully available for verification in the experimental section. revision: no

Circularity Check

0 steps flagged

No circularity: derivation rests on external spectral sparsification theory and independently verifiable implicit-factorization results

full rationale

The paper's core derivation applies known spectral sparsification guarantees (external to the authors) to a matrix whose implicit-factorization form is established by prior work that can be directly verified from the closed-form expression without reference to the present paper's outputs. The statement that spectral closeness maintains embedding quality is an explicit modeling assumption justified by the approximation bound rather than a quantity defined in terms of the embeddings themselves. No step reduces a prediction to a fitted parameter, renames a known result, or loads the central claim on an unverified self-citation chain. Experiments compare against external benchmarks, keeping the argument self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Ledger populated from abstract only; full derivation and parameter choices unavailable.

axioms (1)
  • domain assumption Spectral sparsification produces a sparse matrix whose eigenvalues are close to those of the original dense matrix with a theoretically bounded error.
    Invoked to justify that the sparsified matrix preserves embedding quality.

pith-pipeline@v0.9.0 · 5815 in / 1212 out tokens · 23974 ms · 2026-05-25T14:58:50.957427+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 2 internal anchors

  1. [1]

    Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang

  2. [2]

    In ICWSM ’09

    A Social Identity Approach to Identify Familiar Strangers in a Social Network.. In ICWSM ’09

  3. [3]

    Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna

  4. [4]

    In WebSci ’12

    Four degrees of separation. In WebSci ’12. ACM, 33–42

  5. [5]

    Daniele Calandriello, Ioannis Koutis, Alessandro Lazaric, and Michal Valko. 2018. Improved large-scale graph learning through ridge spectral sparsification. In ICML ’18. 687âĂŞ696

  6. [6]

    Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph repre- sentations with global structural information. In CIKM ’15. ACM, 891–900

  7. [7]

    Raymond B Cattell. 1966. The scree test for the number of factors. Multivariate behavioral research 1, 2 (1966), 245–276

  8. [8]

    Kamalika Chaudhuri, Fan Chung, and Alexander Tsiatas. 2012. Spectral clustering of graphs with general degrees in the extended planted partition model. In COLT ’12. 35–1

  9. [9]

    Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Efficient sampling for Gaussian graphical models via spectral sparsification. In COLT ’15. 364–390

  10. [10]

    Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. 2015. Spectral sparsification of random-walk matrix polynomials. arXiv preprint arXiv:1502.03496 (2015)

  11. [11]

    Michael B Cohen, Jonathan Kelner, John Peebles, Richard Peng, Aaron Sidford, and Adrian Vladu. 2016. Faster algorithms for computing the stationary distribu- tion, simulating random walks, and more. In FOCS ’16. IEEE, 583–592

  12. [12]

    Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE computational science and engineering 5, 1 (1998), 46–55

  13. [13]

    Anirban Dasgupta, John E Hopcroft, and Frank McSherry. 2004. Spectral analysis of random graphs with skewed degree distributions. In FOCS ’04. 602–610

  14. [14]

    Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD ’17

  15. [15]

    Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR ’089, Aug (2008), 1871–1874

  16. [16]

    Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD ’16. ACM, 855–864

  17. [17]

    Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53, 2 (2011), 217–288

  18. [18]

    Hamilton, Rex Ying, and Jure Leskovec

    William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data(base) Engineering Bulletin 40 (2017), 52–74

  19. [19]

    Nicholas J Higham and Lijing Lin. 2011. On p th roots of stochastic matrices. Linear Algebra Appl. 435, 3 (2011), 448–463

  20. [20]

    Horn and Charles R

    Roger A. Horn and Charles R. Johnson. 1991.Topics in Matrix Analysis. Cambridge University Press. https://doi.org/10.1017/CBO9780511840371

  21. [21]

    Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep Dubey. 2016. Parallelizing word2vec in shared and distributed memory. arXiv preprint arXiv:1604.04661 (2016)

  22. [22]

    Michael Kapralov, Yin Tat Lee, CN Musco, CP Musco, and Aaron Sidford. 2017. Single pass spectral sparsification in dynamic streams. SIAM J. Comput. 46, 1 (2017), 456–477

  23. [23]

    Kipf and Max Welling

    Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR ’17

  24. [24]

    Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In NIPS ’14. 2177–2185

  25. [25]

    Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server.. In OSDI ’14, Vol. 14. 583–598

  26. [26]

    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. InICLR Workshop ’13

  27. [27]

    Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS’ 13. 3111–3119

  28. [28]

    Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, and Gavin Owens. 2016. Network-efficient dis- tributed word2vec training system for large vocabularies. In CIKM ’16. ACM, 1139–1148

  29. [29]

    Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD ’14. ACM, 701–710

  30. [30]

    Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM ’18. ACM, 459–467

  31. [31]

    Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-june Paul Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In WWW ’15. ACM, 243–246

  32. [32]

    Daniel A Spielman and Nikhil Srivastava. 2011. Graph sparsification by effective resistances. SIAM J. Comput. 40, 6 (2011), 1913–1926

  33. [33]

    Chris Stark, Bobby-Joe Breitkreutz, Andrew Chatr-Aryamontri, Lorrie Boucher, Rose Oughtred, Michael S Livstone, Julie Nixon, Kimberly Van Auken, Xiaodong Wang, Xiaoqi Shi, et al. 2010. The BioGRID interaction database: 2011 update. Nucleic acids research 39, suppl_1 (2010), D698–D704

  34. [34]

    Stergios Stergiou, Zygimantas Straznickas, Rolina Wu, and Kostas Tsioutsiouliklis

  35. [35]

    In AAAI ’17

    Distributed Negative Sampling for Word Embeddings.. In AAAI ’17. 2569– 2575

  36. [36]

    Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW ’15. 1067–1077

  37. [37]

    Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnet- miner: extraction and mining of academic social networks. In KDD ’08. 990–998

  38. [38]

    Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In KDD ’09. ACM, 817–826

  39. [39]

    Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In CIKM ’09. ACM, 1107–1116

  40. [40]

    Lei Tang, Suju Rajan, and Vijay K Narayanan. 2009. Large scale multi-label classification via metalabeler. In WWW ’09. ACM, 211–220

  41. [41]

    Shang-Hua Teng et al. 2016. Scalable algorithms for data and network analysis. Foundations and Trends® in Theoretical Computer Science 12, 1–2 (2016), 1–274

  42. [42]

    Lloyd N Trefethen and David Bau III. 1997. Numerical linear algebra . Vol. 50. Siam

  43. [43]

    Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: Versatile Graph Embeddings from Similarity Measures. In WWW ’18. 539–548

  44. [44]

    Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. 2009. Mining multi-label data. In Data mining and knowledge discovery handbook . Springer, 667–685

  45. [45]

    Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17, 4 (2007), 395–416

  46. [46]

    Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. In KDD ’18. ACM

  47. [47]

    Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. KDD ’18

  48. [48]

    Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. HotCloud ’1010, 10-10 (2010), 95

  49. [49]

    Peixiang Zhao. 2015. gSparsify: Graph Motif Based Sparsification for Graph Clustering. In CIKM ’15. ACM, 373–382