pith. sign in

arxiv: 1907.01705 · v1 · pith:RO5BA4DRnew · submitted 2019-07-03 · 💻 cs.LG · stat.ML

Graph Embeddings at Scale

Pith reviewed 2026-05-25 10:22 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords graph embeddingsskip-gramdistributed traininglarge-scale graphslink predictionvertex representationsFriendster network
0
0 comments X

The pith

A distributed infrastructure scales skip-gram embeddings to graphs of 68 million vertices by avoiding any graph partitioning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a distributed system that trains vertex embeddings on very large networks without splitting the graph across machines. It dynamically builds limited-size computation graphs on workers and uses fast indexing to update the embeddings. The authors demonstrate this on the public Friendster network and a private 50-million-vertex graph, tracking both how accurately the embeddings predict links and how quickly the training converges. They also report that adding more workers improves both accuracy and speed. The approach matters because many industrial graphs are too large for single-machine training yet too awkward to partition without losing performance.

Core claim

The system scales the skip-gram algorithm to graphs containing tens of millions of vertices by completely avoiding graph partitioning, instead dynamically creating size-constrained computational graphs across worker nodes and relying on highly efficient indexing operations to update embeddings.

What carries the argument

Distributed infrastructure that avoids graph partitioning by dynamically creating size-constrained computational graphs and using efficient indexing for embedding updates.

If this is right

  • Skip-gram embeddings become feasible on graphs with 50-70 million vertices using commodity distributed hardware.
  • Link-prediction accuracy and convergence rate both improve as the number of worker nodes increases.
  • The same infrastructure can be applied to other existing embedding algorithms beyond skip-gram.
  • Heterogeneous graphs of industrial size can be embedded without manual partitioning decisions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could reduce the engineering cost of deploying embeddings in recommendation or ranking pipelines that already use large social or interaction graphs.
  • If the indexing and dynamic-graph approach generalizes, it may allow rapid experimentation with new embedding objectives on the same infrastructure.
  • Downstream tasks that rely on vertex similarity, such as community detection, may see improved results once embeddings are trained at full graph scale.

Load-bearing premise

That a system which never partitions the graph but instead creates temporary small computation graphs on workers will still produce high-quality embeddings at this scale.

What would settle it

Run the system on the Friendster graph and obtain link-prediction accuracy no better than a random baseline or observe training that fails to converge within a reasonable number of iterations.

Figures

Figures reproduced from arXiv: 1907.01705 by Anish Khazane, C. Bayan Bruss, Jonathan Rider, Keegan E. Hines, Richard Serpe, Saurabh Nagrecha.

Figure 1
Figure 1. Figure 1: Distributed training architecture diagram [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Performance benchmarks on large scale graphs. [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Experiments run on the Friendster graph 5 CONCLUSION In this paper, we present a distributed architecture for training em￾beddings on large-scale graphs. We first explored the infrastructural advantages and disadvantages of other systems in industry that compute recommendations or embeddings at scale. We then pre￾sented our own approach, and delved into experimental results from scaling the skip-gram algor… view at source ↗
read the original abstract

Graph embedding is a popular algorithmic approach for creating vector representations for individual vertices in networks. Training these algorithms at scale is important for creating embeddings that can be used for classification, ranking, recommendation and other common applications in industry. While industrial systems exist for training graph embeddings on large datasets, many of these distributed architectures are forced to partition copious amounts of data and model logic across many worker nodes. In this paper, we propose a distributed infrastructure that completely avoids graph partitioning, dynamically creates size constrained computational graphs across worker nodes, and uses highly efficient indexing operations for updating embeddings that allow the system to function at scale. We show that our system can scale an existing embeddings algorithm - skip-gram - to train on the open-source Friendster network (68 million vertices) and on an internal heterogeneous graph (50 million vertices). We measure the performance of our system on two key quantitative metrics: link-prediction accuracy and rate of convergence. We conclude this work by analyzing how a greater number of worker nodes actually improves our system's performance on the aforementioned metrics and discuss our next steps for rigorously evaluating the embedding vectors produced by our system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a distributed infrastructure for training skip-gram graph embeddings that avoids graph partitioning by dynamically creating size-constrained computational graphs across workers and relying on efficient indexing for embedding updates. It claims this system scales an existing skip-gram implementation to the Friendster network (68M vertices) and an internal heterogeneous graph (50M vertices), with performance measured on link-prediction accuracy and rate of convergence; the abstract further states that increasing the number of workers improves these metrics.

Significance. If the no-partitioning design and scaling results hold with proper verification, the work would offer a practical systems contribution for industrial-scale graph embedding training by reducing partitioning overhead. The focus on an unmodified skip-gram algorithm and quantitative metrics on real large graphs is a strength, but the absence of supporting implementation details, baselines, and scaling data in the provided description limits assessable impact.

major comments (2)
  1. [Abstract] Abstract: the central claim that the system 'completely avoids graph partitioning' while scaling skip-gram to 68M vertices rests on unshown experiments; no description is given of how dynamic graph creation and indexing handle adjacency information or random walks without full replication or worker-count-dependent communication volume, which is load-bearing for the scalability assertion.
  2. [Abstract] Abstract: the reported improvements in link-prediction accuracy and convergence rate with more workers lack error bars, baseline comparisons (e.g., against partitioned systems), or any data on walk generation time and per-epoch communication costs, preventing verification that the design functions at the claimed scale without prohibitive overheads.
minor comments (1)
  1. The abstract concludes by deferring 'rigorous evaluation' of the produced embeddings without specifying the planned metrics, datasets, or comparison methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point-by-point below, indicating where revisions will be made to strengthen the presentation of our no-partitioning approach and experimental claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the system 'completely avoids graph partitioning' while scaling skip-gram to 68M vertices rests on unshown experiments; no description is given of how dynamic graph creation and indexing handle adjacency information or random walks without full replication or worker-count-dependent communication volume, which is load-bearing for the scalability assertion.

    Authors: The body of the manuscript (Sections 3 and 4) describes the dynamic creation of size-constrained computational graphs across workers and the use of efficient indexing for embedding updates. However, we agree that the abstract does not sufficiently elaborate on the handling of adjacency information and random walks to demonstrate the absence of full replication or worker-dependent communication volume. We will revise the abstract and add a clarifying subsection in the methods to explicitly address these mechanisms and their implications for scalability. revision: yes

  2. Referee: [Abstract] Abstract: the reported improvements in link-prediction accuracy and convergence rate with more workers lack error bars, baseline comparisons (e.g., against partitioned systems), or any data on walk generation time and per-epoch communication costs, preventing verification that the design functions at the claimed scale without prohibitive overheads.

    Authors: Section 5 of the manuscript presents the link-prediction accuracy and convergence results as a function of worker count. We acknowledge that the abstract claims would benefit from additional supporting details. In revision we will add error bars to the relevant figures, report walk generation times and per-epoch communication costs, and include a discussion of related partitioned baselines. Comprehensive new baseline experiments may require additional runs and will be noted as such. revision: partial

Circularity Check

0 steps flagged

No circularity: systems paper with no derivations or fitted predictions

full rationale

The paper presents a distributed systems architecture for scaling skip-gram embeddings on large graphs (Friendster 68M vertices, internal 50M) using dynamic computational graphs and indexing, with no equations, parameter fits, or mathematical derivations. Performance is measured empirically via link-prediction accuracy and convergence rate; the central claims are engineering feasibility and scaling behavior, not a closed-form result that reduces to its inputs. No self-citations load-bear any uniqueness theorem or ansatz. This is self-contained against external benchmarks and receives the default non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no equations, parameters, or postulates are present. The contribution is described as an engineering system rather than a mathematical derivation.

pith-pipeline@v0.9.0 · 5736 in / 1124 out tokens · 27158 ms · 2026-05-25T10:22:30.984585+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 5 internal anchors

  1. [1]

    Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. 2018. A com- prehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1616–1637

  2. [2]

    Christopher M De Sa, Ce Zhang, Kunle Olukotun, and Christopher Ré. 2015. Taming the wild: A unified analysis of hogwild-style algorithms. In Advances in neural information processing systems . 2674–2682

  3. [3]

    Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2018. Pixie: A system for recommending 3+ billion items to 200+ million users in real-time. In Proceedings of the 2018 World Wide Web Conference on World Wide Web . International World Wide Web Conferences Steering Committee, 1775–1784

  4. [4]

    Joshua Goodman. 2001. Classes for Fast Maximum Entropy Training. CoRR cs.CL/0108006 (2001). http://arxiv.org/abs/cs.CL/0108006

  5. [5]

    Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and Doug Sharp. 2015. E-commerce in your inbox: Product recommendations at scale. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 1809– 1818

  6. [6]

    Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. CoRR abs/1607.00653 (2016). arXiv:1607.00653 http://arxiv.org/abs/ 1607.00653

  7. [7]

    Michael Gutmann and Aapo HyvÃďrinen. 2010. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statis- tics (Proceedings of Machine Learning Research) , Yee Whye Teh and Mike Tit- terington (Eds.), Vol. 9. PMLR, Chia Laguna Re...

  8. [8]

    Thomas N Kipf and Max Welling. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016)

  9. [9]

    KONECT. 2017. Friendster Network Dataset

  10. [10]

    Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving Distributional Similarity with Lessons Learned from Word Embeddings. Transactions of the Association for Computational Linguistics 3 (2015), 211–225. https://transacl.org/ ojs/index.php/tacl/article/view/570

  11. [11]

    Thomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. Graph Embeddings at Scale MLG’19, August 2019, Anchorage, Alaska USA In Neural Information Processing Systems

  12. [12]

    Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean

  13. [13]

    In Advances in Neural Information Processing Systems 26 , C

    Distributed Representations of Words and Phrases and their Com- positionality. In Advances in Neural Information Processing Systems 26 , C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Wein- berger (Eds.). Curran Associates, Inc., 3111–3119. http://papers.nips.cc/paper/ 5021-distributed-representations-of-words-and-phrases-and-their-compo...

  14. [14]

    Andriy Mnih and Koray Kavukcuoglu. 2013. Learning word embeddings ef- ficiently with noise-contrastive estimation. In Advances in neural information processing systems. 2265–2273

  15. [15]

    Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, and Gavin Owens. 2016. Network-efficient dis- tributed word2vec training system for large vocabularies. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 1139–1148

  16. [16]

    Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learn- ing of Social Representations. CoRR abs/1403.6652 (2014). arXiv:1403.6652 http://arxiv.org/abs/1403.6652

  17. [17]

    Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. 2011. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in neural information processing systems . 693–701

  18. [18]

    Charalampos Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, and Milan Vojnovic. 2014. FENNEL: Streaming Graph Partitioning for Massive Scale Graphs. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining (WSDM ’14). ACM, New York, NY, USA, 333–342. https://doi.org/10. 1145/2556195.2556213

  19. [19]

    Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee. 2018. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. arXiv preprint arXiv:1803.02349 (2018)

  20. [20]

    Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, and Jason Weston. 2017. StarSpace: Embed All The Things! CoRR abs/1709.03856 (2017). arXiv:1709.03856 http://arxiv.org/abs/1709.03856