pith. machine review for the scientific record. sign in

arxiv: 1805.00280 · v1 · submitted 2018-05-01 · 💻 cs.DC

Recognition: unknown

Efficient Graph Computation for Node2Vec

Authors on Pith no claims yet
classification 💻 cs.DC
keywords node2veccomputationfast-node2vecgraphsoverheadverticesedgesrandom
0
0 comments X
read the original abstract

Node2Vec is a state-of-the-art general-purpose feature learning method for network analysis. However, current solutions cannot run Node2Vec on large-scale graphs with billions of vertices and edges, which are common in real-world applications. The existing distributed Node2Vec on Spark incurs significant space and time overhead. It runs out of memory even for mid-sized graphs with millions of vertices. Moreover, it considers at most 30 edges for every vertex in generating random walks, causing poor result quality. In this paper, we propose Fast-Node2Vec, a family of efficient Node2Vec random walk algorithms on a Pregel-like graph computation framework. Fast-Node2Vec computes transition probabilities during random walks to reduce memory space consumption and computation overhead for large-scale graphs. The Pregel-like scheme avoids space and time overhead of Spark's read-only RDD structures and shuffle operations. Moreover, we propose a number of optimization techniques to further reduce the computation overhead for popular vertices with large degrees. Empirical evaluation show that Fast-Node2Vec is capable of computing Node2Vec on graphs with billions of vertices and edges on a mid-sized machine cluster. Compared to Spark-Node2Vec, Fast-Node2Vec achieves 7.7--122x speedups.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. NOMAD: Generating Embeddings for Massive Distributed Graphs

    cs.LG 2026-04 unverdicted novelty 5.0

    NOMAD delivers an MPI-based distributed implementation of graph embedding models achieving 10-100x median speedups over multi-threaded baselines and 35-76x over prior distributed systems on large clusters.