NOMAD: Generating Embeddings for Massive Distributed Graphs
Pith reviewed 2026-05-10 18:11 UTC · model grok-4.3
The pith
NOMAD distributes LINE-based graph embedding across machines with MPI, handling massive graphs at 10-100x the speed of single-node baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NOMAD is a distributed-memory framework that implements the proximity-based embedding models of the LINE algorithm using MPI. It introduces several practical trade-offs that lower communication volume for irregular graphs, yielding median speedups of 10-100x versus multi-threaded LINE and node2vec, 35-76x versus distributed PBG, and 12-370x end-to-end on real-world graphs while producing embeddings of competitive quality.
What carries the argument
The NOMAD framework, which runs proximity-based embedding models from LINE under MPI message passing and applies practical trade-offs that reduce inter-node communication for irregular distributed graphs.
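The LINE models carried over here optimize a dot-product proximity objective with negative sampling. As a minimal sketch of that standard objective (a single-edge SGD step, not NOMAD's actual distributed implementation; all names are illustrative):

```python
import numpy as np

def line_sgd_step(emb, ctx, u, v, neg_nodes, lr=0.025):
    """One negative-sampling SGD step of LINE's second-order objective
    for an observed edge (u, v); neg_nodes are sampled non-neighbors."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    grad_u = np.zeros_like(emb[u])
    # Positive example: pull the context vector of v toward the embedding of u.
    g = 1.0 - sigmoid(emb[u] @ ctx[v])
    grad_u += g * ctx[v]
    ctx[v] += lr * g * emb[u]
    # Negative examples: push sampled context vectors away from u.
    for n in neg_nodes:
        g = -sigmoid(emb[u] @ ctx[n])
        grad_u += g * ctx[n]
        ctx[n] += lr * g * emb[u]
    emb[u] += lr * grad_u
```

In a distributed run, each such update may touch vectors owned by another rank; that per-edge traffic is exactly what the paper's communication-reducing trade-offs target.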
Load-bearing premise
The practical adjustments that cut communication volume still capture the same node proximity information as the original models and do not systematically distort the learned vectors on the tested graphs.
What would settle it
Running NOMAD and the original single-node LINE on an identical large graph: if downstream task performance with NOMAD's embeddings were substantially worse, the fidelity premise would fail.
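Such a head-to-head test could score both embedding sets on held-out link prediction. A minimal AUC over dot-product edge scores (the function name and scoring choice are illustrative assumptions, not the paper's evaluation code) might look like:

```python
import numpy as np

def link_prediction_auc(emb, pos_edges, neg_edges):
    """AUC of dot-product edge scores: the probability that a held-out
    true edge outscores a sampled non-edge (ties count as 0.5).
    Illustrative sketch; any downstream-task metric would serve."""
    pos = [float(emb[u] @ emb[v]) for u, v in pos_edges]
    neg = [float(emb[u] @ emb[v]) for u, v in neg_edges]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A substantially lower AUC for the distributed embeddings on the same graph would be the refuting observation described above.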
Original abstract
Successful machine learning on graphs or networks requires embeddings that not only represent nodes and edges as low-dimensional vectors but also preserve the graph structure. Established methods for generating embeddings require flexible exploration of the entire graph through repeated use of random walks that capture graph structure with samples of nodes and edges. These methods create scalability challenges for massive graphs with millions-to-billions of edges because single-node solutions have inadequate memory and processing capabilities. We present NOMAD, a distributed-memory graph embedding framework using the Message Passing Interface (MPI) for distributed graphs. NOMAD implements proximity-based models proposed in the widely popular LINE (Large-scale Information Network Embedding) algorithm. We propose several practical trade-offs to improve the scalability and communication overheads confronted by irregular and distributed graph embedding methods, catering to massive-scale graphs arising in web and science domains. NOMAD demonstrates median speedups of 10/100x on CPU-based NERSC Perlmutter cluster relative to the popular reference implementations of multi-threaded LINE and node2vec, 35-76x over distributed PBG, and competitive embedding quality relative to LINE, node2vec, and GraphVite, while yielding 12-370x end-to-end speedups on real-world graphs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents NOMAD, an MPI-based distributed-memory framework for node embeddings on massive graphs (millions to billions of edges). It implements proximity-based models from the LINE algorithm, introduces practical trade-offs to reduce communication overhead for irregular distributed graphs, and reports median speedups of 10-100x over multi-threaded LINE and node2vec, 35-76x over distributed PBG, competitive embedding quality versus LINE/node2vec/GraphVite, and 12-370x end-to-end speedups on real-world graphs evaluated on the NERSC Perlmutter cluster.
Significance. If the empirical results hold, the work addresses a key scalability barrier for graph ML on web- and science-scale networks that exceed single-node memory/processing limits. Strengths include direct performance comparisons against standard reference implementations, explicit validation that the communication-reducing trade-offs preserve embedding fidelity on the evaluated graphs, and quantified end-to-end gains that demonstrate practical impact.
Minor comments (3)
- The abstract states 'competitive embedding quality' and 'preserve the graph structure' but does not name the concrete metrics (e.g., AUC for link prediction, accuracy for node classification) or the exact graphs used for the 12-370x claim; these should be stated explicitly in the abstract and §4.
- The description of the practical trade-offs for communication reduction is central to the scalability argument; adding a short pseudocode listing or parameter table in the methods section would improve reproducibility without lengthening the paper.
- Speedup figures are given as medians; reporting the number of runs, standard deviation or error bars, and the precise graph partitioning strategy used on Perlmutter would allow readers to assess variability of the 10-100x and 35-76x claims.
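On the third comment: speedup variability also depends on how edges are balanced across ranks. As a purely hypothetical illustration of what a disclosed partitioning strategy might look like (not the scheme the paper actually uses), a simple edge-balanced 1D vertex partition can be cut from a degree prefix sum:

```python
def edge_balanced_partition(degrees, nparts):
    """Split vertices [0, n) into nparts contiguous ranges so each rank
    owns roughly the same number of edge endpoints (greedy prefix-sum cut).
    Hypothetical sketch for the review's point; not the paper's scheme."""
    total = sum(degrees)
    parts, start, acc = [], 0, 0
    for v, d in enumerate(degrees):
        acc += d
        # Cut once the running edge count crosses the next equal-share boundary.
        if len(parts) < nparts - 1 and acc * nparts >= total * (len(parts) + 1):
            parts.append((start, v + 1))
            start = v + 1
    parts.append((start, len(degrees)))
    return parts
```

Reporting even this level of detail (plus run counts and error bars) would let readers judge the spread behind the 10-100x and 35-76x medians.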
Simulated Author's Rebuttal
We thank the referee for the positive review and recommendation for minor revision. The assessment accurately summarizes NOMAD's contributions to scalable distributed graph embeddings, including the reported speedups, communication optimizations, and competitive embedding quality on large real-world graphs. We appreciate the recognition of its practical impact for web- and science-scale networks.
Circularity Check
No significant circularity
Full rationale
The paper presents an engineering framework (NOMAD) that implements the existing LINE proximity objective on distributed graphs using MPI, along with practical communication-reducing trade-offs. All central claims are empirical measurements of runtime speedups and embedding quality against external baselines (multi-threaded LINE, node2vec, PBG, GraphVite) on real-world graphs. No derivation chain, first-principles prediction, or fitted parameter is presented that reduces to its own inputs by construction. Self-citations, if any, are limited to referencing the original LINE paper and are not load-bearing for the reported results. The work is self-contained against external benchmarks.