NOMAD: Generating Embeddings for Massive Distributed Graphs
Pith reviewed 2026-05-10 18:11 UTC · model grok-4.3
The pith
NOMAD distributes LINE-based graph embedding across machines with MPI, handling massive graphs at 10-100x the speed of single-node baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NOMAD is a distributed-memory framework that implements the proximity-based embedding models of the LINE algorithm using MPI. It introduces several practical trade-offs that lower communication volume for irregular graphs, yielding median speedups of 10-100x versus multi-threaded LINE and node2vec, 35-76x versus distributed PBG, and 12-370x end-to-end on real-world graphs while producing embeddings of competitive quality.
What carries the argument
The NOMAD framework, which runs proximity-based embedding models from LINE under MPI message passing and applies practical trade-offs that reduce inter-node communication for irregular distributed graphs.
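The LINE models carried over here optimize a dot-product proximity objective with negative sampling. As a minimal sketch of that standard objective (a single-edge SGD step, not NOMAD's actual distributed implementation; all names are illustrative):

```python
import numpy as np

def line_sgd_step(emb, ctx, u, v, neg_nodes, lr=0.025):
    """One negative-sampling SGD step of LINE's second-order objective
    for an observed edge (u, v); neg_nodes are sampled non-neighbors."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    grad_u = np.zeros_like(emb[u])
    # Positive example: pull the context vector of v toward the embedding of u.
    g = 1.0 - sigmoid(emb[u] @ ctx[v])
    grad_u += g * ctx[v]
    ctx[v] += lr * g * emb[u]
    # Negative examples: push sampled context vectors away from u.
    for n in neg_nodes:
        g = -sigmoid(emb[u] @ ctx[n])
        grad_u += g * ctx[n]
        ctx[n] += lr * g * emb[u]
    emb[u] += lr * grad_u
```

In a distributed run, each such update may touch vectors owned by another rank; that per-edge traffic is exactly what the paper's communication-reducing trade-offs target.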
Load-bearing premise
The practical adjustments that cut communication volume still capture the same node proximity information as the original models and do not systematically distort the learned vectors on the tested graphs.
What would settle it
Running NOMAD and the original single-node LINE on an identical large graph: if downstream task performance with NOMAD's embeddings were substantially worse, the fidelity premise would fail.
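Such a head-to-head test could score both embedding sets on held-out link prediction. A minimal AUC over dot-product edge scores (the function name and scoring choice are illustrative assumptions, not the paper's evaluation code) might look like:

```python
import numpy as np

def link_prediction_auc(emb, pos_edges, neg_edges):
    """AUC of dot-product edge scores: the probability that a held-out
    true edge outscores a sampled non-edge (ties count as 0.5).
    Illustrative sketch; any downstream-task metric would serve."""
    pos = [float(emb[u] @ emb[v]) for u, v in pos_edges]
    neg = [float(emb[u] @ emb[v]) for u, v in neg_edges]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A substantially lower AUC for the distributed embeddings on the same graph would be the refuting observation described above.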
Original abstract
Successful machine learning on graphs or networks requires embeddings that not only represent nodes and edges as low-dimensional vectors but also preserve the graph structure. Established methods for generating embeddings require flexible exploration of the entire graph through repeated use of random walks that capture graph structure with samples of nodes and edges. These methods create scalability challenges for massive graphs with millions-to-billions of edges because single-node solutions have inadequate memory and processing capabilities. We present NOMAD, a distributed-memory graph embedding framework using the Message Passing Interface (MPI) for distributed graphs. NOMAD implements proximity-based models proposed in the widely popular LINE (Large-scale Information Network Embedding) algorithm. We propose several practical trade-offs to improve the scalability and communication overheads confronted by irregular and distributed graph embedding methods, catering to massive-scale graphs arising in web and science domains. NOMAD demonstrates median speedups of 10/100x on CPU-based NERSC Perlmutter cluster relative to the popular reference implementations of multi-threaded LINE and node2vec, 35-76x over distributed PBG, and competitive embedding quality relative to LINE, node2vec, and GraphVite, while yielding 12-370x end-to-end speedups on real-world graphs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents NOMAD, an MPI-based distributed-memory framework for node embeddings on massive graphs (millions to billions of edges). It implements proximity-based models from the LINE algorithm, introduces practical trade-offs to reduce communication overhead for irregular distributed graphs, and reports median speedups of 10-100x over multi-threaded LINE and node2vec, 35-76x over distributed PBG, competitive embedding quality versus LINE/node2vec/GraphVite, and 12-370x end-to-end speedups on real-world graphs evaluated on the NERSC Perlmutter cluster.
Significance. If the empirical results hold, the work addresses a key scalability barrier for graph ML on web- and science-scale networks that exceed single-node memory/processing limits. Strengths include direct performance comparisons against standard reference implementations, explicit validation that the communication-reducing trade-offs preserve embedding fidelity on the evaluated graphs, and quantified end-to-end gains that demonstrate practical impact.
Minor comments (3)
- The abstract states 'competitive embedding quality' and 'preserve the graph structure' but does not name the concrete metrics (e.g., AUC for link prediction, accuracy for node classification) or the exact graphs used for the 12-370x claim; these should be stated explicitly in the abstract and §4.
- The description of the practical trade-offs for communication reduction is central to the scalability argument; adding a short pseudocode listing or parameter table in the methods section would improve reproducibility without lengthening the paper.
- Speedup figures are given as medians; reporting the number of runs, standard deviation or error bars, and the precise graph partitioning strategy used on Perlmutter would allow readers to assess variability of the 10-100x and 35-76x claims.
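On the third comment: speedup variability also depends on how edges are balanced across ranks. As a purely hypothetical illustration of what a disclosed partitioning strategy might look like (not the scheme the paper actually uses), a simple edge-balanced 1D vertex partition can be cut from a degree prefix sum:

```python
def edge_balanced_partition(degrees, nparts):
    """Split vertices [0, n) into nparts contiguous ranges so each rank
    owns roughly the same number of edge endpoints (greedy prefix-sum cut).
    Hypothetical sketch for the review's point; not the paper's scheme."""
    total = sum(degrees)
    parts, start, acc = [], 0, 0
    for v, d in enumerate(degrees):
        acc += d
        # Cut once the running edge count crosses the next equal-share boundary.
        if len(parts) < nparts - 1 and acc * nparts >= total * (len(parts) + 1):
            parts.append((start, v + 1))
            start = v + 1
    parts.append((start, len(degrees)))
    return parts
```

Reporting even this level of detail (plus run counts and error bars) would let readers judge the spread behind the 10-100x and 35-76x medians.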
Simulated Author's Rebuttal
We thank the referee for the positive review and recommendation for minor revision. The assessment accurately summarizes NOMAD's contributions to scalable distributed graph embeddings, including the reported speedups, communication optimizations, and competitive embedding quality on large real-world graphs. We appreciate the recognition of its practical impact for web- and science-scale networks.
Circularity Check
No significant circularity
Full rationale
The paper presents an engineering framework (NOMAD) that implements the existing LINE proximity objective on distributed graphs using MPI, along with practical communication-reducing trade-offs. All central claims are empirical measurements of runtime speedups and embedding quality against external baselines (multi-threaded LINE, node2vec, PBG, GraphVite) on real-world graphs. No derivation chain, first-principles prediction, or fitted parameter is presented that reduces to its own inputs by construction. Self-citations, if any, are limited to referencing the original LINE paper and are not load-bearing for the reported results. The work is self-contained against external benchmarks.