FeLoG: Scalable and Efficient Distributed Graph Embedding with Feedback Loop Mechanism

Arijit Khan; Dan Feng; Fang Wang; Peng Fang; Yibo Zhou; Zhenli Li; Ziqiang Wu

arxiv: 2606.22180 · v1 · pith:5HXRE42Onew · submitted 2026-06-20 · 💻 cs.DC · cs.LG

FeLoG: Scalable and Efficient Distributed Graph Embedding with Feedback Loop Mechanism

Peng Fang , Arijit Khan , Ziqiang Wu , Zhenli Li , Yibo Zhou , Fang Wang , Dan Feng This is my paper

Pith reviewed 2026-06-26 11:06 UTC · model grok-4.3

classification 💻 cs.DC cs.LG

keywords distributed graph embeddingfeedback loopsampling optimizationcommunication reductionpipeline schedulingscalabilitygraph applications

0 comments

The pith

FeLoG couples real-time embedding quality feedback to sampling and communication to reduce redundant work in distributed graph embedding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that current sampling-training pipelines for graph embedding ignore how well individual nodes are already embedded, which wastes computation on trained areas and drives up communication in distributed settings. FeLoG closes this gap by feeding live quality signals back into node selection, sequence compression, and execution scheduling. The result is fewer samples overall, lighter data movement between machines, and better overlap of CPU and GPU work. Readers interested in scaling embeddings for recommendations or fraud detection would see direct value if these savings hold on billion-edge graphs.

Core claim

FeLoG introduces feedback-coupled sampling that dynamically prioritizes undertrained nodes according to real-time embedding-quality signals, activity-aware communication that compresses frequent node sequences and selectively synchronizes embeddings, and a round-interleaved pipeline that overlaps next-round sampling with current-round training; together these changes reduce redundant computation, cut communication cost, and raise CPU-GPU utilization on large-scale graphs.

What carries the argument

The feedback loop that links evolving embedding quality to sampling priorities and communication decisions.

If this is right

Redundant sampling of already well-embedded nodes drops because priorities shift to undertrained regions.
Intra-machine PCIe traffic and inter-machine synchronization both decrease through sequence compression and selective updates.
CPU-GPU utilization rises above 80 percent by interleaving sampling of the next round with training of the current round.
Convergence accelerates on large graphs relative to six prior distributed frameworks.
Average end-to-end speedup reaches 27.9 times with communication cost falling more than 53 percent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same feedback principle could be tested on distributed training of graph neural networks beyond simple embedding.
Energy consumption per embedding task may fall in data-center settings if communication volume shrinks as reported.
Adaptive feedback loops of this type might generalize to other sampling-heavy distributed workloads such as large language model fine-tuning on graphs.
Experiments on graphs with different degree distributions would reveal whether the prioritization rule remains effective.

Load-bearing premise

The cost of computing and applying real-time embedding-quality feedback stays small enough that net savings in sampling and communication remain large.

What would settle it

On a billion-edge graph, if the measured time spent calculating and acting on feedback exceeds the measured time saved in sampling and data transfer, the claimed net speedup disappears.

Figures

Figures reproduced from arXiv: 2606.22180 by Arijit Khan, Dan Feng, Fang Wang, Peng Fang, Yibo Zhou, Zhenli Li, Ziqiang Wu.

**Figure 1.** Figure 1: (a) Existing approaches vs. FeLoG (ours) w.r.t. modeland system-level design choices and efficiency-scalability performance; (b) our design objectives for the proposed system FeLoG. traditional data-driven tasks, e.g., recommendation [83], link prediction [52], and node classification [72], their versatility has recently extended to retrieval-augmented generation (RAG) [15]. Embeddings act as a bridge… view at source ↗

**Figure 2.** Figure 2: Sampling and training in graph embedding. [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Kernel density distribution of node embedding sim [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 5.** Figure 5: Resource utilization in decoupled sampling-training, [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: The workflow of FeLoG. 5 [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗

**Figure 7.** Figure 7: Feedback-driven sampling-training coupling model. [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗

**Figure 8.** Figure 8: Data flow in the feedback-coupled sampling-training. [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 10.** Figure 10: Frequency-aware compression process. FaBC expands the compression target set only when the (𝑘+1)- th node provides sufficient marginal benefit. Let 𝑐 = 1 𝐿 + 1 32 denote the amortized overhead of adding one more compressed node, so 𝑔(𝑘) = 1 − 𝐹𝑘 + 𝑘𝑐. Since 𝐹𝑘+1 = 𝐹𝑘 + 𝑝(𝑘+1), we have 𝑔(𝑘+1) = 𝑔(𝑘) − 𝑝(𝑘+1) + 𝑐, To avoid diminishing returns, FaBC selects the largest 𝑘 satisfying 𝑔(𝑘) − 𝑔(𝑘+1) 𝑔(𝑘) ≥ 0.1 ⇐… view at source ↗

**Figure 11.** Figure 11: Hotspot-aware Synchronization Strategy across dis [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗

**Figure 12.** Figure 12: Execution flow comparison in FeLoG. To maximize concurrency under this feedback dependency, RiPP adopts a three-group operator scheduling strategy. It groups and dispatches: ➊ CPU-side random walk generation (𝑅), ➋ CPU-side data compression (𝐶) and checkpointing (𝑃), and ➌ GPU-side data decompression (𝐷), model training (𝑇 ), and synchronization (𝑆) into three asynchronous operator pipelines, each mappe… view at source ↗

**Figure 13.** Figure 13: Efficiency of PBG [33], DistDGL [85], DistGER [17], DistGER-G [18], NeutronTP [3], LeapGNN [13] and FeLoG (ours). § 6.7). For DistGER and DistGER-G, we set the sliding window size 𝑤=10, the number of negative samples 𝐾=5, and the synchronization period to 0.1 sec [28]. For PBG, DistDGL, LeapGNN and NeutronTP, we follow the default settings in their papers [3, 13, 33, 85]. For fair system comparison, we … view at source ↗

**Figure 14.** Figure 14: Scalability analysis of FeLoG. computation and hinder convergence speed. Hence, they are on average 7.9× and 15.8× slower than FeLoG, and cannot run on the largest dataset UK due to out-of-memory issues. We further discuss accuracy-time convergence in §6.4 to quantify the time-to-quality tradeoff. Considering the resource consumption that affects scalability, Table 3 reports the average per-machine memo… view at source ↗

**Figure 15.** Figure 15: The influence of running time on embedding qual [PITH_FULL_IMAGE:figures/full_fig_p013_15.png] view at source ↗

**Figure 16.** Figure 16: 𝑀𝑎𝑐𝑟𝑜 − 𝐹1 (a1, b1) and 𝑀𝑖𝑐𝑟𝑜 − 𝐹1 (a2, b2) scores for multilabel node classification. FL YT LJ OR 0.2 0.8 1.0 Normalization Runtime (s) NeutronTP NeutronTP+FeeST DistDGL DistDGL+FeeST Graphs (a) Generalizability YT LJ OR OGB UK 101 102 103 GraNNDis FeLoG End-to-end Runtime (s) Graphs (b) Multi-GPU FL YT LJ OR TW 1 w/o-FeeST w-FeeST Normalization Runtime (s) Graphs 0 (c) FeeST Efficiency [PITH_FULL_IMAGE… view at source ↗

**Figure 17.** Figure 17: FeLoG generalizability and FeeST impact on efficiency. (𝑀𝑖𝑐𝑟𝑜 − 𝐹1) and macro-averaged F1 (𝑀𝑎𝑐𝑟𝑜 − 𝐹1) [62] scores, where 𝑀𝑖𝑐𝑟𝑜 − 𝐹1 gives equal weight to each test instance and 𝑀𝑎𝑐𝑟𝑜 − 𝐹1 assigns equal weight to each label category [29]. Following [21, 24, 47, 57], we select 10% to 90% training data ratio on Flickr, and 1% to 9% training ratio on Youtube. We report the averaged 𝑀𝑎𝑐𝑟𝑜 − 𝐹1 and 𝑀𝑖𝑐𝑟𝑜 − 𝐹… view at source ↗

**Figure 19.** Figure 19: Resource utilization and runtime breakdown analy [PITH_FULL_IMAGE:figures/full_fig_p014_19.png] view at source ↗

**Figure 21.** Figure 21: Sensitivity of FeeST w.r.t. parameter 𝜇. 7 CONCLUSIONS We presented FeLoG, a scalable and efficient distributed graph embedding system that addresses redundant sampling, excessive communication, and low resource utilization caused by the decoupling between sampling and training. FeLoG introduces three innovations: (1) a feedback-coupled sampling-training model that adapts sampling based on real-time em… view at source ↗

read the original abstract

Graph embedding maps graph nodes into low-dimensional vectors to support applications such as recommendation, fraud detection, and graph-based retrieval-augmented generation (GraphRAG). As graphs scale to billions of edges, scalable and efficient graph embedding has become increasingly important. Existing frameworks commonly adopt a sampling-training paradigm, in which mini-batches are constructed by sampling nodes and their neighbors. However, sampling is typically decoupled from evolving embedding quality, causing redundant exploration of well-trained regions while under-sampling undertrained nodes. At the system level, such decoupling further leads to excessive communication, serialized execution, and low resource utilization in distributed environments. We present FeLoG, a feedback loop-driven system for scalable distributed graph embedding. (1) FeLoG introduces feedback-coupled sampling and training, dynamically prioritizing undertrained nodes according to real-time embedding-quality feedback, thereby reducing redundant computation and accelerating convergence. (2) It employs activity-aware communication that compresses frequently occurring node sequences to reduce intra-machine PCIe traffic and selectively synchronizes frequently updated embeddings to reduce inter-machine communication. (3) It adopts a round-interleaved pipeline that overlaps next-round sampling with current-round training to improve CPU-GPU utilization. Experiments against six state-of-the-art baselines on large-scale graphs show that FeLoG achieves an average speedup of 27.9x, reduces communication cost by more than 53.1%, and sustains over 80% CPU-GPU utilization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FeLoG adds feedback from embedding quality into sampling plus some comm tweaks and reports 27.9x speedups on large graphs.

read the letter

The punchline is that FeLoG adds feedback from embedding quality back into the sampling process for distributed graph embedding, along with some communication optimizations, and reports big performance wins.

What is new is the specific combination of feedback-coupled sampling that prioritizes undertrained nodes, activity-aware compression to cut PCIe and inter-machine traffic, and a round-interleaved pipeline to overlap sampling and training. The paper does well at identifying the decoupling problem in existing sampling-training setups and proposing targeted fixes for distributed environments.

The experiments against six baselines show 27.9x speedup, 53.1% communication reduction, and over 80% utilization on large-scale graphs. These numbers are the main evidence, and they directly test whether the feedback overhead pays off.

A soft spot is that the abstract gives little on the exact graph sizes, how baselines were implemented, or statistical details, so the claims need the full paper to verify. The assumption that feedback can be computed with low enough overhead is central, but again the end-to-end results address it. No obvious internal contradictions in the description.

This is for researchers and engineers working on scalable graph embedding for applications like recommendation or fraud detection. A practitioner or systems person would find the design ideas useful if the numbers hold.

It deserves a serious referee because the work is grounded in a real problem with concrete mechanisms and empirical claims that can be checked.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces FeLoG, a distributed graph embedding system that couples sampling with real-time embedding-quality feedback to prioritize undertrained nodes, uses activity-aware communication to compress node sequences and selectively synchronize embeddings, and employs a round-interleaved pipeline to overlap sampling and training. It reports an average 27.9× speedup, >53.1% communication reduction, and >80% CPU-GPU utilization versus six baselines on large-scale graphs.

Significance. If the reported gains are reproducible, the work would constitute a practical advance in scalable systems for graph embedding by showing how feedback can reduce redundant sampling and communication in distributed settings. The contribution is primarily empirical and systems-oriented; its value depends directly on the rigor and transparency of the evaluation.

major comments (1)

[Abstract] Abstract: The central empirical claims (27.9× speedup, 53.1% communication reduction, >80% utilization) are presented without any description of graph sizes (nodes/edges), baseline implementations, hardware configuration, measurement methodology, or statistical significance. These omissions are load-bearing because the net benefit of the feedback-coupled sampling rests entirely on whether its overhead is low enough to produce the stated gains; without the experimental details the claims cannot be assessed.

minor comments (1)

[Abstract] The abstract paragraph describing the feedback mechanism could be expanded with one sentence on how embedding quality is quantified and how often feedback is computed, to clarify the overhead assumption.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract requires additional context to make the empirical claims assessable and will revise it accordingly in the next version.

read point-by-point responses

Referee: [Abstract] Abstract: The central empirical claims (27.9× speedup, 53.1% communication reduction, >80% utilization) are presented without any description of graph sizes (nodes/edges), baseline implementations, hardware configuration, measurement methodology, or statistical significance. These omissions are load-bearing because the net benefit of the feedback-coupled sampling rests entirely on whether its overhead is low enough to produce the stated gains; without the experimental details the claims cannot be assessed.

Authors: We agree with this assessment. The current abstract is too concise and omits key experimental parameters that are necessary to evaluate the reported gains. In the revised version we will expand the abstract (while remaining within length limits) to include: (1) graph scales (up to 10^9 edges), (2) the six baselines and their implementations, (3) hardware configuration (multi-node CPU-GPU clusters), (4) measurement methodology (wall-clock time, communication volume, utilization averaged over 5 runs), and (5) a note on statistical significance. These additions will allow readers to judge whether the feedback-loop overhead is justified by the observed speedups. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical systems contribution describing a feedback-coupled sampling design, activity-aware communication, and round-interleaved pipeline for distributed graph embedding. All central claims rest on end-to-end experimental measurements (27.9× speedup, 53.1% communication reduction, >80% utilization) against six baselines on large graphs. No equations, fitted parameters, uniqueness theorems, or self-citation chains are present that would reduce any prediction or result to its own inputs by construction. The design choices are evaluated directly via reported performance metrics rather than through definitional equivalence or internal fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the contribution is an engineering system whose claims rest on empirical measurements rather than new mathematical objects or unproven assumptions beyond standard distributed-systems infrastructure.

pith-pipeline@v0.9.1-grok · 5806 in / 1222 out tokens · 24899 ms · 2026-06-26T11:06:51.550433+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

92 extracted references · 2 linked inside Pith

[1]

Our code and datasets

2026. Our code and datasets. https://github.com/RocmFang/FeLoG

2026
[2]

Xin Ai, Qiange Wang, Chunyu Cao, Y anfeng Zhang, Chaoyi Chen, Hao Y uan, Y u Gu, and Ge Y u. 2024. NeutronOrch: Rethinking Sample-Based GNN Train- ing under CPU-GPU Heterogeneous Environments. Proceedings of the VLDB Endowment 17, 8 (2024), 1995–2008

2024
[3]

Xin Ai, Hao Y uan, Zeyu Ling, Qiange Wang, Y anfeng Zhang, Zhenbo Fu, Chaoyi Chen, Y u Gu, and Ge Y u. 2025. NeutronTP: Load-Balanced Distributed Full- Graph GNN Training with Tensor Parallelism. In 51st International Conference on V ery Large Data Bases. 173–186

2025
[4]

Guangji Bai, Ziyang Y u, Zheng Chai, Y ue Cheng, and Liang Zhao. 2025. Staleness-alleviated distributed gnn training via online dynamic-embedding pre- diction. In Proceedings of the 2025 SIAM International Conference on Data Min- ing (SDM). SIAM, 578–587

2025
[5]

Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512

1999
[6]

Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2008. A large time-aware web graph. In ACM SIGIR F orum, V ol. 42. ACM New Y ork, NY , USA, 33–38

2008
[7]

Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. 2018. A com- prehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1616– 1637

2018
[8]

Chunyu Cao, Xin Ai, Qiange Wang, Y anfeng Zhang, Zhenbo Fu, Hao Y uan, Mingyi Cao, Chaoyi Chen, Yingyou Wen, Y u Gu, et al. 2025. NeutronHeter: Op- timizing Distributed Graph Neural Network Training for Heterogeneous Clusters. Proceedings of the ACM on Management of Data 3, 4 (2025), 1–29

2025
[9]

Jiahang Cao, Jinyuan Fang, Zaiqiao Meng, and Shangsong Liang. 2024. Knowl- edge graph embedding: A survey from the perspective of representation spaces. Comput. Surveys 56, 6 (2024), 1–42

2024
[10]

Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MA T: A Recursive Model for Graph Mining. In Proceedings of the SIAM International Conference on Data Mining . 442–446

2004
[11]

Chee-Y ong Chan and Y annis E Ioannidis. 1998. Bitmap index design and eval- uation. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. 355–366

1998
[12]

Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations

2018
[13]

Weijian Chen, Shuibing He, Haoyang Qu, and Xuechen Zhang. 2025. LeapGNN: Accelerating Distributed GNN Training Leveraging Feature-Centric Model Mi- gration. In 23rd USENIX Conference on File and Storage Technologies (F AST 25). 255–270

2025
[14]

Xu Cheng, Liang Y ao, Feng He, Y ukuo Cen, Y ufei He, Chenhui Zhang, Wenzheng Feng, Hongyun Cai, and Jie Tang. 2025. LPS-GNN: Deploying Graph Neural Networks on Graphs with 100-Billion Edges. arXiv preprint arXiv:2507.14570 (2025)

arXiv 2025
[15]

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query- focused summarization. arXiv preprint arXiv:2404.16130 (2024)

Pith/arXiv arXiv 2024
[16]

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. 2008. LIBLINEAR: A Library for Large Linear Classiﬁcation. Journal of Machine Learning Research (2008), 18711874

2008
[17]

Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, and Y uchao Cao. 2023. Distributed Graph Embedding with Information- Oriented Random Walks. Proceedings of the VLDB Endowment 16, 7 (2023), 1643–1656

2023
[18]

Peng Fang, Zhenli Li, Arijit Khan, Siqiang Luo, Fang Wang, Zhan Shi, and Dan Feng. 2025. Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph Embedding. IEEE Transactions on Knowledge and Data Engineering 37, 1 (2025), 408–422

2025
[19]

Peng Fang, Siqiang Luo, Fang Wang, Bolong Zheng, Hong Jiang, Dan Feng, Hechang Pan, and Xingyu Wan. 2025. OMeGa: Boosting Large-Scale Graph Embeddings with Heterogeneous Memory Processing. In 2025 IEEE 41st Inter- national Conference on Data Engineering . 3369–3383

2025
[20]

Peng Fang, Fang Wang, Zhan Shi, Hong Jiang, Dan Feng, Xianghao Xu, and Wei Yin. 2022. How to Realize Efﬁcient and Scalable Graph Embeddings via an Entropy-driven Mechanism. IEEE Transactions on Big Data (2022)

2022
[21]

Peng Fang, Fang Wang, Zhan Shi, Hong Jiang, Dan Feng, and Lei Y ang. 2021. HuGE: An entropy-driven approach to efﬁcient and scalable graph embeddings. In IEEE 37th International Conference on Data Engineering. IEEE, 2045–2050

2021
[22]

Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Y anfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, et al. 2020. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 922–936

2020
[23]

Chenghua Gong, Y ao Cheng, Jianxiang Y u, Can Xu, Caihua Shan, Siqiang Luo, and Xiang Li. 2024. A survey on learning from graphs with heterophily: Recent advances and future directions. arXiv preprint arXiv:2401.09769 (2024)

arXiv 2024
[24]

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 855–864

2016
[25]

Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representa- tion Learning on Large Graphs. In Advances in Neural Information Processing Systems, V ol. 30

2017
[26]

Jiafeng Hu, Reynold Cheng, Zhipeng Huang, Yixang Fang, and Siqiang Luo
[27]

In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

On embedding uncertain graphs. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management . 157–166. 15 Peng Fang, Arijit Khan, Ziqiang Wu, Zhenli Li, Yibo Zhou, Fang Wang, and Dan Feng

2017
[28]

Weihua Hu, Matthias Fey, Marinka Zitnik, Y uxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing sys- tems 33 (2020), 22118–22133

2020
[29]

Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep K Dubey. 2019. Parallelizing word2vec in shared and distributed memory. IEEE Transactions on Parallel and Distributed Systems 30, 9 (2019), 2090–2100

2019
[30]

M. M. Keikha, M. Rahgozar, and M. Asadpour. 2018. Community Aware Ran- dom Walk for Network Embedding. Knowledge-Based Systems 148 (2018), 47– 54

2018
[31]

Thomas N Kipf and Max Welling. 2017. Semi-Supervised Classiﬁcation with Graph Convolutional Networks. In International Conference on Learning Repre- sentations

2017
[32]

Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th Interna- tional Conference on World Wide Web. 591–600

2010
[33]

Y unjae Lee, Jinha Chung, and Minsoo Rhu. 2022. Smartsage: training large-scale graph neural networks using in-storage processing architectures. In Proceedings of the 49th Annual International Symposium on Computer Architecture . 932– 945

2022
[34]

Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhi- jit Bose, and Alex Peysakhovich. 2019. Pytorch-biggraph: A large scale graph embedding system. In Proceedings of Machine Learning and Systems , V ol. 1. 120–131

2019
[35]

Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Predicting pos- itive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web. 641–650

2010
[36]

Y ongkun Li, Zhiyong Wu, Shuai Lin, Hong Xie, Min Lv, Yinlong Xu, and John CS Lui. 2019. Walking with perception: Efﬁcient random walk sampling via common neighbor awareness. In 2019 IEEE 35th International Conference on Data Engineering . IEEE, 962–973

2019
[37]

Zhiyuan Li, Xun Jian, Y ue Wang, Yingxia Shao, and Lei Chen. 2024. Daha: Ac- celerating gnn training with data and hardware aware execution planning. Pro- ceedings of the VLDB Endowment 17, 6 (2024), 1364–1376

2024
[38]

Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, and Pengcheng Yin. 2022. SCARA: scalable graph neural networks with feature-oriented optimization. arXiv preprint arXiv:2207.09179 (2022)

arXiv 2022
[39]

Dongkyun Lim and John Kim. 2025. TidalMesh: Topology-Driven AllReduce Collective Communication for Mesh Topology. In 2025 IEEE International Sym- posium on High Performance Computer Architecture. IEEE, 1526–1540

2025
[40]

Tianfeng Liu, Y angrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Y anghua Peng, Hongzheng Chen, Hongzhi Chen, and Chuanxiong Guo. 2023. BGL: GPU- Efﬁcient GNN training by optimizing graph data I/O and preprocessing. In 20th USENIX Symposium on Networked Systems Design and Implementation . 103– 118

2023
[41]

Linhao Luo, Zicheng Zhao, Gholamreza Haffari, Dinh Phung, Chen Gong, and Shirui Pan. 2025. GFM-RAG: graph foundation model for retrieval augmented generation. arXiv preprint arXiv:2502.01113 (2025)

arXiv 2025
[42]

Siqiang Luo, Zichen Zhu, Xiaokui Xiao, Yin Y ang, Chunbo Li, and Ben Kao
[43]

In International Conference on Extending Database Technology

Multi-Task Processing in V ertex-Centric Graph Systems: Evaluations and Insights. In International Conference on Extending Database Technology . 247– 259
[44]

Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444

2001
[45]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efﬁcient estimation of word representations in vector space. In International Conference on Learning Representations

2013
[46]

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems 26 (2013)

2013
[47]

Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, and Wen-mei Hwu. 2024. Accelerating Sampling and Aggregation Operations in GNN Frame- works with GPU Initiated Direct Storage Accesses. Proceedings of the VLDB Endowment 17, 6 (2024), 1227–1240

2024
[48]

Y eonhong Park, Sunhong Min, and Jae W Lee. 2022. Ginex: SSD-Enabled Billion-Scale Graph Neural Network Training on a Single Machine via Prov- ably Optimal in-Memory Caching. Proceedings of the VLDB Endowment 15, 11 (2022), 26262639

2022
[49]

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 701–710

2014
[50]

Oleg Platonov, Denis Kuznedelev, Artem Babenko, and Liudmila Prokhorenkova
[51]

Advances in Neural Information Processing Systems 36 (2023), 523–548

Characterizing graph datasets for node classiﬁcation: Homophily- heterophily dichotomy and beyond. Advances in Neural Information Processing Systems 36 (2023), 523–548

2023
[52]

Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, and Liud- mila Prokhorenkova. 2023. A Critical Look at the Evaluation of GNNs under Heterophily: Are We Really Making Progress?. In International Conference on Learning Representations

2023
[53]

Linshan Qiu, Lu Chen, Hailiang Jie, Xiangyu Ke, Y unjun Gao, Y ang Liu, and Zetao Zhang. 2024. GPU-Accelerated Batch-Dynamic Subgraph Matching. In 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 3204–3216

2024
[54]

Alexander Renz-Wieland, Rainer Gemulla, Zoi Kaoudi, and V olker Markl. 2022. NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access. In Proceedings of the 2022 International Conference on Management of Data. 481–495

2022
[55]

Andrea Rossi, Denilson Barbosa, Donatella Firmani, Antonio Matinata, and Paolo Merialdo. 2021. Knowledge graph embedding for link prediction: A com- parative analysis. ACM Transactions on Knowledge Discovery from Data 15, 2 (2021), 1–49

2021
[56]

Guangming Sheng, Junwei Su, Chao Huang, and Chuan Wu. 2024. Mspipe: Efﬁcient temporal gnn training via staleness-aware pipeline. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining . 2651–2662

2024
[57]

Jaeyong Song, Hongsun Jang, Hunseong Lim, Jaewon Jung, Y oungsok Kim, and Jinho Lee. 2024. GraNNDis: Fast distributed graph neural network training framework for multi-server clusters. In Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques. 91–107

2024
[58]

Dahai Tang, Jiali Wang, Rong Chen, Lei Wang, Wenyuan Y u, Jingren Zhou, and Kenli Li. 2024. Xgnn: Boosting multi-gpu gnn training via global gnn memory store. Proceedings of the VLDB Endowment 17, 5 (2024), 1105–1118

2024
[59]

Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM Conference on Information and Knowledge Management . 1107–1116

2009
[60]

Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. V erse: V ersatile graph embeddings from similarity measures. In Proceedings of the 2018 World Wide Web conference. 539–548

2018
[61]

Ke Tu, Peng Cui, Xiao Wang, Philip S Y u, and Wenwu Zhu. 2018. Deep re- cursive network embedding with regular equivalence. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Min- ing. 2357–2366

2018
[62]

Petar V eliˇckovi´c, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Y oshua Bengio. 2018. Graph Attention Networks. In International Con- ference on Learning Representations

2018
[63]

Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, and Shivaram V enkatara- man. 2023. MariusGNN: Resource-Efﬁcient Out-of-Core Training of Graph Neu- ral Networks. In Proceedings of the 18th European Conference on Computer Systems (Rome, Italy). 144161

2023
[64]

Wolfe, Anastasios Kyrillidis, Nam Sung Kim, and Yingyan Lin

Cheng Wan, Y oujie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, and Yingyan Lin. 2022. PipeGCN: Efﬁcient Full-Graph Training of Graph Con- volutional Networks with Pipelined Feature Communication. In International Conference on Learning Representations

2022
[65]

Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network em- bedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining . 1225–1234

2016
[66]

Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. GraphGAN: Graph Representation Learning With Generative Adversarial Nets. In Proceedings of the AAAI Confer- ence on Artiﬁcial Intelligence , V ol. 32

2018
[67]

Minjie Wang, Da Zheng, Zihao Y e, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Y u, Y u Gai, et al. 2019. Deep graph library: A graph- centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315 (2019)

Pith/arXiv arXiv 2019
[68]

Qiange Wang, Y anfeng Zhang, Hao Wang, Chaoyi Chen, Xiaodong Zhang, and Ge Y u. 2022. Neutronstar: Distributed GNN Training with Hybrid Dependency Management. In Proceedings of the 2022 International Conference on Manage- ment of Data. 1301–1315

2022
[69]

Xiao Wang, Deyu Bo, Chuan Shi, Shaohua Fan, Y anfang Y e, and Philip S Y u
[70]

IEEE Transactions on Big Data 9, 2 (2022), 415–436

A survey on heterogeneous graph embedding: methods, techniques, appli- cations and sources. IEEE Transactions on Big Data 9, 2 (2022), 415–436

2022
[71]

Xuhong Wang, Ding Lyu, Mengjian Li, Y ang Xia, Qi Y ang, Xinwen Wang, Xin- guang Wang, Ping Cui, Y upu Y ang, Bowen Sun, et al. 2021. Apan: Asynchronous propagation attention network for real-time temporal graph embedding. In Pro- ceedings of the 2021 International Conference on Management of Data . 2628– 2638

2021
[72]

Y uke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, and Y ufei Ding. 2023. MGG: Accelerating graph neural networks with Fine-Grained Intra-Kernel Communication-Computation pipelining on Multi-GPU platforms. In 17th USENIX Symposium on Operating Systems Design and Implementation . 779–795

2023
[73]

Y uyue Wang, Xiurui Pan, Y uda An, Jie Zhang, and Glenn Reinman. 2024. BeaconGNN: Large-Scale GNN Acceleration with Out-of-Order Streaming In- Storage Computing. In IEEE International Symposium on High-Performance Computer Architecture. IEEE, 330–344. 16 FeLoG: Scalable and Efficient Distributed Graph Embedding with Feedback Loop Mechanism

2024
[74]

Zesong Wang, Peng Fang, Fang Wang, Hong Jiang, Yimin Lu, Zhan Shi, and Dan Feng. 2025. SpiderCache: Semantic-Aware Caching Strategy for DNN Train- ing. In Proceedings of the 54th International Conference on Parallel Processing (ICPP’25). 320330

2025
[75]

Brian J Worton. 1989. Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 1 (1989), 164–168

1989
[76]

Qitian Wu, Wentao Zhao, Zenan Li, David P Wipf, and Junchi Y an. 2022. Node- former: A scalable graph structure learning transformer for node classiﬁcation. Advances in Neural Information Processing Systems 35 (2022), 27387–27401

2022
[77]

Yidi Wu, Kaihao Ma, Zhenkun Cai, Tatiana Jin, Boyang Li, Chenguang Zheng, James Cheng, and Fan Y u. 2021. Seastar: V ertex-centric programming for graph neural networks. In Proceedings of the 16th European Conference on Computer Systems. 359–375

2021
[78]

Jaewon Y ang and Jure Leskovec. 2012. Deﬁning and evaluating network com- munities based on ground-truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics . 1–8

2012
[79]

Renchi Y ang, Jieming Shi, Xiaokui Xiao, Yin Y ang, and Sourav S Bhowmick
[80]

Prococeding of VLDB Endowment 13, 5 (2020), 670– 683

Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank. Prococeding of VLDB Endowment 13, 5 (2020), 670– 683

2020

Showing first 80 references.

[1] [1]

Our code and datasets

2026. Our code and datasets. https://github.com/RocmFang/FeLoG

2026

[2] [2]

Xin Ai, Qiange Wang, Chunyu Cao, Y anfeng Zhang, Chaoyi Chen, Hao Y uan, Y u Gu, and Ge Y u. 2024. NeutronOrch: Rethinking Sample-Based GNN Train- ing under CPU-GPU Heterogeneous Environments. Proceedings of the VLDB Endowment 17, 8 (2024), 1995–2008

2024

[3] [3]

Xin Ai, Hao Y uan, Zeyu Ling, Qiange Wang, Y anfeng Zhang, Zhenbo Fu, Chaoyi Chen, Y u Gu, and Ge Y u. 2025. NeutronTP: Load-Balanced Distributed Full- Graph GNN Training with Tensor Parallelism. In 51st International Conference on V ery Large Data Bases. 173–186

2025

[4] [4]

Guangji Bai, Ziyang Y u, Zheng Chai, Y ue Cheng, and Liang Zhao. 2025. Staleness-alleviated distributed gnn training via online dynamic-embedding pre- diction. In Proceedings of the 2025 SIAM International Conference on Data Min- ing (SDM). SIAM, 578–587

2025

[5] [5]

Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512

1999

[6] [6]

Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2008. A large time-aware web graph. In ACM SIGIR F orum, V ol. 42. ACM New Y ork, NY , USA, 33–38

2008

[7] [7]

Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. 2018. A com- prehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1616– 1637

2018

[8] [8]

Chunyu Cao, Xin Ai, Qiange Wang, Y anfeng Zhang, Zhenbo Fu, Hao Y uan, Mingyi Cao, Chaoyi Chen, Yingyou Wen, Y u Gu, et al. 2025. NeutronHeter: Op- timizing Distributed Graph Neural Network Training for Heterogeneous Clusters. Proceedings of the ACM on Management of Data 3, 4 (2025), 1–29

2025

[9] [9]

Jiahang Cao, Jinyuan Fang, Zaiqiao Meng, and Shangsong Liang. 2024. Knowl- edge graph embedding: A survey from the perspective of representation spaces. Comput. Surveys 56, 6 (2024), 1–42

2024

[10] [10]

Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MA T: A Recursive Model for Graph Mining. In Proceedings of the SIAM International Conference on Data Mining . 442–446

2004

[11] [11]

Chee-Y ong Chan and Y annis E Ioannidis. 1998. Bitmap index design and eval- uation. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. 355–366

1998

[12] [12]

Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations

2018

[13] [13]

Weijian Chen, Shuibing He, Haoyang Qu, and Xuechen Zhang. 2025. LeapGNN: Accelerating Distributed GNN Training Leveraging Feature-Centric Model Mi- gration. In 23rd USENIX Conference on File and Storage Technologies (F AST 25). 255–270

2025

[14] [14]

Xu Cheng, Liang Y ao, Feng He, Y ukuo Cen, Y ufei He, Chenhui Zhang, Wenzheng Feng, Hongyun Cai, and Jie Tang. 2025. LPS-GNN: Deploying Graph Neural Networks on Graphs with 100-Billion Edges. arXiv preprint arXiv:2507.14570 (2025)

arXiv 2025

[15] [15]

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query- focused summarization. arXiv preprint arXiv:2404.16130 (2024)

Pith/arXiv arXiv 2024

[16] [16]

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. 2008. LIBLINEAR: A Library for Large Linear Classiﬁcation. Journal of Machine Learning Research (2008), 18711874

2008

[17] [17]

Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, and Y uchao Cao. 2023. Distributed Graph Embedding with Information- Oriented Random Walks. Proceedings of the VLDB Endowment 16, 7 (2023), 1643–1656

2023

[18] [18]

Peng Fang, Zhenli Li, Arijit Khan, Siqiang Luo, Fang Wang, Zhan Shi, and Dan Feng. 2025. Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph Embedding. IEEE Transactions on Knowledge and Data Engineering 37, 1 (2025), 408–422

2025

[19] [19]

Peng Fang, Siqiang Luo, Fang Wang, Bolong Zheng, Hong Jiang, Dan Feng, Hechang Pan, and Xingyu Wan. 2025. OMeGa: Boosting Large-Scale Graph Embeddings with Heterogeneous Memory Processing. In 2025 IEEE 41st Inter- national Conference on Data Engineering . 3369–3383

2025

[20] [20]

Peng Fang, Fang Wang, Zhan Shi, Hong Jiang, Dan Feng, Xianghao Xu, and Wei Yin. 2022. How to Realize Efﬁcient and Scalable Graph Embeddings via an Entropy-driven Mechanism. IEEE Transactions on Big Data (2022)

2022

[21] [21]

Peng Fang, Fang Wang, Zhan Shi, Hong Jiang, Dan Feng, and Lei Y ang. 2021. HuGE: An entropy-driven approach to efﬁcient and scalable graph embeddings. In IEEE 37th International Conference on Data Engineering. IEEE, 2045–2050

2021

[22] [22]

Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Y anfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, et al. 2020. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 922–936

2020

[23] [23]

Chenghua Gong, Y ao Cheng, Jianxiang Y u, Can Xu, Caihua Shan, Siqiang Luo, and Xiang Li. 2024. A survey on learning from graphs with heterophily: Recent advances and future directions. arXiv preprint arXiv:2401.09769 (2024)

arXiv 2024

[24] [24]

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 855–864

2016

[25] [25]

Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representa- tion Learning on Large Graphs. In Advances in Neural Information Processing Systems, V ol. 30

2017

[26] [26]

Jiafeng Hu, Reynold Cheng, Zhipeng Huang, Yixang Fang, and Siqiang Luo

[27] [27]

In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

On embedding uncertain graphs. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management . 157–166. 15 Peng Fang, Arijit Khan, Ziqiang Wu, Zhenli Li, Yibo Zhou, Fang Wang, and Dan Feng

2017

[28] [28]

Weihua Hu, Matthias Fey, Marinka Zitnik, Y uxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing sys- tems 33 (2020), 22118–22133

2020

[29] [29]

Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep K Dubey. 2019. Parallelizing word2vec in shared and distributed memory. IEEE Transactions on Parallel and Distributed Systems 30, 9 (2019), 2090–2100

2019

[30] [30]

M. M. Keikha, M. Rahgozar, and M. Asadpour. 2018. Community Aware Ran- dom Walk for Network Embedding. Knowledge-Based Systems 148 (2018), 47– 54

2018

[31] [31]

Thomas N Kipf and Max Welling. 2017. Semi-Supervised Classiﬁcation with Graph Convolutional Networks. In International Conference on Learning Repre- sentations

2017

[32] [32]

Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th Interna- tional Conference on World Wide Web. 591–600

2010

[33] [33]

Y unjae Lee, Jinha Chung, and Minsoo Rhu. 2022. Smartsage: training large-scale graph neural networks using in-storage processing architectures. In Proceedings of the 49th Annual International Symposium on Computer Architecture . 932– 945

2022

[34] [34]

Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhi- jit Bose, and Alex Peysakhovich. 2019. Pytorch-biggraph: A large scale graph embedding system. In Proceedings of Machine Learning and Systems , V ol. 1. 120–131

2019

[35] [35]

Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Predicting pos- itive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web. 641–650

2010

[36] [36]

Y ongkun Li, Zhiyong Wu, Shuai Lin, Hong Xie, Min Lv, Yinlong Xu, and John CS Lui. 2019. Walking with perception: Efﬁcient random walk sampling via common neighbor awareness. In 2019 IEEE 35th International Conference on Data Engineering . IEEE, 962–973

2019

[37] [37]

Zhiyuan Li, Xun Jian, Y ue Wang, Yingxia Shao, and Lei Chen. 2024. Daha: Ac- celerating gnn training with data and hardware aware execution planning. Pro- ceedings of the VLDB Endowment 17, 6 (2024), 1364–1376

2024

[38] [38]

Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, and Pengcheng Yin. 2022. SCARA: scalable graph neural networks with feature-oriented optimization. arXiv preprint arXiv:2207.09179 (2022)

arXiv 2022

[39] [39]

Dongkyun Lim and John Kim. 2025. TidalMesh: Topology-Driven AllReduce Collective Communication for Mesh Topology. In 2025 IEEE International Sym- posium on High Performance Computer Architecture. IEEE, 1526–1540

2025

[40] [40]

Tianfeng Liu, Y angrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Y anghua Peng, Hongzheng Chen, Hongzhi Chen, and Chuanxiong Guo. 2023. BGL: GPU- Efﬁcient GNN training by optimizing graph data I/O and preprocessing. In 20th USENIX Symposium on Networked Systems Design and Implementation . 103– 118

2023

[41] [41]

Linhao Luo, Zicheng Zhao, Gholamreza Haffari, Dinh Phung, Chen Gong, and Shirui Pan. 2025. GFM-RAG: graph foundation model for retrieval augmented generation. arXiv preprint arXiv:2502.01113 (2025)

arXiv 2025

[42] [42]

Siqiang Luo, Zichen Zhu, Xiaokui Xiao, Yin Y ang, Chunbo Li, and Ben Kao

[43] [43]

In International Conference on Extending Database Technology

Multi-Task Processing in V ertex-Centric Graph Systems: Evaluations and Insights. In International Conference on Extending Database Technology . 247– 259

[44] [44]

Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444

2001

[45] [45]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efﬁcient estimation of word representations in vector space. In International Conference on Learning Representations

2013

[46] [46]

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems 26 (2013)

2013

[47] [47]

Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, and Wen-mei Hwu. 2024. Accelerating Sampling and Aggregation Operations in GNN Frame- works with GPU Initiated Direct Storage Accesses. Proceedings of the VLDB Endowment 17, 6 (2024), 1227–1240

2024

[48] [48]

Y eonhong Park, Sunhong Min, and Jae W Lee. 2022. Ginex: SSD-Enabled Billion-Scale Graph Neural Network Training on a Single Machine via Prov- ably Optimal in-Memory Caching. Proceedings of the VLDB Endowment 15, 11 (2022), 26262639

2022

[49] [49]

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 701–710

2014

[50] [50]

Oleg Platonov, Denis Kuznedelev, Artem Babenko, and Liudmila Prokhorenkova

[51] [51]

Advances in Neural Information Processing Systems 36 (2023), 523–548

Characterizing graph datasets for node classiﬁcation: Homophily- heterophily dichotomy and beyond. Advances in Neural Information Processing Systems 36 (2023), 523–548

2023

[52] [52]

Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, and Liud- mila Prokhorenkova. 2023. A Critical Look at the Evaluation of GNNs under Heterophily: Are We Really Making Progress?. In International Conference on Learning Representations

2023

[53] [53]

Linshan Qiu, Lu Chen, Hailiang Jie, Xiangyu Ke, Y unjun Gao, Y ang Liu, and Zetao Zhang. 2024. GPU-Accelerated Batch-Dynamic Subgraph Matching. In 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 3204–3216

2024

[54] [54]

Alexander Renz-Wieland, Rainer Gemulla, Zoi Kaoudi, and V olker Markl. 2022. NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access. In Proceedings of the 2022 International Conference on Management of Data. 481–495

2022

[55] [55]

Andrea Rossi, Denilson Barbosa, Donatella Firmani, Antonio Matinata, and Paolo Merialdo. 2021. Knowledge graph embedding for link prediction: A com- parative analysis. ACM Transactions on Knowledge Discovery from Data 15, 2 (2021), 1–49

2021

[56] [56]

Guangming Sheng, Junwei Su, Chao Huang, and Chuan Wu. 2024. Mspipe: Efﬁcient temporal gnn training via staleness-aware pipeline. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining . 2651–2662

2024

[57] [57]

Jaeyong Song, Hongsun Jang, Hunseong Lim, Jaewon Jung, Y oungsok Kim, and Jinho Lee. 2024. GraNNDis: Fast distributed graph neural network training framework for multi-server clusters. In Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques. 91–107

2024

[58] [58]

Dahai Tang, Jiali Wang, Rong Chen, Lei Wang, Wenyuan Y u, Jingren Zhou, and Kenli Li. 2024. Xgnn: Boosting multi-gpu gnn training via global gnn memory store. Proceedings of the VLDB Endowment 17, 5 (2024), 1105–1118

2024

[59] [59]

Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM Conference on Information and Knowledge Management . 1107–1116

2009

[60] [60]

Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. V erse: V ersatile graph embeddings from similarity measures. In Proceedings of the 2018 World Wide Web conference. 539–548

2018

[61] [61]

Ke Tu, Peng Cui, Xiao Wang, Philip S Y u, and Wenwu Zhu. 2018. Deep re- cursive network embedding with regular equivalence. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Min- ing. 2357–2366

2018

[62] [62]

Petar V eliˇckovi´c, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Y oshua Bengio. 2018. Graph Attention Networks. In International Con- ference on Learning Representations

2018

[63] [63]

Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, and Shivaram V enkatara- man. 2023. MariusGNN: Resource-Efﬁcient Out-of-Core Training of Graph Neu- ral Networks. In Proceedings of the 18th European Conference on Computer Systems (Rome, Italy). 144161

2023

[64] [64]

Wolfe, Anastasios Kyrillidis, Nam Sung Kim, and Yingyan Lin

Cheng Wan, Y oujie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, and Yingyan Lin. 2022. PipeGCN: Efﬁcient Full-Graph Training of Graph Con- volutional Networks with Pipelined Feature Communication. In International Conference on Learning Representations

2022

[65] [65]

Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network em- bedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining . 1225–1234

2016

[66] [66]

Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. GraphGAN: Graph Representation Learning With Generative Adversarial Nets. In Proceedings of the AAAI Confer- ence on Artiﬁcial Intelligence , V ol. 32

2018

[67] [67]

Minjie Wang, Da Zheng, Zihao Y e, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Y u, Y u Gai, et al. 2019. Deep graph library: A graph- centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315 (2019)

Pith/arXiv arXiv 2019

[68] [68]

Qiange Wang, Y anfeng Zhang, Hao Wang, Chaoyi Chen, Xiaodong Zhang, and Ge Y u. 2022. Neutronstar: Distributed GNN Training with Hybrid Dependency Management. In Proceedings of the 2022 International Conference on Manage- ment of Data. 1301–1315

2022

[69] [69]

Xiao Wang, Deyu Bo, Chuan Shi, Shaohua Fan, Y anfang Y e, and Philip S Y u

[70] [70]

IEEE Transactions on Big Data 9, 2 (2022), 415–436

A survey on heterogeneous graph embedding: methods, techniques, appli- cations and sources. IEEE Transactions on Big Data 9, 2 (2022), 415–436

2022

[71] [71]

Xuhong Wang, Ding Lyu, Mengjian Li, Y ang Xia, Qi Y ang, Xinwen Wang, Xin- guang Wang, Ping Cui, Y upu Y ang, Bowen Sun, et al. 2021. Apan: Asynchronous propagation attention network for real-time temporal graph embedding. In Pro- ceedings of the 2021 International Conference on Management of Data . 2628– 2638

2021

[72] [72]

Y uke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, and Y ufei Ding. 2023. MGG: Accelerating graph neural networks with Fine-Grained Intra-Kernel Communication-Computation pipelining on Multi-GPU platforms. In 17th USENIX Symposium on Operating Systems Design and Implementation . 779–795

2023

[73] [73]

Y uyue Wang, Xiurui Pan, Y uda An, Jie Zhang, and Glenn Reinman. 2024. BeaconGNN: Large-Scale GNN Acceleration with Out-of-Order Streaming In- Storage Computing. In IEEE International Symposium on High-Performance Computer Architecture. IEEE, 330–344. 16 FeLoG: Scalable and Efficient Distributed Graph Embedding with Feedback Loop Mechanism

2024

[74] [74]

Zesong Wang, Peng Fang, Fang Wang, Hong Jiang, Yimin Lu, Zhan Shi, and Dan Feng. 2025. SpiderCache: Semantic-Aware Caching Strategy for DNN Train- ing. In Proceedings of the 54th International Conference on Parallel Processing (ICPP’25). 320330

2025

[75] [75]

Brian J Worton. 1989. Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 1 (1989), 164–168

1989

[76] [76]

Qitian Wu, Wentao Zhao, Zenan Li, David P Wipf, and Junchi Y an. 2022. Node- former: A scalable graph structure learning transformer for node classiﬁcation. Advances in Neural Information Processing Systems 35 (2022), 27387–27401

2022

[77] [77]

Yidi Wu, Kaihao Ma, Zhenkun Cai, Tatiana Jin, Boyang Li, Chenguang Zheng, James Cheng, and Fan Y u. 2021. Seastar: V ertex-centric programming for graph neural networks. In Proceedings of the 16th European Conference on Computer Systems. 359–375

2021

[78] [78]

Jaewon Y ang and Jure Leskovec. 2012. Deﬁning and evaluating network com- munities based on ground-truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics . 1–8

2012

[79] [79]

Renchi Y ang, Jieming Shi, Xiaokui Xiao, Yin Y ang, and Sourav S Bhowmick

[80] [80]

Prococeding of VLDB Endowment 13, 5 (2020), 670– 683

Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank. Prococeding of VLDB Endowment 13, 5 (2020), 670– 683

2020