pith. sign in

arxiv: 2606.22180 · v1 · pith:5HXRE42Onew · submitted 2026-06-20 · 💻 cs.DC · cs.LG

FeLoG: Scalable and Efficient Distributed Graph Embedding with Feedback Loop Mechanism

Pith reviewed 2026-06-26 11:06 UTC · model grok-4.3

classification 💻 cs.DC cs.LG
keywords distributed graph embeddingfeedback loopsampling optimizationcommunication reductionpipeline schedulingscalabilitygraph applications
0
0 comments X

The pith

FeLoG couples real-time embedding quality feedback to sampling and communication to reduce redundant work in distributed graph embedding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that current sampling-training pipelines for graph embedding ignore how well individual nodes are already embedded, which wastes computation on trained areas and drives up communication in distributed settings. FeLoG closes this gap by feeding live quality signals back into node selection, sequence compression, and execution scheduling. The result is fewer samples overall, lighter data movement between machines, and better overlap of CPU and GPU work. Readers interested in scaling embeddings for recommendations or fraud detection would see direct value if these savings hold on billion-edge graphs.

Core claim

FeLoG introduces feedback-coupled sampling that dynamically prioritizes undertrained nodes according to real-time embedding-quality signals, activity-aware communication that compresses frequent node sequences and selectively synchronizes embeddings, and a round-interleaved pipeline that overlaps next-round sampling with current-round training; together these changes reduce redundant computation, cut communication cost, and raise CPU-GPU utilization on large-scale graphs.

What carries the argument

The feedback loop that links evolving embedding quality to sampling priorities and communication decisions.

If this is right

  • Redundant sampling of already well-embedded nodes drops because priorities shift to undertrained regions.
  • Intra-machine PCIe traffic and inter-machine synchronization both decrease through sequence compression and selective updates.
  • CPU-GPU utilization rises above 80 percent by interleaving sampling of the next round with training of the current round.
  • Convergence accelerates on large graphs relative to six prior distributed frameworks.
  • Average end-to-end speedup reaches 27.9 times with communication cost falling more than 53 percent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feedback principle could be tested on distributed training of graph neural networks beyond simple embedding.
  • Energy consumption per embedding task may fall in data-center settings if communication volume shrinks as reported.
  • Adaptive feedback loops of this type might generalize to other sampling-heavy distributed workloads such as large language model fine-tuning on graphs.
  • Experiments on graphs with different degree distributions would reveal whether the prioritization rule remains effective.

Load-bearing premise

The cost of computing and applying real-time embedding-quality feedback stays small enough that net savings in sampling and communication remain large.

What would settle it

On a billion-edge graph, if the measured time spent calculating and acting on feedback exceeds the measured time saved in sampling and data transfer, the claimed net speedup disappears.

Figures

Figures reproduced from arXiv: 2606.22180 by Arijit Khan, Dan Feng, Fang Wang, Peng Fang, Yibo Zhou, Zhenli Li, Ziqiang Wu.

Figure 1
Figure 1. Figure 1: (a) Existing approaches vs. FeLoG (ours) w.r.t. model￾and system-level design choices and efficiency-scalability per￾formance; (b) our design objectives for the proposed system FeLoG. traditional data-driven tasks, e.g., recommendation [83], link pre￾diction [52], and node classification [72], their versatility has re￾cently extended to retrieval-augmented generation (RAG) [15]. Em￾beddings act as a bridge… view at source ↗
Figure 2
Figure 2. Figure 2: Sampling and training in graph embedding. [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Kernel density distribution of node embedding sim [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Resource utilization in decoupled sampling-training, [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: The workflow of FeLoG. 5 [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Feedback-driven sampling-training coupling model. [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Data flow in the feedback-coupled sampling-training. [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗
Figure 10
Figure 10. Figure 10: Frequency-aware compression process. FaBC expands the compression target set only when the (𝑘+1)- th node provides sufficient marginal benefit. Let 𝑐 = 1 𝐿 + 1 32 denote the amortized overhead of adding one more compressed node, so 𝑔(𝑘) = 1 − 𝐹𝑘 + 𝑘𝑐. Since 𝐹𝑘+1 = 𝐹𝑘 + 𝑝(𝑘+1), we have 𝑔(𝑘+1) = 𝑔(𝑘) − 𝑝(𝑘+1) + 𝑐, To avoid diminishing returns, FaBC selects the largest 𝑘 satisfying 𝑔(𝑘) − 𝑔(𝑘+1) 𝑔(𝑘) ≥ 0.1 ⇐… view at source ↗
Figure 11
Figure 11. Figure 11: Hotspot-aware Synchronization Strategy across dis [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Execution flow comparison in FeLoG. To maximize concurrency under this feedback dependency, RiPP adopts a three-group operator scheduling strategy. It groups and dis￾patches: ➊ CPU-side random walk generation (𝑅), ➋ CPU-side data compression (𝐶) and checkpointing (𝑃), and ➌ GPU-side data de￾compression (𝐷), model training (𝑇 ), and synchronization (𝑆) into three asynchronous operator pipelines, each mappe… view at source ↗
Figure 13
Figure 13. Figure 13: Efficiency of PBG [33], DistDGL [85], DistGER [17], DistGER-G [18], NeutronTP [3], LeapGNN [13] and FeLoG (ours). § 6.7). For DistGER and DistGER-G, we set the sliding window size 𝑤=10, the number of negative samples 𝐾=5, and the synchroniza￾tion period to 0.1 sec [28]. For PBG, DistDGL, LeapGNN and Neu￾tronTP, we follow the default settings in their papers [3, 13, 33, 85]. For fair system comparison, we … view at source ↗
Figure 14
Figure 14. Figure 14: Scalability analysis of FeLoG. computation and hinder convergence speed. Hence, they are on av￾erage 7.9× and 15.8× slower than FeLoG, and cannot run on the largest dataset UK due to out-of-memory issues. We further discuss accuracy-time convergence in §6.4 to quantify the time-to-quality tradeoff. Considering the resource consumption that affects scalability, Ta￾ble 3 reports the average per-machine memo… view at source ↗
Figure 15
Figure 15. Figure 15: The influence of running time on embedding qual [PITH_FULL_IMAGE:figures/full_fig_p013_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: 𝑀𝑎𝑐𝑟𝑜 − 𝐹1 (a1, b1) and 𝑀𝑖𝑐𝑟𝑜 − 𝐹1 (a2, b2) scores for multilabel node classification. FL YT LJ OR 0.2 0.8 1.0 Normalization Runtime (s) NeutronTP NeutronTP+FeeST DistDGL DistDGL+FeeST Graphs (a) Generalizability YT LJ OR OGB UK 101 102 103 GraNNDis FeLoG End-to-end Runtime (s) Graphs (b) Multi-GPU FL YT LJ OR TW 1 w/o-FeeST w-FeeST Normalization Runtime (s) Graphs 0 (c) FeeST Efficiency [PITH_FULL_IMAGE… view at source ↗
Figure 17
Figure 17. Figure 17: FeLoG generalizability and FeeST impact on effi￾ciency. (𝑀𝑖𝑐𝑟𝑜 − 𝐹1) and macro-averaged F1 (𝑀𝑎𝑐𝑟𝑜 − 𝐹1) [62] scores, where 𝑀𝑖𝑐𝑟𝑜 − 𝐹1 gives equal weight to each test instance and 𝑀𝑎𝑐𝑟𝑜 − 𝐹1 assigns equal weight to each label category [29]. Fol￾lowing [21, 24, 47, 57], we select 10% to 90% training data ratio on Flickr, and 1% to 9% training ratio on Youtube. We report the averaged 𝑀𝑎𝑐𝑟𝑜 − 𝐹1 and 𝑀𝑖𝑐𝑟𝑜 − 𝐹… view at source ↗
Figure 19
Figure 19. Figure 19: Resource utilization and runtime breakdown analy [PITH_FULL_IMAGE:figures/full_fig_p014_19.png] view at source ↗
Figure 21
Figure 21. Figure 21: Sensitivity of FeeST w.r.t. parameter 𝜇. 7 CONCLUSIONS We presented FeLoG, a scalable and efficient distributed graph em￾bedding system that addresses redundant sampling, excessive com￾munication, and low resource utilization caused by the decoupling between sampling and training. FeLoG introduces three innovations: (1) a feedback-coupled sampling-training model that adapts sam￾pling based on real-time em… view at source ↗
read the original abstract

Graph embedding maps graph nodes into low-dimensional vectors to support applications such as recommendation, fraud detection, and graph-based retrieval-augmented generation (GraphRAG). As graphs scale to billions of edges, scalable and efficient graph embedding has become increasingly important. Existing frameworks commonly adopt a sampling-training paradigm, in which mini-batches are constructed by sampling nodes and their neighbors. However, sampling is typically decoupled from evolving embedding quality, causing redundant exploration of well-trained regions while under-sampling undertrained nodes. At the system level, such decoupling further leads to excessive communication, serialized execution, and low resource utilization in distributed environments. We present FeLoG, a feedback loop-driven system for scalable distributed graph embedding. (1) FeLoG introduces feedback-coupled sampling and training, dynamically prioritizing undertrained nodes according to real-time embedding-quality feedback, thereby reducing redundant computation and accelerating convergence. (2) It employs activity-aware communication that compresses frequently occurring node sequences to reduce intra-machine PCIe traffic and selectively synchronizes frequently updated embeddings to reduce inter-machine communication. (3) It adopts a round-interleaved pipeline that overlaps next-round sampling with current-round training to improve CPU-GPU utilization. Experiments against six state-of-the-art baselines on large-scale graphs show that FeLoG achieves an average speedup of 27.9x, reduces communication cost by more than 53.1%, and sustains over 80% CPU-GPU utilization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces FeLoG, a distributed graph embedding system that couples sampling with real-time embedding-quality feedback to prioritize undertrained nodes, uses activity-aware communication to compress node sequences and selectively synchronize embeddings, and employs a round-interleaved pipeline to overlap sampling and training. It reports an average 27.9× speedup, >53.1% communication reduction, and >80% CPU-GPU utilization versus six baselines on large-scale graphs.

Significance. If the reported gains are reproducible, the work would constitute a practical advance in scalable systems for graph embedding by showing how feedback can reduce redundant sampling and communication in distributed settings. The contribution is primarily empirical and systems-oriented; its value depends directly on the rigor and transparency of the evaluation.

major comments (1)
  1. [Abstract] Abstract: The central empirical claims (27.9× speedup, 53.1% communication reduction, >80% utilization) are presented without any description of graph sizes (nodes/edges), baseline implementations, hardware configuration, measurement methodology, or statistical significance. These omissions are load-bearing because the net benefit of the feedback-coupled sampling rests entirely on whether its overhead is low enough to produce the stated gains; without the experimental details the claims cannot be assessed.
minor comments (1)
  1. [Abstract] The abstract paragraph describing the feedback mechanism could be expanded with one sentence on how embedding quality is quantified and how often feedback is computed, to clarify the overhead assumption.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract requires additional context to make the empirical claims assessable and will revise it accordingly in the next version.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central empirical claims (27.9× speedup, 53.1% communication reduction, >80% utilization) are presented without any description of graph sizes (nodes/edges), baseline implementations, hardware configuration, measurement methodology, or statistical significance. These omissions are load-bearing because the net benefit of the feedback-coupled sampling rests entirely on whether its overhead is low enough to produce the stated gains; without the experimental details the claims cannot be assessed.

    Authors: We agree with this assessment. The current abstract is too concise and omits key experimental parameters that are necessary to evaluate the reported gains. In the revised version we will expand the abstract (while remaining within length limits) to include: (1) graph scales (up to 10^9 edges), (2) the six baselines and their implementations, (3) hardware configuration (multi-node CPU-GPU clusters), (4) measurement methodology (wall-clock time, communication volume, utilization averaged over 5 runs), and (5) a note on statistical significance. These additions will allow readers to judge whether the feedback-loop overhead is justified by the observed speedups. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical systems contribution describing a feedback-coupled sampling design, activity-aware communication, and round-interleaved pipeline for distributed graph embedding. All central claims rest on end-to-end experimental measurements (27.9× speedup, 53.1% communication reduction, >80% utilization) against six baselines on large graphs. No equations, fitted parameters, uniqueness theorems, or self-citation chains are present that would reduce any prediction or result to its own inputs by construction. The design choices are evaluated directly via reported performance metrics rather than through definitional equivalence or internal fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the contribution is an engineering system whose claims rest on empirical measurements rather than new mathematical objects or unproven assumptions beyond standard distributed-systems infrastructure.

pith-pipeline@v0.9.1-grok · 5806 in / 1222 out tokens · 24899 ms · 2026-06-26T11:06:51.550433+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

92 extracted references · 2 linked inside Pith

  1. [1]

    Our code and datasets

    2026. Our code and datasets. https://github.com/RocmFang/FeLoG

  2. [2]

    Xin Ai, Qiange Wang, Chunyu Cao, Y anfeng Zhang, Chaoyi Chen, Hao Y uan, Y u Gu, and Ge Y u. 2024. NeutronOrch: Rethinking Sample-Based GNN Train- ing under CPU-GPU Heterogeneous Environments. Proceedings of the VLDB Endowment 17, 8 (2024), 1995–2008

  3. [3]

    Xin Ai, Hao Y uan, Zeyu Ling, Qiange Wang, Y anfeng Zhang, Zhenbo Fu, Chaoyi Chen, Y u Gu, and Ge Y u. 2025. NeutronTP: Load-Balanced Distributed Full- Graph GNN Training with Tensor Parallelism. In 51st International Conference on V ery Large Data Bases. 173–186

  4. [4]

    Guangji Bai, Ziyang Y u, Zheng Chai, Y ue Cheng, and Liang Zhao. 2025. Staleness-alleviated distributed gnn training via online dynamic-embedding pre- diction. In Proceedings of the 2025 SIAM International Conference on Data Min- ing (SDM). SIAM, 578–587

  5. [5]

    Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512

  6. [6]

    Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2008. A large time-aware web graph. In ACM SIGIR F orum, V ol. 42. ACM New Y ork, NY , USA, 33–38

  7. [7]

    Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. 2018. A com- prehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1616– 1637

  8. [8]

    Chunyu Cao, Xin Ai, Qiange Wang, Y anfeng Zhang, Zhenbo Fu, Hao Y uan, Mingyi Cao, Chaoyi Chen, Yingyou Wen, Y u Gu, et al. 2025. NeutronHeter: Op- timizing Distributed Graph Neural Network Training for Heterogeneous Clusters. Proceedings of the ACM on Management of Data 3, 4 (2025), 1–29

  9. [9]

    Jiahang Cao, Jinyuan Fang, Zaiqiao Meng, and Shangsong Liang. 2024. Knowl- edge graph embedding: A survey from the perspective of representation spaces. Comput. Surveys 56, 6 (2024), 1–42

  10. [10]

    Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MA T: A Recursive Model for Graph Mining. In Proceedings of the SIAM International Conference on Data Mining . 442–446

  11. [11]

    Chee-Y ong Chan and Y annis E Ioannidis. 1998. Bitmap index design and eval- uation. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. 355–366

  12. [12]

    Jie Chen, Tengfei Ma, and Cao Xiao. 2018. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling. In International Conference on Learning Representations

  13. [13]

    Weijian Chen, Shuibing He, Haoyang Qu, and Xuechen Zhang. 2025. LeapGNN: Accelerating Distributed GNN Training Leveraging Feature-Centric Model Mi- gration. In 23rd USENIX Conference on File and Storage Technologies (F AST 25). 255–270

  14. [14]

    Xu Cheng, Liang Y ao, Feng He, Y ukuo Cen, Y ufei He, Chenhui Zhang, Wenzheng Feng, Hongyun Cai, and Jie Tang. 2025. LPS-GNN: Deploying Graph Neural Networks on Graphs with 100-Billion Edges. arXiv preprint arXiv:2507.14570 (2025)

  15. [15]

    Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query- focused summarization. arXiv preprint arXiv:2404.16130 (2024)

  16. [16]

    R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research (2008), 18711874

  17. [17]

    Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, and Y uchao Cao. 2023. Distributed Graph Embedding with Information- Oriented Random Walks. Proceedings of the VLDB Endowment 16, 7 (2023), 1643–1656

  18. [18]

    Peng Fang, Zhenli Li, Arijit Khan, Siqiang Luo, Fang Wang, Zhan Shi, and Dan Feng. 2025. Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph Embedding. IEEE Transactions on Knowledge and Data Engineering 37, 1 (2025), 408–422

  19. [19]

    Peng Fang, Siqiang Luo, Fang Wang, Bolong Zheng, Hong Jiang, Dan Feng, Hechang Pan, and Xingyu Wan. 2025. OMeGa: Boosting Large-Scale Graph Embeddings with Heterogeneous Memory Processing. In 2025 IEEE 41st Inter- national Conference on Data Engineering . 3369–3383

  20. [20]

    Peng Fang, Fang Wang, Zhan Shi, Hong Jiang, Dan Feng, Xianghao Xu, and Wei Yin. 2022. How to Realize Efficient and Scalable Graph Embeddings via an Entropy-driven Mechanism. IEEE Transactions on Big Data (2022)

  21. [21]

    Peng Fang, Fang Wang, Zhan Shi, Hong Jiang, Dan Feng, and Lei Y ang. 2021. HuGE: An entropy-driven approach to efficient and scalable graph embeddings. In IEEE 37th International Conference on Data Engineering. IEEE, 2045–2050

  22. [22]

    Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Y anfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, et al. 2020. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 922–936

  23. [23]

    Chenghua Gong, Y ao Cheng, Jianxiang Y u, Can Xu, Caihua Shan, Siqiang Luo, and Xiang Li. 2024. A survey on learning from graphs with heterophily: Recent advances and future directions. arXiv preprint arXiv:2401.09769 (2024)

  24. [24]

    Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 855–864

  25. [25]

    Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representa- tion Learning on Large Graphs. In Advances in Neural Information Processing Systems, V ol. 30

  26. [26]

    Jiafeng Hu, Reynold Cheng, Zhipeng Huang, Yixang Fang, and Siqiang Luo

  27. [27]

    In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

    On embedding uncertain graphs. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management . 157–166. 15 Peng Fang, Arijit Khan, Ziqiang Wu, Zhenli Li, Yibo Zhou, Fang Wang, and Dan Feng

  28. [28]

    Weihua Hu, Matthias Fey, Marinka Zitnik, Y uxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing sys- tems 33 (2020), 22118–22133

  29. [29]

    Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep K Dubey. 2019. Parallelizing word2vec in shared and distributed memory. IEEE Transactions on Parallel and Distributed Systems 30, 9 (2019), 2090–2100

  30. [30]

    M. M. Keikha, M. Rahgozar, and M. Asadpour. 2018. Community Aware Ran- dom Walk for Network Embedding. Knowledge-Based Systems 148 (2018), 47– 54

  31. [31]

    Thomas N Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Repre- sentations

  32. [32]

    Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th Interna- tional Conference on World Wide Web. 591–600

  33. [33]

    Y unjae Lee, Jinha Chung, and Minsoo Rhu. 2022. Smartsage: training large-scale graph neural networks using in-storage processing architectures. In Proceedings of the 49th Annual International Symposium on Computer Architecture . 932– 945

  34. [34]

    Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhi- jit Bose, and Alex Peysakhovich. 2019. Pytorch-biggraph: A large scale graph embedding system. In Proceedings of Machine Learning and Systems , V ol. 1. 120–131

  35. [35]

    Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Predicting pos- itive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web. 641–650

  36. [36]

    Y ongkun Li, Zhiyong Wu, Shuai Lin, Hong Xie, Min Lv, Yinlong Xu, and John CS Lui. 2019. Walking with perception: Efficient random walk sampling via common neighbor awareness. In 2019 IEEE 35th International Conference on Data Engineering . IEEE, 962–973

  37. [37]

    Zhiyuan Li, Xun Jian, Y ue Wang, Yingxia Shao, and Lei Chen. 2024. Daha: Ac- celerating gnn training with data and hardware aware execution planning. Pro- ceedings of the VLDB Endowment 17, 6 (2024), 1364–1376

  38. [38]

    Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, and Pengcheng Yin. 2022. SCARA: scalable graph neural networks with feature-oriented optimization. arXiv preprint arXiv:2207.09179 (2022)

  39. [39]

    Dongkyun Lim and John Kim. 2025. TidalMesh: Topology-Driven AllReduce Collective Communication for Mesh Topology. In 2025 IEEE International Sym- posium on High Performance Computer Architecture. IEEE, 1526–1540

  40. [40]

    Tianfeng Liu, Y angrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Y anghua Peng, Hongzheng Chen, Hongzhi Chen, and Chuanxiong Guo. 2023. BGL: GPU- Efficient GNN training by optimizing graph data I/O and preprocessing. In 20th USENIX Symposium on Networked Systems Design and Implementation . 103– 118

  41. [41]

    Linhao Luo, Zicheng Zhao, Gholamreza Haffari, Dinh Phung, Chen Gong, and Shirui Pan. 2025. GFM-RAG: graph foundation model for retrieval augmented generation. arXiv preprint arXiv:2502.01113 (2025)

  42. [42]

    Siqiang Luo, Zichen Zhu, Xiaokui Xiao, Yin Y ang, Chunbo Li, and Ben Kao

  43. [43]

    In International Conference on Extending Database Technology

    Multi-Task Processing in V ertex-Centric Graph Systems: Evaluations and Insights. In International Conference on Extending Database Technology . 247– 259

  44. [44]

    Miller McPherson, Lynn Smith-Lovin, and James M Cook. 2001. Birds of a feather: Homophily in social networks. Annual review of sociology 27, 1 (2001), 415–444

  45. [45]

    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In International Conference on Learning Representations

  46. [46]

    Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems 26 (2013)

  47. [47]

    Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, and Wen-mei Hwu. 2024. Accelerating Sampling and Aggregation Operations in GNN Frame- works with GPU Initiated Direct Storage Accesses. Proceedings of the VLDB Endowment 17, 6 (2024), 1227–1240

  48. [48]

    Y eonhong Park, Sunhong Min, and Jae W Lee. 2022. Ginex: SSD-Enabled Billion-Scale Graph Neural Network Training on a Single Machine via Prov- ably Optimal in-Memory Caching. Proceedings of the VLDB Endowment 15, 11 (2022), 26262639

  49. [49]

    Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 701–710

  50. [50]

    Oleg Platonov, Denis Kuznedelev, Artem Babenko, and Liudmila Prokhorenkova

  51. [51]

    Advances in Neural Information Processing Systems 36 (2023), 523–548

    Characterizing graph datasets for node classification: Homophily- heterophily dichotomy and beyond. Advances in Neural Information Processing Systems 36 (2023), 523–548

  52. [52]

    Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, and Liud- mila Prokhorenkova. 2023. A Critical Look at the Evaluation of GNNs under Heterophily: Are We Really Making Progress?. In International Conference on Learning Representations

  53. [53]

    Linshan Qiu, Lu Chen, Hailiang Jie, Xiangyu Ke, Y unjun Gao, Y ang Liu, and Zetao Zhang. 2024. GPU-Accelerated Batch-Dynamic Subgraph Matching. In 2024 IEEE 40th International Conference on Data Engineering (ICDE) . IEEE, 3204–3216

  54. [54]

    Alexander Renz-Wieland, Rainer Gemulla, Zoi Kaoudi, and V olker Markl. 2022. NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access. In Proceedings of the 2022 International Conference on Management of Data. 481–495

  55. [55]

    Andrea Rossi, Denilson Barbosa, Donatella Firmani, Antonio Matinata, and Paolo Merialdo. 2021. Knowledge graph embedding for link prediction: A com- parative analysis. ACM Transactions on Knowledge Discovery from Data 15, 2 (2021), 1–49

  56. [56]

    Guangming Sheng, Junwei Su, Chao Huang, and Chuan Wu. 2024. Mspipe: Efficient temporal gnn training via staleness-aware pipeline. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining . 2651–2662

  57. [57]

    Jaeyong Song, Hongsun Jang, Hunseong Lim, Jaewon Jung, Y oungsok Kim, and Jinho Lee. 2024. GraNNDis: Fast distributed graph neural network training framework for multi-server clusters. In Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques. 91–107

  58. [58]

    Dahai Tang, Jiali Wang, Rong Chen, Lei Wang, Wenyuan Y u, Jingren Zhou, and Kenli Li. 2024. Xgnn: Boosting multi-gpu gnn training via global gnn memory store. Proceedings of the VLDB Endowment 17, 5 (2024), 1105–1118

  59. [59]

    Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM Conference on Information and Knowledge Management . 1107–1116

  60. [60]

    Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. V erse: V ersatile graph embeddings from similarity measures. In Proceedings of the 2018 World Wide Web conference. 539–548

  61. [61]

    Ke Tu, Peng Cui, Xiao Wang, Philip S Y u, and Wenwu Zhu. 2018. Deep re- cursive network embedding with regular equivalence. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Min- ing. 2357–2366

  62. [62]

    Petar V eliˇckovi´c, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Y oshua Bengio. 2018. Graph Attention Networks. In International Con- ference on Learning Representations

  63. [63]

    Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, and Shivaram V enkatara- man. 2023. MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neu- ral Networks. In Proceedings of the 18th European Conference on Computer Systems (Rome, Italy). 144161

  64. [64]

    Wolfe, Anastasios Kyrillidis, Nam Sung Kim, and Yingyan Lin

    Cheng Wan, Y oujie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, and Yingyan Lin. 2022. PipeGCN: Efficient Full-Graph Training of Graph Con- volutional Networks with Pipelined Feature Communication. In International Conference on Learning Representations

  65. [65]

    Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network em- bedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining . 1225–1234

  66. [66]

    Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. GraphGAN: Graph Representation Learning With Generative Adversarial Nets. In Proceedings of the AAAI Confer- ence on Artificial Intelligence , V ol. 32

  67. [67]

    Minjie Wang, Da Zheng, Zihao Y e, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Y u, Y u Gai, et al. 2019. Deep graph library: A graph- centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315 (2019)

  68. [68]

    Qiange Wang, Y anfeng Zhang, Hao Wang, Chaoyi Chen, Xiaodong Zhang, and Ge Y u. 2022. Neutronstar: Distributed GNN Training with Hybrid Dependency Management. In Proceedings of the 2022 International Conference on Manage- ment of Data. 1301–1315

  69. [69]

    Xiao Wang, Deyu Bo, Chuan Shi, Shaohua Fan, Y anfang Y e, and Philip S Y u

  70. [70]

    IEEE Transactions on Big Data 9, 2 (2022), 415–436

    A survey on heterogeneous graph embedding: methods, techniques, appli- cations and sources. IEEE Transactions on Big Data 9, 2 (2022), 415–436

  71. [71]

    Xuhong Wang, Ding Lyu, Mengjian Li, Y ang Xia, Qi Y ang, Xinwen Wang, Xin- guang Wang, Ping Cui, Y upu Y ang, Bowen Sun, et al. 2021. Apan: Asynchronous propagation attention network for real-time temporal graph embedding. In Pro- ceedings of the 2021 International Conference on Management of Data . 2628– 2638

  72. [72]

    Y uke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, and Y ufei Ding. 2023. MGG: Accelerating graph neural networks with Fine-Grained Intra-Kernel Communication-Computation pipelining on Multi-GPU platforms. In 17th USENIX Symposium on Operating Systems Design and Implementation . 779–795

  73. [73]

    Y uyue Wang, Xiurui Pan, Y uda An, Jie Zhang, and Glenn Reinman. 2024. BeaconGNN: Large-Scale GNN Acceleration with Out-of-Order Streaming In- Storage Computing. In IEEE International Symposium on High-Performance Computer Architecture. IEEE, 330–344. 16 FeLoG: Scalable and Efficient Distributed Graph Embedding with Feedback Loop Mechanism

  74. [74]

    Zesong Wang, Peng Fang, Fang Wang, Hong Jiang, Yimin Lu, Zhan Shi, and Dan Feng. 2025. SpiderCache: Semantic-Aware Caching Strategy for DNN Train- ing. In Proceedings of the 54th International Conference on Parallel Processing (ICPP’25). 320330

  75. [75]

    Brian J Worton. 1989. Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70, 1 (1989), 164–168

  76. [76]

    Qitian Wu, Wentao Zhao, Zenan Li, David P Wipf, and Junchi Y an. 2022. Node- former: A scalable graph structure learning transformer for node classification. Advances in Neural Information Processing Systems 35 (2022), 27387–27401

  77. [77]

    Yidi Wu, Kaihao Ma, Zhenkun Cai, Tatiana Jin, Boyang Li, Chenguang Zheng, James Cheng, and Fan Y u. 2021. Seastar: V ertex-centric programming for graph neural networks. In Proceedings of the 16th European Conference on Computer Systems. 359–375

  78. [78]

    Jaewon Y ang and Jure Leskovec. 2012. Defining and evaluating network com- munities based on ground-truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics . 1–8

  79. [79]

    Renchi Y ang, Jieming Shi, Xiaokui Xiao, Yin Y ang, and Sourav S Bhowmick

  80. [80]

    Prococeding of VLDB Endowment 13, 5 (2020), 670– 683

    Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank. Prococeding of VLDB Endowment 13, 5 (2020), 670– 683

Showing first 80 references.