A GPU Accelerated Temporal Window-Based Random Walk Sampler

George Parisis; Luc Berthouze; Md Ashfaq Salehin

arxiv: 2605.16182 · v1 · pith:IU5OHGNMnew · submitted 2026-05-15 · 💻 cs.DC

A GPU Accelerated Temporal Window-Based Random Walk Sampler

Md Ashfaq Salehin , George Parisis , Luc Berthouze This is my paper

Pith reviewed 2026-05-19 18:31 UTC · model grok-4.3

classification 💻 cs.DC

keywords temporal random walksGPU accelerationstreaming graphssliding windowscausal samplinghigh-throughput processinggraph streamswindow-based eviction

0 comments

The pith

Tempest uses a GPU dual-index and hierarchical scheduler to generate causality-preserving random walks on billion-edge temporal streams in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Tempest, a GPU-accelerated engine for producing random walks that respect the chronological order of events in time-stamped interaction graphs. Such walks support analysis in areas like microservices monitoring, financial transaction tracking, and online platform behavior where timing determines causality. The approach tackles the difficulties of ingesting continuous high-volume streams, using memory efficiently, and enforcing temporal constraints during sampling. Success here would permit ongoing large-scale examination of evolving networks at speeds suitable for live decision-making rather than offline batch processing.

Core claim

Tempest combines a GPU-native dual-index organization over a shared edge store with a hierarchical cooperative scheduler that dispatches walks at thread, warp, or block granularity based on per-step node convergence. This design supports efficient start-edge selection, hop-by-hop causality enforcement, and window-based eviction without synchronization, while also supplying closed-form constant-time samplers for common temporal bias functions. The evaluation shows sustained real-time processing of billion-edge streams under sliding windows together with higher ingestion and walk generation throughput than prior systems, all while preserving causal correctness.

What carries the argument

The GPU-native dual-index organization over a shared edge store paired with a hierarchical cooperative scheduler that dispatches at varying granularities to enforce temporal ordering and window eviction.

If this is right

Sustained real-time ingestion and walk generation throughput on billion-edge temporal streams under sliding windows.
Higher performance than prior systems in both data ingestion and walk production while keeping causal correctness.
Constant-time sampling available for standard temporal bias functions without extra per-step computation.
Window-based eviction handled directly on the GPU without separate synchronization steps.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same scheduling approach could transfer to other order-sensitive streaming tasks such as real-time event correlation on GPUs.
Extending the dual-index structure to multiple GPUs might support even larger continuous streams without redesigning the core logic.
Closed-form bias samplers open the door to direct mathematical analysis of how different temporal preferences affect walk statistics.

Load-bearing premise

The GPU-native dual-index organization and hierarchical cooperative scheduler can enforce strict temporal ordering and window-based eviction at scale without hidden synchronization costs or memory bottlenecks.

What would settle it

Measure whether a continuous billion-edge temporal stream processed under sliding windows produces walks that violate causal order or drop below real-time throughput rates on standard GPU hardware.

Figures

Figures reproduced from arXiv: 2605.16182 by George Parisis, Luc Berthouze, Md Ashfaq Salehin.

**Figure 3.** Figure 3: Full-walk execution where each thread advances [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 2.** Figure 2: shows the layout. A node-group offset array locates each node’s edge region; a node–timestamp-group offset array locates each timestamp sub-group within that region. Both offset arrays are built once per batch and reused across all walks. Node Color Map: • n1 edges • n2 edges Node-TS Group Color Map: • (n1, t1) group • (n1, t2) group • (n2, t1) group • (n2, t2) group Edge Array: n1 → n2 t1 n1 → n3 t2 n1 → … view at source ↗

**Figure 5.** Figure 5: Dispatch plane. The 𝑊 axis selects the execution unit; the 𝐺 axis selects whether the node’s adjacency metadata is preloaded into smem or read from global memory. W thresholds are obtained empirically (Section 3.5). The 𝐺 thresholds set memory-tier boundaries. In the warp tier, the metadata sits in a warp’s slice of the block’s shared-memory allocation. In the block tier, it occupies the full block’s allo… view at source ↗

**Figure 4.** Figure 4: Three cooperative execution units. Solo dispatches [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 7.** Figure 7: Scaling on Konect-Delicious (1K to 301M edges). [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗

**Figure 6.** Figure 6: Cumulative streaming performance on the Alibaba [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 9.** Figure 9: 𝑊warp sweep across three datasets. Per-dataset throughput is normalized to its own maximum; the bold curve is the cross-dataset mean. 3.6 Cooperative Scheduler Ablation We isolate the two mechanisms of the cooperative scheduler (Section 2.4) by comparing three execution variants on TGBL-Coin, TGBL-Flight, and Konect-Delicious. Variants. Full-Walk is the one-thread-per-walk baseline of Section 2.4.1: each… view at source ↗

**Figure 8.** Figure 8: Block dimension sweep on TGBL-Coin. Solo–warp boundary [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 10.** Figure 10: Window sensitivity on TGB datasets. (b) reports [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗

**Figure 11.** Figure 11: Memory usage. (a) Edge scaling in bulk mode [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗

read the original abstract

Temporal random walks, which sample causality-preserving paths, are widely used to analyze time-stamped interactions in domains such as microservices, finance, and online platforms. Generating such walks at scale is challenging because real-world graphs evolve as high-volume streams, making continuous ingestion, efficient memory usage, and strict temporal ordering essential for practical deployment. We present Tempest (TEMPoral nEtwork Streaming Traversals), a GPU-accelerated engine for streaming temporal random walks. Tempest combines a GPU-native dual-index organization over a shared edge store with a hierarchical cooperative scheduler that dispatches walks at thread, warp, or block granularity based on per-step node convergence, enabling efficient start-edge selection, hop-by-hop causality enforcement, and window-based eviction without synchronization. It further provides closed-form constant-time samplers for common temporal bias functions. Our evaluation demonstrates sustained real-time processing of billion-edge streams under sliding windows, outperforming prior systems in ingestion and walk generation throughput while preserving causal correctness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Tempest pairs a GPU dual-index with a hierarchical scheduler for temporal walks on streams and reports solid throughput gains, but the no-synchronization claim for causality and eviction looks under-tested at scale.

read the letter

Tempest is a GPU engine for sampling temporal random walks on high-volume edge streams. It stores edges in a dual-index layout and uses a scheduler that switches between thread, warp, and block dispatch depending on how many walks converge on a node at each step. The design aims to pick start edges, enforce hop-by-hop temporal order, and evict old edges from sliding windows without explicit synchronization, while also supplying closed-form samplers for common bias functions. The evaluation claims sustained real-time throughput on billion-edge streams and better numbers than earlier systems on both ingestion and walk generation, all while keeping causal correctness. That combination of storage layout and adaptive granularity is the concrete new piece; prior temporal walk systems are cited but none are said to use exactly this GPU-native mix. The practical focus on continuous ingestion and memory bounds for streaming data is useful for anyone who actually runs these analyses on live graphs. The main soft spot is the zero-synchronization claim. Maintaining strict temporal order and window eviction across concurrent walks typically requires some form of atomic timestamp check or memory barrier when walks meet at the same node. If those costs are present but not isolated in the reported timings, the throughput edge over prior work would shrink under full production loads. The abstract does not break out those overheads or show results under varying convergence rates, so the strongest performance statement rests on an assumption that still needs direct measurement. The paper is aimed at systems builders who tune GPU graph engines for temporal data in domains like networks or finance. A reader who cares about concrete implementation choices for streaming traversal will get usable details. It is worth sending to peer review because the architecture is testable and the engineering claims are specific enough for referees to check against code or additional runs.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Tempest, a GPU-accelerated engine for streaming temporal random walks on evolving graphs. It proposes a GPU-native dual-index organization over a shared edge store combined with a hierarchical cooperative scheduler that dispatches walks at thread, warp, or block granularity based on per-step node convergence. This enables efficient start-edge selection, hop-by-hop causality enforcement, and window-based eviction without synchronization. The system also provides closed-form constant-time samplers for common temporal bias functions. The evaluation claims sustained real-time processing of billion-edge streams under sliding windows, outperforming prior systems in ingestion and walk generation throughput while preserving causal correctness.

Significance. If the zero-synchronization property and throughput results hold under production workloads, this would be a meaningful engineering contribution to scalable temporal graph processing on GPUs. The dual-index and hierarchical scheduler approach could influence real-time analytics systems in domains such as microservices and finance, where strict causality and sliding-window eviction are required at high ingestion rates.

major comments (2)

Abstract: The central claim that the hierarchical cooperative scheduler plus GPU-native dual-index enables 'window-based eviction without synchronization' is load-bearing for the reported throughput advantage. The manuscript does not isolate or measure potential hidden costs from per-step convergence checks, dual-index updates, or memory fences when multiple walks converge on the same node at billion-edge scale; if such costs exist, the performance edge over prior systems would be reduced.
Evaluation section: The abstract asserts 'sustained real-time processing of billion-edge streams' and 'outperforming prior systems' while 'preserving causal correctness,' yet supplies no experimental details, workload descriptions, error bars, specific baselines, or verification method for causality. This absence prevents assessment of whether the claims are robust or if they rest on unshown experiments.

minor comments (1)

Abstract: The description of 'closed-form constant-time samplers for common temporal bias functions' would benefit from naming the specific bias functions supported to improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the presentation of Tempest.

read point-by-point responses

Referee: Abstract: The central claim that the hierarchical cooperative scheduler plus GPU-native dual-index enables 'window-based eviction without synchronization' is load-bearing for the reported throughput advantage. The manuscript does not isolate or measure potential hidden costs from per-step convergence checks, dual-index updates, or memory fences when multiple walks converge on the same node at billion-edge scale; if such costs exist, the performance edge over prior systems would be reduced.

Authors: We agree that isolating these overheads would improve clarity. The dual-index organization and hierarchical scheduler are designed to eliminate synchronization entirely by dispatching at appropriate granularity and using per-node convergence to avoid concurrent updates on the same structures. Our reported throughputs already incorporate all per-step checks and updates. In the revised version we will add a microbenchmark subsection that quantifies the cost of convergence checks and dual-index maintenance separately on billion-edge inputs, confirming that these costs remain negligible relative to the overall gains. revision: yes
Referee: Evaluation section: The abstract asserts 'sustained real-time processing of billion-edge streams' and 'outperforming prior systems' while 'preserving causal correctness,' yet supplies no experimental details, workload descriptions, error bars, specific baselines, or verification method for causality. This absence prevents assessment of whether the claims are robust or if they rest on unshown experiments.

Authors: The Evaluation section of the full manuscript describes the workloads (synthetic and real temporal graphs up to one billion edges), the baselines, and the causality verification procedure (post-hoc temporal-order checks on sampled walks). To address the concern directly, we will expand this section with error bars from repeated runs, explicit parameter tables, and additional figures that report the causality verification results across all tested window sizes and ingestion rates. revision: yes

Circularity Check

0 steps flagged

No circularity: engineering implementation with no derivations or fitted parameters

full rationale

The paper presents Tempest as a GPU-accelerated system for streaming temporal random walks, relying on a dual-index organization and hierarchical cooperative scheduler for causality enforcement and window eviction. No mathematical derivations, equations, or parameter-fitting steps appear in the abstract or described contributions. Claims rest on implementation details and empirical evaluation of throughput and correctness, which are independent of any self-referential logic or self-citation chains. The absence of any load-bearing 'prediction' or uniqueness theorem reduces the circularity risk to zero under the specified criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract relies on standard assumptions about GPU memory hierarchies and parallel scheduling but introduces no explicit free parameters, ad-hoc axioms, or new invented entities.

pith-pipeline@v0.9.0 · 5694 in / 1019 out tokens · 29596 ms · 2026-05-19T18:31:12.546696+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

GPU-native dual-index organization over a shared edge store with a hierarchical cooperative scheduler that dispatches walks at thread, warp, or block granularity based on per-step node convergence

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages

[1]

Kelen, Róbert Pálovics, and András A

Ferenc Béres, Domokos M. Kelen, Róbert Pálovics, and András A. Benczúr. 2019. Node Embeddings in Dynamic Graphs.Appl. Netw. Sci.4, 1 (2019), 1–27

work page 2019
[2]

Chaoyi Chen, Dechao Gao, Yanfeng Zhang, Qiange Wang, Zhenbo Fu, Xuecang Zhang, Junhua Zhu, Yu Gu, and Ge Yu. 2023. NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams.Proc. VLDB Endow. 17, 3 (2023), 455–468

work page 2023
[3]

Anurag Dutt, Doseok Jang, Joao Nadkarni, Kai Su, and Anshul Gandhi. 2025. GUIDE: GNN-based Unified Incident Detection for Microservices Application Deployments. InProc. of IEEE ICDCS

work page 2025
[4]

Jessica Enright, Kitty Meeks, and Hendrik Molter. 2025. Counting Temporal Paths.Algorithmica87, 3 (2025), 736–782

work page 2025
[5]

Ping Gong, Renjie Liu, Zunyao Mao, Zhenkun Cai, Xiao Yan, Cheng Li, Minjie Wang, and Zhuozhao Li. 2023. gSampler: General and Efficient GPU-based Graph Sampling for Graph Learning. InProc. of ACM SOSP

work page 2023
[6]

Xiangyang Gou and Lei Zou. 2021. Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate Edges. InProc. of ACM SIGMOD

work page 2021
[7]

Hamilton, Zhitao Ying, and Jure Leskovec

William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Represen- tation Learning on Large Graphs. InProc. of NeurIPS

work page 2017
[8]

Mark Harris. 2007. Optimizing Parallel Reduction in CUDA. NVIDIA Developer Technology

work page 2007
[9]

Petter Holme and Jari Saramäki. 2012. Temporal networks.Phys. Rep.519, 3 (2012), 97–125

work page 2012
[10]

Chengying Huan, Yongchao Liu, Heng Zhang, Shuaiwen Song, Santosh Pandey, Shiyang Chen, Xiangfei Fang, Yue Jin, Baptiste Lepers, Yanjun Wu, and Hang Liu. 2024. TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture.ACM TACO21, 2 (2024)

work page 2024
[11]

Chengying Huan, Shuaiwen Leon Song, Santosh Pandey, Hang Liu, Yongchao Liu, Baptiste Lepers, Changhua He, Kang Chen, Jinlei Jiang, and Yongwei Wu

work page
[12]

TEA: A General-Purpose Temporal Graph Random Walk Engine. InProc. of ACM EuroSys

work page
[13]

Shane Culpepper

Shixun Huang, Zhifeng Bao, Guoliang Li, Yanghao Zhou, and J. Shane Culpepper

work page
[14]

Temporal Network Representation Learning via Historical Neighborhoods Aggregation. InProc. of IEEE ICDE

work page
[15]

Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, and Reihaneh Rabbany. 2023. Temporal Graph Benchmark for Machine Learning on Temporal Graphs. InProc. of NeurIPS

work page 2023
[16]

Abhinav Jangda, Sandeep Polisetty, Arjun Guha, and Marco Serafini. 2021. Ac- celerating Graph Sampling for Graph Machine Learning Using GPUs. InProc. of ACM EuroSys

work page 2021
[17]

Ming Jin, Yuan-Fang Li, and Shirui Pan. 2022. Neural temporal walks: motif- aware representation learning on continuous-time dynamic graphs. InProc. of NeurIPS

work page 2022
[18]

Mertzios, and Paul G

Nina Klobas, George B. Mertzios, and Paul G. Spirakis. 2023. Sliding into the Future: Investigating Sliding Windows in Temporal Graphs. InProc. of MFCS

work page 2023
[19]

Jérôme Kunegis. 2013. KONECT: The Koblenz Network Collection. InProc. of ACM WWW Companion

work page 2013
[20]

Shutian Luo, Huanle Xu, Kejiang Ye, Guoyao Xu, Liping Zhang, Guodong Yang, and Chengzhong Xu. 2022. The Power of Prediction: Microservice Auto Scaling via Workload Learning. InProc. of ACM SoCC

work page 2022
[21]

Yang Luo, Mohan Gao, Zhemeng Yu, Haoyuan Ge, Xiaofeng Gao, Tengwei Cai, and Guihai Chen. 2024. Integrating System State into Spatio Temporal Graph Neural Network for Microservice Workload Prediction. InProc. of ACM KDD

work page 2024
[22]

Mehrdad Mahdavi, Saeed Khoshraftar, and Aijun An. 2018. dynnode2vec: Scalable Dynamic Network Embedding. InProc. of IEEE BigData

work page 2018
[23]

Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, and Xiaoliang Cong. 2024. FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework.Proc. VLDB Endow.17, 8 (2024), 1788–1801

work page 2024
[24]

Corrado, and Jeffrey Dean

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. InProc. of NeurIPS

work page 2013
[25]

Rossi, Nesreen K

Giang Hoang Nguyen, Jae-Gil Lee, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, and Sungpack Kim. 2018. Continuous-Time Dynamic Network Embeddings. InProc. of ACM WWW Companion

work page 2018
[26]

2012.CUDA C Best Practices Guide

NVIDIA Corporation. 2012.CUDA C Best Practices Guide. Technical Report. NVIDIA

work page 2012
[27]

NVIDIA Corporation. 2024. CUB: A Flexible Library of Cooperative Threadblock Primitives. https://github.com/NVIDIA/cccl/tree/main/cub

work page 2024
[28]

Li, and Hang Liu

Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S. Li, and Hang Liu. 2020. C-SAW: A Framework for Graph Sampling and Random Walk on GPUs. InProc. of SC

work page 2020
[29]

Jinhyeon Park, Bumsoo Song, Yongseok Shin, Hyeonseong Kim, Sungpack Hong, and Jinha Lee. 2026. FlexiWalker: Extensible GPU Framework for Efficient Dynamic Random Walks with Runtime Adaptation. InProc. of ACM EuroSys

work page 2026
[30]

Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. 2020. Temporal Graph Networks for Deep Learn- ing on Dynamic Graphs. InProc. of ICML

work page 2020
[31]

Jiaxin Shi, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. 2016. Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. In Proc. of USENIX OSDI

work page 2016
[32]

Gagan Somashekar, Anurag Dutt, Mainak Adak, Tania Lorido Botran, and An- shul Gandhi. 2024. GAMMA: Graph Neural Network-Based Multi-Bottleneck Localization for Microservices Applications. InProc. of ACM WWW

work page 2024
[33]

Steer, Felix Cuadrado, and Richard G

Benjamin A. Steer, Felix Cuadrado, and Richard G. Clegg. 2017. Raphtory: Decentralised Streaming for Temporal Graphs. InProc. of ACM DEBS

work page 2017
[34]

Shixuan Sun, Yuhang Chen, Shengliang Lu, Bingsheng He, and Yuchen Li. 2021. ThunderRW: An In-Memory Graph Random Walk Engine.Proc. VLDB Endow. 14, 11 (2021), 1992–2005

work page 2021
[35]

Keita Suzuki, Kenta Ishiguro, and Kenji Kono. 2024. Balancing Analysis Time and Bug Detection: Daily Development-Friendly Bug Detection in Linux. InProc. of USENIX ATC

work page 2024
[36]

Visa Inc. 2024. Visa Annual Report 2024. https://investor.visa.com

work page 2024
[37]

Pinhuan Wang, Chengying Huan, Zhibin Wang, Chen Tian, Yuede Ji, and Hang Liu. 2025. Bingo: Radix-based Bias Factorization for Random Walk on Dynamic Graphs. InProc. of ACM EuroSys

work page 2025
[38]

Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, and Minyi Guo. 2021. Skywalker: Efficient Alias-method-based Graph Sampling and Random Walk on GPUs. InProc. of IEEE PACT

work page 2021
[39]

Rui Wang, Yongkun Li, Hong Xie, Yinlong Xu, and John C. S. Lui. 2020. Graph- Walker: An I/O-efficient and Resource-friendly Graph Analytic System for Fast and Scalable Random Walks. InProc. of USENIX ATC

work page 2020
[40]

Xinshi Wang, Dingxian Zhang, Da Zheng, Alex Smola, and Le Song. 2021. In- ductive Link Prediction on Temporal Graphs via Causal Anonymous Walks. In Proc. of ICLR

work page 2021
[41]

Brian Wickman, Hong Hu, Insu Yun, DaeHee Jang, JungWon Lim, Sanidhya Kashyap, and Taesoo Kim. 2021. Preventing Use-After-Free Attacks with Fast Forward Allocation. InProc. of USENIX Security

work page 2021
[42]

Huanhuan Wu, Yuzhen Huang, James Cheng, Jinfeng Li, and Yiping Ke. 2016. Reachability and Time-based Path Queries in Temporal Graphs. InProc. of IEEE ICDE

work page 2016
[43]

Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan

work page
[44]

Inductive Representation Learning on Temporal Graphs. InProc. of ICLR

work page
[45]

Wentao Xu, Weiqing Liu, Chang Xu, Jiang Bian, Jian Yin, and Tie-Yan Liu. 2021. REST: Relational Event-driven Stock Trend Forecasting. InProc. of ACM WWW

work page 2021
[46]

Ke Yang, Xiaosong Ma, Saravanan Thirumuruganathan, Kang Chen, and Yongwei Wu. 2021. Random Walks on Huge Graphs at Cache Efficiency. InProc. of ACM SOSP

work page 2021
[47]

Ke Yang, MingXing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, and Yong Jiang

work page
[48]

KnightKing: A Fast Distributed Graph Random Walk Engine. InProc. of ACM SOSP

work page

[1] [1]

Kelen, Róbert Pálovics, and András A

Ferenc Béres, Domokos M. Kelen, Róbert Pálovics, and András A. Benczúr. 2019. Node Embeddings in Dynamic Graphs.Appl. Netw. Sci.4, 1 (2019), 1–27

work page 2019

[2] [2]

Chaoyi Chen, Dechao Gao, Yanfeng Zhang, Qiange Wang, Zhenbo Fu, Xuecang Zhang, Junhua Zhu, Yu Gu, and Ge Yu. 2023. NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams.Proc. VLDB Endow. 17, 3 (2023), 455–468

work page 2023

[3] [3]

Anurag Dutt, Doseok Jang, Joao Nadkarni, Kai Su, and Anshul Gandhi. 2025. GUIDE: GNN-based Unified Incident Detection for Microservices Application Deployments. InProc. of IEEE ICDCS

work page 2025

[4] [4]

Jessica Enright, Kitty Meeks, and Hendrik Molter. 2025. Counting Temporal Paths.Algorithmica87, 3 (2025), 736–782

work page 2025

[5] [5]

Ping Gong, Renjie Liu, Zunyao Mao, Zhenkun Cai, Xiao Yan, Cheng Li, Minjie Wang, and Zhuozhao Li. 2023. gSampler: General and Efficient GPU-based Graph Sampling for Graph Learning. InProc. of ACM SOSP

work page 2023

[6] [6]

Xiangyang Gou and Lei Zou. 2021. Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate Edges. InProc. of ACM SIGMOD

work page 2021

[7] [7]

Hamilton, Zhitao Ying, and Jure Leskovec

William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Represen- tation Learning on Large Graphs. InProc. of NeurIPS

work page 2017

[8] [8]

Mark Harris. 2007. Optimizing Parallel Reduction in CUDA. NVIDIA Developer Technology

work page 2007

[9] [9]

Petter Holme and Jari Saramäki. 2012. Temporal networks.Phys. Rep.519, 3 (2012), 97–125

work page 2012

[10] [10]

Chengying Huan, Yongchao Liu, Heng Zhang, Shuaiwen Song, Santosh Pandey, Shiyang Chen, Xiangfei Fang, Yue Jin, Baptiste Lepers, Yanjun Wu, and Hang Liu. 2024. TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture.ACM TACO21, 2 (2024)

work page 2024

[11] [11]

Chengying Huan, Shuaiwen Leon Song, Santosh Pandey, Hang Liu, Yongchao Liu, Baptiste Lepers, Changhua He, Kang Chen, Jinlei Jiang, and Yongwei Wu

work page

[12] [12]

TEA: A General-Purpose Temporal Graph Random Walk Engine. InProc. of ACM EuroSys

work page

[13] [13]

Shane Culpepper

Shixun Huang, Zhifeng Bao, Guoliang Li, Yanghao Zhou, and J. Shane Culpepper

work page

[14] [14]

Temporal Network Representation Learning via Historical Neighborhoods Aggregation. InProc. of IEEE ICDE

work page

[15] [15]

Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, and Reihaneh Rabbany. 2023. Temporal Graph Benchmark for Machine Learning on Temporal Graphs. InProc. of NeurIPS

work page 2023

[16] [16]

Abhinav Jangda, Sandeep Polisetty, Arjun Guha, and Marco Serafini. 2021. Ac- celerating Graph Sampling for Graph Machine Learning Using GPUs. InProc. of ACM EuroSys

work page 2021

[17] [17]

Ming Jin, Yuan-Fang Li, and Shirui Pan. 2022. Neural temporal walks: motif- aware representation learning on continuous-time dynamic graphs. InProc. of NeurIPS

work page 2022

[18] [18]

Mertzios, and Paul G

Nina Klobas, George B. Mertzios, and Paul G. Spirakis. 2023. Sliding into the Future: Investigating Sliding Windows in Temporal Graphs. InProc. of MFCS

work page 2023

[19] [19]

Jérôme Kunegis. 2013. KONECT: The Koblenz Network Collection. InProc. of ACM WWW Companion

work page 2013

[20] [20]

Shutian Luo, Huanle Xu, Kejiang Ye, Guoyao Xu, Liping Zhang, Guodong Yang, and Chengzhong Xu. 2022. The Power of Prediction: Microservice Auto Scaling via Workload Learning. InProc. of ACM SoCC

work page 2022

[21] [21]

Yang Luo, Mohan Gao, Zhemeng Yu, Haoyuan Ge, Xiaofeng Gao, Tengwei Cai, and Guihai Chen. 2024. Integrating System State into Spatio Temporal Graph Neural Network for Microservice Workload Prediction. InProc. of ACM KDD

work page 2024

[22] [22]

Mehrdad Mahdavi, Saeed Khoshraftar, and Aijun An. 2018. dynnode2vec: Scalable Dynamic Network Embedding. InProc. of IEEE BigData

work page 2018

[23] [23]

Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, and Xiaoliang Cong. 2024. FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework.Proc. VLDB Endow.17, 8 (2024), 1788–1801

work page 2024

[24] [24]

Corrado, and Jeffrey Dean

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. InProc. of NeurIPS

work page 2013

[25] [25]

Rossi, Nesreen K

Giang Hoang Nguyen, Jae-Gil Lee, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, and Sungpack Kim. 2018. Continuous-Time Dynamic Network Embeddings. InProc. of ACM WWW Companion

work page 2018

[26] [26]

2012.CUDA C Best Practices Guide

NVIDIA Corporation. 2012.CUDA C Best Practices Guide. Technical Report. NVIDIA

work page 2012

[27] [27]

NVIDIA Corporation. 2024. CUB: A Flexible Library of Cooperative Threadblock Primitives. https://github.com/NVIDIA/cccl/tree/main/cub

work page 2024

[28] [28]

Li, and Hang Liu

Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S. Li, and Hang Liu. 2020. C-SAW: A Framework for Graph Sampling and Random Walk on GPUs. InProc. of SC

work page 2020

[29] [29]

Jinhyeon Park, Bumsoo Song, Yongseok Shin, Hyeonseong Kim, Sungpack Hong, and Jinha Lee. 2026. FlexiWalker: Extensible GPU Framework for Efficient Dynamic Random Walks with Runtime Adaptation. InProc. of ACM EuroSys

work page 2026

[30] [30]

Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. 2020. Temporal Graph Networks for Deep Learn- ing on Dynamic Graphs. InProc. of ICML

work page 2020

[31] [31]

Jiaxin Shi, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. 2016. Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. In Proc. of USENIX OSDI

work page 2016

[32] [32]

Gagan Somashekar, Anurag Dutt, Mainak Adak, Tania Lorido Botran, and An- shul Gandhi. 2024. GAMMA: Graph Neural Network-Based Multi-Bottleneck Localization for Microservices Applications. InProc. of ACM WWW

work page 2024

[33] [33]

Steer, Felix Cuadrado, and Richard G

Benjamin A. Steer, Felix Cuadrado, and Richard G. Clegg. 2017. Raphtory: Decentralised Streaming for Temporal Graphs. InProc. of ACM DEBS

work page 2017

[34] [34]

Shixuan Sun, Yuhang Chen, Shengliang Lu, Bingsheng He, and Yuchen Li. 2021. ThunderRW: An In-Memory Graph Random Walk Engine.Proc. VLDB Endow. 14, 11 (2021), 1992–2005

work page 2021

[35] [35]

Keita Suzuki, Kenta Ishiguro, and Kenji Kono. 2024. Balancing Analysis Time and Bug Detection: Daily Development-Friendly Bug Detection in Linux. InProc. of USENIX ATC

work page 2024

[36] [36]

Visa Inc. 2024. Visa Annual Report 2024. https://investor.visa.com

work page 2024

[37] [37]

Pinhuan Wang, Chengying Huan, Zhibin Wang, Chen Tian, Yuede Ji, and Hang Liu. 2025. Bingo: Radix-based Bias Factorization for Random Walk on Dynamic Graphs. InProc. of ACM EuroSys

work page 2025

[38] [38]

Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, and Minyi Guo. 2021. Skywalker: Efficient Alias-method-based Graph Sampling and Random Walk on GPUs. InProc. of IEEE PACT

work page 2021

[39] [39]

Rui Wang, Yongkun Li, Hong Xie, Yinlong Xu, and John C. S. Lui. 2020. Graph- Walker: An I/O-efficient and Resource-friendly Graph Analytic System for Fast and Scalable Random Walks. InProc. of USENIX ATC

work page 2020

[40] [40]

Xinshi Wang, Dingxian Zhang, Da Zheng, Alex Smola, and Le Song. 2021. In- ductive Link Prediction on Temporal Graphs via Causal Anonymous Walks. In Proc. of ICLR

work page 2021

[41] [41]

Brian Wickman, Hong Hu, Insu Yun, DaeHee Jang, JungWon Lim, Sanidhya Kashyap, and Taesoo Kim. 2021. Preventing Use-After-Free Attacks with Fast Forward Allocation. InProc. of USENIX Security

work page 2021

[42] [42]

Huanhuan Wu, Yuzhen Huang, James Cheng, Jinfeng Li, and Yiping Ke. 2016. Reachability and Time-based Path Queries in Temporal Graphs. InProc. of IEEE ICDE

work page 2016

[43] [43]

Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan

work page

[44] [44]

Inductive Representation Learning on Temporal Graphs. InProc. of ICLR

work page

[45] [45]

Wentao Xu, Weiqing Liu, Chang Xu, Jiang Bian, Jian Yin, and Tie-Yan Liu. 2021. REST: Relational Event-driven Stock Trend Forecasting. InProc. of ACM WWW

work page 2021

[46] [46]

Ke Yang, Xiaosong Ma, Saravanan Thirumuruganathan, Kang Chen, and Yongwei Wu. 2021. Random Walks on Huge Graphs at Cache Efficiency. InProc. of ACM SOSP

work page 2021

[47] [47]

Ke Yang, MingXing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, and Yong Jiang

work page

[48] [48]

KnightKing: A Fast Distributed Graph Random Walk Engine. InProc. of ACM SOSP

work page