A GPU Accelerated Temporal Window-Based Random Walk Sampler
Pith reviewed 2026-05-19 18:31 UTC · model grok-4.3
The pith
Tempest uses a GPU dual-index and hierarchical scheduler to generate causality-preserving random walks on billion-edge temporal streams in real time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Tempest combines a GPU-native dual-index organization over a shared edge store with a hierarchical cooperative scheduler that dispatches walks at thread, warp, or block granularity based on per-step node convergence. This design supports efficient start-edge selection, hop-by-hop causality enforcement, and window-based eviction without synchronization, while also supplying closed-form constant-time samplers for common temporal bias functions. The evaluation shows sustained real-time processing of billion-edge streams under sliding windows together with higher ingestion and walk generation throughput than prior systems, all while preserving causal correctness.
What carries the argument
The GPU-native dual-index organization over a shared edge store paired with a hierarchical cooperative scheduler that dispatches at varying granularities to enforce temporal ordering and window eviction.
If this is right
- Sustained real-time ingestion and walk generation throughput on billion-edge temporal streams under sliding windows.
- Higher performance than prior systems in both data ingestion and walk production while keeping causal correctness.
- Constant-time sampling available for standard temporal bias functions without extra per-step computation.
- Window-based eviction handled directly on the GPU without separate synchronization steps.
Where Pith is reading between the lines
- The same scheduling approach could transfer to other order-sensitive streaming tasks such as real-time event correlation on GPUs.
- Extending the dual-index structure to multiple GPUs might support even larger continuous streams without redesigning the core logic.
- Closed-form bias samplers open the door to direct mathematical analysis of how different temporal preferences affect walk statistics.
Load-bearing premise
The GPU-native dual-index organization and hierarchical cooperative scheduler can enforce strict temporal ordering and window-based eviction at scale without hidden synchronization costs or memory bottlenecks.
What would settle it
Measure whether a continuous billion-edge temporal stream processed under sliding windows produces walks that violate causal order or drop below real-time throughput rates on standard GPU hardware.
Figures
read the original abstract
Temporal random walks, which sample causality-preserving paths, are widely used to analyze time-stamped interactions in domains such as microservices, finance, and online platforms. Generating such walks at scale is challenging because real-world graphs evolve as high-volume streams, making continuous ingestion, efficient memory usage, and strict temporal ordering essential for practical deployment. We present Tempest (TEMPoral nEtwork Streaming Traversals), a GPU-accelerated engine for streaming temporal random walks. Tempest combines a GPU-native dual-index organization over a shared edge store with a hierarchical cooperative scheduler that dispatches walks at thread, warp, or block granularity based on per-step node convergence, enabling efficient start-edge selection, hop-by-hop causality enforcement, and window-based eviction without synchronization. It further provides closed-form constant-time samplers for common temporal bias functions. Our evaluation demonstrates sustained real-time processing of billion-edge streams under sliding windows, outperforming prior systems in ingestion and walk generation throughput while preserving causal correctness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Tempest, a GPU-accelerated engine for streaming temporal random walks on evolving graphs. It proposes a GPU-native dual-index organization over a shared edge store combined with a hierarchical cooperative scheduler that dispatches walks at thread, warp, or block granularity based on per-step node convergence. This enables efficient start-edge selection, hop-by-hop causality enforcement, and window-based eviction without synchronization. The system also provides closed-form constant-time samplers for common temporal bias functions. The evaluation claims sustained real-time processing of billion-edge streams under sliding windows, outperforming prior systems in ingestion and walk generation throughput while preserving causal correctness.
Significance. If the zero-synchronization property and throughput results hold under production workloads, this would be a meaningful engineering contribution to scalable temporal graph processing on GPUs. The dual-index and hierarchical scheduler approach could influence real-time analytics systems in domains such as microservices and finance, where strict causality and sliding-window eviction are required at high ingestion rates.
major comments (2)
- Abstract: The central claim that the hierarchical cooperative scheduler plus GPU-native dual-index enables 'window-based eviction without synchronization' is load-bearing for the reported throughput advantage. The manuscript does not isolate or measure potential hidden costs from per-step convergence checks, dual-index updates, or memory fences when multiple walks converge on the same node at billion-edge scale; if such costs exist, the performance edge over prior systems would be reduced.
- Evaluation section: The abstract asserts 'sustained real-time processing of billion-edge streams' and 'outperforming prior systems' while 'preserving causal correctness,' yet supplies no experimental details, workload descriptions, error bars, specific baselines, or verification method for causality. This absence prevents assessment of whether the claims are robust or if they rest on unshown experiments.
minor comments (1)
- Abstract: The description of 'closed-form constant-time samplers for common temporal bias functions' would benefit from naming the specific bias functions supported to improve clarity for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the presentation of Tempest.
read point-by-point responses
-
Referee: Abstract: The central claim that the hierarchical cooperative scheduler plus GPU-native dual-index enables 'window-based eviction without synchronization' is load-bearing for the reported throughput advantage. The manuscript does not isolate or measure potential hidden costs from per-step convergence checks, dual-index updates, or memory fences when multiple walks converge on the same node at billion-edge scale; if such costs exist, the performance edge over prior systems would be reduced.
Authors: We agree that isolating these overheads would improve clarity. The dual-index organization and hierarchical scheduler are designed to eliminate synchronization entirely by dispatching at appropriate granularity and using per-node convergence to avoid concurrent updates on the same structures. Our reported throughputs already incorporate all per-step checks and updates. In the revised version we will add a microbenchmark subsection that quantifies the cost of convergence checks and dual-index maintenance separately on billion-edge inputs, confirming that these costs remain negligible relative to the overall gains. revision: yes
-
Referee: Evaluation section: The abstract asserts 'sustained real-time processing of billion-edge streams' and 'outperforming prior systems' while 'preserving causal correctness,' yet supplies no experimental details, workload descriptions, error bars, specific baselines, or verification method for causality. This absence prevents assessment of whether the claims are robust or if they rest on unshown experiments.
Authors: The Evaluation section of the full manuscript describes the workloads (synthetic and real temporal graphs up to one billion edges), the baselines, and the causality verification procedure (post-hoc temporal-order checks on sampled walks). To address the concern directly, we will expand this section with error bars from repeated runs, explicit parameter tables, and additional figures that report the causality verification results across all tested window sizes and ingestion rates. revision: yes
Circularity Check
No circularity: engineering implementation with no derivations or fitted parameters
full rationale
The paper presents Tempest as a GPU-accelerated system for streaming temporal random walks, relying on a dual-index organization and hierarchical cooperative scheduler for causality enforcement and window eviction. No mathematical derivations, equations, or parameter-fitting steps appear in the abstract or described contributions. Claims rest on implementation details and empirical evaluation of throughput and correctness, which are independent of any self-referential logic or self-citation chains. The absence of any load-bearing 'prediction' or uniqueness theorem reduces the circularity risk to zero under the specified criteria.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
GPU-native dual-index organization over a shared edge store with a hierarchical cooperative scheduler that dispatches walks at thread, warp, or block granularity based on per-step node convergence
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Kelen, Róbert Pálovics, and András A
Ferenc Béres, Domokos M. Kelen, Róbert Pálovics, and András A. Benczúr. 2019. Node Embeddings in Dynamic Graphs.Appl. Netw. Sci.4, 1 (2019), 1–27
work page 2019
-
[2]
Chaoyi Chen, Dechao Gao, Yanfeng Zhang, Qiange Wang, Zhenbo Fu, Xuecang Zhang, Junhua Zhu, Yu Gu, and Ge Yu. 2023. NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams.Proc. VLDB Endow. 17, 3 (2023), 455–468
work page 2023
-
[3]
Anurag Dutt, Doseok Jang, Joao Nadkarni, Kai Su, and Anshul Gandhi. 2025. GUIDE: GNN-based Unified Incident Detection for Microservices Application Deployments. InProc. of IEEE ICDCS
work page 2025
-
[4]
Jessica Enright, Kitty Meeks, and Hendrik Molter. 2025. Counting Temporal Paths.Algorithmica87, 3 (2025), 736–782
work page 2025
-
[5]
Ping Gong, Renjie Liu, Zunyao Mao, Zhenkun Cai, Xiao Yan, Cheng Li, Minjie Wang, and Zhuozhao Li. 2023. gSampler: General and Efficient GPU-based Graph Sampling for Graph Learning. InProc. of ACM SOSP
work page 2023
-
[6]
Xiangyang Gou and Lei Zou. 2021. Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate Edges. InProc. of ACM SIGMOD
work page 2021
-
[7]
Hamilton, Zhitao Ying, and Jure Leskovec
William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Represen- tation Learning on Large Graphs. InProc. of NeurIPS
work page 2017
-
[8]
Mark Harris. 2007. Optimizing Parallel Reduction in CUDA. NVIDIA Developer Technology
work page 2007
-
[9]
Petter Holme and Jari Saramäki. 2012. Temporal networks.Phys. Rep.519, 3 (2012), 97–125
work page 2012
-
[10]
Chengying Huan, Yongchao Liu, Heng Zhang, Shuaiwen Song, Santosh Pandey, Shiyang Chen, Xiangfei Fang, Yue Jin, Baptiste Lepers, Yanjun Wu, and Hang Liu. 2024. TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture.ACM TACO21, 2 (2024)
work page 2024
-
[11]
Chengying Huan, Shuaiwen Leon Song, Santosh Pandey, Hang Liu, Yongchao Liu, Baptiste Lepers, Changhua He, Kang Chen, Jinlei Jiang, and Yongwei Wu
-
[12]
TEA: A General-Purpose Temporal Graph Random Walk Engine. InProc. of ACM EuroSys
-
[13]
Shixun Huang, Zhifeng Bao, Guoliang Li, Yanghao Zhou, and J. Shane Culpepper
-
[14]
Temporal Network Representation Learning via Historical Neighborhoods Aggregation. InProc. of IEEE ICDE
-
[15]
Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, and Reihaneh Rabbany. 2023. Temporal Graph Benchmark for Machine Learning on Temporal Graphs. InProc. of NeurIPS
work page 2023
-
[16]
Abhinav Jangda, Sandeep Polisetty, Arjun Guha, and Marco Serafini. 2021. Ac- celerating Graph Sampling for Graph Machine Learning Using GPUs. InProc. of ACM EuroSys
work page 2021
-
[17]
Ming Jin, Yuan-Fang Li, and Shirui Pan. 2022. Neural temporal walks: motif- aware representation learning on continuous-time dynamic graphs. InProc. of NeurIPS
work page 2022
-
[18]
Nina Klobas, George B. Mertzios, and Paul G. Spirakis. 2023. Sliding into the Future: Investigating Sliding Windows in Temporal Graphs. InProc. of MFCS
work page 2023
-
[19]
Jérôme Kunegis. 2013. KONECT: The Koblenz Network Collection. InProc. of ACM WWW Companion
work page 2013
-
[20]
Shutian Luo, Huanle Xu, Kejiang Ye, Guoyao Xu, Liping Zhang, Guodong Yang, and Chengzhong Xu. 2022. The Power of Prediction: Microservice Auto Scaling via Workload Learning. InProc. of ACM SoCC
work page 2022
-
[21]
Yang Luo, Mohan Gao, Zhemeng Yu, Haoyuan Ge, Xiaofeng Gao, Tengwei Cai, and Guihai Chen. 2024. Integrating System State into Spatio Temporal Graph Neural Network for Microservice Workload Prediction. InProc. of ACM KDD
work page 2024
-
[22]
Mehrdad Mahdavi, Saeed Khoshraftar, and Aijun An. 2018. dynnode2vec: Scalable Dynamic Network Embedding. InProc. of IEEE BigData
work page 2018
-
[23]
Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, and Xiaoliang Cong. 2024. FlowWalker: A Memory-Efficient and High-Performance GPU-Based Dynamic Graph Random Walk Framework.Proc. VLDB Endow.17, 8 (2024), 1788–1801
work page 2024
-
[24]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. InProc. of NeurIPS
work page 2013
-
[25]
Giang Hoang Nguyen, Jae-Gil Lee, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, and Sungpack Kim. 2018. Continuous-Time Dynamic Network Embeddings. InProc. of ACM WWW Companion
work page 2018
-
[26]
2012.CUDA C Best Practices Guide
NVIDIA Corporation. 2012.CUDA C Best Practices Guide. Technical Report. NVIDIA
work page 2012
-
[27]
NVIDIA Corporation. 2024. CUB: A Flexible Library of Cooperative Threadblock Primitives. https://github.com/NVIDIA/cccl/tree/main/cub
work page 2024
-
[28]
Santosh Pandey, Lingda Li, Adolfy Hoisie, Xiaoye S. Li, and Hang Liu. 2020. C-SAW: A Framework for Graph Sampling and Random Walk on GPUs. InProc. of SC
work page 2020
-
[29]
Jinhyeon Park, Bumsoo Song, Yongseok Shin, Hyeonseong Kim, Sungpack Hong, and Jinha Lee. 2026. FlexiWalker: Extensible GPU Framework for Efficient Dynamic Random Walks with Runtime Adaptation. InProc. of ACM EuroSys
work page 2026
-
[30]
Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. 2020. Temporal Graph Networks for Deep Learn- ing on Dynamic Graphs. InProc. of ICML
work page 2020
-
[31]
Jiaxin Shi, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. 2016. Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. In Proc. of USENIX OSDI
work page 2016
-
[32]
Gagan Somashekar, Anurag Dutt, Mainak Adak, Tania Lorido Botran, and An- shul Gandhi. 2024. GAMMA: Graph Neural Network-Based Multi-Bottleneck Localization for Microservices Applications. InProc. of ACM WWW
work page 2024
-
[33]
Steer, Felix Cuadrado, and Richard G
Benjamin A. Steer, Felix Cuadrado, and Richard G. Clegg. 2017. Raphtory: Decentralised Streaming for Temporal Graphs. InProc. of ACM DEBS
work page 2017
-
[34]
Shixuan Sun, Yuhang Chen, Shengliang Lu, Bingsheng He, and Yuchen Li. 2021. ThunderRW: An In-Memory Graph Random Walk Engine.Proc. VLDB Endow. 14, 11 (2021), 1992–2005
work page 2021
-
[35]
Keita Suzuki, Kenta Ishiguro, and Kenji Kono. 2024. Balancing Analysis Time and Bug Detection: Daily Development-Friendly Bug Detection in Linux. InProc. of USENIX ATC
work page 2024
-
[36]
Visa Inc. 2024. Visa Annual Report 2024. https://investor.visa.com
work page 2024
-
[37]
Pinhuan Wang, Chengying Huan, Zhibin Wang, Chen Tian, Yuede Ji, and Hang Liu. 2025. Bingo: Radix-based Bias Factorization for Random Walk on Dynamic Graphs. InProc. of ACM EuroSys
work page 2025
-
[38]
Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, and Minyi Guo. 2021. Skywalker: Efficient Alias-method-based Graph Sampling and Random Walk on GPUs. InProc. of IEEE PACT
work page 2021
-
[39]
Rui Wang, Yongkun Li, Hong Xie, Yinlong Xu, and John C. S. Lui. 2020. Graph- Walker: An I/O-efficient and Resource-friendly Graph Analytic System for Fast and Scalable Random Walks. InProc. of USENIX ATC
work page 2020
-
[40]
Xinshi Wang, Dingxian Zhang, Da Zheng, Alex Smola, and Le Song. 2021. In- ductive Link Prediction on Temporal Graphs via Causal Anonymous Walks. In Proc. of ICLR
work page 2021
-
[41]
Brian Wickman, Hong Hu, Insu Yun, DaeHee Jang, JungWon Lim, Sanidhya Kashyap, and Taesoo Kim. 2021. Preventing Use-After-Free Attacks with Fast Forward Allocation. InProc. of USENIX Security
work page 2021
-
[42]
Huanhuan Wu, Yuzhen Huang, James Cheng, Jinfeng Li, and Yiping Ke. 2016. Reachability and Time-based Path Queries in Temporal Graphs. InProc. of IEEE ICDE
work page 2016
-
[43]
Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan
-
[44]
Inductive Representation Learning on Temporal Graphs. InProc. of ICLR
-
[45]
Wentao Xu, Weiqing Liu, Chang Xu, Jiang Bian, Jian Yin, and Tie-Yan Liu. 2021. REST: Relational Event-driven Stock Trend Forecasting. InProc. of ACM WWW
work page 2021
-
[46]
Ke Yang, Xiaosong Ma, Saravanan Thirumuruganathan, Kang Chen, and Yongwei Wu. 2021. Random Walks on Huge Graphs at Cache Efficiency. InProc. of ACM SOSP
work page 2021
-
[47]
Ke Yang, MingXing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, and Yong Jiang
-
[48]
KnightKing: A Fast Distributed Graph Random Walk Engine. InProc. of ACM SOSP
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.