pith.

arXiv: 2506.07179 · v3 · submitted 2025-06-08 · 💻 cs.LG · cs.AI

Efficient Traffic Forecasting on Large-Scale Road Network by Regularized Adaptive Graph Convolution

Pith reviewed 2026-05-19 11:05 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords traffic forecasting · graph convolutional networks · large-scale networks · adaptive graph learning · spatial-temporal prediction · cosine similarity · regularization

The pith

A cosine-similarity operator on learned embeddings lets graph models forecast traffic on large road networks in linear time while keeping high accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles the quadratic scaling barrier that prevents standard graph convolutions from handling big road networks in traffic prediction. It replaces full pairwise neighborhood aggregation with an Efficient Cosine Operator that performs graph convolution weighted by cosine similarities between node embeddings in linear time. A regularized adaptive framework then combines stochastic shared embeddings with a residual difference mechanism to refine those embeddings and recover predictive power. Experiments on four large real-world traffic datasets show that the resulting model exceeds prior methods in accuracy while remaining competitive in computational efficiency.

Core claim

The Regularized Adaptive Graph Convolution (RAGC) model uses an Efficient Cosine Operator (ECO) to perform graph convolution via cosine similarity of node embeddings at linear complexity, and combines Stochastic Shared Embedding (SSE) with adaptive graph convolution through a residual difference mechanism to produce high-quality embeddings that support accurate spatial-temporal traffic forecasts on large-scale networks.

What carries the argument

The Efficient Cosine Operator (ECO), which replaces traditional quadratic-cost graph convolution with a linear-time convolution whose aggregation weights are cosine similarities between learned node embeddings.
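The reordering that lets a cosine-similarity graph convolution run in time linear in the number of nodes N can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' ECO: it assumes the operator aggregates node features X with the plain cosine-similarity matrix of the learned embeddings E (no softmax, ReLU, or degree normalization, any of which the paper's operator may add), and the function names are hypothetical. Because matrix products are associative, (ÊÊᵀ)X equals Ê(ÊᵀX), and the second ordering never builds an N × N matrix.

```python
import numpy as np

def row_normalize(E, eps=1e-12):
    """Scale each node embedding to unit L2 norm so dot products become cosines."""
    return E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), eps)

def cosine_conv_quadratic(E, X):
    """Reference ordering: materializes the N x N cosine-similarity matrix (O(N^2) memory)."""
    En = row_normalize(E)
    A = En @ En.T              # (N, N) cosine similarities
    return A @ X               # (N, C) aggregated features

def cosine_conv_linear(E, X):
    """Same product, reassociated as En @ (En^T @ X); no N x N matrix is ever formed."""
    En = row_normalize(E)
    return En @ (En.T @ X)     # intermediate is only (d, C); cost O(N * d * C)

# Usage on random data: both orderings agree up to floating-point rounding.
rng = np.random.default_rng(0)
N, d, C = 1000, 32, 8          # hypothetical sizes: nodes, embedding dim, feature channels
E = rng.standard_normal((N, d))
X = rng.standard_normal((N, C))
assert np.allclose(cosine_conv_quadratic(E, X), cosine_conv_linear(E, X))
```

Under this assumption the reordering is exact rather than an approximation, so any accuracy gap against a full graph convolution would come from the choice of kernel (for example, AGCRN-style adaptive convolution uses softmax(ReLU(EEᵀ)), whose entry-wise softmax cannot be reassociated this way), not from the linear-time evaluation itself.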

If this is right

  • Graph-convolutional traffic forecasting becomes feasible on road networks whose size makes quadratic-cost STGCN variants impractical.
  • The regularization components allow embedding quality to improve without increasing the asymptotic cost of each layer.
  • Competitive run times support repeated forecasting cycles needed for dynamic routing or signal control.
  • Outperformance holds across multiple real datasets, indicating the design is not tied to one city's traffic pattern.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same linear operator could be tested on other large spatial graphs such as power grids or sensor networks for similar efficiency gains.
  • If embedding similarities prove stable, the approach may reduce reliance on manual spatial partitioning in future graph forecasting work.
  • Extending the residual mechanism to longer temporal horizons could address multi-step prediction without extra quadratic terms.

Load-bearing premise

Cosine similarity among the learned node embeddings captures the essential spatial dependencies in traffic data closely enough that the approximation does not meaningfully reduce prediction quality.

What would settle it

Run both the proposed operator and a standard full graph convolution on a network small enough for the full version to finish, then compare their mean absolute errors across several random seeds to see whether the accuracy gap exceeds seed-to-seed variation.

Figures

Figures reproduced from arXiv: 2506.07179 by Kaiqi Wu, Sen Zhang, Weiyang Kong, Yubao Liu, Zitong Chen.

Figure 1. Parameter proportion analysis of different models.
Figure 2. Detailed framework of RAGL. §4.2.2 SSE for Node Embedding Regularization: To improve the generalization of node embeddings and mitigate overfitting, we adopt SSE [35], which randomly replaces the embeddings between nodes during training. As mentioned in Section 3, this procedure yields a regularized embedding matrix $\tilde{E}_{node} \in \mathbb{R}^{N \times d_{node}}$, which is then fed into the subsequent graph convolution layers. By s…
Figure 3. Comparison of (a) Existing Adaptive Graph Convo…
Figure 4. Training and validation loss curves of different RAGL variants on the SD dataset.
Figure 5. Cluster visualization of node embeddings learned…
Figure 6. Visualization of $\tilde{W}_g$ on four datasets: (a) SD, (b) GBA, (c) GLA, (d) CA.
Figure 7. Visualization of learned similarity adjacency ma…
Figure 8. Search results for hyper-parameters $L$ and $d_{node}$ on the SD and GLA datasets. We evaluate the number of encoder layers $L$ from {2, 4, 6, 8} and the node embedding dimension $d_{node}$ from {32, 48, 64, 80}. As shown in the figure, increasing the number of encoder layers generally improves model performance. For the smaller SD dataset, the best performance is achieved at … Panels: (a) … on SD, (b) … on GLA, (c) …
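The SSE step quoted in the Figure 2 caption, which randomly replaces embeddings between nodes during training, can be illustrated with a short sketch. It is a hedged illustration, not the paper's code: it assumes a single replacement probability `p` applied independently to each node and a uniformly sampled donor node, and the function name `sse_replace` and the parameter `p` are hypothetical. The paper's exact sampling scheme may differ.

```python
import numpy as np

def sse_replace(E, p, rng, training=True):
    """Return a copy of E in which each row is, with probability p, replaced by a random node's row.

    E: (N, d) node-embedding matrix. At inference (training=False) E is returned unchanged.
    """
    if not training or p <= 0.0:
        return E.copy()
    N = E.shape[0]
    idx = np.arange(N)
    replace = rng.random(N) < p                                   # nodes that receive a substitute
    idx[replace] = rng.integers(0, N, size=int(replace.sum()))    # uniform donor (may be the node itself)
    return E[idx]                                                 # fancy indexing returns a new array

# Usage: regularized embeddings for one training step.
rng = np.random.default_rng(42)
E = rng.standard_normal((6, 4))
E_tilde = sse_replace(E, p=0.3, rng=rng)
```

Because the replacement only re-indexes rows, it adds O(N·d) work per step and leaves the cost of the downstream graph convolution unchanged.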
Original abstract

Traffic prediction is a critical task in spatial-temporal forecasting with broad applications in travel planning and urban management. To model the complex spatial-temporal dependencies in traffic data, Spatial-Temporal Graph Convolutional Networks (STGCNs) have been widely employed, achieving advanced performance. However, when applied to large-scale road networks, the quadratic computational complexity of traditional graph convolution operations severely limits their scalability. Several methods attempt to address this issue through approximation, compression, or spatial partitioning. Nevertheless, these methods often either fail to achieve sufficient computational efficiency or compromise prediction accuracy. To address these challenges, we propose a Regularized Adaptive Graph Convolution (RAGC) model. First, to ensure scalability on large road networks, we develop the Efficient Cosine Operator (ECO), which performs graph convolution based on the cosine similarity of node embeddings with linear time complexity. Second, we introduce a regularized adaptive graph convolution framework that combines Stochastic Shared Embedding (SSE) and adaptive graph convolution through a residual difference mechanism. This design enables the model to learn high-quality node embeddings, thereby improving prediction accuracy while maintaining computational efficiency. Extensive experiments on four large-scale real-world traffic datasets show that RAGC consistently outperforms state-of-the-art methods in terms of prediction accuracy and exhibits competitive computational efficiency. The code is available at: https://github.com/wkq-wukaiqi/RAGC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes the Regularized Adaptive Graph Convolution (RAGC) model for traffic forecasting on large-scale road networks. It develops the Efficient Cosine Operator (ECO) to perform graph convolution via cosine similarity of learned node embeddings in linear time, and introduces a regularized adaptive framework that integrates Stochastic Shared Embedding (SSE) with adaptive graph convolution through a residual difference mechanism. This is claimed to improve embedding quality and prediction accuracy while ensuring scalability. Extensive experiments on four large-scale real-world traffic datasets reportedly show consistent outperformance over state-of-the-art methods in accuracy with competitive computational efficiency.

Significance. If the central claims hold, the work could meaningfully advance scalable spatial-temporal forecasting by replacing quadratic graph convolutions with an efficient cosine-based operator, enabling applications on very large road networks. The regularization via SSE and residual mechanism offers a practical way to learn adaptive structures from data, and code availability aids reproducibility. The multi-dataset evaluation strengthens the empirical case, though significance hinges on confirming that the linear approximation preserves sufficient long-range dependencies without hidden costs to accuracy.

major comments (3)
  1. §3.2 (ECO definition and complexity): The claim of linear time complexity for the Efficient Cosine Operator relies on cosine similarity of node embeddings, but the manuscript provides no derivation, pseudocode, or analysis showing how quadratic pairwise computation is avoided (e.g., via low-rank factorization or sampling). This is load-bearing for the scalability claim, as any unverified approximation risks discarding heterogeneous or long-range traffic dependencies.
  2. Experiments section (results tables): The reported outperformance on four datasets lacks error bars, standard deviations across runs, or statistical significance tests. Without these, the consistency of gains over STGCN baselines cannot be rigorously assessed and may not survive variability in training.
  3. §3.3 (residual difference mechanism): The residual difference mechanism is introduced to combine SSE and adaptive convolution, yet no ablation isolates its effect and no derivation shows that it restores information potentially lost in the cosine approximation. This leaves the circularity concern (embeddings learned from the same traffic data used for prediction) unaddressed.
minor comments (2)
  1. Abstract: The statement of 'linear time complexity' would benefit from explicit big-O notation and a direct comparison to the quadratic cost of standard graph convolution.
  2. Notation throughout: Define the node embedding matrix and the exact form of the cosine operator with numbered equations to clarify how the adaptive adjacency is constructed from embeddings.
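The explicit big-O comparison requested in the first minor comment can be written out as follows. This assumes, as an illustration rather than the paper's stated definition, that ECO applies an unnormalized cosine kernel to an N × C feature matrix X using row-normalized embeddings Ê built from E ∈ ℝ^{N×d}:

$$
\begin{aligned}
\hat{E} &= D^{-1}E, \qquad D = \operatorname{diag}\big(\lVert e_1\rVert_2, \dots, \lVert e_N\rVert_2\big), \qquad E \in \mathbb{R}^{N \times d},\; X \in \mathbb{R}^{N \times C},\\
(\hat{E}\hat{E}^{\top})\,X &:\quad O(N^{2}d) \text{ to form } \hat{E}\hat{E}^{\top} \;+\; O(N^{2}C) \text{ to aggregate}, \qquad O(N^{2}) \text{ memory},\\
\hat{E}\,(\hat{E}^{\top}X) &:\quad O(NdC) \text{ time}, \qquad O(dC) \text{ extra memory}.
\end{aligned}
$$

As a worked example, take N = 8,600 (the size of the CA road network in the LargeST benchmark) and d = 64 (one of the embedding sizes in the Figure 8 search grid). Aggregating one feature channel costs N² = 73,960,000 multiply-adds in the first ordering, even after the similarity matrix has been formed, versus 2Nd = 1,100,800 in the second, a factor of N/(2d) ≈ 67. The dense similarity matrix alone would occupy 4N² ≈ 296 MB in float32, before batching or storing activations for backpropagation.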

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. We have carefully considered each major comment and provide detailed responses below. We will incorporate clarifications, additional analyses, and experimental improvements in the revised manuscript to address the concerns raised.

Point-by-point responses
  1. Referee: §3.2 (ECO definition and complexity).

    Authors: We appreciate this observation on the need for explicit justification. The Efficient Cosine Operator first projects nodes into a low-dimensional embedding space and then computes normalized dot products via vectorized operations that avoid materializing a full N×N similarity matrix, achieving O(Nd) complexity where d is the embedding dimension. We acknowledge that the current manuscript lacks a formal derivation and pseudocode. In the revision we will add both to §3.2, together with a brief discussion of how the learned embeddings retain sufficient long-range spatial structure for traffic forecasting. revision: yes

  2. Referee: Experiments section (results tables).

    Authors: We agree that reporting variability and statistical significance is essential for rigorous evaluation. We will rerun all experiments with multiple random seeds (at least five), report mean performance together with standard deviations, and include paired t-tests or Wilcoxon tests against the strongest baselines in the revised results tables and text. revision: yes

  3. Referee: §3.3 (residual difference mechanism).

    Authors: We thank the referee for pointing out these gaps. We will add an ablation study that isolates the residual difference mechanism and quantify its contribution to accuracy. We will also include a short derivation in §3.3 explaining how the residual term compensates for information that may be attenuated by the cosine operator. On the circularity issue, the embeddings are learned end-to-end, yet the combination of SSE regularization and the residual connection is explicitly designed to encourage embeddings that generalize beyond the immediate prediction objective; we will clarify this design rationale in the revised section. revision: yes
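The paired seed-level comparison promised in the second response can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' evaluation code: the function names are hypothetical, the inputs are per-seed MAE values for two models evaluated on the same seeds, and the Wilcoxon p-value is computed exactly by enumerating all 2ⁿ sign patterns, which assumes no zero or tied differences.

```python
import itertools
import statistics

def paired_t(mae_baseline, mae_model):
    """Paired t statistic on per-seed differences (baseline - model); positive favors the model."""
    d = [b - m for b, m in zip(mae_baseline, mae_model)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / n ** 0.5)

def wilcoxon_exact_p(mae_baseline, mae_model):
    """Exact two-sided Wilcoxon signed-rank p-value.

    Assumes no zero differences and no tied |differences|, so the ranks are exactly 1..n.
    """
    d = [b - m for b, m in zip(mae_baseline, mae_model)]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank = {i: r for r, i in enumerate(order, start=1)}
    w_plus = sum(rank[i] for i in range(n) if d[i] > 0)   # sum of ranks where the model is better
    center = n * (n + 1) / 4                              # mean of W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    for signs in itertools.product((False, True), repeat=n):  # all 2**n equally likely sign patterns
        w = sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n

# Hypothetical per-seed MAEs for illustration only (not results from the paper).
baseline = [20.10, 20.35, 19.98, 20.22, 20.05]
model = [19.80, 19.95, 19.90, 19.85, 19.70]
t_stat = paired_t(baseline, model)
p_value = wilcoxon_exact_p(baseline, model)  # 0.0625: the model wins on all 5 seeds
```

One consequence is worth stating: with only five seeds, the exact two-sided Wilcoxon signed-rank test cannot go below p = 2/2⁵ = 0.0625 even when one model wins on every seed, so reaching p < 0.05 with that test needs at least six seeds (minimum 2/2⁶ ≈ 0.031).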

Circularity Check

1 step flagged

Node embeddings learned from target traffic data used to define cosine-based graph convolution

specific steps
  1. fitted input called prediction [Abstract (model description)]
    "we develop the Efficient Cosine Operator (ECO), which performs graph convolution based on the cosine similarity of node embeddings with linear time complexity. [...] we introduce a regularized adaptive graph convolution framework that combines Stochastic Shared Embedding (SSE) and adaptive graph convolution through a residual difference mechanism. This design enables the model to learn high-quality node embeddings, thereby improving prediction accuracy while maintaining computational efficiency."

    Node embeddings are optimized directly on the traffic forecasting objective; cosine similarity derived from those embeddings then defines the linear operator used for the same forecasting task. The reported accuracy improvement is therefore partly achieved by construction through fitting the embeddings to the target data rather than through an independent structural derivation.

full rationale

The paper's core efficiency claim rests on the ECO operator that computes graph convolution via cosine similarity of node embeddings. These embeddings are learned jointly from the same traffic datasets used for the downstream prediction task. While experiments on four real-world datasets provide external validation of accuracy gains, the design does not include a parameter-free derivation or external benchmark showing that the cosine operator preserves spatial-temporal structure independently of the fitted embeddings. This matches the fitted-input-called-prediction pattern at a moderate level without fully reducing the entire result to a tautology.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 2 invented entities

The central claim rests on the unproven premise that cosine similarity of embeddings captures traffic spatial structure and on several tunable components introduced to stabilize the adaptive learning.

free parameters (2)
  • regularization weight
    Balances the residual difference term in the adaptive graph convolution framework.
  • embedding dimension
    Controls the size of node representations used by the cosine operator.
axioms (1)
  • domain assumption: Cosine similarity between node embeddings can substitute for standard graph convolution while preserving necessary spatial dependencies.
    This is the core justification for replacing quadratic operations with the linear ECO.
invented entities (2)
  • Efficient Cosine Operator (ECO) · no independent evidence
    purpose: Perform graph convolution in linear time using cosine similarity of embeddings.
    New operator introduced to solve the quadratic complexity problem.
  • Residual difference mechanism · no independent evidence
    purpose: Combine stochastic shared embeddings with adaptive graph convolution.
    New integration step in the regularized framework.


Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  [1] Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel Tetreault, and Alejandro Jaimes. Multimodal Categorization of Crisis Events in Social Media. In CVPR.
  [2] Lei Bai, Lina Yao, Can Li, Xianzhi Wang, and Can Wang. 2020. Adaptive graph convolutional recurrent network for traffic forecasting. In NeurIPS.
  [3] Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In IJCAI.
  [4] Yunyao Cheng, Peng Chen, Chenjuan Guo, Kai Zhao, Qingsong Wen, Bin Yang, and Christian S Jensen. 2023. Weakly Guided Adaptation for Robust Time Series Forecasting. In VLDB.
  [5] Jeongwhan Choi, Hwangyong Choi, Jeehyun Hwang, and Noseong Park. 2022. Graph Neural Controlled Differential Equations for Traffic Forecasting. In AAAI.
  [6] Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, et al. 2021. Rethinking Attention with Performers. In ICLR.
  [7] Zheng Dong, Renhe Jiang, Haotian Gao, Hangchen Liu, Jinliang Deng, Qingsong Wen, and Xuan Song. 2024. Heterogeneity-informed meta-parameter learning for spatiotemporal time series forecasting. In SIGKDD.
  [8] Harris Drucker, Christopher J Burges, Linda Kaufman, Alex Smola, and Vladimir Vapnik. 1996. Support Vector Regression Machines. In NeurIPS.
  [9] Wenying Duan, Xiaoxi He, Zimu Zhou, Lothar Thiele, and Hong Rao. 2023. Localised Adaptive Spatial-Temporal Graph Neural Network. In SIGKDD.
  [10] Yuchen Fang, Yuxuan Liang, Bo Hui, Zezhi Shao, Liwei Deng, Xu Liu, Xinke Jiang, and Kai Zheng. 2025. Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective. In SIGKDD.
  [11] Yuchen Fang, Yanjun Qin, Haiyong Luo, Fang Zhao, Bingbing Xu, Liang Zeng, and Chenxing Wang. 2023. When Spatio-Temporal Meet Wavelets: Disentangled Traffic Forecasting via Efficient Spectral Graph Attention Networks. In ICDE.
  [12] Zheng Fang, Qingqing Long, Guojie Song, and Kunqing Xie. 2021. Spatial-Temporal Graph ODE Networks for Traffic Flow Forecasting. In SIGKDD.
  [13] Kan Guo, Yongli Hu, Yanfeng Sun, Sean Qian, Junbin Gao, and Baocai Yin. 2021. Hierarchical graph convolution network for traffic forecasting. In AAAI.
  [14] James D Hamilton. 2020. Time Series Analysis. Princeton University Press.
  [15] Jindong Han, Weijia Zhang, Hao Liu, Tao Tao, Naiqiang Tan, and Hui Xiong. BigST: Linear Complexity Spatio-Temporal Graph Neural Network for Traffic Forecasting on Large-Scale Road Networks. In VLDB.
  [16] Rongzhou Huang, Chuyin Huang, Yubao Liu, Genan Dai, and Weiyang Kong. LSGCN: Long Short-Term Traffic Prediction with Graph Convolutional Networks. In IJCAI.
  [17] Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, and Jingyuan Wang. 2023. PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for Traffic Flow Prediction. In AAAI.
  [18] Renhe Jiang, Zhaonan Wang, Jiawei Yong, Puneet Jeph, Quanjun Chen, Yasumasa Kobayashi, Xuan Song, Shintaro Fukushima, and Toyotaro Suzumura. 2023. Spatio-Temporal Meta-Graph Learning for Traffic Forecasting. In AAAI.
  [19] Wenzhao Jiang, Jindong Han, Hao Liu, Tao Tao, Naiqiang Tan, and Hui Xiong. Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction. In SIGKDD.
  [20] Weiyang Kong, Ziyu Guo, and Yubao Liu. 2024. Spatio-Temporal Pivotal Graph Neural Networks for Traffic Flow Forecasting. In AAAI.
  [21] Weiyang Kong, Kaiqi Wu, Sen Zhang, and Yubao Liu. 2025. GraphSparseNet: a Novel Method for Large Scale Traffic Flow Prediction. arXiv preprint arXiv:2502.19823 (2025).
  [22] Shiyong Lan, Yitong Ma, Weikang Huang, Wenwu Wang, Hongyu Yang, and Pyang Li. 2022. DSTAGNN: Dynamic Spatial-Temporal Aware Graph Neural Network for Traffic Flow Forecasting. In ICML.
  [23] Fuxian Li, Jie Feng, Huan Yan, Guangyin Jin, Fan Yang, Funing Sun, Depeng Jin, and Yong Li. 2023. Dynamic Graph Convolutional Recurrent Network for Traffic Prediction: Benchmark and Solution. ACM Transactions on Knowledge Discovery from Data 17, 1 (2023), 1–21.
  [24] Mengzhang Li and Zhanxing Zhu. 2021. Spatial-Temporal Fusion Graph Neural Networks for Traffic Flow Forecasting. In AAAI.
  [25] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In ICLR.
  [26] Hangchen Liu, Zheng Dong, Renhe Jiang, Jiewen Deng, Jinliang Deng, Quanjun Chen, and Xuan Song. 2023. STAEformer: Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting. In CIKM.
  [27] Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, and Roger Zimmermann. 2023. LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting. In NeurIPS.
  [28] Zezhi Shao, Zhao Zhang, Fei Wang, Wei Wei, and Yongjun Xu. 2022. Spatial-Temporal Identity: A Simple Yet Effective Baseline for Multivariate Time Series Forecasting. In CIKM.
  [29] Zezhi Shao, Zhao Zhang, Wei Wei, Fei Wang, Yongjun Xu, Xin Cao, and Christian S. Jensen. 2022. Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting. In VLDB.
  [30] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. 2013. The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains. IEEE Signal Processing Magazine 30, 3 (2013), 83–98.
  [31] Hongjun Wang, Jiyuan Chen, Tong Pan, Zipei Fan, Xuan Song, Renhe Jiang, Lingyu Zhang, Yi Xie, Zhongyi Wang, and Boyuan Zhang. 2023. Easy Begun Is Half Done: Spatial-Temporal Graph Modeling with ST-Curriculum Dropout. In AAAI.
  [32] Leye Wang, Di Chai, Xuanzhe Liu, Liyue Chen, and Kai Chen. 2021. Exploring the Generalizability of Spatio-Temporal Traffic Prediction: Meta-Modeling and an Analytic Framework. IEEE Transactions on Knowledge and Data Engineering 35, 4 (2021), 3870–3884.
  [33] Billy M Williams and Lester A Hoel. 2003. Modeling and Forecasting Vehicular Traffic Flow as a Seasonal ARIMA Process: Theoretical Basis and Empirical Results. Journal of Transportation Engineering 129, 6 (2003), 664–672.
  [34] Liwei Wu, Shuqing Li, Cho-Jui Hsieh, and James Sharpnack. 2020. SSE-PT: Sequential Recommendation Via Personalized Transformer. In RecSys.
  [35] Liwei Wu, Shuqing Li, Cho-Jui Hsieh, and James L Sharpnack. 2019. Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers. In NeurIPS.
  [36] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. 2020. Connecting the dots: Multivariate time series forecasting with graph neural networks. In SIGKDD.
  [37] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, and Chengqi Zhang. 2019. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In IJCAI.
  [38] Chin-Chia Michael Yeh, Yujie Fan, Xin Dai, Uday Singh Saini, Vivian Lai, Prince Osei Aboagye, Junpeng Wang, Huiyuan Chen, Yan Zheng, Zhongfang Zhuang, et al. 2024. RPMixer: Shaking Up Time Series Forecasting with Random Projections for Large Spatial-Temporal Data. In SIGKDD.
  [39] Xueyan Yin, Genze Wu, Jinze Wei, Yanming Shen, Heng Qi, and Baocai Yin. 2021. Deep learning on Traffic Prediction: Methods, Analysis and Future Directions. IEEE Transactions on Intelligent Transportation Systems 23, 6 (2021), 4927–4943.
  [40] Hongyuan Yu, Ting Li, Weichen Yu, Jianguo Li, Yan Huang, Liang Wang, and Alex Liu. 2022. Regularized Graph Structure Learning with Semantic Knowledge for Multi-variates Time-Series Forecasting. In IJCAI.
  [41] Junbo Zhang, Yu Zheng, and Dekang Qi. 2017. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. In AAAI.
  [42] Junbo Zhang, Yu Zheng, Dekang Qi, Ruiyuan Li, and Xiuwen Yi. 2016. DNN-Based Prediction Model for Spatio-Temporal Data. In SIGSPATIAL.