pith. machine review for the scientific record.

arxiv: 2604.16859 · v1 · submitted 2026-04-18 · 💻 cs.AI

Recognition: unknown

GAMMA-Net: Adaptive Long-Horizon Traffic Spatio-Temporal Forecasting Model based on Interleaved Graph Attention and Multi-Axis Mamba

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 07:25 UTC · model grok-4.3

classification 💻 cs.AI
keywords traffic forecasting · spatio-temporal modeling · graph attention networks · mamba · long-horizon prediction · adaptive dependencies

The pith

GAMMA-Net interleaves graph attention with multi-axis Mamba to forecast traffic flows more accurately over long horizons.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GAMMA-Net as a model that pairs Graph Attention Networks for dynamically weighting spatial influences among road segments with multi-axis Mamba modules that track long-range temporal and spatial patterns at lower cost than recurrent networks. It seeks to fix the inability of earlier approaches to handle the full complexity of traffic dependencies across time and space. Experiments on METR-LA, PEMS-BAY, PEMS03, PEMS04, PEMS07, and PEMS08 show lower errors than prior state-of-the-art methods for multiple forecast lengths. Ablation checks confirm that the spatial and temporal pieces each add measurable gains when combined.

Core claim

GAMMA-Net integrates Graph Attention Networks that use self-attention to adjust node influence according to real-time traffic conditions with multi-axis Selective State Space Models that efficiently capture long-term temporal and spatial dynamics, delivering consistent gains over existing models on benchmark datasets.

What carries the argument

Interleaved GAT self-attention for adaptive spatial dependencies and multi-axis Mamba for long-term temporal and spatial modeling.
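The interleaving can be sketched in plain NumPy. This is an illustrative reconstruction from the abstract's description, not the authors' implementation: the function names, tensor shapes, and parameters are all assumptions, and the real model's selective (input-dependent) SSM parameters and multi-axis scans are omitted for brevity.

```python
import numpy as np

def gat_layer(h, adj, W, a, alpha=0.2):
    """One graph-attention layer in the Velickovic et al. style:
    attention scores depend on the current node features, so spatial
    weights adapt to the input rather than staying fixed as in GCN."""
    z = h @ W                                   # (N, F') projected features
    N = z.shape[0]
    e = np.zeros((N, N))
    for i in range(N):                          # e_ij = LeakyReLU(a^T [z_i || z_j])
        for j in range(N):
            s = a @ np.concatenate([z[i], z[j]])
            e[i, j] = s if s > 0 else alpha * s
    e = np.where(adj > 0, e, -1e9)              # mask non-edges
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)       # row-wise softmax
    return att @ z                              # aggregate neighbours

def ssm_scan(x, A, B, C):
    """Minimal linear state-space recurrence along one axis:
    s_t = A s_{t-1} + B x_t,  y_t = C s_t.  Mamba additionally makes
    A, B, C input-dependent ("selective"); this fixed version only
    shows the O(T) scan that replaces attention over time."""
    T = x.shape[0]
    s = np.zeros(A.shape[0])
    y = np.empty((T, C.shape[0]))
    for t in range(T):
        s = A @ s + B @ x[t]
        y[t] = C @ s
    return y

def interleaved_block(X, adj, W, a, A, B, C):
    """One hypothetical GAT-then-SSM step over a (T, N, F) tensor:
    spatial attention per timestep, then a temporal scan per node."""
    spatial = np.stack([gat_layer(X[t], adj, W, a) for t in range(X.shape[0])])
    return np.stack([ssm_scan(spatial[:, n], A, B, C)
                     for n in range(X.shape[1])], axis=1)
```

Stacking one block after another (spatial, temporal, spatial, ...) is what "interleaved" plausibly means here; the key property the sketch preserves is that the temporal pass is a linear-time scan rather than quadratic attention.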

If this is right

  • More accurate long-horizon traffic predictions enable earlier congestion mitigation and better urban planning decisions.
  • Mamba's efficiency removes the need for heavy recurrent layers when modeling extended time series in traffic networks.
  • The complementary roles of the spatial and temporal modules explain the overall accuracy lift reported in the experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same interleaving pattern could be tested on other networked time-series problems such as power-grid load or epidemic spread.
  • Mamba's lower compute cost might allow the model to run on edge devices for live traffic applications.
  • Adding mechanisms to handle missing sensor data would test whether the claimed robustness extends beyond complete benchmark records.

Load-bearing premise

That interleaving graph attention with multi-axis Mamba will capture the full range of traffic dependencies without overlooking important patterns or overfitting to the tested datasets.

What would settle it

A fresh traffic dataset or longer forecast horizon on which GAMMA-Net shows no MAE reduction relative to the strongest existing baseline would disprove the performance claim.
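The comparison this criterion relies on is simple to state precisely. A minimal sketch, with illustrative numbers chosen only to show how a 16.25% figure arises (they are not taken from the paper):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error over all sensors and horizons."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def relative_mae_reduction(baseline_mae, model_mae):
    """Percent MAE reduction versus a baseline; positive means the model wins,
    zero or negative on a fresh dataset would refute the performance claim."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae

# Illustrative values only (not from the paper):
reduction = relative_mae_reduction(3.20, 2.68)  # 16.25% reduction
```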

Figures

Figures reproduced from arXiv: 2604.16859 by Bin Jiang, Dongyi He, He Yan, Yuanquan Gao.

Figure 1: Model Architecture of proposed GAMMA-Net. [PITH_FULL_IMAGE:figures/full_fig_p005_1.png]
Figure 2: Node distribution of the METR-LA (a) and PEMS-BAY (b) datasets. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png]
Figure 3: Performance heatmaps of twelve predictive models on the METR-LA dataset for forecasting horizons of 3, 6, and 12 time steps. [PITH_FULL_IMAGE:figures/full_fig_p011_3.png]
Figure 4: Performance heatmaps of twelve predictive models on the PEMS-BAY dataset for forecasting horizons of 3, 6, and 12 time steps. [PITH_FULL_IMAGE:figures/full_fig_p011_4.png]
Figure 5: Performance heatmaps of twelve predictive models across four traffic datasets. [PITH_FULL_IMAGE:figures/full_fig_p012_5.png]
Figure 6: SVD-based visualization of state transition matrices in the Mamba component of spatial and temporal GAMMA-Net trained on the METR-LA dataset. [PITH_FULL_IMAGE:figures/full_fig_p013_6.png]
Figure 7: Community detection based on spatial and temporal attention in GAMMA-Net trained on the METR-LA dataset. (a) Spatial community. [PITH_FULL_IMAGE:figures/full_fig_p014_7.png]
Figure 8: Reordered adjacency matrices of the spatial and temporal attention graphs extracted from GAMMA-Net trained on the METR-LA dataset. [PITH_FULL_IMAGE:figures/full_fig_p014_8.png]
Figure 9: t-SNE visualization of the timestamp-of-day embeddings in GAMMA-Net. The reduced-dimensionality results are connected in time. [PITH_FULL_IMAGE:figures/full_fig_p015_9.png]
Figure 10: True vs. predicted traffic flow for 14 representative METR-LA sensor nodes. Each subplot displays approximately 1,440 time steps (5-minute intervals) of observed flow (blue) and the model's forecasted flow (orange) at a single node. The x-axis denotes time in 5-minute increments, and the y-axis shows traffic flow. Subplot titles indicate the sensor node index. [PITH_FULL_IMAGE:figures/full_fig_p016_10.png]
Figure 11: Scatter plots of predicted versus true peak values across selected nodes. Each subplot corresponds to one node (sampled every 15 nodes). [PITH_FULL_IMAGE:figures/full_fig_p017_11.png]
Original abstract

Accurate traffic forecasting is crucial for intelligent transportation systems, supporting effective traffic management, congestion reduction, and informed urban planning. However, traditional models often fail to adequately capture the intricate spatio-temporal dependencies present in traffic data. To overcome these limitations, we introduce GAMMA-Net, a novel approach that integrates Graph Attention Networks (GAT) with multi-axis Selective State Space Models (Mamba). The GAT component uses a self-attention mechanism to dynamically adjust the influence of nodes within the traffic network, enabling adaptive spatial dependency modeling based on real-time conditions. Simultaneously, the Mamba module efficiently models long-term temporal and spatial dynamics without the heavy computational cost of conventional recurrent architectures. Extensive experiments on several benchmark traffic datasets, including METR-LA, PEMS-BAY, PEMS03, PEMS04, PEMS07, and PEMS08, show that GAMMA-Net consistently outperforms existing state-of-the-art models across different prediction horizons, achieving up to a 16.25% reduction in Mean Absolute Error (MAE) compared to baseline models. Ablation studies highlight the critical contributions of both the spatial and temporal components, emphasizing their complementary role in improving prediction accuracy. In conclusion, the GAMMA-Net model sets a new standard in traffic forecasting, offering a powerful tool for next-generation traffic management and urban planning. The code for this study is available at https://github.com/hdy6438/GAMMA-Net

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces GAMMA-Net, which interleaves Graph Attention Networks (GAT) for adaptive spatial dependency modeling with multi-axis Mamba modules for efficient long-horizon spatio-temporal dynamics in traffic forecasting. It claims consistent outperformance over state-of-the-art baselines on six public benchmarks (METR-LA, PEMS-BAY, PEMS03, PEMS04, PEMS07, PEMS08) across prediction horizons, with a maximum MAE reduction of 16.25%, supported by ablation studies on the spatial and temporal components. The code is released at https://github.com/hdy6438/GAMMA-Net.

Significance. If the reported gains prove robust, the work would advance traffic forecasting by showing that interleaving GAT with multi-axis Mamba can deliver strong long-horizon accuracy at lower computational cost than transformer alternatives. The ablation studies and open-source code are clear strengths that would support adoption and follow-on research in spatio-temporal modeling.

major comments (1)
  1. [Experiments] Experiments section: The central claim of consistent outperformance (including the 16.25% MAE reduction) rests on single-run point estimates without multi-run averages, standard deviations, or statistical significance tests against baselines. This directly undermines reliability, as the gains could arise from run-to-run variance or more aggressive tuning of the additional parameters in the interleaved design rather than the architecture itself. At minimum, results from 5–10 random seeds with error bars and appropriate tests (e.g., paired t-test) are required to substantiate the claim.
minor comments (2)
  1. [Abstract] Abstract: The maximum 16.25% MAE reduction is stated without identifying the exact dataset, horizon, or baseline model achieving it; adding this detail would improve precision.
  2. [Methodology] Methodology: The interleaving procedure between GAT and multi-axis Mamba is described at a high level; a diagram or pseudocode showing layer ordering, axis handling, and residual connections would clarify the architecture.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the experimental evaluation below and will incorporate the suggested improvements in the revised version.

point-by-point responses
  1. Referee: [Experiments] Experiments section: The central claim of consistent outperformance (including the 16.25% MAE reduction) rests on single-run point estimates without multi-run averages, standard deviations, or statistical significance tests against baselines. This directly undermines reliability, as the gains could arise from run-to-run variance or more aggressive tuning of the additional parameters in the interleaved design rather than the architecture itself. At minimum, results from 5–10 random seeds with error bars and appropriate tests (e.g., paired t-test) are required to substantiate the claim.

    Authors: We agree that single-run results limit the robustness of the reported gains. In the revised manuscript, we will re-execute all experiments across 5 random seeds, report mean values with standard deviations, and include paired t-tests (or equivalent) to evaluate statistical significance against each baseline. These additions will directly address concerns about run-to-run variance and parameter tuning effects. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain; empirical model evaluation on external benchmarks.

full rationale

The paper proposes GAMMA-Net by describing an architecture that interleaves GAT for spatial attention with multi-axis Mamba for temporal/spatial dynamics. It then reports empirical results on independent public datasets (METR-LA, PEMS-BAY, PEMS03/04/07/08) with comparisons to external baselines, plus ablation studies. No load-bearing equations, predictions, or first-principles claims reduce by construction to fitted inputs or self-citations. Performance numbers (e.g., MAE reductions) are presented as experimental outcomes, not derived equivalences. Public code link further supports external verification. This is standard non-circular empirical ML work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard machine learning assumptions for neural network optimization and the domain assumption that traffic sensor data forms graphs with meaningful spatial dependencies that GAT and Mamba can jointly model.

axioms (1)
  • domain assumption Traffic networks can be represented as graphs where nodes correspond to sensors and edges capture road connectivity and influence.
    This underpins the GAT component and is invoked implicitly when describing spatial dependency modeling.
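For concreteness, the usual way this axiom is operationalized on METR-LA/PEMS-style data is a thresholded Gaussian-kernel adjacency over sensor distances (the DCRNN-lineage construction; the paper may use a different one, so treat this as an assumption):

```python
import numpy as np

def gaussian_kernel_adjacency(dist, sigma=None, eps=0.1):
    """Thresholded Gaussian-kernel adjacency common in METR-LA/PEMS
    pipelines: w_ij = exp(-dist_ij^2 / sigma^2), zeroed below eps.
    `dist` is a pairwise road-network distance matrix between sensors."""
    dist = np.asarray(dist, dtype=float)
    if sigma is None:
        sigma = dist[dist > 0].std()   # common heuristic for the bandwidth
    w = np.exp(-(dist ** 2) / (sigma ** 2))
    w[w < eps] = 0.0                   # sparsify: drop weak links
    np.fill_diagonal(w, 1.0)           # self-loops so attention can keep self-info
    return w
```

If edges built this way were uninformative, the GAT component would have nothing meaningful to attend over, which is why the review flags this as the load-bearing domain assumption.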

pith-pipeline@v0.9.0 · 5570 in / 1325 out tokens · 52367 ms · 2026-05-10T07:25:56.651513+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

27 extracted references · 10 canonical work pages · 1 internal anchor

  1. [1] Shahriar Afandizadeh, Saeid Abdolahi, and Hamid Mirzahossein. Deep learning algorithms for traffic forecasting: A comprehensive review and comparison with classical ones. Journal of Advanced Transportation, 2024(1):9981657, 2024.
  2. [2] Bing Yu, Haoteng Yin, and Zhanxing Zhu. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875, 2017.
  3. [3] A. Vaswani et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
  4. [4] Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, and Jianzhong Qi. GMAN: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 1234–1241, 2020.
  5. [5] Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, and Jingyuan Wang. PDFormer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 4365–4373, 2023.
  6. [6] Hangchen Liu, Zheng Dong, Renhe Jiang, Jiewen Deng, Jinliang Deng, Quanjun Chen, and Xuan Song. STAEformer: Spatio-temporal adaptive embedding makes vanilla transformer SOTA for traffic forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 4125–4129, 2023.
  7. [7] Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451, 2020.
  8. [8] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
  9. [9] Doncheng Yuan, Jianzhe Xue, Jinshan Su, Wenchao Xu, and Haibo Zhou. ST-Mamba: Spatial-temporal Mamba for traffic flow estimation recovery using limited data. In 2024 IEEE/CIC International Conference on Communications in China (ICCC), pages 1928–1933. IEEE, 2024.
  10. [10] Sicheng He, Junzhong Ji, and Minglong Lei. Decomposed spatio-temporal Mamba for long-term traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 11772–11780, 2025.
  11. [11] Zhiqi Shao, Xusheng Yao, Ze Wang, and Junbin Gao. ST-MambaSync: The complement of Mamba and transformers for spatial-temporal in traffic flow prediction. arXiv preprint arXiv:2404.15899, 2024.
  12. [12] Yusuf Meric Karadag, Sinan Kalkan, and Ipek Gursel Dino. ms-Mamba: Multi-scale Mamba for time-series forecasting. arXiv e-prints, arXiv–2504, 2025.
  13. [13] Si Zhang, Hanghang Tong, Jiejun Xu, and Ross Maciejewski. Graph convolutional networks: a comprehensive review. Computational Social Networks, 6(1):1–23, 2019.
  14. [14] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, Yoshua Bengio, et al. Graph attention networks. stat, 1050(20):10–48550, 2017.
  15. [15] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926, 2017.
  16. [16] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, and Chengqi Zhang. Graph WaveNet for deep spatial-temporal graph modeling. arXiv preprint arXiv:1906.00121, 2019.
  17. [17] Lei Bai, Lina Yao, Can Li, Xianzhi Wang, and Can Wang. Adaptive graph convolutional recurrent network for traffic forecasting. Advances in Neural Information Processing Systems, 33:17804–17815, 2020.
  18. [18] Lincan Li, Hanchen Wang, Wenjie Zhang, and Adelle Coster. STG-Mamba: Spatial-temporal graph learning via selective state space model. arXiv preprint arXiv:2403.12418, 2024.
  19. [19] Jinhyeok Choi, Heehyeon Kim, Minhyeong An, and Joyce Jiyoung Whang. SpoT-Mamba: Learning long-range dependency on spatio-temporal graphs with selective state spaces. arXiv preprint arXiv:2406.11244, 2024.
  20. [20] Chao Song, Youfang Lin, Shengnan Guo, and Huaiyu Wan. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 914–921, 2020.
  21. [21] Zezhi Shao, Zhao Zhang, Fei Wang, Wei Wei, and Yongjun Xu. Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 4454–4458, 2022.
  22. [22] Jinliang Deng, Xiusi Chen, Renhe Jiang, Xuan Song, and Ivor W. Tsang. ST-Norm: Spatial and temporal normalization for multi-variate time series forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 269–278, 2021.
  23. [23] Chao Shang, Jie Chen, and Jinbo Bi. Discrete graph structure learning for forecasting multiple time series. arXiv preprint arXiv:2101.06861, 2021.
  24. [24] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 753–763, 2020.
  25. [25] Mourad Lablack and Yanming Shen. Spatio-temporal graph mixformer for traffic forecasting. Expert Systems with Applications, 228:120281, 2023.
  26. [26] Haotian Gao, Renhe Jiang, Zheng Dong, Jinliang Deng, Yuxin Ma, and Xuan Song. Spatial-temporal-decoupled masked pre-training for spatiotemporal forecasting. arXiv preprint arXiv:2312.00516, 2023.
  27. [27] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.