Improving Spatio-Temporal Residual Error Propagation by Mitigating Over-Squashing

Adrian Munteanu; Bruno Cornelis; Esther Rodrigo Bonet; Seyed Mohamad Moghadas

arxiv: 2605.18068 · v1 · pith:QOERNOPQnew · submitted 2026-05-18 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 11:56 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords spatio-temporal forecasting · residual error propagation · over-squashing · Forman curvature · graph rewiring · uncertainty quantification · autoregressive models · covariance calibration

The pith

Graph rewiring based on discrete Forman curvature mitigates over-squashing to improve residual error propagation in spatio-temporal forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to address how small prediction errors compound over time in recurrent models for multivariate time series, particularly when residuals are spatially and temporally correlated. It establishes that a graph rewiring strategy guided by discrete Forman curvature can strengthen critical edges that limit information flow, thereby improving both forecast accuracy over long horizons and the calibration of uncertainty estimates. Readers interested in practical forecasting systems would find this relevant because reliable long-term predictions with proper error bars support better decision-making in areas such as urban planning and resource management. The module integrates with existing autoregressive encoders and demonstrates gains alongside theoretical analysis of its effects on graph properties.

Core claim

Teger overcomes the spatial and temporal limitations of error-correlated autoregressive forecasting through a spatial curvature-aware graph rewiring mechanism explicitly strengthening information-bottleneck edges identified by discrete Forman curvature. The component is integrated into a low-rank-plus-diagonal covariance head, preserving tractable inference via the Woodbury identity. Teger is backbone-agnostic and provides theoretical evidence connecting curvature-aware rewiring to oversquashing alleviation, improved spectral connectivity, reduced effective resistance, and improved covariance calibration bounds.
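
As a rough illustration of why a low-rank-plus-diagonal covariance stays tractable, the sketch below applies the Woodbury identity to solve against Σ = diag(d) + U Uᵀ without forming the N × N matrix, and uses the matrix determinant lemma for the log-determinant. The names `d`, `U`, `woodbury_solve`, and `lowrank_logdet` are assumptions made for this example; the paper's actual covariance head (including how the rewired graph enters it) is not reproduced here.

```python
import numpy as np

def woodbury_solve(d, U, r):
    """Solve (diag(d) + U @ U.T) x = r using only a k x k linear system."""
    Dinv_r = r / d                       # D^{-1} r, cost O(N)
    Dinv_U = U / d[:, None]              # D^{-1} U, cost O(N k)
    k = U.shape[1]
    cap = np.eye(k) + U.T @ Dinv_U       # capacitance matrix I + U^T D^{-1} U
    # Woodbury: S^{-1} r = D^{-1} r - D^{-1} U cap^{-1} U^T D^{-1} r
    return Dinv_r - Dinv_U @ np.linalg.solve(cap, U.T @ Dinv_r)

def lowrank_logdet(d, U):
    """log det(diag(d) + U U^T) via the matrix determinant lemma."""
    k = U.shape[1]
    cap = np.eye(k) + U.T @ (U / d[:, None])
    _, logdet_cap = np.linalg.slogdet(cap)
    return np.sum(np.log(d)) + logdet_cap

# Synthetic inputs, only to exercise the functions (not data from the paper).
rng = np.random.default_rng(0)
N, k = 200, 5
d = rng.uniform(0.5, 2.0, size=N)        # positive diagonal
U = rng.normal(size=(N, k))              # low-rank factor
r = rng.normal(size=N)                   # residual vector

x_fast = woodbury_solve(d, U, r)
x_ref = np.linalg.solve(np.diag(d) + U @ U.T, r)   # dense reference
print(np.allclose(x_fast, x_ref))
```

The dense reference costs O(N³), while the Woodbury route costs O(N k² + k³), which is the practical reason such heads scale to many sensors when k is small.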

What carries the argument

The spatial curvature-aware graph rewiring mechanism that identifies information-bottleneck edges via discrete Forman curvature and strengthens them to alleviate over-squashing.
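
To make the mechanism concrete, here is a minimal sketch of curvature-guided edge strengthening. It uses the simple augmented Forman curvature for unweighted graphs, 4 − deg(i) − deg(j) + 3·(triangles through the edge), as a stand-in; the paper may use a different variant (a Balanced Forman curvature is mentioned elsewhere on this page), and the selection rule (the `k` most negative edges) and the multiplicative `boost` are assumptions made for illustration.

```python
import numpy as np

def augmented_forman(A):
    """Augmented Forman curvature for every edge of an unweighted,
    undirected graph given by a 0/1 integer adjacency matrix A."""
    A = np.asarray(A)
    deg = A.sum(axis=1)
    common = A @ A                      # common[i, j] = shared neighbours = triangles on (i, j)
    curv = {}
    n = A.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j]:
                curv[(i, j)] = int(4 - deg[i] - deg[j] + 3 * common[i, j])
    return curv

def strengthen_bottlenecks(W, curv, k=1, boost=2.0):
    """Multiply the weights of the k most negatively curved edges by `boost`."""
    W_new = W.astype(float).copy()
    worst = sorted(curv, key=curv.get)[:k]
    for i, j in worst:
        W_new[i, j] *= boost
        W_new[j, i] *= boost
    return W_new, worst

# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

curv = augmented_forman(A)
W_new, worst = strengthen_bottlenecks(A.astype(float), curv, k=1)
print(worst, curv[worst[0]])
```

On this toy graph the bridge is the only edge with negative curvature, so it is the one that gets strengthened, which matches the intuition that bottleneck edges sit between densely connected clusters.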

If this is right

Consistent improvements in Continuous Ranked Probability Score when tested on LSTM, Transformer, and xLSTM backbones.
Alleviation of over-squashing as shown through theoretical analysis.
Improvements in spectral connectivity and reductions in effective resistance of the graph.
Enhanced covariance calibration bounds for the uncertainty module.
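
The graph quantities named above can be checked numerically on a small example. The sketch below computes the algebraic connectivity (second-smallest Laplacian eigenvalue) and the effective resistance between two nodes before and after one edge weight is increased. The three-node path and the doubling of one weight are assumptions chosen for illustration, not the paper's setup.

```python
import numpy as np

def laplacian(W):
    """Unnormalised graph Laplacian D - W of a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W

def algebraic_connectivity(W):
    """Second-smallest eigenvalue of the Laplacian."""
    return np.linalg.eigvalsh(laplacian(W))[1]

def effective_resistance(W, i, j):
    """R_ij = (e_i - e_j)^T L^+ (e_i - e_j), with L^+ the pseudoinverse."""
    L_pinv = np.linalg.pinv(laplacian(W))
    return L_pinv[i, i] + L_pinv[j, j] - 2.0 * L_pinv[i, j]

# Path graph 0 - 1 - 2 with unit weights.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

# Strengthen edge (0, 1) from weight 1 to weight 2.
W_strong = W.copy()
W_strong[0, 1] = W_strong[1, 0] = 2.0

print(effective_resistance(W, 0, 2), effective_resistance(W_strong, 0, 2))
print(algebraic_connectivity(W), algebraic_connectivity(W_strong))
```

Reading the weights as conductances, the end-to-end resistance drops from 1 + 1 = 2 to 1/2 + 1 = 1.5, and the algebraic connectivity rises from 1 to 3 − √3 ≈ 1.27. Increasing an edge weight can never raise an effective resistance or lower the algebraic connectivity, but that fact alone says nothing about which edges are the most useful to strengthen.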

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Curvature-based diagnostics could help identify structural issues in a wider range of graph neural network models for sequential data.
The emphasis on residual correlations suggests potential benefits for probabilistic forecasting in non-spatial domains if adapted appropriately.
This method highlights a path for incorporating geometric graph properties into deep learning to address fundamental limitations like information bottlenecks.

Load-bearing premise

Discrete Forman curvature reliably identifies the specific edges whose strengthening will mitigate over-squashing and improve error propagation in autoregressive spatio-temporal models.

What would settle it

Ablating the curvature identification and rewiring steps while keeping the rest of Teger fixed and checking if the reported CRPS improvements and theoretical benefits no longer appear on the four real-world datasets.

Figures

Figures reproduced from arXiv: 2605.18068 by Adrian Munteanu, Bruno Cornelis, Esther Rodrigo Bonet, Seyed Mohamad Moghadas.

**Figure 2.** Analysis of a rewired subgraph in a congested area of Brussels under a sampled inference …

**Figure 3.** Analysis of spatial error propagation reduction for PeMS03 dataset. The top row shows the …

**Figure 4.** Qualitative comparison of results forecasted by baseline and variants of Teger for the …

**Figure 5.** Motivation of our work. Part (iii). For any $x \neq 0$, $x^\top Q_t x = \alpha\|x\|^2 + \beta\, x^\top L^{(R)}_t x + \sigma_{\min}\|x\|^2 \ge (\alpha + \sigma_{\min})\|x\|^2 > 0$, so $Q_t \succ 0$ and hence $G_t = Q_t^{-1}$ is symmetric positive definite. B.2 Proof of Lemma 2 (Laplacian Monotonicity). Proof. For any symmetric non-negative weight matrix $A$, the unnormalised Laplacian is $L_A = \operatorname{diag}(A) - A \triangleq D - A$, where $D$ is the degree matrix. Linearity of Lemma 2 gives $L_{W+\Delta W} = L_W + L$…

**Figure 6.** Analysis of computation resources for Brussels dataset. Teger consumes resources less …

**Figure 7.** An illustrative traffic snapshot at the three annotated sensor clusters. Images are static …

Abstract

Residual error propagation remains a fundamental problem in recurrent models, where small prediction inaccuracies compound over time and degrade long-horizon performance. Accurately modeling the correlation structure of such residuals is critical for reliable uncertainty quantification in probabilistic multivariate time-series forecasting. While recent time-series deep models efficiently parametrize time-varying contemporaneous correlations, they often assume temporal independence of errors and neglect spatial correlation across the observed network. In this paper, we introduce Teger, a structured uncertainty module that overcomes the spatial and temporal limitations of error-correlated autoregressive forecasting. Teger proposes a spatial curvature-aware graph rewiring mechanism explicitly strengthening information-bottleneck edges identified by discrete Forman curvature. The component is integrated into a low-rank-plus-diagonal covariance head, preserving tractable inference via the Woodbury identity. Teger is backbone-agnostic, requiring only the latent state produced by any autoregressive encoder. We provide theoretical evidence of Teger, and experimentally evaluate it on LSTM, Transformer, and xLSTM backbones across four real-world spatio-temporal datasets, showing consistent improvement in Continuous Ranked Probability Score (CRPS). We further provide a formal theoretical analysis connecting curvature-aware rewiring to (i) oversquashing alleviation, (ii) improved spectral connectivity, (iii) reduced effective resistance, and (iv) improved covariance calibration bounds.
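
Since the abstract's headline metric is CRPS, a small sketch of the standard sample-based estimator may help: CRPS(F, y) = E|X − y| − ½ E|X − X′|, with X and X′ independent draws from the forecast distribution F. The function name `crps_from_samples` is an assumption of this example, and the estimator is shown for a single scalar target; how the paper aggregates CRPS across sensors and horizons is not stated on this page.

```python
import numpy as np

def crps_from_samples(samples, y):
    """Sample-based CRPS for one scalar observation y.

    samples: 1-D array of draws from the forecast distribution.
    Lower is better; for a point forecast it reduces to absolute error.
    """
    samples = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(samples - y))                              # E|X - y|
    term_spread = np.mean(np.abs(samples[:, None] - samples[None, :]))   # E|X - X'|
    return term_obs - 0.5 * term_spread

# A degenerate forecast (all samples equal) gives the absolute error.
print(crps_from_samples([1.0, 1.0, 1.0], 3.0))
# A two-point forecast centred on the observation.
print(crps_from_samples([0.0, 2.0], 1.0))
```

The pairwise term costs O(m²) in the number of samples m, which is usually acceptable for the few hundred samples drawn per forecast.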

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Teger wires Forman curvature into residual covariance modeling for spatio-temporal forecasts, with decent empirical gains but a loose theoretical link from static curvature to the actual error propagation problem.

The paper introduces Teger, a module that rewires the spatial graph using discrete Forman curvature to strengthen bottleneck edges, then feeds the result into a low-rank-plus-diagonal covariance head for autoregressive forecasting. The Woodbury identity keeps inference tractable, and the design works on top of any encoder such as LSTM, Transformer, or xLSTM. Experiments show consistent CRPS drops across four real-world datasets, which is the main practical takeaway for people doing traffic or climate forecasting where spatial correlations matter. That combination of curvature rewiring and structured uncertainty is new enough to note, even if the pieces come from earlier graph and covariance work.

The theory section links rewiring to lower effective resistance and better spectral connectivity, which is standard and correctly cited. The experiments appear solid enough to report gains on multiple backbones without obvious cherry-picking. The soft spot is the missing step that would show why Forman curvature, computed once from local degrees, actually targets the edges that control residual correlations under autoregressive rollout. The stress-test concern holds: the paper does not derive a bound from the forecasting objective to the curvature choice, so the mechanism could be improving generic connectivity rather than the specific spatio-temporal residual issue. No error bars or curvature-threshold ablations are mentioned in the abstract, which is a minor but fixable gap.

This paper is for graph-time-series researchers who already run autoregressive models and want a plug-in uncertainty upgrade. A reader looking for incremental but usable improvements will get value from the results. It deserves peer review because the empirical pattern is positive and the integration is clean, even if the theory section needs tightening on the curvature-to-CRPS connection.

Referee Report

2 major / 1 minor

Summary. The paper introduces Teger, a backbone-agnostic structured uncertainty module for autoregressive spatio-temporal forecasting. Teger applies a one-time discrete Forman curvature computation on the spatial graph to identify and strengthen information-bottleneck edges via rewiring; the resulting graph is used inside a low-rank-plus-diagonal covariance head whose inference remains tractable via the Woodbury identity. The authors supply a theoretical analysis linking the rewiring step to oversquashing alleviation, improved spectral connectivity, reduced effective resistance, and tighter covariance calibration bounds, and report consistent CRPS gains when Teger is attached to LSTM, Transformer, and xLSTM encoders on four real-world datasets.

Significance. If the claimed theoretical links can be made rigorous and the empirical gains prove robust to ablations and statistical controls, the work would offer a concrete, graph-theoretic remedy for the spatial and temporal independence assumptions that currently limit residual modeling in probabilistic time-series forecasters. The design choices that preserve tractability (Woodbury identity) and generality (backbone-agnostic latent-state interface) are practical strengths.

major comments (2)

[theoretical analysis] Theoretical analysis (abstract and § on curvature rewiring): the paper connects discrete Forman curvature rewiring to generic graph quantities (reduced effective resistance, spectral gap) but supplies no derivation showing that these quantities bound the CRPS or the calibration error of the low-rank-plus-diagonal covariance under autoregressive rollout. Because Forman curvature is computed statically from combinatorial structure and is independent of the encoder’s latent states or the evolving residual covariance, the central claim that the rewiring specifically mitigates spatio-temporal residual error propagation remains an assumption rather than a derived result.
[experimental evaluation] Experimental evaluation (abstract and results section): the manuscript asserts “consistent CRPS gains across backbones and datasets” yet provides no mention of error bars, statistical significance tests, ablation controls that isolate the curvature rewiring from generic connectivity improvements, or the procedure used to select curvature thresholds. Without these controls it is impossible to determine whether the reported gains are attributable to the proposed mechanism or to incidental changes in graph density.

minor comments (1)

[abstract] Abstract: the phrase “theoretical evidence of Teger” is used without any equation or key lemma; a single-sentence pointer to the main theoretical statement would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate clarifications and additional material where needed.

Referee: [theoretical analysis] Theoretical analysis (abstract and § on curvature rewiring): the paper connects discrete Forman curvature rewiring to generic graph quantities (reduced effective resistance, spectral gap) but supplies no derivation showing that these quantities bound the CRPS or the calibration error of the low-rank-plus-diagonal covariance under autoregressive rollout. Because Forman curvature is computed statically from combinatorial structure and is independent of the encoder’s latent states or the evolving residual covariance, the central claim that the rewiring specifically mitigates spatio-temporal residual error propagation remains an assumption rather than a derived result.

Authors: We thank the referee for this observation. The manuscript's theoretical analysis establishes that discrete Forman curvature identifies information bottlenecks and that the resulting rewiring improves spectral gap and reduces effective resistance, which we connect to oversquashing alleviation and to calibration bounds for the low-rank-plus-diagonal covariance. While these graph quantities are static, they directly affect the spatial message-passing structure used during autoregressive rollout. We acknowledge that an explicit end-to-end derivation bounding CRPS or calibration error from effective resistance under rollout is not fully expanded. In revision we will add a dedicated subsection that derives such bounds, showing how reduced effective resistance tightens the covariance calibration and thereby improves CRPS in the spatio-temporal setting. revision: yes
Referee: [experimental evaluation] Experimental evaluation (abstract and results section): the manuscript asserts “consistent CRPS gains across backbones and datasets” yet provides no mention of error bars, statistical significance tests, ablation controls that isolate the curvature rewiring from generic connectivity improvements, or the procedure used to select curvature thresholds. Without these controls it is impossible to determine whether the reported gains are attributable to the proposed mechanism or to incidental changes in graph density.

Authors: We agree that stronger statistical controls and ablations are required. The reported CRPS improvements are consistent across LSTM, Transformer, and xLSTM backbones on four datasets, but the current version lacks error bars, significance tests, and targeted ablations. In the revised manuscript we will (i) report mean CRPS with standard deviation over five random seeds, (ii) include paired statistical tests (t-test and Wilcoxon) with p-values, (iii) add ablations that replace Forman-curvature rewiring with random rewiring or degree-based rewiring while keeping the same edge count, and (iv) document the curvature-threshold selection procedure together with a sensitivity plot. These additions will isolate the contribution of the curvature mechanism from generic density changes. revision: yes
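
One of the controls promised above, random rewiring with the same edge count, can be sketched as follows. The function name `random_rewiring_control`, the multiplicative `boost`, and the ring-graph example are assumptions for illustration; the authors' actual ablation protocol is not described on this page.

```python
import numpy as np

def random_rewiring_control(W, k, boost=2.0, seed=0):
    """Strengthen k randomly chosen existing edges by `boost`.

    Intended as a density-matched control: the number of modified edges
    and the boost factor are the same as in a curvature-guided run, so
    any remaining gap can be attributed to *which* edges were chosen.
    """
    rng = np.random.default_rng(seed)
    W_new = W.astype(float).copy()
    iu, ju = np.nonzero(np.triu(W, k=1))          # each undirected edge once
    pick = rng.choice(len(iu), size=k, replace=False)
    for i, j in zip(iu[pick], ju[pick]):
        W_new[i, j] *= boost
        W_new[j, i] *= boost
    return W_new

# Toy ring graph on 6 nodes with unit weights.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

# One control graph per seed, mirroring a five-seed evaluation.
controls = [random_rewiring_control(W, k=2, seed=s) for s in range(5)]
print([int(np.sum(np.triu(C != W, 1))) for C in controls])
```

Running the forecaster on each control graph and comparing its CRPS with the curvature-guided run, seed by seed, gives the paired samples that a t-test or Wilcoxon test would then consume.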

Circularity Check

1 steps flagged

Forman curvature rewiring link to residual covariance calibration rests on internal assumption without derivation from forecasting objective

specific steps

self definitional [Abstract / theoretical analysis paragraph]
"Teger proposes a spatial curvature-aware graph rewiring mechanism explicitly strengthening information-bottleneck edges identified by discrete Forman curvature. ... We further provide a formal theoretical analysis connecting curvature-aware rewiring to (i) oversquashing alleviation, (ii) improved spectral connectivity, (iii) reduced effective resistance, and (iv) improved covariance calibration bounds"

The paper defines the rewiring rule as strengthening edges flagged by Forman curvature and then presents a theoretical analysis that connects this same rewiring operation to the listed graph properties and to improved covariance calibration. Because no separate derivation is supplied showing that the curvature-selected edges bound the CRPS or the calibration error of the low-rank-plus-diagonal head under autoregressive rollout, the claimed improvement reduces to a re-expression of the chosen mechanism rather than an independent consequence of the forecasting objective.

full rationale

The paper proposes Teger as a curvature-aware rewiring module integrated into a low-rank-plus-diagonal covariance head and supplies a formal theoretical analysis connecting the rewiring to oversquashing alleviation, spectral connectivity, effective resistance, and covariance calibration bounds. However, the central load-bearing step—that static discrete Forman curvature computed on the spatial graph identifies precisely the information-bottleneck edges whose strengthening will improve autoregressive residual error propagation and CRPS—receives no derivation from the forecasting loss or the evolving residual covariance. The analysis instead shows general graph-theoretic consequences of rewiring, which are then asserted to translate into the specific spatio-temporal forecasting gains. This leaves the claimed theoretical evidence partially dependent on the mechanism definition itself rather than an independent reduction from the model objective, producing moderate circularity risk while still leaving room for the experimental results to provide separate support.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review performed on abstract only; full derivation details, parameter counts, and independent evidence for the curvature assumption are unavailable.

axioms (1)

domain assumption: Discrete Forman curvature identifies information-bottleneck edges whose strengthening alleviates over-squashing in residual propagation
Invoked as the core mechanism of the rewiring component without further justification in the abstract.

invented entities (1)

Teger module (no independent evidence)
purpose: Structured uncertainty modeling via curvature-aware rewiring and low-rank-plus-diagonal covariance
Newly introduced component whose independent evidence is limited to the abstract's experimental claims.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

Relation between the paper passage and the cited Recognition theorem.

Teger proposes a spatial curvature-aware graph rewiring mechanism explicitly strengthening information-bottleneck edges identified by discrete Forman curvature... formal theoretical analysis connecting curvature-aware rewiring to (i) oversquashing alleviation, (ii) improved spectral connectivity, (iii) reduced effective resistance, and (iv) improved covariance calibration bounds
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

edges (i, j) with Balanced Forman curvature κ(i, j) ≤ −2 + δ ... act as information bottlenecks

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

On the bottleneck of graph neural networks and its practical implications

Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In9th International Conference on Learning Representations, ICLR 2021, 2021

work page 2021

[2] [2]

Tactis-2: Better, faster, simpler attentional copulas for multivariate time series, 2024

Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Nicolas Chapados, and Alexandre Drouin. Tactis-2: Better, faster, simpler attentional copulas for multivariate time series, 2024

work page 2024

[3] [3]

Bronstein, and Francesco Di Giovanni

Federico Barbero, Ameya Velingker, Amin Saberi, Michael M. Bronstein, and Francesco Di Giovanni. Locality-aware graph rewiring in GNNs. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[4] [4]

xlstm: Extended long short-term memory.Advances in Neural Information Processing Systems, 37:107547– 107603, 2024

Maximilian Beck, Korbinian Poppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Gunter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. xlstm: Extended long short-term memory.Advances in Neural Information Processing Systems, 37:107547– 107603, 2024

work page 2024

[5] Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, Yuyang Wang, Danielle Maddix, Caner Turkmen, Jan Gasthaus, Michael Bohlke-Schneider, David Salinas, Lorenzo Stella, François-Xavier Aubet, Laurent Callot, and Tim Januschowski. Deep learning for time series forecasting: Tutorial and literature survey. ACM Computing Surveys, 55(6):1–36, December 2022.

[6] Nicholas J. Higham. Accuracy and stability of numerical algorithms. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 2002.

[7] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[8] Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting, 2020.

[9] Weijun Li, Guoliang Yang, Zhangyou Xiong, Xiaojuan Zhu, and Xinyu Ma. A traffic flow prediction model based on dynamic graph convolution and adaptive spatial feature extraction. Symmetry, 17(7), 2025.

[10] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In International Conference on Learning Representations, 2018.

[11] Bryan Lim, Sercan O. Arik, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting, 2020.

[12] Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X. Liu, and Schahram Dustdar. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In International Conference on Learning Representations, 2022.

[13] Ivan Marisca, Jacob Bamberger, Cesare Alippi, and Michael M. Bronstein. Over-squashing in spatiotemporal graph neural networks. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.

[14] Seyed Mohamad Moghadas, Amin Gheibi, and Alexander Alahi. Statnet: Spatial-temporal attention in the traffic prediction. In hEART 2022: 10th Symposium of the European Association for Research in Transportation, 2022.

[15] Zelin Ni, Hang Yu, Shizhan Liu, Jianguo Li, and Weiyao Lin. Basisformer: Attention-based time series forecasting with learnable and interpretable basis, 2024.

[16] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers, 2023.

[17] Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting, 2020.

[18] Weizhu Qian, Yan Zhao, Dalin Zhang, Bowei Chen, Kai Zheng, and Xiaofang Zhou. Towards a unified understanding of uncertainty quantification in traffic flow forecasting, January 2023.

[19] David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, and Jan Gasthaus. High-dimensional multivariate forecasting with low-rank Gaussian copula processes, 2019.

[20] Chao Shang, Jie Chen, and Jinbo Bi. Discrete graph structure learning for forecasting multiple time series. In International Conference on Learning Representations, 2021.

[21] Chao Song, Youfang Lin, Shengnan Guo, and Huaiyu Wan. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 34(01):914–921, April 2020.

[22] Jake Topping, Francesco Di Giovanni, Benjamin Paul Chamberlain, Xiaowen Dong, and Michael M. Bronstein. Understanding over-squashing and bottlenecks on graphs via curvature, 2022.

[23] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023.

[24] Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, and Steven Hoi. ETSformer: Exponential smoothing transformers for time-series forecasting, 2022.

[25] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, 2022.

[26] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’20, pages 753–763, New York, NY, USA, 2020. Association for Computing Machinery.

[27] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, and Chengqi Zhang. Graph WaveNet for deep spatial-temporal graph modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 1907–1913. International Joint Conferences on Artificial Intelligence Organization, July 2019.

[28] Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, and Jianzhong Qi. GMAN: A graph multi-attention network for traffic prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 34(01):1234–1241, April 2020.

[29] Vincent Zhihao Zheng and Lijun Sun. Multivariate probabilistic time series forecasting with correlated errors. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

[30] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting, 2021.

[31] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting, 2022.

A Motivation

B Proofs

B.1 Proof of Proposition 1 (Validity of Rewired Covariance)

Part (i). The graph Laplacian of any symmetric, entrywise nonnegative matrix is PSD: $x^\top L'_t x = \frac{1}{2} \sum_{i,j} W'_{ij,t}$ ...
