arxiv: 2606.21022 · v1 · pith:477PVX74new · submitted 2026-06-19 · 💻 cs.LG · cs.AI

Structure-Aware Graph Multi-Task Learning for Dynamic Sparse OD Demand Prediction

Pith reviewed 2026-06-26 14:59 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords OD demand prediction · multi-task learning · graph neural networks · sparse data · urban mobility · origin-destination flows

The pith

A graph multi-task model decomposes sparse OD demand prediction into joint modeling of regional activity states, OD connection activity, and flow intensity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that single-task flow regression struggles with dynamically sparse and long-tailed OD data because it cannot reliably separate whether a connection is active from the demand volume it carries once active. SAGMTL instead frames the problem as three jointly learned tasks inside one structure-aware graph framework: regional activity states, OD connection activity, and edge-level flow intensity. A node-edge collaborative module builds representations that combine regional semantics, temporal dynamics, and spatial priors through interactive updates. A multi-constraint loss then enforces sparsity awareness and structural consistency. On mobility datasets from Beijing, Chengdu, and Nanjing the joint approach outperforms prior single-task baselines.
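The split between "is this connection active" and "how much demand does it carry once active" can be pictured as a hurdle-style combination of two prediction heads. The following is a minimal sketch under stated assumptions, not the paper's formulation: the function name `combine_heads`, the 0.5 threshold, and the hard gating rule are all hypothetical choices made for illustration.

```python
from typing import List


def combine_heads(
    p_active: List[List[float]],
    intensity: List[List[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[List[float]]:
    """Gate a predicted flow intensity by a predicted activity probability.

    p_active[i][j]  : assumed probability that OD pair (i, j) is active.
    intensity[i][j] : assumed flow volume conditional on the pair being active.
    Pairs whose activity probability is below `threshold` are predicted as 0.
    """
    flows = []
    for p_row, x_row in zip(p_active, intensity):
        row = []
        for p, x in zip(p_row, x_row):
            # Negative intensities are clipped to zero because flows are counts.
            row.append(max(x, 0.0) if p >= threshold else 0.0)
        flows.append(row)
    return flows


# Toy 2x2 example with made-up numbers.
p_active = [[0.9, 0.1], [0.4, 0.8]]
intensity = [[12.0, 3.0], [5.0, 7.5]]
flows = combine_heads(p_active, intensity)
print(flows)  # [[12.0, 0.0], [0.0, 7.5]]
```

A single regression head would have to push the value for pair (0, 1) toward zero and the value for pair (0, 0) toward 12 with the same output unit; the gate lets a separate signal decide the zero case.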

Core claim

SAGMTL decomposes OD prediction into structural state modeling and flow intensity estimation, jointly learning regional activity states, OD connection activity, and edge-level flow intensity within a unified framework using node-edge collaborative representations and a multi-constraint objective.
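To make the three jointly learned quantities concrete, the sketch below derives one target per task from a single OD matrix. It assumes that a region counts as active when it has any inbound or outbound flow and that a connection counts as active when its flow is positive; the abstract does not give the paper's exact label definitions, so `derive_targets` and both rules are hypothetical.

```python
from typing import Dict, List, Tuple


def derive_targets(
    od: List[List[float]],
) -> Tuple[List[int], List[List[int]], Dict[Tuple[int, int], float]]:
    """Split one square OD matrix into three illustrative task targets.

    Returns:
      region_active : 1 if the region has any inbound or outbound flow.
      edge_active   : 1 if the OD pair (i, j) carries positive flow.
      intensity     : flow volume, kept only for active OD pairs.
    """
    n = len(od)
    edge_active = [[1 if od[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    region_active = [
        1 if any(od[i][j] > 0 or od[j][i] > 0 for j in range(n)) else 0
        for i in range(n)
    ]
    intensity = {
        (i, j): od[i][j] for i in range(n) for j in range(n) if od[i][j] > 0
    }
    return region_active, edge_active, intensity


# Toy 3-region matrix: only the pair 0 -> 1 carries demand.
od = [[0, 3, 0], [0, 0, 0], [0, 0, 0]]
region_active, edge_active, intensity = derive_targets(od)
print(region_active)  # [1, 1, 0]
print(intensity)      # {(0, 1): 3}
```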

What carries the argument

The node-edge collaborative representation module, which produces structure-aware representations by performing interactive node-edge updates that capture regional semantics, temporal dynamics, and spatial priors.
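One way to picture an interactive node-edge update is a single round of message passing in which edges read from their endpoints and nodes then read from their updated edges. The sketch below is a weight-free toy version and is not the paper's module: the additive mixing, the `tanh` squashing, and the name `node_edge_step` are assumptions, and a real model would use learned transformations plus temporal and spatial inputs.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Edge = Tuple[int, int]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def node_edge_step(
    h: Dict[int, Vector], e: Dict[Edge, Vector]
) -> Tuple[Dict[int, Vector], Dict[Edge, Vector]]:
    """One round of interactive updates (no learned weights).

    Edge update: each edge mixes its own state with both endpoint states.
    Node update: each node mixes its own state with the mean of the
    updated states of its incident edges.
    """
    new_e = {
        (i, j): [math.tanh(v) for v in add(add(vec, h[i]), h[j])]
        for (i, j), vec in e.items()
    }
    new_h = {}
    for node, vec in h.items():
        incident = [ev for (i, j), ev in new_e.items() if node in (i, j)]
        if incident:
            mean = [sum(col) / len(incident) for col in zip(*incident)]
        else:
            mean = [0.0] * len(vec)  # isolated node keeps only its own state
        new_h[node] = [math.tanh(v) for v in add(vec, mean)]
    return new_h, new_e


# Three regions, two OD edges, two-dimensional toy features.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
e = {(0, 1): [0.0, 0.0], (1, 2): [0.1, -0.1]}
new_h, new_e = node_edge_step(h, e)
```

Because the node update consumes the already updated edge states, information from region 0 reaches region 1 within one step and would reach region 2 after a second step.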

If this is right

  • Joint task learning improves the model's ability to distinguish active from inactive OD pairs in sparse, long-tailed settings.
  • Structure-aware node-edge representations better capture the spatial and temporal priors that govern dynamic OD interactions.
  • The multi-constraint objective increases robustness to heterogeneous zero-flow patterns across different cities.
  • Performance gains hold when the same framework is applied to multiple real-world urban mobility datasets.
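The abstract names a multi-constraint objective but gives no formula. The sketch below shows one plausible shape for such an objective, purely as an illustration: a binary cross-entropy term for connection activity, a Huber term for intensity on truly active pairs, and an L1 penalty on flow predicted for truly inactive pairs. The three terms, the weights `w_act`, `w_flow`, `w_sparse`, and the function names are assumptions, not the authors' loss.

```python
import math
from typing import List


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, clamped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def huber(residual: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic near zero, linear in the tails."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def multi_constraint_loss(
    p_active: List[float],
    flow_pred: List[float],
    flow_true: List[float],
    w_act: float = 1.0,     # assumed weight
    w_flow: float = 1.0,    # assumed weight
    w_sparse: float = 0.1,  # assumed weight
) -> float:
    n = len(flow_true)
    active = [1 if f > 0 else 0 for f in flow_true]
    # Activity term: is each OD pair active or not?
    act_term = sum(bce(p, y) for p, y in zip(p_active, active)) / n
    # Intensity term: regression error on truly active pairs only.
    res = [fp - ft for fp, ft, y in zip(flow_pred, flow_true, active) if y]
    flow_term = sum(huber(r) for r in res) / max(len(res), 1)
    # Sparsity term: penalize any flow predicted on truly inactive pairs.
    sparse_term = sum(abs(fp) for fp, y in zip(flow_pred, active) if not y) / n
    return w_act * act_term + w_flow * flow_term + w_sparse * sparse_term


# Three OD pairs with made-up values; the middle pair is truly inactive.
loss = multi_constraint_loss(
    p_active=[0.9, 0.2, 0.7],
    flow_pred=[10.0, 0.5, 3.0],
    flow_true=[12.0, 0.0, 3.0],
)
```

Restricting the intensity term to active pairs keeps the many zeros from dominating the regression signal, which is the failure mode the paper attributes to single-task flow regression.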

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same state-plus-intensity decomposition could be tested on other sparse graph regression tasks such as traffic speed or social contact prediction.
  • Ablating the connection-activity task while keeping the other two should produce a measurable drop in accuracy if the separation is load-bearing.
  • The approach may generalize to non-urban graphs where zero entries also mix inactivity with low intensity.

Load-bearing premise

That explicitly separating and jointly learning regional activity states, OD connection activity, and edge-level flow intensity will resolve the difficulty of distinguishing active connections from flow volume in heterogeneous zero-flow patterns.

What would settle it

On the Beijing, Chengdu, or Nanjing datasets, a single-task regression version of the same graph architecture that matches or exceeds SAGMTL performance would falsify the claim that the multi-task decomposition is necessary.
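A comparison of this kind is easier to read when error is reported separately for zero-flow and non-zero OD pairs, since an overall average can hide which regime a model fails on. The sketch below is an illustrative evaluation helper, not the paper's protocol: the metric (mean absolute error), the name `split_mae`, and all numbers are made up for the example.

```python
from typing import Dict, List, Optional


def _mean(values: List[float]) -> Optional[float]:
    """Arithmetic mean, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None


def split_mae(y_true: List[float], y_pred: List[float]) -> Dict[str, Optional[float]]:
    """Mean absolute error overall, on zero-flow pairs, and on non-zero pairs."""
    errors = [abs(p - t) for t, p in zip(y_true, y_pred)]
    zero = [err for t, err in zip(y_true, errors) if t == 0]
    nonzero = [err for t, err in zip(y_true, errors) if t != 0]
    return {"overall": _mean(errors), "zero": _mean(zero), "nonzero": _mean(nonzero)}


# Made-up flows for four OD pairs and two hypothetical models.
y_true = [0, 0, 5, 10]
single_task_pred = [1, 2, 4, 7]
multi_task_pred = [0, 1, 4, 8]

single = split_mae(y_true, single_task_pred)
multi = split_mae(y_true, multi_task_pred)
print(single)  # {'overall': 1.75, 'zero': 1.5, 'nonzero': 2.0}
print(multi)   # {'overall': 1.0, 'zero': 0.5, 'nonzero': 1.5}
```

If a single-task variant of the same architecture matched the multi-task model on both the zero and the non-zero split, the decomposition would not be doing the work the paper attributes to it.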

Figures

Figures reproduced from arXiv: 2606.21022 by Jiawei Cao, Ming Xu.

Figure 1: Overall architecture of the proposed SAGMTL framework. Given historical OD flows, edge activation states, spatial relationships, and static regional …

Figure 2: Fitting results on a representative stable OD edge. The gray curve denotes the ground-truth flow, the red curve denotes the prediction of SAGMTL, …

Figure 3: Fitting results on a representative bursty OD edge. The gray curve denotes the ground-truth flow, the red curve denotes the prediction of SAGMTL, …

Figure 4: Fitting results on a representative highly volatile OD edge. The gray curve denotes the ground-truth flow, the red curve denotes the prediction of …

Figure 5: Parameter sensitivity analysis of SAGMTL on the Beijing dataset. Panels (a)–(i) report the effects of the spatial residual weight, the number of …
read the original abstract

Origin-Destination (OD) demand prediction is fundamental to intelligent transportation systems, yet real-world OD flows are often dynamically sparse, long-tailed, and characterized by heterogeneous zero-flow patterns. These properties make it difficult to distinguish whether an OD connection is active from how much demand it generates once activated. Many existing methods primarily treat OD prediction as a single flow regression task, which limits their ability to model low-frequency, intermittent, and long-tailed OD interactions. To address these challenges, we propose SAGMTL, a Structure-Aware Graph Multi-Task Learning framework for dynamic sparse OD demand prediction. SAGMTL decomposes OD prediction into structural state modeling and flow intensity estimation, jointly learning regional activity states, OD connection activity, and edge-level flow intensity within a unified framework. Specifically, a node-edge collaborative representation module captures regional semantics, temporal dynamics, and spatial priors through interactive node-edge updates, producing structure-aware representations for dynamic OD interactions. Based on these representations, SAGMTL estimates OD flows by jointly modeling stable demand patterns and short-term fluctuations. A multi-constraint objective further improves sparsity awareness and structural consistency. Experiments on three real-world urban mobility datasets from Beijing, Chengdu, and Nanjing show that SAGMTL achieves superior overall performance compared with state-of-the-art baselines. Further analysis demonstrates that explicitly modeling regional activity, connection states, and flow intensity improves the robustness of dynamic sparse OD demand prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper proposes SAGMTL, a Structure-Aware Graph Multi-Task Learning framework for dynamic sparse OD demand prediction. It decomposes the task into structural state modeling (regional activity states and OD connection activity) and flow intensity estimation, using a node-edge collaborative representation module to build structure-aware representations and a multi-constraint objective to improve sparsity awareness. The authors report superior overall performance over state-of-the-art baselines on three real-world datasets (Beijing, Chengdu, Nanjing), and their further analysis attributes the gains to explicit modeling of regional activity, connection states, and flow intensity.

Significance. If the results and ablations hold, the multi-task decomposition of sparse heterogeneous OD flows could meaningfully advance transportation demand modeling by separating activation from volume, with potential applicability to other long-tailed graph regression settings. The structure-aware node-edge updates and joint learning of stable/short-term patterns represent a targeted response to a recognized practical difficulty.

minor comments (1)
  1. The abstract references specific datasets and superiority claims but provides no quantitative metrics, error bars, or ablation tables; these should be highlighted in the results section for reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of SAGMTL and for noting the potential value of the multi-task decomposition for handling sparse, heterogeneous OD flows. The recommendation is listed as uncertain, yet the report contains no specific major comments or questions for us to address. We remain available to provide clarifications or additional experiments should any be requested.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained in empirical claims

full rationale

Only the abstract is supplied; it describes a multi-task decomposition into structural state modeling and flow intensity estimation plus a multi-constraint objective, but supplies no equations, fitted parameters, or self-citations that could reduce any claimed prediction to its own inputs by construction. The central result is an empirical performance comparison on three external datasets, which is falsifiable outside any internal fit and does not invoke uniqueness theorems or ansatzes from prior author work. No load-bearing step can be isolated, so the default finding of no circularity applies.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the proposed task decomposition matches the statistical structure of real OD data; no free parameters or invented physical entities are named in the abstract.

axioms (1)
  • domain assumption: Decomposing OD prediction into regional activity, connection activity, and flow intensity tasks will improve modeling of dynamic sparsity and long-tailed patterns.
    Invoked in the problem statement and method description as the solution to the stated limitations of single-task regression.

pith-pipeline@v0.9.1-grok · 5778 in / 1296 out tokens · 23768 ms · 2026-06-26T14:59:37.872112+00:00 · methodology
