arXiv: 2602.00844 · v2 · submitted 2026-01-31 · 📊 stat.ML · cs.LG · stat.AP

Multivariate Time Series Data Imputation via Distributionally Robust Regularization

Pith reviewed 2026-05-16 08:39 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.AP
keywords multivariate time series · data imputation · distributionally robust optimization · Wasserstein distance · missing data · non-stationary data · deep learning

The pith

A Wasserstein-based robust objective reduces overfitting in multivariate time series imputation caused by non-stationarity and systematic missingness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard imputation methods can overfit biased observations, because non-stationarity and systematic missingness together create a mismatch between the observed and true data distributions. The paper introduces the Distributionally Robust Regularized Imputer Objective (DRIO), which adds a penalty on the worst-case divergence over a Wasserstein ambiguity set. This forces the model to perform well even under the most adverse plausible shift in the data distribution. A tractable surrogate converts the problem into an adversarial search over trajectories, solved by an alternating optimization scheme that works with modern neural network backbones. Experiments on real datasets indicate that the approach delivers steadier imputations and supports better forecasting on the completed series.

Core claim

The Distributionally Robust Regularized Imputer Objective jointly minimizes reconstruction error and the worst-case divergence between the imputer distribution and data distributions within a Wasserstein ambiguity set. A tractable upper-bound surrogate reduces the infinite-dimensional optimization over measures to an adversarial search over sample trajectories, solved by an alternating learning algorithm compatible with deep learning architectures.
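
Read schematically, the objective described above has the following shape. The notation is an assumption introduced here for illustration and is not taken from the paper: $\theta$ denotes the imputer parameters, $\hat{P}$ the empirical distribution of the observed data, $P_{\theta}$ the distribution of imputed series, $\ell_{\mathrm{rec}}$ a reconstruction loss, $D$ a divergence between distributions, $W$ a Wasserstein distance, $\rho$ the ambiguity radius, and $\lambda$ a regularization weight.

```latex
\min_{\theta}\;
\mathbb{E}_{\hat{P}}\big[\ell_{\mathrm{rec}}(\theta)\big]
\;+\;
\lambda \sup_{Q \,:\, W(Q,\hat{P}) \le \rho} D\big(P_{\theta},\, Q\big)
```

In this schematic form, setting $\rho = 0$ collapses the ambiguity set to $\hat{P}$ itself, so the second term reduces to a plain alignment penalty against the observed data.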

What carries the argument

DRIO, which augments the standard reconstruction loss with a worst-case Wasserstein divergence term over an ambiguity set around the observed data.

If this is right

  • DRIO yields more stable imputation accuracy across varied missingness scenarios in real multivariate time series.
  • Completed series produced by DRIO support improved performance in downstream forecasting tasks.
  • The method integrates directly with existing deep learning time-series models without changing their architecture.
  • The surrogate bound converts the robust objective into a practical min-max training loop.
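
The min-max training loop mentioned in the last bullet can be illustrated with a deliberately tiny sketch. It is not the paper's algorithm: it assumes a one-parameter imputer that predicts a held-out value as `w` times the mean of its two neighbours, and a penalty-style adversary that perturbs each target at a quadratic cost `gamma * delta**2`. The names `make_data`, `adversary`, `train` and all hyperparameter values are hypothetical.

```python
import random
from typing import List, Tuple


def make_data(n: int = 200, seed: int = 0) -> List[Tuple[float, float]]:
    """Toy pairs (m, y): m is the neighbour mean, y the value to impute."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        m = rng.uniform(0.5, 1.5)
        y = 2.0 * m + rng.gauss(0.0, 0.05)
        data.append((m, y))
    return data


def adversary(w: float, m: float, y: float, gamma: float,
              steps: int = 5, lr: float = 0.1) -> float:
    """Gradient ascent on delta for (w*m - (y + delta))**2 - gamma*delta**2.

    The inner problem is concave in delta only when gamma > 1.
    """
    delta = 0.0
    for _ in range(steps):
        grad = -2.0 * (w * m - y - delta) - 2.0 * gamma * delta
        delta += lr * grad
    return delta


def train(data: List[Tuple[float, float]], gamma: float = 3.0,
          rounds: int = 200, lr: float = 0.1) -> float:
    w = 0.0
    for _ in range(rounds):
        # Adversary step: perturb each target with the imputer held fixed.
        deltas = [adversary(w, m, y, gamma) for m, y in data]
        # Imputer step: one gradient step on the perturbed reconstruction loss.
        grad = sum(2.0 * (w * m - y - d) * m
                   for (m, y), d in zip(data, deltas)) / len(data)
        w -= lr * grad
    return w


w_hat = train(make_data())
print(f"learned weight: {w_hat:.3f}")
```

In this toy setting the adversary only rescales the residuals, so the learned weight stays close to the least-squares solution; the point of the sketch is the alternation between the two steps, not any robustness gain.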

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same worst-case regularization idea could extend to other sequential data tasks where partial observations induce distribution shift.
  • Synthetic benchmarks with explicitly constructed non-stationary shifts would isolate whether the Wasserstein set choice drives the gains.
  • Combining DRIO with uncertainty-aware forecasting models might produce end-to-end pipelines that remain reliable under heavy missingness.
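
A synthetic benchmark of the kind suggested in the second bullet could start from something like the following sketch. It assumes a single level shift at the midpoint of the series and an observation probability that drops after the shift; the function name `make_shifted_series` and every parameter value are hypothetical choices, not settings from the paper.

```python
import random
from typing import List, Optional, Tuple


def make_shifted_series(
    length: int = 1000,
    shift: float = 3.0,
    p_obs_before: float = 0.9,
    p_obs_after: float = 0.2,
    seed: int = 0,
) -> Tuple[List[float], List[Optional[float]]]:
    """Return (full series, observed series with None for missing values)."""
    rng = random.Random(seed)
    full: List[float] = []
    observed: List[Optional[float]] = []
    for t in range(length):
        after_shift = t >= length // 2
        # Non-stationarity: the mean jumps from 0 to `shift` at the midpoint.
        x = rng.gauss(shift if after_shift else 0.0, 1.0)
        # Systematic missingness: values after the shift are seen less often.
        p_obs = p_obs_after if after_shift else p_obs_before
        full.append(x)
        observed.append(x if rng.random() < p_obs else None)
    return full, observed


full, observed = make_shifted_series()
seen = [v for v in observed if v is not None]
full_mean = sum(full) / len(full)
observed_mean = sum(seen) / len(seen)
print(f"full mean {full_mean:.2f}, observed mean {observed_mean:.2f}")
```

Because the post-shift regime is under-observed, the observed mean sits well below the mean of the full series, which is the kind of mismatch the benchmark would need to control.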

Load-bearing premise

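The chosen Wasserstein ambiguity set must correctly describe the distribution mismatch created by non-stationarity and missing values. In particular, the true distribution has to lie inside the set for the worst-case guarantee to apply to it.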
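
To make the premise concrete, the sketch below checks whether one sample lies inside a Wasserstein-1 ball around another. It assumes one-dimensional samples of equal size with uniform weights, for which the distance is the mean absolute difference between sorted values; the helper names `wasserstein1_1d` and `in_ball` and the numbers are illustrative only.

```python
from typing import Sequence


def wasserstein1_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W1 between two equal-size 1-D empirical samples with uniform weights."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def in_ball(center: Sequence[float], candidate: Sequence[float],
            radius: float) -> bool:
    """True if `candidate` is within `radius` of `center` in W1."""
    return wasserstein1_1d(center, candidate) <= radius


observed = [0.0, 1.0, 2.0]   # stands in for the biased observed sample
complete = [4.0, 1.0, 2.0]   # stands in for the true complete sample

distance = wasserstein1_1d(observed, complete)  # (1 + 1 + 2) / 3
print(distance, in_ball(observed, complete, radius=1.0))
```

With a radius of 1.0 the complete sample falls outside the ball, so a worst case taken over that ball would not cover it.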

What would settle it

A controlled dataset where the true complete distribution is known. If standard point-wise or alignment-based imputers achieve lower error than DRIO there under the same missingness patterns, the claimed advantage disappears.
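
A controlled comparison of this kind amounts to hiding known values and scoring each imputer only on the hidden entries. The sketch below shows that harness with two simple stand-in imputers (mean fill and linear interpolation); neither is DRIO nor a baseline from the paper, and the sine series and the masking pattern are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

Series = Sequence[Optional[float]]


def mean_impute(series: Series) -> List[float]:
    seen = [v for v in series if v is not None]
    if not seen:
        raise ValueError("series has no observed values")
    fill = sum(seen) / len(seen)
    return [fill if v is None else v for v in series]


def linear_interp_impute(series: Series) -> List[float]:
    n = len(series)
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        lo, hi = i - 1, i + 1
        while lo >= 0 and series[lo] is None:
            lo -= 1
        while hi < n and series[hi] is None:
            hi += 1
        if lo < 0 and hi >= n:
            raise ValueError("series has no observed values")
        if lo < 0:            # gap at the start: copy the next observed value
            out[i] = series[hi]
        elif hi >= n:         # gap at the end: copy the last observed value
            out[i] = series[lo]
        else:                 # interior gap: interpolate between neighbours
            frac = (i - lo) / (hi - lo)
            out[i] = series[lo] + frac * (series[hi] - series[lo])
    return out


def masked_mse(truth: Sequence[float], imputed: Sequence[float],
               missing: Sequence[int]) -> float:
    """Mean squared error restricted to the hidden positions."""
    return sum((truth[i] - imputed[i]) ** 2 for i in missing) / len(missing)


truth = [math.sin(0.1 * t) for t in range(100)]
missing = [t for t in range(100) if t % 5 == 2]
observed = [None if t in missing else truth[t] for t in range(100)]

mse_mean = masked_mse(truth, mean_impute(observed), missing)
mse_interp = masked_mse(truth, linear_interp_impute(observed), missing)
print(f"mean fill: {mse_mean:.4f}, interpolation: {mse_interp:.6f}")
```

Any candidate imputer can be dropped into the same harness, provided all methods see the identical `observed` series and are scored on the identical `missing` indices.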

Figures

Figures reproduced from arXiv: 2602.00844 by Che-Yi Liao, Gian-Gabriel Garcia, Kamran Paynabar, Zheng Dong.

Figure 1. Overview of the Multivariate Time Series Imputation framework under [...]
Figure 2. Downstream forecasting MSE using imputed data. Each box aggregates scenario-level [...]
Figure 3. Test-MSE gap between the deployable validation-MSE pick and the oracle test-MSE [...]
Original abstract

Multivariate time series imputation is often compromised by mismatch between the observed and true data distributions, a bias induced by the combined effects of time-series non-stationarity and systematic missingness. Standard methods that encourage point-wise reconstruction or direct distributional alignment may overfit these biased observations. We propose the Distributionally Robust Regularized Imputer Objective (DRIO), which jointly minimizes reconstruction error and the worst-case divergence between the imputer distribution and data distributions within a Wasserstein ambiguity set. We derive a tractable upper-bound surrogate that reduces infinite-dimensional optimization over measures to adversarial search over sample trajectories, and develop an alternating learning algorithm compatible with modern deep learning backbones. Comprehensive experiments on diverse real-world datasets show that DRIO consistently provides robust imputation and suggests improved downstream forecasting under various missingness scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Distributionally Robust Regularized Imputer Objective (DRIO) for multivariate time series imputation to address distribution mismatch induced by non-stationarity and systematic missingness. It jointly minimizes reconstruction error and worst-case divergence within a Wasserstein ambiguity set, derives a tractable upper-bound surrogate that reduces the infinite-dimensional problem to adversarial search over sample trajectories, and develops an alternating optimization algorithm compatible with deep learning models. Experiments on diverse real-world datasets are reported to show consistent robustness in imputation and improved downstream forecasting under various missingness scenarios.

Significance. If the central claims hold, the work offers a principled DRO-based approach to robust imputation that could mitigate overfitting to biased observations in non-stationary time series, with the tractable surrogate and deep-learning compatibility as notable practical strengths. This could influence methods for handling incomplete data in forecasting pipelines, provided the ambiguity-set construction and bound quality are validated.

major comments (2)
  1. [Derivation of surrogate] The derivation of the tractable upper-bound surrogate (as stated in the abstract) lacks explicit verification of bound tightness or empirical checks on approximation quality, which is load-bearing for the claimed robustness guarantees.
  2. [Ambiguity set construction] The Wasserstein ambiguity set is constructed around the observed empirical measure without an explicit correction for selection bias induced by the missingness mechanism in non-stationary series; this risks the true data-generating distribution lying outside the ball, undermining the min-max guarantee.
minor comments (2)
  1. [Experiments] Experimental results report gains without error bars, standard deviations, or multiple-run statistics, which would be needed to substantiate consistency claims.
  2. [Experiments] No ablation on the ambiguity-set radius is presented, leaving sensitivity to this key hyperparameter unexamined.
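
The two minor comments could be addressed with a sweep of the following shape, which reports mean and sample standard deviation across seeds for each ambiguity-set radius. This is a sketch only: `sweep` is a hypothetical helper, and `toy_run` is a placeholder that stands in for "train with this radius and seed, return test MSE"; its outputs are arbitrary numbers, not results.

```python
import statistics
from typing import Callable, Dict, Iterable, List, Tuple


def sweep(
    run: Callable[[float, int], float],
    radii: Iterable[float],
    seeds: Iterable[int],
) -> Dict[float, Tuple[float, float]]:
    """Map each radius to (mean, sample std) of the score across seeds."""
    seed_list: List[int] = list(seeds)
    table: Dict[float, Tuple[float, float]] = {}
    for rho in radii:
        scores = [run(rho, seed) for seed in seed_list]
        table[rho] = (statistics.mean(scores), statistics.stdev(scores))
    return table


def toy_run(rho: float, seed: int) -> float:
    # Placeholder score; replace with a real training and evaluation run.
    return (rho - 0.5) ** 2 + 0.01 * seed


table = sweep(toy_run, radii=[0.0, 0.5, 1.0], seeds=[0, 1, 2])
for rho, (mean, std) in table.items():
    print(f"rho={rho:.1f}: {mean:.3f} +/- {std:.3f}")
```

Reporting the spread alongside the mean makes it possible to judge whether differences between radii, or between methods, exceed run-to-run variation.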

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of our work. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [Derivation of surrogate] The derivation of the tractable upper-bound surrogate (as stated in the abstract) lacks explicit verification of bound tightness or empirical checks on approximation quality, which is load-bearing for the claimed robustness guarantees.

    Authors: We agree that the manuscript would benefit from explicit verification of the surrogate bound. In the revision we will add a dedicated subsection deriving conditions under which the upper bound is tight (under Lipschitz continuity of the loss and bounded support assumptions) and include new synthetic-data experiments that compare the surrogate objective value against a Monte-Carlo estimate of the true min-max objective, thereby quantifying approximation error across different missingness rates.

    revision: yes

  2. Referee: [Ambiguity set construction] The Wasserstein ambiguity set is constructed around the observed empirical measure without an explicit correction for selection bias induced by the missingness mechanism in non-stationary series; this risks the true data-generating distribution lying outside the ball, undermining the min-max guarantee.

    Authors: The concern is well-taken. Our current construction follows the standard empirical-measure centering used in most DRO imputation literature, but it does not explicitly adjust for selection bias. In the revision we will (i) add a paragraph in Section 3.2 acknowledging this limitation and (ii) propose a simple re-weighting scheme based on an estimated missingness probability to recenter the empirical measure. We will also report additional experiments that vary the missingness mechanism (MCAR vs. MNAR) to illustrate the practical effect of this bias.

    revision: partial
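
The re-weighting idea in the second response can be illustrated with inverse-probability weighting on a toy example. This is a sketch of the general technique, not the scheme the authors propose: it assumes the observation probability of each value is known exactly (in practice it would have to be estimated), and the names `simulate`, `naive_mean` and `ipw_mean` are hypothetical.

```python
import random
from typing import List, Tuple


def simulate(n: int = 20000, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw x ~ Uniform(0, 1) and keep it with probability 0.2 + 0.6 * x.

    Returns the observed values and their observation probabilities.
    Larger values are observed more often, so the sample is biased upward.
    """
    rng = random.Random(seed)
    xs: List[float] = []
    ps: List[float] = []
    for _ in range(n):
        x = rng.random()
        p_obs = 0.2 + 0.6 * x
        if rng.random() < p_obs:
            xs.append(x)
            ps.append(p_obs)
    return xs, ps


def naive_mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def ipw_mean(xs: List[float], ps: List[float]) -> float:
    """Self-normalized inverse-probability-weighted mean."""
    weights = [1.0 / p for p in ps]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


xs, ps = simulate()
naive = naive_mean(xs)
reweighted = ipw_mean(xs, ps)
print(f"naive {naive:.3f}, reweighted {reweighted:.3f}, true mean 0.500")
```

The same weights, normalized to sum to one, define a recentered empirical measure that could serve as the center of the ambiguity set in place of the uniform-weight one.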

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained mathematical reduction

full rationale

The paper defines DRIO as a joint min-max objective over reconstruction error and worst-case Wasserstein divergence, then derives a tractable upper-bound surrogate that converts the infinite-dimensional problem into adversarial search over sample trajectories. This reduction is presented as a direct consequence of the Wasserstein ball construction and duality arguments, without any fitted parameters being renamed as predictions or any load-bearing step collapsing to a self-citation. Experiments are reported separately on real datasets and do not feed back into the objective definition. No self-definitional loops, uniqueness theorems imported from prior author work, or ansatz smuggling are present in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard properties of the Wasserstein metric and the existence of a tractable upper bound; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • [standard math] Wasserstein distance admits a tractable dual or adversarial representation that yields a finite-dimensional surrogate
    Invoked when reducing the infinite-dimensional DRO problem to adversarial search over trajectories
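
For orientation, one standard form of such a dual representation from the Wasserstein DRO literature is sketched below, followed by the bound it implies for Lipschitz losses. These are textbook-style statements that hold under mild regularity conditions, not the paper's specific surrogate. The symbols are assumptions introduced here: $\ell$ is a loss, $c$ a transport cost with associated optimal transport cost $W_c$, $\hat{P}$ the empirical measure, $\rho$ the radius, $\gamma$ the dual multiplier, and $L$ a Lipschitz constant of $\ell$ with respect to the metric defining $W_1$.

```latex
\sup_{Q \,:\, W_c(Q,\hat{P}) \le \rho} \mathbb{E}_{Q}\big[\ell(Z)\big]
\;=\;
\inf_{\gamma \ge 0} \Big\{ \gamma\rho
  + \mathbb{E}_{\hat{P}}\Big[ \sup_{z} \big( \ell(z) - \gamma\, c(z, Z) \big) \Big] \Big\}

\sup_{Q \,:\, W_1(Q,\hat{P}) \le \rho} \mathbb{E}_{Q}\big[\ell(Z)\big]
\;\le\;
\mathbb{E}_{\hat{P}}\big[\ell(Z)\big] + L\rho
```

The inner supremum over $z$ is taken separately for each sample $Z$, which is how an optimization over measures turns into a search over perturbed sample points.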

pith-pipeline@v0.9.0 · 5440 in / 1147 out tokens · 49000 ms · 2026-05-16T08:39:37.455251+00:00 · methodology
