Multivariate Time Series Data Imputation via Distributionally Robust Regularization
Pith reviewed 2026-05-16 08:39 UTC · model grok-4.3
The pith
A Wasserstein-based robust objective reduces overfitting in multivariate time series imputation caused by non-stationarity and systematic missingness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Distributionally Robust Regularized Imputer Objective jointly minimizes reconstruction error and the worst-case divergence between the imputer distribution and data distributions within a Wasserstein ambiguity set. A tractable upper-bound surrogate reduces the infinite-dimensional optimization over measures to an adversarial search over sample trajectories, solved by an alternating learning algorithm compatible with deep learning architectures.
What carries the argument
The DRIO objective, which augments standard reconstruction loss with a worst-case Wasserstein divergence term over an ambiguity set around the observed data.
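Written out, the objective quoted elsewhere on this page takes the form below. The reading of the symbols is partly an assumption on our side: R_θ is taken to be the reconstruction loss, \widehat{P}_N the empirical measure of the observed data, \widehat{P}_θ the distribution induced by the imputer, S_{ε,τ} a Sinkhorn-type divergence, α ∈ [0, 1] a trade-off weight, and W a Wasserstein distance whose order the page does not state.

```latex
\min_{\theta}\; \alpha\, R_{\theta}
\;+\; (1-\alpha) \sup_{Q \in B_{\rho}(\widehat{P}_N)} S_{\varepsilon,\tau}\bigl(Q, \widehat{P}_{\theta}\bigr),
\qquad
B_{\rho}(\widehat{P}_N) = \bigl\{\, Q : W(Q, \widehat{P}_N) \le \rho \,\bigr\}.
```

Setting α = 1 recovers plain reconstruction training, while ρ = 0 collapses the supremum to direct alignment with the observed empirical measure.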
If this is right
- DRIO yields more stable imputation accuracy across varied missingness scenarios in real multivariate time series.
- Completed series produced by DRIO support improved performance in downstream forecasting tasks.
- The method integrates directly with existing deep learning time-series models without changing their architecture.
- The surrogate bound converts the robust objective into a practical min-max training loop.
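The min-max training loop mentioned in the last bullet can be sketched on a toy problem. This is a minimal illustration and not the paper's algorithm. It assumes a linear imputer with a zero diagonal, a squared-error loss in place of the Sinkhorn divergence, and a Lagrangian penalty `gamma` in place of a hard Wasserstein radius. All names (`make_toy_data`, `masked_residual`, `adversarial_step`, `train`) are hypothetical.

```python
import numpy as np

def make_toy_data(n=256, d=4, seed=0):
    """Correlated, standardized features with ~30% of entries missing at random."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, 1))
    x = latent @ rng.normal(size=(1, d)) + 0.3 * rng.normal(size=(n, d))
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    mask = (rng.random((n, d)) > 0.3).astype(float)  # 1 = observed, 0 = missing
    return x, mask

def masked_residual(w_eff, z, mask):
    """Residual of a linear imputer that sees and is scored on observed entries only."""
    return mask * ((mask * z) @ w_eff - z)

def adversarial_step(w_eff, x0, mask, gamma, steps=5, lr=0.05):
    """Inner max: gradient ascent on ||residual||^2 - gamma * ||z - x0||^2 over z."""
    z = x0.copy()
    for _ in range(steps):
        r = masked_residual(w_eff, z, mask)
        grad_loss = 2.0 * (mask * (r @ w_eff.T) - r)
        z = z + lr * (grad_loss - 2.0 * gamma * (z - x0))
    return z

def train(x, mask, alpha=0.5, gamma=5.0, epochs=200, lr=0.05):
    n, d = x.shape
    offdiag = 1.0 - np.eye(d)   # forbid predicting a feature from itself
    w = np.zeros((d, d))
    x0 = mask * x               # zero-fill missing entries (features are zero-mean)
    history = []
    for _ in range(epochs):
        w_eff = w * offdiag
        r_obs = masked_residual(w_eff, x0, mask)
        z = adversarial_step(w_eff, x0, mask, gamma)   # inner maximization
        r_adv = masked_residual(w_eff, z, mask)
        grad_rec = 2.0 * x0.T @ r_obs / n
        grad_adv = 2.0 * (mask * z).T @ r_adv / n      # z held fixed for the outer step
        w -= lr * (alpha * grad_rec + (1.0 - alpha) * grad_adv) * offdiag
        history.append(float((r_obs ** 2).sum() / n))  # reconstruction loss per sample
    return w * offdiag, history

x, mask = make_toy_data()
w_hat, history = train(x, mask)
```

The adversary perturbs only observed entries, because every gradient term is multiplied by the mask. A larger `gamma` keeps the adversarial trajectories closer to the data, which plays the role of a smaller ambiguity radius.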
Where Pith is reading between the lines
- The same worst-case regularization idea could extend to other sequential data tasks where partial observations induce distribution shift.
- Synthetic benchmarks with explicitly constructed non-stationary shifts would isolate whether the Wasserstein set choice drives the gains.
- Combining DRIO with uncertainty-aware forecasting models might produce end-to-end pipelines that remain reliable under heavy missingness.
Load-bearing premise
The chosen Wasserstein ambiguity set must correctly describe the distribution mismatch created by non-stationarity and missing values.
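A small synthetic check illustrates what this premise asks of the radius. The sketch below is an assumption-laden toy, not an experiment from the paper: it builds a drifting 1-D series, removes entries either completely at random or preferentially when values are large, and measures how far the observed sample sits from the full sample in Wasserstein-1 distance. The helper `wasserstein_1d` is a hypothetical quantile-based approximation.

```python
import numpy as np

def wasserstein_1d(a, b, n_quantiles=1000):
    """Approximate 1-D Wasserstein-1 distance by comparing quantile functions."""
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return float(np.mean(np.abs(np.quantile(a, q) - np.quantile(b, q))))

rng = np.random.default_rng(0)
t = np.arange(5000)
full = 0.001 * t + rng.normal(size=t.size)          # upward drift = non-stationarity

# Value-dependent missingness: larger values are dropped more often.
p_miss = 1.0 / (1.0 + np.exp(-(full - 2.5)))
obs_mnar = full[rng.random(t.size) > p_miss]

# Same average missing rate, but independent of the values.
obs_mcar = full[rng.random(t.size) > p_miss.mean()]

shift_mnar = wasserstein_1d(obs_mnar, full)
shift_mcar = wasserstein_1d(obs_mcar, full)
print(f"W1(observed, full): MNAR={shift_mnar:.3f}, MCAR={shift_mcar:.3f}")
```

A ball centered on the observed sample contains the full-data distribution only if its radius is at least this distance, so value-dependent missingness forces a much larger radius than random missingness does.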
What would settle it
On a controlled dataset where the true complete distribution is known, if standard point-wise or alignment-based imputers achieve lower error than DRIO under the same missingness patterns, the claimed advantage does not hold.
Original abstract
Multivariate time series imputation is often compromised by mismatch between the observed and true data distributions, a bias induced by the combined effects of time-series non-stationarity and systematic missingness. Standard methods that encourage point-wise reconstruction or direct distributional alignment may overfit these biased observations. We propose the Distributionally Robust Regularized Imputer Objective (DRIO), which jointly minimizes reconstruction error and the worst-case divergence between the imputer distribution and data distributions within a Wasserstein ambiguity set. We derive a tractable upper-bound surrogate that reduces infinite-dimensional optimization over measures to adversarial search over sample trajectories, and develop an alternating learning algorithm compatible with modern deep learning backbones. Comprehensive experiments on diverse real-world datasets show that DRIO consistently provides robust imputation and suggests improved downstream forecasting under various missingness scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the Distributionally Robust Regularized Imputer Objective (DRIO) for multivariate time series imputation to address distribution mismatch induced by non-stationarity and systematic missingness. It jointly minimizes reconstruction error and worst-case divergence within a Wasserstein ambiguity set, derives a tractable upper-bound surrogate that reduces the infinite-dimensional problem to adversarial search over sample trajectories, and develops an alternating optimization algorithm compatible with deep learning models. Experiments on diverse real-world datasets are reported to show consistent robustness in imputation and improved downstream forecasting under various missingness scenarios.
Significance. If the central claims hold, the work offers a principled DRO-based approach to robust imputation that could mitigate overfitting to biased observations in non-stationary time series, with the tractable surrogate and deep-learning compatibility as notable practical strengths. This could influence methods for handling incomplete data in forecasting pipelines, provided the ambiguity-set construction and bound quality are validated.
major comments (2)
- [Derivation of surrogate] The derivation of the tractable upper-bound surrogate (as stated in the abstract) lacks explicit verification of bound tightness or empirical checks on approximation quality, which is load-bearing for the claimed robustness guarantees.
- [Ambiguity set construction] The Wasserstein ambiguity set is constructed around the observed empirical measure without an explicit correction for selection bias induced by the missingness mechanism in non-stationary series; this risks the true data-generating distribution lying outside the ball, undermining the min-max guarantee.
minor comments (2)
- [Experiments] Experimental results report gains without error bars, standard deviations, or multiple-run statistics, which would be needed to substantiate consistency claims.
- [Experiments] No ablation on the ambiguity-set radius is presented, leaving sensitivity to this key hyperparameter unexamined.
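The multiple-run statistics requested in the first minor comment can be sketched as follows. The imputer here is a deliberately simple stand-in (column-mean filling on synthetic random walks), chosen only so the example runs on its own. The function name `mae_one_run` and all settings are hypothetical and say nothing about the paper's numbers.

```python
import numpy as np

def mae_one_run(seed, miss_rate=0.3, n=500, d=5):
    """MAE on withheld entries for column-mean imputation, for one random seed."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d)).cumsum(axis=0)   # random walks: a simple non-stationary series
    hidden = rng.random((n, d)) < miss_rate      # True = entry withheld from the imputer
    col_means = np.nanmean(np.where(hidden, np.nan, x), axis=0)
    imputed = np.where(hidden, col_means, x)
    return float(np.abs(imputed - x)[hidden].mean())

scores = np.array([mae_one_run(seed) for seed in range(5)])
mean_mae = scores.mean()
std_mae = scores.std(ddof=1)                     # sample standard deviation across seeds
print(f"MAE over {scores.size} seeds: {mean_mae:.3f} ± {std_mae:.3f}")
```

The same loop, wrapped in an outer sweep over the ambiguity-set radius, would give the sensitivity curve asked for in the second minor comment.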
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify key aspects of our work. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
- Referee: [Derivation of surrogate] The derivation of the tractable upper-bound surrogate (as stated in the abstract) lacks explicit verification of bound tightness or empirical checks on approximation quality, which is load-bearing for the claimed robustness guarantees.
  Authors: We agree that the manuscript would benefit from explicit verification of the surrogate bound. In the revision we will add a dedicated subsection deriving conditions under which the upper bound is tight (under Lipschitz continuity of the loss and bounded support assumptions) and include new synthetic-data experiments that compare the surrogate objective value against a Monte-Carlo estimate of the true min-max objective, thereby quantifying approximation error across different missingness rates.
  Revision: yes
- Referee: [Ambiguity set construction] The Wasserstein ambiguity set is constructed around the observed empirical measure without an explicit correction for selection bias induced by the missingness mechanism in non-stationary series; this risks the true data-generating distribution lying outside the ball, undermining the min-max guarantee.
  Authors: The concern is well-taken. Our current construction follows the standard empirical-measure centering used in most DRO imputation literature, but it does not explicitly adjust for selection bias. In the revision we will (i) add a paragraph in Section 3.2 acknowledging this limitation and (ii) propose a simple re-weighting scheme based on an estimated missingness probability to recenter the empirical measure. We will also report additional experiments that vary the missingness mechanism (MCAR vs. MNAR) to illustrate the practical effect of this bias.
  Revision: partial
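One way to read the proposed re-weighting is as inverse-probability weighting of the observed samples. The sketch below is an assumption about what such a scheme could look like, not the authors' method. It also assumes that missingness depends only on an always-observed covariate (here a hypothetical `bucket` variable), which is the setting where the observation probability can be estimated from data; under MNAR that estimate is generally not available.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
bucket = rng.integers(0, 4, size=n)                    # always-observed covariate
y = bucket + rng.normal(scale=0.5, size=n)             # value depends on the bucket
p_obs_true = np.array([0.9, 0.7, 0.4, 0.2])[bucket]    # higher buckets are observed less often
observed = rng.random(n) < p_obs_true

# Estimate P(observed | bucket) from the data, then weight observed points by its inverse.
p_obs_hat = np.array([observed[bucket == b].mean() for b in range(4)])
weights = 1.0 / p_obs_hat[bucket[observed]]
weights /= weights.sum()                               # normalized weights define the recentered measure

naive_mean = float(y[observed].mean())
reweighted_mean = float(np.sum(weights * y[observed]))
true_mean = float(y.mean())
print(f"true={true_mean:.3f} naive={naive_mean:.3f} reweighted={reweighted_mean:.3f}")
```

The normalized weights define a re-weighted empirical measure that could serve as the center of the ambiguity ball in place of the uniform one.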
Circularity Check
No significant circularity; the derivation is a self-contained mathematical reduction.
The paper defines DRIO as a joint min-max objective over reconstruction error and worst-case Wasserstein divergence, then derives a tractable upper-bound surrogate that converts the infinite-dimensional problem into adversarial search over sample trajectories. This reduction is presented as a direct consequence of the Wasserstein ball construction and duality arguments, without any fitted parameters being renamed as predictions or any load-bearing step collapsing to a self-citation. Experiments are reported separately on real datasets and do not feed back into the objective definition. No self-definitional loops, uniqueness theorems imported from prior author work, or ansatz smuggling are present in the derivation chain.
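For orientation, the kind of duality argument referred to here is a standard result in Wasserstein distributionally robust optimization. It is stated below for the simpler case of an expected loss ℓ, which is an assumption made for illustration: the paper's inner term is a Sinkhorn-type divergence rather than an expectation, so its surrogate is described only as an upper bound. Here c is the transport cost defining W_c, z_1, …, z_N are the observed samples, and γ is the dual variable for the radius constraint.

```latex
\sup_{Q \,:\, W_c(Q, \widehat{P}_N) \le \rho} \mathbb{E}_{Z \sim Q}\bigl[\ell(Z)\bigr]
\;=\;
\inf_{\gamma \ge 0} \Bigl\{ \gamma \rho
+ \frac{1}{N} \sum_{i=1}^{N} \sup_{z} \bigl[\, \ell(z) - \gamma\, c(z, z_i) \,\bigr] \Bigr\}
```

The identity holds under mild regularity conditions on ℓ and c. The right-hand side replaces a search over measures with one adversarial point z per sample, which is the sense in which the problem becomes a search over sample trajectories.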
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Wasserstein distance admits a tractable dual or adversarial representation that yields a finite-dimensional surrogate.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: DRIO objective: min_θ α R_θ + (1−α) sup_{Q ∈ B_ρ(P̂_N)} S_{ε,τ}(Q, P̂_θ), with Wasserstein ball and Sinkhorn surrogate.
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Ambiguity set B_ρ(P̂_N) and adversarial trajectories Z.
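The Sinkhorn term in the quoted objective can be illustrated with a small NumPy sketch. It computes a debiased Sinkhorn divergence between two sets of flattened trajectories with uniform weights, using log-domain updates. Several things here are assumptions: the cost 0.5 * squared Euclidean distance, the balanced formulation (the τ subscript may denote a marginal relaxation, which is omitted), and the function names, none of which come from the paper.

```python
import numpy as np

def _logsumexp(m, axis):
    """Numerically stable log-sum-exp along one axis."""
    mx = m.max(axis=axis, keepdims=True)
    return np.squeeze(mx, axis=axis) + np.log(np.exp(m - mx).sum(axis=axis))

def entropic_ot(x, y, eps=1.0, iters=300):
    """Dual value of entropy-regularized OT between uniform empirical measures."""
    n, m = len(x), len(y)
    cost = 0.5 * ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Alternating updates of the dual potentials (Sinkhorn iterations).
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, axis=0)
    return float(f.mean() + g.mean())

def sinkhorn_divergence(x, y, eps=1.0, iters=300):
    """Debiased divergence: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    return (entropic_ot(x, y, eps, iters)
            - 0.5 * entropic_ot(x, x, eps, iters)
            - 0.5 * entropic_ot(y, y, eps, iters))

rng = np.random.default_rng(0)
data = rng.normal(size=(64, 6))       # 64 trajectories, each flattened to 6 values
shifted = data + 1.0                  # same shape, translated by a constant vector
div_same = sinkhorn_divergence(data, data)
div_shift = sinkhorn_divergence(data, shifted)
print(f"S(data, data)={div_same:.2e}, S(data, shifted)={div_shift:.3f}")
```

With this cost, a pure translation by a vector v gives a divergence of 0.5 * ||v||^2 once the iterations have converged, so the printed value for the shifted set should be close to that figure, up to the remaining iteration error.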
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.