tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data
Pith reviewed 2026-05-14 22:25 UTC · model grok-4.3
The pith
tBayes-MICE extends MICE with Bayesian MCMC sampling and time-lagged features to reduce imputation errors in time-series data while quantifying uncertainty.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that tBayes-MICE, by embedding Bayesian inference via MCMC sampling into the MICE procedure and adding time-lagged features, achieves lower imputation errors than the baseline methods on the AirQuality and PhysioNet datasets while quantifying the uncertainty of the imputations.
What carries the argument
Bayesian MICE with MCMC sampling and time-lagged features, which performs posterior sampling over conditional imputation models to handle temporal dependencies in missing data.
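As a rough illustration of the time-lagged feature idea, the sketch below builds lagged copies of a single series so that a conditional imputation model could condition on earlier time steps. The function name `add_lag_features`, the choice of lags, the toy data, and the use of `None` for missing or unavailable values are all assumptions made for illustration; the paper's exact construction is not described on this page.

```python
from typing import List, Optional

Value = Optional[float]  # None marks a missing or unavailable value


def add_lag_features(series: List[Value], lags: List[int]) -> List[List[Value]]:
    """For each time step t, return [series[t - lag] for lag in lags].

    Lags are assumed to be positive integers. Entries that would reach
    before the start of the series are returned as None.
    """
    rows: List[List[Value]] = []
    for t in range(len(series)):
        rows.append([series[t - lag] if t - lag >= 0 else None for lag in lags])
    return rows


# Hypothetical toy series with one gap at t = 2.
series: List[Value] = [1.0, 2.0, None, 4.0, 5.0]
lagged = add_lag_features(series, lags=[1, 2])

for t, row in enumerate(lagged):
    print(t, series[t], row)
```

Note that a lagged predictor can itself be missing (here the lag-1 value at t = 3), so a chained procedure needs some initial fill before the conditional models can be fitted.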
If this is right
- Imputation errors are reduced relative to baseline methods for every variable in the evaluated datasets.
- Uncertainty in the imputation process is quantified through posterior distributions, providing a more accurate assessment of error.
- The MALA sampler mixes better than RWM across most variables while achieving comparable accuracy.
- The approach offers a practical method for time-series imputation in environmental and clinical applications.
Where Pith is reading between the lines
- This could improve the reliability of time-series predictions in fields where missing data is common, by passing the imputed values together with their uncertainty into forecasting models.
- Future work might test the method on datasets with different missing patterns or higher dimensions.
- The Bayesian framework could be combined with other imputation techniques like deep learning for even better performance.
Load-bearing premise
The time-lagged features and MCMC sampling on the conditional models will accurately reflect the temporal dependencies without introducing bias or failing to converge properly.
What would settle it
A direct comparison showing that tBayes-MICE imputation errors exceed those of standard MICE on the same AirQuality and PhysioNet test data, or that the uncertainty estimates do not align with actual imputation inaccuracies.
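One way to check whether uncertainty estimates align with actual imputation errors is empirical interval coverage on artificially masked values: if 95% posterior intervals are well calibrated, roughly 95% of the held-out true values should fall inside them. The sketch below is a minimal version of that check, not the paper's evaluation protocol; the helper names `quantile` and `interval_coverage` and the toy draws are hypothetical.

```python
from typing import List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    return sorted_vals[lower] + (pos - lower) * (sorted_vals[upper] - sorted_vals[lower])


def interval_coverage(
    draws: List[List[float]], truth: List[float], level: float = 0.95
) -> float:
    """Fraction of true values lying inside the central posterior interval.

    draws[i] holds the posterior draws for the i-th masked cell and
    truth[i] is the value that was hidden from the imputer.
    """
    tail = (1.0 - level) / 2.0
    hits = 0
    for cell_draws, true_value in zip(draws, truth):
        ordered = sorted(cell_draws)
        if quantile(ordered, tail) <= true_value <= quantile(ordered, 1.0 - tail):
            hits += 1
    return hits / len(truth)


# Hypothetical draws for two masked cells; the second true value lies far outside.
draws = [[float(v) for v in range(101)], [float(v) for v in range(101)]]
truth = [50.0, 200.0]
coverage = interval_coverage(draws, truth)
print(coverage)
```

Coverage far below the nominal level would suggest intervals that are too narrow; coverage far above it would suggest intervals that are too wide.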
Original abstract
Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing values through "fully conditional specification". We extend MICE using the Bayesian framework (tBayes-MICE), utilising Bayesian inference to impute missing values via Markov Chain Monte Carlo (MCMC) sampling to account for uncertainty in MICE model parameters and imputed values. We also include temporally informed initialisation and time-lagged features in the model to respect the sequential nature of time-series data. We evaluate the tBayes-MICE method using two real-world datasets (AirQuality and PhysioNet), and using both the Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA) samplers. Our results demonstrate that tBayes-MICE reduces imputation errors relative to the baseline methods over all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also found that MALA mixed better than RWM across most variables, achieving comparable accuracy while providing more consistent posterior exploration. Overall, these findings suggest that the tBayes-MICE framework represents a practical and efficient approach to time-series imputation, balancing increased accuracy with meaningful quantification of uncertainty in various environmental and clinical settings.
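To make the difference between the two samplers concrete, here is a minimal sketch of one RWM step and one MALA step on a one-dimensional toy target. The standard normal `log_post` stands in for a conditional posterior and is an assumption, as are the step size, the seeds, and all function names; the paper's actual models and tuning are not shown on this page. RWM proposes a symmetric random perturbation, whereas MALA shifts the proposal along the gradient of the log-posterior and corrects for the resulting asymmetry in the acceptance ratio.

```python
import math
import random
from typing import Callable, List, Tuple


def log_post(theta: float) -> float:
    """Toy log-density (standard normal), a stand-in for a conditional posterior."""
    return -0.5 * theta * theta


def grad_log_post(theta: float) -> float:
    return -theta


def rwm_step(theta: float, step: float, rng: random.Random) -> Tuple[float, bool]:
    """Random Walk Metropolis: symmetric Gaussian proposal."""
    proposal = theta + step * rng.gauss(0.0, 1.0)
    log_alpha = log_post(proposal) - log_post(theta)
    if math.log(1.0 - rng.random()) < log_alpha:  # 1 - u lies in (0, 1]
        return proposal, True
    return theta, False


def mala_step(theta: float, step: float, rng: random.Random) -> Tuple[float, bool]:
    """MALA: gradient-informed proposal with a Metropolis-Hastings correction."""

    def drift(x: float) -> float:
        return x + 0.5 * step**2 * grad_log_post(x)

    def log_q(to: float, frm: float) -> float:
        # Log proposal density q(to | frm), up to an additive constant.
        return -((to - drift(frm)) ** 2) / (2.0 * step**2)

    proposal = drift(theta) + step * rng.gauss(0.0, 1.0)
    log_alpha = (log_post(proposal) + log_q(theta, proposal)
                 - log_post(theta) - log_q(proposal, theta))
    if math.log(1.0 - rng.random()) < log_alpha:
        return proposal, True
    return theta, False


def run_chain(
    step_fn: Callable[[float, float, random.Random], Tuple[float, bool]],
    n_steps: int,
    step: float,
    seed: int,
) -> Tuple[List[float], float]:
    """Run a chain from theta = 0 and return the samples and acceptance rate."""
    rng = random.Random(seed)
    theta, samples, accepted = 0.0, [], 0
    for _ in range(n_steps):
        theta, ok = step_fn(theta, step, rng)
        samples.append(theta)
        accepted += ok
    return samples, accepted / n_steps


rwm_samples, rwm_acc = run_chain(rwm_step, 5000, step=1.0, seed=0)
mala_samples, mala_acc = run_chain(mala_step, 5000, step=1.0, seed=1)
print("RWM acceptance:", rwm_acc)
print("MALA acceptance:", mala_acc)
```

On a target this simple both samplers behave well; any mixing advantage of MALA would be expected to matter more for higher-dimensional or strongly correlated posteriors, which this toy example does not attempt to reproduce.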
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes tBayes-MICE, a Bayesian extension of Multiple Imputation by Chained Equations (MICE) for time-series data that uses MCMC sampling (RWM and MALA) on conditional models augmented with time-lagged features and temporally informed initialization to account for parameter and imputation uncertainty. It evaluates the method on the AirQuality and PhysioNet datasets and claims reduced imputation errors relative to baselines across all variables together with improved uncertainty quantification.
Significance. If the MCMC sampling converges reliably and the empirical reductions hold, the framework could provide a practical Bayesian alternative to standard MICE for sequential data in healthcare and environmental monitoring, offering both point imputations and posterior-based uncertainty estimates that are currently missing from many chained-equation approaches.
major comments (3)
- [Abstract] Abstract and evaluation sections: the central claim that tBayes-MICE 'reduces imputation errors relative to the baseline methods over all variables' is stated without any reported numerical values, error bars, baseline definitions, or per-variable metrics, so the support for the primary result cannot be assessed from the provided evidence.
- [MCMC Sampling] MCMC implementation and results: no R-hat statistics, effective sample sizes, trace plots, or autocorrelation times are supplied for the conditional posteriors on either dataset, yet the uncertainty quantification and error-reduction claims rest entirely on the MCMC samples faithfully representing the target posterior.
- [Method] Method description: the assumption that time-lagged features plus MCMC on the fully conditional specification will capture temporal dependencies without bias or convergence failure is not accompanied by any diagnostic checks or sensitivity analysis, leaving open the possibility that reported improvements are artifacts of poor mixing.
minor comments (2)
- [Abstract] The abstract refers to 'over all variables' without listing the variables or providing variable-wise breakdowns, which reduces clarity.
- [Method] Notation for the Bayesian conditional models and the precise form of the time-lagged feature augmentation could be made more explicit to aid reproducibility.
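The per-variable metrics requested in the first major comment could be computed along the following lines: errors are evaluated only on entries that were artificially masked, so the true values are known. This is a sketch under stated assumptions, not the authors' evaluation code; the function name `masked_errors`, the variable names `var_a` and `var_b`, and all numbers are hypothetical.

```python
import math
from typing import Dict, List


def masked_errors(
    truth: Dict[str, List[float]],
    imputed: Dict[str, List[float]],
    mask: Dict[str, List[bool]],
) -> Dict[str, Dict[str, float]]:
    """Per-variable RMSE and MAE over entries where mask is True.

    mask[var][t] is True where the value was hidden from the imputer.
    Variables with no masked entries are skipped.
    """
    results: Dict[str, Dict[str, float]] = {}
    for var in truth:
        errors = [
            imputed[var][t] - truth[var][t]
            for t in range(len(truth[var]))
            if mask[var][t]
        ]
        if not errors:
            continue
        results[var] = {
            "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
            "mae": sum(abs(e) for e in errors) / len(errors),
        }
    return results


# Hypothetical toy data for two variables.
truth = {"var_a": [1.0, 2.0, 3.0, 4.0], "var_b": [10.0, 20.0, 30.0, 40.0]}
imputed = {"var_a": [1.0, 2.5, 3.0, 3.0], "var_b": [10.0, 20.0, 33.0, 40.0]}
mask = {
    "var_a": [False, True, False, True],
    "var_b": [False, False, True, False],
}

metrics = masked_errors(truth, imputed, mask)
for var, scores in metrics.items():
    print(var, scores)
```

Repeating this over several random masks and reporting the mean and standard deviation per variable would supply the error bars the report asks for.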
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review. We have addressed each major comment below and will revise the manuscript to strengthen the presentation of results, add necessary diagnostics, and provide additional analyses as outlined.
Point-by-point responses
- Referee: [Abstract] Abstract and evaluation sections: the central claim that tBayes-MICE 'reduces imputation errors relative to the baseline methods over all variables' is stated without any reported numerical values, error bars, baseline definitions, or per-variable metrics, so the support for the primary result cannot be assessed from the provided evidence.
  Authors: We agree that the abstract and evaluation sections would be strengthened by including specific quantitative results. In the revised manuscript, we will report numerical imputation error values (e.g., mean RMSE and MAE) for tBayes-MICE versus the baselines, including standard deviations to serve as error bars. We will explicitly define the baseline methods and add a table or figure with per-variable metrics for both datasets to allow readers to fully assess the primary claims.
  Revision: yes
- Referee: [MCMC Sampling] MCMC implementation and results: no R-hat statistics, effective sample sizes, trace plots, or autocorrelation times are supplied for the conditional posteriors on either dataset, yet the uncertainty quantification and error-reduction claims rest entirely on the MCMC samples faithfully representing the target posterior.
  Authors: We acknowledge this gap in the reporting of MCMC diagnostics. We will add R-hat statistics, effective sample sizes, and autocorrelation times for the conditional posteriors on both the AirQuality and PhysioNet datasets. Representative trace plots and summary convergence diagnostics will be included in the main text or supplementary material to confirm that the samples faithfully represent the target posteriors for both RWM and MALA samplers.
  Revision: yes
- Referee: [Method] Method description: the assumption that time-lagged features plus MCMC on the fully conditional specification will capture temporal dependencies without bias or convergence failure is not accompanied by any diagnostic checks or sensitivity analysis, leaving open the possibility that reported improvements are artifacts of poor mixing.
  Authors: We will revise the method section to include explicit diagnostic checks and sensitivity analyses. This will encompass the convergence metrics noted above, plus sensitivity tests on the number of time lags and initialization strategies. These additions will demonstrate that temporal dependencies are captured reliably and that improvements are not artifacts of poor mixing or convergence issues.
  Revision: yes
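For readers unfamiliar with the R-hat diagnostic discussed above, the sketch below computes the classic Gelman-Rubin statistic from several chains for a single scalar parameter. It compares between-chain and within-chain variance; values close to 1 are consistent with convergence, while values well above 1 indicate that the chains disagree. This is the basic, non-split version, and modern tools typically use a rank-normalised split variant. The function name and the synthetic chains are assumptions for illustration.

```python
import math
import random
import statistics
from typing import List


def gelman_rubin_rhat(chains: List[List[float]]) -> float:
    """Classic R-hat for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain sample variance.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


rng = random.Random(0)

# Four synthetic chains drawn from the same distribution (hypothetical).
good_chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four synthetic chains stuck around different values (hypothetical).
bad_chains = [[rng.gauss(offset, 1.0) for _ in range(1000)]
              for offset in (0.0, 3.0, -3.0, 6.0)]

rhat_good = gelman_rubin_rhat(good_chains)
rhat_bad = gelman_rubin_rhat(bad_chains)
print("R-hat, agreeing chains:", rhat_good)
print("R-hat, disagreeing chains:", rhat_bad)
```

In a chained-equations setting, such a diagnostic would presumably be reported per conditional model and per parameter, alongside effective sample sizes and trace plots.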
Circularity Check
No circularity in derivation chain
The paper extends standard MICE with Bayesian MCMC sampling (RWM/MALA) and time-lagged features for time-series imputation. No equations, predictions, or central claims reduce by construction to fitted inputs or self-citations; the method is presented as a direct application of established fully conditional specification and MCMC techniques on external datasets (AirQuality, PhysioNet). No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation. The approach remains self-contained against external benchmarks.