HIMCE: High-dimensional multiple imputation via covariance-mode updating for neuroimaging and spatiotemporal blocks

Hsin-Hsiung Huang; Stef van Buuren

arXiv: 2605.04440 · v1 · submitted 2026-05-06 · stat.ME · stat.CO

Pith reviewed 2026-05-08 17:42 UTC · model grok-4.3

keywords multiple imputation · high-dimensional data · neuroimaging · missing data · covariance estimation · MICE · data augmentation · spatiotemporal blocks

The pith

HIMCE approximates covariance uncertainty via mode updating, imputing high-dimensional blocks with lower posterior-mean error than HIMA or screened MICE, at HIMA-like speed and under half the MICE runtime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes HIMCE as a hybrid multiple-imputation method designed for continuous blocks with structured missingness, such as those arising in neuroimaging or spatiotemporal data. It keeps the exact Gaussian conditional distributions used by full multivariate normal imputation but replaces costly repeated covariance sampling with updates at the mode, optionally adding a scalar bridge for better uncertainty propagation. In small blocks the method can switch to an exact inverse-Wishart refresh. A sympathetic reader cares because common imputation tools either become unstable or prohibitively slow when dimension and correlation are both high, yet downstream analyses still need properly propagated uncertainty. The paper proves fixed-dimensional posterior consistency and shows, in a spatial benchmark, lower posterior-mean error than HIMA or screened MICE, HIMA-like speed, and runtime under half that of MICE.

Core claim

HIMCE is a hybrid multiple-imputation procedure for continuous blocks that preserves the Gaussian conditional imputation law and propagates mean-parameter uncertainty through stochastic coefficient or local-ridge draws. In high-dimensional blocks it approximates covariance uncertainty through covariance-mode updating, optionally with a scalar bridge; in small blocks it restores exact covariance uncertainty through a conditional inverse-Wishart refresh. The authors record the exact Bayesian reference sampler, prove fixed-dimensional posterior consistency, and establish asymptotic equivalence of mode plug-in prediction in total variation.
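The Gaussian conditional imputation law mentioned above can be illustrated with a minimal sketch. It assumes a mean vector `mu` and covariance `sigma` are already available and that missing entries are coded as NaN; the function name `impute_row_gaussian` is hypothetical and not taken from the paper.

```python
import numpy as np

def impute_row_gaussian(x, mu, sigma, rng):
    """Draw the NaN entries of x from N(mu, sigma) conditioned on the observed entries."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    miss = np.isnan(x)
    out = x.copy()
    if not miss.any():
        return out  # nothing to impute
    obs = ~miss
    if not obs.any():
        # No observed entries: fall back to the marginal law.
        out[miss] = rng.multivariate_normal(mu, sigma)
        return out
    s_oo = sigma[np.ix_(obs, obs)]
    s_mo = sigma[np.ix_(miss, obs)]
    s_mm = sigma[np.ix_(miss, miss)]
    # Regression coefficients of missing on observed: S_mo S_oo^{-1}.
    coef = np.linalg.solve(s_oo, s_mo.T).T
    cond_mean = mu[miss] + coef @ (x[obs] - mu[obs])
    cond_cov = s_mm - coef @ s_mo.T
    cond_cov = (cond_cov + cond_cov.T) / 2  # symmetrize against round-off
    out[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return out
```

Repeating the draw with fresh values of `mu` and `sigma` is what turns this single conditional draw into one step of a multiple-imputation sampler; how those parameters are refreshed is where HIMCE and exact data augmentation differ.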

What carries the argument

Covariance-mode updating, which replaces full posterior sampling of the covariance matrix with direct updates at its mode to approximate uncertainty without repeated matrix factorizations.
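To make the contrast concrete, the sketch below compares a mode plug-in with a full posterior draw. It assumes a conjugate setting in which the conditional posterior of the covariance is inverse-Wishart IW(psi, nu), whose mode is psi / (nu + p + 1). The paper's actual update rule and the scalar bridge are not specified in this summary, so the function names and the `exact_below` threshold are hypothetical.

```python
import numpy as np

def inv_wishart_mode(psi, nu):
    """Mode of IW(psi, nu): a deterministic rescaling, no factorization needed."""
    p = psi.shape[0]
    return psi / (nu + p + 1)

def inv_wishart_draw(psi, nu, rng):
    """One draw from IW(psi, nu) for integer nu >= p."""
    p = psi.shape[0]
    # If W ~ Wishart(psi^{-1}, nu), then W^{-1} ~ IW(psi, nu).
    chol = np.linalg.cholesky(np.linalg.inv(psi))
    z = rng.standard_normal((nu, p)) @ chol.T  # rows ~ N(0, psi^{-1})
    w = z.T @ z
    s = np.linalg.inv(w)
    return (s + s.T) / 2  # symmetrize against round-off

def refresh_covariance(psi, nu, rng, exact_below=10):
    """Exact draw in small blocks, mode plug-in otherwise (threshold is an assumption)."""
    if psi.shape[0] < exact_below:
        return inv_wishart_draw(psi, nu, rng)
    return inv_wishart_mode(psi, nu)
```

The draw requires an inversion and a Cholesky factorization per iteration, while the mode is a scalar rescaling of a matrix that is already in hand, which is the cost difference the summary points to.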

If this is right

In primary spatial benchmarks HIMCE reduces posterior-mean error relative to HIMA and screened MICE.
Runtime is comparable to HIMA and stays below half the runtime of MICE.
Interval coverage improves over HIMA although MICE remains better calibrated.
Fixed-dimensional posterior consistency and total-variation equivalence of the mode-plug-in predictor hold.
Randomized rank-cell PIT, PIT-consistent empirical coverage, and marginal overlays supply practical diagnostics.
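The PIT diagnostics in the last item can be sketched as follows. This uses a common randomized rank-based PIT and reads coverage off the PIT values; it may differ in detail from the paper's rank-cell construction, and both function names are hypothetical.

```python
import random

def randomized_rank_pit(y_true, draws, rng):
    """Randomized rank PIT of a held-out value among M imputed draws.

    If y_true and the draws are exchangeable, the result is uniform on (0, 1).
    """
    below = sum(d < y_true for d in draws)
    ties = sum(d == y_true for d in draws)
    # Spread the rank cell uniformly so the PIT is continuous even with ties.
    return (below + rng.random() * (ties + 1)) / (len(draws) + 1)

def pit_coverage(pit_values, level=0.9):
    """Share of PIT values inside the central band of the given nominal level."""
    lo = (1 - level) / 2
    hi = 1 - lo
    return sum(lo <= u <= hi for u in pit_values) / len(pit_values)
```

Under this reading, a well-calibrated imputer gives a roughly flat PIT histogram and a `pit_coverage` close to the nominal level; a U-shaped histogram with low coverage indicates intervals that are too narrow.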

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same mode-updating device could be inserted into other imputation pipelines that already rely on a multivariate normal working model.
Because the method separates the conditional imputation law from the covariance refresh, it offers a modular template for hybrid imputers that mix exact and approximate steps.
The PIT-based diagnostics could be applied directly to any chained-equation or data-augmentation procedure to compare calibration across methods.
Asymptotic equivalence in total variation suggests that, for fixed dimension and growing sample size, HIMCE predictions converge to those of the exact sampler.
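One way to write the kind of statement the last item refers to, in notation assumed here rather than taken from the paper: let $\Pi_n^{\mathrm{exact}}$ denote the predictive law of the missing entries under the exact sampler and $\Pi_n^{\mathrm{mode}}$ the law under the covariance-mode plug-in, both given $n$ observed rows of fixed dimension $p$. The mode of convergence (in probability or almost surely) is not stated in this summary.

```latex
d_{\mathrm{TV}}\bigl(\Pi_n^{\mathrm{mode}}, \Pi_n^{\mathrm{exact}}\bigr)
  = \sup_{A} \bigl| \Pi_n^{\mathrm{mode}}(A) - \Pi_n^{\mathrm{exact}}(A) \bigr|
  \;\longrightarrow\; 0
  \quad \text{as } n \to \infty,\ p \text{ fixed}.
```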

Load-bearing premise

The multivariate normal working model supplies a coherent posterior predictive target and the covariance-mode updating approximates the full covariance uncertainty closely enough to avoid substantial bias.

What would settle it

A simulation study on high-dimensional MVN data with known parameters in which HIMCE produces higher posterior-mean imputation error or worse interval calibration than exact MVN data augmentation would falsify the central performance claims.
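A minimal harness for such a study might look like the sketch below. It hides known entries, imputes them several times, and reports posterior-mean RMSE and interval coverage. The `imputer` callable and the `marginal_normal_imputer` stand-in are hypothetical placeholders; neither is HIMCE nor exact MVN data augmentation.

```python
import numpy as np

def evaluate_imputer(x_full, mask, imputer, m=20, level=0.9, seed=0):
    """Score an imputer on pseudo-missing entries.

    x_full: complete (n, p) array; mask: boolean array, True where a value is hidden;
    imputer(x_nan, rng) must return a completed (n, p) array.
    """
    rng = np.random.default_rng(seed)
    x_nan = np.where(mask, np.nan, x_full)
    draws = np.stack([imputer(x_nan, rng) for _ in range(m)])  # (m, n, p)
    truth = x_full[mask]
    held = draws[:, mask]  # (m, number of hidden entries)
    post_mean = held.mean(axis=0)
    rmse = float(np.sqrt(np.mean((post_mean - truth) ** 2)))
    alpha = (1 - level) / 2
    lo, hi = np.quantile(held, [alpha, 1 - alpha], axis=0)
    coverage = float(np.mean((truth >= lo) & (truth <= hi)))
    return rmse, coverage

def marginal_normal_imputer(x_nan, rng):
    """Stand-in imputer: independent normal draws per column (ignores correlation)."""
    out = x_nan.copy()
    mu = np.nanmean(x_nan, axis=0)
    sd = np.nanstd(x_nan, axis=0)
    miss = np.isnan(x_nan)
    noise = rng.standard_normal(x_nan.shape) * sd + mu
    out[miss] = noise[miss]
    return out
```

Running the same harness on two imputers with identical `x_full`, `mask`, and `seed` gives a like-for-like comparison of the two quantities the falsification test turns on.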

Figures

Figures reproduced from arXiv: 2605.04440 by Hsin-Hsiung Huang, Stef van Buuren.

**Figure 1.** Distributional comparisons under pseudo-missing masking in the spatially correlated …

Original abstract

High-dimensional neuroimaging and spatiotemporal blocks often contain structured missingness from acquisition artifacts, preprocessing failures, and sensor dropout. Multiple imputation propagates uncertainty, but fully conditional specification methods such as multivariate imputation by chained equations (MICE) can be slow or unstable when block dimension is large and correlations are strong. A multivariate normal (MVN) working model provides a coherent posterior predictive target and an exact data augmentation sampler, but repeated covariance sampling and matrix factorizations become costly in large dimensions. We propose High-dimensional Imputation via covariance Mode and Chained Equations (HIMCE), a hybrid multiple-imputation procedure for continuous blocks. Relative to exact MVN data augmentation, HIMCE preserves the Gaussian conditional imputation law and propagates mean-parameter uncertainty through stochastic coefficient or local-ridge draws. In high-dimensional blocks, it approximates covariance uncertainty through covariance-mode updating, optionally with a scalar bridge; in small blocks, it can restore exact covariance uncertainty through a conditional inverse-Wishart refresh. We record the exact Bayesian reference sampler and prove fixed-dimensional posterior consistency and asymptotic equivalence of mode plug-in prediction in total variation. We also develop diagnostics based on randomized rank-cell probability integral transform (PIT), PIT-consistent empirical coverage, and marginal distribution overlays. In the primary spatial benchmark, HIMCE improves posterior-mean error relative to HIMA and screened MICE, runs at HIMA-like speed and below half the MICE runtime, and improves interval coverage over HIMA, although MICE remains better calibrated. A repeated low-dimensional NHANES illustration shows improved coverage with competitive point prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes HIMCE, a hybrid multiple-imputation method for high-dimensional continuous blocks with structured missingness in neuroimaging and spatiotemporal data. It preserves the Gaussian conditional imputation law from an MVN working model, propagates mean-parameter uncertainty via stochastic draws, and approximates covariance uncertainty through covariance-mode updating (optionally with a scalar bridge) in large blocks while allowing exact inverse-Wishart refresh in small blocks. The authors record an exact Bayesian reference sampler, prove fixed-dimensional posterior consistency and asymptotic equivalence of mode plug-in prediction in total variation, introduce PIT-based diagnostics (randomized rank-cell PIT, empirical coverage, marginal overlays), and report benchmark results showing reduced posterior-mean error and runtime versus HIMA and screened MICE, plus improved coverage over HIMA (though MICE is better calibrated) in a primary spatial benchmark, with a repeated low-dimensional NHANES illustration.

Significance. If the covariance-mode approximation controls bias adequately, HIMCE would supply a practical, faster alternative to full MVN data augmentation and MICE for high-dimensional blocks, supported by explicit consistency proofs (fixed dimension), new PIT diagnostics, and concrete benchmark gains in point estimation and speed. The work credits the exact reference sampler and develops falsifiable diagnostics that could aid reproducibility in neuroimaging imputation.

major comments (1)

[Abstract] The manuscript states proofs of fixed-dimensional posterior consistency and asymptotic equivalence of mode plug-in prediction in total variation, yet the central empirical claims and motivating regime concern high-dimensional blocks (where dimension may grow with sample size). No extension, rate, or bound is supplied showing that the total-variation equivalence or the bias introduced by covariance-mode updating remains controlled when p grows with n, which is the regime in which the method is benchmarked and motivated.

minor comments (1)

[Abstract] The description of the optional scalar bridge and its effect on the approximation could be expanded with a brief equation or pseudocode to clarify when it is activated versus the full mode update.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address the single major comment below.

Referee: [Abstract] The manuscript states proofs of fixed-dimensional posterior consistency and asymptotic equivalence of mode plug-in prediction in total variation, yet the central empirical claims and motivating regime concern high-dimensional blocks (where dimension may grow with sample size). No extension, rate, or bound is supplied showing that the total-variation equivalence or the bias introduced by covariance-mode updating remains controlled when p grows with n, which is the regime in which the method is benchmarked and motivated.

Authors: We agree that the theoretical results are derived under fixed dimension p, as stated in the manuscript. The covariance-mode updating is introduced precisely to enable scalable approximation of covariance uncertainty in the high-dimensional regime where exact inverse-Wishart sampling becomes computationally infeasible; the procedure preserves the exact conditional Gaussian imputation law from the MVN working model while propagating mean-parameter uncertainty via stochastic draws. No rates or bounds are supplied for the growing-p case, and extending the total-variation equivalence result to p = o(n) or similar regimes would require substantial additional technical work that lies outside the present scope. The high-dimensional performance claims rest on the design of the approximation together with the reported benchmark evidence (reduced posterior-mean error and competitive coverage relative to HIMA in the primary spatial example). We will revise the abstract to make the fixed-dimensional scope of the consistency and equivalence statements explicit while retaining the description of the empirical high-dimensional results.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained

The paper presents HIMCE as a hybrid procedure that preserves the Gaussian conditional law while approximating covariance uncertainty via mode updating (with optional scalar bridge) in large blocks and exact inverse-Wishart refresh in small blocks. It separately records an exact Bayesian reference sampler and states proofs of fixed-dimensional posterior consistency plus total-variation equivalence of mode plug-in prediction. These elements are introduced as distinct from the approximation itself; no equation or claim reduces a prediction, consistency result, or empirical performance metric to a fitted parameter or self-referential definition by construction. External benchmarks against HIMA and MICE further anchor the claims without load-bearing self-citation loops.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the multivariate normal assumption for the data blocks and on the validity of the mode approximation for covariance uncertainty in high dimensions; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

Domain assumption: data blocks follow a multivariate normal distribution. Stated as the working model providing a coherent posterior predictive target.
Ad hoc to paper: covariance-mode updating approximates full posterior covariance uncertainty sufficiently well. Core of the high-dimensional efficiency claim.

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

[1] D. B. Rubin, Multiple Imputation for Nonresponse in Surveys, Wiley, New York, 2004.
[2] S. van Buuren, K. Groothuis-Oudshoorn, mice: Multivariate imputation by chained equations in R, Journal of Statistical Software 45 (3) (2011) 1–67.
[3] S. van Buuren, Flexible Imputation of Missing Data, 2nd Edition, Chapman and Hall/CRC, Boca Raton, 2018.
[4] J. L. Schafer, Analysis of Incomplete Multivariate Data, Chapman and Hall/CRC, London, 1997.
[5] J. L. Schafer, J. W. Graham, Missing data: Our view of the state of the art, Psychological Methods 7 (2) (2002) 147–177.
[6] R. J. A. Little, D. B. Rubin, Statistical Analysis with Missing Data, 3rd Edition, Wiley, Hoboken, 2019.
[7] O. Ledoit, M. Wolf, A well-conditioned estimator for large-dimensional covariance matrices, Journal of Multivariate Analysis 88 (2) (2004) 365–411.
[8] T. Lu, P. Kochunov, C. Chen, H.-H. Huang, L. E. Hong, S. Chen, A new multiple imputation method for high-dimensional neuroimaging data, Human Brain Mapping 46 (5) (2025) e70161.
[9] C. J. Champion, Empirical Bayesian estimation of normal variances and covariances, Journal of Multivariate Analysis 87 (1) (2003) 60–79. doi:10.1016/S0047-259X(02)00076-3.
