Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation

read the original abstract

We introduce Prior Knowledge Acceleration (PKA), a batch-update method for variance that reuses previously computed sufficient statistics to avoid full recomputation. The update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983); our contribution is a runtime-cost analysis that derives an explicit acceleration factor $\tau_a$ and identifies the data-size regime where batch updating outperforms both na\"ive recomputation and Ross's single-sample method. We prove that Ross's approach is preferable only when the new batch contains a single sample ($N_2 = 1$). We further generalise the framework to covariance and other decomposable statistics. Benchmarks against Welford, Chan pairwise, and na\"ive two-pass baselines on synthetic and real-world streaming data confirm the theoretical predictions, with speedups of up to $454\times$ when the prior dataset is large relative to the new batch.

Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation

discussion (0)