arxiv: 2410.21922 · v10 · submitted 2024-10-29 · stat.CO · math.ST · stat.TH

Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation

classification: stat.CO · math.ST · stat.TH
keywords: batch, method, ross, acceleration, chan, pairwise, prior, recomputation

We introduce Prior Knowledge Acceleration (PKA), a batch-update method for variance that reuses previously computed sufficient statistics to avoid full recomputation. The update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983); our contribution is a runtime-cost analysis that derives an explicit acceleration factor $\tau_a$ and identifies the data-size regime where batch updating outperforms both naïve recomputation and Ross's single-sample method. We prove that Ross's approach is preferable only when the new batch contains a single sample ($N_2 = 1$). We further generalise the framework to covariance and other decomposable statistics. Benchmarks against Welford, Chan pairwise, and naïve two-pass baselines on synthetic and real-world streaming data confirm the theoretical predictions, with speedups of up to $454\times$ when the prior dataset is large relative to the new batch.
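
For orientation, the batch update the abstract refers to can be sketched with the standard Chan, Golub, and LeVeque pairwise merge, which the abstract states is algebraically equivalent to the PKA identity. This is an illustrative sketch only, not the paper's code: the names `Stats`, `summarize`, and `merge` are hypothetical, the population variance (divide by $N$) is assumed, and every batch is assumed non-empty.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for variance (hypothetical container)."""
    n: int        # number of samples
    mean: float   # sample mean
    m2: float     # sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        # Population variance; assumes n > 0.
        return self.m2 / self.n


def summarize(xs: Sequence[float]) -> Stats:
    """Two-pass summary of a non-empty batch."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    return Stats(n, mean, m2)


def merge(prior: Stats, batch: Stats) -> Stats:
    """Chan-Golub-LeVeque pairwise merge of two summaries.

    Only the new batch has to be scanned; the prior data enters
    through its stored (n, mean, m2) alone.
    """
    n = prior.n + batch.n
    delta = batch.mean - prior.mean
    mean = prior.mean + delta * batch.n / n
    m2 = prior.m2 + batch.m2 + delta * delta * prior.n * batch.n / n
    return Stats(n, mean, m2)


prior_data = [1.0, 2.0, 3.0, 4.0]   # N_1 = 4 samples already summarized
new_batch = [5.0, 6.0]              # N_2 = 2 new samples

updated = merge(summarize(prior_data), summarize(new_batch))
recomputed = summarize(prior_data + new_batch)
```

Here `updated` and `recomputed` agree up to floating-point rounding, but `merge` touches only the $N_2$ new samples plus a constant amount of arithmetic, which is the source of the saving when the prior dataset is large relative to the new batch.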
