Revisiting Frequency Moment Estimation in Random Order Streams
read the original abstract
We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying $n$-dimensional vector presented as a sequence of additive updates in a stream. It is well-known that using $p$-stable distributions one can approximate any of these moments up to a multiplicative $(1+\epsilon)$-factor using $O(\epsilon^{-2} \log n)$ bits of space, and this space bound is optimal up to a constant factor in the turnstile streaming model. We show that surprisingly, if one instead considers the popular random-order model of insertion-only streams, in which the updates to the underlying vector arrive in a random order, then one can beat this space bound and achieve $\tilde{O}(\epsilon^{-2} + \log n)$ bits of space, where the $\tilde{O}$ hides poly$(\log(1/\epsilon) + \log \log n)$ factors. If $\epsilon^{-2} \approx \log n$, this represents a roughly quadratic improvement in the space achievable in turnstile streams. Our algorithm is in fact deterministic, and we show our space bound is optimal up to poly$(\log(1/\epsilon) + \log \log n)$ factors for deterministic algorithms in the random order model. We also obtain a similar improvement in space for $p = 2$ whenever $F_2 \gtrsim \log n\cdot F_1$.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Towards Optimal Moment Estimation in Streaming and Distributed Models
For p-moments with positive updates, achieves Õ(ε^{-2} + log n) space for p ≤ 1 without random order and Õ(ε^{-2}) max-communication for p in (1,2] in coordinator/blackboard models.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.