A Framework for Estimating Stream Expression Cardinalities
pith:OMPXMWLL Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{OMPXMWLL}
Prints a linked pith:OMPXMWLL badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
read the original abstract
Given $m$ distributed data streams $A_1, \dots, A_m$, we consider the problem of estimating the number of unique identifiers in streams defined by set expressions over $A_1, \dots, A_m$. We identify a broad class of algorithms for solving this problem, and show that the estimators output by any algorithm in this class are perfectly unbiased and satisfy strong variance bounds. Our analysis unifies and generalizes a variety of earlier results in the literature. To demonstrate its generality, we describe several novel sampling algorithms in our class, and show that they achieve a novel tradeoff between accuracy, space usage, update speed, and applicability.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.