pith. sign in

arxiv: 1202.0920 · v2 · pith:CBOCYJSQnew · submitted 2012-02-04 · 💻 cs.DM

The weighted words collector

classification 💻 cs.DM
keywords wordscollectorgiventheoremapplicationasymptoticcdotcomposition
0
0 comments X
read the original abstract

Motivated by applications in bioinformatics, we consider the word collector problem, i.e. the expected number of calls to a random weighted generator of words of length $n$ before the full collection is obtained. The originality of this instance of the non-uniform coupon collector lies in the, potentially large, multiplicity of the words/coupons of a given probability/composition. We obtain a general theorem that gives an asymptotic equivalent for the expected waiting time of a general version of the Coupon Collector. This theorem is especially well-suited for classes of coupons featuring high multiplicities. Its application to a given language essentially necessitates some knowledge on the number of words of a given composition/probability. We illustrate the application of our theorem, in a step-by-step fashion, on three exemplary languages, revealing asymptotic regimes in $\Theta(\mu(n)\cdot n)$ and $\Theta(\mu(n)\cdot \log n)$, where $\mu(n)$ is the sum of weights over words of length $n$.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.