pith. sign in

arxiv: 1110.5236 · v2 · pith:MDN7CP3Dnew · submitted 2011-10-24 · 💻 cs.DS

String Indexing for Patterns with Wildcards

classification 💻 cs.DS
keywords queryindextimepatternsigmaspacewildcardsindexing
0
0 comments X
read the original abstract

We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the alphabet. We obtain the following results. - A linear space index with query time $O(m+\sigma^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $\Theta(jn)$ in the worst case. - An index with query time $O(m+j+occ)$ using space $O(\sigma^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time. - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004]. We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.