pith. sign in

arxiv: 1806.09646 · v1 · pith:72L45U2Fnew · submitted 2018-06-25 · 💻 cs.DS

Fast entropy-bounded string dictionary look-up with mismatches

classification 💻 cs.DS
keywords dictionarydatastructurequeryspacestringsbitsentropy-bounded
0
0 comments X
read the original abstract

We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of $d$ strings of length $m$ and an integer $k$, we must preprocess it into a data structure to answer the following queries: Given a query string $Q$ of length $m$, find all strings in the dictionary that are at Hamming distance at most $k$ from $Q$. Chan and Lewenstein (CPM 2015) showed a data structure for $k = 1$ with optimal query time $O(m/w + occ)$, where $w$ is the size of a machine word and $occ$ is the size of the output. The data structure occupies $O(w d \log^{1+\varepsilon} d)$ extra bits of space (beyond the entropy-bounded space required to store the dictionary strings). In this work we give a solution with similar bounds for a much wider range of values $k$. Namely, we give a data structure that has $O(m/w + \log^k d + occ)$ query time and uses $O(w d \log^k d)$ extra bits of space.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.