
arxiv: 1608.02413 · v2 · pith:XKQ3MIRCnew · submitted 2016-08-08 · 💻 cs.DS

EPR-dictionaries: A practical and fast data structure for constant time searches in unidirectional and bidirectional FM-indices

keywords: bidirectional, indices, method, practical, sigma, time, constant, data

We introduce a new, practical method for conducting an exact search in a unidirectional or bidirectional FM index in $O(1)$ time per step while using $O(n \log \sigma) + o(n \sigma \log \sigma)$ bits of space. This is done by replacing the binary wavelet tree with a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition, we compared our implementation with other freely available implementations of bidirectional indices and show that ours is roughly 2.6 to 4.8 times faster. This will have a large impact on many bioinformatics applications that rely on practical implementations of (2)FM indices, e.g., for read mapping. To our knowledge, this is the first implementation of a constant-time method for a search step in 2FM indices.
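
To make the prefix-sum idea concrete, the following is a minimal sketch of a rank dictionary that stores, per block, the number of symbols less than or equal to each character. It is not the SeqAn implementation and not the EPR-dictionary as the paper defines it: the struct name `PrefixSumRank`, the block size `kBlock`, and the plain scan inside a block are assumptions made for illustration. The block size is a fixed constant, so each query does a bounded amount of work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified prefix-sum rank dictionary over symbols 0..sigma-1.
// Illustration only; the actual EPR-dictionary layout is not reproduced here.
struct PrefixSumRank {
    static constexpr std::size_t kBlock = 4;  // assumed block size (tiny, for clarity)
    std::size_t sigma;
    std::vector<std::uint8_t> text;
    // blocks[b * sigma + c] = number of symbols <= c in text[0, b * kBlock)
    std::vector<std::size_t> blocks;

    PrefixSumRank(const std::vector<std::uint8_t>& t, std::size_t s)
        : sigma(s), text(t), blocks((t.size() / kBlock + 1) * s, 0) {
        std::vector<std::size_t> running(sigma, 0);
        for (std::size_t i = 0; i <= text.size(); ++i) {
            if (i % kBlock == 0) {
                // Snapshot the prefix sums at every block boundary.
                for (std::size_t c = 0; c < sigma; ++c)
                    blocks[(i / kBlock) * sigma + c] = running[c];
            }
            if (i < text.size()) {
                // A symbol x counts towards every c >= x.
                for (std::size_t c = text[i]; c < sigma; ++c) ++running[c];
            }
        }
    }

    // Number of symbols <= c in text[0, i).
    std::size_t prefix_rank(std::size_t c, std::size_t i) const {
        assert(c < sigma && i <= text.size());
        std::size_t b = i / kBlock;
        std::size_t count = blocks[b * sigma + c];
        // Scan at most kBlock - 1 symbols inside the block (constant work).
        for (std::size_t j = b * kBlock; j < i; ++j)
            if (text[j] <= c) ++count;
        return count;
    }

    // Number of occurrences of exactly c in text[0, i).
    std::size_t rank(std::size_t c, std::size_t i) const {
        return prefix_rank(c, i) - (c > 0 ? prefix_rank(c - 1, i) : 0);
    }
};
```

In this sketch, `rank(c, i)` is what a backward search step in an FM index would call, while `prefix_rank(c, i)` directly yields the number of smaller-or-equal symbols in a prefix, which is the kind of count a bidirectional index needs to keep both intervals synchronized. Storing one counter per symbol per block is also where a factor of $\sigma$ enters the space usage of this simplified layout.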
