A Faster Grammar-Based Self-Index

Juha Kärkkäinen; Paweł Gawrychowski; Simon J. Puglisi; Travis Gagie; Yakov Nekrich

arxiv: 1109.3954 · v6 · pith:5TNKZ7XPnew · submitted 2011-09-19 · 💻 cs.DS

classification 💻 cs.DS

keywords: building, given, program, self-index, self-indexes, store, straight-line, accept

To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on grammars. In this paper we show how, given a straight-line program with $r$ rules for a string $S[1..n]$ whose LZ77 parse consists of $z$ phrases, we can store a self-index for $S$ in $O(r + z \log \log n)$ space such that, given a pattern $P[1..m]$, we can list the $\mathrm{occ}$ occurrences of $P$ in $S$ in $O(m^2 + \mathrm{occ} \log \log n)$ time. If the straight-line program is balanced and we accept a small probability of building a faulty index, then we can reduce the $O(m^2)$ term to $O(m \log m)$. All previous self-indexes are larger or slower in the worst case.
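
For readers unfamiliar with the term, a straight-line program is a context-free grammar that generates exactly one string, with each rule being either a single character or the concatenation of two earlier rules. The Python sketch below only illustrates that definition and how a character of $S$ can be read without expanding $S$. It is not the paper's index: the rule encoding, the function names (`expansion_lengths`, `char_at`, `expand`) and the example grammar are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a rule is a terminal character, or a pair
# (left, right) of indices of earlier rules in the list.
Rule = Union[str, Tuple[int, int]]


def expansion_lengths(rules: List[Rule]) -> List[int]:
    """Length of each rule's expansion; rules[i] may only use rules[j], j < i."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def char_at(rules: List[Rule], lengths: List[int], root: int, i: int) -> str:
    """Return the i-th character (0-based) of the root's expansion
    by walking down the parse tree, without building the string."""
    node = root
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left
        else:
            i -= lengths[left]
            node = right
    return rules[node]


def expand(rules: List[Rule], root: int) -> str:
    """Fully expand a rule (iteratively, to avoid deep recursion)."""
    out: List[str] = []
    stack = [root]
    while stack:
        rule = rules[stack.pop()]
        if isinstance(rule, str):
            out.append(rule)
        else:
            stack.append(rule[1])  # right child is visited after the left
            stack.append(rule[0])
    return "".join(out)


# Example grammar with r = 6 rules generating a Fibonacci-style string.
rules: List[Rule] = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
lengths = expansion_lengths(rules)
text = expand(rules, 5)
```

Here `char_at` takes time proportional to the height of the parse tree, which is why balance of the grammar matters for access-style operations in a sketch like this one; the bounds stated in the abstract rely on the paper's own data structures, which are not reproduced here.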
