
arxiv: 1111.1355 · v1 · pith:2JKB2CFX · submitted 2011-11-05 · 💻 cs.DS

A Compressed Self-Index for Genomic Databases

classification 💻 cs.DS
keywords: genomes, compressed, compression, copies, databases, fast, first, genome

Advances in DNA sequencing technology will soon result in databases of thousands of genomes. Within a species, individuals' genomes are almost exact copies of each other; e.g., any two human genomes are 99.9% the same. Relative Lempel-Ziv (RLZ) compression takes advantage of this property: it stores the first genome uncompressed or as an FM-index, then compresses the other genomes with a variant of LZ77 that copies phrases only from the first genome. RLZ achieves good compression and supports fast random access; in this paper we show how to support fast search as well, thus obtaining an efficient compressed self-index.
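The RLZ idea described in the abstract can be illustrated with a minimal sketch, assuming plain strings over a DNA alphabet. The target genome is parsed greedily into phrases that each copy the longest possible substring of the reference (the "first genome"). A character that never occurs in the reference is stored as a literal phrase, which is an assumption of this sketch. Random access to target[j] then needs only a binary search over cumulative phrase ends. All function names here (`longest_match`, `rlz_parse`, `rlz_decode`, `rlz_access`) are hypothetical. The longest-match search is a naive quadratic scan standing in for the suffix-array or FM-index lookup a real implementation would use. The paper's search support (the self-index part) is not shown.

```python
from bisect import bisect_right
from itertools import accumulate


def longest_match(reference, target, i):
    """Naive scan: (start, length) of the longest substring of reference
    that equals a prefix of target[i:]. Returns (-1, 0) if none exists."""
    best_start, best_len = -1, 0
    for s in range(len(reference)):
        k = 0
        while (s + k < len(reference) and i + k < len(target)
               and reference[s + k] == target[i + k]):
            k += 1
        if k > best_len:
            best_start, best_len = s, k
    return best_start, best_len


def rlz_parse(reference, target):
    """Greedy RLZ parse. Each phrase is (start, length, literal):
    a copy phrase has literal=None; a literal phrase is (-1, 1, ch)."""
    phrases, i = [], 0
    while i < len(target):
        start, length = longest_match(reference, target, i)
        if length == 0:  # character absent from the reference (assumed literal)
            phrases.append((-1, 1, target[i]))
            i += 1
        else:
            phrases.append((start, length, None))
            i += length
    return phrases


def rlz_decode(reference, phrases):
    """Rebuild the full target by expanding every phrase."""
    return "".join(lit if lit is not None else reference[s:s + n]
                   for s, n, lit in phrases)


def rlz_access(reference, phrases, ends, j):
    """Return target[j] without decompressing; ends[k] is the total
    length of phrases 0..k, so the phrase containing j is found by bisection."""
    k = bisect_right(ends, j)
    start, length, lit = phrases[k]
    if lit is not None:
        return lit
    offset = j - (ends[k] - length)  # position of j inside phrase k
    return reference[start + offset]


# Example: a target that differs from the reference by one substitution
# and one extra trailing base.
ref = "ACGTACGTTGCA"
tgt = "ACGTACCTTGCAN"
phrases = rlz_parse(ref, tgt)
ends = list(accumulate(length for _, length, _ in phrases))
assert rlz_decode(ref, phrases) == tgt
assert all(rlz_access(ref, phrases, ends, j) == tgt[j] for j in range(len(tgt)))
```

Because near-identical genomes produce few, long phrases, the phrase list stays small relative to the target. Storing the cumulative ends alongside it is what makes random access cost only O(log z) for z phrases in this sketch.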

