arXiv: 1103.2351 · v1 · pith:N34ENILSnew · submitted 2011-03-11 · 💻 cs.CE · cs.IT · math.IT · q-bio.QM

Engineering Relative Compression of Genomes

classification 💻 cs.CE · cs.IT · math.IT · q-bio.QM
keywords compression · faster · genomes · relative · access · accompanied · algorithms · amounts

Technological progress in DNA sequencing is driving the growth of genomic databases at an ever faster rate. Compression, accompanied by random access capabilities, is the key to maintaining those huge amounts of data. In this paper we present an LZ77-style compression scheme for relative compression of multiple genomes of the same species. While the solution bears similarity to known algorithms, it offers significantly higher compression ratios at a compression speed more than an order of magnitude greater. One of the new successful ideas is augmenting the reference sequence with phrases from the other sequences, making more LZ-matches available.
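
The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea, not the paper's actual scheme. It assumes a greedy LZ77-style parse of each sequence against a reference, found through a simple k-mer index, and it assumes that "augmenting the reference" can be approximated by appending each sequence's unmatched literal runs to the reference. All names (`encode`, `decode`, `augment`, `compress_collection`, `decompress_collection`) and the seed length `K` are hypothetical.

```python
from collections import defaultdict

K = 4  # hypothetical seed length for the k-mer index (toy value)


def build_index(ref, k=K):
    """Map every k-mer of ref to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def encode(target, ref, k=K):
    """Greedy parse of target into ('M', pos, length) and ('L', text) tokens."""
    index = build_index(ref, k)
    tokens, literal, i = [], [], 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference position sharing the next k-mer; keep the longest.
        for pos in index.get(target[i:i + k], ()):
            length = k
            while (i + length < len(target) and pos + length < len(ref)
                   and target[i + length] == ref[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            if literal:  # flush pending unmatched symbols first
                tokens.append(("L", "".join(literal)))
                literal = []
            tokens.append(("M", best_pos, best_len))
            i += best_len
        else:
            literal.append(target[i])
            i += 1
    if literal:
        tokens.append(("L", "".join(literal)))
    return tokens


def decode(tokens, ref):
    """Rebuild a sequence by copying matches from ref and emitting literals."""
    out = []
    for tok in tokens:
        if tok[0] == "M":
            _, pos, length = tok
            out.append(ref[pos:pos + length])
        else:
            out.append(tok[1])
    return "".join(out)


def augment(ref, tokens):
    """Assumed augmentation rule: append the literal runs to the reference."""
    return ref + "".join(tok[1] for tok in tokens if tok[0] == "L")


def compress_collection(ref, sequences):
    encoded = []
    for seq in sequences:
        tokens = encode(seq, ref)
        encoded.append(tokens)
        ref = augment(ref, tokens)  # later sequences can match these phrases
    return encoded


def decompress_collection(ref, encoded):
    decoded = []
    for tokens in encoded:
        decoded.append(decode(tokens, ref))
        ref = augment(ref, tokens)  # mirror the encoder to stay in sync
    return decoded
```

In this sketch, compressing `["AAAAGTGTGTCCCC", "GTGTGT"]` against the reference `"AAAACCCC"` leaves `GTGTGT` as a literal run in the first sequence; once that run is appended to the reference, the second sequence is covered by a single match, whereas against the unaugmented reference it would be stored entirely as literals. Because the decoder applies the same augmentation after each sequence, no extra information needs to be transmitted.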
