
arxiv: 1301.0068 · v3 · pith:WHPUZUDZnew · submitted 2013-01-01 · 🧬 q-bio.GN · cs.DS · cs.IT · math.IT · q-bio.QM

Optimal Assembly for High Throughput Shotgun Sequencing

classification 🧬 q-bio.GN · cs.DS · cs.IT · math.IT · q-bio.QM
keywords sequencing, assembly, conditions, reconstruction, shotgun, bound, design, lower

We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier work, we design a de Bruijn graph-based assembly algorithm that comes very close to the lower bound for the repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. These conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization.
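
For readers unfamiliar with de Bruijn graph assembly, the following is a minimal toy sketch of the general idea. It is not the algorithm from the paper. It assumes error-free reads and a sequence with no repeated (k-1)-mers, and the function names (`kmers`, `de_bruijn_graph`, `walk_unambiguous`) and the example reads are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer contributes one edge: its prefix -> its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    sequence = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop rather than loop around a cycle
            break
        sequence += nxt[-1]
        visited.add(nxt)
        node = nxt
    return sequence


# Hypothetical overlapping reads drawn from the toy sequence "ACGTTCAGG".
reads = ["ACGTT", "GTTCA", "TCAGG"]
graph = de_bruijn_graph(reads, k=4)
contig = walk_unambiguous(graph, "ACG")
print(contig)
```

In this toy case every node has a single successor, so the walk recovers the whole sequence. Repeats longer than the k-mer overlap create branching nodes where such a walk stops, which is one way to see why reconstruction conditions are stated in terms of repeat statistics.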

