
arxiv: 1304.5665 · v1 · pith:VBMZYJAZnew · submitted 2013-04-20 · 🧬 q-bio.GN

Informed and Automated k-Mer Size Selection for Genome Assembly

classification 🧬 q-bio.GN
keywords best, histograms, abundance, assembly, estimate, fast, genome, informed

Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision. We develop a fast and accurate sampling method that constructs approximate abundance histograms with a performance improvement of several orders of magnitude over traditional methods. We then present a fast heuristic that uses the generated abundance histograms for putative k values to estimate the best possible value of k. We test the effectiveness of our tool using diverse sequencing datasets and find that its choice of k leads to some of the best assemblies. Our tool KmerGenie is freely available at: http://kmergenie.bx.psu.edu/
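The abstract does not spell out the sampling scheme or the selection heuristic, so the following is only a minimal sketch of the general idea, not KmerGenie's implementation. It assumes that distinct k-mers are sampled deterministically by hash, that the sampled histogram is scaled back up by the sampling rate, and that k is chosen by a toy rule that maximizes the estimated number of distinct k-mers seen at least `min_abundance` times. All function names, the `rate` and `min_abundance` parameters, and the scoring rule are hypothetical.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq (no reverse-complement canonicalization)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def keep(kmer, rate):
    """Deterministically keep roughly 1/rate of the distinct k-mers.

    Hashing the k-mer itself means every occurrence of a given k-mer gets
    the same decision, so the abundance of a kept k-mer is counted exactly.
    """
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % rate == 0


def sampled_histogram(reads, k, rate):
    """Return {abundance: estimated number of distinct k-mers}."""
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if keep(kmer, rate):
                counts[kmer] += 1
    hist = Counter(counts.values())
    # Scale up: each sampled k-mer stands in for about `rate` distinct k-mers.
    return {abundance: n * rate for abundance, n in hist.items()}


def pick_k(reads, candidate_ks, rate=1, min_abundance=2):
    """Toy selection rule (an assumption, not the paper's heuristic)."""
    def score(k):
        hist = sampled_histogram(reads, k, rate)
        return sum(n for a, n in hist.items() if a >= min_abundance)
    return max(candidate_ks, key=score)
```

With `rate=1` every k-mer is kept and the histogram is exact; larger values of `rate` trade accuracy for less counting work, which is the kind of trade-off the abstract alludes to.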
