arXiv: 1602.05856 · v1 · pith:K5LR6B3I · submitted 2016-02-18 · cs.DS · q-bio.GN

TwoPaCo: An efficient algorithm to build the compacted de Bruijn graph from many complete genomes

keywords: genomes · bruijn · graph · twopaco · algorithm · compacted · complete · data

Motivation: De Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole-genome sequences, in both population and comparative genomic settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes).

Results: In this paper, we present TwoPaCo, a simple and scalable low-memory algorithm for the direct construction of the compacted de Bruijn graph from a set of complete genomes. We demonstrate that it can construct the graph for 100 simulated human genomes in less than a day and for eight real primate genomes in less than two hours on a typical shared-memory machine. We believe that this progress will enable novel biological analyses of hundreds of mammalian-sized genomes.

Availability: Our code and data are available for download from github.com/medvedevgroup/TwoPaCo

Contact: ium125@psu.edu
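To make the target data structure concrete, here is a minimal, naive sketch of what a compacted de Bruijn graph of complete genomes looks like. It is not the TwoPaCo algorithm, which the abstract describes as low-memory and scalable. This version keeps every k-mer in memory and, as a simplifying assumption, ignores reverse complements. The function names `find_junctions` and `compacted_edges` are hypothetical. A k-mer is treated as a vertex ("junction") if it begins or ends a genome, or if it has more than one distinct predecessor or successor across all genomes. Each genome is then cut at consecutive junctions, and each cut becomes one compacted edge.

```python
from collections import defaultdict


def find_junctions(genomes, k):
    """Return the k-mers that become vertices of the compacted graph.

    Assumption for this sketch: a k-mer is a junction if it is the first or
    last k-mer of some genome, or if, across all genomes, it is preceded or
    followed by more than one distinct character (in- or out-degree > 1 in
    the ordinary de Bruijn graph). Reverse complements are ignored.
    """
    preds = defaultdict(set)  # k-mer -> characters observed just before it
    succs = defaultdict(set)  # k-mer -> characters observed just after it
    junctions = set()
    for g in genomes:
        n = len(g) - k + 1  # number of k-mers in this genome
        if n <= 0:
            continue
        junctions.add(g[:k])      # first k-mer of the genome
        junctions.add(g[n - 1:])  # last k-mer of the genome
        for i in range(n):
            kmer = g[i:i + k]
            if i > 0:
                preds[kmer].add(g[i - 1])
            if i + k < len(g):
                succs[kmer].add(g[i + k])
    # Branching k-mers: more than one distinct neighbour on either side.
    for kmer in set(preds) | set(succs):
        if len(preds[kmer]) > 1 or len(succs[kmer]) > 1:
            junctions.add(kmer)
    return junctions


def compacted_edges(genomes, k, junctions):
    """Cut each genome at junction k-mers and collect the distinct edges.

    Each edge spells the genome from one junction occurrence through the
    next one, so consecutive edges overlap by exactly k characters.
    """
    edges = set()
    for g in genomes:
        positions = [i for i in range(len(g) - k + 1) if g[i:i + k] in junctions]
        for a, b in zip(positions, positions[1:]):
            edges.add(g[a:b + k])
    return edges


if __name__ == "__main__":
    genomes = ["AACCGGTT", "TTCCGGAA"]
    j = find_junctions(genomes, 3)
    print(sorted(j))
    print(sorted(compacted_edges(genomes, 3, j)))
```

On the toy input above, the shared segment `CCGG` becomes a single edge that both genomes traverse. The genome-specific flanks become their own edges, which is the kind of redundancy-collapsing representation the abstract refers to. A single genome with no repeated k-mers collapses to one edge spelling the whole sequence.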