arxiv: 1403.7486 · v1 · pith:OZ666DTKnew · submitted 2014-03-28 · 🧬 q-bio.GN

SAMBLASTER: fast duplicate marking and structural variant read extraction

classification 🧬 q-bio.GN
keywords samblaster · files · data · duplicate · marking · output · pipelines · post-pass

Motivation: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times.

Results: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, it adds negligible runtime overhead of its own while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results.

Availability: SAMBLASTER is open source C++ code and freely available from https://github.com/GregoryFaust/samblaster
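
The idea of marking duplicates in a single streaming pass over read-grouped SAM records can be illustrated with a minimal Python sketch. This is not SAMBLASTER's implementation (which is C++), and the abstract does not describe its internals. The sketch assumes that all alignments of one read pair are adjacent in the input, and it uses a deliberately simplified signature of reference name, POS and strand for each alignment, ignoring clipping and other details a real tool must handle. The function names `signature`, `emit` and `mark_duplicates` are hypothetical. The only fixed facts it relies on are the SAM flag bits 0x4 (unmapped), 0x10 (reverse strand) and 0x400 (duplicate).

```python
DUP_FLAG = 0x400       # SAM flag bit: PCR or optical duplicate
REVERSE_FLAG = 0x10    # SAM flag bit: read mapped to the reverse strand
UNMAPPED_FLAG = 0x4    # SAM flag bit: read is unmapped


def signature(fields):
    """Simplified (assumed) signature of one alignment: (RNAME, POS, strand)."""
    flag = int(fields[1])
    return (fields[2], int(fields[3]), bool(flag & REVERSE_FLAG))


def emit(group, seen):
    """Return the SAM lines of one read group, flagged if its signature was seen."""
    # Groups containing unmapped reads are passed through unmarked in this sketch.
    if any(int(f[1]) & UNMAPPED_FLAG for f in group):
        return ["\t".join(f) for f in group]
    key = tuple(sorted(signature(f) for f in group))
    is_dup = key in seen
    seen.add(key)
    out = []
    for f in group:
        if is_dup:
            f = f[:1] + [str(int(f[1]) | DUP_FLAG)] + f[2:]
        out.append("\t".join(f))
    return out


def mark_duplicates(lines):
    """Stream SAM lines, setting the duplicate flag on repeated pair signatures.

    Assumes records sharing a QNAME are adjacent, as in unsorted aligner output.
    """
    seen = set()   # signatures of read groups already emitted
    group = []     # records of the read group currently being collected
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@"):          # header lines pass through unchanged
            yield line
            continue
        fields = line.split("\t")
        if group and fields[0] != group[0][0]:
            yield from emit(group, seen)  # QNAME changed: finish previous group
            group = []
        group.append(fields)
    if group:
        yield from emit(group, seen)
```

Because only the set of signatures is kept in memory and each record is handled once as it arrives, a filter of this shape can sit between an aligner and a compressor without an intermediate sorted file, which is the pipeline position the abstract describes. It could be driven by iterating over standard input and printing each yielded line.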
