A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data

arxiv: 1203.4802 · v2 · pith:BSUNHPQC · submitted 2012-03-21 · 🧬 q-bio.GN

C. Titus Brown, Adina Howe, Qingpeng Zhang, Alexis B. Pyrkosz, Timothy H. Brom

classification 🧬 q-bio.GN

keywords data, sequencing, normalization, shotgun, computational, digital, sets, algorithm

Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes have enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and the high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the development of new classes of mapping tools and de novo assemblers. These algorithms are in turn challenged by the continued improvement in sequencing throughput. Here we describe digital normalization, a single-pass computational algorithm that systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors. Digital normalization substantially reduces the size of shotgun data sets and decreases the memory and time requirements for de novo sequence assembly, all without significantly impacting the content of the generated contigs. We apply digital normalization to the assembly of microbial genomic data, amplified single-cell genomic data, and transcriptomic data. Our implementation is freely available for use and modification.
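
The abstract does not spell out the procedure, so the following is only a minimal sketch of how a single-pass, reference-free coverage normalization could look. It assumes that a read's coverage is estimated as the median abundance of its k-mers among the reads kept so far, and that reads at or above a coverage cutoff are discarded. The function names `kmers` and `normalize`, the defaults `k=20` and `cutoff=20`, and the exact in-memory `Counter` are illustrative assumptions, not the authors' implementation, which may use different data structures and parameters.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Yield every overlapping k-mer in a read (hypothetical helper)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def normalize(reads, k=20, cutoff=20):
    """Pass over the reads once, keeping a read only while its
    estimated coverage is below `cutoff`.

    Coverage is estimated as the median count of the read's k-mers
    among the reads kept so far (an assumption of this sketch).
    """
    counts = Counter()  # exact k-mer counts; fine for small inputs only
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k, so it has no k-mers
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


# Toy input: one read sampled ten times and one read sampled once.
reads = ["ACGTTGCA"] * 10 + ["GGATCCAT"]
kept = normalize(reads, k=4, cutoff=3)
```

In this toy run the over-sampled read is retained only until its estimated coverage reaches the cutoff, while the read seen once is always kept. That is the sense in which redundant data are discarded without a reference sequence. Because only retained reads update the counts, a read whose k-mers are mostly novel, or mostly erroneous, has a low median and is kept, so a sketch like this does not remove errors by itself.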
