pith. sign in

arxiv: 1811.01121 · v3 · pith:XJZFM65Unew · submitted 2018-11-02 · 💻 cs.DS · math.PR· q-bio.QM

Optimal Sequence Length Requirements for Phylogenetic Tree Reconstruction with Indels

classification 💻 cs.DS math.PRq-bio.QM
keywords treelengthmodelsequencephylogeneticreconstructionsequencesbounds
0
0 comments X
read the original abstract

We consider the phylogenetic tree reconstruction problem with insertions and deletions (indels). Phylogenetic algorithms proceed under a model where sequences evolve down the model tree, and given sequences at the leaves, the problem is to reconstruct the model tree with high probability. Traditionally, sequences mutate by substitution-only processes, although some recent work considers evolutionary processes with insertions and deletions. In this paper, we improve on previous work by giving a reconstruction algorithm that simultaneously has $O(\text{poly} \log n)$ sequence length and tolerates constant indel probabilities on each edge. Our recursively-reconstructed distance-based technique provably outputs the model tree when the model tree has $O(\text{poly} \log n)$ diameter and discretized branch lengths, allowing for the probability of insertion and deletion to be non-uniform and asymmetric on each edge. Our polylogarithmic sequence length bounds improve significantly over previous polynomial sequence length bounds and match sequence length bounds in the substitution-only models of phylogenetic evolution, thereby challenging the idea that many global misalignments caused by insertions and deletions when $p_{indel}$ is large are a fundamental obstruction to reconstruction with short sequences.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.