arXiv: 2207.12543 · v1 · pith:CGFVBMWSnew · submitted 2022-07-25 · 🧬 q-bio.PE · cs.CE · math.PR · math.ST · stat.TH

Pairwise sequence alignment at arbitrarily large evolutionary distance

classification: q-bio.PE · cs.CE · math.PR · math.ST · stat.TH
keywords: sequence, alignment, ancestral, phylogeny, evolutionary, known, large, molecular

Ancestral sequence reconstruction is a key task in computational biology. It consists of inferring a molecular sequence at an ancestral species of a known phylogeny, given descendant sequences at the tips of the tree. In addition to its many biological applications, it has played a key role in elucidating the statistical performance of phylogeny estimation methods. Here we establish a formal connection to another important bioinformatics problem, multiple sequence alignment, where one attempts to best align a collection of molecular sequences under some mismatch penalty score by inserting gaps. Our result is counter-intuitive: we show that perfect pairwise sequence alignment with high probability is possible in principle at arbitrarily large evolutionary distances, provided the phylogeny is known and dense enough. We use techniques from ancestral sequence reconstruction in the taxon-rich setting together with the probabilistic analysis of sequence evolution models involving insertions and deletions.
