arxiv: 1108.0129 · v1 · pith:QOUOUVLEnew · submitted 2011-07-31 · 🧮 math.PR · cs.CE · cs.DS · math.ST · q-bio.PE · stat.TH

Identifiability and inference of non-parametric rates-across-sites models on large-scale phylogenies

classification 🧮 math.PR · cs.CE · cs.DS · math.ST · q-bio.PE · stat.TH
keywords rate · reconstruction · identifiability · large · models · mutation · phylogenies · rates-across-sites

Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation. We also derive sequence-length requirements for high-probability reconstruction. Our main contribution is a novel algorithm that clusters sites according to their mutation rate. Following this site clustering step, standard reconstruction techniques can be used to recover the phylogeny. Our results rely on a basic insight: that, for large trees, certain site statistics experience concentration-of-measure phenomena.
