arxiv: 2010.14460 · v2 · pith:QZRHYYJDnew · submitted 2020-10-27 · 🧮 math.PR · cs.CE · math.ST · q-bio.PE · stat.TH

Impossibility of phylogeny reconstruction from k-mer counts

classification 🧮 math.PR · cs.CE · math.ST · q-bio.PE · stat.TH
keywords counts · phylogeny · sequence · estimation · impossibility · infinity · leaf · sequences

We consider phylogeny estimation under a two-state model of sequence evolution by site substitution on a tree. In the asymptotic regime where the sequence lengths tend to infinity, we show that for any fixed $k$ no statistically consistent phylogeny estimation is possible from $k$-mer counts over the full leaf sequences alone. Formally, we establish that the joint distributions of $k$-mer counts over the entire leaf sequences on two distinct trees have total variation distance bounded away from $1$ as the sequence length tends to infinity. Our impossibility result implies that statistical consistency requires more sophisticated use of $k$-mer count information, such as block techniques developed in previous theoretical work.
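
To make the objects in the abstract concrete, the following is a minimal Python sketch of the three ingredients it mentions: a two-state site-substitution step along one edge, $k$-mer counts over a full sequence, and the total variation distance between two discrete distributions. The function names (`evolve`, `kmer_counts`, `total_variation`), the single flip probability `p` per edge, and the encoding of sequences as strings over `0` and `1` are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's proof or its exact model.

```python
import random
from collections import Counter
from itertools import product


def evolve(seq, p, rng):
    """Assumed edge model: flip each site independently with probability p."""
    return "".join(
        ("1" if c == "0" else "0") if rng.random() < p else c for c in seq
    )


def kmer_counts(seq, k):
    """Count overlapping k-mers of a binary string.

    Returns a tuple indexed by all 2^k binary words in lexicographic order.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return tuple(counts["".join(w)] for w in product("01", repeat=k))


def total_variation(P, Q):
    """Total variation distance between two distributions given as dicts."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    root = "".join(rng.choice("01") for _ in range(1000))
    # Two leaves hanging off the same root (a hypothetical "cherry").
    leaf_a = evolve(root, 0.1, rng)
    leaf_b = evolve(root, 0.1, rng)
    # The only data a k-mer-count method sees: one count vector per leaf.
    print(kmer_counts(leaf_a, 2), kmer_counts(leaf_b, 2))
```

The count vector deliberately discards where each $k$-mer occurs along the sequence. In the abstract's terms, the impossibility result concerns estimators that see only such per-leaf vectors computed over the entire sequences, and `total_variation` is the distance that the paper shows stays bounded away from $1$ for the resulting joint distributions on two distinct trees.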
