
arXiv: 1808.06109 · v1 · pith:SGSU4FH4 · submitted 2018-08-18 · stat.AP

Bayesian Hidden Markov Tree Models for Clustering Genes with Shared Evolutionary History

Classification: stat.AP
Keywords: CLIME, genes, evolutionary, clustering, evolution, gene, models, algorithm

Determining the functions of poorly characterized genes is crucial for understanding biological processes and studying human diseases. Functionally associated genes are often gained and lost together during evolution; therefore, identifying co-evolving genes can predict functional gene-gene associations. We describe here the full statistical model and computational strategies underlying the original algorithm, CLustering by Inferred Models of Evolution (CLIME 1.0), which we recently reported [Li et al., 2014]. CLIME 1.0 employs a mixture of tree-structured hidden Markov models for the gene evolution process, and a Bayesian model-based clustering algorithm to detect gene modules with shared evolutionary histories (termed evolutionarily conserved modules, or ECMs). A Dirichlet process prior was adopted to estimate the number of gene clusters, and a Gibbs sampler was developed for posterior sampling. We further developed an extended version, CLIME 1.1, that incorporates uncertainty in the evolutionary tree structure. Through simulation studies and benchmarks on real data sets, we show that CLIME 1.0 and CLIME 1.1 outperform traditional methods that use simple metrics (e.g., the Hamming distance or Pearson correlation) to measure co-evolution between pairs of genes.
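The abstract names a tree-structured hidden Markov model as the per-gene building block but gives no equations. As a rough illustration only, the sketch below computes the likelihood of one gene's phylogenetic profile (presence or absence across species) by summing out the hidden ancestral states with Felsenstein-style pruning. The toy tree, the names `TOY_TREE`, `branch_matrix`, `partial_likelihood`, and `profile_likelihood`, the binary present/absent states, and the per-branch gain/loss parameterization are all assumptions made for illustration; they are not CLIME's actual model, which the abstract does not specify.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy species tree (not from the paper): internal node -> children.
# Leaves A, B, C are extant species; "root" is their common ancestor.
TOY_TREE: Dict[str, List[str]] = {
    "root": ["A", "n1"],
    "n1": ["B", "C"],
}

ABSENT, PRESENT = 0, 1
STATES = (ABSENT, PRESENT)


def branch_matrix(p_gain: float, p_loss: float) -> List[List[float]]:
    """Transition matrix T[parent_state][child_state] along one branch (assumed form)."""
    return [
        [1.0 - p_gain, p_gain],  # parent absent: stay absent or gain the gene
        [p_loss, 1.0 - p_loss],  # parent present: lose the gene or retain it
    ]


def partial_likelihood(
    node: str,
    tree: Dict[str, List[str]],
    profile: Dict[str, int],
    branch_params: Dict[str, Tuple[float, float]],
) -> List[float]:
    """Return [P(observed leaves below node | node state = s) for s in STATES]."""
    if node not in tree:
        # Leaf: its hidden state equals the observed presence (1) or absence (0).
        observed = profile[node]
        return [1.0 if s == observed else 0.0 for s in STATES]
    result = [1.0, 1.0]
    for child in tree[node]:
        # branch_params is keyed by the child node, i.e. the branch above it.
        T = branch_matrix(*branch_params[child])
        child_lik = partial_likelihood(child, tree, profile, branch_params)
        for s in STATES:
            # Sum out the child's hidden state (the pruning step).
            result[s] *= sum(T[s][c] * child_lik[c] for c in STATES)
    return result


def profile_likelihood(
    tree: Dict[str, List[str]],
    profile: Dict[str, int],
    branch_params: Dict[str, Tuple[float, float]],
    root_prior: Sequence[float],
) -> float:
    """P(profile) for one gene, marginalizing over all hidden ancestral states."""
    root_lik = partial_likelihood("root", tree, profile, branch_params)
    return sum(root_prior[s] * root_lik[s] for s in STATES)


if __name__ == "__main__":
    # Illustrative parameters only: no gains, 10% loss probability on every branch.
    params = {name: (0.0, 0.1) for name in ("A", "n1", "B", "C")}
    gene = {"A": 1, "B": 1, "C": 1}  # present in all three species
    print(profile_likelihood(TOY_TREE, gene, params, root_prior=[0.0, 1.0]))
```

With no gains, a gene present in all three species must be retained on all four branches, so the printed value is about 0.9⁴ ≈ 0.656. In a mixture setting one would expect each cluster to carry its own branch parameters, with a Gibbs step weighing a gene's likelihood under each cluster by the Dirichlet process prior; that step is not shown here.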
