
arXiv: 1306.4420 · v1 · pith:V6R2M56B · submitted 2013-06-19 · 🧬 q-bio.BM

Statistical inference for template-based protein structure prediction

classification 🧬 q-bio.BM
keywords protein, sequence, structure, templates, alignments, proteins, alignment, learning

Protein structure prediction is one of the most important problems in computational biology. The most successful computational approach, template-based modeling, identifies templates with solved crystal structures for a query protein and constructs three-dimensional models from sequence/structure alignments. Although substantial effort has gone into improving protein sequence alignment, alignments between distantly related proteins remain inaccurate. In this thesis, I introduce a number of statistical machine learning methods for building accurate alignments between a protein sequence and its template structures, especially for proteins that have only distantly related templates. For a protein with only one good template, we develop a regression-tree based Conditional Random Fields (CRF) model for pairwise protein sequence/structure alignment. By learning a nonlinear threading scoring function, we are able to exploit correlations among different sequence and structural features. We also introduce an information-theoretic measure that guides the learning algorithm to make better use of structural features for low-homology proteins, whose sequence profiles carry little evolutionary information. For a protein with multiple good templates, we design a probabilistic consistency approach that threads the protein onto all templates simultaneously. By minimizing the discordance among the pairwise alignments between the protein and its templates, we construct a multiple sequence/structure alignment that yields better structure predictions than any single-template based prediction.
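
The abstract does not spell out the multi-template consistency formulation, so the following is only an illustrative sketch of the general idea, using the well-known ProbCons-style consistency transformation as a stand-in. It assumes that each pairwise threading model (for example, a CRF) outputs a posterior matrix P_xy, where P_xy[i][j] is the probability that residue i of x aligns to residue j of y. Each query-template posterior is then re-estimated as P'_xy = (1/|S|) · Σ_z P_xz · P_zy and decoded by maximum expected accuracy. The function names (`consistency_transform`, `mea_align`), the toy matrices, and the decoder choice are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# A posterior matrix: P[i][j] = Pr(residue i of seq x aligns to residue j of seq y).
Matrix = List[List[float]]
Pair = Tuple[str, str]


def identity(n: int) -> Matrix:
    """P_xx: every residue aligns to itself with probability 1."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]


def posterior(post: Dict[Pair, Matrix], x: str, y: str, lengths: Dict[str, int]) -> Matrix:
    """Look up P_xy, transposing P_yx if only the reverse pair is stored."""
    if x == y:
        return identity(lengths[x])
    if (x, y) in post:
        return post[(x, y)]
    return transpose(post[(y, x)])


def consistency_transform(post: Dict[Pair, Matrix], lengths: Dict[str, int]) -> Dict[Pair, Matrix]:
    """ProbCons-style re-estimation: P'_xy = (1/|S|) * sum over z in S of P_xz @ P_zy."""
    names = list(lengths)
    out: Dict[Pair, Matrix] = {}
    for x in names:
        for y in names:
            if x == y:
                continue
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in names:  # z == x and z == y each contribute P_xy itself
                prod = matmul(posterior(post, x, z, lengths), posterior(post, z, y, lengths))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        acc[i][j] += prod[i][j]
            out[(x, y)] = [[v / len(names) for v in row] for row in acc]
    return out


def mea_align(p: Matrix) -> List[Tuple[int, int]]:
    """Maximum-expected-accuracy decoding: choose aligned pairs maximizing summed posterior."""
    n, m = len(p), len(p[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + p[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])
    # Trace back from the bottom-right cell, recording aligned residue pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if p[i - 1][j - 1] > 0 and dp[i][j] == dp[i - 1][j - 1] + p[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Toy usage: the direct query/tmpl1 posterior is ambiguous, but tmpl2 aligns
# position-by-position to both, so consistency favors the diagonal.
lengths = {"query": 2, "tmpl1": 2, "tmpl2": 2}
post = {
    ("query", "tmpl1"): [[0.5, 0.5], [0.5, 0.5]],
    ("query", "tmpl2"): identity(2),
    ("tmpl1", "tmpl2"): identity(2),
}
refined = consistency_transform(post, lengths)
print(mea_align(refined[("query", "tmpl1")]))  # [(0, 0), (1, 1)]
```

In this toy run, the refined query/tmpl1 posterior rises to 2/3 on the diagonal and falls to 1/3 off it, because the third sequence supports the diagonal pairing. This is the kind of agreement across templates that a consistency-based multiple alignment is meant to exploit.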

