arxiv: q-bio/0701036 · v1 · pith:5PEQ4R25new · submitted 2007-01-24 · 🧬 q-bio.BM · math.PR

Parametrized Stochastic Grammars for RNA Secondary Structure Prediction

keywords: stochastic · family · scfg · secondary · structure · architecture · bases · chain

We propose a two-level stochastic context-free grammar (SCFG) architecture for parametrized stochastic modeling of a family of RNA sequences, including their secondary structure. A stochastic model of this type can be used for maximum a posteriori estimation of the secondary structure of any new sequence in the family. The proposed SCFG architecture models RNA subsequences comprising paired bases as stochastically weighted Dyck-language words, i.e., as weighted balanced-parenthesis expressions. The length of each run of unpaired bases, forming a loop or a bulge, is taken to have a phase-type distribution: that of the hitting time in a finite-state Markov chain. Without loss of generality, each such Markov chain can be taken to have a bounded complexity. The scheme yields an overall family SCFG with a manageable number of parameters.
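
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's grammar or its parametrization. It assumes the common dot-bracket convention (`(`, `)` for paired bases, `.` for unpaired ones) as the balanced-parenthesis encoding, and a discrete-time phase-type distribution for loop lengths. The function names `dyck_pairs` and `phase_type_pmf`, the initial distribution `alpha`, and the sub-stochastic matrix `T` are all hypothetical choices made for illustration.

```python
def dyck_pairs(structure):
    """Return sorted (i, j) index pairs of matched parentheses.

    Assumes dot-bracket notation: '(' and ')' mark paired bases and
    '.' marks an unpaired base. Raises ValueError if the parentheses
    are not balanced, i.e. the paired part is not a Dyck word.
    """
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected symbol %r" % ch)
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


def phase_type_pmf(alpha, T, n_max):
    """Probabilities P(N = n) for n = 1..n_max of a discrete phase-type law.

    N is the number of steps a Markov chain takes to leave its transient
    states (the hitting time of the absorbing state). alpha is the
    initial distribution over transient states; T[i][j] is the
    probability of moving from transient state i to transient state j.
    The per-state exit probability is one minus the row sum of T.
    """
    k = len(alpha)
    exit_prob = [1.0 - sum(row) for row in T]
    state = list(alpha)  # mass still in each transient state
    pmf = []
    for _ in range(n_max):
        pmf.append(sum(state[i] * exit_prob[i] for i in range(k)))
        state = [sum(state[i] * T[i][j] for i in range(k)) for j in range(k)]
    return pmf


# A hairpin: two stacked base pairs enclosing a loop of two unpaired bases.
print(dyck_pairs("((..))"))

# Hypothetical two-state chain: state 0 always moves to state 1, and
# state 1 exits with probability 0.5, so a loop has length at least 2.
print(phase_type_pmf([1.0, 0.0], [[0.0, 1.0], [0.0, 0.5]], 4))
```

In this toy chain the number of parameters depends only on the number of transient states, not on the range of loop lengths being modeled, which is the sense in which a phase-type choice keeps the parameter count manageable.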
