
arxiv: q-bio/0410033 · v1 · pith:4NXQAOY7new · submitted 2004-10-27 · 🧬 q-bio.GN · math.ST · stat.TH

Four basic symmetry types in the universal 7-cluster structure of 143 complete bacterial genomic sequences

classification 🧬 q-bio.GN · math.ST · stat.TH
keywords bacterial · cluster · genomes · genomic · sequences · structure · available · four

Coding information is the main source of heterogeneity (non-randomness) in the sequences of bacterial genomes. This information can be naturally modeled by analysing cluster structures in the "in-phase" triplet distributions of relatively short genomic fragments (200-400 bp). We found a universal 7-cluster structure in bacterial genomic sequences and explained its properties. We show that the codon usage of bacterial genomes is, with high accuracy, a multi-linear function of their genomic G+C content. Based on the analysis of 143 completely sequenced bacterial genomes available in GenBank in August 2004, we show that four "pure" types of the 7-cluster structure are observed. Animated 3D scatter plots of the cluster structure for all 143 genomes are collected in a database that is available on our web site: http://www.ihes.fr/~zinovyev/7clusters. These findings can be readily incorporated into software for gene prediction, sequence alignment, or bacterial genome classification.
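To make "in-phase triplet distributions of short fragments" concrete, here is a minimal sketch of how such frequency vectors might be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 300 bp window (chosen from inside the 200-400 bp range given in the abstract), and the 30 bp step are all hypothetical choices. Each fragment becomes a 64-dimensional vector of non-overlapping triplet frequencies; it is cluster structure in vectors of this kind that the abstract refers to.

```python
from collections import Counter
from itertools import product

# The 64 possible triplets over the DNA alphabet, in a fixed order.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(fragment, phase=0):
    """Normalized frequencies of non-overlapping triplets read from offset `phase`.

    Triplets containing symbols other than A, C, G, T are ignored.
    """
    counts = Counter(
        fragment[i:i + 3] for i in range(phase, len(fragment) - 2, 3)
    )
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def fragment_vectors(sequence, window=300, step=30):
    """Slide a window along the sequence and return one 64-d vector per fragment.

    `window` and `step` are illustrative assumptions, not values from the paper.
    """
    sequence = sequence.upper()
    return [
        triplet_frequencies(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


def gc_content(sequence):
    """Fraction of G and C among the A, C, G, T symbols of the sequence."""
    sequence = sequence.upper()
    acgt = sum(sequence.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / acgt
```

The resulting vectors could then be projected to three dimensions (for example with principal component analysis) to look for cluster structure, and `gc_content` gives the genome-level quantity that the abstract relates to codon usage.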
