
arxiv: physics/0104028 · v1 · submitted 2001-04-06 · physics.bio-ph · physics.data-an · q-bio.QM

Zipf's Law in Importance of Genes for Cancer Classification Using Microarray Data

keywords: zipf, power-law, microarray, ability, cancer, classification, conditions, data

Microarray data consists of mRNA expression levels of thousands of genes under certain conditions. A difference in the expression level of a gene between two conditions or phenotypes (cancerous versus non-cancerous, one subtype of cancer versus another, before versus after a drug treatment) indicates that the gene is relevant to the difference in the high-level phenotype. Each gene can be ranked by its ability to distinguish the two conditions. We study how single-gene classification ability decreases with rank (a Zipf's plot). A power-law function is observed in the Zipf's plot for the four microarray datasets obtained from various cancer studies. This power-law behavior is reminiscent of similar power-law curves in other natural and social phenomena (Zipf's law). However, because our measure of importance in classification ability is the maximized likelihood in a logistic regression, the exponent of the power-law function depends on the sample size, rather than taking a fixed value close to 1 as in a typical example of Zipf's law. The presence of this power-law behavior is important for deciding the number of genes to be used in a discriminant microarray data analysis.
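The ranking procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `max_log_likelihood` and `zipf_exponent`, the Newton-Raphson fitting routine, the small `ridge` stabilizer, the least-squares fit in log-log space, and the synthetic data in the demo are all hypothetical choices made here for illustration.

```python
import numpy as np


def max_log_likelihood(x, y, n_iter=25, ridge=1e-6):
    """Maximized log-likelihood of a single-gene logistic regression y ~ a + b*x.

    Fitted by Newton-Raphson. `ridge` is a tiny stabilizer (an assumption made
    for this sketch) that keeps the Hessian invertible when a gene separates
    the two classes perfectly or is constant.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        z = np.clip(X @ beta, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(2)
        beta = beta + np.linalg.solve(hess, grad)
    z = np.clip(X @ beta, -30.0, 30.0)
    # log L = sum_i [ y_i * z_i - log(1 + exp(z_i)) ]
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


def zipf_exponent(scores):
    """Rank positive scores in decreasing order and fit score ~ C * rank**(-alpha).

    Returns (alpha, ranked_scores). The fit is ordinary least squares on
    log(score) versus log(rank).
    """
    ranked = np.sort(np.asarray(scores, dtype=float))[::-1]
    ranks = np.arange(1, ranked.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(ranked), 1)
    return -slope, ranked


if __name__ == "__main__":
    # Synthetic stand-in for a microarray dataset (not real data).
    rng = np.random.default_rng(0)
    n_samples, n_genes = 40, 200
    y = np.repeat([0.0, 1.0], n_samples // 2)        # two phenotypes
    expr = rng.normal(size=(n_samples, n_genes))     # expression levels
    effect = 2.0 / np.arange(1, n_genes + 1)         # decaying class effect
    expr += y[:, None] * effect[None, :]

    loglik = np.array(
        [max_log_likelihood(expr[:, g], y) for g in range(n_genes)]
    )
    alpha, ranked = zipf_exponent(np.exp(loglik))    # likelihood as the score
    print("fitted power-law exponent:", alpha)
```

Because the likelihood of n samples is a product of n per-sample terms, its logarithm scales with n; this is consistent with the abstract's remark that the fitted exponent depends on the sample size rather than staying near 1.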
