
Seminars

Stein's lemma and liquid association - a novel statistical measure for elucidating aggregated microarray gene expression data

  • 2001-05-23 (Wed.), 15:00
  • Recreation Hall, 2F, Institute of Statistical Science
  • Prof. Ker-Chau Li
  • Department of Statistics, University of California, Los Angeles, U.S.A.

Abstract

Microarray technology enables the massive measurement of mRNA at the full genome scale. It opens a wide window for scientists to visualize the extremely complex, yet well-orchestrated, cellular activities encoded by thousands of genes in the cell. By studying gene expression data aggregated from various biological experiments, involving different cell species or a variety of environmental conditions, interesting hypotheses about protein functionality can be formed. Genes with a high degree of expression similarity may be co-regulated by common upstream regulatory factors. They are likely to be functionally related and may participate in common pathways. Pearson's correlation coefficient, a simple way of describing the strength of linear association between a pair of random variables, has become the most popular measure of gene expression similarity. However, in vivo, the association between a pair of genes is not solid: their expression levels are prone to influence from the activities of other genes. In view of this, we introduce a new statistical measure, liquid association (LA), for quantifying how the association between a pair of genes changes as the expression level of a third gene varies. A pair of genes, A and B, is called an LAP (liquid associated pair) of gene C if the corresponding LA measure is large in absolute value. Genes which exert greater influence on others, as judged by comparing the distributions of their LA values, are called expression masters. We apply LA to the Stanford Yeast Cell-Cycle data. The top 100 positive and 100 negative expression masters are obtained from nearly 6000 yeast ORFs.
Among them, six are in the energy category of glycolysis and metabolism of energy reserves (glycogen, trehalose), according to MIPS: PFK1 and PFK2 (coding for 6-phosphofructokinase, the major flux-controlling enzyme of glycolysis); and TPS2, TPS1, GSY1, and GLC3 (appearing side by side in KEGG's chart for the starch and sucrose metabolism pathway, with alpha,alpha-trehalose-6P, UDP-glucose, and glycogen as the intermediates in sequence). A shorter list of expression masters with known functions is discussed, including CYR1 (adenylate cyclase, catalyzing the transformation of ATP into cAMP), CYC7 (cytochrome c isoform 2), QCR9 (ubiquinol cytochrome c reductase subunit), ATP1 (the F1 alpha subunit of the F0F1 ATPase complex in mitochondria), and APC1 (subunit of the anaphase-promoting complex, or cyclosome). The statistical theory for LAP is based on ideas similar to SIR (sliced inverse regression) and PHD (principal Hessian directions) for high-dimensional data analysis. Broader applications to the analysis of voluminous data in the physical and social sciences will be discussed.
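
The abstract does not spell out how LA is computed, so the following is only a minimal sketch under stated assumptions. It assumes the formulation in which LA is the expected derivative of g(z) = E[XY | Z = z]; if Z is standard normal, Stein's lemma gives E[g'(Z)] = E[g(Z) Z] = E[XYZ]. The sketch therefore standardizes the expression profiles of genes A and B, converts the profile of gene C to normal scores, and averages the triple product. The function names and the simulated data are hypothetical and are not taken from the talk.

```python
import random
from statistics import NormalDist, fmean, pstdev


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def normal_scores(values):
    """Replace each value by the standard normal quantile of rank / (n + 1).

    Assumption: this rank-based transform is used so that the third
    variable is approximately standard normal, as Stein's lemma requires.
    Ties are broken by position.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    nd = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = nd.inv_cdf(rank / (n + 1))
    return scores


def liquid_association(x, y, z):
    """Sample average of x*y*z after standardizing x, y and normal-scoring z."""
    xs, ys, zs = standardize(x), standardize(y), normal_scores(z)
    return fmean(a * b * c for a, b, c in zip(xs, ys, zs))


# Simulated illustration (not real microarray data): the correlation
# between x and y is positive when z > 0 and negative when z < 0.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]
y = [(xi if zi > 0 else -xi) + 0.1 * rng.gauss(0, 1) for xi, zi in zip(x, z)]

print("LA(x, y | z):", liquid_association(x, y, z))
print("LA(x, x | z):", liquid_association(x, x, z))
```

In this simulation the overall Pearson correlation between x and y is near zero, yet the first LA value should be clearly positive because the association reverses sign with z. The second value should be near zero because the association between x and itself does not depend on z.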
