
Seminars

Some Statistical Analysis on Genomics and Functional Genomics Data

  • 2003-02-18 (Tue.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Prof. I-Ping Tu
  • The Stanford Functional Genomics Facility, Stanford Univ., USA

Abstract

I will address two approaches to studying which genes are associated with particular traits (e.g. diseases). The first works at the genomics level: for example, the goal of genetic linkage analysis is to infer the location of trait genes relative to markers in the genome. The second works at the functional genomics level: for example, microarrays can measure the expression of tens of thousands of genes simultaneously, to identify changes in gene expression between different biological states or across different tumor or tissue types. I will discuss some statistical problems related to two methods in genetic linkage analysis: the Allele Sharing Method and the Transmission Disequilibrium Test (TDT). In addition, I will introduce a statistical algorithm to identify and correct inconsistencies in microarray expression data at the 96/384-well plate level.

The Maximum of a Function of a Markov Chain and Application to Linkage Analysis (Allele Sharing Method)

One method of linkage analysis in humans is based on identity by descent in pairs of relatives who share a phenotype of interest (for example, a particular disease). We replace the convenient assumption of continuous specification of regions of identity by descent with the more realistic, although still artificially simple, assumption of data from a discrete set of equally spaced, infinitely polymorphic markers. We generalize the continuous-time Markov chain analysis of Feingold (J. Appl. Probab. [1993]) and compare the accuracy of the new approximation with that of the simpler Gaussian approximation of Feingold, Brown and Siegmund (Am. J. Hum. Genet. [1993]) under a variety of assumptions about the composition of the pedigrees to be studied.

Detection of Disease Genes by Use of Family Data (An Extension of TDT)

We present a likelihood-based score statistic to evaluate disequilibrium in the transmission of marker alleles from parents to offspring.
This statistic, when applied to nuclear families, generalizes the transmission disequilibrium test to arbitrary numbers of affected and unaffected siblings, with or without typed parents. We apply the statistic to data on a polymorphism of the SRD5A2 gene in nuclear families with multiple cases of prostate cancer.

Mix Up, Fix Up (MuFu) (Microarray Data Analysis)

We have developed a simple statistical strategy (MuFu) to identify previously undetected inconsistencies in microarray expression data at the 96/384-well plate level and, with or without the help of DNA sequencing, to correct the data.

More than 6000 human cDNA microarray experiments have been deposited in SMD (Stanford Microarray Database). These data were collected over three years and include experiments in which a common reference consisting of 11 cell lines (CH1 Intensity: Cy3) is compared to a tumor/tissue specimen (CH2 Intensity: Cy5). The experiments were carried out on arrays from 100 different print runs. Each print run produced 137 to 255 microarrays, depending on the arrayer that was used. The size of the array varied from 9K to 45K elements.

DNA was amplified by PCR four times from the same source plates. Each round of PCR generated 4 sets of print plates that were used to print arrays. PCR products were transferred from 96-well plates to 384-well plates, and arrays were then printed from the 384-well plates. During the transfer from the 96-well to the 384-well format, plates can be swapped or rotated. In addition, during a given print run, a 384-well plate may accidentally be printed in the wrong orientation or skipped.

Because the signal in the reference Channel 1 is consistent, we were able to use MuFu to measure the similarities of Channel 1 intensities between various plates of different experiments, and ultimately to detect anomalies within and across PCR rounds and print runs and to detect skipped, swapped or rotated plates.

Using MuFu, we have successfully fixed print-batch-specific problems, including skipped and rotated print plates. By combining the MuFu results with additional DNA sequencing information to resolve ambiguities, we have also fixed swapped and rotated 96-well plate mistakes. A distance metric was used to measure the similarity between two plates. This methodology will be discussed in detail, and the corresponding results and fixes will be presented.
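
To make the plate-comparison idea concrete, here is a minimal sketch of how a rotated 384-well plate might be flagged from reference-channel intensities. The abstract does not say which distance metric MuFu uses, so the choice of 1 minus the Pearson correlation is an assumption, as are the 16 x 24 plate layout, the threshold of 0.5, the simulated data, and all function names (`rotate_180`, `correlation_distance`, `classify_plate`, `simulate_plate`, `add_noise`). It illustrates the general approach only and is not the MuFu implementation.

```python
import math
import random

ROWS, COLS = 16, 24  # assumed layout of a 384-well plate


def rotate_180(plate):
    """Return the plate as it would read if loaded rotated by 180 degrees."""
    return [row[::-1] for row in plate[::-1]]


def correlation_distance(a, b):
    """1 - Pearson correlation of two plates, compared well by well.

    This metric is an assumption; the abstract does not name the one used.
    """
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def classify_plate(observed, expected, threshold=0.5):
    """Label a plate 'ok', 'rotated', or 'unmatched' (hypothetical labels)."""
    d_ok = correlation_distance(observed, expected)
    d_rot = correlation_distance(observed, rotate_180(expected))
    if min(d_ok, d_rot) > threshold:
        # Neither orientation matches: candidate for a swapped or skipped plate.
        return "unmatched"
    return "ok" if d_ok <= d_rot else "rotated"


def simulate_plate(rng):
    """Simulated Channel 1 log-intensities for one plate (illustration only)."""
    return [[rng.gauss(10.0, 1.0) for _ in range(COLS)] for _ in range(ROWS)]


def add_noise(plate, rng, sd=0.3):
    """Add array-to-array measurement noise to a plate profile."""
    return [[v + rng.gauss(0.0, sd) for v in row] for row in plate]


# Demo on simulated data: one expected profile and three observed plates.
rng = random.Random(0)
expected = simulate_plate(rng)
observed = {
    "correct orientation": add_noise(expected, rng),
    "printed rotated": add_noise(rotate_180(expected), rng),
    "different plate": simulate_plate(rng),
}
for name, plate in observed.items():
    print(name, "->", classify_plate(plate, expected))
```

In this sketch the expected profile stands in for the consistent reference signal described above; in practice it could be estimated from the same plate on other arrays, for example as a well-wise median, which is again an assumption rather than something stated in the abstract.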
