
Seminars

Validity and Reliability of Preprocessing and Differential Expression Combinations for Affymetrix GeneChip Microarrays

  • 1970-01-01 (Thu.), 10:00 AM
  • Auditorium, 2F, Tsai Yuan-Pei Memorial Hall
  • Prof. Guan-Hua Huang
  • Institute of Statistics, National Chiao Tung University

Abstract

Microarray technology has been widely used for several years, and a large number of computational analysis tools have been developed. We focus on the most popular platform, Affymetrix GeneChip arrays. Although there is rich research on selecting the optimal method of preprocessing and/or differential expression, this work is unique in the following aspects. First, we explored the best combination of preprocessing and differential expression methods. Second, we evaluated both validity (accuracy) and reliability (reproducibility) on a variety of datasets with distinct characteristics. Third, we compared stochastic-model-based and physical-model-based preprocessing algorithms, as well as gene-specific and empirical-Bayes differential expression detection. To evaluate which combinations of preprocessing and differential expression methods perform well, we considered 4 popular preprocessing methods (MAS 5.0, RMA, dChip and PDNN) and 5 popular differential expression methods (fold-change, two-sample t-test, SAM, EBarrays and limma). To assess validity, we used three spike-in datasets and evaluated each combination with ROC curves. To assess reliability, we used another dataset from the MAQC project, which was generated using samples hybridized to the Affymetrix platform at two different test sites, and compared the overlap rates between the two test sites. We found that validity was more sensitive to the choice of preprocessing method, whereas reliability was more sensitive to the choice of differential expression method. Considering both validity and reliability, six combinations are recommended when a small percentage of the genes are differentially expressed: RMA+fold-change, RMA+SAM, RMA+limma, PDNN+fold-change, PDNN+SAM and PDNN+limma. Three combinations are recommended when the percentage of differentially expressed genes is large: dChip(PM-only)+fold-change, dChip(PM-only)+SAM and dChip(PM-only)+limma.
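
The two evaluation criteria in the abstract can be illustrated with a small sketch. The abstract does not give the exact definitions used in the study, so the code below makes assumptions: reliability is taken as the fraction of genes shared by the top-k lists from two test sites, and validity as the area under the ROC curve computed against a known set of spiked-in genes. The function names, gene IDs, and scores are hypothetical and serve only as an illustration.

```python
def top_k(scores, k):
    """Return the set of the k gene IDs with the largest absolute score."""
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return set(ranked[:k])


def overlap_rate(site_a, site_b, k):
    """Assumed reliability measure: share of top-k genes common to both sites."""
    return len(top_k(site_a, k) & top_k(site_b, k)) / k


def roc_auc(scores, spiked):
    """Assumed validity measure: area under the ROC curve.

    Computed as the probability that a spiked-in gene has a larger absolute
    score than a non-spiked gene, with ties counted as one half.
    """
    pos = [abs(s) for g, s in scores.items() if g in spiked]
    neg = [abs(s) for g, s in scores.items() if g not in spiked]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-gene statistics (e.g. log fold-changes) from two sites.
site_a = {"g1": 2.5, "g2": -1.8, "g3": 0.2, "g4": 0.1, "g5": 1.1}
site_b = {"g1": 2.1, "g2": -0.3, "g3": 0.4, "g4": 0.0, "g5": 1.5}
spiked = {"g1", "g2"}  # hypothetical set of truly differentially expressed genes

print(overlap_rate(site_a, site_b, k=2))
print(roc_auc(site_a, spiked))
print(roc_auc(site_b, spiked))
```

In this toy setup, a preprocessing and differential expression combination would be scored by running both functions on its output: a higher AUC suggests better validity, and a higher overlap rate suggests better reliability.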
