Machine Learning Bias Correction for Minimal-error Classifier and a Meta-analysis Framework for Sparse K-means in Genomic Applications
- 2013-06-10 (Mon.), 15:00
- Recreation Hall, 2F, Institute of Statistical Science
- Prof. George C. Tseng
- Dept. of Biostatistics, Univ. of Pittsburgh, USA
Abstract
In this talk, I will cover two topics we recently developed for genomic applications. In the first part, we investigated the machine learning bias that arises when one applies many classifiers and reports only the best-performing one. We studied how this bias depends on the sample size and on the classifiers used. We proposed an inverse power law method to correct the bias and compared it with conventional nested cross-validation. Finally, we used large-scale empirical gene expression data to recommend practical guidelines for practitioners. In the second part, we extended the sparse K-means algorithm to a meta-analysis framework that combines multiple gene expression profiles for improved disease subtype discovery. The results showed more stable and accurate sample clustering for identifying meaningful disease subtypes.
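
The bias discussed in the first part can be illustrated with a small simulation. The sketch below is not the method presented in the talk; it only shows, under simplifying assumptions, why reporting the best of many classifiers is optimistic. Every "classifier" here guesses labels at random, so its true error rate is 0.5, and the function names (`min_error_over_classifiers`, `mean_reported_error`) are hypothetical names chosen for this illustration.

```python
import random


def min_error_over_classifiers(n_samples, n_classifiers, rng):
    """Smallest observed error among classifiers that guess labels at random.

    Assumption for illustration: each classifier is wrong on each sample
    independently with probability 0.5, so its true error rate is 0.5.
    """
    errors = []
    for _ in range(n_classifiers):
        wrong = sum(rng.random() < 0.5 for _ in range(n_samples))
        errors.append(wrong / n_samples)
    return min(errors)


def mean_reported_error(n_samples, n_classifiers, n_repeats=500, seed=0):
    """Average, over repeated studies, of the best error that would be reported."""
    rng = random.Random(seed)
    total = sum(
        min_error_over_classifiers(n_samples, n_classifiers, rng)
        for _ in range(n_repeats)
    )
    return total / n_repeats


if __name__ == "__main__":
    # Columns: sample size, one classifier, best of ten classifiers.
    for n in (20, 50, 200):
        print(n, round(mean_reported_error(n, 1), 3), round(mean_reported_error(n, 10), 3))
```

With a single classifier the average reported error stays close to the true value of 0.5, while the best of ten falls below it, and the gap narrows as the sample size grows. This is the kind of dependence on sample size and number of classifiers that a bias correction has to account for.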