Canonical Ensembles for Nearly Compatible and Incompatible Conditional Models
- 2010-06-14 (Mon.), 10:30 AM
- Auditorium, 2F, Tsai Yuan-Pei Memorial Hall
- Prof. Yuchung Wang
- Rutgers University, USA
Abstract
There are two fundamental approaches in machine learning: (1) algorithms that optimize certain objective functions, and (2) algorithms that generate an ensemble of solutions and then take a weighted average of the ensemble. Last year (2009), I gave a talk on an ensemble approach based on the Gibbs sampler and showed that the Gibbs ensemble does generate reasonable joint distributions for incompatible conditional models. Given that conditional models are rarely compatible and the Gibbs sampler is computationally intensive (30 hours in R for 100,000 3-D sampled points), we propose a new ensemble algorithm based on the canonical parameterization of a joint distribution. In this talk, I will first give an overview of the ensemble approach; my review is based on Zhu (2008, The American Statistician). Because our ensemble is deterministic and no simulation is required, it is extremely efficient (seconds versus hours). In addition, it is scalable, so it can handle large data sets of high dimensionality with ease. It is also adaptive to different performance measures. Using simulated data, we show that the proposed approach provides joint distributions that are less discrepant from the incompatible conditionals than the answers obtained via linear programming and by a pure Gibbs sampler. The ensemble approach is applied to a data set regarding geno-polymorphism and response to chemotherapy in patients with metastatic colorectal cancer. Its advantage in selecting conditional models will be illustrated.
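As background for the Gibbs-ensemble idea mentioned in the abstract, the sketch below shows how a systematic-scan Gibbs sampler can be run on a pair of binary conditional models that need not be compatible with any single joint distribution. This is a minimal illustration, not the speaker's code; the conditional probability tables and the function name `gibbs_ensemble` are hypothetical.

```python
# Minimal sketch (hypothetical, not from the talk): systematic-scan Gibbs
# sampling from two possibly incompatible binary conditional models
# P(X | Y) and P(Y | X).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional probability tables. These two conditionals need
# not be compatible with any single joint distribution over (X, Y).
p_x1_given_y = {0: 0.3, 1: 0.8}  # P(X = 1 | Y = y)
p_y1_given_x = {0: 0.6, 1: 0.2}  # P(Y = 1 | X = x)

def gibbs_ensemble(n_samples, burn_in=1000):
    """Run a systematic-scan Gibbs sampler and return the visited states."""
    x, y = 0, 0
    draws = []
    for t in range(burn_in + n_samples):
        x = int(rng.random() < p_x1_given_y[y])  # update X given current Y
        y = int(rng.random() < p_y1_given_x[x])  # update Y given current X
        if t >= burn_in:
            draws.append((x, y))
    return draws

draws = gibbs_ensemble(100_000)

# Empirical joint frequencies approximate the chain's stationary
# distribution, which serves as a candidate "nearest" joint distribution
# for the incompatible pair of conditionals.
joint = np.zeros((2, 2))
for x, y in draws:
    joint[x, y] += 1
print(joint / joint.sum())
```

When the conditionals are incompatible, the stationary distribution of such a chain depends on the scan order, which is one reason an ensemble of Gibbs outputs, rather than a single run, is of interest; the deterministic canonical-parameterization algorithm described in the talk avoids this simulation cost entirely.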