Active Learning Support Vector Machines to Classify Imbalanced Reservoir Simulation Data
- 2011-05-23 (Mon.), 10:30 AM
- Recreation Hall, 2F, Institute of Statistical Science
- Dr. Tina Yu
- Department of Computer Science, Memorial University of Newfoundland, St. John's, NL A1B 3X5, Canada
Abstract
Reservoir modeling is an ongoing activity throughout the production life of a reservoir. One challenge in constructing accurate reservoir models is the time required to carry out a large number of computer simulations. To address this issue, we have constructed surrogate models (proxies) for the computer simulator to reduce the simulation time. The quality of the proxies, however, depends on the quality of the computer simulation data. Frequently, the majority of the simulation outputs match the production data collected from the field poorly. In other words, most of the data describe what the reservoir is not (negative samples) rather than what the reservoir is (positive samples). Applying machine learning methods to train a simulator proxy on these data therefore faces the challenge of an imbalanced data set. This work applies active learning support vector machines to incrementally select a subset of informative simulation data and train a classifier on that subset as the simulator proxy. We compare the results with those produced by standard support vector machines combined with other techniques for handling imbalanced training sets. We also analyze the reservoir characteristics revealed by the support vectors in the trained classifiers.
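The incremental selection described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the talk: the data are synthetic two-dimensional points (190 negative, 10 positive), the classifier is a linear SVM trained with a Pegasos-style subgradient method (with an unregularized bias term, a common heuristic), and the selection rule is the common "closest to the current hyperplane" criterion. The names train_linear_svm, decision, and active_learn are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent for a linear SVM; labels are -1 or +1."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def decision(w, b, x):
    """Signed score of x; its magnitude grows with distance from the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def active_learn(X, y, seed_idx, rounds):
    """Grow the training set one sample per round, picking the pool sample
    whose score is closest to zero (assumed selection criterion)."""
    labeled = list(seed_idx)
    seen = set(seed_idx)
    pool = [i for i in range(len(X)) if i not in seen]
    for _ in range(rounds):
        w, b = train_linear_svm([X[i] for i in labeled], [y[i] for i in labeled])
        pick = min(pool, key=lambda i: abs(decision(w, b, X[i])))
        pool.remove(pick)
        labeled.append(pick)  # the label y[pick] is only consulted from here on
    w, b = train_linear_svm([X[i] for i in labeled], [y[i] for i in labeled])
    return w, b, labeled

# Synthetic imbalanced data (an assumption): 190 negatives, 10 positives.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(190)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
y = [-1] * 190 + [1] * 10

seed_idx = [0, 1, 2, 190, 191]  # the seed set must contain both classes
w, b, labeled = active_learn(X, y, seed_idx, rounds=20)
print(len(labeled), "samples used out of", len(X))
```

In a reservoir setting, obtaining a label would presumably correspond to comparing a simulation run against field production data; the sketch simply looks the label up in a precomputed list. Because samples near the decision boundary tend to be more evenly split between classes than the full pool, this style of selection is one way to counter class imbalance without resampling.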