Institute of Statistical Science, Academia Sinica

Seminar Announcement

An Approach for Big Data Variable Selection and Classification

2017-07-05 (Wed.), 10:30 AM
Lounge, 2F, Institute of Statistical Science, Academia Sinica
Tea reception: 10:10 AM, Lounge, 2F, Institute of Statistical Science
Prof. Shaw-Hwa Lo (羅小華)
Department of Statistics, Columbia University, USA

Abstract

Current practice in prediction problems generally involves using a significance-based criterion to evaluate variables for use in a chosen model, and evaluating variables and models simultaneously for prediction using cross-validation or independent test data. Our recent work showed that significant variables are not necessarily predictive, and that strong predictors may not appear statistically significant at all. This leaves an important question: how can we find highly predictive variables, if not through a guideline of statistical significance? In response, we suggest a “Partition Retention (PR)” approach for handling general big data variable selection and classification (prediction) problems. PR alters standard statistical practice in big data analysis, switching from significance-based modeling to seeking variables with high predictivity, a novel parameter of interest. We introduce the I-score, a statistic that can select variable sets with very high prediction rates and is closely related to a very useful lower bound of the predictivity. The PR approach would be useful in diverse scientific applications: for example, in predicting diseases from high-dimensional data such as gene datasets; in the social sciences, for text prediction or financial market prediction; and in the study of terrorism, civil war, and elections. We hope this opens up a new field of work focused on designing new statistics that measure predictivity.

Last updated: 2025-07-01 19:20
