:::

Seminar Announcement

:::

An Approach for Big Data Variable Selection and Classification

  • 2017-07-05 (Wed.), 10:30 AM
  • Lounge, 2F, Institute of Statistical Science, Academia Sinica
  • Tea reception: 10:10 AM, Lounge, 2F, Institute of Statistical Science
  • Prof. Shaw-Hwa Lo (羅小華)
  • Department of Statistics, Columbia University, USA

Abstract

Current practice in prediction problems generally involves using a significance-based criterion to evaluate variables for use in a chosen model, and evaluating variables and models simultaneously for prediction using cross-validation or independent test data. Our recent work showed that significant variables are not necessarily predictive, and that strong predictors may not appear statistically significant at all. This left us with an important question: how, then, can we find highly predictive variables, if not through a guideline of statistical significance? In response, we suggest a “Partition Retention (PR)” approach for handling general big data variable selection and classification (prediction) problems. PR alters standard statistical practice in big data analysis, switching from significance-based modeling to seeking variables with high predictivity, a novel parameter of interest. We introduce the I-score, a statistic that can select variable sets with very high prediction rates and is closely related to a very useful lower bound of the predictivity. The PR approach would be useful in diverse scientific applications: for example, in predicting diseases from high-dimensional data such as gene datasets; in the social sciences, for text prediction; and in predicting terrorism, civil war, elections, and financial markets. We hope this opens up a new field of work focused on designing new statistics that measure predictivity.
