
Seminars

Does E-M algorithm work for identifying principal components when massive data are missing?

  • 2000-08-07 (Mon.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Professor J. T. Gene Hwang
  • Department of Statistics & Department of Mathematics

Abstract

The E-M algorithm is one of the most popular techniques in statistical applications. It is the technique of choice when there are missing data, which occur frequently in practice. The question in the title is one that a group of engineers and engineering statisticians at the National Institute of Standards and Technology (NIST) pondered in research related to HELP (high-dimensional empirical linear prediction). In statistical language, HELP is a technique for predicting "future" observations based on a factor analysis model. How to apply HELP is one of the most persistent problems brought to the Statistical Engineering Division of NIST. The problem is also equivalent to estimating the principal components when massive data are missing. When the data have no missing values, HELP can save us a great deal of measurement effort. However, one problem arises after several applications of HELP: we end up with a data set containing many unmeasured observations, that is, missing data. The engineers wondered whether they could use the remaining data, with its missing values, to identify the additional (say, two) vectors to be included in the model. They tried the E-M algorithm and seemed to conclude that it does not work. In this talk, I will develop a statistical theory that has changed the engineers' minds. Although we believed for a while that E-M works well, closer scrutiny reveals a somewhat surprising phenomenon that casts doubt on the effectiveness of E-M. Come and decide for yourselves whether E-M works well in light of this phenomenon.
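
For readers unfamiliar with the technique the talk examines, the sketch below illustrates one common EM-style approach to estimating principal components from data with missing entries: alternately imputing the missing values from the current low-rank fit (E-step) and refitting the components by truncated SVD (M-step). This is a generic, minimal illustration and not the NIST/HELP procedure itself; the function name and parameters are hypothetical.

```python
import numpy as np

def em_pca_missing(X, n_components, n_iter=100, tol=1e-6):
    """EM-style estimation of principal components with missing data.

    E-step: impute missing entries from the current low-rank fit.
    M-step: refit the rank-k SVD to the completed matrix.
    X is an (n_samples, n_features) array with np.nan marking missing values.
    """
    mask = np.isnan(X)                       # True where data are missing
    col_means = np.nanmean(X, axis=0)        # initialize missing entries with column means
    X_filled = np.where(mask, 0.0, X)
    X_filled[mask] = np.take(col_means, np.nonzero(mask)[1])

    prev_err = np.inf
    for _ in range(n_iter):
        # M-step: rank-k truncated SVD of the completed, centered data
        mu = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mu, full_matrices=False)
        low_rank = mu + (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]

        # E-step: replace only the missing entries with the low-rank reconstruction
        X_new = np.where(mask, low_rank, X)

        # stop when the imputed values no longer change appreciably
        err = np.sqrt(np.mean((X_new - X_filled) ** 2))
        X_filled = X_new
        if abs(prev_err - err) < tol:
            break
        prev_err = err

    return Vt[:n_components], X_filled       # principal directions and completed data
```

Whether iterations of this kind actually recover the correct additional components when a large fraction of the data is unmeasured is precisely the question the talk addresses.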
