
Seminars

Mining Massive Text Data and Developing Tracking Statistics

  • 2005-03-16 (Wed.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Prof. Regina Liu
  • Dept. of Statistics, Rutgers University, USA

Abstract

We present a systematic data mining procedure for exploring large free-style text datasets to discover useful features and develop tracking statistics, often referred to as performance measures or risk indicators. The procedure includes text classification, inference under error measurements and risk analysis. An aviation safety report repository from the FAA is used to illustrate applications of our research to aviation risk management and general decision-support systems. Some specific text analysis methodologies and tracking statistics are discussed. Several approaches for incorporating misclassified data or error measurements into the inference for tracking statistics are proposed and evaluated. Although most illustrations here are drawn from aviation safety data, the proposed data mining procedure applies to many other domains, including, for example, mining free-style medical reports for tracking possible disease outbreaks. (This is joint work with Daniel Jeske, Department of Statistics, UC Riverside.)
