
Seminars

Regression Analysis Using Summary-level Information from External Big Data Sources

  • 2019-05-06 (Mon.), 11:00 AM
  • R6005, Research Center for Environmental Changes Building
  • Prof. Yi-Hau Chen
  • Institute of Statistical Science, Academia Sinica

Abstract

Information from public and private data sources with extremely large sample sizes is increasingly available for research. Statistical methods are needed to use information from such big data sources when analyzing data from individual studies, which address specific problems and collect more detailed information than the external sources do. In this talk, we consider regression analysis with individual-level data from an “internal” study while utilizing summary-level or crude information from an “external” big data source. We identify the constraints that link the internal and external models and use them to develop a semiparametric maximum likelihood inference framework. Our proposal covers both the setting where the covariate distribution in the internal sample is the same as that in the external data source and the setting where it differs. Extensions to complex stratified sampling designs for internal studies, such as the case-control design, are also considered. Asymptotic distribution theory is developed. We use simulation studies and a real data application to assess the performance of the proposed methods. Some related research topics will also be discussed.
