
Seminars

Importance resampling for bootstrap confidence regions

  • 1999-03-08 (Mon.), 10:30 AM
  • Recreation Hall, 2F, Institute of Statistical Science
  • Dr. 劉 正 平
  • Household Survey Methods Division, Statistics Canada

Abstract

In surveys and censuses, nonresponse is recognized to occur even with the best prevention programs. There are various techniques for dealing with nonresponse, but in this paper we are interested in imputation (for item nonresponse). Among imputation techniques, a wide range of methods exists for situations where the imputed characteristics are numerical or continuous, but fewer methods are available when the imputed variables are categorical. Yet imputation of categorical variables may have a greater impact on final estimates, especially when those variables are in turn used to create imputation classes for continuous variables or when they form domains of interest. Unless specifically taken into account, imputation does not usually preserve the distribution of categorical variables, particularly if the nonresponse is not missing completely at random. For instance, if hot-deck imputation techniques are used, one can expect the distributions to be preserved in expectation, but this will not necessarily be the case for a given sample. To address this, ratio adjustments (sometimes called pro-rating) are often applied to the data set after imputation so that it satisfies additive constraints or benchmark totals. However, this approach does not work for categories that remain empty after imputation, and it may over-adjust some imputed data. There are also more sophisticated methods that model the probability of belonging to a given category, but even with those approaches the data may not satisfy the constraints after imputation. In many surveys, such as some of those carried out at Statistics Canada, the data need to be adjusted after imputation. In this paper, we describe an iterative imputation algorithm that performs categorical imputation and simultaneously calibrates the data set to auxiliary information available for all units in the sample.
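The limitation of pro-rating for empty categories can be seen in a small sketch. The category names, totals, and the helper `prorate_factors` below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def prorate_factors(imputed_totals, benchmarks):
    """Return the ratio factor that scales each category total to its benchmark.

    A factor of None marks a category where pro-rating cannot be applied.
    """
    factors = {}
    for category, target in benchmarks.items():
        current = imputed_totals.get(category, 0.0)
        if current == 0.0:
            # An empty category has no records to re-weight, so no ratio
            # factor can bring its total up to the benchmark.
            factors[category] = None
        else:
            factors[category] = target / current
    return factors


# Hypothetical weighted totals after imputation and their benchmark totals.
imputed_totals = {"employed": 80.0, "unemployed": 20.0, "not_in_labour_force": 0.0}
benchmarks = {"employed": 70.0, "unemployed": 20.0, "not_in_labour_force": 10.0}

factors = prorate_factors(imputed_totals, benchmarks)
for category, factor in factors.items():
    print(category, factor)
```

In this made-up case the first two categories receive ordinary ratio factors, while the empty category is left without one, which is the situation the abstract describes.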
The method is based on a constrained selection of records to insert into categories, thereby imputing the categorical variable(s). To ensure that all nonrespondents are imputed, a forcing procedure is applied after the insert step. The approach also applies to situations where an already-imputed data file does not satisfy specified constraints. In this case, categories whose imputed totals differ substantially from the auxiliary totals are identified. Records are then exchanged between categories, using a search algorithm, to satisfy the constraints imposed by the auxiliary totals. The resulting data file is balanced: it has a structure that preserves the categorical associations (distributions) of the auxiliary variables used in the process.
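The idea of adjusting an already-imputed file toward auxiliary totals might be sketched as follows. This is a deliberately simplified greedy pass, not the search algorithm of the paper: it assumes unweighted counts, a single categorical variable, every observed category appearing in `targets`, and that only imputed values may be changed. The function `rebalance` and the data are hypothetical.

```python
def rebalance(categories, imputed, targets):
    """Move imputed records from surplus categories to deficit categories.

    categories: category label of each record
    imputed:    True where the label was imputed (only these may change)
    targets:    required count per category (assumed to cover all labels)
    """
    result = list(categories)
    counts = {c: 0 for c in targets}
    for c in result:
        counts[c] += 1

    for i, was_imputed in enumerate(imputed):
        if not was_imputed:
            continue  # reported values are never altered
        source = result[i]
        if counts[source] <= targets[source]:
            continue  # the source category has no surplus to give away
        # Choose the category with the largest remaining deficit.
        dest = max(targets, key=lambda c: targets[c] - counts[c])
        if targets[dest] - counts[dest] <= 0:
            break  # no category is short of its target any more
        result[i] = dest
        counts[source] -= 1
        counts[dest] += 1
    return result


categories = ["A", "A", "A", "A", "B", "C"]
imputed = [False, False, True, True, False, False]
targets = {"A": 3, "B": 2, "C": 1}

rebalanced = rebalance(categories, imputed, targets)
print(rebalanced)
```

Here one imputed record is reassigned from the over-represented category to the under-represented one, after which the counts match the targets. A realistic implementation would also have to handle survey weights, several auxiliary variables at once, and cases where the imputed records alone cannot satisfy the constraints.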
