Statistical Standards Program
APPENDIX B: EVALUATING THE IMPACT OF IMPUTATIONS FOR ITEM NONRESPONSE
This brief review has highlighted that imputed data sets can provide good estimates of means and totals and that, with some care in the selection of the imputation method, distributions can be reasonably well preserved. However, as Kovar and Whitridge, 1995, point out, "The situation is not as favorable when it comes to estimates of variances and correlations." They note that numerous studies have shown that imputations can have a deleterious effect on the statistical properties of the estimates. In particular, correlations between imputed variables are attenuated to varying degrees, although good auxiliary variables can mitigate this problem (Santos, 1981; Kalton and Kasprzyk, 1982, 1986; and Little, 1986).
When standard formulas are used to compute statistics for estimates based on imputed data, the variances of estimated means and totals are underestimated (Rubin, 1978). This underestimation occurs because standard computing software treats imputed values as if they were observed data and thus ignores the component of variance that is due to imputation. Kovar and Whitridge, in Cox et al., 1995, report that, when imputations are present, standard variance formulas underestimate the variance by about 2 to 10 percent with a nonresponse rate of 5 percent and by as much as 10 to 50 percent with a nonresponse rate of 30 percent. The size of the underestimate varies with the type of imputation.
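The mechanism behind this underestimation can be seen in a small deterministic sketch. It is an illustration only, under assumptions that are not in the source: the data values are made up, the sample is treated as a simple random sample, and mean imputation is used because it makes the arithmetic exact (it is not one of the methods evaluated in the studies cited above). The helper name `naive_variance_of_mean` is hypothetical.

```python
from statistics import mean, variance

def naive_variance_of_mean(values):
    """Standard formula s^2 / n, treating every value as if it were observed."""
    return variance(values) / len(values)

# Hypothetical respondent values; three further units did not answer the item.
observed = [12.0, 15.0, 9.0, 14.0, 10.0, 13.0, 11.0]
n_missing = 3

# Mean imputation (assumption for illustration): fill each gap with the respondent mean.
completed = observed + [mean(observed)] * n_missing

respondent_only = naive_variance_of_mean(observed)   # uses the 7 real observations
naive_completed = naive_variance_of_mean(completed)  # treats all 10 values as observed

# The imputed values add no squared deviation but inflate n, so the naive
# variance shrinks by the factor (r - 1)/(n - 1) * r/n, with r = 7 and n = 10.
ratio = naive_completed / respondent_only
print(round(respondent_only, 4), round(naive_completed, 4), round(ratio, 4))
```

The imputed values contribute nothing to the sum of squared deviations yet are counted in the sample size, so the standard formula reports more precision than the respondents alone can support. Random imputation methods such as hot deck shrink the naive variance less, which is consistent with the statement that the size of the underestimate depends on the imputation method.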
Brick and Kalton, 1996, discuss two methods for reducing imputation variance. The first method involves the use of sampling strategies. Selecting donors without replacement within each imputation class minimizes the multiple use of donors, resulting in a lower imputation variance than sampling with replacement. When there is more than one respondent in a class, stratified sampling within a class or systematic sampling from an ordered list can also help reduce imputation variance. The second method relies on fractional imputation. With this approach, individual respondent records are divided into parts, with weights distributed accordingly, and separate donors are chosen for each part of the respondent's record.
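The first strategy, donor selection without replacement within imputation classes, can be sketched as follows. This is a minimal illustration and not the procedure of Brick and Kalton: the function name `hot_deck_without_replacement`, the record layout, and the field names `region` and `income` are assumptions made for the example. A donor is reused only after every other donor in the same class has already been used once.

```python
import random
from collections import defaultdict

def hot_deck_without_replacement(records, class_key, item, seed=0):
    """Fill missing `item` values from donors in the same imputation class."""
    rng = random.Random(seed)
    donors = defaultdict(list)      # class -> indexes of records with a value
    recipients = defaultdict(list)  # class -> indexes of records missing the value
    for idx, rec in enumerate(records):
        target = recipients if rec[item] is None else donors
        target[rec[class_key]].append(idx)

    completed = [dict(rec) for rec in records]  # copy; the input is left untouched
    donor_of = {}                               # recipient index -> donor index
    for cls, needy in recipients.items():
        pool = donors.get(cls, [])
        if not pool:
            continue  # no donor in this class: the value stays missing
        # Draw donors in shuffled passes over the pool, so a donor is reused
        # only after all donors in the class have been used once.
        order = []
        while len(order) < len(needy):
            batch = list(pool)
            rng.shuffle(batch)
            order.extend(batch)
        for recipient, donor in zip(needy, order):
            completed[recipient][item] = records[donor][item]
            donor_of[recipient] = donor
    return completed, donor_of

# Hypothetical data: two imputation classes defined by region.
records = [
    {"id": 1, "region": "A", "income": 40},
    {"id": 2, "region": "A", "income": 55},
    {"id": 3, "region": "A", "income": 47},
    {"id": 4, "region": "A", "income": None},
    {"id": 5, "region": "A", "income": None},
    {"id": 6, "region": "B", "income": 70},
    {"id": 7, "region": "B", "income": 82},
    {"id": 8, "region": "B", "income": None},
]
completed, donor_of = hot_deck_without_replacement(records, "region", "income", seed=1)
```

Because each class has at least as many donors as recipients in this example, no donor is used twice. Sampling with replacement could hand the same donor value to both missing records in region A, which is the multiple use of donors that increases imputation variance.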
The underestimation of the variance results in confidence intervals that are too short and in a tendency to declare significance when none exists. Särndal, 1992, demonstrated that these statistical problems become more severe as the amount of missing data increases. Lessler and Kalsbeek, 1992, point out that the size of the nonresponse bias associated with totals, means, variances, and covariances is linked to differences between respondents and nonrespondents.
Several recently developed techniques are designed to estimate the variance due to imputation. Rubin pioneered the use of multiple imputations in this arena, estimating the variance by replicating the imputation process a number of times and then estimating the between-replicate variance. Särndal, 1992, proposed a method using model-assisted estimators of variance. Rao and Shao, 1992, use a method that corrects the usual jackknife variance estimator. Brick, Kalton, Kim and Fuller are currently under contract to NCES, conducting an evaluation of these new methodologies. The Statistical Standards Program at NCES is also supporting work by Aitken on an alternative approach using the EM algorithm.
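For the multiple imputation approach, the usual way of combining results across the replicated imputations can be sketched as below. The text above does not spell out the formula; the sketch uses the combining rule commonly attributed to Rubin, in which the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `rubin_combine` and the numbers are hypothetical.

```python
from statistics import mean, variance

def rubin_combine(estimates, within_variances):
    """Combine m completed-data estimates and their variance estimates."""
    m = len(estimates)
    point = mean(estimates)          # combined point estimate
    within = mean(within_variances)  # average variance from the standard formula
    between = variance(estimates)    # variance of the estimates across imputations
    total = within + (1 + 1 / m) * between
    return point, within, between, total

# Hypothetical results from m = 5 imputed data sets: an estimated mean and its
# standard-formula variance computed separately on each completed data set.
estimates = [10.0, 10.4, 9.8, 10.2, 10.1]
within_variances = [0.25, 0.24, 0.26, 0.25, 0.25]

point, within, between, total = rubin_combine(estimates, within_variances)
print(round(point, 3), round(within, 3), round(between, 3), round(total, 3))
```

The between-imputation term is the component that standard single-imputation formulas ignore, so the total exceeds the average within-imputation variance whenever the imputed data sets disagree.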
Despite these limitations and cautions associated with various imputation methods, Little and Rubin, 1987, note that "It is important to emphasize that in many applications the issue of nonresponse bias is often more crucial than that of variance. In fact, it has been argued that providing a valid estimate of sampling variance is worse than providing no estimate if the estimator has a large bias."