
Statistical Standards Program
APPENDIX B: EVALUATING THE IMPACT OF IMPUTATIONS FOR ITEM NONRESPONSE

There are a number of extant studies comparing alternative imputation methods. Two were conducted using NCES data, and a third, involving a set of simulations, was supported by NCES.
A linear regression model was used to predict a student's performance on a reading literacy test. The three reading scores used as the dependent variables were the narrative, expository, and document performance scores. These scores were derived using Item Response Theory models scaled for international comparison (Elley, 1992). The predictor variables used in all models were gender, age, race, father's and mother's education, family structure, family composition, family wealth/possessions, and use of a language other than English at home. The amount of missing data ranged from 0 to 18 percent per variable, with 31 percent of students missing data on one or more variables. Unweighted ordinary least squares regressions were run for each of the three dependent variables using each of four missing-data methods: complete-case analysis (CC), available-case analysis (AC), hot-deck imputation (HD), and the EM algorithm (EM). For each predictor, the regression coefficients estimated using the HD, EM, and AC methods were very similar, while the estimates from the CC analysis were dissimilar.

This analysis also used adjusted mean scores to examine the performance of subgroups of students after controlling for other characteristics. The adjusted scores for a number of subgroups (e.g., gender, minority status, and parents' education) were approximately 10 points higher under CC than under HD, EM, and AC. These differences are presumably explained by the fact that the CC analysis excludes the 31 percent of students who had missing data on one or more items. The analysis was repeated comparing CC, AC, and HD with weighted data. Although the use of weights reduced the size of the gap somewhat, the differences persisted, with the CC method yielding higher estimates than the AC and HD methods (which yielded similar results). The authors of this report concluded that the CC analysis method was clearly inefficient.
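The direction of the CC gap can be reproduced in miniature. The sketch below is illustrative only (the variable names, score scale, and missingness rates are invented, not taken from the IEA data): when low scorers are more likely to have a missing covariate, a complete-case mean overstates performance, while an analysis that imputes the covariate and retains all cases does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
score = rng.normal(500, 100, n)                 # hypothetical reading scores

# Missingness on a background covariate is more likely for low scorers,
# so the cases dropped by complete-case (CC) analysis are not random.
p_missing = np.where(score < 450, 0.50, 0.15)   # assumed rates, for illustration
missing = rng.random(n) < p_missing

# CC analysis: drop every case with a missing covariate.
cc_mean = score[~missing].mean()

# Imputation-based analysis (e.g., a hot deck fills the covariate):
# all score values are retained, so the score mean is unaffected.
full_mean = score.mean()

print(f"CC mean: {cc_mean:.1f}, all-case mean: {full_mean:.1f}")
```

With these assumed rates the CC mean runs well above the all-case mean, the same direction as the roughly 10-point gap reported above.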
Rather than the missing cases being randomly distributed, they found evidence that students with missing data differed from those with complete data in reading performance, race/ethnicity, type of community, region of the country, and control of the school. They further concluded that, given the similarity of results among the remaining three methods (AC, HD, and EM), the HD method, being the easiest to implement, was the best choice for the IEA study.
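Since the hot-deck method is singled out as the easiest to implement, a minimal sketch may be useful. This is a sequential hot deck in its simplest form (the function name and class labels are illustrative, not from the study): walk the file in order and fill each missing value with the most recently seen observed donor from the same imputation class.

```python
import numpy as np

def sequential_hot_deck(values, classes):
    """Sequential hot deck: replace each missing value with the most
    recently seen observed value (the "donor") from the same imputation
    class. Values before the first donor in a class are left missing."""
    values = np.asarray(values, dtype=float)
    out = values.copy()
    last_donor = {}
    for i, (v, c) in enumerate(zip(values, classes)):
        if np.isnan(v):
            if c in last_donor:
                out[i] = last_donor[c]
        else:
            last_donor[c] = v
    return out

# Missing entries in class "a" and class "b" are filled from the most
# recent observed donor in the same class.
print(sequential_hot_deck([10.0, np.nan, 7.0, np.nan], ["a", "a", "b", "b"]))
```

In practice the file is usually sorted by variables related to the item being imputed, so adjacent records (and hence donors) resemble the recipients.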
The authors of this analysis first identified a set of auxiliary variables and then, using the subset of cases with complete data, simulated different levels and patterns of missingness, assuming about 20 percent missing data. Following the simulation, the incomplete data were compared with the imputed data using three criteria: the average imputation error, the bias of the variance, and the bias of the mean. The average imputation error was found to be consistently lower in the model-based approach than in the hot-deck approach. Looking first at math, a comparison of the bias of the mean across the two imputation methods and the incomplete data showed no consistent pattern; however, the means computed with the incomplete data were outperformed by one or both of the imputation methods in all but one comparison (i.e., the bias was smaller for at least one of the two methods). The relative bias of the variance was consistently smaller in the model-based approach than in the other two approaches. The same results were observed in reading. The authors concluded that the model-based approach was the "preferred method" and proceeded to use PROC IMPUTE to implement the imputations for the NELS data set.
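The three comparison criteria can be written down directly. The helper below is a simplified reading of those measures (the function and argument names are hypothetical): it compares an imputed data set against the known complete data from which missingness was simulated.

```python
import numpy as np

def evaluate_imputation(true_vals, imputed_vals, miss_mask):
    """Evaluate an imputation against known complete data.

    Returns (average imputation error, bias of the mean, bias of the
    variance), where the error is computed over imputed cells only and
    the biases compare the full imputed data set with the truth."""
    true_vals = np.asarray(true_vals, dtype=float)
    imputed_vals = np.asarray(imputed_vals, dtype=float)
    # Average imputation error: mean absolute error over imputed cells.
    avg_error = np.abs(imputed_vals[miss_mask] - true_vals[miss_mask]).mean()
    # Bias of the mean and of the variance for the imputed data set.
    bias_mean = imputed_vals.mean() - true_vals.mean()
    bias_var = imputed_vals.var(ddof=1) - true_vals.var(ddof=1)
    return avg_error, bias_mean, bias_var
```

For example, under MCAR, mean imputation leaves the mean nearly unbiased but produces a clearly negative bias of the variance, since every imputed cell sits at the center of the distribution.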
The evaluation criteria used include: bias of parameter estimates, bias of variance estimates, coverage probability, confidence interval width, and average imputation error. They found that the results varied across the five types of missing data considered: missing completely at random (MCAR), tails more likely missing, large values more likely missing, center values more likely missing, and tail values more likely missing with confounded missingness (i.e., missingness related to the value of the variable itself).

In the case where large values are missing, ratio imputation (with or without disturbances) and data augmentation (Schafer) correct the bias in the mean, and within-class random imputation and the sequential nearest neighbor hot deck improve the biases substantially. However, the authors cautioned that the findings for ratio imputation may well be an artifact of their manipulation of the data. In summary, they note that in most cases these methods provide improvement when considerable biases exist in the means computed from the incomplete data, although the improvement is much smaller when the distribution is right skewed.

In summarizing the results for variance estimation, the authors concluded that all imputation methods studied, except the mean imputation method, yield acceptable variance estimates when the data are missing completely at random. For the three unconfounded types of missing data (tails missing, large values missing, and center values missing), data augmentation (Schafer) worked best, but ratio imputation, within-class random imputation, and the sequential nearest neighbor hot-deck method can all improve the biases of variance estimates dramatically. (However, there is a caution that the ratio imputation method tends to overestimate the variance.) For the confounded missing data pattern, where the missingness is related to the variable itself, only the ratio imputation methods (with and without disturbances) result in a substantial improvement in the bias of the variance. When coverage rates and confidence interval widths are considered together, data augmentation (Schafer) and adjusted data augmentation are the least likely to provide bad estimates. Finally, when average imputation error is considered, ratio imputation, data augmentation (Schafer), and within-class random imputation perform best, followed by the hot-deck, ratio with disturbance, and mean imputation methods.
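Ratio imputation, which stands out for the confounded pattern, predicts a missing y from a fully observed auxiliary variable x through the ratio of observed totals; the optional "disturbance" adds a residual drawn from the observed cases so that imputed values are not all exactly on the ratio line. The sketch below is illustrative, not the authors' implementation.

```python
import numpy as np

def ratio_impute(y, x, rng=None):
    """Ratio imputation: a missing y is predicted as R * x, where
    R = sum(observed y) / sum(x for those same cases). If rng is
    given, a randomly chosen observed residual is added as the
    disturbance term; otherwise the fitted value is used directly."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    obs = ~np.isnan(y)
    R = y[obs].sum() / x[obs].sum()          # ratio estimated from observed cases
    fitted = R * x
    out = y.copy()
    if rng is None:
        out[~obs] = fitted[~obs]             # deterministic ratio imputation
    else:
        resid = y[obs] - fitted[obs]         # observed residuals as donors
        out[~obs] = fitted[~obs] + rng.choice(resid, size=(~obs).sum())
    return out

# Observed cases give R = (2 + 4) / (1 + 2) = 2, so the missing y at
# x = 3 is imputed as 2 * 3 = 6.
print(ratio_impute([2.0, 4.0, np.nan], [1.0, 2.0, 3.0]))
```

The disturbance version trades a little extra imputation error for better preservation of the spread of the distribution, which is consistent with its placement behind plain ratio imputation on average imputation error above.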
Looking across the entire set of results, data augmentation (Schafer) is the one imputation method that scores high on all accounts. Two other methods that are more commonly used at NCES, within-class random imputation (PROC IMPUTE) and the sequential nearest neighbor hot-deck method, also performed well in estimating means and variances and performed reasonably well on coverage rates and average imputation error (although within-class random imputation (PROC IMPUTE) usually edges out the hot-deck method).
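Data augmentation in Schafer's sense alternates an imputation step (draw the missing values given current parameters) with a posterior step (draw the parameters given the filled-in data). The following is a minimal univariate-normal version under standard noninformative-prior assumptions, a sketch of the idea rather than Schafer's multivariate implementation; all names are illustrative.

```python
import numpy as np

def data_augmentation(y, n_iter=200, rng=None):
    """Minimal normal-model data augmentation (a two-step Gibbs sampler).

    P-step: draw sigma^2 from a scaled inverse chi-square and mu from a
    normal, given the currently filled-in data.
    I-step: redraw the missing values from N(mu, sigma^2).
    Returns the filled data and the final parameter draw (a single draw
    from the chain; in practice one would retain multiple imputations)."""
    if rng is None:
        rng = np.random.default_rng(0)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    filled = y.copy()
    filled[miss] = np.nanmean(y)             # simple starting values
    n = len(y)
    for _ in range(n_iter):
        # P-step: posterior draw of (mu, sigma^2) given the filled data.
        ybar = filled.mean()
        ss = ((filled - ybar) ** 2).sum()
        sigma2 = ss / rng.chisquare(n - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # I-step: redraw the missing values from the current model.
        filled[miss] = rng.normal(mu, np.sqrt(sigma2), miss.sum())
    return filled, mu, sigma2
```

Because the I-step draws from the full predictive distribution rather than plugging in a point estimate, the imputed data preserve the variance, which is consistent with the method's strong showing on bias of the variance and coverage above.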