 Statistical Standards Program

APPENDIX B: EVALUATING THE IMPACT OF IMPUTATIONS FOR ITEM NONRESPONSE

The Problem with Ignoring Item Nonresponse

Item nonresponse cannot be ignored because, once it exists, any analysis of the data item requires either an implicit or an explicit imputation. Ignoring the missing data and restricting analyses to those records with reported values for the variables in the analysis implicitly invokes the assumption that the missing cases are a random subsample of the full sample, that is, that they are missing completely at random (MCAR). Under MCAR, missingness is not related to the variables under study. This requires that all sample members are equally likely to respond to the item, in which case the estimate based on the reported values is approximately unbiased. These are strong assumptions. As noted by Brick and Kalton, 1996, "The use of imputation can improve on this strategy."

Little and Rubin included a discussion of "Quick Methods for Multivariate Data with Missing Data" in their 1987 book Statistical Analysis with Missing Data. In introducing these methods they state, "Although the methods appear in statistical computing software and are widely used, we do not generally recommend any of them except in special cases where the amount of missing data is limited." Included in this discussion are complete-case analyses, where only the cases with all variables specified in the analysis are included (i.e., the number of cases is fixed for all variables in an analysis), and available-case methods, which include all cases where the variable of interest is present (i.e., the sample base changes from variable to variable). They conclude this discussion by stating, "Neither method, however, is generally satisfactory."
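The difference between the two quick methods can be seen in a minimal sketch. The records, variable names, and helper functions below are hypothetical and are not drawn from Little and Rubin; the sketch only shows how the sample base differs between the two approaches.

```python
# Hypothetical records; None marks an item nonresponse.
records = [
    {"age": 25, "income": 40000},
    {"age": 30, "income": None},
    {"age": None, "income": 52000},
    {"age": 45, "income": 61000},
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def complete_case_means(rows, variables):
    """Use only rows in which every analysis variable is reported."""
    complete = [r for r in rows if all(r[v] is not None for v in variables)]
    means = {v: mean(r[v] for r in complete) for v in variables}
    return means, len(complete)


def available_case_means(rows, variables):
    """For each variable, use every row in which that variable is reported."""
    result = {}
    for v in variables:
        reported = [r[v] for r in rows if r[v] is not None]
        result[v] = (mean(reported), len(reported))
    return result


cc_means, cc_n = complete_case_means(records, ["age", "income"])
ac = available_case_means(records, ["age", "income"])

print("complete-case:", cc_means, "n =", cc_n)
print("available-case:", ac)
```

In the complete-case analysis both means rest on the same two records, while in the available-case analysis each mean rests on its own three records, so the sample base changes from variable to variable.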

Lessler and Kalsbeek also explored a variety of imputation methods in their 1992 book, Nonsampling Errors in Surveys. While they caution that there is no substitute for complete response, they conclude that "…it is better when attempting to reduce nonresponse bias to use a well-chosen method than to do nothing at all, unless the rate of nonresponse is low."

Examples
A few numerical studies can help illustrate this point. Lessler and Kalsbeek, 1992, reported on a 1978 analysis that they conducted on data from the National Assessment of Educational Progress (NAEP). Their goal was to measure the effect of nonresponse on estimates for 17-year-old students, who have lower response rates than the 13- or 9-year-old students. Their comparison of data from a subsample of nonresponding 17-year-olds with data from the original group of sample respondents showed that the size of the nonresponse bias relative to the variance component of most estimates in this survey was high. They noted that because bias does not depend on sample size, while variance diminishes as the sample size increases, nonresponse bias tends to be significant for large surveys. They also observed a direct relationship between the extent of nonresponse bias and a lowering of the actual confidence levels.

A second example may be drawn from "A study of selected nonsampling error in the 1991 Recent College Graduates Study," (U.S. Department of Education, 1995). The estimate of interest is the percent of graduates with a bachelor's degree who are education majors. Although technically the institution is the first stage of sample selection and the graduate is the second stage, for the purposes of this example the institution will be taken as the respondent and the item nonresponse is determined by whether the graduate responded or not. The institution response rate of 95 percent is posited to allow for a relatively accurate estimate of the item nonresponse bias.

The nonresponse rate for graduates was 16.4 percent. The institutions reported data showing that 7.79 percent of the nonrespondents majored in education, compared to 10.54 percent of the respondents. The bias can be estimated as:

[.164*(.1054 - .0779)] = .00451 = 0.5%

In other words, if the estimate were based only on the respondents, it would overestimate the percentage who are education majors by about one-half of a percentage point.

The relative bias with respect to the estimate is:

(.00451/.1054) = .0428 = 4.3%
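The two calculations above can be reproduced with a short sketch. The variable names are illustrative; the rates are the ones reported in the text.

```python
# Rates taken from the Recent College Graduates example in the text.
nonresponse_rate = 0.164   # share of graduates who did not respond
p_respondents = 0.1054     # education majors among respondents
p_nonrespondents = 0.0779  # education majors among nonrespondents

# Bias of the respondent-only estimate:
# nonresponse rate times the respondent/nonrespondent difference.
bias = nonresponse_rate * (p_respondents - p_nonrespondents)

# Relative bias: the bias expressed as a share of the respondent-only estimate.
relative_bias = bias / p_respondents

print(f"bias = {bias:.5f}")
print(f"relative bias = {relative_bias:.4f}")
```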

Thus, the bias is relatively small in this case. However, when the bias ratio (the bias divided by the standard error of the estimate) is considered, a different picture emerges. In general, a bias ratio of 10 percent or less has little effect on confidence intervals or tests of significance. That is to say, with a bias ratio of 10 percent, the probability of an error of more than 1.96 standard deviations from the mean is only 5.11 percent, compared with the usual 5 percent (table 1). In the graduate example, when the estimate of bias is compared to the standard error, the bias ratio is:

(.00451/.003047) = 1.48 = 148%

The bias ratio of 148 percent means that there is about a 32 percent chance of a Type I error (i.e., rejecting a true hypothesis) in computing the confidence interval or conducting a significance test in this example.
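The link between a bias ratio and the chance of a Type I error can be sketched as follows. The sketch assumes that the estimate is normally distributed around a mean shifted by the bias and that a nominal 5 percent two-sided test (critical value 1.96) is used. The function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def type_i_error(bias_ratio, z_crit=1.96):
    """Probability that the estimate falls more than z_crit standard errors
    from the true value when its mean is shifted by bias_ratio standard errors.
    """
    upper_tail = 1.0 - normal_cdf(z_crit - bias_ratio)
    lower_tail = normal_cdf(-z_crit - bias_ratio)
    return upper_tail + lower_tail


# Bias ratios expressed as proportions (1.48 corresponds to 148 percent).
for ratio in (0.10, 1.00, 1.48):
    print(f"bias ratio {ratio:.2f}: P(Type I error) = {type_i_error(ratio):.4f}")
```

Most of the inflated error probability comes from one tail, the one in the direction of the bias; the opposite tail shrinks toward zero as the bias ratio grows.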

This bias ratio is so large because the estimated standard error is small, as is typically the case with large sample sizes. Thus, although the actual bias and the relative bias are relatively small, the bias ratio illustrates the fact that the impact on statistical inferences can still be quite large. This has important implications for Federal statistical agencies that conduct large sample surveys.
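A minimal sketch can show why a fixed bias matters more as the sample grows. It assumes simple random sampling, so that the standard error of a proportion is sqrt(p(1 - p)/n). This is a simplification and not the actual design of the study, and the sample sizes are hypothetical.

```python
import math

bias = 0.00451  # estimated nonresponse bias from the example in the text
p = 0.1054      # respondent-only estimate from the example in the text


def bias_ratio(n):
    """Bias divided by the standard error of a proportion under simple
    random sampling (an assumed design, used only for illustration)."""
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return bias / standard_error


# Hypothetical sample sizes: the bias stays fixed while the standard error shrinks.
for n in (500, 2000, 10000, 50000):
    print(f"n = {n:>6}: bias ratio = {bias_ratio(n):.2f}")
```

Under this assumption the bias ratio grows with the square root of the sample size, so quadrupling the sample doubles the ratio.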

If we assume that the variance associated with the estimate of education majors is the same for respondents and nonrespondents, then the bias of the variance estimate in this example is:

B(variance estimate) = -[(.164)(.836)](.1054 - .0779)^2 = -.000104

The variance in this example is underestimated by .01 percent.
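The arithmetic for the bias of the variance estimate can be checked with a brief sketch. It assumes, as the text does, equal variances for respondents and nonrespondents. The variable names are illustrative.

```python
nonresponse_rate = 0.164
response_rate = 1.0 - nonresponse_rate  # .836
p_respondents = 0.1054
p_nonrespondents = 0.0779

# Bias of the variance estimate: the negative of the product of the two rates
# times the squared respondent/nonrespondent difference.
variance_bias = -(nonresponse_rate * response_rate) * (
    p_respondents - p_nonrespondents
) ** 2

print(f"bias of variance estimate = {variance_bias:.6f}")
```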

Table 1. Probability of a Type I error, by bias ratio

Bias ratio (percent)    Probability of Type I error
  2                     .0500
  4                     .0502
  6                     .0504
  8                     .0508
 10                     .0511
 20                     .0546
 40                     .0685
 60                     .0921
 80                     .1259
100                     .1700
150                     .3231

Source: Cochran, 1977