Digest of Education Statistics: 2004

NCES 2006-005
October 2005

Appendix A.1. Guide to Sources

Sources and Comparability of Data

The information presented in this report was obtained from many sources, including federal and state agencies, private research organizations, and professional associations. The data were collected using many research methods, including surveys of a universe (such as all colleges) or of a sample, compilations of administrative records, and statistical projections. Digest users should take particular care when comparing data from different sources. Differences in sampling, data collection procedures, coverage of target population, timing, phrasing of questions, scope of non-response, interviewer training, and data processing and coding mean that the results from the different sources may not be strictly comparable. Following the general discussion of data accuracy below, descriptions of the information sources and data collection methods are presented, grouped by sponsoring organization. More extensive documentation of a particular survey's procedures does not imply more problems with the data, only that more information is available.

Accuracy of Data

The joint effects of "sampling" and "nonsampling" errors determine the accuracy of any statistic. Estimates based on a sample will differ somewhat from the figures that would have been obtained if a complete census had been taken using the same survey instruments, instructions, and procedures. In addition to such sampling errors, all surveys, both universe and sample, are subject to design, reporting, and processing errors and errors due to nonresponse. To the extent possible, these nonsampling errors are kept to a minimum by methods built into the survey procedures. In general, however, the effects of nonsampling errors are more difficult to gauge than those produced by sampling variability.

Sampling Errors

The samples used in surveys are selected from large numbers of possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The difference between a sample estimate and the average of all possible samples is called the sampling deviation. The standard or sampling error of a survey estimate is a measure of the variation among the estimates from all possible samples and, thus, is a measure of the precision with which an estimate from a particular sample approximates the average result of all possible samples.

The sample estimate and an estimate of its standard error permit us to construct interval estimates with prescribed confidence that the interval includes the average result of all possible samples. If all possible samples were selected under essentially the same conditions and an estimate and its estimated standard error were calculated from each sample, then: (1) approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average value of all possible samples; and (2) approximately 95 percent of the intervals from two standard errors below the estimate to two standard errors above the estimate would include the average value of all possible samples. We call an interval from two standard errors below the estimate to two standard errors above the estimate a 95 percent confidence interval.
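The coverage property described above can be checked with a small simulation. The following is a minimal sketch, not a description of any NCES procedure: it assumes simple random samples from a hypothetical normally distributed population (mean 50, standard deviation 10), for which the average result of all possible samples equals the population mean. The function name, population values, and sample sizes are illustrative assumptions.

```python
import random
import statistics


def interval_coverage(num_samples=2000, sample_size=100, seed=0):
    """Share of one- and two-standard-error intervals that contain the true mean."""
    rng = random.Random(seed)
    true_mean, true_sd = 50.0, 10.0  # hypothetical population, not survey data
    within_one = 0
    within_two = 0
    for _ in range(num_samples):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        estimate = statistics.fmean(sample)
        # Estimated standard error of the sample mean.
        std_error = statistics.stdev(sample) / sample_size ** 0.5
        if abs(estimate - true_mean) <= std_error:
            within_one += 1
        if abs(estimate - true_mean) <= 2 * std_error:
            within_two += 1
    return within_one / num_samples, within_two / num_samples


one_se_share, two_se_share = interval_coverage()
print(f"within one standard error:  {one_se_share:.3f}")
print(f"within two standard errors: {two_se_share:.3f}")
```

Under these assumptions the two shares should land near the one- and two-standard-error coverage rates given in the text. The exact values vary with the seed and the number of simulated samples.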

To illustrate this concept, consider the data and standard errors appearing in table 107. For the 2003 estimate that 9.9 percent of 16- to 24-year-olds were high school dropouts, the table shows that the standard error is 0.23 percentage points. The sampling error above and below the stated figure is approximately double the standard error, or about 0.46 percentage points. Therefore, we can create a 95 percent confidence interval, which is approximately 9.44 to 10.36 percent (9.9 percent ± 2 times 0.23 percentage points).
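The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch; the function name and the `multiplier` parameter are illustrative and are not part of the Digest.

```python
def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) as estimate -/+ multiplier * standard_error."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# Values from the table 107 example in the text:
# 9.9 percent of 16- to 24-year-olds, standard error 0.23.
lower, upper = confidence_interval(9.9, 0.23)
print(f"95 percent confidence interval: {lower:.2f} to {upper:.2f}")
# prints "95 percent confidence interval: 9.44 to 10.36"
```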

Analysis of standard errors can help assess how valid a comparison between two estimates might be. The standard error of a difference between two independent sample estimates is equal to the square root of the sum of the squared standard errors of the estimates. The standard error (se) of the difference between independent sample estimates "a" and "b" is:

$$se_{a,b} = \left(se_a^2 + se_b^2\right)^{1/2}$$
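As a rough illustration, the formula can be applied as follows. The first estimate and its standard error come from the table 107 example in the text. The second estimate (10.9 percent, standard error 0.25) is hypothetical and is included only to show the calculation. Comparing the difference against two standard errors of the difference is one common way to use this quantity; it is an assumption of this sketch, not a rule stated in the Digest.

```python
import math


def se_of_difference(se_a, se_b):
    """Standard error of the difference between two independent estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def differs_at_two_se(est_a, se_a, est_b, se_b, multiplier=2.0):
    """True if |a - b| exceeds multiplier standard errors of the difference."""
    return abs(est_a - est_b) > multiplier * se_of_difference(se_a, se_b)


# 9.9 and 0.23 are from the text; 10.9 and 0.25 are hypothetical values.
se_diff = se_of_difference(0.23, 0.25)
print(f"standard error of the difference: {se_diff:.2f}")
print(differs_at_two_se(9.9, 0.23, 10.9, 0.25))
```

Here the difference of 1.0 percentage point is larger than twice the standard error of the difference (about 0.68), so the two estimates would be treated as different under this rule of thumb.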

It should be noted that most of the standard error estimates presented in the Digest and in the original documents are approximations. That is, to derive estimates of standard errors that would be applicable to a wide variety of items and could be prepared at a moderate cost, a number of approximations were required. As a result, the standard error estimates provide a general order of magnitude rather than the exact standard error for any specific item.

The preceding discussion on sampling variability was directed toward a situation concerning one or two estimates. Determining the accuracy of statistical projections is more difficult. In general, the further away the projection date is from the date of the actual data being used for the projection, the greater the probable error in the projections. If, for instance, annual data from 1970 to 2002 are being used to project enrollment in institutions of higher education, the further beyond 2002 one projects, the more variability in the projection. One will be less sure of the 2014 enrollment projection than of the 2006 projection. A detailed discussion of the projections methodology is contained in Projections of Education Statistics to 2014 (National Center for Education Statistics, NCES 2005-074).

Nonsampling Errors

Universe and sample surveys are subject to nonsampling errors. Nonsampling errors may arise when respondents or interviewers interpret questions differently; when respondents must estimate values; when coders, keyers, and other processors handle answers differently; when persons who should be included in the universe are not; or when persons fail to respond (completely or partially). Nonsampling errors are not reflected in standard error estimates, so they usually, but not always, result in an underestimate of total survey error and thus an overestimate of the precision of survey estimates. Since estimating the magnitude of nonsampling errors often would require special experiments or access to independent data, estimates of these errors are seldom available.

To compensate for nonresponse, adjustments of the sample estimates are often made. For universe surveys, an adjustment made for either type of nonresponse, total or partial, is often referred to as an imputation, which is often a substitution of the "average" questionnaire response for the nonresponse. These imputations are usually made separately within various groups of respondents that have similar survey characteristics. For sample surveys, missing cases (i.e., total nonresponse) are handled through nonresponse adjustments to the sample weights. Imputation for item nonresponse in sample surveys is usually made by substituting for a missing item the response to that item of a respondent whose characteristics are similar to those of the nonrespondent. For more information, see NCES Statistical Standards.
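To make the idea of a nonresponse adjustment to sample weights concrete, the sketch below uses a weighting-class adjustment: within each group of similar cases, the weights of respondents are scaled up so that they also represent the nonrespondents in that group. The text does not say which adjustment method any particular survey uses, so this method, the function name, the field names (`group`, `weight`, `responded`), and the data are all illustrative assumptions.

```python
from collections import defaultdict


def adjust_weights_for_nonresponse(cases):
    """Scale respondent weights within each group to cover nonrespondents."""
    total_weight = defaultdict(float)
    respondent_weight = defaultdict(float)
    for case in cases:
        total_weight[case["group"]] += case["weight"]
        if case["responded"]:
            respondent_weight[case["group"]] += case["weight"]

    adjusted = []
    for case in cases:
        if not case["responded"]:
            continue  # nonrespondents drop out; their weight is redistributed
        group = case["group"]
        factor = total_weight[group] / respondent_weight[group]
        adjusted.append({**case, "weight": case["weight"] * factor})
    return adjusted


# Hypothetical sample: one of four cases in group "A" did not respond.
sample = [
    {"group": "A", "weight": 10.0, "responded": True},
    {"group": "A", "weight": 10.0, "responded": True},
    {"group": "A", "weight": 10.0, "responded": True},
    {"group": "A", "weight": 10.0, "responded": False},
    {"group": "B", "weight": 20.0, "responded": True},
    {"group": "B", "weight": 20.0, "responded": True},
]
adjusted_sample = adjust_weights_for_nonresponse(sample)
print(sum(case["weight"] for case in adjusted_sample))
```

In this sketch the adjusted respondent weights in each group sum to the group's original total weight, so the weighted total for the whole sample is preserved.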

Although the magnitude of nonsampling error in the data compiled in this Digest is frequently unknown, idiosyncrasies that have been identified are noted on the appropriate tables.