
Statistical Standards
Statistical Standards Program
Table of Contents
1. Development of Concepts and Methods
2. Planning and Design of Surveys
3. Collection of Data
4. Processing and Editing of Data
5. Analysis of Data / Production of Estimates or Projections

5-1 Statistical Analysis, Inference, and Comparisons
5-2 Variance Estimation
5-3 Rounding
5-4 Tabular and Graphic Presentations of Data

6. Establishment of Review Procedures
7. Dissemination of Data
Appendix A
Appendix B
Appendix C
Appendix D
Publication information




PURPOSE: Given that most NCES sample designs have one or more of the following three characteristics: unequal probabilities of selection, stratification, and clustering, it is important to ensure that appropriate techniques for the estimation of variance in sample surveys are identified, implemented and documented.

KEY TERMS: clustered samples, confidentiality, Data Analysis System (DAS), DEFT, design effect (DEFF), estimation, imputation, raking, point estimate, replication methods, Simple Random Sampling (SRS), strata, Taylor-series linearization, and variance.

STANDARD 5-2-1: Variance estimates must be derived for all reported point estimates whether reported as a single, descriptive statistic (e.g., 6 percent of 1988 eighth-graders dropped out of school by 1990) or used in an analysis to infer or draw a conclusion (e.g., more 12th graders took advanced-level mathematics courses in 1998 than in 1982).

STANDARD 5-2-2: Variance estimates must be calculated by a method appropriate to a survey's sample design (e.g., unequal probabilities of selection, stratification, clustering, and the effects of nonresponse, post-stratification, and raking). These estimates must reflect the design effect resulting from the complex design.
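The design effect named in this standard is conventionally defined as the ratio of the variance under the actual design to the variance a simple random sample of the same size would give, with DEFT as its square root. The following is a minimal sketch of that arithmetic, not an NCES procedure; the function names and the example figures (a proportion of 0.06 from 1,000 cases and a design-based variance of 0.0001128) are hypothetical.

```python
import math


def design_effect(var_design: float, var_srs: float) -> float:
    """DEFF: design-based variance divided by the SRS variance."""
    if var_srs <= 0:
        raise ValueError("SRS variance must be positive")
    return var_design / var_srs


def deft(var_design: float, var_srs: float) -> float:
    """DEFT: square root of DEFF (ratio of standard errors)."""
    return math.sqrt(design_effect(var_design, var_srs))


def effective_sample_size(n: int, deff: float) -> float:
    """Size of an SRS that would give the same precision."""
    return n / deff


# Hypothetical illustration: an estimated proportion from n cases.
p, n = 0.06, 1000
var_srs = p * (1 - p) / n       # variance if the sample were an SRS
var_design = 0.0001128          # assumed design-based variance estimate
deff = design_effect(var_design, var_srs)
print(deff, deft(var_design, var_srs), effective_sample_size(n, deff))
```

With these assumed inputs the design-based variance is about twice the SRS variance, so the sample carries roughly the information of an SRS half its size.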

Approximate variance estimation methods that adjust for most of the impact of clustering and stratification include bootstrap, jackknife, Balanced-Repeated Replication (BRR), and Taylor-series linearization. Replication methods (bootstrap, jackknife, and BRR) can also adjust for the impact of nonresponse, post-stratification, and raking. When replication methods are used, the number of replicates should be large enough to enable stable variance estimation (e.g., ≥ 30) and small enough (e.g., ≤ 100) for efficient calculation.
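To illustrate the replication idea, the sketch below implements a delete-one-PSU jackknife (often called JK1) for a weighted mean in an unstratified design. It is an illustration under stated assumptions, not the method of any particular NCES survey: the function names and the toy data are hypothetical, and a production survey would normally ship precomputed replicate weights that also carry the nonresponse and raking adjustments.

```python
from typing import Hashable, Sequence, Tuple


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_variance(
    values: Sequence[float],
    weights: Sequence[float],
    psu_ids: Sequence[Hashable],
) -> Tuple[float, float]:
    """Return (full-sample estimate, JK1 variance estimate).

    Each replicate drops one PSU. The usual G/(G-1) reweighting of the
    remaining PSUs cancels in a weighted mean, so it is omitted here.
    """
    full = weighted_mean(values, weights)
    psus = sorted(set(psu_ids), key=str)
    g = len(psus)
    if g < 2:
        raise ValueError("need at least two PSUs")
    sum_sq = 0.0
    for dropped in psus:
        keep = [i for i, p in enumerate(psu_ids) if p != dropped]
        replicate = weighted_mean(
            [values[i] for i in keep], [weights[i] for i in keep]
        )
        sum_sq += (replicate - full) ** 2
    return full, (g - 1) / g * sum_sq


# Toy data: five PSUs with one equally weighted observation each.
estimate, variance = jackknife_variance(
    [1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5, ["a", "b", "c", "d", "e"]
)
print(estimate, variance)
```

In this degenerate case (equal weights, one observation per PSU) the jackknife variance reduces to the familiar s²/n for a sample mean, which makes a convenient sanity check. The toy example uses only five replicates for readability; the range of replicates suggested above applies to real applications.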

    GUIDELINE 5-2-2A: The preferred way to derive appropriate variance estimates for totals, means, proportions and regression coefficients is to use a statistical package that does not assume simple random sampling (SRS). Such packages, which include SUDAAN, WesVar, DAS, and Stata, use techniques such as Taylor-series linearization or one of the replication methods mentioned above.

    GUIDELINE 5-2-2B: Consideration should be given to incorporating an adjustment for imputations in variance estimation procedures.

    GUIDELINE 5-2-2C: In some cases, alternative approximation strategies can be used to produce variance estimates. For example, software for multilevel models can be used to produce estimates that take into account some aspects of complex survey design. Care must be taken to include any clustering of the sample as a level in the model(s). In addition, any design variables and weights, such as those associated with strata or measures of size, should be taken into account.
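For readers curious about what the packages in Guideline 5-2-2A do internally, the following is a rough sketch of Taylor-series linearization for a weighted mean under a stratified, clustered design. It assumes PSUs are sampled with replacement within strata (no finite population correction) and at least two PSUs per stratum; the function and argument names are hypothetical, and a vetted survey package should be preferred for real analyses.

```python
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def linearized_variance_of_mean(
    values: Sequence[float],
    weights: Sequence[float],
    strata: Sequence[Hashable],
    psus: Sequence[Hashable],
) -> Tuple[float, float]:
    """Return (weighted mean, linearized variance estimate)."""
    total_weight = sum(weights)
    mean = sum(w * y for y, w in zip(values, weights)) / total_weight

    # Linearized score for the ratio sum(w*y)/sum(w), summed within each PSU.
    psu_totals = defaultdict(float)
    for y, w, h, c in zip(values, weights, strata, psus):
        psu_totals[(h, c)] += w * (y - mean) / total_weight

    # Group the PSU-level totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _c), z in psu_totals.items():
        by_stratum[h].append(z)

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, variance


# Toy check: one stratum, five single-observation PSUs, equal weights.
mean, variance = linearized_variance_of_mean(
    [1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5, ["s1"] * 5, [1, 2, 3, 4, 5]
)
print(mean, variance)
```

As with the simplest replication case, this toy input collapses to s²/n, so the linearized and simple random sampling answers agree when there is no clustering, stratification, or unequal weighting to account for.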

STANDARD 5-2-3: Data files must include all information necessary for point estimation and variance estimation (e.g., probabilities of selection, weights, stratum and PSU codes), subject to confidentiality constraints (see Standard 7-1 on Machine Readable Data Products and Standard 4-2 on Maintaining Confidentiality).


Kish, L., Frankel, M. R., Verma, V., and Kaciroti, N. (1995). "Design effects for correlated (Pi-Pj)." Survey Methodology, 21, pp. 117-124 (for an example on design effects for estimates of differences between proportions).

Pfeffermann, D. (1996). "The use of sampling weights for survey data analysis." Statistical Methods in Medical Research, 5, pp. 239-261.

Skinner, C. J., Holt, D., and Smith, T. M. F. (Eds.). (1989). Analysis of Complex Surveys, New York: Wiley.

Lehtonen, R. and Pahkinen, E. J. (1995). Practical Methods for Design and Analysis of Complex Surveys. New York, NY: Wiley.

Potthoff, R. F., Woodbury, M. A., and Manton, K. G. (1992). "Equivalent sample size and equivalent degrees of freedom: refinements for inference using survey weights under superpopulation models." Journal of the American Statistical Association, 87, pp. 383-396.

Goldstein, H. and Rasbash, J. (1998). "Weighting for Unequal Selection Probabilities in Multilevel Models." Journal of the Royal Statistical Society, Series B, 60, pp. 23-40.

Jones, K. (1992). "Using Multilevel Models for Survey Analysis." In Westlake, A. (Ed.), Survey and Statistical Computing. New York: North Holland. pp. 231-242.

Goldstein, H. (1991). "Multilevel Modeling of Survey Data." The Statistician, 40, pp. 235-244.

National Center for Education Statistics -
U.S. Department of Education