NAEP Technical Documentation: Replicate Variance Estimation

Variances for NAEP estimates are computed using the jackknife replication variance procedure. This technique is applicable for common statistics such as means and ratios as well as for more complex statistics such as Item Response Theory (IRT) scores. 

In general, the jackknife replicate variance procedure pairs clusters of first-stage sampling units to form H variance strata (h = 1, 2, 3, ..., H) with two units per stratum. The first replicate is formed by deleting one unit at random from the first variance stratum, inflating the weight of the remaining unit so that it weights up to the variance stratum total, and using all units from the other (H - 1) strata. This procedure is carried out for each variance stratum, resulting in H replicates, each of which provides an estimate of the population total.

The jackknife estimate of the variance for any given statistic is given by the following formula:

$$v(\hat{\theta}) = \sum_{h=1}^{H} \left(\hat{\theta}_h - \hat{\theta}\right)^2$$

where $\hat{\theta}$ represents the full sample estimate of the given statistic and $\hat{\theta}_h$ represents the corresponding estimate for replicate h.
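The formula above can be sketched in a few lines of Python. This is a minimal illustration, not NAEP production code: the function name and the numbers are hypothetical, and the replicate estimates are assumed to have been computed already. Note that the sum carries no additional scaling factor, as in the formula.

```python
from typing import Sequence


def jackknife_variance(full_estimate: float, replicate_estimates: Sequence[float]) -> float:
    """Sum of squared deviations of the replicate estimates from the full-sample estimate."""
    return sum((theta_h - full_estimate) ** 2 for theta_h in replicate_estimates)


# Hypothetical values, for illustration only.
theta_hat = 250.0                         # full-sample estimate
theta_reps = [250.5, 249.0, 250.0, 251.0]  # one estimate per replicate

variance_estimate = jackknife_variance(theta_hat, theta_reps)
standard_error = variance_estimate ** 0.5
```

In NAEP the list of replicate estimates would have 62 entries, one per replicate weight.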

Each replicate undergoes the same weighting procedure as the full sample so that the jackknife variance estimate reflects the contributions to or reductions in variance resulting from the various weighting adjustments. 

The NAEP jackknife variance estimator is based on 62 variance strata resulting in a set of 62 replicate weights assigned to each school and student.

The basic idea of the jackknife variance estimator is to create the replicate weights so that the jackknife procedure yields a correct, unbiased variance estimate for simple totals and means and is also reasonably efficient (i.e., has a low variance as a variance estimator). The jackknife variance estimator then produces a consistent (but not fully unbiased) estimate of variance for sufficiently smooth nonlinear functions of total and mean estimates, such as ratios and regression coefficients (Shao and Tu, 1995). The development below shows why the NAEP jackknife variance estimator returns a correct unbiased variance estimate for totals and means, which is the cornerstone of the asymptotic results for nonlinear estimators; see, for example, Rust (1985). That paper also discusses why this variance estimator is generally efficient (i.e., more reliable than alternative approaches requiring similar computational resources).

The development will be done for an estimate of a mean based on a simplified sample design that closely approximates the sample design for first-stage units used in the NAEP studies. The sample design is a stratified random sample with H strata with population weights $W_h$, stratum sample sizes $n_h$, and stratum sample means $\bar{y}_h$. The population estimate $\hat{\bar{Y}}$ and standard unbiased variance estimate $v(\hat{\bar{Y}})$ are:

$$\hat{\bar{Y}} = \sum_{h=1}^{H} W_h \bar{y}_h, \qquad v(\hat{\bar{Y}}) = \sum_{h=1}^{H} W_h^2 \frac{s_h^2}{n_h}$$

with

$$s_h^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} \left(y_{hi} - \bar{y}_h\right)^2$$
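As a small worked illustration of these two estimators, the sketch below computes the stratified mean and its standard variance estimate. The function name and the two-stratum data are hypothetical; only the Python standard library is used.

```python
from statistics import mean, variance


def stratified_mean_and_variance(weights, samples):
    """Return the stratified mean estimate and its standard variance estimate.

    weights -- population weights W_h, one per stratum
    samples -- one list of observations per stratum
    """
    estimate = sum(W * mean(y) for W, y in zip(weights, samples))
    # statistics.variance divides by n_h - 1, which matches s_h^2.
    var_estimate = sum(W ** 2 * variance(y) / len(y) for W, y in zip(weights, samples))
    return estimate, var_estimate


# Hypothetical two-stratum sample.
W = [0.6, 0.4]
y = [[10.0, 12.0, 14.0, 16.0], [20.0, 24.0]]
y_bar_hat, v_hat = stratified_mean_and_variance(W, y)
```

With these numbers the stratum means are 13 and 22, so the estimate is 0.6 × 13 + 0.4 × 22 = 16.6, and the variance estimate is 0.36 × (20/3)/4 + 0.16 × 8/2 = 1.24.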

The jackknife replicate variance estimator assigns one replicate h = 1, …, H to each stratum, so that the number of replicates equals H. In NAEP, the replicates correspond generally to “doublets” and “triplets” (with the latter being used only if there is an odd number of sample units within a particular hard boundary generating replicate strata). For doublets, the process of generating replicates can be viewed as taking a simple random sample of size $n_h/2$ within the replicate stratum, assigning a doubled weight to the sampled elements and zero weight to the unsampled elements. In this simplified context of stratified random sampling, this assignment reduces to replacing $\bar{y}_h$ with $\bar{y}_h(*)$, the latter being the sample mean of the sampled $n_h/2$ units. The replicate estimate corresponding to stratum r is

$$\hat{\bar{Y}}(r) = \sum_{h=1,\, h \neq r}^{H} W_h \bar{y}_h + W_r \bar{y}_r(*)$$

The r-th term in the sum of squares for $v_j(\hat{\bar{Y}})$ is thus:

$$\left(\hat{\bar{Y}}(r) - \hat{\bar{Y}}\right)^2 = W_r^2 \left(\bar{y}_r(*) - \bar{y}_r\right)^2$$
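The doublet construction can be made concrete with unit-level weights. The sketch below is an illustration under stated assumptions rather than the NAEP procedure itself: the helper names are hypothetical, each unit's base weight is taken to be its stratum's population weight divided by the stratum sample size, and the half-sample within a stratum is simply every other unit.

```python
def doublet_replicate_weights(strata, base_weights, r):
    """Replicate weights for replicate r: within stratum r, double the weight of
    every other unit and zero the rest; leave all other strata unchanged."""
    position = 0  # running position of the unit within stratum r
    replicate = []
    for h, w in zip(strata, base_weights):
        if h != r:
            replicate.append(w)
        else:
            replicate.append(2.0 * w if position % 2 == 0 else 0.0)
            position += 1
    return replicate


def weighted_estimate(weights, values):
    """Weighted sum; a mean estimate when the weights sum to one."""
    return sum(w * y for w, y in zip(weights, values))


# Hypothetical sample: stratum 0 has four units, stratum 1 has two.
strata = [0, 0, 0, 0, 1, 1]
values = [10.0, 12.0, 14.0, 16.0, 20.0, 24.0]
base = [0.15, 0.15, 0.15, 0.15, 0.2, 0.2]  # stratum weights 0.6 and 0.4 spread over units

full = weighted_estimate(base, values)
replicates = [
    weighted_estimate(doublet_replicate_weights(strata, base, r), values)
    for r in (0, 1)
]
v_j = sum((rep - full) ** 2 for rep in replicates)
```

Here the two squared differences are 0.36 × (12 − 13)² = 0.36 and 0.16 × (20 − 22)² = 0.64, matching the expression for the r-th term above.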

In stratified random sampling, when a sample of size $n_r/2$ is drawn without replacement from a population of size $n_r$, the sampling variance is

$$E_*\left(\bar{y}_r(*) - \bar{y}_r\right)^2 = \frac{1}{n_r/2} \cdot \frac{n_r - n_r/2}{n_r} \cdot \frac{1}{n_r - 1} \sum_{i=1}^{n_r} \left(y_{ri} - \bar{y}_r\right)^2 = \frac{1}{n_r (n_r - 1)} \sum_{i=1}^{n_r} \left(y_{ri} - \bar{y}_r\right)^2 = \frac{s_r^2}{n_r}$$

See for example Cochran (1977), Theorem 5.3, using $n_r$ as the “population size”, $n_r/2$ as the “sample size”, and $s_r^2$ as the “population variance” in the given formula. Thus

$$E_*\left[W_r^2 \left(\bar{y}_r(*) - \bar{y}_r\right)^2\right] = W_r^2 \frac{s_r^2}{n_r}$$

Taking the *-expectation over all of these stratified samples of size $n_r/2$, it is found that

$$E_*\left(v_j(\hat{\bar{Y}})\right) = v(\hat{\bar{Y}})$$

In this sense, the jackknife variance estimator “gives back” the sample variance estimator for means and totals, as desired under the theory. In practice, random selection is not done in each replicate stratum, but units are instead assigned systematically (the first, third, etc.). Replicate strata are also grouped to make sure that the number of replicates is not too large (the replicate total is usually 62 for NAEP surveys). The randomization from the original sample distribution guarantees that the sum of squares contributed by each replicate will be close to the target expected value (rather than much larger or much smaller).
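The unbiasedness result for doublets can be checked exactly on a small example by enumerating every possible half-sample in each stratum and averaging. The sketch below assumes random half-sample selection (the idealized case in the derivation, not the systematic assignment used in practice), uses hypothetical data, and relies on exact rational arithmetic so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations


def mean(values):
    return sum(values, Fraction(0)) / len(values)


def sample_variance(values):
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), Fraction(0)) / (len(values) - 1)


def expected_jackknife_variance(weights, samples):
    """Expectation of the doublet jackknife variance over all half-sample choices."""
    total = Fraction(0)
    for W, y in zip(weights, samples):
        y_bar = mean(y)
        halves = combinations(y, len(y) // 2)  # every equally likely half-sample
        # Expected value of the squared difference contributed by this stratum.
        total += W ** 2 * mean([(mean(half) - y_bar) ** 2 for half in halves])
    return total


def standard_variance(weights, samples):
    """The usual stratified variance estimate."""
    return sum((W ** 2 * sample_variance(y) / len(y) for W, y in zip(weights, samples)), Fraction(0))


# Hypothetical two-stratum sample.
W = [Fraction(3, 5), Fraction(2, 5)]
y = [[10, 12, 14, 16], [20, 24]]

expected_vj = expected_jackknife_variance(W, y)
v_standard = standard_variance(W, y)
```

For this sample both quantities equal 31/25, illustrating the equality of the expected jackknife variance and the standard variance estimate.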

For triplets, the NAEP weighting contractor assigns two sets of replicate weights for replicate stratum r: $r_1$ and $r_2$ (which are then usually grouped with other doublets and/or triplets). Note that $r_1$ is always equal to r, with $r_2$ being another replicate (“far away” from the first replicate). The replicate stratum $r_1$ is partitioned into three equal-sized replicate units, with the following replicate weight assignments for the two replicates:

$$w_i(r_1) = \begin{cases} 1.5\, w_i & \text{if } i \in \text{replicate stratum } r \text{ and replicate unit 1} \\ 1.5\, w_i & \text{if } i \in \text{replicate stratum } r \text{ and replicate unit 2} \\ 0 & \text{if } i \in \text{replicate stratum } r \text{ and replicate unit 3} \\ w_i & \text{if } i \notin \text{replicate stratum } r \end{cases}$$

where $w_i$ is the full sample base weight, and

$$w_i(r_2) = \begin{cases} 1.5\, w_i & \text{if } i \in \text{replicate stratum } r \text{ and replicate unit 1} \\ 0 & \text{if } i \in \text{replicate stratum } r \text{ and replicate unit 2} \\ 1.5\, w_i & \text{if } i \in \text{replicate stratum } r \text{ and replicate unit 3} \\ w_i & \text{if } i \notin \text{replicate stratum } r \end{cases}$$

In the case of stratified random sampling, this formula reduces to replacing $\bar{y}_r$ with $\bar{y}_{r_1}(*)$ for replicate $r_1$, where $\bar{y}_{r_1}(*)$ is the sample mean from a “2/3” sample of $2n_r/3$ units from the $n_r$ sample units in the replicate stratum, and replacing $\bar{y}_r$ with $\bar{y}_{r_2}(*)$ for replicate $r_2$, where $\bar{y}_{r_2}(*)$ is the sample mean from another overlapping “2/3” sample of $2n_r/3$ units from the $n_r$ sample units in the replicate stratum.
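The two triplet weight assignments can be sketched as a small helper. The function name, argument layout, and example values are hypothetical; the factors 1.5 and 0 are the ones given in the assignments above.

```python
def triplet_replicate_weights(in_stratum, unit, base_weights):
    """Return the replicate-weight lists for the two replicates of one triplet stratum.

    in_stratum   -- True where element i belongs to replicate stratum r
    unit         -- replicate unit (1, 2, or 3) of each element; ignored outside r
    base_weights -- full sample base weights
    """
    factors_first = {1: 1.5, 2: 1.5, 3: 0.0}   # first replicate drops replicate unit 3
    factors_second = {1: 1.5, 2: 0.0, 3: 1.5}  # second replicate drops replicate unit 2
    w_first, w_second = [], []
    for inside, u, w in zip(in_stratum, unit, base_weights):
        w_first.append(factors_first[u] * w if inside else w)
        w_second.append(factors_second[u] * w if inside else w)
    return w_first, w_second


# Hypothetical example: three elements in the triplet stratum, one outside it.
in_stratum = [True, True, True, False]
unit = [1, 2, 3, None]
base = [2.0, 2.0, 2.0, 5.0]
w_r1, w_r2 = triplet_replicate_weights(in_stratum, unit, base)
```

With equal base weights inside the stratum, each replicate keeps two of the three replicate units at 1.5 times their base weight, so the total weight of the stratum is preserved.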

The $r_1$-th and $r_2$-th replicates can be written:

$$\hat{\bar{Y}}(r_1) = \sum_{h=1,\, h \neq r}^{H} W_h \bar{y}_h + W_r \bar{y}_{r_1}(*)$$

$$\hat{\bar{Y}}(r_2) = \sum_{h=1,\, h \neq r}^{H} W_h \bar{y}_h + W_r \bar{y}_{r_2}(*)$$

From these formulas, expressions for the $r_1$-th and $r_2$-th components of the jackknife variance estimator are obtained (ignoring other sums of squares from other grouped components attached to those replicates):

$$\left[\hat{\bar{Y}}(r_1) - \hat{\bar{Y}}\right]^2 = W_r^2 \left[\bar{y}_{r_1}(*) - \bar{y}_r\right]^2$$

$$\left[\hat{\bar{Y}}(r_2) - \hat{\bar{Y}}\right]^2 = W_r^2 \left[\bar{y}_{r_2}(*) - \bar{y}_r\right]^2$$

These sums of squares have *-expectations as follows, using the general formula for sampling variances:

$$E_*\left[\bar{y}_{r_1}(*) - \bar{y}_r\right]^2 = \frac{1}{2n_r/3} \cdot \frac{n_r - 2n_r/3}{n_r} \cdot \frac{1}{n_r - 1} \sum_{i=1}^{n_r} \left(y_{ri} - \bar{y}_r\right)^2 = \frac{1}{2 n_r (n_r - 1)} \sum_{i=1}^{n_r} \left(y_{ri} - \bar{y}_r\right)^2 = \frac{s_r^2}{2 n_r}$$

$$E_*\left[\bar{y}_{r_2}(*) - \bar{y}_r\right]^2 = \frac{1}{2n_r/3} \cdot \frac{n_r - 2n_r/3}{n_r} \cdot \frac{1}{n_r - 1} \sum_{i=1}^{n_r} \left(y_{ri} - \bar{y}_r\right)^2 = \frac{1}{2 n_r (n_r - 1)} \sum_{i=1}^{n_r} \left(y_{ri} - \bar{y}_r\right)^2 = \frac{s_r^2}{2 n_r}$$

Thus,

$$E_*\left[W_r^2 \left[\bar{y}_{r_1}(*) - \bar{y}_r\right]^2 + W_r^2 \left[\bar{y}_{r_2}(*) - \bar{y}_r\right]^2\right] = W_r^2 \left[\frac{s_r^2}{2 n_r} + \frac{s_r^2}{2 n_r}\right] = W_r^2 \frac{s_r^2}{n_r}$$

as desired again.
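The triplet result can also be checked exactly on a small example. The sketch below averages the two squared differences over every assignment of a stratum's units to three equal-sized replicate units, assuming all assignments are equally likely. The data are hypothetical, and the common factor of the squared stratum weight is omitted because it multiplies both sides.

```python
from fractions import Fraction
from itertools import permutations


def mean(values):
    return sum(values, Fraction(0)) / len(values)


def expected_triplet_terms(y):
    """Average of the two triplet squared differences over all assignments
    of the units in y to three equal-sized replicate units."""
    k = len(y) // 3
    y_bar = mean(y)
    terms = []
    for order in permutations(y):
        unit1, unit2, unit3 = order[:k], order[k:2 * k], order[2 * k:]
        first_mean = mean(unit1 + unit2)   # first replicate drops replicate unit 3
        second_mean = mean(unit1 + unit3)  # second replicate drops replicate unit 2
        terms.append((first_mean - y_bar) ** 2 + (second_mean - y_bar) ** 2)
    return mean(terms)


# Hypothetical stratum with six sampled units (two per replicate unit).
y = [3, 5, 6, 8, 9, 11]
s2 = sum(((v - mean(y)) ** 2 for v in y), Fraction(0)) / (len(y) - 1)
expected = expected_triplet_terms(y)
```

For this stratum the sample variance is 42/5, and the enumerated expectation equals 7/5, which is the sample variance divided by the stratum sample size of 6.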


Last updated 10 March 2009 (RF)
