Substitute Schools

Substitute schools were identified to replace originally sampled schools that do not cooperate. To serve this role, a substitute school should be as similar as possible to the originally sampled school in terms of characteristics correlated with achievement. The procedure for selecting substitutes for the twelfth-grade public school sample was very similar to that for the fourth- and eighth-grade public school samples, with some differences in the details. Each school in the pool of potential substitutes for an originally sampled regular¹ twelfth-grade public school had to satisfy the following criteria:

  • The school could not be in any of the fourth- or eighth-grade NAEP 2002 samples (as some of these will have a twelfth grade as well).

  • The school could not be an Education Longitudinal Study school.

  • The school could not be a state-run public school.

  • The school had to be in the same state, minority stratum (high or low), and urbanicity stratum as the originally sampled school.
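The four criteria can be read as a filter over the school frame. The following is a minimal sketch in Python, not the procedure's actual implementation. The `School` record, all of its field names, and the `substitute_pool` function are hypothetical names chosen for illustration; excluding the original school itself from its own pool is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class School:
    # All field names are hypothetical; the source does not name its variables.
    school_id: str
    state: str
    minority_stratum: str      # "high" or "low"
    urbanicity_stratum: str
    in_naep_4_8_sample: bool   # in a fourth- or eighth-grade NAEP 2002 sample
    is_els_school: bool        # Education Longitudinal Study school
    is_state_run: bool         # state-run public school


def substitute_pool(original: School, frame: list[School]) -> list[School]:
    """Return the schools in `frame` that meet all four pool criteria."""
    cell = (original.state, original.minority_stratum, original.urbanicity_stratum)
    return [
        s for s in frame
        if s.school_id != original.school_id   # assumption: never its own substitute
        and not s.in_naep_4_8_sample
        and not s.is_els_school
        and not s.is_state_run
        and (s.state, s.minority_stratum, s.urbanicity_stratum) == cell
    ]
```

A school that fails any one criterion is dropped from the pool, so the pool for a given original school can be small or even empty.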

Once the pool of potential substitutes had been identified for each sampled school, a distance measure was computed for each pair of original school and potential substitute. With h indexing the cell (state by minority stratum by urbanicity stratum), i the potential substitute school, and j the sampled school, the distance measure $DM_{hij}$ is computed as follows:

\[
DM_{hij} = \sqrt{\frac{\left(\mathit{MINB}_{hi} - \mathit{MINB}_{hj}\right)^2}{\mathit{VAR\_M}} + \frac{\left(\mathit{MINH}_{hi} - \mathit{MINH}_{hj}\right)^2}{\mathit{VAR\_M}} + \frac{\left(\sqrt{\mathit{EST}_{hi}} - \sqrt{\mathit{EST}_{hj}}\right)^2}{\mathit{VAR\_SQE}}}
\]

where

  • MINB represents the percentage of Blacks in the school;

  • MINH represents the percentage of Hispanics in the school;

  • EST represents the estimated grade enrollment;

  • VAR_M equals the average of the variance of MINB over all schools and the variance of MINH over all schools; and

  • VAR_SQE equals the variance of the square root of grade enrollment over all schools.
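The distance measure and its two scaling variances can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for NAEP: the function and parameter names are hypothetical, and the population variance (`statistics.pvariance`) is assumed because the source does not say whether the population or sample formula was used.

```python
import math
from statistics import pvariance


def distance_measure(minb_i, minh_i, est_i, minb_j, minh_j, est_j, var_m, var_sqe):
    """DM for one (potential substitute i, sampled school j) pair in a cell."""
    return math.sqrt(
        (minb_i - minb_j) ** 2 / var_m
        + (minh_i - minh_j) ** 2 / var_m
        + (math.sqrt(est_i) - math.sqrt(est_j)) ** 2 / var_sqe
    )


def scaling_variances(minb_all, minh_all, est_all):
    """Return (VAR_M, VAR_SQE) computed over all schools.

    Assumption: population variance; the source does not specify the formula.
    """
    var_m = (pvariance(minb_all) + pvariance(minh_all)) / 2
    var_sqe = pvariance([math.sqrt(e) for e in est_all])
    return var_m, var_sqe
```

Taking the square root of enrollment before differencing damps the influence of very large schools, and dividing each squared difference by a variance puts the three terms on a comparable scale.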

This distance measure indicates how alike a pair of schools is. A distance of zero indicates that the two schools have the same minority percentages (Black and Hispanic) and the same estimated enrollment. A large value indicates a pair of schools with minority percentages and enrollments at opposite extremes, that is, very different schools. For example, if a pair of schools has Black percentages, Hispanic percentages, and square-root estimated enrollments that are each one standard deviation apart, then the distance measure equals the square root of 3 (roughly 1.7). A distance measure below 0.65 implies that all three differences (Black percentage, Hispanic percentage, square-root estimated enrollment) are smaller than 0.65 of a standard deviation. We used this ad hoc cutoff of 0.65 as the boundary for 'closeness' because we have found that it yields a suitably sized set of candidate schools (not too many and not too few).
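Both numerical claims in this paragraph follow directly from the formula. The symbols d_B, d_H, and d_E below are shorthand introduced here for the three standardized differences; they do not appear in the source. Note also that the square root of VAR_M is the root of an average of two variances, so "standard deviation" is approximate for the two minority percentages.

```latex
\[
d_B = \frac{\left|\mathit{MINB}_{hi} - \mathit{MINB}_{hj}\right|}{\sqrt{\mathit{VAR\_M}}}, \qquad
d_H = \frac{\left|\mathit{MINH}_{hi} - \mathit{MINH}_{hj}\right|}{\sqrt{\mathit{VAR\_M}}}, \qquad
d_E = \frac{\left|\sqrt{\mathit{EST}_{hi}} - \sqrt{\mathit{EST}_{hj}}\right|}{\sqrt{\mathit{VAR\_SQE}}}
\]
\[
DM_{hij} = \sqrt{d_B^2 + d_H^2 + d_E^2}
\]
\[
d_B = d_H = d_E = 1 \;\Longrightarrow\; DM_{hij} = \sqrt{1 + 1 + 1} = \sqrt{3} \approx 1.73
\]
\[
DM_{hij} < 0.65 \;\Longrightarrow\; \max(d_B, d_H, d_E) \le DM_{hij} < 0.65
\]
```

The last line holds because each squared term is nonnegative, so no single standardized difference can exceed the overall distance.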

On the "first pass," each sampled school was checked for a substitute that was out-of-district² and within an upper bound of 0.65 on $DM_{hij}$. Each potential substitute can serve as the substitute for only one school, so substitutes were selected one at a time, each time taking the "best" (smallest-distance) original-substitute pair still available.

This assignment can be illustrated with the example below, which has 5 sampled schools and 10 potential substitutes. We check the distances for all possible pairs, indexed by i and j (i=1,...,10 for the potential substitutes and j=1,...,5 for the sampled schools). The numbers in the table are the distances; an asterisk marks the selected substitute in each column (the potential substitute chosen for that sampled school j). School i=5 is a good substitute for both j=2 and j=4, but it is selected for j=4, which is the closer pair (10 versus 14), so j=2 takes its best remaining substitute. The selected pairs in this example (in their order of selection) are (j=4, i=5), (j=3, i=4), (j=5, i=1), (j=1, i=6), and (j=2, i=9).

Potential       Sampled schools
substitutes     j=1    j=2    j=3    j=4    j=5
i=1              26     57     40     30     16*
i=2              42     47     31     55     48
i=3              36     32     51     46     54
i=4              41     27     12*    59     34
i=5              50     14     39     10*    69
i=6              22*    43     28     61     44
i=7              56     63     58     64     45
i=8              68     49     33     35     65
i=9              52     25*    62     67     60
i=10             29     66     38     53     37
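The one-at-a-time selection can be sketched as a greedy loop over the table above. This is a minimal illustration, not the production procedure. The function name `greedy_assign` is hypothetical, and ties (which do not occur in this table) are assumed to be broken by the lower j and then the lower i. Sorting all pairs once and accepting each pair whose two schools are both still free gives the same result as repeatedly picking the best available pair.

```python
def greedy_assign(dist):
    """Pair sampled schools with substitutes, closest available pair first.

    `dist[i][j]` is the distance between potential substitute i and sampled
    school j. Returns (j, i, distance) tuples in order of selection.
    """
    pairs = sorted((d, j, i) for i, row in dist.items() for j, d in row.items())
    used_i, used_j, selected = set(), set(), []
    for d, j, i in pairs:
        if i not in used_i and j not in used_j:
            selected.append((j, i, d))
            used_i.add(i)   # each substitute serves only one school
            used_j.add(j)   # each sampled school gets only one substitute
    return selected


# Distances from the example table: rows are i=1..10, columns are j=1..5.
rows = [
    [26, 57, 40, 30, 16],
    [42, 47, 31, 55, 48],
    [36, 32, 51, 46, 54],
    [41, 27, 12, 59, 34],
    [50, 14, 39, 10, 69],
    [22, 43, 28, 61, 44],
    [56, 63, 58, 64, 45],
    [68, 49, 33, 35, 65],
    [52, 25, 62, 67, 60],
    [29, 66, 38, 53, 37],
]
dist = {
    i: {j: d for j, d in enumerate(row, start=1)}
    for i, row in enumerate(rows, start=1)
}

for j, i, d in greedy_assign(dist):
    print(f"j={j} <- i={i} (distance {d})")
```

Run on these distances, the loop selects the five pairs in the order listed in the text, including the step where i=5 goes to j=4 rather than j=2.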

At the end of this first pass, it is usually the case that not every original school has found a substitute (even when potential substitutes were available, since they may have been chosen for other original schools). We therefore carried out a "second pass" in which we allowed in-district as well as out-of-district substitutes and set the upper bound for $DM_{hij}$ at 0.75. Many original schools had no substitute even after this second pass and were left without one. The procedure is thus conservative: it does not provide a substitute for every sampled school, but the substitutes it does provide are guaranteed to be 'similar' in terms of ethnicity and enrollment.
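The two passes can be sketched as the same greedy rule run twice with different filters. The following Python is a hedged illustration only. The function name, the tuple layout of `candidates`, and the school identifiers are hypothetical, and the cutoffs are treated as inclusive, which the source ("upper bound") does not state explicitly.

```python
def two_pass_assign(candidates, first_cutoff=0.65, second_cutoff=0.75):
    """Assign substitutes in two greedy passes.

    `candidates` holds one (distance, original_id, substitute_id,
    same_district) tuple per eligible pair. Returns a dict mapping
    original_id -> substitute_id; originals left unmatched are absent.
    """
    assigned = {}   # original -> substitute
    taken = set()   # substitutes already used

    def run_pass(cutoff, allow_in_district):
        # Closest pairs first; skip pairs that are too far apart or,
        # when in-district pairs are not allowed, in the same district.
        for dist, orig, sub, same_district in sorted(candidates):
            if dist > cutoff or (same_district and not allow_in_district):
                continue
            if orig not in assigned and sub not in taken:
                assigned[orig] = sub
                taken.add(sub)

    run_pass(first_cutoff, allow_in_district=False)   # out-of-district only
    run_pass(second_cutoff, allow_in_district=True)   # in- or out-of-district
    return assigned
```

Assignments made in the first pass are kept in the second, so the looser second-pass rules can only fill gaps and never displace an out-of-district match.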

The process for selecting substitutes for private schools was very similar, with the same definition of distance and the same cutoffs. The cells, however, were defined differently. For Catholic schools, the cells were the individual dioceses, and only within-diocese substitution was done.³ For non-Catholic private schools, the cells were based on private school stratum, state, urbanicity stratum, and private school type. Only one run was done for each cell: there was no distinction between "out-of-district" and "in-district" passes. Original twelfth-grade schools received substitutes first, followed by eighth-grade and then fourth-grade schools. A school could not serve as a substitute for more than one school (e.g., for both a twelfth-grade school and an eighth-grade school). This ordering "favors" twelfth-grade schools in receiving substitutes, as they tend to have lower response rates.

1 State-run public schools do not receive substitutes.
2 Substitutes that are out-of-district are preferred, as noncooperation tends to cluster in particular districts (due sometimes to decisions taken at the district level).
3 Nonresponse does not tend to cluster at the diocese level as it generally does at the district level; that is, decisions not to participate do not tend to be made at the diocese level.

Last updated 08 July 2008 (PE)
