NAEP Technical Documentation
Selection of Primary Sampling Units in the 2001 National Main Assessment

A sample of 94 primary sampling units (PSUs) was drawn for the 2001 assessment. Of the 94 PSUs, 22 were selected with certainty because they had the largest populations in the PSU universe.

Within each of the 72 noncertainty strata, one PSU was selected with probability proportional to its 1990 Census population. Selection with probability proportional to size (PPS) had the twin aims of obtaining approximately self-weighting samples of students and approximately equal workloads in each PSU. PSUs were drawn to minimize overlap from one assessment to the next, except that certainty PSUs were retained in each assessment year and some of the larger noncertainty PSUs were in the sample for more than one assessment year.

PSUs varied considerably in their probability of selection because they varied greatly in size, primarily as a result of using metropolitan statistical areas (MSAs) as PSUs. In 2001, the 36 selected noncertainty MSA PSUs had probabilities of selection ranging from 0.046 to 0.506, while the 36 selected non-MSA PSUs had probabilities ranging from 0.031 to 0.137.

Within each stratum, the order of the PSUs was randomized. Because the selection of PSUs within a stratum was not independent across survey years, ordering the PSUs by size, geography, or other variables could have resulted in unintended and possibly detrimental correlation between survey estimates across years. Since only one PSU is selected per stratum for a given year, the PSU ordering has no effect on sampling variance.

For each PSU within a stratum, a normalized measure of size, NM_ik, was calculated, where k is the PSU and i is the stratum. The cumulative count C_ik is calculated as:

$$C_{ik} = \sum_{j=1}^{k} NM_{ij}$$

where NM_ij is the normalized measure of size for the j^th PSU in the i^th stratum.
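
As an illustration, the cumulative counts for one stratum can be sketched as follows. The function name `cumulative_counts` and the populations are hypothetical. The sketch also assumes that the normalized measure is each PSU's measure of size divided by the stratum total, which the text implies but does not state explicitly.

```python
from itertools import accumulate


def cumulative_counts(measures_of_size):
    """Return (normalized measures, cumulative counts) for one stratum.

    Assumption: NM_ik = size_k / (sum of sizes in stratum i), so the
    cumulative counts rise to 1.0 at the last PSU in the stratum.
    """
    total = sum(measures_of_size)
    if total <= 0:
        raise ValueError("stratum total measure of size must be positive")
    normalized = [size / total for size in measures_of_size]
    cumulative = list(accumulate(normalized))  # C_ik = NM_i1 + ... + NM_ik
    cumulative[-1] = 1.0  # guard against floating-point drift in the last count
    return normalized, cumulative


# Hypothetical 1990 Census populations for four PSUs in one stratum,
# listed in their (already randomized) order.
nm, c = cumulative_counts([200_000, 300_000, 100_000, 400_000])
print([round(x, 3) for x in nm])  # [0.2, 0.3, 0.1, 0.4]
print([round(x, 3) for x in c])   # [0.2, 0.5, 0.6, 1.0]
```

Under this assumption, each normalized measure is also the PSU's probability of selection in a given year.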

A systematic sample was carried out using the measures, with five different sample designation numbers. These five sample designation numbers covered the five NAEP assessments between 1994 and 2001 (1994, 1996, 1998, 2000, 2001). The first sample designation number, denoted r, was selected as a random number between 0 and 1. This number was used for the NAEP 1994 assessment. Subsequent sample designation numbers were obtained by adding a fixed offset to the original r. The sample designation number for the NAEP 2001 assessment equaled r + 0.6. Whenever a number in the sequence exceeded 1.0000, only its fractional part was retained. For example, if r equals 0.626743, then r + 0.6 equals 1.226743, and 0.226743 becomes the sample designation number for NAEP 2001.
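
The wrap-around rule for the 2001 number can be sketched as a modulo operation. The function name `sample_designation_number` and the value of r used here are hypothetical, and the sketch assumes r lies in the half-open interval [0, 1).

```python
def sample_designation_number(r, offset=0.6):
    """Add the offset to r and keep only the fractional part.

    Assumption: r is drawn from [0, 1); offset=0.6 is the value the
    text gives for the 2001 assessment.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1")
    return (r + offset) % 1.0


# Hypothetical starting number: 0.75 + 0.6 = 1.35, so 0.35 is retained.
print(f"{sample_designation_number(0.75):.2f}")  # 0.35
```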

In each stratum, the first PSU whose cumulative count was equal to or greater than the fractional part of r + 0.6 was designated the 2001 main sample PSU.
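
A minimal sketch of this selection rule, using a binary search over one stratum's cumulative counts. The function name `select_psu` and the example values are hypothetical. The cumulative counts are assumed to be sorted in increasing order and to end at 1.0.

```python
from bisect import bisect_left


def select_psu(cumulative_counts, designation_number):
    """Return the index of the first PSU whose cumulative count is
    equal to or greater than the designation number."""
    index = bisect_left(cumulative_counts, designation_number)
    if index == len(cumulative_counts):
        raise ValueError("designation number exceeds the last cumulative count")
    return index


# Hypothetical stratum with four PSUs (0-based indices).
counts = [0.2, 0.5, 0.6, 1.0]
print(select_psu(counts, 0.35))  # 1, because 0.2 < 0.35 <= 0.5
print(select_psu(counts, 0.5))   # 1, because equality also selects the PSU
```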

The 2001 sample designation number was made a function of r in order to minimize the overlap among the sets of main sample PSUs chosen for the five NAEP assessments between 1994 and 2001 that involved PSU sampling. In strata with few PSUs, some PSUs had normalized measures of size large enough that they were drawn for two and sometimes even three survey years. Because the sample designation numbers for any two consecutive survey years were spaced 0.4 apart, the same PSU was unlikely to be selected in two consecutive survey years.
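
The effect of the 0.4 spacing can be illustrated with a small simulation of one stratum across the five assessments. This is a sketch under stated assumptions, not the procedure actually used. It assumes the n-th assessment in chronological order uses the fractional part of r + 0.4n, which agrees with the stated r + 0.6 for 2001 (r + 1.6 before wrap-around). The function names, normalized measures, and value of r are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

YEARS = [1994, 1996, 1998, 2000, 2001]
SPACING = 0.4  # spacing between consecutive survey years, per the text


def selections_by_year(normalized_measures, r):
    """Map each assessment year to the index of the PSU selected in one stratum."""
    cumulative = list(accumulate(normalized_measures))
    cumulative[-1] = 1.0  # guard against floating-point drift
    picks = {}
    for n, year in enumerate(YEARS):
        designation = (r + n * SPACING) % 1.0  # keep only the fractional part
        picks[year] = bisect_left(cumulative, designation)
    return picks


def consecutive_repeats(picks):
    """Return pairs of consecutive survey years that selected the same PSU."""
    years = sorted(picks)
    return [(a, b) for a, b in zip(years, years[1:]) if picks[a] == picks[b]]


picks = selections_by_year([0.2, 0.3, 0.1, 0.4], r=0.75)
print(picks)                       # {1994: 3, 1996: 0, 1998: 2, 2000: 3, 2001: 1}
print(consecutive_repeats(picks))  # []
```

In this example the largest PSU (normalized measure 0.4) is drawn in both 1994 and 2000, but never in two consecutive survey years. Under these assumptions, a consecutive repeat requires a normalized measure of at least 0.4.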

Last updated 08 May 2008 (MH)
