The target student sample size per jurisdiction for each operational assessment was 3,150. In fourth grade, the total sample size for a jurisdiction was 6,500, which includes the reading and mathematics assessments and 200 pilot-test students. In eighth grade, the total sample size for a jurisdiction participating in the state writing assessment was 9,750, which includes the reading, mathematics, and writing assessments and 300 pilot-test students. Jurisdictions not participating in the state writing assessment had a target sample size of 6,650, which includes the reading and mathematics assessments and 350 writing students sampled for the national writing sample. No students were sampled for the pilot test in states not participating in writing.
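As a quick check on the arithmetic, each total above appears to decompose into the per-subject operational target of 3,150 students times the number of operational subjects, plus the pilot-test (or national writing) students. A minimal sketch; the helper name is hypothetical:

```python
# Per-subject operational target per jurisdiction (from the text above).
OPERATIONAL_TARGET = 3_150


def jurisdiction_total(n_operational_subjects: int, additional_students: int) -> int:
    """Hypothetical helper: operational students across all subjects plus
    pilot-test or national-writing students."""
    return n_operational_subjects * OPERATIONAL_TARGET + additional_students


print(jurisdiction_total(2, 200))  # grade 4: reading + mathematics + pilot -> 6500
print(jurisdiction_total(3, 300))  # grade 8, state writing: three subjects + pilot -> 9750
print(jurisdiction_total(2, 350))  # grade 8, no state writing: two subjects + national writing -> 6650
```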
Students in California, Texas, and Florida were oversampled in order to facilitate analyses of the diverse populations in those states. The California sample size was doubled and the samples for Florida and Texas were one-and-a-half times the standard sample. Unlike in previous years, New York was not oversampled.
To increase the precision of estimates for charter schools outside the Trial Urban District Assessment (TUDA) jurisdictions, charter schools were oversampled in California, Texas, and New York.
The school samples were designed to have minimum overlap with the United States school sample for the Trends in International Mathematics and Science Study (TIMSS).
The general goal is to achieve a "self-weighting" sample at the student level; that is, as far as possible, every eligible student should have the same probability of selection. Differences in selection probabilities among students introduce unwanted design effects, which increase the variance of estimates and so reduce the marginal benefit of each added student.
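One common way to quantify the variance cost of unequal selection probabilities is Kish's approximate design effect due to weighting, deff = n * sum(w^2) / (sum(w))^2, which equals 1 + CV^2 of the weights. The sketch below uses that standard approximation purely as an illustration; it is not necessarily the measure used for NAEP, and the weights are hypothetical:

```python
from typing import Sequence


def kish_weighting_deff(weights: Sequence[float]) -> float:
    """Kish's approximate design effect from unequal weights:
    n * sum(w^2) / (sum(w))^2, which equals 1 when all weights are equal."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights: Sequence[float]) -> float:
    """Number of equally weighted cases giving roughly the same precision."""
    return len(weights) / kish_weighting_deff(weights)


equal = [10.0] * 6                                # self-weighting: every student has the same weight
unequal = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]    # hypothetical: half the students weighted twice as heavily
print(kish_weighting_deff(equal))                 # 1.0
print(kish_weighting_deff(unequal), effective_sample_size(unequal))
```

With half the students weighted twice as heavily, the design effect is 10/9 (about 1.11), so the six students carry roughly the information of 5.4 equally weighted ones.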
When all students in a grade are taken in each sampled school, a self-weighting sample results from using a fixed probability of selection across schools: each student in the grade then has a probability of selection equal to the school's probability of selection, which is the same for every school. When a fixed number of students (e.g., six) is taken from the selected grade in each sampled school, a self-weighting sample is achieved by drawing a probability-proportional-to-size sample of schools, with each school's measure of size equal to its number of grade-eligible students divided by a constant chosen so that the measures of size sum to the school sample size. Each student then has a conditional (within-school) probability of selection equal to the fixed number of students divided by the school's number of eligible students; multiplying this by the school's probability of selection cancels the enrollment term and again gives equal unconditional probabilities of selection for students across schools.
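The cancellation in the fixed-sample-size case can be checked numerically. A minimal sketch, assuming hypothetical enrollments, a hypothetical school sample size, and every school's measure of size at most one (no certainty schools):

```python
from typing import List, Sequence


def unconditional_student_probs(
    enrollments: Sequence[int], n_school_hits: int, students_per_school: int
) -> List[float]:
    """Student selection probabilities under PPS school sampling followed by
    a fixed take of `students_per_school` students in each sampled school."""
    # Constant chosen so that the measures of size sum to the school sample size.
    c = sum(enrollments) / n_school_hits
    probs = []
    for m_i in enrollments:
        school_prob = m_i / c                             # measure of size; assumed <= 1
        within_school_prob = students_per_school / m_i    # assumes m_i >= students_per_school
        probs.append(school_prob * within_school_prob)    # m_i cancels: students_per_school / c
    return probs


# Hypothetical frame of four schools, two school hits, six students per school.
print(unconditional_student_probs([100, 200, 300, 400], n_school_hits=2, students_per_school=6))
```

Every student ends up with probability 6 / 500 = 0.012 (up to floating-point rounding), regardless of school size.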
There is also a need to reduce the expected number of very small schools in the sample, because the marginal cost of each assessed student is higher in these schools. These very small schools are sampled at half the rate of larger schools, and their weights are doubled to account for the half-sampling.
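A minimal sketch of the half-sampling adjustment, assuming a hypothetical enrollment cutoff for "very small" and a school base weight equal to the inverse of the selection probability. Halving the probability is what doubles the weight:

```python
def school_selection_prob(
    enrollment: int, total_enrollment: int, n_school_hits: int, small_cutoff: int
) -> float:
    """PPS school probability, halved for schools below a hypothetical
    'very small' enrollment cutoff."""
    prob = n_school_hits * enrollment / total_enrollment
    if enrollment < small_cutoff:
        prob /= 2  # sample very small schools at half the usual rate
    return min(prob, 1.0)  # cap at certainty


def school_base_weight(prob: float) -> float:
    """Base weight = inverse of the selection probability."""
    return 1.0 / prob


full = school_selection_prob(8, 10_000, 50, small_cutoff=0)     # cutoff of 0: no half-sampling
half = school_selection_prob(8, 10_000, 50, small_cutoff=20)    # 8 < 20: half-sampled
print(school_base_weight(full), school_base_weight(half))       # second weight is twice the first
```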
Schools were ordered within each jurisdiction using a serpentine sort (by TUDA/urbanicity status, race/ethnicity status, and achievement score or ZIP Code area median income). Next, a systematic sample was drawn with probability proportional to the measures of size, using a sampling interval of one. We refer to sampled schools as being "hit" in the sampling process.
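A serpentine sort alternates the sort direction of each lower-level variable from one group to the next, so that schools adjacent across a group boundary have similar values. The sketch below is one common way to implement it, not necessarily the exact NAEP procedure; the field names stand in for the TUDA/urbanicity, race/ethnicity, and achievement/income variables and are hypothetical:

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def serpentine_sort(records: Sequence[Record], keys: Sequence[str]) -> List[Record]:
    """Hierarchical serpentine sort. The first key is sorted ascending; each
    lower-level key flips direction every time a new group at that level
    starts, continuing across higher-level group boundaries."""
    next_ascending = [True] * len(keys)  # direction for the next group at each depth

    def sort_level(group: List[Record], depth: int) -> List[Record]:
        if depth == len(keys):
            return group
        key = keys[depth]
        ascending = next_ascending[depth]
        next_ascending[depth] = not ascending  # the next group at this depth goes the other way
        ordered = sorted(group, key=lambda r: r[key], reverse=not ascending)
        result: List[Record] = []
        for _, subgroup in groupby(ordered, key=lambda r: r[key]):
            result.extend(sort_level(list(subgroup), depth + 1))
        return result

    return sort_level(list(records), 0)


# Hypothetical schools; field names are illustrative only.
schools = [
    {"id": "A", "urbanicity": 1, "minority_band": 1, "achievement": 230},
    {"id": "B", "urbanicity": 1, "minority_band": 2, "achievement": 210},
    {"id": "C", "urbanicity": 1, "minority_band": 1, "achievement": 250},
    {"id": "D", "urbanicity": 2, "minority_band": 1, "achievement": 245},
    {"id": "E", "urbanicity": 2, "minority_band": 2, "achievement": 215},
    {"id": "F", "urbanicity": 2, "minority_band": 2, "achievement": 205},
]
order = serpentine_sort(schools, ["urbanicity", "minority_band", "achievement"])
print([s["id"] for s in order])  # ['A', 'C', 'B', 'F', 'E', 'D']
```

In the result, the high-minority schools of the two urbanicity groups sit next to each other (B, then F), and their achievement values are close, which is the point of alternating directions.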
Some larger schools had measures of size greater than one. With a sampling interval of one, such schools were certain to be selected and could be sampled more than once (i.e., they could have multiple "hits"), in which case a larger sample of students was selected from them.
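The systematic selection with a sampling interval of one can be sketched as follows: measures of size are cumulated in sorted frame order, a random start r in [0, 1) is drawn, and each selection point r, r + 1, r + 2, ... that falls in a school's cumulative range counts as one hit. This is a minimal sketch with hypothetical names; it shows why a school whose measure of size exceeds one is always selected and receives either the floor or the ceiling of its measure of size in hits:

```python
import math
import random
from typing import List, Optional, Sequence


def systematic_pps_hits(
    measures_of_size: Sequence[float], start: Optional[float] = None
) -> List[int]:
    """Systematic PPS selection with a sampling interval of one.

    `measures_of_size` must already be in the (serpentine) sort order.
    Returns the number of hits for each school."""
    if start is None:
        start = random.random()  # random start in [0, 1)
    hits = []
    cumulative = 0.0
    for size in measures_of_size:
        low, high = cumulative, cumulative + size
        # Count selection points start + k (k = 0, 1, 2, ...) with low <= point < high.
        hits.append(math.ceil(high - start) - math.ceil(low - start))
        cumulative = high
    return hits


print(systematic_pps_hits([0.5, 1.5, 0.25, 0.75], start=0.2))  # [1, 1, 1, 0]
print(systematic_pps_hits([0.25, 2.5, 0.25], start=0.5))       # [0, 3, 0]
```

With a start of 0.1 instead, the middle school receives 2 hits rather than 3; its number of hits is always 2 or 3, and the total number of hits always equals the sum of the measures of size (3 here).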
The goal of deeply stratifying the school sample in each jurisdiction was to reflect the population distribution as closely as possible, thereby minimizing sampling error. The success of this approach was shown by comparing the original frame with the school sample on the proportions of race/ethnicity groups enrolled in schools (based on Common Core of Data values for each school), median income, and type of location (treated as an interval variable).
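A frame-versus-sample comparison of this kind can be sketched as the frame mean of a school-level characteristic set against the base-weighted mean of the same characteristic among the sampled schools. This is a minimal illustration; weighting each sampled school by its base weight, and all values shown, are assumptions rather than a description of the exact NAEP computation:

```python
from typing import Sequence


def frame_mean(values: Sequence[float]) -> float:
    """Unweighted mean over every school in the sampling frame."""
    return sum(values) / len(values)


def weighted_sample_mean(values: Sequence[float], base_weights: Sequence[float]) -> float:
    """Base-weighted mean over sampled schools; each weight is the inverse
    of that school's selection probability."""
    return sum(v * w for v, w in zip(values, base_weights)) / sum(base_weights)


# Hypothetical percent Hispanic enrollment for a four-school frame...
frame_pct_hispanic = [10.0, 20.0, 30.0, 40.0]
# ...and for two sampled schools, each with a base weight of 2.
sample_pct_hispanic = [10.0, 40.0]
sample_weights = [2.0, 2.0]

print(frame_mean(frame_pct_hispanic))                             # 25.0
print(weighted_sample_mean(sample_pct_hispanic, sample_weights))  # 25.0
```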
In addition, for jurisdictions with available state assessment achievement data, the distribution of state assessment achievement scores in the original frame can be compared with that in the school sample, as was done in the evaluation of state achievement data in the sampling frame. The number of significant differences found in this analysis is smaller than would be expected by chance, given the large number of comparisons made. The small number of significant differences may be partly explained by the omission of a finite population correction factor from the calculation of the sampling variances. Even so, the close agreement between sample and frame values provides little evidence that the NAEP 2007 school sample is unrepresentative of the frame from which it was selected. The achievement/median income variable is the fourth-level sort variable in the systematic school selection procedure. Although it is a relatively low-level sort variable, it still helps control how representative the sampled schools are in terms of achievement. The close agreement between frame and sample values of these achievement/median income variables provides assurance that the selected sample is representative of the frame with respect to achievement status.
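The effect of omitting the finite population correction can be illustrated with its simple-random-sampling form, 1 - n/N. This is an illustration under that simplifying assumption (variance estimation for NAEP's complex design differs), with hypothetical numbers. Dropping the factor overstates the variance, which shrinks the test statistic and so makes significant differences less likely:

```python
import math


def z_statistic(sample_mean: float, frame_mean: float, variance_without_fpc: float,
                n: int, N: int, use_fpc: bool) -> float:
    """Test statistic for a sample-versus-frame difference, with or without
    the simple-random-sampling finite population correction (1 - n/N)."""
    fpc = (1 - n / N) if use_fpc else 1.0
    return (sample_mean - frame_mean) / math.sqrt(variance_without_fpc * fpc)


# Hypothetical: 100 schools sampled from a frame of 400, variance of 1.0 without the FPC.
z_without = z_statistic(26.8, 25.0, 1.0, n=100, N=400, use_fpc=False)
z_with = z_statistic(26.8, 25.0, 1.0, n=100, N=400, use_fpc=True)
print(z_without, z_with)  # about 1.80 vs. about 2.08: only the corrected value exceeds 1.96
```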