The general target for each jurisdiction participating in all three assessments, including science, was 9,450 sampled students per grade. Jurisdictions participating only in the mathematics and reading assessments had a target of 6,300 per grade. These targets were designed to yield, on average, 30 students per subject per school in the selected grade for each subject area (mathematics, reading, and science). The minimum target for each sampled school was 60 students per grade for mathematics and reading, or 90 students for all three subjects, corresponding roughly to two (or three) assessment sessions per school. Puerto Rico, where students were assessed in mathematics only, was an exception; its target was 3,150 students per grade, with 30 students per school.
California, Texas, New York, and Florida all received larger sample sizes than in NAEP 2003 in order to facilitate analyses of the diverse populations in those states. Specifically, the California sample size was tripled, Texas was doubled, and New York and Florida each received a 50 percent increase over the 2003 sample size.
The general goal is to achieve a self-weighting sample at the student level; that is, as far as possible, every eligible student should have the same probability of selection. Differences in selection probabilities among students introduce unwanted design effects, which increase the variance of estimates (reducing the marginal benefit of each added student).
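The variance inflation from unequal weights can be illustrated with Kish's well-known approximation, deff ≈ 1 + cv² of the weights. This is a sketch, not a calculation from the source; the weights below are invented for illustration.

```python
# Kish's approximation (an illustration, not a NAEP computation):
# unequal selection probabilities produce unequal weights, which
# inflate variance by roughly deff = 1 + (cv of the weights)^2.

def kish_deff(weights):
    """Approximate design effect from weight variation: 1 + cv^2."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    return 1.0 + var / mean ** 2

print(kish_deff([1.0, 1.0, 1.0, 1.0, 1.0]))  # equal weights -> 1.0 (no inflation)
print(kish_deff([1.0, 1.0, 4.0, 1.0, 1.0]))  # unequal weights -> greater than 1.0
```

With equal weights the design effect is exactly 1; the more one weight departs from the rest, the larger the inflation, which is why a self-weighting design is preferred.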
When all students in a grade are taken in each sampled school, a self-weighting sample results from setting a fixed probability of selection across schools: each student in the grade then has a probability of selection equal to the school's probability of selection, which is the same for all schools. When a fixed number of students (e.g., six) is taken in the selected grade of each sampled school, a self-weighting sample is achieved by selecting schools with probability proportional to size, where size is the number of grade-eligible students listed on the frame. Each student then has a conditional probability of selection that, when multiplied by the school's probability of selection, again yields equal unconditional probabilities of selection for students across schools.
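The second case can be verified numerically. In this sketch the enrollments, number of sampled schools, and fixed student take are all hypothetical; the point is that the product of the PPS school probability and the conditional within-school probability is the same for every school.

```python
# Hypothetical sketch: PPS school selection combined with a fixed
# within-school student sample yields equal unconditional student
# selection probabilities. Enrollments and sample sizes are invented.

enrollments = [40, 120, 300, 340]  # grade-eligible students per frame school
n_schools = 2                      # schools to be sampled
m_students = 6                     # fixed student take per sampled school

total = sum(enrollments)
probs = []
for size in enrollments:
    p_school = n_schools * size / total          # PPS school probability
    p_student_given_school = m_students / size   # conditional probability
    probs.append(p_school * p_student_given_school)

print(probs)  # every entry equals n_schools * m_students / total
```

Every unconditional probability collapses to n_schools × m_students / total, independent of school size, which is exactly the self-weighting property described above.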
There is also a need to lower the expected number of very small schools in the sample, as the marginal cost per assessed student in these schools is higher. These very small schools are sampled at half the rate of larger schools, and their weights are doubled to account for the half-sampling.
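The weight adjustment follows directly from inverse-probability weighting: halving a school's selection rate doubles its base weight. The helper below is a hypothetical sketch of that bookkeeping, not the NAEP weighting code.

```python
# Hypothetical sketch of the half-sampling adjustment: a very small
# school's effective selection probability is half its nominal rate,
# so its base weight (the inverse of its selection probability) is doubled.

def school_base_weight(nominal_prob: float, very_small: bool) -> float:
    """Inverse-probability base weight; halving the rate doubles the weight."""
    prob = nominal_prob / 2.0 if very_small else nominal_prob
    return 1.0 / prob

print(school_base_weight(0.1, very_small=False))  # regular school
print(school_base_weight(0.1, very_small=True))   # half-sampled small school
```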
Schools were ordered within each jurisdiction using a serpentine sort (by TUDA/urbanicity status, race/ethnicity status, and achievement score or ZIP Code area median income). Next, a systematic sample was drawn with probability proportional to the measures of size, using a sampling interval of one. NAEP sample design staff refer to sampled schools as being "hit."
Some larger schools had measures of size greater than one; these schools could be sampled more than once (i.e., they could receive multiple "hits").
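Interval-1 systematic PPS selection can be sketched as follows: with measures of size scaled to sum to the desired number of hits, selection points fall at start, start + 1, start + 2, …, and a school whose measure of size exceeds one necessarily spans more than one selection point. The size measures and random start below are invented for illustration.

```python
import math

# Hypothetical sketch of systematic PPS selection with a sampling
# interval of one: a school is "hit" once for each selection point
# (start, start + 1, start + 2, ...) that falls inside its cumulative
# size range. A measure of size above one can yield multiple hits.

def systematic_pps_hits(size_measures, start):
    """Count hits per school for a random start in [0, 1)."""
    hits = []
    cum_prev = 0.0
    for size in size_measures:
        cum = cum_prev + size
        # number of points start + k falling in [cum_prev, cum)
        hits.append(math.floor(cum - start) - math.floor(cum_prev - start))
        cum_prev = cum
    return hits

sizes = [0.5, 2.3, 0.7, 1.5]            # sum to 5.0 -> five hits in total
print(systematic_pps_hits(sizes, 0.4))  # the size-2.3 school gets two hits
```

Note that the school with measure of size 2.3 receives at least two hits for any random start, which is the "multiple hits" situation described above.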
The goal of deeply stratifying the school sample in each jurisdiction was to reflect the population distribution as closely as possible, thereby minimizing sampling error. The success of this approach was assessed by comparing the racial/ethnic composition of enrolled students (Common Core of Data [CCD] values for each school), median income, and type of location (treated as an interval variable) in the original frame against the school sample. In addition, the distribution of state assessment achievement scores in the original frame can be compared with that of the school sample for jurisdictions with available state achievement data, as was done in the evaluation of state achievement data in the sampling frame.
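A frame-versus-sample comparison of this kind amounts to contrasting an enrollment-weighted frame statistic with the corresponding weighted sample statistic. The sketch below uses invented school records and weights purely to show the shape of the check; it is not the NAEP evaluation itself.

```python
# Hypothetical sketch of a frame-versus-sample check: compare the
# frame's enrollment-weighted composition for one characteristic
# against the weighted sample's. All records are invented.

frame = [  # (proportion_minority, enrollment) per frame school
    (0.10, 200), (0.40, 350), (0.70, 150), (0.25, 300),
]
sample = [  # (proportion_minority, enrollment, sampling_weight) per sampled school
    (0.40, 350, 2.0), (0.25, 300, 2.0),
]

def weighted_proportion(records):
    """Weighted mean of a school-level proportion over student counts."""
    num = sum(p * n * w for p, n, w in records)
    den = sum(n * w for _, n, w in records)
    return num / den

frame_pct = weighted_proportion([(p, n, 1.0) for p, n in frame])
sample_pct = weighted_proportion(sample)
print(frame_pct, sample_pct)  # close agreement indicates a representative sample
```

In practice each frame characteristic (race/ethnicity, median income, type of location, achievement scores) would get its own comparison of this form.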
For quality control purposes, school and student counts from the sampling frame were compared to school and student estimates from the sample. No major issues were found.