The initial measure of size MOS_{js} for school s in jurisdiction j is set as follows as a function of x_{js}, the total number of students in school js in the fourth (eighth) grade. This assignment is conditional: each of the four pairs following the equal sign corresponds to a condition on x_{js} and a functional relationship between MOS_{js} and x_{js}, which is used to assign MOS_{js} under that condition. For example, schools with enrollment x_{js} greater than or equal to 65 have their MOS_{js} set equal to x_{js}.
Putting aside the very small schools (x_{js} less than 20), the MOS is constant for schools with smaller enrollment (up to 64), and proportional to x_{js} for schools with larger enrollment (65 and above), as outlined in the prescription for getting a self-weighting student sample. For the schools with x_{js} less than or equal to 10, there is also a constant MOS, but half that of the "medium" schools (x_{js} from 20 to 64). The students in these schools are selected at half the rate of those in the medium schools, as desired. Schools with between 10 and 20 students have a sliding factor of 1/2 to 1 in relative probabilities.
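The four-case assignment described above can be sketched in Python. This is only an illustration under stated assumptions: the breakpoints (10, 20, 65) come from the text, but the constant for medium schools (`medium_mos`, set to 60 here to match the per-hit student sample size) and the linear form of the slide between 10 and 20 students are assumptions, and the function name is hypothetical.

```python
def measure_of_size(x, medium_mos=60.0):
    """Initial measure of size for a school with grade enrollment x.

    medium_mos is an ASSUMED constant for schools with 20 to 64 students;
    the text gives the breakpoints (10, 20, 65) but not this value.
    """
    if x >= 65:
        return float(x)               # proportional to enrollment
    if x >= 20:
        return medium_mos             # constant for medium schools
    if x > 10:
        # ASSUMED linear slide from half to full medium_mos over (10, 20)
        return medium_mos * x / 20.0
    return medium_mos / 2.0           # half the medium-school constant


for enrollment in (5, 10, 15, 40, 64, 65, 200):
    print(enrollment, measure_of_size(enrollment))
```

With these assumed values, a school with 15 students gets a relative factor of 0.75 compared with a medium school, which falls inside the 1/2 to 1 range described for schools with between 10 and 20 students.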
The probability of selection for each school is essentially this MOS_{js} multiplied by a constant of proportionality b_{j} which is carefully calculated for each jurisdiction j and grade. The proportionality constant b_{j} is computed to achieve the target of T_{j} equal to 6,300 as closely as possible, under a maximum burden constraint which is defined below.
The sample selection plan does allow for "multiple hits" for large schools. Write H_{js} for the random variable giving the number of hits (H_{js} = 0, 1, 2, 3, . . .). Schools for which H_{js} is 0 are not included in the school sample. If H_{js} ≥ 1, the school is sampled. The desired student sample size is then essentially 60*H_{js} (60 for one-hit schools, 120 for two-hit schools, etc.). There is, however, an added twist on this formula. The final student sample size y_{js} for each sampled school is computed conditionally according to the following function:
The A( ) function can be called the "almost all" function: when a school’s target population x_{js} is only slightly larger than the desired sample size 60*H_{js} (by a factor of from 1 to 65/60), the sample size is increased to equal the target population. This increase avoids samples that exclude only a handful of students in the grade, something that schools tend to prefer to avoid. For example, if two hits (H_{js} = 2) have been sampled, the desired sample size is 120. If the school has fewer than 130 students, all students are taken. If the school has 130 or more students, exactly 120 students are sampled.
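A minimal sketch of the "almost all" rule follows. The function name and parameter names are hypothetical, and the treatment of the exact boundary (enrollment equal to 65 per hit yields the plain 60 per hit) is an assumption taken from the worked example, in which a two-hit school with 130 or more students has exactly 120 sampled.

```python
def final_sample_size(hits, enrollment, per_hit=60, take_all_below=65):
    """Student sample size for a school with the given number of hits.

    Schools with zero hits are not sampled. For sampled schools, all
    students are taken when enrollment is below take_all_below * hits;
    otherwise exactly per_hit * hits students are sampled.
    The strict inequality at the boundary is an ASSUMPTION based on the
    worked example in the text.
    """
    if hits < 1:
        return 0
    if enrollment < take_all_below * hits:
        return enrollment
    return per_hit * hits


print(final_sample_size(2, 125))  # below the cutoff of 130: all 125 taken
print(final_sample_size(2, 130))  # at or above the cutoff: 120 sampled
```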
The next step is to define the sampling distribution of H_{js}, which essentially defines the school and student sample design. Let E_{js} = E(H_{js}), the expected number of hits. Write [E_{js}] for the integer part of E_{js} (i.e., the largest integer not exceeding E_{js}). Define π_{js} as the probability of selection of the school into the sample (i.e., π_{js} equals the probability that H_{js} ≥ 1). The relationship between H_{js}, E_{js}, and π_{js} is as follows in all NAEP school samples. This relationship is conditional on the value of E_{js}. There are three possible cases (E_{js} less than 1, E_{js} equal to an integer, E_{js} a noninteger greater than 1), with varying definitions of π_{js} and H_{js} for each case. In the first case, if 0 < E_{js} < 1, then H_{js} is random and equals 0 (with probability 1 - E_{js}) or 1 (with probability E_{js}), and π_{js} = E_{js}. In the second case, if E_{js} is an integer n = 1, 2, . . ., then H_{js} equals n with certainty and π_{js} = 1. In the third case (E_{js} is a noninteger greater than 1), suppose n < E_{js} < n+1, where n = [E_{js}] is an integer greater than or equal to 1. Then H_{js} is random and equals n with probability 1 - (E_{js} - [E_{js}]), and equals n+1 with probability E_{js} - [E_{js}], so that E(H_{js}) = E_{js}. In this case π_{js} is equal to 1.
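The three cases can be collapsed into one rule: the number of hits is the integer part of the expected number of hits, plus one extra hit with probability equal to the fractional part. The sketch below illustrates this; the function names are hypothetical and the code is not taken from the NAEP system.

```python
import math
import random


def hit_distribution(expected_hits):
    """Return {number_of_hits: probability} with mean expected_hits."""
    if expected_hits <= 0:
        raise ValueError("expected_hits must be positive")
    n = math.floor(expected_hits)      # integer part
    frac = expected_hits - n           # fractional part
    if frac == 0:
        return {n: 1.0}                # integer case: n hits with certainty
    return {n: 1.0 - frac, n + 1: frac}


def selection_probability(expected_hits):
    """Probability that the school receives at least one hit."""
    return min(expected_hits, 1.0)


def draw_hits(expected_hits, rng=random):
    """Draw one realization of the number of hits."""
    n = math.floor(expected_hits)
    return n + (1 if rng.random() < expected_hits - n else 0)


for e in (0.25, 2.0, 2.25):
    dist = hit_distribution(e)
    mean = sum(h * p for h, p in dist.items())
    print(e, dist, mean, selection_probability(e))
```

In each case the mean of the distribution equals the expected number of hits, which is the property the three-case definition is built to satisfy.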
Note that the sampling distribution of the random variable H_{js} is now fully defined if E_{js} is defined for each school. The following formula defines E_{js}:
The quantity u_{j} in this formula is designed to put an upper bound on the burden for the sampled schools. In most jurisdictions, u_{j} was set to 3. In smaller jurisdictions, u_{j} was set to a larger value to allow for a larger expected student sample size. The largest value of u_{j} was 8 for the jurisdiction Alaska.
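The formula itself is not reproduced in this text. One form consistent with the surrounding description (expected hits proportional to MOS_{js} through b_{j}, capped at u_{j}) is sketched below. Both the min( ) form and the function name are assumptions, not a quotation of the original formula.

```python
def expected_hits(mos, b, u=3.0):
    """ASSUMED form: expected hits proportional to MOS, capped at u.

    mos : initial measure of size of the school
    b   : jurisdiction-level constant of proportionality
    u   : upper bound on expected hits (3 in most jurisdictions)
    """
    return min(b * mos, u)


print(expected_hits(4, 0.25))           # below the cap
print(expected_hits(100, 0.25))         # capped at the default u = 3
print(expected_hits(100, 0.25, u=8.0))  # larger cap, as in a small jurisdiction
```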
The last task in this development is to define how b_{j} is computed for each jurisdiction. This task is done in an iterative fashion. In the k-th iteration, E_{js}(k) values are computed for each school based on an intermediate value b_{j}(k). This computation defines a distribution for H_{js}, and thus for y_{js}; the overall expected yield of students is then
T_{j}(k) = Σ_{s ∈ S_{j}} E_{k}(y_{js}),
with S_{j} the set of schools in the jurisdiction j frame, and E_{k} indicating expectations using b_{j}(k). Then T_{j}(k) is compared with the desired target T_{j}. If T_{j}(k) does not equal T_{j} (after rounding to an integer), the (k+1)-th iteration is carried out, computing b_{j}(k+1) as some value in the interval (b_{j}(k), b_{j}(k)*T_{j}/T_{j}(k)). The process continues iteratively until a b_{j}(K) is found that satisfies
T_{j}(K) = T_{j}
after rounding to integer values.
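The iteration can be sketched end to end in Python. Everything here is illustrative and rests on assumptions labeled in the comments: the medium-school constant of 60, the linear slide for schools with 10 to 20 students, the capped form min(b * MOS, u) for the expected hits, and the choice of the interval midpoint for the next value of b (the text says only "some value in the interval"). The frame and target are made-up toy numbers, and all function names are hypothetical.

```python
import math


def mos(x, medium=60.0):
    """Measure of size; the constant 60 and the linear slide are ASSUMED."""
    if x >= 65:
        return float(x)
    if x >= 20:
        return medium
    if x > 10:
        return medium * x / 20.0
    return medium / 2.0


def expected_yield(x, b, u=3.0):
    """Expected student sample size for one school with enrollment x."""
    e = min(b * mos(x), u)            # ASSUMED capped form of expected hits
    n = math.floor(e)
    frac = e - n
    total = 0.0
    # Hits equal n with probability 1 - frac and n + 1 with probability frac.
    for hits, prob in ((n, 1.0 - frac), (n + 1, frac)):
        if hits >= 1 and prob > 0:
            # "Almost all" rule: take everyone if enrollment < 65 per hit.
            y = x if x < 65 * hits else 60 * hits
            total += prob * y
    return total


def total_yield(enrollments, b, u=3.0):
    """Overall expected yield of students across the frame."""
    return sum(expected_yield(x, b, u) for x in enrollments)


def solve_b(enrollments, target, b0=1e-3, u=3.0, max_iter=200):
    """Iterate on b until the rounded expected yield equals the target."""
    b = b0
    for _ in range(max_iter):
        t = total_yield(enrollments, b, u)
        if t <= 0:
            raise ValueError("expected yield is zero; check frame and b0")
        if round(t) == round(target):
            return b
        # ASSUMED choice: midpoint of the interval (b, b * target / t).
        b = 0.5 * (b + b * target / t)
    raise RuntimeError("no b found; target may be unreachable under cap u")


# Toy frame: 50 schools of 30 students, 50 of 100, and 20 of 300.
frame = [30] * 50 + [100] * 50 + [300] * 20
b = solve_b(frame, target=600)
print(b, total_yield(frame, b))
```

The midpoint rule moves b toward the proportional correction b*T/T(k) without overshooting it, so the expected yield approaches the target from one side. If the burden cap u makes the target unreachable, the expected yield stops growing and the sketch raises an error after `max_iter` iterations rather than looping forever.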