Computation of Measures of Size

The initial measure of size MOSjs for school s in jurisdiction j is set as follows, as a function of xjs, the total number of students in the fourth (eighth) grade in school js. The assignment is conditional: each of the four pairs following the equal sign corresponds to a condition on xjs and a functional relationship between MOSjs and xjs that is used to assign MOSjs under that condition. For example, schools with enrollment xjs greater than or equal to 65 have MOSjs set equal to xjs.

Putting aside the very small schools (xjs less than 20), the MOS is constant for schools with smaller enrollment (20 to 64) and proportional to xjs for schools with larger enrollment (65 and above), as prescribed for obtaining a self-weighting student sample. Schools with xjs less than or equal to 10 also have a constant MOS, but half that of the "medium" schools (xjs from 20 to 64), so their students are selected at half the rate of students in medium schools, as desired. Schools with more than 10 but fewer than 20 students have a sliding factor of 1/2 to 1 in relative probabilities.
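The four-case rule can be sketched in Python as below. This is a minimal illustration, not NAEP's production code: the function name `measure_of_size` and the parameter `medium_mos` are hypothetical, `medium_mos` stands in for the constant MOS of medium schools (its actual value is not reproduced in this text), and the sliding factor for 11 to 19 students is assumed to be linear in xjs (a factor of xjs/20).

```python
def measure_of_size(x: int, medium_mos: float) -> float:
    """Initial measure of size MOS_js for a school with x students in the grade.

    medium_mos is a hypothetical stand-in for the constant MOS of "medium"
    schools (20-64 students); its NAEP value is not assumed here.
    """
    if x < 0:
        raise ValueError("enrollment cannot be negative")
    if x >= 65:
        # Large schools: MOS equal to enrollment.
        return float(x)
    if x >= 20:
        # Medium schools: constant MOS.
        return float(medium_mos)
    if x > 10:
        # 11-19 students: relative factor slides from 1/2 to 1.
        # ASSUMPTION: the slide is linear in x, i.e. a factor of x / 20.
        return medium_mos * x / 20
    # 10 or fewer students: half the medium-school MOS.
    return medium_mos / 2


# Illustrative only: medium_mos=60 is an assumed value.
for x in (5, 10, 15, 20, 64, 65, 300):
    print(x, measure_of_size(x, medium_mos=60))
```

A linear slide is the simplest choice that joins the half-size level at 10 students to the full medium level at 20 students without a jump; the true NAEP formula may differ.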

The probability of selection for each school is essentially MOSjs multiplied by a constant of proportionality bj, which is calculated separately for each jurisdiction j and grade. The constant bj is computed so that the expected student sample size comes as close as possible to the target Tj = 6,300, subject to a maximum burden constraint defined below.

The sample selection plan allows "multiple hits" for large schools. Let Hjs be the random variable giving the number of hits on school js (Hjs = 0, 1, 2, 3, . . .). Schools with Hjs = 0 are not included in the school sample; if Hjs >= 1, the school is sampled. The desired student sample size is then essentially 60*Hjs (60 for one-hit schools, 120 for two-hit schools, and so on). There is, however, an added twist: the final student sample size yjs for each sampled school is computed conditionally according to the following function:

$$
y_{js} = A\left(x_{js},\, 60H_{js}\right),
\qquad
A(x, n) =
\begin{cases}
x, & \text{if } x < \tfrac{65}{60}\, n,\\[2pt]
n, & \text{if } x \ge \tfrac{65}{60}\, n.
\end{cases}
$$

The A( ) function can be called the "almost all" function: when a school's target population xjs is only slightly larger than the desired sample size 60*Hjs (by a factor between 1 and 65/60), the sample size is increased to equal the target population. This avoids samples that exclude only a handful of students in the grade, which schools tend to prefer to avoid. For example, if two hits (Hjs = 2) have been sampled, the desired sample size is 120. If the school has fewer than 130 students, all students are taken; if it has 130 or more students, exactly 120 students are sampled.
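A small sketch of the "almost all" rule follows. The names `almost_all`, `student_sample_size`, and `STUDENTS_PER_HIT` are hypothetical, and the strict inequality at the 65/60 cutoff is an assumption chosen to match the two-hit example (130 or more students gives exactly 120).

```python
STUDENTS_PER_HIT = 60  # desired students per hit, from the text


def almost_all(x: int, n: int) -> int:
    """The "almost all" function A(x, n).

    Take all x students when x < (65/60) * n; otherwise take exactly n.
    Comparing 60*x < 65*n in integers avoids floating-point error at the cutoff.
    ASSUMPTION: the boundary x == (65/60) * n falls in the "take n" branch.
    """
    return x if 60 * x < 65 * n else n


def student_sample_size(x: int, hits: int) -> int:
    """Final student sample size y_js for a school with x students and `hits` hits.

    With hits == 0 the desired size is 0, so an unsampled school yields 0.
    """
    return almost_all(x, STUDENTS_PER_HIT * hits)


# Two hits: desired size 120, with the cutoff at 130 students.
for x in (100, 129, 130, 500):
    print(x, student_sample_size(x, hits=2))
```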

The next step is to define the sampling distribution of Hjs, which essentially defines the school and student sample design. Let Ejs = E(Hjs) be the expected number of hits, and write [Ejs] for the integer part of Ejs (i.e., the largest integer less than or equal to Ejs). Define πjs as the probability that the school is selected into the sample, that is, πjs = P(Hjs ≥ 1). In all NAEP school samples, Hjs, Ejs, and πjs are related as follows, with three cases depending on the value of Ejs (less than 1, equal to an integer, or a noninteger greater than 1):

In the first case, if 0 < Ejs < 1, then Hjs is random and equals 0 (with probability 1 − Ejs) or 1 (with probability Ejs), and πjs = Ejs.

In the second case, if Ejs is an integer n = 1, 2, . . ., then Hjs equals n with certainty and πjs = 1.

In the third case, if Ejs is a noninteger greater than 1, suppose n < Ejs < n + 1, where n = [Ejs] is an integer greater than or equal to 1. Then Hjs is random and equals n with probability 1 − (Ejs − [Ejs]) and n + 1 with probability Ejs − [Ejs], so that E(Hjs) = Ejs; πjs is equal to 1.
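All three cases reduce to one rule: Hjs takes the value [Ejs] or [Ejs] + 1, with P(Hjs = [Ejs] + 1) = Ejs − [Ejs]. The sketch below encodes that rule and checks that the mean equals Ejs; the function names are hypothetical and it is an illustration, not NAEP's selection software (which would also need a systematic or other frame-ordered draw rather than independent random numbers).

```python
import math
import random


def hits_distribution(e: float) -> dict[int, float]:
    """Distribution of H_js given E_js = e (e >= 0).

    H takes n = floor(e) or n + 1, with P(H = n + 1) = e - n, so E[H] = e.
    This single rule covers all three cases in the text.
    """
    if e < 0:
        raise ValueError("expected hits cannot be negative")
    n = math.floor(e)
    frac = e - n
    if frac == 0:
        return {n: 1.0}  # integer E_js: exactly n hits
    return {n: 1.0 - frac, n + 1: frac}


def selection_probability(e: float) -> float:
    """pi_js = P(H_js >= 1): equals e when e < 1, otherwise 1."""
    return min(e, 1.0)


def sample_hits(e: float, rng: random.Random) -> int:
    """Draw H_js independently: n + 1 with probability e - n, otherwise n."""
    n = math.floor(e)
    return n + 1 if rng.random() < e - n else n


print(hits_distribution(2.25))  # {2: 0.75, 3: 0.25}
rng = random.Random(2008)
draws = [sample_hits(2.25, rng) for _ in range(10_000)]
print(sum(draws) / len(draws))  # close to 2.25
```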

Note that the sampling distribution of the random variable Hjs is now fully defined if Ejs is defined for each school. The following formula defines Ejs:

The quantity uj in this formula puts an upper bound on the burden for the sampled schools. In most jurisdictions, uj was set to 3. In smaller jurisdictions, uj was set to a larger value to allow a larger expected student sample size; the largest value, 8, was used for Alaska.
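The formula for Ejs is not reproduced in this text. A plausible reading of the burden cap is Ejs = min(bj·MOSjs, uj); the sketch below uses that form purely as an ASSUMPTION, with a hypothetical function name and illustrative bj value.

```python
def expected_hits(b_j: float, mos: float, u_j: float) -> float:
    """E_js under the ASSUMED capped form min(b_j * MOS_js, u_j).

    b_j is the jurisdiction's proportionality constant and u_j the burden cap
    (3 in most jurisdictions, up to 8 for Alaska, per the text).
    """
    if b_j < 0 or mos < 0:
        raise ValueError("b_j and MOS_js must be non-negative")
    return min(b_j * mos, u_j)


# Illustrative b_j = 0.004: MOS 1000 gives b_j * MOS = 4, which is capped
# at 3 when u_j = 3 but left alone when u_j = 8.
for mos in (60.0, 400.0, 1000.0):
    print(mos, expected_hits(0.004, mos, u_j=3), expected_hits(0.004, mos, u_j=8))
```

Under this assumption a school with Ejs = uj (an integer) receives exactly uj hits, so its desired student sample never exceeds 60*uj.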

The last task in this development is to define how bj is computed for each jurisdiction. This is done iteratively. In the k-th iteration, Ejs(k) values are computed for each school from an intermediate value bj(k). This defines a distribution for Hjs, and thus for yjs; the overall expected yield of students is then

$$
T_j^{(k)} = \sum_{s \in S_j} E_k\left(y_{js}\right),
$$

where Sj is the set of schools in the jurisdiction j frame and Ek denotes expectation computed using bj(k). Tj(k) is then compared with the desired target Tj. If Tj(k), rounded to an integer, does not equal Tj, the (k+1)-st iteration computes bj(k+1) as some value between bj(k) and bj(k)*Tj/Tj(k), and the process continues until a value bj(K) is found that satisfies

$$
T_j^{(K)} = T_j
$$

after rounding to integer values.
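The iteration can be sketched end to end as follows. This is a self-contained illustration under several ASSUMPTIONS, all hypothetical: a medium-school MOS of 60, a linear 11 to 19 slide, the capped form Ejs = min(bj·MOSjs, uj), the midpoint of the interval as the choice of bj(k+1), and a toy frame of five schools (not NAEP data). Because Hjs takes [Ejs] or [Ejs] + 1, E(yjs) is a two-point average of A( ) values.

```python
import math
from typing import Sequence

STUDENTS_PER_HIT = 60  # from the text
MEDIUM_MOS = 60.0      # ASSUMED constant MOS for 20-64 student schools


def measure_of_size(x: int) -> float:
    """MOS_js; the 11-19 slide is ASSUMED linear (factor x / 20)."""
    if x >= 65:
        return float(x)
    if x >= 20:
        return MEDIUM_MOS
    if x > 10:
        return MEDIUM_MOS * x / 20
    return MEDIUM_MOS / 2


def almost_all(x: int, n: int) -> int:
    """A(x, n): all x students if x < (65/60) n, else exactly n."""
    return x if 60 * x < 65 * n else n


def expected_students(x: int, e: float) -> float:
    """E[y_js] when H_js is floor(e) or floor(e) + 1 with mean e."""
    n = math.floor(e)
    frac = e - n
    low = almost_all(x, STUDENTS_PER_HIT * n)
    high = almost_all(x, STUDENTS_PER_HIT * (n + 1))
    return (1 - frac) * low + frac * high


def total_expected_yield(b: float, frame: Sequence[int], u: float) -> float:
    """T_j(b) = sum of E[y_js], with E_js = min(b * MOS_js, u) (ASSUMED form)."""
    return sum(expected_students(x, min(b * measure_of_size(x), u)) for x in frame)


def calibrate_b(frame: Sequence[int], target: int, u: float, max_iter: int = 200) -> float:
    """Iterate b_j(k) until round(T_j(k)) == target."""
    # Starting value: exact if every school had E_js <= 1 (T is then linear in b).
    b = target / sum(measure_of_size(x) * almost_all(x, STUDENTS_PER_HIT) for x in frame)
    for _ in range(max_iter):
        t_k = total_expected_yield(b, frame, u)
        if round(t_k) == target:
            return b
        # Hypothetical choice: midpoint of b(k) and b(k) * target / T(k).
        b = 0.5 * (b + b * target / t_k)
    raise RuntimeError("no b_j found; the target may exceed what the burden cap allows")


frame = [30, 80, 150, 400, 1000]  # toy enrollments, not NAEP data
b = calibrate_b(frame, target=400, u=3)
print(b, total_expected_yield(b, frame, u=3))
```

In this sketch Tj is continuous and nondecreasing in bj, so plain bisection on bj would be a more robust alternative; the midpoint rule is used only because it stays inside the interval the text describes. If the target exceeds the largest yield the cap uj permits, no bj exists and the sketch raises an error.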

Last updated 02 October 2008 (KL)