NAEP Technical Documentation: Computation of Measures of Size

			Establishment of Measures of Size for Each Jurisdiction

The initial measure of size MOS_js for school s in jurisdiction j is set as a function of x_js, the total number of students in the school in the fourth (eighth) grade. The assignment is conditional: each of the four cases following the equal sign pairs a condition on x_js with the functional relationship used to assign MOS_js under that condition. For example, schools with enrollment x_js greater than or equal to 65 have their MOS_js set equal to x_js.

$$
\mathrm{MOS}_{js} =
\begin{cases}
x_{js} & \text{if } x_{js} \ge 65 \\
60 & \text{if } 20 \le x_{js} \le 64 \\
3\,x_{js} & \text{if } 10 < x_{js} < 20 \\
30 & \text{if } x_{js} \le 10
\end{cases}
$$

Putting aside the very small schools (x_js less than 20), the MOS is constant for schools with smaller enrollment (up to 64), and proportional to x_js for schools with larger enrollment (65 and above), as outlined in the prescription for getting a self-weighting student sample. For the schools with x_js less than or equal to 10, there is also a constant MOS, but half that of the "medium" schools (x_js from 20 to 64). The students in these schools are selected at half the rate of those in the medium schools, as desired. Schools with between 10 and 20 students have a sliding factor of 1/2 to 1 in relative probabilities.
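The piecewise rule above can be sketched in a few lines of Python. This is only an illustration: the function name `initial_mos` is hypothetical, and the boundaries of the "between 10 and 20" case are assumed to be exclusive, which is consistent with the neighboring cases.

```python
def initial_mos(x: int) -> float:
    """Initial measure of size for a school with x grade-eligible students."""
    if x >= 65:
        return float(x)   # large schools: MOS proportional to enrollment
    if x >= 20:
        return 60.0       # medium schools: constant MOS
    if x > 10:
        return 3.0 * x    # sliding range: rises from 30 toward 60
    return 30.0           # very small schools: half the medium MOS


# Print the MOS and its ratio to the medium-school MOS of 60.
for x in (5, 10, 15, 20, 64, 65, 200):
    mos = initial_mos(x)
    print(x, mos, mos / 60.0)
```

For schools with fewer than 20 students, the ratio in the last column is the relative selection rate described above: 1/2 for schools with 10 or fewer students, rising toward 1 as enrollment approaches 20.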

The probability of selection for each school is essentially this MOS_js multiplied by a constant of proportionality b_j which is carefully calculated for each jurisdiction j and grade. The proportionality constant b_j is computed to achieve the target of T_j equal to 6,300 as closely as possible, under a maximum burden constraint which is defined below.

The sample selection plan allows for "multiple hits" for large schools. Write H_js for the random variable giving the number of hits (H_js = 0, 1, 2, 3, . . .). Schools for which H_js is 0 are not included in the school sample. If H_js ≥ 1, the school is sampled. The desired student sample size is then essentially 60*H_js (60 for one-hit schools, 120 for two-hit schools, etc.). There is, however, an added twist on this formula. The final student sample size for each sampled school is computed conditionally according to the following function:

$$
y_{js} = A(x_{js}, H_{js}) =
\begin{cases}
x_{js} & \text{if } x_{js} < 60\,H_{js} \cdot \tfrac{65}{60} \\
60\,H_{js} & \text{if } 60\,H_{js} \cdot \tfrac{65}{60} \le x_{js}
\end{cases}
$$

The A( ) function can be called the "almost all" function; when a school’s target population x_js is only slightly larger than the desired sample size 60*H_js (by a factor of from 1 to 65/60), the sample size is increased to equal the target population. This increase avoids samples which exclude only a handful of students in the grade, something schools generally prefer to avoid. For example, if two hits (H_js = 2) have been sampled, the desired sample size is 120. If the school has fewer than 130 students, all students are taken. If the school has 130 or more students, exactly 120 students are sampled.
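A minimal sketch of the "almost all" rule follows. The function name `almost_all` is hypothetical, and the cutoff is read as 65*H_js (that is, 60*H_js times 65/60), which matches the 130-student boundary in the two-hit example.

```python
def almost_all(x: int, hits: int) -> int:
    """Final student sample size for a sampled school.

    x    -- number of grade-eligible students in the school
    hits -- number of hits H_js drawn for the school (must be >= 1)
    """
    if hits < 1:
        raise ValueError("a sampled school has at least one hit")
    if x < 65 * hits:
        return x          # take every student in the grade
    return 60 * hits      # otherwise sample exactly 60 per hit


print(almost_all(129, 2))  # below the cutoff: all students taken
print(almost_all(130, 2))  # at the cutoff: 120 students sampled
print(almost_all(40, 1))   # small school: all students taken
```

Schools with fewer students than the nominal 60*H_js also fall into the first branch, so they contribute their full enrollment.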

The next step is to define the sampling distribution of H_js, which essentially defines the school and student sample design. Let E_js = E(H_js), the expected number of hits. Write [E_js] for the integer part of E_js (i.e., the largest integer less than or equal to E_js). Define π_js as the probability of selection of the school into the sample (i.e., π_js equals the probability that H_js ≥ 1). The relationship between H_js, E_js, and π_js is as follows in all NAEP school samples. The formula is conditional on the value of E_js, with three possible cases (E_js less than 1, E_js equal to an integer, E_js a noninteger greater than 1) and a separate definition of π_js and H_js for each case. In the first case, if 0 < E_js < 1, then H_js is random and equals 0 (with probability 1 - E_js) or 1 (with probability E_js), and π_js = E_js. In the second case, if E_js is an integer n = 1, 2, . . ., then H_js equals n with certainty and π_js = 1. In the third case (E_js is a noninteger greater than 1), suppose n < E_js < n+1 where n is an integer greater than or equal to 1. Then H_js is random and equals n with probability 1 - (E_js - [E_js]), and equals n+1 with probability E_js - [E_js], so that E(H_js) = E_js; π_js is equal to 1.
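The three cases can be collapsed into one rule: H_js takes the integer part of E_js, plus one additional hit with probability equal to the fractional part. The sketch below illustrates that reading. The names `hit_distribution` and `selection_probability` are hypothetical, and the fractional-part probabilities are an assumption chosen so that the expected number of hits equals E_js.

```python
import math


def hit_distribution(e: float) -> dict:
    """Map each possible number of hits to its probability, given E_js = e > 0."""
    if e <= 0:
        raise ValueError("expected number of hits must be positive")
    n = math.floor(e)          # integer part [E_js]
    frac = e - n               # fractional part E_js - [E_js]
    if frac == 0:
        return {n: 1.0}        # integer case: n hits with certainty
    return {n: 1.0 - frac, n + 1: frac}


def selection_probability(e: float) -> float:
    """Probability pi_js that the school receives at least one hit."""
    return min(e, 1.0)


for e in (0.25, 2.0, 2.5):
    dist = hit_distribution(e)
    mean = sum(h * p for h, p in dist.items())
    print(e, dist, selection_probability(e), mean)
```

In each case the printed mean reproduces E_js, which is the property the three-case definition is built to satisfy.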

Note that the sampling distribution of the random variable H_js is fully defined once E_js is defined for each school. The following formula defines E_js:

$$
E_{js} = \min\left(b_j \cdot \mathrm{MOS}_{js},\; u_j\right)
$$

The quantity u_j in this formula is designed to put an upper bound on the burden for the sampled schools. In most jurisdictions, u_j was set to 3. In smaller jurisdictions, u_j was set to a larger value to allow for a larger expected student sample size. The largest value of u_j was 8 for the jurisdiction Alaska.

The last task in this development is to define how b_j is computed for each jurisdiction. This is done iteratively. In the k-th iteration, E_js(k) values are computed for each school based on an intermediate value b_j(k). This computation defines a distribution for H_js, and thus for y_js; the overall expected yield of students is then

$$
T_j(k) = \sum_{s \in S_j} E_k\left(y_{js}\right)
$$

with S_j the set of schools in the jurisdiction j frame, and E_k indicating expectations using b_j(k). Then T_j(k) is compared with the desired target T_j. If T_j(k) does not equal T_j (rounded to an integer), the (k+1)-st iteration is carried out, computing b_j(k+1) as some value in the interval (b_j(k), b_j(k)*T_j/T_j(k)), and the process continues iteratively until a b_j(K) is found that satisfies

$$
T_j(K) = T_j
$$

after rounding to integer values.
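The iteration can be sketched end to end as follows. This is a simplified illustration, not the production procedure. It assumes the expected number of hits is capped as min(b_j*MOS_js, u_j), it assumes the extra hit is drawn with probability equal to the fractional part of E_js, and it takes the endpoint b_j(k)*T_j/T_j(k) as the next value rather than a point inside the interval. The frame of enrollments and all function names are hypothetical.

```python
import math


def initial_mos(x):
    """Piecewise initial measure of size."""
    if x >= 65:
        return float(x)
    if x >= 20:
        return 60.0
    if x > 10:
        return 3.0 * x
    return 30.0


def almost_all(x, hits):
    """Student sample size for a school with x students and the given hits."""
    return x if x < 65 * hits else 60 * hits


def expected_yield(x, e):
    """Expected student sample size for a school with expected hits e."""
    n = math.floor(e)
    frac = e - n
    total = 0.0
    if n >= 1:
        total += (1.0 - frac) * almost_all(x, n)   # n hits
    if frac > 0:
        total += frac * almost_all(x, n + 1)       # n + 1 hits
    return total                                   # zero hits yield no students


def total_yield(enrollments, b, u):
    """Expected overall yield T_j(k) for proportionality constant b and cap u."""
    return sum(expected_yield(x, min(b * initial_mos(x), u)) for x in enrollments)


def calibrate_b(enrollments, target, u=3.0, max_iter=200):
    """Iterate b until the rounded expected yield equals the rounded target."""
    # Starting guess: ignores the cap and the almost-all adjustment.
    b = target / (60.0 * sum(initial_mos(x) for x in enrollments))
    for _ in range(max_iter):
        t = total_yield(enrollments, b, u)
        if round(t) == round(target):
            return b
        b = b * target / t      # assumed update: endpoint of the interval
    raise RuntimeError("no b found; the target may be unreachable under the cap")


# Hypothetical frame: 500 schools of four sizes.
frame = [30] * 200 + [80] * 150 + [150] * 100 + [400] * 50
b = calibrate_b(frame, 6300)
print(b, total_yield(frame, b, 3.0))
```

When the cap u_j binds for many schools, the expected yield stops growing with b_j and the target may not be attainable. The sketch signals that case by raising an error after `max_iter` iterations.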

Last updated 02 October 2008 (KL)
