NAEP Technical Documentation: Computation of Measures of Size

Setting Measures of Size for Each Jurisdiction in State NAEP 2003

For the main school sample for State NAEP the initial measure of size MOS_js for school s in jurisdiction j is a function of x_js, the total number of students in school js in the respective grade (fourth or eighth):

\[
MOS_{js} =
\begin{cases}
x_{js} & \text{if } x_{js} \ge 69 \\
62 & \text{if } 20 \le x_{js} < 69 \\
3.1 \times x_{js} & \text{if } 10 \le x_{js} < 20 \\
31 & \text{if } x_{js} < 10
\end{cases}
\]

Putting aside the very small schools (x_js less than 20), the measure of size (MOS) is constant for schools with smaller enrollment (up to 68), and proportional to x_js for schools with larger enrollment (69 and above), as outlined in the formulation for obtaining a self-weighting student sample. For the schools with x_js less than 10, the MOS is also constant, but half that of the “medium” schools (x_js from 20 to 68). The students in these schools are selected at half the rate of those in the medium schools, as desired. Schools with 10 to 19 students have a sliding factor of 1/2 to 1 in relative probabilities, because their MOS of 3.1 × x_js rises linearly from 31 toward 62.
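The piecewise definition above can be written as a short function. This is a minimal sketch in Python; the function name `measure_of_size` is a hypothetical label chosen for illustration, and only the thresholds and constants come from the text.

```python
def measure_of_size(x: int) -> float:
    """Initial measure of size for a school with x students in the grade.

    Thresholds (69, 20, 10) and constants (62, 3.1, 31) follow the
    piecewise definition in the text; the function name is illustrative.
    """
    if x >= 69:
        return float(x)   # large schools: proportional to enrollment
    if x >= 20:
        return 62.0       # medium schools: constant
    if x >= 10:
        return 3.1 * x    # sliding scale between 31 and 62
    return 31.0           # very small schools: half the medium value


if __name__ == "__main__":
    for enrollment in (5, 10, 15, 20, 68, 69, 200):
        print(enrollment, measure_of_size(enrollment))
```

The sliding segment meets the constant segments at both ends: 3.1 × 10 is 31 and 3.1 × 20 is 62.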

The probability of selection for each school is essentially this MOS_js multiplied by a constant of proportionality b_j which is carefully calculated for each jurisdiction j and grade. The proportionality constant b_j is computed to achieve the target of T_j equal to 6,510 as closely as possible, under a maximum burden constraint defined below.

Multiple hits are allowed for large schools. With H_js as the random variable determining the number of hits (H_js = 0, 1, 2, 3, . . .), schools for which H_js is 0 are not included in the school sample. If H_js ≥ 1, the school is sampled. The desired student sample size is then essentially 62 × H_js (62 for one-hit schools, 124 for two-hit schools, etc.). There is, however, an added twist. The final student sample size for each sampled school is computed according to the following function:

\[
y_{js} = A(x_{js}, H_{js}) =
\begin{cases}
x_{js} & \text{if } x_{js} < 62 H_{js} \times (69/62) \\
62 H_{js} & \text{if } 62 H_{js} \times (69/62) \le x_{js}
\end{cases}
\]

The A( ) function is the “almost all” function. When a school's target population x_js is only slightly larger than the desired sample size 62 × H_js (by a factor of from 1 to 69/62), the sampling plan increases the sample size to equal the target population. This function avoids samples which exclude a handful of students in the grade, something that schools usually prefer to avoid. For example, a school with two sample hits (H_js = 2) has a desired sample size of 124. If the school has anywhere up to 137 students, all students are taken. If the school has 138 or more students, the design calls for a sample of 124 students exactly.
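The “almost all” rule can be checked against the two-hit example in the paragraph above. This is a minimal sketch; the name `almost_all` is hypothetical, and the threshold is taken as 69 × H_js (that is, 62 × H_js scaled by 69/62), which is the reading that reproduces the 137/138 cutoff in the example.

```python
def almost_all(x: int, hits: int) -> int:
    """Final student sample size for a school with x students and `hits` hits.

    Assumption: the cutoff is 69 * hits, i.e. the desired sample size
    62 * hits multiplied by 69/62, matching the worked example in the text.
    """
    desired = 62 * hits
    cutoff = 69 * hits
    if x < cutoff:
        return x          # take every student in the grade
    return desired        # otherwise sample exactly 62 per hit


if __name__ == "__main__":
    for enrollment in (100, 137, 138, 300):
        print(enrollment, almost_all(enrollment, hits=2))
```

With zero hits the function returns 0 for any enrollment, which matches the rule that such schools are not in the sample.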

The next step in the development is to characterize the sampling distribution of H_js, which essentially defines the school and student sample design. Let E_js = E(H_js), the expected number of hits. Write [E_js] as the integer part of E_js; that is, the largest integer not exceeding E_js. Define P(H_js ≥ 1) as the probability of selection of the school into the sample. The relationship among H_js, E_js, and P(H_js ≥ 1) is as follows in all NAEP school samples:

0 < E_js < 1: H_js equals 0 or 1, and P(H_js ≥ 1) = E_js.
E_js = n, n = 1, 2, . . . : H_js = n, and P(H_js ≥ 1) = 1.
n < E_js < n+1, n = 1, 2, . . . :

H_js = n with probability 1 − {E_js − [E_js]}
H_js = n+1 with probability E_js − [E_js]
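The three cases above collapse into one rule: H_js is the integer part of E_js, plus one with probability equal to the fractional part. The sketch below illustrates that rule. The function names are hypothetical, and `draw_hits` draws each school independently purely for illustration; the text does not say how hits are actually realized across schools.

```python
import math
import random


def hit_distribution(e: float) -> dict[int, float]:
    """Probability of each possible number of hits, given expected hits e."""
    n = math.floor(e)
    frac = e - n
    if frac == 0:
        return {n: 1.0}                    # integer e: exactly n hits
    return {n: 1.0 - frac, n + 1: frac}    # otherwise n or n + 1 hits


def draw_hits(e: float, rng: random.Random) -> int:
    """Draw one realization of the number of hits (illustrative only)."""
    n = math.floor(e)
    return n + (1 if rng.random() < e - n else 0)


if __name__ == "__main__":
    rng = random.Random(2003)  # arbitrary seed for a repeatable demo
    for e in (0.25, 2.0, 2.5):
        print(e, hit_distribution(e), draw_hits(e, rng))
```

In every case the distribution has mean E_js, so the expected number of hits is preserved.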

Note that the sampling distribution of the random variable H_js is now fully specified, once E_js is defined for each school. The following formula defines E_js:

\[
E_{js} = \min\left( b_j \times MOS_{js},\; u_j \right)
\]

The quantity u_j (the maximum number of “hits” allowed) in this formula is designed to put an upper bound on the burden for the sampled schools. In most jurisdictions, u_j was set to 3. In smaller jurisdictions, u_j was set to a larger value to allow for a larger expected student sample size. The largest values of u_j were

four for the Atlanta Trial Urban District Assessment district grade 8,
four for Wyoming grade 8,
five for Delaware grade 8, and
eight for Alaska grades 4 and 8.
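The effect of the cap u_j can be illustrated as follows. This sketch assumes E_js takes the capped form min(b_j × MOS_js, u_j), which is how the text describes the roles of b_j and u_j. The function name and the value of b used in the demo are hypothetical.

```python
def expected_hits(mos: float, b: float, u: int) -> float:
    """Expected number of hits for a school, capped at u hits.

    Assumption: E = min(b * MOS, u), based on the text's description of
    b as a proportionality constant and u as the maximum number of hits.
    """
    return min(b * mos, u)


if __name__ == "__main__":
    b = 0.01  # hypothetical proportionality constant, for illustration only
    for mos in (62.0, 250.0, 1000.0):
        # Compare the usual cap of 3 with the largest cap listed (8).
        print(mos, expected_hits(mos, b, u=3), expected_hits(mos, b, u=8))
```

Raising u_j only affects the largest schools, which is why it was used in small jurisdictions where a few large schools must supply more of the student sample.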

The last task in this development is to delineate the computation of b_j for each jurisdiction. This delineation is done in an iterative fashion. In the kth iteration, compute E_js(k) values for each school based on an intermediate value of b_j(k). This computation defines a distribution for H_js, and thus for y_js. The overall expected yield of students is then

\[
T_j(k) = \sum_{s \in S_j} E_k\left( y_{js} \right)
\]

with S_j the set of schools in the jurisdiction j frame, and E_k indicating expectations using b_j(k). The value T_j(k) is compared with the desired target T_j. If T_j(k) does not equal T_j (rounded to an integer), then the process continues to iteration k+1, computing b_j(k+1) as some value in the interval [b_j(k), b_j(k) × T_j/T_j(k)], and continues iteratively until b_j(k) satisfies

\[
T_j(k) = T_j
\]

after rounding to integer values.
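The search for b_j can be sketched end to end. The text leaves the choice of b_j(k+1) open within an interval; this sketch swaps in plain bisection, which converges because the expected yield never decreases as b grows. It also assumes E_js = min(b_j × MOS_js, u_j) and a cutoff of 69 × H_js in A( ). All function names and the school frame are hypothetical.

```python
import math


def measure_of_size(x: int) -> float:
    if x >= 69:
        return float(x)
    if x >= 20:
        return 62.0
    return 3.1 * x if x >= 10 else 31.0


def almost_all(x: int, hits: int) -> int:
    # Take all students when enrollment is below 69 per hit, else 62 per hit.
    return x if x < 69 * hits else 62 * hits


def expected_sample_size(x: int, e: float) -> float:
    # H is floor(e) or floor(e) + 1, the latter with probability frac.
    n = math.floor(e)
    frac = e - n
    return (1 - frac) * almost_all(x, n) + frac * almost_all(x, n + 1)


def expected_yield(enrollments: list[int], b: float, u: int) -> float:
    # Assumed capped form of the expected hits: min(b * MOS, u).
    return sum(
        expected_sample_size(x, min(b * measure_of_size(x), u))
        for x in enrollments
    )


def solve_b(enrollments: list[int], target: int, u: int = 3,
            max_iter: int = 200) -> float:
    """Find b whose expected yield rounds to `target` (bisection, not the
    source's interval rule)."""
    lo, hi = 0.0, u / 31.0   # MOS >= 31, so b = u/31 caps every school at u
    if expected_yield(enrollments, hi, u) < target:
        raise ValueError("target not attainable with this cap u")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        t = expected_yield(enrollments, mid, u)
        if round(t) == target:
            return mid
        if t < target:
            lo = mid
        else:
            hi = mid
    raise RuntimeError("no convergence within max_iter iterations")


if __name__ == "__main__":
    # Hypothetical frame: enrollment counts for 500 schools.
    frame = [8] * 50 + [15] * 50 + [40] * 200 + [90] * 150 + [250] * 50
    b = solve_b(frame, target=6510, u=3)
    print(b, expected_yield(frame, b, u=3))
```

If the target cannot be reached even with every school at its cap, the function raises an error, which corresponds to the situation where the text raises u_j for smaller jurisdictions.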

Last updated 02 October 2008 (KL)
