
## Use of Normit Data to Check Standard Errors and Design Effects

Normits are pseudo-scores generated by the data analysis contractor for each assessed student in the 2000 state assessment. The NAEP 1996 Technical Report (Allen, Carlson, and Zelenak 1999) defines a normit score as follows: "The normit score is a student-level Gaussian score based on the inverse normal transformation of the mid-percentile rank of a student’s number-corrected booklet score within a booklet. The normits have a mean of zero, and can range from –3.7 to 3.7. They correlate well with score data and therefore can be used as a preliminary check on standard errors and design effects (DEFF)."
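The transformation in the quoted definition can be sketched as follows. This is a minimal illustration, not the contractor's actual program: it assumes an unweighted mid-percentile rank computed as (number of scores below + half the number of tied scores) / n, and the function name `normits` is hypothetical. The input is the list of booklet scores for the students who took one booklet.

```python
from collections import Counter
from statistics import NormalDist


def normits(scores):
    """Map the booklet scores of one booklet to normit-style scores.

    Assumption: mid-percentile rank = (count below + 0.5 * count tied) / n,
    unweighted. The source does not spell out these details.
    """
    n = len(scores)
    counts = Counter(scores)

    # Number of students scoring strictly below each distinct score.
    below = {}
    running = 0
    for score in sorted(counts):
        below[score] = running
        running += counts[score]

    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        std_normal.inv_cdf((below[s] + 0.5 * counts[s]) / n)
        for s in scores
    ]


# Hypothetical booklet scores for four students.
example = normits([10, 12, 12, 15])
```

Because every mid-percentile rank lies strictly between 0 and 1, the inverse normal transformation is always defined, and tied scores receive the same normit.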

Domains with large standard errors or DEFFs were flagged as potential problem cases. School-level means were then computed for each school in a flagged case (or domain). These means revealed some unexpected normit averages, which explained some of the higher variances. The text that follows summarizes the normit data analysis for grade 4. No problem cases were identified for grade 8 in either reference population (R2 or R3).

Tables were processed using the normits for the ten states that had been selected for quality control purposes for the 2000 state assessment weighting. The purpose was to observe the standard errors of the estimated mean normits for subgroups in order to identify any potential problems in the variance estimation procedure. Grade 4 mathematics was processed for the R2 population, and the estimated means, standard errors, and design effects looked reasonable in general. However, three large design effects called for further investigation, and one case had a standard error equal to 0. The four student group estimates were:

Case 1: California, Rural (TYPLOC_R = 7), n = 43, DEFF = 38.309.

Case 2: Kentucky, SCHLUNCH = 4, n = 45, DEFF = 39.781.

Case 3: Maryland, Urban fringe of mid-size city (TYPLOC_R = 4), n = 18, DEFF = 0.000.

Case 4: Rhode Island, Small Town (TYPLOC_R = 6), n = 42, DEFF = 24.243.

Cases 1, 2, and 4 each involved two schools whose mean normits were very different from each other (case 2 actually involved four schools, but two of them had only one student assessed). This strong clustering of scores within the two schools produced the large DEFF. Case 3 involved only one school, which results in a jackknife standard error of the mean equal to zero.
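Both effects can be reproduced with a small sketch. It is a simplified stand-in for a replicate-weight jackknife, not the NAEP variance estimation program: each replicate is assumed to multiply one school's weights by a factor (2.0 for a doubled school, 0.0 for a dropped one), the variance is taken as the sum of squared deviations of the replicate means from the full-sample mean, and the design effect divides that by a simple-random-sample variance. All data, school labels, and function names are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def replicate_variance(values, weights, schools, replicate_factors):
    """Sum of squared deviations of replicate means from the full-sample mean.

    replicate_factors is a list of dicts, one per replicate, mapping a school
    label to the factor applied to that school's weights (1.0 if absent).
    """
    full = weighted_mean(values, weights)
    total = 0.0
    for factors in replicate_factors:
        rep_weights = [w * factors.get(s, 1.0) for w, s in zip(weights, schools)]
        total += (weighted_mean(values, rep_weights) - full) ** 2
    return total


def design_effect(values, weights, schools, replicate_factors):
    """Replicate variance divided by a simple-random-sample variance of the mean."""
    n = len(values)
    full = weighted_mean(values, weights)
    s2 = weighted_mean([(y - full) ** 2 for y in values], weights) * n / (n - 1)
    return replicate_variance(values, weights, schools, replicate_factors) / (s2 / n)


# Hypothetical domain: two schools whose mean normits differ sharply.
two_values = [-1.1, -0.9] * 10 + [0.9, 1.1] * 10
two_schools = ["A"] * 20 + ["B"] * 20
two_weights = [1.0] * 40
two_reps = [{"A": 2.0}, {"B": 0.0}]  # one replicate touches each school
deff_two = design_effect(two_values, two_weights, two_schools, two_reps)

# Hypothetical domain: a single school. Rescaling all of its weights by the
# same nonzero factor cancels in the ratio, so the replicate mean is unchanged.
one_values = [-0.5, 0.0, 0.4, 1.2]
one_schools = ["C"] * 4
one_weights = [1.0, 1.5, 1.0, 2.0]
one_reps = [{"C": 2.0}]
deff_one = design_effect(one_values, one_weights, one_schools, one_reps)
```

With these made-up numbers `deff_two` comes out well above 1 (about 43), because each replicate shifts the domain mean toward one school or the other, while `deff_one` is zero because no replicate can change the mean of a domain that lies entirely within one school.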

ETS uses a rule of 62: cells of a table with 62 or fewer students are not reported. All of the potential problem cases in grade 4 for the R2 population have fewer than 62 students.
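As a quick check, the rule can be applied to the four R2 cases listed above. This is only an illustrative sketch; the names `MIN_REPORTABLE_N` and `is_reportable` are hypothetical and not part of any ETS software.

```python
MIN_REPORTABLE_N = 62  # cells with 62 or fewer students are not reported


def is_reportable(n_students, threshold=MIN_REPORTABLE_N):
    """A cell is reported only if it has more students than the threshold."""
    return n_students > threshold


# Sample sizes of the four grade 4, R2 cases from the text.
cases = {
    "California, rural": 43,
    "Kentucky, SCHLUNCH = 4": 45,
    "Maryland, urban fringe of mid-size city": 18,
    "Rhode Island, small town": 42,
}

suppressed = [name for name, n in cases.items() if not is_reportable(n)]
```

All four cases end up in `suppressed`, so none of these estimates would appear in a reported table.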

For the R3 population in grade 4, the potential problem cases were:

Case 1: California, overall, DEFF = 7.151.

Case 2: Michigan, type of location = 'urban fringe of large city (3)', DEFF = 6.482.

Case 3: Texas, type of location = 'rural (7)', DEFF = 10.847.

In California grade 4, the standard error and design effect for the whole state were much higher for R3 than for R2 (design effect of 7.2 compared to 3.9). The mean normit was also considerably lower in R3 (-0.44, compared to -0.18 in R2). Two schools appeared to be problematic:

In school A, all 20 students seemed to score the minimum normit of -3.719.

In school B, the mean normit for the 23 students was -3.3306559.

Texas grade 4 also had a higher design effect for R3 than for R2 (5.4 versus 3.5). One school had an unusually low mean for a rural school: its mean normit was -2.22, while the means of the other 10 rural schools ranged from +0.17 to +0.78.
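The school-level check used in these investigations can be sketched as follows. The sketch assumes unweighted school means and a hypothetical cutoff (`max_gap`) on the distance from the median school mean; the source does not state how unusual schools were identified beyond inspecting the means. The records below are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean, median


def school_means(records):
    """Unweighted mean normit per school from (school, normit) pairs."""
    by_school = defaultdict(list)
    for school, normit in records:
        by_school[school].append(normit)
    return {school: fmean(values) for school, values in by_school.items()}


def flag_outlier_schools(means, max_gap=1.5):
    """Schools whose mean is more than max_gap from the median school mean.

    max_gap is a hypothetical cutoff chosen for this illustration.
    """
    center = median(means.values())
    return sorted(s for s, m in means.items() if abs(m - center) > max_gap)


# Hypothetical student records for one domain: (school label, normit).
records = [
    ("S1", 0.2), ("S1", 0.4),
    ("S2", 0.5), ("S2", 0.7),
    ("S3", 0.1), ("S3", 0.3),
    ("S4", -2.0), ("S4", -2.4),
]

means = school_means(records)
flagged = flag_outlier_schools(means)
```

Here only school S4 is flagged, mirroring the situation in which a single school with a very low mean stands apart from the rest of its domain.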

Last updated 13 August 2008 (KL)