The DIF analyses of the dichotomous items were based on the Mantel-Haenszel chi-square procedure as adapted by Holland and Thayer (1988). The procedure tests the statistical hypothesis that the odds of correctly answering an item are the same for two groups of examinees that have been matched on some measure of proficiency, usually referred to as the matching criterion. The DIF analyses of the polytomous items were completed using the Mantel-Haenszel ordinal procedure, which is based on the Mantel procedure (Mantel 1963; Mantel and Haenszel 1959). These procedures compare the proportions of matched examinees from each group in each polytomous item-response category.
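As an illustration of the dichotomous procedure, the following minimal sketch computes the Mantel-Haenszel common odds ratio and the continuity-corrected chi-square statistic from a set of 2x2 tables, one per level of the matching criterion. The function name `mantel_haenszel` and the `(a, b, c, d)` table layout are assumptions made for this example; they are not taken from the NAEP software.

```python
def mantel_haenszel(tables):
    """Mantel-Haenszel common odds ratio and chi-square (1 df).

    tables: iterable of (a, b, c, d) counts, one tuple per matched
    score level (hypothetical layout chosen for this sketch):
        a = reference group, correct    b = reference group, incorrect
        c = focal group, correct        d = focal group, incorrect
    """
    num = den = 0.0                      # pieces of the common odds ratio
    sum_a = sum_ea = sum_var = 0.0       # pieces of the chi-square statistic
    for a, b, c, d in tables:
        n = a + b + c + d
        if n <= 1:
            continue                     # stratum carries no information
        n_ref, n_foc = a + b, c + d      # group sizes at this score level
        m1, m0 = a + c, b + d            # number correct / incorrect
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_ea += n_ref * m1 / n         # expected value of a under no DIF
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    if den == 0 or sum_var == 0:
        raise ValueError("tables do not support an estimate")
    alpha_mh = num / den
    # Continuity-corrected Mantel-Haenszel chi-square.
    chi_square = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    return alpha_mh, chi_square


# Example call with two made-up score levels.
alpha_mh, chi_square = mantel_haenszel([(15, 5, 10, 10), (20, 5, 15, 10)])
```

An odds ratio above 1 in this layout means the reference group has higher odds of a correct response than matched focal-group examinees.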
For both types of analyses, the measure of proficiency used is typically the total item score on some collection of items. Because booklets in the balanced incomplete block (BIB) or partially balanced incomplete block (pBIB) design comprise different combinations of blocks, there is no single set of items common to all examinees. Therefore, for each student, the measure of proficiency used was the total item score on the entire booklet. These scores were then pooled across booklets for each analysis. This procedure is described by Allen and Donoghue (1994, 1996). In addition, because research results (Zwick and Grima 1991) strongly suggest that sampling weights should be used in conducting DIF analyses, sampling weights were applied in these analyses.
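The pooling step can be sketched as follows: examinees are grouped by booklet total score regardless of which booklet they received, and each examinee contributes a sampling weight rather than a count of one to the cell for that score level. The record keys (`group`, `booklet_score`, `correct`, `weight`) and the function name are hypothetical, and the sketch uses the weights as given because the text does not describe how they were scaled.

```python
from collections import defaultdict


def build_weighted_tables(records):
    """Accumulate weighted 2x2 tables keyed by booklet total score.

    records: iterable of dicts with hypothetical keys
        'group'         -> 'ref' or 'focal'
        'booklet_score' -> total item score on the examinee's booklet
        'correct'       -> True if the studied item was answered correctly
        'weight'        -> sampling weight for the examinee
    Returns {score: (ref_correct, ref_incorrect,
                     focal_correct, focal_incorrect)}.
    """
    cells = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for r in records:
        # Column index: 0/1 for the reference group, 2/3 for the focal group.
        col = (0 if r["correct"] else 1) + (0 if r["group"] == "ref" else 2)
        # Scores are pooled across booklets: the booklet identity is ignored.
        cells[r["booklet_score"]][col] += r["weight"]
    return {score: tuple(cells[score]) for score in sorted(cells)}


# Example call with made-up records.
tables = build_weighted_tables([
    {"group": "ref", "booklet_score": 3, "correct": True, "weight": 1.0},
    {"group": "focal", "booklet_score": 3, "correct": False, "weight": 2.0},
])
```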
For each dichotomous item in the assessment, an estimate of the Mantel-Haenszel common odds ratio, MH, expressed on the delta scale for item difficulty, was produced. The estimates indicate the difference between reference-group and focal-group item difficulties (measured in delta scale units) and typically range from about -3 to +3. Positive values indicate items that are differentially easier for the focal group than for the reference group after adjusting for the overall level of proficiency in the two groups. Similarly, negative values indicate items that are differentially harder for the focal group than for the reference group. NAEP assigns each item to one of three categories (Petersen 1988): "A" (items exhibiting no DIF), "B" (items exhibiting a weak indication of DIF), or "C" (items exhibiting a strong indication of DIF). Items in category "A" have Mantel-Haenszel common odds ratios on the delta scale that do not differ significantly from 0 at the alpha = 0.05 level or are less than 1.0 in absolute value. Items in category "C" have Mantel-Haenszel values whose absolute magnitude is significantly greater than 1.0 and larger than 1.5. All other items are categorized as "B" items. A plus sign (+) indicates that an item is differentially easier for the focal group; a minus sign (-) indicates that an item is differentially more difficult for the focal group.
The NAEP DIF procedure for polytomous items uses the Mantel-Haenszel ordinal procedure (Mantel and Haenszel 1959). Polytomous items are categorized as "AA", "BB", or "CC" in a way that generalizes the "A", "B", and "C" categories used for dichotomous items (Petersen 1988).
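The dichotomous categorization rules can be sketched in code. The conversion to the delta scale uses the standard transformation from Holland and Thayer (1988), -2.35 times the natural log of the common odds ratio. Everything else is an assumption made for this example: the function names, the use of a standard error with normal-approximation tests, the one-sided critical value for the test against 1.0, and the convention of appending the sign to "B" and "C" labels. The text does not specify the form of the significance tests.

```python
from math import log


def mh_d_dif(alpha_mh):
    """Express a Mantel-Haenszel common odds ratio on the delta scale."""
    return -2.35 * log(alpha_mh)


def classify_dif(d_dif, se, z_zero=1.96, z_one=1.645):
    """Assign an A/B/C label to a delta-scale DIF estimate.

    d_dif:  estimate on the delta scale (positive = easier for focal group)
    se:     standard error of the estimate (assumed to be available)
    z_zero: two-sided critical value for the test against 0 (alpha = 0.05)
    z_one:  critical value for |d_dif| > 1.0 (one-sided; an assumption)
    """
    abs_d = abs(d_dif)
    differs_from_zero = abs_d / se > z_zero
    exceeds_one = (abs_d - 1.0) / se > z_one
    # "A": not significantly different from 0, or below 1.0 in magnitude.
    if not differs_from_zero or abs_d < 1.0:
        return "A"
    sign = "+" if d_dif > 0 else "-"
    # "C": significantly above 1.0 and larger than 1.5 in magnitude.
    if exceeds_one and abs_d > 1.5:
        return "C" + sign
    # Everything else is "B".
    return "B" + sign


# Example call with a made-up odds ratio and standard error.
label = classify_dif(mh_d_dif(2.0), se=0.3)
```

Because an odds ratio above 1 favors the reference group, the negative multiplier makes such items negative on the delta scale, matching the sign convention described above.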