
Differential Item Functioning (DIF)


DIF Procedures

Categorization of Items According to DIF Criteria

NAEP Specifications in DIF Procedures

Differential item functioning (DIF) analysis refers to procedures that assess whether items are differentially difficult for different groups of examinees after controlling for overall differences in performance. DIF procedures compare the performance of subgroups on each item within sets of examinees having the same level of performance, usually measured by total test score. Items identified as having DIF are then evaluated to determine whether they are biased; that is, whether the DIF stems from a factor irrelevant to what is being tested. If an item is determined to be biased, it is deleted from NAEP assessments. A biased item deviates from the Item Response Theory (IRT) models used in NAEP because the probability of doing well on the item depends not only on what the examinee knows and can do and on the item as reflected in the item parameters, but also on a characteristic of the item that is unrelated to the construct being measured.

As pointed out by Zieky (1993):

It is important to realize that DIF is not a synonym for bias. The Item Response Theory based methods, as well as the Mantel-Haenszel and standardization methods of DIF detection, will identify questions that are not measuring the same dimension(s) as the bulk of the items in the matching criterion . . . . Therefore, judgment is required to determine whether or not the difference in difficulty shown by a DIF index is unfairly related to group membership. The judgment of fairness is based on whether or not the difference in difficulty is believed to be related to the construct being measured . . . . The fairness of an item depends directly on the purpose for which a test is being used. For example, a science item that is differentially difficult for women may be judged to be fair in a test designed for certification of science teachers because the item measures a topic that every entry-level science teacher should know. However, that same item, with the same DIF value, may be judged to be unfair in a test of general knowledge designed for all entry-level teachers. (p. 340)
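The Mantel-Haenszel procedure mentioned above can be sketched briefly. At each level of the matching criterion (total test score), examinees form a 2x2 table of group (reference vs. focal) by item outcome (correct vs. incorrect); the common odds ratio pooled across score levels summarizes DIF, and ETS reports it on the delta scale as MH D-DIF = -2.35 ln(alpha). The function below is a minimal illustrative sketch, not NAEP's operational implementation; the `mantel_haenszel_dif` name and the record format are assumptions made here for illustration.

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(records):
    """Estimate the Mantel-Haenszel common odds ratio for one item.

    records: iterable of (group, total_score, correct) tuples, where
    group is "ref" or "focal", total_score is the matching criterion,
    and correct is 1 or 0.  (Hypothetical input format, for illustration.)
    Returns (alpha, mh_d_dif): alpha = 1.0 indicates no DIF; negative
    MH D-DIF values favor the reference group on the ETS delta scale.
    """
    # Build a 2x2 table (group x correct/incorrect) at each score level.
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for group, score, correct in records:
        row = 0 if group == "ref" else 1
        col = 0 if correct else 1
        tables[score][row][col] += 1

    num = den = 0.0
    for table in tables.values():
        # a = ref correct, b = ref incorrect, c = focal correct, d = focal incorrect
        (a, b), (c, d) = table
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

With matched groups answering at the same rate at every score level, alpha comes out near 1.0 and MH D-DIF near 0, which is why judgment (as Zieky stresses) is applied only to items whose index departs meaningfully from that baseline.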

Last updated 17 November 2008 (RF)
