During scaling, Item Response Theory (IRT) parameters are estimated using data from the current assessment and from the most recent past assessment of the same subject, provided that the past assessment was developed under the same assessment framework. For items that fit
the two-parameter IRT model, the "a" and "b" parameters are estimated. For items that fit the three-parameter model, the "a," "b," and "c" parameters are estimated. For items that fit
the generalized partial-credit model, the "a," "b," and "d" parameters are estimated.

Each IRT parameter has an acceptable range of values. The "a" (discrimination) parameter typically ranges from 0 to 2 and should not be negative. The "b" (difficulty) parameter typically ranges from -3 to 3, and the "c" (lower asymptote, or guessing) parameter ranges from 0 to 1. The "d" (category) parameter generally falls between -3 and 3 but can at times range from -8 to 8. Items that function poorly on a scale are identified early (i.e., during pilot tests and item analysis) and dropped from that scale.

As with other IRT scaling procedures, person parameters are also estimated while the items are scaled; however, NAEP does not make use of these estimates, because group results based directly on these individual student parameters are inconsistent (Mislevy 1991). Note that the item parameters here are provided in the metrics used for the original
calibration of the scales.

To learn more about available resources for conducting education research with NAEP data, see NAEP Restricted-Use Datasets.
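
As an illustration of the parameter ranges and the two- and three-parameter models described above, the following Python sketch checks a set of item parameters against the typical ranges and evaluates a logistic response probability. It is a minimal sketch under stated assumptions, not NAEP's scaling software: the names TYPICAL_RANGES, prob_correct, and out_of_range are hypothetical, the scaling constant D = 1.7 is an assumption that the text above does not state, and the "d" range uses the wider -8 to 8 span. The generalized partial-credit model is not implemented here, and each "d" value is treated as a single number.

```python
import math

# Typical ranges quoted in the text. The "d" entry uses the wider
# -8 to 8 span that the text says can occur at times.
TYPICAL_RANGES = {
    "a": (0.0, 2.0),
    "b": (-3.0, 3.0),
    "c": (0.0, 1.0),
    "d": (-8.0, 8.0),
}

D = 1.7  # assumed logistic scaling constant; not stated in the text


def prob_correct(theta, a, b, c=0.0):
    """Three-parameter logistic probability of a correct response.

    With c = 0 this reduces to the two-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def out_of_range(params):
    """Return the names of parameters that fall outside TYPICAL_RANGES."""
    flagged = []
    for name, value in params.items():
        low, high = TYPICAL_RANGES[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged
```

For example, out_of_range({"a": -0.5, "b": 1.0, "c": 0.2}) flags only "a", which matches the statement that the "a" parameter should not be negative. An item flagged this way would be a candidate for review, not automatically dropped; the text says only that poorly functioning items are identified during pilot tests and item analysis.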