
NAEP Technical Documentation: Item Mapping Procedures

Item maps help to illustrate what students know and can do in NAEP subject areas by positioning descriptions of individual assessment items along the NAEP scale at each grade level. An item is placed at the point on the scale where students are likely to give successful responses to it; what counts as "likely" is defined by the response probability conventions described below. The descriptions used in NAEP item maps focus on the knowledge and skills needed to respond successfully to the assessment item. For multiple-choice items, the description indicates the knowledge or skill demonstrated by selection of the correct option. For constructed-response items, the description takes into account the knowledge or skill specified by the different levels of scoring criteria for that item.

To map items to particular points on each subject area scale, a response probability convention had to be adopted that would divide those who had a higher probability of success from those who had a lower probability. Choosing a response probability convention has an impact on the mapping of assessment items onto the scales. A lower boundary convention maps the items at lower points along the scales, and a higher boundary convention maps the same items at higher points along the scales. The underlying distribution of skills in the population does not change, but the choice of a response probability convention does have an impact on the proportion of the student population that is reported as "able to do" the items on the scales.
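To make the effect of the convention concrete, consider a simplified dichotomous item with no guessing, scored with the two-parameter logistic model (the three-parameter form used for NAEP multiple-choice items is given later in this section); the parameters aj and bj here are generic, not those of any actual NAEP item. The item is mapped at the value of θ at which its response probability equals the chosen convention p:

$$\theta_{\text{map}} = b_j + \frac{\ln\bigl(p/(1-p)\bigr)}{1.7\,a_j},$$

so raising the convention from p = .50 to p = .80 moves the mapped point upward by ln(4)/(1.7 aj), roughly 0.82/aj scale units, even though the underlying distribution of skills is unchanged.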

There is no obvious choice of a point along the probability scale that is clearly superior to any other point. If the convention were set with a boundary at 50 percent, those above the boundary would be more likely to get an item right than get it wrong, while those below that boundary would be more likely to get the item wrong than right. While this convention has some intuitive appeal, it was rejected on the grounds that having a 50/50 chance of getting the item right shows an insufficient degree of mastery. If the convention were set with a boundary at 80 percent, students above the criterion would have a high probability of success with an item. However, many of the students below this criterion show some level of achievement that would be ignored by such a stringent criterion. In particular, those in the range between 50 and 80 percent correct would be more likely to get the item right than wrong, yet would not be in the group described as "able to do" the item.

In a compromise between the 50 percent and the 80 percent conventions, NAEP has adopted two related response probability conventions: 74 percent for multiple-choice items (to correct for the possibility of answering correctly by guessing), and 65 percent for constructed-response items (where guessing is not a factor). These probability conventions were established, in part, based on an intuitive judgment that they would provide the best picture of students' knowledge and skills.
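A minimal sketch of this mapping arithmetic follows, assuming the three-parameter logistic response function that underlies the item parameters defined later in this section; the item parameters used here are hypothetical, not actual NAEP items.

```python
import math

def mapped_theta(p, a, b, c=0.0):
    """Scale point at which a 3PL item's probability of a correct response
    equals the response probability convention p.
    Inverts P(theta) = c + (1 - c) / (1 + exp(-1.7 * a * (theta - b)))."""
    if not c < p < 1:
        raise ValueError("p must lie strictly between c and 1")
    return b + math.log((p - c) / (1.0 - p)) / (1.7 * a)

# Hypothetical multiple-choice item, mapped with the 74 percent convention:
print(mapped_theta(0.74, a=1.0, b=0.0, c=0.20))   # about 0.43
# Hypothetical dichotomously scored constructed-response item (no guessing),
# mapped with the 65 percent convention:
print(mapped_theta(0.65, a=1.0, b=0.0, c=0.0))    # about 0.36
```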

Some additional support for the dual conventions adopted by NAEP was provided by Huynh (1994, 1998). He examined the Item Response Theory (IRT) information provided by items, according to the IRT model used in scaling NAEP items. Following Bock (1972), Huynh decomposed the item information into the part provided by a correct response, Pj1(θ) × Ij(θ), and the part provided by an incorrect response, [1 − Pj1(θ)] × Ij(θ).

Huynh showed that the item information provided by a correct response to a constructed-response item is maximized at the point along the scale at which two-thirds of the students get the item correct (for multiple-choice items with four options, information is maximized at the point at which 75 percent get the item correct). Maximizing the full item information Ij(θ), rather than the information provided by a correct response, Pj1(θ) × Ij(θ), would imply an item-mapping criterion closer to 50 percent. However, maximizing the full item information takes into account both correct and incorrect responses.
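
Huynh's result can be verified directly for the case without guessing (cj = 0), where the Birnbaum information function given below reduces to Ij(θ) = (1.7 aj)² Pj1(θ)[1 − Pj1(θ)]. The information provided by a correct response is then

$$P_{j1}(\theta)\,I_j(\theta) = (1.7\,a_j)^{2}\,P_{j1}(\theta)^{2}\,\bigl[1 - P_{j1}(\theta)\bigr],$$

and because Pj1(θ) increases monotonically with θ, maximizing over θ is equivalent to maximizing P²(1 − P) over P; the derivative 2P − 3P² vanishes at P = 2/3, the two-thirds point cited above.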

For dichotomously scored items, the information function defined by Birnbaum (1968) is given for the jth item as

$$I_j(\theta) = (1.7\,a_j)^{2}\,\frac{P_{j0}(\theta)\,\bigl[P_{j1}(\theta) - c_j\bigr]^{2}}{P_{j1}(\theta)\,(1 - c_j)^{2}}$$

where

 
aj, where aj > 0, is the slope parameter of item j, characterizing its sensitivity to scale score;

cj, where 0 ≤ cj < 1, is the lower asymptote parameter of item j, reflecting the chances of students of very low scale score selecting the correct option;

θ is the unobservable variable characterizing a person's score on the scale; and

Pj1(θ) = 1 − Pj0(θ) is the probability of a correct response to item j given θ, aj, bj (the threshold parameter of item j), and cj.
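
A brief numerical sketch of this information function, together with the Bock (1972) decomposition into correct- and incorrect-response parts discussed above, is given below; the item parameters are hypothetical, and the response function is the standard three-parameter logistic form implied by aj, bj, and cj.

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response, Pj1(theta)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def item_information(theta, a, b, c):
    """Birnbaum's (1968) information function for a dichotomously scored item."""
    p1 = p_correct(theta, a, b, c)
    p0 = 1.0 - p1
    return (1.7 * a) ** 2 * p0 * (p1 - c) ** 2 / (p1 * (1.0 - c) ** 2)

# Hypothetical four-option multiple-choice item.
a, b, c = 1.2, 0.5, 0.25
for theta in (-1.0, 0.0, 0.5, 1.0):
    info = item_information(theta, a, b, c)
    p1 = p_correct(theta, a, b, c)
    # Information split into correct- and incorrect-response parts.
    print(theta, round(p1, 3), round(p1 * info, 3), round((1.0 - p1) * info, 3))
```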

The item information function was defined in general for polytomously scored items by Samejima (1969); for items scaled with the generalized partial credit model, it has been derived (Donoghue 1993; Muraki 1993), in a slightly different but equivalent form, as

$$I_j(\theta) = (1.7\,a_j)^{2}\left[\sum_{i=0}^{m_j-1} i^{2}\,P_{ji}(\theta) - \left(\sum_{i=0}^{m_j-1} i\,P_{ji}(\theta)\right)^{2}\right]$$

where mj is the number of categories of response for item j, and Pji(θ) is the probability of a response that is scored in the ith of the mj ordered categories for item j, given θ, aj, bj, and dj,i (the category i threshold parameter).
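
A comparable sketch for a polytomously scored item follows, assuming a common parameterization of the generalized partial credit model in which Pji(θ) is proportional to exp[Σ from v = 0 to i of 1.7 aj (θ − bj + dj,v)], with dj,0 fixed at zero; the item parameters are hypothetical.

```python
import math

def gpcm_probs(theta, a, b, d):
    """Category probabilities Pji(theta) for a generalized partial credit item.
    d lists the category threshold parameters dj,i, with d[0] = 0 by convention."""
    logits, running = [], 0.0
    for dv in d:
        running += 1.7 * a * (theta - b + dv)
        logits.append(running)
    denom = sum(math.exp(z) for z in logits)
    return [math.exp(z) / denom for z in logits]

def item_information(theta, a, b, d):
    """(1.7 aj)^2 * [sum of i^2 Pji minus (sum of i Pji)^2], as in the formula above."""
    probs = gpcm_probs(theta, a, b, d)
    mean_score = sum(i * p for i, p in enumerate(probs))
    second_moment = sum(i * i * p for i, p in enumerate(probs))
    return (1.7 * a) ** 2 * (second_moment - mean_score ** 2)

# Hypothetical constructed-response item with mj = 3 score categories.
print(item_information(0.0, a=0.8, b=0.2, d=[0.0, 0.6, -0.6]))
```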


Last updated 08 August 2008 (RF)
