Multiple-choice items and constructed-response items that are scored as correct or incorrect are analyzed using standard classical test theory procedures (Allen and Yen 1979), resulting in a report for each item that includes
the proportion of examinees who received a correct score on the item (p+);
delta, an index of item difficulty based on p+, obtained by converting p+ to a standard score using the inverse normal distribution and then applying a linear transformation to obtain values with a mean of 13 and a standard deviation of 4;
the biserial correlation coefficient between the item score and the total score for the block in which the item appears; and
the point-biserial correlation coefficient (item-score correlation coefficient) between the item score and the total score for the block in which the item appears.
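The four statistics listed above can be sketched for a single dichotomous item as follows. This is a minimal illustration, not NAEP's production code: the function name `item_statistics` and the sample data are hypothetical, and the sign convention `delta = 13 - 4z` (harder items receive larger delta values) is an assumption based on the conventional delta scale, since the text specifies only the mean and standard deviation. The biserial is derived here from the point-biserial using the standard textbook conversion.

```python
from statistics import NormalDist, mean, pstdev


def item_statistics(item_scores, block_scores):
    """Classical statistics for one 0/1 item (hypothetical helper).

    item_scores  -- 0/1 scores on the item, one per examinee
    block_scores -- total block scores for the same examinees
    Requires 0 < p+ < 1 and a nonzero spread of block scores.
    """
    normal = NormalDist()
    n = len(item_scores)
    p_plus = sum(item_scores) / n          # proportion correct
    z = normal.inv_cdf(p_plus)             # standard score for p+

    # Assumed sign convention: harder items (low p+) get larger delta.
    delta = 13 - 4 * z

    # Point-biserial: Pearson correlation between a 0/1 item and the block score.
    mean_correct = mean(t for x, t in zip(item_scores, block_scores) if x == 1)
    mean_all = mean(block_scores)
    sd_all = pstdev(block_scores)          # population standard deviation
    point_biserial = (mean_correct - mean_all) / sd_all * (p_plus / (1 - p_plus)) ** 0.5

    # Biserial: rescale by sqrt(pq) over the normal ordinate at z.
    biserial = point_biserial * (p_plus * (1 - p_plus)) ** 0.5 / normal.pdf(z)

    return {
        "p_plus": p_plus,
        "delta": delta,
        "point_biserial": point_biserial,
        "biserial": biserial,
    }


# Hypothetical data: four examinees, item scores and block totals.
stats = item_statistics([1, 1, 0, 0], [4, 3, 2, 1])
print(stats)
```

With very small samples such as this one the biserial can exceed 1, which is a known property of the coefficient rather than an error in the calculation.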
NAEP data analysts create a report for all response options for each item, and these reports are referred to as "descriptive item statistics." This information is based on data collected from the total sample of examinees, including those who omitted or did not reach an item.
The total block score is computed by summing the number of correct responses across the dichotomously scored items and adding the number of points earned on each polytomously scored item.
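As a small worked illustration of this sum, the following sketch assumes the dichotomous items are already scored 0 or 1 and the polytomous items carry the points earned. The function name `total_block_score` and the example values are hypothetical.

```python
def total_block_score(dichotomous_scores, polytomous_points):
    """Sum 0/1 item scores and the points earned on polytomous items."""
    return sum(dichotomous_scores) + sum(polytomous_points)


# Hypothetical examinee: 3 of 4 dichotomous items correct,
# plus 2 points and 1 point on two polytomous items.
score = total_block_score([1, 0, 1, 1], [2, 1])
print(score)  # 6
```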
In NAEP, a distinction is made between types of missing responses. Missing responses at the end of a block (i.e., missing responses after the last item answered) are considered not reached. Missing responses before the last observed response are considered intentional omissions. In NAEP classical item analysis, omitted responses to multiple-choice items are treated as incorrect responses. This differs from how missing responses are treated in NAEP Item Response Theory (IRT) analyses. In IRT analysis, items that were not reached are not treated as incorrect; these items are treated as if they had not been presented to the student. Omitted responses to multiple-choice items are treated as fractionally correct; an omitted response to any other type of item is scored in the lowest response category.
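The distinction between omitted and not-reached responses can be sketched as a simple pass over one examinee's responses to a block, in presentation order. This is an illustrative sketch only: the function name `classify_missing`, the use of `None` to mark a missing response, and the label strings are assumptions. A block with no responses at all is labeled entirely not reached here, which the text does not address.

```python
def classify_missing(responses):
    """Label each response in a block as answered, omitted, or not reached.

    responses -- list in presentation order; None marks a missing response
    """
    # Index of the last item the examinee actually answered (-1 if none).
    last_answered = max(
        (i for i, r in enumerate(responses) if r is not None),
        default=-1,
    )

    labels = []
    for i, r in enumerate(responses):
        if r is not None:
            labels.append("answered")
        elif i < last_answered:
            # Missing before the last observed response: intentional omission.
            labels.append("omitted")
        else:
            # Missing after the last observed response: not reached.
            labels.append("not_reached")
    return labels


# Hypothetical block of five items.
print(classify_missing(["A", None, "C", None, None]))
# ['answered', 'omitted', 'answered', 'not_reached', 'not_reached']
```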