Within-year interrater agreement is monitored by rerouting a subset of responses to be scored a second time. For every item, the scoring system selects a subset of the current-year student responses for this second scoring. Scorers cannot tell which responses are being second-scored. The first and second scores for the subset are compared to determine the within-year agreement. The scoring supervisor can obtain the agreement statistics at any point during scoring. Within-year interrater agreement is closely monitored to ensure the quality of the scoring.
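As an illustration of the comparison of first and second scores, the following is a minimal sketch of a percent exact agreement calculation. It is not the scoring system's actual implementation. The function name `exact_agreement_rate` and the sample score pairs are hypothetical, and exact agreement is assumed here to mean that the two scores for a response are identical.

```python
def exact_agreement_rate(score_pairs):
    """Return the percent of (first, second) score pairs that match exactly.

    score_pairs: list of (first_score, second_score) tuples, one per
    second-scored response. Hypothetical helper for illustration only.
    """
    if not score_pairs:
        raise ValueError("no second-scored responses to analyze")
    matches = sum(1 for first, second in score_pairs if first == second)
    return 100.0 * matches / len(score_pairs)


# Hypothetical data: five second-scored responses for one item.
pairs = [(2, 2), (3, 3), (1, 2), (0, 0), (3, 3)]
rate = exact_agreement_rate(pairs)  # four of the five pairs agree exactly
```

Because the calculation depends only on the pairs collected so far, it can be rerun at any point during scoring.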
The target standards for within-year agreement are as follows:
Scoring staff also need to be alert to declines in the within-year agreement for an item. For example, if first and second scores were in exact agreement 90 percent of the time in the morning (or on day 1 of scoring) and the rate of exact agreement declined to 82 percent in the afternoon (or on day 2 of scoring), a problem may exist, even if the overall within-year agreement remains above the minimum standard. Backreading and calibration are tools used to monitor and correct declines in within-year agreement.
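The 90 percent versus 82 percent example can be sketched as a comparison between two scoring periods. This is only an illustration under stated assumptions: the function names, the sample data, and the `max_drop` threshold of 5 percentage points are hypothetical. The source does not specify how large a decline should prompt action.

```python
def percent_exact(score_pairs):
    """Percent of (first, second) score pairs that agree exactly."""
    matches = sum(1 for first, second in score_pairs if first == second)
    return 100.0 * matches / len(score_pairs)


def agreement_decline(earlier_pairs, later_pairs, max_drop=5.0):
    """Compare exact agreement between two scoring periods.

    Returns (drop, flagged), where drop is the decline in percentage
    points and flagged is True when the drop exceeds max_drop.
    max_drop is an assumed threshold, not a documented standard.
    """
    drop = percent_exact(earlier_pairs) - percent_exact(later_pairs)
    return drop, drop > max_drop


# Hypothetical data reproducing the example rates:
# morning: 9 of 10 pairs agree (90 percent)
# afternoon: 41 of 50 pairs agree (82 percent)
morning = [(2, 2)] * 9 + [(2, 3)]
afternoon = [(2, 2)] * 41 + [(2, 3)] * 9

drop, flagged = agreement_decline(morning, afternoon)
```

A flag of this kind would only signal that the item deserves a closer look, for instance through backreading or calibration. It does not by itself indicate that the item has fallen below the minimum standard.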
If within-year agreement rates for an item fall below the indicated standards and the shortfall is believed to result primarily from inconsistent scoring, the item may be rescored. Decisions about rescoring of items are made by test development staff psychometricians in consultation with scoring staff and content coordinators.
For more information on the estimation of reliability based on interrater agreement, see Analysis and Scaling.