
Within-Year Interrater Agreement


Arts Interrater Agreement

Civics Interrater Agreement

Economics Interrater Agreement

Geography Interrater Agreement

Mathematics Interrater Agreement

Reading Interrater Agreement

Science Interrater Agreement

U.S. History Interrater Agreement

Writing Interrater Agreement

t statistics


Within-year interrater agreement is monitored by routing some responses to be scored a second time. For every item, the scoring system selects a subset of the current-year student responses for this second scoring, and scorers cannot tell which responses are being second-scored. The first and second scores for this subset are compared to determine within-year agreement. The scoring supervisor can obtain the agreement statistics at any point during scoring, and within-year interrater agreement is closely monitored to ensure the quality of the scoring.
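The comparison of first and second scores can be sketched in a few lines. This is a minimal illustration, not the NAEP scoring system's implementation: the selection rate, the use of simple random sampling, and the function names `select_for_second_scoring` and `exact_agreement` are all hypothetical. Exact agreement is taken to be the percentage of double-scored responses whose two scores are identical.

```python
import random


def select_for_second_scoring(response_ids, rate, seed=None):
    """Randomly pick a fraction of responses to route for a second score.

    Assumption: the rate and simple random sampling are illustrative only;
    the source does not state how the scoring system chooses the subset.
    """
    rng = random.Random(seed)
    k = round(len(response_ids) * rate)
    return rng.sample(list(response_ids), k)


def exact_agreement(first_scores, second_scores):
    """Percent of double-scored responses whose two scores are identical."""
    if len(first_scores) != len(second_scores):
        raise ValueError("each response needs both a first and a second score")
    if not first_scores:
        raise ValueError("no double-scored responses")
    matches = sum(a == b for a, b in zip(first_scores, second_scores))
    return 100.0 * matches / len(first_scores)


# Hypothetical first and second scores for ten double-scored responses.
first = [1, 2, 0, 2, 1, 1, 2, 0, 1, 2]
second = [1, 2, 0, 1, 1, 1, 2, 0, 2, 2]
print(exact_agreement(first, second))  # 80.0
```

Eight of the ten hypothetical score pairs match, so exact agreement is 80 percent.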

The target standards for within-year agreement are as follows:

  • items scored on 2-point scales: 85 percent exact agreement,
  • items scored on 3-point scales: 80 percent exact agreement,
  • items scored on 4-point and 5-point scales: 75 percent exact agreement, and
  • items scored on 6-point scales: 60 percent exact agreement.
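The target standards above amount to a lookup keyed by the number of score points on an item's scale. The sketch below assumes an item "meets" its target when agreement is greater than or equal to the listed percentage; the names `TARGETS` and `meets_standard` are hypothetical.

```python
# Target exact-agreement percentages from the list above, keyed by the
# number of score points on the item's scale.
TARGETS = {2: 85.0, 3: 80.0, 4: 75.0, 5: 75.0, 6: 60.0}


def meets_standard(score_points, agreement_pct):
    """Return True if an item's exact agreement reaches its target.

    Assumption: reaching the target exactly counts as meeting it.
    """
    try:
        target = TARGETS[score_points]
    except KeyError:
        raise ValueError(f"no target listed for a {score_points}-point scale") from None
    return agreement_pct >= target


print(meets_standard(3, 80.0))  # True
print(meets_standard(4, 72.5))  # False
```

Scales not covered by the list raise an error rather than silently passing.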

Scoring staff must also watch for declines in an item's within-year agreement. For example, if first and second scores were in exact agreement 90 percent of the time in the morning (or on day 1 of scoring) and exact agreement dropped to 82 percent in the afternoon (or on day 2), a problem may exist, even if the overall within-year agreement remains above the minimum standard. Backreading and calibration are the tools used to monitor and correct declines in within-year agreement.
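A simple way to flag the kind of decline described above is to compare agreement between two scoring periods. The source gives no numeric rule, so the 5-point tolerance (`max_drop`) and the function name `agreement_declined` are hypothetical assumptions for illustration.

```python
def agreement_declined(earlier_pct, later_pct, max_drop=5.0):
    """Flag an item whose exact agreement fell by more than max_drop points.

    Assumption: max_drop is a hypothetical tolerance, not a NAEP rule.
    """
    return earlier_pct - later_pct > max_drop


# The example from the text: 90 percent in the morning, 82 in the afternoon.
print(agreement_declined(90.0, 82.0))  # True (8-point drop)
```

The check is independent of the target standard: an 82 percent rate would still meet the 75 percent target for a 4-point item, yet the 8-point drop is flagged for follow-up.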

If an item's within-year agreement falls below the indicated standard and the shortfall is believed to result primarily from inconsistent scoring, the item may be rescored. Decisions about rescoring items are made by test development staff and psychometricians in consultation with scoring staff and content coordinators.

For more information on the estimation of reliability based on interrater agreement, see Analysis and Scaling.


Last updated 26 January 2010 (JL)
National Center for Education Statistics -
U.S. Department of Education