


Science Interrater Agreement


2005 Assessment Item-by-Item Interrater Agreement
2005 Interrater Agreement Ranges
2005 Number of Constructed-Response Items

2000 Assessment Item-by-Item Interrater Agreement
2000 Interrater Agreement Ranges
2000 Number of Constructed-Response Items

A subsample of the responses for each constructed-response item is scored by a second scorer to obtain statistics on interrater agreement. For items with a large sample size, five percent of responses are second scored. For items with smaller sample sizes, twenty-five percent of responses are second scored. This interrater agreement information is also used by the scoring supervisor to monitor the capabilities of all scorers and maintain uniformity of scoring across scorers.
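The subsampling rule above can be sketched as follows. This is a minimal illustration, not the actual NAEP scoring software; in particular, the cutoff separating "large" from "small" item sample sizes is a hypothetical value, since the document does not specify one.

```python
import random

def select_second_scoring_sample(response_ids, large_sample_threshold=1000):
    """Select the subsample of responses to receive a second scoring.

    Large-sample items: 5% second scored; smaller-sample items: 25%.
    The threshold of 1,000 responses is an assumed value for
    illustration only.
    """
    n = len(response_ids)
    rate = 0.05 if n >= large_sample_threshold else 0.25
    k = max(1, round(n * rate))  # always second score at least one response
    return random.sample(response_ids, k)

sample = select_second_scoring_sample(list(range(2000)))
print(len(sample))  # 100: a 2,000-response item draws the 5% rate
```

A 100-response item would instead yield a 25-response subsample under the 25% rate.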

Agreement reports are generated on demand by the scoring supervisor, trainer, scoring director, or NAEP test development contractor's content coordinator. Printed copies are reviewed daily by the lead scoring staff. In addition to the immediate feedback provided by online agreement reports, each scoring supervisor can also review the actual responses scored by a scorer with the backreading tool. In this way, the scoring supervisor can monitor each scorer carefully and correct difficulties in scoring almost immediately with a high degree of efficiency.

During the scoring of an item or of a calibration set, scoring supervisors monitor progress using an interrater agreement tool. This display tool functions in either of two modes:

  • displaying all first readings versus all second readings, or

  • displaying, for an individual scorer, all of that scorer's readings that were also scored by another scorer versus the scores the other scorers assigned.

The information is displayed as a matrix, with scores awarded during first readings in rows and scores awarded during second readings in columns. Results may be reviewed for either individual scorers or the team as a whole. In this format, instances of exact agreement fall along the diagonal of the matrix. For completeness, each cell of the matrix contains both the number and the percentage of cases of agreement (or disagreement). The display also shows the total number of second readings and the overall percentage of agreement on the item. Because the interrater agreement reports are cumulative, a printed copy of each item's agreement report is made periodically and compared to previously generated reports. Scoring staff members save printed copies of all final agreement reports and archive them with the training sets.
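The matrix described above can be sketched as a simple cross-tabulation. This is an illustrative reconstruction of the display's logic, not the NAEP tool itself; the function and variable names are invented for the example.

```python
from collections import Counter

def agreement_matrix(first_scores, second_scores, score_points):
    """Cross-tabulate first readings (rows) against second readings (columns).

    Exact agreement falls along the diagonal; the overall percentage of
    agreement is the share of second readings matching the first reading.
    """
    counts = Counter(zip(first_scores, second_scores))
    matrix = [[counts[(r, c)] for c in score_points] for r in score_points]
    total = len(first_scores)
    exact = sum(counts[(s, s)] for s in score_points)
    percent_agreement = 100.0 * exact / total
    return matrix, total, percent_agreement

# Hypothetical scores for eight second-scored responses on a 3-point item.
first = [1, 2, 2, 3, 1, 2, 3, 3]
second = [1, 2, 3, 3, 1, 2, 3, 2]
matrix, total, pct = agreement_matrix(first, second, score_points=[1, 2, 3])
print(pct)  # 75.0: six of the eight pairs agree exactly
```

Per-cell percentages, as in the display, would simply divide each cell count by the total number of second readings.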

Last updated 19 December 2008 (RF)


National Center for Education Statistics -
U.S. Department of Education