
NAEP Technical Documentation: Training the Scorers



When scorers first arrive and teams are assembled, the content area specialist who oversees scoring for a particular content area reviews general scoring information with all teams as a large group before item-level training and scoring begin. Scoring teams generally consist of a trainer, a supervisor, and 10 to 12 scorers. This general training includes an overview of the NAEP assessment and covers general scoring guidelines, interrater agreement and scoring rate expectations, evaluations, scorer bias, and grade-level considerations regarding items.

Training continues in more detail at the team level with the supervisors.

Trainers conduct training on individual constructed-response items while the scoring supervisors monitor the progress of individual scorers. Trainers and scoring supervisors keep notes on all scoring decisions and on any refinements made to scoring guides, anchor papers, or practice sets during item training. Such refinements are common during pilot and field test scoring but occur much less often during operational scoring. Scoring guides and training papers, with annotations explaining the scoring of each paper, must be complete and correct and must reflect how the item has been trained. Flip charts and wall charts developed by trainers are used to document scoring decisions. The scoring supervisors make certain that all scorers take notes on such decisions throughout the item-level training sessions.

All NAEP team-level training and scoring are conducted on an item-level basis. (All scoring guides in NAEP are item-specific, except for writing, which uses a more general six-point scoring guide.) Each team is trained on one item at a time, and the team scores all responses for that item before the next item is introduced. This model promotes scoring accuracy by focusing scorers on just one item at a time.
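
The item-at-a-time model described above can be illustrated with a short sketch. This is only an illustration of the ordering, not a description of any NAEP software: the function name `score_item_by_item`, the `train_team` and `score_response` callbacks, and the dictionary keys `item_id` and `response_id` are all hypothetical names chosen for this example.

```python
from collections import defaultdict


def score_item_by_item(responses, train_team, score_response):
    """Train on one item, score all of its responses, then move to the next item.

    All names here are hypothetical; this only illustrates the ordering.
    """
    # Group responses by item so each item can be handled in one pass.
    by_item = defaultdict(list)
    for response in responses:
        by_item[response["item_id"]].append(response)

    log = []     # ordered record of training and scoring events
    scores = {}  # response_id -> assigned score

    for item_id, item_responses in by_item.items():
        # The team is trained on this item before any of its responses are scored.
        train_team(item_id)
        log.append(("train", item_id))

        # Every response for this item is scored before the next item is introduced.
        for response in item_responses:
            scores[response["response_id"]] = score_response(item_id, response)
            log.append(("score", item_id))

    return scores, log
```

In this sketch, responses that arrive interleaved across items are regrouped so that no response for a second item is scored until every response for the first item has been scored, which mirrors the focus on one item at a time.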

Last updated 10 December 2008 (RF)
