
Training Scorers in Trend Items

Trend scoring is used in NAEP to compare the consistency of scoring over time (i.e., cross-year interrater agreement). During trend scoring, the NAEP electronic scoring system presents current scorers with a pool of scored responses from a prior assessment. Comparing current scores with those given in the prior assessment makes it possible to generate reports that evaluate scoring consistency over time for a specific NAEP item.

The trend set is an important addition to the traditional paper training sets. Trend responses are drawn at random for each item from prior scoring. Each trend set is composed of 600 responses and is available for trainer use and review both before the project (on compact disc) and during it (in the electronic scoring system).

After thorough preparation using the paper training sets, the trainer reviews a broader range of responses (via compact disc) using the trend responses specific to that item. This occurs during the trainer preparation period (i.e., before the scoring window). It is important to note that trend scores are not validated; the responses may have been scored correctly or incorrectly. Therefore, when trainers review trend responses during the preparation phase, trend scores should be treated as pattern markers: indicators of scoring patterns that may or may not have been evident in the paper training sets.

After being trained on a new item, the scorers are given a trend set of 200 responses to score. If the scorers as a group meet minimum performance criteria, they then begin scoring current-year responses. The performance criteria are:

  • The cross-year t statistic (the difference between item mean scores, used to evaluate drift in cross-year scoring performance) must fall between -1.5 and +1.5, and
  • The cross-year exact interrater agreement rate must be within 7 percentage points of the interrater agreement rate from the adjacent scoring year.

If the scorers do not meet the criteria, the process is repeated until they do. Clarification of training, or retraining, is often conducted before the next attempt.
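The two criteria above can be checked mechanically. The sketch below is illustrative only: this page does not give the exact formula NAEP uses for the cross-year t statistic, so a standard unpaired (Welch-style) two-sample t on the item mean scores is assumed here, and all function names are hypothetical.

```python
import statistics

def cross_year_t(current_scores, prior_scores):
    """Two-sample t statistic comparing current and prior item mean scores.
    Assumed Welch-style form; the exact NAEP formula is not given here."""
    n1, n2 = len(current_scores), len(prior_scores)
    mean_diff = statistics.mean(current_scores) - statistics.mean(prior_scores)
    v1 = statistics.variance(current_scores)  # sample variance, current year
    v2 = statistics.variance(prior_scores)    # sample variance, prior year
    return mean_diff / ((v1 / n1 + v2 / n2) ** 0.5)

def exact_agreement_rate(current_scores, prior_scores):
    """Percentage of trend responses where current and prior scores match exactly."""
    matches = sum(c == p for c, p in zip(current_scores, prior_scores))
    return 100.0 * matches / len(current_scores)

def meets_criteria(t_stat, agreement, prior_year_agreement):
    """Apply the two stated thresholds: |t| within 1.5, and exact agreement
    within 7 percentage points of the adjacent scoring year's rate."""
    return -1.5 <= t_stat <= 1.5 and abs(agreement - prior_year_agreement) <= 7.0
```

In practice the trend set scored by the group contains 200 responses; the lists passed in would hold one score per response, paired by response.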

Trend responses can serve other functions as well.

During trainer preparation the trainers may:

  • Review the compact disc for score point examples that solidify issues covered in the paper training sets,
  • Review for particular score point examples to clarify issues not thoroughly covered in the paper training sets, or additional issues covered by addenda or notes, and
  • Practice scoring trend responses to get an overall sense of trend pool effectiveness prior to item training, running statistics electronically or doing hand calculations to monitor progress.

During scorer training, trend responses may be:

  • Used for individual, pair, and/or group scoring activities, as needed, and
  • Used to meet performance criteria before scoring current-year responses.

As a calibration tool, trend responses may be:

  • Used in the morning, after lunch, after breaks of more than 15 minutes, or intermittently during current-year scoring (i.e., calibrating for team drift),
  • Used for individual, pair, and/or group scoring activities, as needed, and
  • Used in small increments as a group scoring activity, possibly in conjunction with correctly scored paper sets compiled from trend training or current-year backreading.

When retraining scorers, trend responses may be:

  • Used as a readiness indicator after retraining a previously scored and reset item,
  • Used for individual, pair, and/or group scoring activities, as needed, and
  • Used in increments sized to yield useful feedback given the item's difficulty level, team size, and/or time constraints.

Last updated 10 December 2008 (RF)
