

NAEP Technical Documentation: Scale Linking and Transformation to the Reporting Metric


  • Transformation Equation

  • Linking Diagrams for NAEP Assessments

  • Transformation Constants for NAEP Assessments

  • Comparisons of Distributions of Linking Samples for NAEP Assessments

  • NAEP Variance Estimation During Transition to Digitally Based Assessment

A goal of NAEP is to measure changes in performance over time. Results from different administrations must be placed on a common scale for valid comparisons to be made. Five situations in which NAEP scales must be linked to existing NAEP assessment scales are

  • when the NAEP subject has been assessed before;

  • when a NAEP subject has been assessed in state samples using administrators supplied by the states (for assessments prior to 2002);

  • when a NAEP subject has been assessed for a special sample or in a special way;

  • when a NAEP subject has been assessed in combined national and state samples (as of 2002); and

  • when a NAEP subject transitions from a paper-based assessment (PBA) to a digitally based assessment (DBA) for the first time.

NAEP scales are linked to scales from previous assessments via a common calibration linking (also known as common item linking) procedure. Essentially, the data from the current assessment year and the most recent previous assessment year are calibrated together: data from the two assessments are scaled in the same run, with the samples from each assessment specified as coming from different populations. For each scale, the mean and standard deviation of the previous assessment's data from this joint calibration are matched (set equal) to the mean and standard deviation of that same data as reported in the last NAEP report card for the subject. This process links the current data to the previously established scales.
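Matching the mean and standard deviation of the joint calibration to the previously reported values amounts to a linear transformation. The following sketch illustrates the idea only; it is not NCES code, and all function names and numbers are invented for illustration.

```python
# Illustrative only: deriving the linear transformation constants implied by
# matching a mean and standard deviation (common calibration linking).

def linking_constants(mean_joint, sd_joint, mean_reported, sd_reported):
    """Solve for slope a and intercept b so that a*x + b maps the previous
    assessment's scores from the joint calibration onto the metric in which
    they were originally reported (matching mean and standard deviation)."""
    a = sd_reported / sd_joint
    b = mean_reported - a * mean_joint
    return a, b

# Invented example: previous-year data come out of the joint calibration with
# mean 0.02 and SD 0.98, but were reported with mean 281.4 and SD 34.6.
a, b = linking_constants(0.02, 0.98, 281.4, 34.6)

# Applying the same constants to a current-year score on the joint scale
# places it on the previously established reporting scale.
current_score_joint = 0.35
current_score_reported = a * current_score_joint + b
```

Because both assessments sit on the same joint calibration scale, applying the constants derived from the previous assessment's data carries the current assessment onto the established metric as well.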

For 2000 and prior state assessments, NAEP state scales were linked to the national scales via a common population linking procedure. The response data from the participating jurisdictions were scaled separately from the nation's. Then the mean and standard deviation of the state data from the jurisdictions represented in the national sample were matched to the mean and standard deviation of the national linking sample (NLS), which contains all national data representing students from jurisdictions participating in the state assessment. This process linked the current state data to the current national data and, thus, to the previously established scales.

From 2002 onward, with combined national and state samples, NAEP scales are no longer calibrated separately for the nation and the participating states. All public and private school students are combined to form the national sample, which is calibrated jointly with the previous year's data. The mean and standard deviation of the previous assessment's data from the joint calibration are then matched to the mean and standard deviation of that data as reported in the last NAEP report card for the subject. The resulting transformation constants are applied to the current year, linking the current data to the previously established scales.

Special sample data may be linked to NAEP scales in a variety of ways. In 1998 and 2000, special linking transformations were created to link the scales for samples of students who were permitted accommodations to the scales for samples of students who were not. For these links, students identified neither as students with disabilities (SD) nor as English learners (EL) were used for a special common calibration/common population linking. Because these students, by definition, did not require and were not offered accommodations, they were constrained to have the same mean and standard deviation for their scale scores whether their response data were scaled with accommodated students or not. This process linked the scales for the accommodated samples to the scales for the non-accommodated samples and, thus, to the previously established scales.

The 2017 NAEP mathematics and reading assessments were designed to continue reporting trends in student performance dating back to the early 1990s, while keeping pace with the new generation of classroom environments in which digital technology has become an increasingly integral part of students' learning. After the administration of the assessment, NCES conducted rigorous analyses of the data and aligned the 2017 results to previous assessment years using a two-step process. First, common item linking was used to calculate the trend line from 2015 to 2017 based on the paper-based assessment results. This kind of linking was possible because the majority of the 2017 assessment questions were also administered in 2015 and showed the same statistical properties. Second, common population linking was used to align the 2017 paper-based assessment results with the 2017 digitally based assessment results. This kind of linking was possible because the samples of students for each assessment mode were randomly equivalent; that is, the random samples included students from the same schools, ensuring that the students' educational experiences and characteristics were equivalent. These analyses—common item linking based on paper results and common population linking of paper results to digital results—enabled NCES to successfully maintain the mathematics and reading trend lines while transitioning to digitally based assessments.
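If each of the two 2017 linking steps is treated as a linear map of the form x → ax + b (as in the mean/SD-matching transformations described earlier), the alignment of digital results to the long-term trend line can be sketched as the composition of the two maps. All constants below are invented for illustration; this is not NCES code.

```python
# Illustrative only: chaining two linear linking transformations.
# Step 1 (common item linking) maps 2017 paper-based results onto the trend
# metric; step 2 (common population linking) maps 2017 digital results onto
# the 2017 paper metric. Composing them places digital results on the trend line.

def compose(outer, inner):
    """Compose two linear maps x -> a*x + b, applying `inner` first."""
    a_in, b_in = inner
    a_out, b_out = outer
    return a_out * a_in, a_out * b_in + b_out

paper_to_trend = (35.1, 280.2)    # hypothetical (slope, intercept) from common item linking
digital_to_paper = (0.97, 0.05)   # hypothetical constants from common population linking

digital_to_trend = compose(paper_to_trend, digital_to_paper)

theta_digital = 0.40              # a digital-mode score on its calibration scale
a, b = digital_to_trend
trend_score = a * theta_digital + b
```

The composition is itself linear, so a single pair of constants suffices to carry any digital-mode score onto the established trend metric.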

After scaling, which results in scores centered around zero, NAEP scales must be transformed to NAEP reporting metrics. Beginning in 1996, the metrics for newly established NAEP scales have been most often set to have a mean of 150 and a standard deviation of 35. Prior to 1996, the metrics for newly established NAEP scales were most often set to have a mean of 250 and a standard deviation of 50. Each subject area scale is linked to that from the previous assessment; then, if there are several subscales for the subject area, the composite scale scores can be calculated as the weighted sum of transformed subscale scores.
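Where a subject area has several subscales, the composite described above is a weighted sum of the transformed subscale scores. A minimal sketch, with invented weights and scores (not actual NAEP values):

```python
# Illustrative only: forming a composite scale score as the weighted sum of
# already-transformed subscale scores. Weights and scores are invented.

def composite_score(subscale_scores, weights):
    """Weighted sum of transformed subscale scores; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("subscale weights must sum to 1")
    return sum(w * s for w, s in zip(weights, subscale_scores))

# Three hypothetical subscales already on the reporting metric.
scores = [152.0, 147.5, 155.0]
weights = [0.4, 0.35, 0.25]
composite = composite_score(scores, weights)
```

Because each subscale has already been transformed to the reporting metric, the weighted sum lands directly on that metric as well.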


Last updated 11 January 2023 (PG)