NAEP Technical Documentation: NAEP Scales

Number of Items in Each Subject Area Scale

Scale Composition by Item Type for New and Trend Blocks

Ranges for the Final Subject Area Scales

Correlations Among NAEP Subject Area Subscales

NAEP Item Response Theory (IRT) scales are determined a priori by grouping items into content domains for which overall performance is deemed to be of interest. The content domains are defined by NAEP frameworks, which have been the responsibility of the National Assessment Governing Board since 1988. Frameworks for some subject areas (e.g., mathematics and reading) specify multiple content-related subscales, while others (e.g., writing and all long-term trend assessment subjects) specify a single scale.

For all of the IRT scales, there is a linear indeterminacy between the values of item parameters and proficiency parameters. That is, if the proficiency scale is subjected to an arbitrary linear transformation, a correspondingly transformed set of item parameters fits the data equally well, so different but mathematically equivalent parameter values can be estimated. This linear indeterminacy is resolved by fixing the origin and unit size of the proficiency scale at arbitrary constants, such as a mean of 0 and a standard deviation of 1. The indeterminacy is most apparent when the scale is set for the first time.
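The indeterminacy can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) item, which is only one of the model families used in IRT and is chosen here for brevity. The function names and all numeric values are hypothetical and are not taken from NAEP. Rescaling proficiency as theta* = slope × theta + intercept, while transforming discrimination to a / slope and difficulty to slope × b + intercept, leaves the response probability unchanged.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rescale(theta, a, b, slope, intercept):
    """Linearly transform proficiency and apply the matching item-parameter transform."""
    theta_new = slope * theta + intercept
    a_new = a / slope                # discrimination shrinks as the unit size grows
    b_new = slope * b + intercept    # difficulty moves with the proficiency scale
    return theta_new, a_new, b_new


# Hypothetical proficiency and item parameters on a mean-0, SD-1 scale.
theta, a, b = 0.4, 1.2, -0.3

# Hypothetical transformation constants.
theta2, a2, b2 = rescale(theta, a, b, slope=50.0, intercept=250.0)

p_original = p_2pl(theta, a, b)
p_rescaled = p_2pl(theta2, a2, b2)

# The two probabilities agree up to floating-point rounding.
print(abs(p_original - p_rescaled) < 1e-12)
```

Because the two parameter sets give the same probabilities, the response data alone cannot identify the origin or unit of the scale, which is why both must be fixed by convention.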

Final results for each subject area are linearly transformed from the original scale to a 0–500 or a 0–300 scale.
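As a rough sketch of such a transformation, the snippet below maps provisional proficiency values to a reporting metric by choosing a slope and intercept that produce a target mean and standard deviation. The target values, the sample proficiencies, and the function name are assumptions made for illustration. The constants NAEP actually uses are set by its own procedures and are not given here, and operational analyses work with weighted proficiency distributions rather than a plain list of values.

```python
from statistics import fmean, pstdev


def to_reporting_scale(thetas, target_mean, target_sd):
    """Linearly map provisional values so they have the target mean and SD."""
    slope = target_sd / pstdev(thetas)
    intercept = target_mean - slope * fmean(thetas)
    scaled = [slope * t + intercept for t in thetas]
    return scaled, slope, intercept


# Hypothetical provisional proficiency values.
thetas = [-1.2, -0.4, 0.0, 0.5, 1.1]

# Hypothetical targets for a 0-500 style reporting metric.
scaled, slope, intercept = to_reporting_scale(thetas, target_mean=250.0, target_sd=50.0)

print([round(s, 1) for s in scaled])
```

The transformation changes only the origin and unit of the scale, so the ordering of values and their relative distances are preserved.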

When content area scales are specified, a composite scale is usually created from them. The frameworks specify the weights assigned to each of the content area scales when a composite scale is created.
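A minimal sketch of a composite follows, assuming it is formed as a weighted average of subscale results with weights that sum to 1. The subscale names, weights, and scores are hypothetical placeholders. The actual weights come from the relevant framework.

```python
import math


def composite_score(subscale_scores, weights):
    """Weighted average of subscale scores; weights must cover the same subscales."""
    if set(subscale_scores) != set(weights):
        raise ValueError("scores and weights must name the same subscales")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * subscale_scores[name] for name in weights)


# Hypothetical subscales, weights, and scores on a common reporting metric.
weights = {"subscale_a": 0.5, "subscale_b": 0.3, "subscale_c": 0.2}
scores = {"subscale_a": 240.0, "subscale_b": 260.0, "subscale_c": 250.0}

composite = composite_score(scores, weights)
print(round(composite, 1))
```

Requiring the weights to sum to 1 keeps the composite on the same metric as the subscales it combines.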

Last updated 05 October 2023 (SK)
