NAEP Item Response Theory (IRT) scales are determined a priori by grouping items into content domains for which overall performance is deemed to be of interest. The content domains are defined by NAEP frameworks, which have been the responsibility of the National Assessment Governing Board since 1998. Frameworks for some subject areas (e.g., mathematics and reading) specify multiple content-related subscales, while others (e.g., writing and all long-term trend assessment subjects) specify a single scale.
For all of the IRT scales, there is a linear indeterminacy between the values of item parameters and proficiency parameters. That is, applying an arbitrary linear transformation to the proficiency scale, together with a compensating transformation of the item parameters, yields a mathematically equivalent but different set of parameter values. This linear indeterminacy can be resolved by setting the origin and unit size of the proficiency scale to arbitrary constants, such as a mean of 0 with a standard deviation of 1. The indeterminacy is most apparent when the scale is set for the first time.
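The indeterminacy can be illustrated with a small sketch. The two-parameter logistic (2PL) model below is an illustrative choice, not necessarily the model used for any particular NAEP item, and the transformation constants are arbitrary; the point is that a linear rescaling of proficiency, paired with the compensating item-parameter transformation, leaves every predicted response probability unchanged:

```python
import math

def p_correct(theta, a, b):
    # 2PL IRT model: probability of a correct response given
    # proficiency theta, discrimination a, and difficulty b.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Original proficiency and item parameters (illustrative values).
theta, a, b = 0.5, 1.2, -0.3

# Arbitrary linear transformation of the proficiency scale:
# theta* = k * theta + c.
k, c = 2.0, 10.0
theta_star = k * theta + c

# Compensating item-parameter transformation: a* = a / k, b* = k * b + c.
a_star, b_star = a / k, k * b + c

p_original = p_correct(theta, a, b)
p_transformed = p_correct(theta_star, a_star, b_star)

# The two parameterizations are mathematically equivalent.
assert abs(p_original - p_transformed) < 1e-12
```

Because any choice of k and c produces an equivalent model, the scale must be pinned down by convention, such as fixing the proficiency mean to 0 and the standard deviation to 1.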
Final results for each subject area are linearly transformed from the original scale (e.g., −3.0 to 3.0) to a 0–500 or a 0–300 scale.
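A sketch of such a linear transformation follows. The target mean and standard deviation here are hypothetical, chosen only to show the mechanics of mapping a provisional theta scale (mean 0, standard deviation 1) onto a wider reporting scale; they are not NAEP's actual transformation constants:

```python
def to_reporting_scale(theta, target_mean=250.0, target_sd=50.0):
    # Linear transformation from the provisional proficiency scale
    # (mean 0, SD 1) to a reporting scale. With these illustrative
    # constants, theta values of roughly -3 to 3 map into 100-400,
    # well inside a 0-500 reporting range.
    return target_mean + target_sd * theta

print(to_reporting_scale(-3.0))  # 100.0
print(to_reporting_scale(0.0))   # 250.0
print(to_reporting_scale(3.0))   # 400.0
```

Because the transformation is linear, it preserves the ordering and relative spacing of proficiency estimates; only the origin and unit of the scale change.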
When content area scales are specified, a composite scale is usually created from them. The frameworks specify the weight assigned to each content area scale in forming the composite.
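A weighted composite can be sketched as follows. The subscale names, scores, and weights below are hypothetical and stand in for whatever a given framework specifies; the composite is simply the weighted average of the subscale scores:

```python
def composite_score(subscale_scores, weights):
    # Weighted average of subscale scores. The framework-specified
    # weights are assumed to sum to 1.
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[name] * subscale_scores[name] for name in weights)

# Hypothetical subscale scores and framework weights.
scores = {"algebra": 280.0, "geometry": 260.0, "number": 300.0}
weights = {"algebra": 0.3, "geometry": 0.3, "number": 0.4}

print(composite_score(scores, weights))  # 282.0
```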