The goal of NAEP is to estimate the distributions of scale scores for groups of students. Because no student's underlying performance can be known exactly, NAEP uses every item response from the students in every group to estimate the distribution of scale scores for each group. To build the models needed for this, consider the probability that a particular student has an underlying performance level of θ. Given everything we know about the student, this probability is of the form:

$$P(\boldsymbol{\theta} \mid \mathbf{x}, \mathbf{y}, \alpha, \Gamma, \Sigma) \propto P(\mathbf{x} \mid \boldsymbol{\theta}, \alpha)\, P(\boldsymbol{\theta} \mid \mathbf{y}, \Gamma, \Sigma)$$

where θ is the vector of the k underlying performance levels for the k subscales in the subject area, x is the vector made up of the student's item-response vectors, y is the vector specifying group membership, α represents the parameters of the IRT models, and Γ and Σ are parameters of the population-structure model.
This distribution can be separated into two parts: the distribution of the item responses conditional on the student's underlying performance level, and the distribution of underlying performance conditional on the student's group membership.
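One way to see where the two parts come from is Bayes' rule. The derivation below is a sketch that assumes, as the text states, that item responses are independent of group membership once θ is known, and additionally assumes that the population-structure model for θ does not involve the IRT parameters α. Bayes' rule gives

$$P(\boldsymbol{\theta} \mid \mathbf{x}, \mathbf{y}, \alpha, \Gamma, \Sigma) = \frac{P(\mathbf{x} \mid \boldsymbol{\theta}, \mathbf{y}, \alpha, \Gamma, \Sigma)\, P(\boldsymbol{\theta} \mid \mathbf{y}, \alpha, \Gamma, \Sigma)}{P(\mathbf{x} \mid \mathbf{y}, \alpha, \Gamma, \Sigma)}$$

Under those assumptions, $P(\mathbf{x} \mid \boldsymbol{\theta}, \mathbf{y}, \alpha, \Gamma, \Sigma) = P(\mathbf{x} \mid \boldsymbol{\theta}, \alpha)$ and $P(\boldsymbol{\theta} \mid \mathbf{y}, \alpha, \Gamma, \Sigma) = P(\boldsymbol{\theta} \mid \mathbf{y}, \Gamma, \Sigma)$. The denominator does not depend on θ, so it can be absorbed into the proportionality constant:

$$P(\boldsymbol{\theta} \mid \mathbf{x}, \mathbf{y}, \alpha, \Gamma, \Sigma) \propto P(\mathbf{x} \mid \boldsymbol{\theta}, \alpha)\, P(\boldsymbol{\theta} \mid \mathbf{y}, \Gamma, \Sigma)$$

The first factor is the IRT likelihood and the second is the population-structure model.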
The first part is modeled with Item Response Theory (IRT) measurement models; the second is modeled with a population-structure model. The first part does not depend on group membership at all, because IRT models assume that a student's item responses depend only on the student's underlying performance level and the item parameters. The IRT measurement models are used to define the NAEP score scales.
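The following is a minimal numerical sketch of the two-part model for a single subscale (k = 1). It rests on several assumptions not stated in the text: every item is dichotomous and follows a three-parameter logistic (3PL) model, the item parameters are made up, the population-structure model is a univariate normal with mean y·Γ and a single standard deviation σ, and the posterior is evaluated on a fixed grid of θ values. NAEP's operational models are multivariate and may use other item response models, so treat this only as an illustration. All function names and parameter values are hypothetical. The point it shows is that two students with identical responses share the same likelihood, and only their group membership y shifts the posterior.

```python
import numpy as np

# Hypothetical 3PL item parameters playing the role of alpha (k = 1):
# A = discrimination, B = difficulty, C = lower asymptote (guessing).
A = np.array([1.2, 0.8, 1.5, 1.0])
B = np.array([-0.5, 0.0, 0.7, 1.2])
C = np.array([0.20, 0.25, 0.15, 0.20])
D = 1.7  # conventional scaling constant for the logistic 3PL form


def p_correct(theta, a, b, c):
    """3PL probability of a correct response at performance level theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


def likelihood(theta_grid, x):
    """P(x | theta, alpha): product over items, assuming local independence.
    Group membership y is deliberately NOT an argument."""
    p = p_correct(theta_grid[:, None], A, B, C)  # shape (grid points, items)
    return np.prod(np.where(x == 1, p, 1.0 - p), axis=1)


def prior(theta_grid, y, gamma, sigma):
    """Unnormalized P(theta | y, Gamma, Sigma): normal, mean y . gamma, sd sigma."""
    mu = y @ gamma
    return np.exp(-0.5 * ((theta_grid - mu) / sigma) ** 2)


def posterior(theta_grid, x, y, gamma, sigma):
    """Posterior on the grid: likelihood times prior, normalized to sum to 1."""
    w = likelihood(theta_grid, x) * prior(theta_grid, y, gamma, sigma)
    return w / w.sum()


grid = np.linspace(-4.0, 4.0, 161)
x = np.array([1, 1, 0, 0])       # identical item responses for both students
gamma = np.array([0.0, 0.5])     # hypothetical regression coefficients
sigma = 0.8                      # hypothetical residual standard deviation

y1 = np.array([1.0, 0.0])        # intercept only
y2 = np.array([1.0, 1.0])        # intercept plus a hypothetical group indicator

post1 = posterior(grid, x, y1, gamma, sigma)
post2 = posterior(grid, x, y2, gamma, sigma)
mean1 = float(np.sum(grid * post1))
mean2 = float(np.sum(grid * post2))
print(f"posterior mean, group 1: {mean1:.3f}; group 2: {mean2:.3f}")
```

Because the likelihood is computed without y, the difference between the two posterior means comes entirely from the population-structure factor, mirroring the separation described above. A grid is used here only for simplicity; it is not meant to reflect how NAEP evaluates these distributions.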