NAEP's primary goal is to provide information about what students know and can do. NAEP uses marginal maximum likelihood methods to estimate characteristics of the scale score distributions for groups of students. Because NAEP is designed to provide information only about groups of students, it does not have the information needed to produce scale scores for individual students.
In NAEP, Item Response Theory (IRT) is used to model the relationship between student item responses and item characteristics, such as difficulty and discrimination. Group membership and background variables play no role in estimating item parameters and, therefore, no role in creating the score scale with IRT. Group membership is used only in differential item functioning (DIF) analyses, which verify that the models fit the data similarly for target student groups.
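To make the item side of the model concrete, here is a minimal sketch of one common IRT form, the two-parameter logistic (2PL) model, in which an item's discrimination `a` and difficulty `b` determine the probability that a student with proficiency `theta` answers correctly. The choice of the 2PL form and the function names `p_correct_2pl` and `pattern_likelihood` are illustrative assumptions, not the specific models or code NAEP uses. As the paragraph above states, no group or background variable appears anywhere in these functions.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """P(correct response) under a 2PL model (an illustrative assumption).

    theta: student proficiency; a: item discrimination; b: item difficulty.
    No group membership or background variable enters this function.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_likelihood(theta: float, responses: list[int],
                       items: list[tuple[float, float]]) -> float:
    """Likelihood of a 0/1 response pattern, assuming local independence."""
    like = 1.0
    for x, (a, b) in zip(responses, items):
        p = p_correct_2pl(theta, a, b)
        like *= p if x == 1 else 1.0 - p
    return like


# At theta == b the probability is 0.5, whatever the discrimination a.
print(p_correct_2pl(0.0, a=1.7, b=0.0))  # 0.5
```

A larger `a` makes the probability rise more steeply as `theta` passes `b`, which is what "discrimination" refers to.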
To estimate the scale score distributions for populations and groups of students, NAEP uses population-structure models. This requires that the populations and groups be identified. Groups in NAEP are identified through background variables that are available from the student sampling process or from background questionnaires administered to students, teachers, and school officials. Marginal maximum likelihood methods, which are also used to estimate the parameters of the IRT models, are used to estimate the parameters of the population-structure models. As a result, group score distributions are estimated directly, without the need for individual student scores.
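The sketch below illustrates how group distributions can be estimated by marginal maximum likelihood without ever computing a score for any student. It assumes a deliberately simple population-structure model in which proficiency in group g is normal with mean `mu_g` and a common standard deviation `sigma`, treats 2PL item parameters `a` and `b` as already estimated and fixed, and integrates over proficiency on a fixed grid of quadrature points with the EM algorithm. All names are hypothetical, and the models used in practice can condition on many background variables at once, which this sketch does not attempt.

```python
import numpy as np


def loglik_at_nodes(X, theta, a, b):
    """Log-likelihood of each 0/1 response row of X at each node in theta (2PL)."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))  # (K, J)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return X @ np.log(p).T + (1.0 - X) @ np.log1p(-p).T  # (N, K)


def mml_group_means(X, groups, a, b, n_iter=500, tol=1e-8):
    """EM estimates of mu_g (per group) and a common sigma for
    theta_i ~ N(mu_{g(i)}, sigma^2), holding item parameters a, b fixed.

    X: (N, J) array of 0/1 responses; groups: (N,) array of group labels.
    No point estimate of any individual theta_i is ever formed.
    """
    z = np.linspace(-6.0, 6.0, 121)  # fixed standardized quadrature grid
    log_prior = -0.5 * z**2          # standard normal log-weights, up to a constant
    labels = np.unique(groups)
    mu = {g: 0.0 for g in labels}
    sigma = 1.0
    for _ in range(n_iter):
        new_mu, sq_dev = {}, 0.0
        for g in labels:
            theta = mu[g] + sigma * z  # nodes on this group's current scale
            logpost = loglik_at_nodes(X[groups == g], theta, a, b) + log_prior
            logpost -= logpost.max(axis=1, keepdims=True)  # numerical stability
            post = np.exp(logpost)
            post /= post.sum(axis=1, keepdims=True)  # E-step: posterior over nodes
            m1 = post @ theta        # E[theta_i | x_i]
            m2 = post @ theta**2     # E[theta_i^2 | x_i]
            new_mu[g] = m1.mean()    # M-step for the group mean
            sq_dev += np.sum(m2 - 2.0 * new_mu[g] * m1 + new_mu[g] ** 2)
        new_sigma = np.sqrt(sq_dev / len(X))  # M-step for the common SD
        change = max(abs(new_mu[g] - mu[g]) for g in labels) + abs(new_sigma - sigma)
        mu, sigma = new_mu, new_sigma
        if change < tol:
            break
    return mu, sigma


# Hypothetical usage on simulated data with two groups, "A" and "B".
rng = np.random.default_rng(1)
N, J = 4000, 30
a = rng.uniform(0.8, 2.0, J)
b = rng.normal(0.0, 1.0, J)
groups = np.where(rng.uniform(size=N) < 0.5, "A", "B")
true_mu = {"A": -0.3, "B": 0.4}
theta = np.array([true_mu[g] for g in groups]) + rng.normal(0.0, 1.0, N)
P = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
X = (rng.uniform(size=(N, J)) < P).astype(float)
mu_hat, sigma_hat = mml_group_means(X, groups, a, b)
print(mu_hat, sigma_hat)
```

The group means and the common standard deviation come straight out of the marginal likelihood; the per-student posteriors are used only as intermediate weights inside the E-step.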
As a convenience for producing summary statistics, plausible values are calculated and attached to each student's record. These are used to calculate the summary statistics in NAEP reports and are available to data users for secondary analyses of NAEP data. Using plausible values appropriately in analyses of NAEP data ensures that the results reflect the population-structure models estimated for NAEP datasets.
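A minimal sketch of how plausible values might be drawn and then used follows. It assumes each student has a discrete posterior over quadrature nodes, such as one produced while fitting a population-structure model by quadrature, and draws `m` values per student from it. For analysis, a statistic is computed once with each set of plausible values and the results are combined, here with Rubin's multiple-imputation combining rules. The per-set sampling variances are taken as given; in practice they come from the survey's own variance estimation procedure, which this sketch does not implement. The function names are hypothetical.

```python
import numpy as np


def draw_plausible_values(post, nodes, m, rng):
    """Draw m plausible values per student from a discrete posterior.

    post: (N, K) posterior weights over quadrature nodes (rows sum to 1)
    nodes: (K,) proficiency values at the nodes
    Returns an (N, m) array of draws (inverse-CDF sampling).
    """
    cdf = np.cumsum(post, axis=1)
    u = rng.uniform(size=(post.shape[0], m))
    idx = np.array([np.searchsorted(cdf[i], u[i], side="right")
                    for i in range(post.shape[0])])
    idx = np.minimum(idx, len(nodes) - 1)  # guard against rounding in cdf[-1]
    return nodes[idx]


def combine_pv_estimates(estimates, sampling_vars):
    """Combine one statistic computed once per plausible-value set."""
    est = np.asarray(estimates, dtype=float)
    u = np.asarray(sampling_vars, dtype=float)
    m = len(est)
    point = est.mean()                 # average over plausible-value sets
    between = est.var(ddof=1)          # between-set (measurement) variance
    total_var = u.mean() + (1.0 + 1.0 / m) * between
    return point, total_var


# Hypothetical usage: mean proficiency computed once per plausible-value set.
rng = np.random.default_rng(0)
nodes = np.linspace(-3.0, 3.0, 61)
post = np.exp(-0.5 * ((nodes[None, :] - rng.normal(size=(200, 1))) / 0.3) ** 2)
post /= post.sum(axis=1, keepdims=True)
pvs = draw_plausible_values(post, nodes, m=5, rng=rng)
means = pvs.mean(axis=0)
sampling_vars = pvs.var(axis=0, ddof=1) / pvs.shape[0]  # placeholder: SRS variance
print(combine_pv_estimates(means, sampling_vars))
```

The statistic is computed separately for each plausible-value set rather than on a per-student average of the plausible values; averaging first would understate the spread of the group distribution.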
A detailed development of the methodology used in NAEP is given in Mislevy (1991). Along with theoretical justifications, that paper presents comparisons with standard procedures, discussions of biases that arise in some secondary analyses, and numerical examples. Mislevy, Beaton, Kaplan, and Sheehan (1992) also provide an introduction to the methodologies used in NAEP as well as a comparison with standard psychometric analyses.