To calculate the many statistics estimated for each NAEP sample and to provide data for secondary analysis, NAEP creates plausible values from the population and subgroup distributions formed by the measurement and population-structure models. Plausible values can be used in standard statistical equations for many statistics of interest, and they yield correct standard-error estimates for those statistics as long as the population-structure model includes every group for which statistics are calculated.
The combination of Item Response Theory (IRT) models and population-structure models provides an estimated distribution of underlying performance, θ, for the population and subgroups of interest. This distribution is
f(θ | x, y, α, Γ, Σ)
where x is the matrix of item responses, y is the matrix containing group membership information, α is the matrix of IRT parameters, and Γ and Σ are parameters from the population-structure models. The goal of NAEP is to summarize different characteristics of this distribution.
Any statistic, t, of interest can be calculated directly on the basis of this estimated distribution of underlying performance. However, to allow secondary analyses of NAEP data to be conducted with software available in most statistical packages, five plausible values are assigned to each student record. The plausible values or the average of the plausible values attached to student record r cannot be treated as a student's scale score.
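As an illustration of how a secondary analyst might use the five plausible values, the following minimal sketch computes a weighted mean once per plausible value and then averages the results. The function name `combine_plausible_values`, the array layout, and the sample data are hypothetical and not part of NAEP's documentation. The sketch reports only the variance component due to the scale scores being unobserved; the sampling-variance component of the standard error is deliberately omitted.

```python
import numpy as np


def combine_plausible_values(pv, weights):
    """Combine a weighted mean across plausible values (illustrative only).

    pv      : array of shape (n_students, M), one column per plausible value
    weights : array of shape (n_students,), hypothetical sampling weights
    """
    pv = np.asarray(pv, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = pv.shape[1]

    # Compute the statistic separately for each plausible-value column.
    per_pv = (weights[:, None] * pv).sum(axis=0) / weights.sum()

    # The reported estimate is the average of the M per-column statistics.
    estimate = per_pv.mean()

    # Variance among the M statistics, inflated by (1 + 1/M) as in
    # Rubin-style multiple-imputation combining.
    between = per_pv.var(ddof=1)
    imputation_variance = (1.0 + 1.0 / m) * between

    return estimate, imputation_variance, per_pv


if __name__ == "__main__":
    # Hypothetical data: two student records, five plausible values each.
    pv = [[1, 2, 3, 4, 5],
          [3, 4, 5, 6, 7]]
    weights = [1.0, 1.0]
    est, imp_var, per_pv = combine_plausible_values(pv, weights)
    print("per-plausible-value means:", per_pv)
    print("combined estimate:", est)
    print("imputation variance:", imp_var)
```

The statistic is computed on each column of plausible values and only then averaged. Averaging the five values within each student record first would treat that average as a scale score, which the text above warns against, and for statistics such as standard deviations or percentiles it generally gives a different answer.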
Plausible values can be thought of as a mechanism for accounting for the fact that the true scale scores describing the underlying performance for each student are unknown. Because the IRT models are latent variable models, the scale score θ_r of the student on record r is not observed, even for the students in a NAEP sample. To overcome this problem, we follow Rubin (1987) by treating the θ_r as "missing data" and approximating the value of a statistic t based on the θ_r's of all students by t's expected value given x and y, the data that actually were observed.
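The expected-value approximation described above can be written out as follows. This is a sketch in assumed notation: θ denotes the unobserved scale scores of all students, t* is a label introduced here for the approximated statistic, and θ^(m) denotes the m-th of M sets of plausible values (M = 5 in the text above), treated as random draws from the distribution of θ given x and y.

```latex
t^{*}(x, y)
  = E\bigl[\, t(\theta, y) \mid x, y \,\bigr]
  = \int t(\theta, y)\, p(\theta \mid x, y)\, d\theta
  \approx \frac{1}{M} \sum_{m=1}^{M} t\bigl(\theta^{(m)}, y\bigr)
```

The final step is the usual Monte Carlo approximation of an integral by an average over draws. It shows why the statistic is computed once per set of plausible values and then averaged.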