Because the goal of NAEP is to estimate the distributions of scale scores for groups of students, population-structure models are used along with Item Response Theory (IRT) models to appropriately estimate population and subgroup distributions. To understand population-structure models, consider the probability that a particular student has an underlying performance level of $\theta$. Given only the population model parameters ($\Gamma$ and $\Sigma$) and the group membership for the student, this probability is of the form:
$$P(\theta \mid y, \Gamma, \Sigma)$$
where $y$ is the vector specifying the group membership. A normal (Gaussian) form is assumed for this probability. The common variance-covariance matrix for this distribution is denoted by $\Sigma$; the mean is given by a linear model with slope parameters, $\Gamma$, based on the variables that define the groups of interest. The following linear model is fit to the data within each subject area:
$$\theta = \Gamma' y + \varepsilon$$
where $\varepsilon$ has a multivariate normal distribution with mean zero and variance-covariance matrix $\Sigma$. As in regression analysis, $\Gamma$ is a matrix, with each column containing the effects for one scale, and $\Sigma$ is the variance-covariance matrix of the residuals across scales.
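The structure of this linear model can be illustrated with a small simulation. The sketch below is only an illustration, not the NAEP estimation procedure: it writes theta for the performance levels, Gamma for the matrix of slope parameters (one column per scale), and Sigma for the residual variance-covariance matrix, and all sizes and parameter values are made up. It also treats theta as directly observed so that ordinary least squares can be applied, whereas in the operational setting the performance levels are latent and the population model is estimated together with the IRT model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: students, group-defining variables (incl. intercept), scales.
n_students, n_predictors, n_scales = 20000, 3, 2

# Hypothetical group-membership vectors y, one row per student:
# an intercept column plus two 0/1 group indicators.
Y = np.column_stack([
    np.ones(n_students),
    rng.integers(0, 2, size=(n_students, n_predictors - 1)),
])

# Made-up "true" parameters. Each column of Gamma holds the effects for one scale.
Gamma = np.array([[0.0, 0.2],
                  [0.5, 0.3],
                  [-0.4, 0.1]])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])

# theta = Gamma' y + eps for every student, with eps ~ N(0, Sigma).
eps = rng.multivariate_normal(np.zeros(n_scales), Sigma, size=n_students)
theta = Y @ Gamma + eps

# Least-squares estimate of Gamma (feasible only because theta is simulated
# as if it were observed) and the residual variance-covariance matrix.
Gamma_hat, *_ = np.linalg.lstsq(Y, theta, rcond=None)
residuals = theta - Y @ Gamma_hat
Sigma_hat = residuals.T @ residuals / (n_students - n_predictors)

print(np.round(Gamma_hat, 2))
print(np.round(Sigma_hat, 2))
```

With a sample this large, the printed estimates should lie close to the made-up Gamma and Sigma, which shows how the columns of Gamma carry per-scale effects while Sigma captures the correlation of residuals between scales.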
The group-defining variables used in NAEP population-structure models are derived from answers to student, teacher, and school questionnaires, demographic and background data, and other known student information. Vectors of student-based response data are formed into main effects and two-way and three-way interactions according to the specifications for a given assessment subject and year.
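As a rough illustration of how main effects and interactions can be formed from categorical responses, the following sketch dummy-codes three background variables and multiplies their columns to build two-way and three-way interaction contrasts. The variable names (`var_a`, `var_b`, `var_c`), the data, the helper functions, and the choice of dummy coding with a reference level are all assumptions made for this example; the actual contrasts follow the specifications for each assessment subject and year.

```python
import itertools
import numpy as np

def dummy_code(codes, n_levels):
    """Dummy-code integer categories 0..n_levels-1, using level 0 as the reference."""
    codes = np.asarray(codes)
    return (codes[:, None] == np.arange(1, n_levels)).astype(float)

def interaction(*blocks):
    """Column-wise products taking one column from each main-effect block."""
    columns = [
        np.prod(combo, axis=0)
        for combo in itertools.product(*(block.T for block in blocks))
    ]
    return np.column_stack(columns)

# Hypothetical responses for six students: (category codes, number of levels).
variables = {
    "var_a": (np.array([0, 1, 0, 1, 1, 0]), 2),
    "var_b": (np.array([0, 1, 2, 2, 0, 1]), 3),
    "var_c": (np.array([1, 1, 0, 0, 1, 0]), 2),
}

# Main effects: one block of dummy columns per variable.
main = {name: dummy_code(codes, k) for name, (codes, k) in variables.items()}

# Start with the main effects, then append two-way and three-way interactions.
blocks = list(main.values())
for order in (2, 3):
    for names in itertools.combinations(main, order):
        blocks.append(interaction(*(main[name] for name in names)))

design = np.column_stack(blocks)
print(design.shape)
```

Even with three small variables the number of contrast columns grows quickly once interactions are included, which is why several hundred contrasts can arise from a full set of questionnaire items.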
For the NAEP main assessments, several hundred background variable contrasts are formed. To avoid the instabilities in estimation that arise when a large number of correlated variables are used, principal components analysis is used to eliminate variables that have very little variance and those that are highly collinear with other variables. Principal components analysis is a multivariate data reduction technique that analyzes the covariance structure of a matrix. The resulting principal component scores, rather than the original variable contrasts, are used as the predictor variables in estimating the NAEP population-structure model. For computational stability and because of computational limitations, a large number, but not all, of the principal components from this transformation are used as the variables in estimating the population-structure models. The number of principal components is selected so that the components included in the population-structure models account for 90 percent of the overall variance of the group-defining variable contrasts.
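The selection rule described above can be sketched in a few lines. This is a minimal illustration, assuming an eigendecomposition of the sample covariance matrix of the centered contrasts; the function name `principal_components`, the simulated data, and the decision to center without rescaling are assumptions of this example and are not taken from the NAEP procedures.

```python
import numpy as np

def principal_components(X, variance_share=0.90):
    """Return scores on the leading components that together reach variance_share."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each contrast
    cov = np.cov(Xc, rowvar=False)               # covariance structure of the contrasts
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder from largest to smallest
    eigvals = np.clip(eigvals[order], 0.0, None) # guard against tiny negative values
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of components whose cumulative share reaches the target.
    k = min(int(np.searchsorted(cumulative, variance_share)) + 1, len(eigvals))
    scores = Xc @ eigvecs[:, :k]
    return scores, cumulative[:k]

# Simulated, highly collinear contrasts: 20 columns driven by 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
loadings = rng.normal(size=(5, 20))
X = latent @ loadings + 0.1 * rng.normal(size=(500, 20))

scores, explained = principal_components(X)
print(scores.shape[1], np.round(explained, 3))
```

Because the simulated columns are nearly linear combinations of a few factors, only a handful of components are retained, and the retained score columns are mutually uncorrelated, which is the property that stabilizes the subsequent regression.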
In analyses of data from the long-term trend assessments, the group-defining variables consist of main effects and interactions formed from the smaller set of background variables (rather than principal components of those variables) available in the long-term trend assessments.