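The probability that a student with an underlying performance level of θk on scale k gives response i to item j is Pji(θk), where Pji(θk) is of the form appropriate to the type of item (dichotomous or polytomous). After the response vector x is observed,
\[ L(\theta_k \mid x) = \prod_{j} \prod_{i} P_{ji}(\theta_k)^{u_{ji}} \]
can be viewed as a likelihood function, where the products run over the items j the student answered and over the mj response categories i of each item, mj is equal to 2 for dichotomous items or to the number of categories defined for polytomous items, and uji is an indicator variable defined by
\[ u_{ji} = \begin{cases} 1 & \text{if the response to item } j \text{ is in category } i \\ 0 & \text{otherwise.} \end{cases} \]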
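As an illustration only, the likelihood for one student can be sketched as a product, over the items answered and their response categories, of Pji(θk) raised to the indicator uji. The sketch assumes dichotomous items with a two-parameter logistic form for Pji; the function names `p_2pl` and `likelihood` and the item parameter values are hypothetical and are not taken from NAEP.

```python
import math

def p_2pl(theta, a, b):
    """Category probabilities [P(incorrect), P(correct)] for one item.

    Assumes a two-parameter logistic form; the form actually used for
    a given item type may differ.
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return [1.0 - p, p]

def likelihood(theta, responses, items):
    """Product over items j and categories i of P_ji(theta) ** u_ji."""
    value = 1.0
    for x_j, (a, b) in zip(responses, items):
        for i, p_ji in enumerate(p_2pl(theta, a, b)):
            u_ji = 1 if x_j == i else 0  # indicator: response falls in category i
            value *= p_ji ** u_ji
    return value

# Hypothetical (a, b) parameters for the three items this student answered.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]
responses = [1, 1, 0]  # observed category for each item

for theta in (-1.0, 0.0, 1.0):
    print(theta, likelihood(theta, responses, items))
```

Because each indicator is 1 for exactly one category per item, only the probability of the observed category contributes to the product.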
The likelihood function relates a student's responses to the parameters of the items the student answered and to the student's underlying performance level. The item parameters for NAEP assessments are estimated using marginal maximum likelihood methods. These are iterative procedures that begin by assuming an initial distribution of scale scores for the sample of students. Based on this initial distribution, interim estimates of the item parameters are calculated. These interim item parameter estimates are then used to calculate an updated interim distribution of scale scores, from which new interim estimates of the item parameters are calculated. The procedure is repeated until the numerical values of the item parameters and the scale score distribution converge on estimates that best fit the IRT model.
After they are estimated, the item parameters are treated as known in subsequent calculations. A likelihood function for a student's scale score θk is then induced by the vector of responses to the subset of calibrated items the student answered. This likelihood function makes it possible to draw inferences about scale score distributions from samples in which no student is administered all of the items.
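The iteration described above can be sketched as a small EM-style loop. This is a simplified illustration, not the NAEP BILOG/PARSCALE procedure: it assumes a one-parameter (Rasch) model for dichotomous items, represents the scale score distribution as weights on a fixed grid, runs a fixed number of iterations instead of testing convergence, and does not re-standardize the distribution. The function names, the simulated data, and the grid are all hypothetical.

```python
import math
import random

def p_correct(theta, b):
    """Rasch probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mml_em(data, grid, n_iter=30):
    """data[s][j] is 1, 0, or None (item j not administered to student s)."""
    n_items = len(data[0])
    b = [0.0] * n_items                          # starting item difficulties
    w = [math.exp(-0.5 * t * t) for t in grid]   # initial N(0, 1) shape on the grid
    norm = sum(w)
    w = [v / norm for v in w]
    for _ in range(n_iter):
        n = [[0.0] * len(grid) for _ in range(n_items)]  # expected examinees per grid point
        r = [[0.0] * len(grid) for _ in range(n_items)]  # expected correct per grid point
        new_w = [0.0] * len(grid)
        for resp in data:
            # Posterior over grid points given the current items and distribution.
            post = []
            for q, theta in enumerate(grid):
                like = w[q]
                for j, x in enumerate(resp):
                    if x is not None:
                        p = p_correct(theta, b[j])
                        like *= p if x == 1 else 1.0 - p
                post.append(like)
            total = sum(post)
            for q in range(len(grid)):
                pq = post[q] / total
                new_w[q] += pq
                for j, x in enumerate(resp):
                    if x is not None:
                        n[j][q] += pq
                        r[j][q] += pq * x
        # Interim item parameters: one Newton step per item difficulty.
        for j in range(n_items):
            grad = sum(n[j][q] * p_correct(t, b[j]) - r[j][q]
                       for q, t in enumerate(grid))
            info = sum(n[j][q] * p_correct(t, b[j]) * (1.0 - p_correct(t, b[j]))
                       for q, t in enumerate(grid))
            b[j] += grad / info
        # Interim scale score distribution: average posterior weight per grid point.
        w = [v / len(data) for v in new_w]
    return b, w

# Simulated data: each student skips one of three items, so no student
# is administered all of the items.
rng = random.Random(0)
true_b = [-1.0, 0.0, 1.0]
data = []
for s in range(600):
    theta = rng.gauss(0.0, 1.0)
    resp = [1 if rng.random() < p_correct(theta, bj) else 0 for bj in true_b]
    resp[s % 3] = None
    data.append(resp)

grid = [-4.0 + 0.5 * k for k in range(17)]
b_hat, weights = mml_em(data, grid)
print(b_hat)
```

Each pass alternates between updating the item difficulties and updating the distribution weights, mirroring the alternation between interim item parameters and interim scale score distributions.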
For the purposes of estimation and of reporting item parameter estimates and other intermediary estimates, the linear indeterminacies of the Item Response Theory (IRT) models are resolved by an arbitrary choice of the origin and unit size in a given scale. In most cases, a provisional scale is employed that standardizes the scale score distribution to have mean 0 and standard deviation 1. The item parameters for NAEP are therefore reported on a provisional (0, 1) scale. Final results for each subject area are linearly transformed to the appropriate scale.
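The linear indeterminacy can be illustrated with a short sketch: if scale scores are transformed linearly and the item parameters are transformed to match, the response probabilities do not change. The example assumes a two-parameter logistic item; the constants `SLOPE` and `INTERCEPT` are hypothetical and are not the values NAEP uses for any subject area.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL form."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def to_reporting_scale(theta, slope, intercept):
    """Linear transformation of a provisional scale score."""
    return slope * theta + intercept

def transform_item(a, b, slope, intercept):
    """Matching transformation of item parameters: a / slope, slope * b + intercept."""
    return a / slope, slope * b + intercept

SLOPE, INTERCEPT = 35.0, 150.0   # hypothetical transformation constants
theta, a, b = 0.4, 1.1, -0.2     # hypothetical values on the provisional scale

theta_star = to_reporting_scale(theta, SLOPE, INTERCEPT)
a_star, b_star = transform_item(a, b, SLOPE, INTERCEPT)

# The probability is the same on both scales, which is why the choice of
# origin and unit size is arbitrary.
print(p_2pl(theta, a, b), p_2pl(theta_star, a_star, b_star))
```

Since a(θ − b) is unchanged by the paired transformations, any origin and unit size fit the data equally well, and a convention such as mean 0 and standard deviation 1 is needed to fix the scale.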
In NAEP analyses, for subject areas with multiple scales (i.e., national main geography, mathematics, reading, science, U.S. history, and music), the parameters of the items constituting each of the separate subscales are estimated independently of the parameters of the other subscales.
Estimates of item parameters were obtained using a NAEP BILOG/PARSCALE program, which combines Mislevy and Bock's (1982) BILOG and Muraki and Bock's (1991) PARSCALE computer programs, and which concurrently estimates parameters for all items (dichotomous and polytomous). The NAEP BILOG/PARSCALE program has also been adapted to make use of student sampling weights.