The local independence assumption implies that item response probabilities depend only on θ and the specified item parameters, and not on the position of the item in the booklet, the content of the surrounding items, or the test-administration and timing conditions. However, position, context, and administration effects are certainly present in any application. The practical question is whether inferences about aggregate performance in the scaling area that are based on the Item Response Theory (IRT) probabilities are robust to departures from the ideal assumptions underlying the IRT model.
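The assumption can be made concrete with a minimal sketch. The text above does not specify the functional form of the item response model, so the three-parameter logistic (3PL) form, the scaling constant D = 1.7, the function names, and the item parameter values below are all illustrative assumptions rather than NAEP's operational specification. The sketch shows that, under the model, the probability of a response pattern is a product of per-item probabilities that depend only on θ and the item parameters, so reordering the items leaves the result unchanged.

```python
import math

def p_correct(theta, a, b, c, D=1.7):
    """Probability of a correct response under an assumed 3PL model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    Nothing here depends on item position, neighbors, or timing.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def pattern_likelihood(theta, items, responses):
    """Joint probability of a 0/1 response pattern given theta.

    Local independence: the joint probability is the product of the
    per-item probabilities.
    """
    likelihood = 1.0
    for (a, b, c), x in zip(items, responses):
        p = p_correct(theta, a, b, c)
        likelihood *= p if x == 1 else (1.0 - p)
    return likelihood

# Hypothetical item parameters (a, b, c), chosen only for illustration.
items = [(1.0, 0.0, 0.2), (0.8, 0.5, 0.25), (1.2, -0.5, 0.0)]

# Same items and responses, presented in two different orders.
forward = pattern_likelihood(0.3, items, [1, 0, 1])
reordered = pattern_likelihood(0.3, [items[2], items[0], items[1]], [1, 1, 0])

# Under the model, booklet position has no effect on the likelihood.
print(math.isclose(forward, reordered))
```

Real data depart from this idealization, which is why the robustness question matters: an item moved to a different position may behave as though its parameters had changed.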
The 1986 NAEP reading anomaly (Beaton and Zwick 1990) has shown that for measuring small changes over time, changes in item context and speededness conditions can lead to unacceptably large random error components. Such errors can be avoided by presenting the items used to measure change in identical test forms, with identical timings and administration conditions. In NAEP, items are grouped into blocks whose items always appear together. Thus, NAEP does not maintain that the item parameter estimates obtained in any particular booklet configuration are appropriate for other conceivable configurations. Rather, NAEP assumes that the parameter estimates are context-bound.
For this reason, only a limited number of blocks of items are released and replaced after each assessment cycle. It is also the reason NAEP prefers common population equating to common item equating whenever equivalent random samples are available for linking. In common item equating, items are assumed to be measuring exactly the same thing for two or more populations, despite any differences in context or administration. In common population equating, results for two or more samples from the same population are matched to one another when linking the scales.
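One simple way to picture common population equating is as a linear transformation that matches the score distributions of two equivalent samples. The sketch below assumes that matching the mean and standard deviation is sufficient. The text does not state this, and the function name and toy values are hypothetical, so it should be read as an illustration of the idea rather than as NAEP's linking procedure.

```python
import statistics

def common_population_link(new_scores, reference_scores):
    """Return (slope, intercept) mapping the new scale onto the reference scale.

    Assumes both samples are equivalent random samples from the same
    population, so their score distributions should agree after linking.
    Only the first two moments (mean and SD) are matched here.
    """
    slope = statistics.pstdev(reference_scores) / statistics.pstdev(new_scores)
    intercept = statistics.mean(reference_scores) - slope * statistics.mean(new_scores)
    return slope, intercept

# Toy values, invented for illustration only.
reference = [240.0, 250.0, 260.0, 270.0, 280.0]   # sample on the reporting scale
provisional = [-1.0, -0.5, 0.0, 0.5, 1.0]          # sample on a provisional scale

slope, intercept = common_population_link(provisional, reference)
linked = [slope * x + intercept for x in provisional]

# After linking, the provisional sample has the reference mean and SD.
print(round(statistics.mean(linked), 6), round(statistics.pstdev(linked), 6))
```

No item is required to behave identically across the two samples in this approach. The link rests on the equivalence of the populations, which is why it suits a setting in which item parameter estimates are treated as context-bound.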