Decisions about the use of items within the Item Response Theory (IRT) scales would be simple if a statistic existed that could appropriately test the fit of each item's data to the scale. Item fit statistics, such as those provided by PARSCALE (Muraki and Bock 1997), measure how closely the observed student item responses match the responses the IRT models predict. However, the sampling distributions of most available statistics of this type are unknown, so they cannot be used to make final decisions about the fit of the items to the IRT model. In practice, item fit statistics are used to identify items that need further examination. Because no formal statistical test of IRT model fit is available, the fit of the IRT models to the observed data is examined visually within each scale: the empirical item response functions (IRFs) are plotted against the theoretical curves, and the two are compared.
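The comparison of empirical and theoretical IRFs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by PARSCALE or in this analysis: it assumes a three-parameter logistic (3PL) model with scaling constant D = 1.7, groups examinees into equal-count ability bins, and runs on simulated data. The function names (`theoretical_irf`, `empirical_irf`, `compare_irfs`) and all parameter values are hypothetical.

```python
import math
import random

D = 1.7  # logistic scaling constant (assumed; some calibrations use D = 1.0)


def theoretical_irf(theta, a, b, c=0.0):
    """Probability of a correct response under a 3PL model (assumed here)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def empirical_irf(theta_hat, responses, n_bins=10):
    """Group examinees into equal-count ability bins (a hypothetical choice)
    and return (mean ability, proportion correct, group size) for each bin."""
    pairs = sorted(zip(theta_hat, responses), key=lambda p: p[0])
    n = len(pairs)
    points = []
    for k in range(n_bins):
        group = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        if not group:
            continue  # more bins than examinees
        mean_theta = sum(t for t, _ in group) / len(group)
        p_obs = sum(1 for _, y in group if y) / len(group)
        points.append((mean_theta, p_obs, len(group)))
    return points


def compare_irfs(theta_hat, responses, a, b, c=0.0, n_bins=10):
    """Pair each empirical point with the model prediction at the same ability."""
    rows = []
    for mean_theta, p_obs, size in empirical_irf(theta_hat, responses, n_bins):
        p_model = theoretical_irf(mean_theta, a, b, c)
        rows.append((mean_theta, p_obs, p_model, p_obs - p_model, size))
    return rows


if __name__ == "__main__":
    # Simulated data only; the item parameters are illustrative.
    rng = random.Random(0)
    a, b, c = 1.0, 0.0, 0.2
    theta = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    y = [int(rng.random() < theoretical_irf(t, a, b, c)) for t in theta]
    print("theta   observed  model   residual  n")
    for t, po, pm, r, n in compare_irfs(theta, y, a, b, c):
        print(f"{t:6.2f}  {po:8.3f}  {pm:6.3f}  {r:8.3f}  {n}")
```

Plotting the observed proportions against the model curve gives the kind of display used for the visual check. Residuals of the same sign in several adjacent bins would suggest an item that merits closer examination.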