NCES: NAEP-TIMSS Linkage: Chapter 11: Conclusions

11. CONCLUSIONS

This report has described an attempt to link the results of the 1996 NAEP mathematics and science assessments with those of TIMSS. The purpose of the link was to predict TIMSS results for states and jurisdictions from their State NAEP results.

Because they were the only data available, the link between NAEP and TIMSS was established using data from the U.S. national administrations of both assessments. The two assessments differed, to varying degrees, in their specifications, in the number and kinds of tasks presented, and in their administration conditions, and the linking data came from assessments conducted 1 year apart. For these reasons, the type of link established was statistical moderation. That is, the link presented in this report uses the formal procedures of linear equating, but there is no claim that the linked results are equated in any sense of the word. Rather, these results are, at best, applicable only to the purpose to which they have been put in this report: the comparison of state-level predicted TIMSS results with actual country-level results from TIMSS.
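A linear moderation link of the kind described above can be sketched as follows. The sketch assumes the link matches the mean and standard deviation of the NAEP and TIMSS distributions in a common U.S. linking sample, which is the usual form of linear moderation; the class name `LinearLink` and all numbers are hypothetical and are not values from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearLink:
    """Hypothetical linear moderation link from the NAEP scale to the TIMSS scale.

    Assumes the link matches the mean and standard deviation of the two score
    distributions in a common linking population (e.g. U.S. public school
    students who took each assessment).
    """

    slope: float      # sd_timss / sd_naep
    intercept: float  # mean_timss - slope * mean_naep

    @classmethod
    def from_moments(cls, mean_naep: float, sd_naep: float,
                     mean_timss: float, sd_timss: float) -> "LinearLink":
        if sd_naep <= 0 or sd_timss <= 0:
            raise ValueError("standard deviations must be positive")
        slope = sd_timss / sd_naep
        return cls(slope=slope, intercept=mean_timss - slope * mean_naep)

    def predict_mean(self, naep_mean: float) -> float:
        """Map a state NAEP mean onto the TIMSS scale."""
        return self.intercept + self.slope * naep_mean

    def predict_sd(self, naep_sd: float) -> float:
        """Rescale a state NAEP standard deviation (the intercept drops out)."""
        return self.slope * naep_sd


# Hypothetical national moments, not taken from this report.
link = LinearLink.from_moments(mean_naep=150.0, sd_naep=40.0,
                               mean_timss=500.0, sd_timss=100.0)
print(link.predict_mean(160.0))  # 525.0
```

Because the mapping is linear, a state at the national NAEP mean maps to the national TIMSS mean, and differences between states are stretched by the same slope. This is why the link expresses state results in terms of the U.S. TIMSS distribution rather than equating the two scales.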

Statistical moderation is the weakest form of linking. Unlike the equating of assessments that were built to support equating, statistical moderation can produce markedly different numerical links when carried out with different samples of students. As Mislevy (1992) observed, "We would have little confidence in a comparison of, say, subgroup means across 'moderated' test scores unless it held up under a broad range of choices for linking samples."

For this reason, the linking was evaluated for a variety of demographic subgroups. Although the predicted values from the various subgroup-based linkings were not significantly different from each other, the differences were large enough to suggest caution in applying the linking functions to subpopulations. Accordingly, the linking ultimately used to predict state-level TIMSS results was the one derived for public school students, because that is the population for which State NAEP results have been published.
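To illustrate why different linking samples can yield different predictions, the sketch below derives a separate linear link from each of two subgroups' moments and applies both to the same state NAEP mean. The subgroup labels and moments are hypothetical; only the mean-and-standard-deviation form of the link follows the moderation approach described in this chapter.

```python
from typing import Dict, Tuple

# (mean_naep, sd_naep, mean_timss, sd_timss) for each hypothetical linking sample.
Moments = Tuple[float, float, float, float]


def linear_link(mean_x: float, sd_x: float,
                mean_y: float, sd_y: float) -> Tuple[float, float]:
    """Return (slope, intercept) that matches the mean and SD of x to those of y."""
    slope = sd_y / sd_x
    return slope, mean_y - slope * mean_x


def predictions_by_subgroup(moments: Dict[str, Moments],
                            naep_value: float) -> Dict[str, float]:
    """Predict a TIMSS-scale value for one NAEP value under each subgroup's link."""
    preds = {}
    for group, (mx, sx, my, sy) in moments.items():
        slope, intercept = linear_link(mx, sx, my, sy)
        preds[group] = intercept + slope * naep_value
    return preds


# Hypothetical moments; the two samples imply different slopes and intercepts.
moments = {
    "subgroup_a": (150.0, 40.0, 500.0, 100.0),
    "subgroup_b": (155.0, 36.0, 510.0, 99.0),
}
preds = predictions_by_subgroup(moments, naep_value=160.0)
spread = max(preds.values()) - min(preds.values())
```

Whether a spread of this kind matters depends on the standard errors of the predictions; the subgroup differences found in this report were not statistically significant but were large enough to call for caution.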

Because the linking functions are based on fallible data, a major portion of this report was devoted to estimating the variability of the link attributable to various sources. In addition to the naive variance, which assumes the linking function is known exactly, variance components attributable to four sources were estimated: sampling, measurement error, model misspecification, and temporal shift. Of these, by far the most important was the component due to sampling, which accounted for around 80 percent of the total variance. Including these additional components produced a variance estimate roughly four to six times larger than the naive variance estimate.
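The arithmetic behind this comparison can be sketched as below, assuming the components are independent so that their variances simply add to the naive variance. All values are hypothetical and are not the report's estimates. Because standard errors are square roots of variances, a fourfold to sixfold increase in variance corresponds to standard errors about 2 to 2.45 times larger than the naive ones.

```python
import math
from typing import Dict, Mapping, Tuple


def combine_variance(naive: float,
                     components: Mapping[str, float]
                     ) -> Tuple[float, Dict[str, float], float, float]:
    """Combine a naive variance with additional error components.

    Assumes the components are independent, so variances simply add.
    Returns (total, shares, inflation, se_ratio), where inflation is
    total / naive and se_ratio is the matching ratio of standard errors.
    """
    if naive <= 0:
        raise ValueError("naive variance must be positive")
    if any(v < 0 for v in components.values()):
        raise ValueError("variance components must be non-negative")
    total = naive + sum(components.values())
    shares = {"naive": naive / total}
    shares.update({name: v / total for name, v in components.items()})
    inflation = total / naive
    return total, shares, inflation, math.sqrt(inflation)


# Hypothetical component values, chosen only to illustrate the arithmetic.
total, shares, inflation, se_ratio = combine_variance(
    naive=4.0,
    components={"sampling": 10.0, "measurement": 1.0,
                "misspecification": 0.6, "temporal": 0.4},
)
```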

Evaluating the quality of the link was hampered by the fact that no student took both assessments. Consequently, it was impossible to estimate the correlation between scores on the two assessments. Because the linkage results would be highly suspect if the two assessments were not strongly related, evidence was sought about the likely degree of relationship between NAEP and TIMSS proficiency estimates. One type of evidence was a set of content comparison analyses conducted by McLaughlin and others. The aim of these analyses was to determine how similar the NAEP and TIMSS instruments were in content coverage, item types, and difficulty. The more similar the two instruments, the more likely it is that the two assessments measure roughly the same construct. Appendix A summarizes the findings of the content analysis, which notes the important differences between the instruments but also judges that the assessments are similar enough to warrant linkage for global comparisons.

The only direct validation of the link between the 1996 NAEP and the 1995 TIMSS came from Minnesota, which participated in both the 1996 State NAEP and the 1995 state-level TIMSS. The agreement between Minnesota's actual and predicted TIMSS results supports using the linkage to predict public school, state-level TIMSS results at grade 8. Further validation comes from Missouri and Oregon, which participated in the 1996 State NAEP and also conducted a special TIMSS assessment in their states in 1997. Although the results of these assessments have not yet been publicly released, and although the data come from a 1997 rather than a 1995 administration of TIMSS, the TIMSS results predicted for these states with the 1995 TIMSS/1996 NAEP linking function were consistent (within acceptable statistical bounds) with their actual TIMSS results.
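A check of whether predicted and actual results agree "within acceptable statistical bounds" might look like the following. The exact criterion is not stated here, so this sketch assumes a two-sided z test at the 5 percent level, independent samples for the predicted and actual estimates, and a predicted standard error that already includes the linking error. The function name `consistency_check` and the scores are hypothetical.

```python
import math
from typing import Tuple


def consistency_check(predicted: float, se_predicted: float,
                      actual: float, se_actual: float,
                      critical: float = 1.96) -> Tuple[float, bool]:
    """Two-sided z check of a predicted mean against an observed mean.

    Assumes the two estimates are independent, so their variances add,
    and that se_predicted already includes the error in the linking function.
    Returns (z, consistent), where consistent means |z| <= critical.
    """
    se_diff = math.sqrt(se_predicted ** 2 + se_actual ** 2)
    if se_diff == 0:
        raise ValueError("at least one standard error must be positive")
    z = (predicted - actual) / se_diff
    return z, abs(z) <= critical


# Hypothetical state results on the TIMSS scale.
z, ok = consistency_check(predicted=530.0, se_predicted=6.0,
                          actual=520.0, se_actual=8.0)
```

Using only the naive standard error for `se_predicted` would understate the uncertainty of the prediction and make the check stricter than warranted, which is one reason the additional variance components matter for validation.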

These results support using the link for purposes such as approximate comparisons of the rankings of individual states with those of other countries, but the link is likely not adequate for extensive analyses based on point estimates of scores. And, of course, there is no guarantee that a validation conducted in other states would have produced similar results. The reader is reminded that the moderation type of linking required for the available NAEP and TIMSS data is the weakest in terms of the strength, stability, and generalizability of the linkage it produces. As discussed in the report, there have been a number of cases in which such a linking has been judged adequate only for rough comparisons. In fact, a similar validation analysis of an equivalent linking based on fourth-grade data has proved more problematic than the eighth-grade link and is still under review by NCES.

Because the link was formed in the direction of predicting TIMSS from NAEP, it avoids issues such as whether a linking function applies across diverse languages and educational systems, issues that would be of paramount importance if the link predicted NAEP from TIMSS. The links presented in this report express U.S. State NAEP results in terms of the U.S. TIMSS distribution. Consequently, the predicted TIMSS results for U.S. states are comparable to the actual TIMSS results for the TIMSS countries on largely the same footing as the actual U.S. TIMSS results are comparable to those of other countries.

The link assumes that NAEP results are comparable across states and that the relationship between NAEP and TIMSS is the same within each state as it is in the nation as a whole. The validation of the link based on Minnesota data lends credence to these assumptions.

Of course, one will never know whether the link would hold equally well in all states. Nor is there any guarantee that the link established in this report would hold in subsequent years. Nevertheless, this linkage should be quite serviceable for its stated purpose: comparing the eighth-grade, public school performance of states on the 1996 NAEP with the performance of countries on TIMSS.

For a more detailed set of comparisons between states and countries, see Johnson and Siegendorf (1998).