This report has described an attempt to link together the results
from 1996 NAEP mathematics and science with the results of TIMSS.
The purpose of the link was to predict TIMSS results for states
and jurisdictions based on their State NAEP results.
Because they were the only data available, the link between NAEP
and TIMSS was established using the data from the U.S. national
administrations of both assessments. Since the two assessments
differed in varying degrees in terms of the assessment specifications,
numbers and kinds of tasks presented, and administration conditions,
and since the linking data are based on assessments conducted
1 year apart, the type of link that was established was statistical
moderation. The link presented in this report uses formal linear
equating procedures, but no claim is made that the linked results
are equated in any sense of the word. Rather, these
results are, at best, applicable only to the purpose to which
they have been put in this report: the comparison of state-level
predicted TIMSS results with actual country-level results from
TIMSS.
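The linear procedure referred to above can be illustrated with a minimal sketch. It assumes the common mean-and-standard-deviation form of linear linking, in which a NAEP score is mapped to the TIMSS scale so that the two national distributions have the same mean and standard deviation. The function names and the score values are hypothetical, and the sketch ignores the sampling weights and plausible values that an operational analysis would require.

```python
from statistics import mean, pstdev

def linear_link(naep_scores, timss_scores):
    """Return (slope, intercept) of a linear link that matches the
    mean and standard deviation of the two score distributions."""
    slope = pstdev(timss_scores) / pstdev(naep_scores)
    intercept = mean(timss_scores) - slope * mean(naep_scores)
    return slope, intercept

def predict_timss(naep_value, slope, intercept):
    """Express a NAEP result (e.g., a state mean) on the TIMSS scale."""
    return intercept + slope * naep_value

# Hypothetical national samples, for illustration only.
naep_national = [250.0, 270.0, 290.0]
timss_national = [450.0, 500.0, 550.0]

slope, intercept = linear_link(naep_national, timss_national)
state_prediction = predict_timss(278.0, slope, intercept)
```

Because the slope and intercept depend only on the means and standard deviations of the linking samples, a different choice of samples can shift the function, which is one reason a moderated link is treated with caution.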
Statistical moderation is the weakest form of linking. Unlike
equatings of assessments built to support equating, the procedures
of statistical moderation can produce markedly different numerical
links if carried out with different samples of students. As observed
by Mislevy (1992), "We would have little confidence in a comparison
of, say, subgroup means across 'moderated' test scores unless
it held up under a broad range of choices for linking samples."
For this reason, the linking was evaluated for a variety of demographic
subgroups. While the predicted values from the various subgroup-based
linkings were not significantly different from each other, there
was still enough difference to suggest that caution be used in
applying the linking functions to subpopulations. Indeed, the
ultimate linking used for predicting state-level TIMSS results
was that derived for public school students, since that was the
population for which NAEP state-level results have been published.
Since the linking functions are based on fallible data, a major
portion of this report was devoted to developing estimates of
the variability of the link attributable to various sources. In
addition to the naive variance, which assumes the linking function
is known exactly, variance components were estimated for the following
sources: sampling, measurement error, model misspecification,
and temporal shift. Of these, by far the most important variance
component was that due to sampling, accounting for around 80 percent
of the total variance. Including these other variance components
produced a variance estimate roughly four to six times larger
than the naive variance estimate.
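The bookkeeping behind such a comparison can be sketched as follows. The sketch assumes that the components are independent and therefore additive, which is a simplification adopted here for illustration. The numerical values are placeholders and are not the estimates obtained in this report; the function name is likewise hypothetical.

```python
def decompose_link_variance(naive, extra):
    """Combine the naive variance with additional components,
    assuming (for illustration) that the components simply add."""
    total = naive + sum(extra.values())
    return {
        "total": total,
        # How many times larger the total is than the naive variance.
        "inflation": total / naive,
        # Each additional component as a fraction of the total.
        "shares": {name: value / total for name, value in extra.items()},
    }

# Placeholder values, NOT taken from the report.
extra_components = {
    "sampling": 6.0,
    "measurement_error": 1.0,
    "model_misspecification": 0.5,
    "temporal_shift": 0.5,
}
result = decompose_link_variance(2.0, extra_components)
```

With these placeholder inputs, the inflation factor shows how much a standard error based on the naive variance alone would understate the uncertainty of a predicted TIMSS result.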
Evaluating the goodness of the link was hampered by the fact that
no student was administered both assessments. Consequently, it
is impossible to assess the degree of correlation between scores
on the two assessments. Since the linkage results would be highly
suspect if the two assessments were not strongly related, evidence
was sought about the potential degree of relationship between
NAEP and TIMSS proficiency estimates. One type of evidence was
a set of content comparison analyses conducted by McLaughlin and
others. The aim of these analyses was to determine the similarity
in content coverage, item types, and difficulty of the NAEP and
TIMSS instruments. The greater the similarity between the two
instruments, the more likely that the two assessments are measuring
roughly the same construct. A summary of the findings of the content
analysis is included in Appendix A, which notes the important
differences between the instruments but which also judges that
the assessments are similar enough to warrant linkage for global
comparisons.
The only direct validation of the link of the 1996 NAEP to the
1995 TIMSS came from data from Minnesota, which participated in
the 1996 State NAEP and the 1995 state-level TIMSS. The agreement
between the actual TIMSS results and the predicted TIMSS results
provides support for the use of the linkage to predict public
school, state-level TIMSS results at grade 8. Further validation
of the link comes from the data from Missouri and Oregon. These
states, which participated in the 1996 State NAEP, also participated
in a special administration of TIMSS in their states in 1997.
While the results of these assessments have not yet been publicly
released, and while the data come from a 1997 rather than a 1995
administration of TIMSS, the predicted TIMSS results for these
states using the 1995 TIMSS/1996 NAEP linking function were consistent
(within acceptable statistical bounds) with their actual TIMSS
results.
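One simple way to check consistency of this kind is sketched below. The sketch assumes that the errors of the predicted and actual results are independent and approximately normal, and it uses a critical value of 1.96; these are assumptions made for illustration, not the criterion documented in this report. The function name and all numbers are hypothetical.

```python
import math

def is_consistent(predicted, se_predicted, actual, se_actual, z_crit=1.96):
    """Compare a predicted and an actual mean, treating their errors
    as independent. Returns (within_bounds, z_statistic)."""
    se_diff = math.sqrt(se_predicted ** 2 + se_actual ** 2)
    z = (actual - predicted) / se_diff
    return abs(z) <= z_crit, z

# Hypothetical state results, for illustration only.
within_bounds, z = is_consistent(predicted=500.0, se_predicted=6.0,
                                 actual=510.0, se_actual=8.0)
```

The standard error of the prediction used in such a check should reflect all sources of linking variance, not only the naive variance; otherwise the bounds will be too narrow.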
The agreement observed for these states adds support to the utility
of the link for purposes such as approximate comparisons of the
relative rankings of individual states versus other countries, but
the link is likely not adequate for extensive analyses based on the
point estimates of scores. And,
of course, there is no guarantee that a validation conducted in
other states would always have produced similar results. The reader
is reminded that the moderation type of linking required for the
available NAEP and TIMSS data is the weakest in terms of the strength
and stability of the linkage produced, and in terms of the generalizability
of the linkage. As discussed in the report, there have been a
number of examples where such a linking has been judged adequate
only for rough comparisons. In fact, a similar validation
analysis conducted on an equivalent linking based on fourth grade
data has proven more problematic than the eighth grade link and
is still undergoing review by NCES.
The fact that the link was formed in the direction of predicting
TIMSS from NAEP removes from the link issues such as the applicability
of a linkage function across diverse languages and educational
systems, issues that would be of paramount importance if the linkage
were in the direction of predicting NAEP from TIMSS. The links
presented in this report express U.S. State NAEP results in terms
of the U.S. TIMSS distribution. Consequently, the comparability
of the predicted TIMSS results for U.S. states to the actual TIMSS
results for the TIMSS countries is largely on the same footing
as the comparability of the actual U.S. TIMSS results to the actual
TIMSS results for other countries.
The link assumes comparability of NAEP across states and assumes
that the relationship between NAEP and TIMSS is the same within
the states as it is in the country as a whole. The validation
of the link based on Minnesota data lends credence to this assumption.
Of course, one will never know if the link would hold equivalently
in all states. Also, there is no guarantee that the link established
in this report would hold in subsequent years. Nevertheless, this
linkage should be quite serviceable for its stated purpose of
comparing state-level, public school performance from the 1996
NAEP at the eighth grade with country-level results from TIMSS.
For a more detailed set of comparisons between states and countries,
see Johnson and Siegendorf (1998).