4. ESTABLISHING THE LINK
As was mentioned earlier, the link between TIMSS and NAEP is based
on applying formal equating procedures to match up characteristics
of the score distribution of the 1996 NAEP with the characteristics
of the score distribution of the 1995 administration of TIMSS
in the United States. The simplest link is linear linking, where
the NAEP distribution is adjusted so that the mean and standard
deviation of the adjusted NAEP proficiencies for the 1996 U.S.
population match the mean and standard deviation of the 1995 U.S.
TIMSS population.
Linear linking assumes that the two distributions have the same
characteristics apart from their means and standard deviations.
In particular, if linear linking is valid, then after adjustment
of the means and standard deviations, the percentiles of the two
distributions will be similar. If this assumption is not true,
such as when one distribution is more skewed than the other, linear
linking may not provide an adequate linking between the two populations.
However, comparisons of the distributions of NAEP and TIMSS show
that the two distributions have a similar shape for both mathematics
and science at grade 8. The panels in Figure 3 show comparisons
of the NAEP and TIMSS distributions for grade 8 mathematics and
grade 8 science based on a graphical technique called suspended
rootograms (Wainer 1974). The TIMSS scale for a given subject
was divided into 25-point intervals, and the percentage of students
in each interval was estimated. The matching NAEP scale for that
subject was transformed to have the same mean and standard deviation
as the TIMSS scale, and the percentage of students with transformed
NAEP plausible values within each of the 25-point intervals was
estimated. Following Tukey (1977), the square roots of these two
percentages were compared.^{1}
The height of each unshaded bar in each panel of Figure 3 corresponds
to the square root of the percentage of students in
the TIMSS sample in each 25-point interval.
^{1} The square root transformation allows for more effective comparisons
of percentages when the percentage is expected to vary over the
range of intervals.
Figure 3.—Rootograms comparing proficiency distributions for 1995
TIMSS and 1996 NAEP
(NAEP distributions adjusted to have same mean and standard deviation
as TIMSS)
The shaded bars show the difference in root percentages between
the TIMSS and the transformed NAEP distributions. Positive differences
indicate intervals where the percentages from the transformed
NAEP are lower than those from the TIMSS, while negative differences
indicate the reverse. For both subjects, the differences in root percentages
are small, suggesting that the shapes of the NAEP and TIMSS distributions
are similar enough to warrant a linear linking.
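The rootogram comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for Figure 3: the function names, the use of NumPy, and the treatment of each plausible value as one weighted observation are assumptions made for the example.

```python
import numpy as np

def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (sampling weights as given)."""
    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    return mean, np.sqrt(var)

def root_percentages(values, weights, edges):
    """Square root of the weighted percentage of students in each interval."""
    counts, _ = np.histogram(values, bins=edges, weights=weights)
    return np.sqrt(100.0 * counts / np.sum(weights))

def suspended_rootogram(timss, w_timss, naep, w_naep, width=25.0):
    """Return interval edges, TIMSS root percentages (unshaded bars), and
    TIMSS-minus-NAEP differences in root percentages (shaded bars)."""
    m_t, s_t = weighted_mean_sd(timss, w_timss)
    m_n, s_n = weighted_mean_sd(naep, w_naep)
    # Adjust NAEP to have the TIMSS mean and standard deviation.
    naep_adj = m_t + (s_t / s_n) * (naep - m_n)
    # Intervals of fixed width covering both distributions.
    lo = width * np.floor(min(timss.min(), naep_adj.min()) / width)
    hi = width * np.ceil(max(timss.max(), naep_adj.max()) / width)
    n_bins = int(round((hi - lo) / width))
    edges = lo + width * np.arange(n_bins + 1)
    root_t = root_percentages(timss, w_timss, edges)
    root_n = root_percentages(naep_adj, w_naep, edges)
    # Positive difference: transformed NAEP percentage is lower than TIMSS.
    return edges, root_t, root_t - root_n
```

Differences near zero in every interval correspond to the case in which the two distributions have the same shape after adjustment.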
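The linking of TIMSS to NAEP can be expressed by the following
equation:
$$\hat{y}(\hat{A}, \hat{B}; x) = \hat{A} + \hat{B}\,x, \qquad \hat{B} = \frac{\hat{\sigma}_{T}}{\hat{\sigma}_{N}}, \qquad \hat{A} = \hat{\mu}_{T} - \hat{B}\,\hat{\mu}_{N} \tag{1}$$
where x is a value on the NAEP scale, $\hat{y}$ is the transformed value of x onto the TIMSS scale, and
where $\hat{\mu}_{N}$ and $\hat{\sigma}_{N}$ are, respectively, the mean and standard deviation of the NAEP
U.S. sample and $\hat{\mu}_{T}$ and $\hat{\sigma}_{T}$ are the mean and standard deviation of the matching TIMSS U.S.
sample. The functional notation is meant to stress that $\hat{y}$ is a function of $\hat{A}$ and $\hat{B}$, derived from the U.S. samples, as well as of x, determined from some other sample, such as from data from some
state that participated in State NAEP.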
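As a minimal sketch of the linear linking described in this section, the following Python fragment estimates a slope and an intercept from two weighted samples so that the transformed NAEP values have the TIMSS mean and standard deviation. The function names and the use of NumPy are assumptions for illustration only and are not taken from the study.

```python
import numpy as np

def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (sampling weights as given)."""
    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    return mean, np.sqrt(var)

def linking_parameters(naep, w_naep, timss, w_timss):
    """Intercept and slope that map the NAEP scale onto the TIMSS scale."""
    mu_n, sd_n = weighted_mean_sd(naep, w_naep)
    mu_t, sd_t = weighted_mean_sd(timss, w_timss)
    slope = sd_t / sd_n                 # ratio of standard deviations
    intercept = mu_t - slope * mu_n     # aligns the two means
    return intercept, slope

def link(x, intercept, slope):
    """Transform NAEP-scale values x (for example, from a state sample)."""
    return intercept + slope * np.asarray(x, dtype=float)
```

The parameters are estimated once from the two U.S. samples; `link` can then be applied to values from any other sample reported on the same NAEP scale.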
Table 1 gives the values of the linking functions for the two subjects.
As is appropriate for such data, the estimates of the mean and
standard deviation for the NAEP and TIMSS samples took the sample
design into account by using the sampling weights for estimation.
Additionally, as is discussed later, neither NAEP nor TIMSS provides
individual proficiencies for students. Rather, both assessments
provide five plausible values, each providing a separate, and
equivalently good, estimate of the mean and standard deviation.
Following accepted NAEP practice (see Mislevy, Johnson, and Muraki
1992), the five estimates of the NAEP mean and standard deviation ($\hat{\mu}_{N}$ and $\hat{\sigma}_{N}$) were paired with the five estimates of the TIMSS mean and standard deviation ($\hat{\mu}_{T}$ and $\hat{\sigma}_{T}$), with the pairing arbitrarily in the order in which the sets
of plausible values were on the database. Five values of the intercept $\hat{A}$ and the slope $\hat{B}$ were then computed, one set for each pair of plausible values.
The final values of $\hat{A}$ and $\hat{B}$ are the averages of the five values.
The difference in the values of the $\hat{A}$ and $\hat{B}$ statistics for the two subjects is partly an artifact of the
differences in the metrics used in the NAEP and TIMSS scales.
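The pairing and averaging over plausible values might be sketched as follows. This is an illustrative sketch under stated assumptions: the array layout (one row per student, one column per plausible value, columns in database order) and all function names are hypothetical, and the weighted moments are computed in the simplest possible way.

```python
import numpy as np

def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (sampling weights as given)."""
    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)
    return mean, np.sqrt(var)

def linking_from_plausible_values(naep_pvs, w_naep, timss_pvs, w_timss):
    """Average the linking parameters over paired sets of plausible values.

    naep_pvs:  array of shape (n_naep_students, 5)
    timss_pvs: array of shape (n_timss_students, 5)
    Column k of one array is paired with column k of the other.
    """
    intercepts, slopes = [], []
    for k in range(naep_pvs.shape[1]):
        mu_n, sd_n = weighted_mean_sd(naep_pvs[:, k], w_naep)
        mu_t, sd_t = weighted_mean_sd(timss_pvs[:, k], w_timss)
        slope = sd_t / sd_n
        slopes.append(slope)
        intercepts.append(mu_t - slope * mu_n)
    intercepts = np.array(intercepts)
    slopes = np.array(slopes)
    # Final values: the average of the five per-pair estimates.
    return intercepts.mean(), slopes.mean(), intercepts, slopes
```

Returning the five per-pair values alongside their averages keeps the variation across plausible values available for later use.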
The TIMSS scales for grade 8 mathematics and grade 8 science were
set to have a mean of 500 and a standard deviation of 100 across
the participating countries. On the other hand, the NAEP mathematics
and science scales differed from each other. The NAEP 1996 mathematics
scale for grade 8 was linked to a 500-point scale established
in 1990 across the grades 4, 8, and 12. The intercept parameter $\hat{A}$ for grade 8 science has a different sign than the
intercept for mathematics because the grade 8 NAEP science
scales are expressed on a 300-point within-grade metric rather
than a 500-point across-grade metric.
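The effect of the source metric on the sign of the intercept can be illustrated with a small calculation. The numbers below are hypothetical round values chosen only to show the mechanism; they are not the estimates reported in Table 1, and the function name is an assumption for the example.

```python
def linking_intercept(mu_source, sd_source, mu_target, sd_target):
    """Intercept of a linear link: target mean minus slope times source mean."""
    slope = sd_target / sd_source
    return mu_target - slope * mu_source

# Hypothetical values, NOT taken from Table 1.
# Source scale with a high mean, as on a 500-point across-grade metric:
a_across = linking_intercept(mu_source=270.0, sd_source=35.0,
                             mu_target=500.0, sd_target=100.0)
# Source scale with a lower mean, as on a 300-point within-grade metric:
a_within = linking_intercept(mu_source=150.0, sd_source=35.0,
                             mu_target=500.0, sd_target=100.0)

print(a_across < 0, a_within > 0)
```

With the same slope, the product of slope and source mean exceeds the target mean in the first case and falls below it in the second, so the intercept changes sign.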
