NCES: NAEP-TIMSS Linkage: Chapter 4: Establishing the Link

4. ESTABLISHING THE LINK

As mentioned earlier, the link between TIMSS and NAEP is based on applying formal equating procedures to match characteristics of the score distribution of the 1996 NAEP with the corresponding characteristics of the score distribution of the 1995 administration of TIMSS in the United States. The simplest link is linear linking, where the NAEP distribution is adjusted so that the mean and standard deviation of the adjusted NAEP proficiencies for the 1996 U.S. population match the mean and standard deviation of the 1995 U.S. TIMSS population.

Linear linking assumes that the two distributions have the same characteristics apart from their means and standard deviations. In particular, if linear linking is valid, then after adjustment of the means and standard deviations, the percentiles of the two distributions will be similar. If this assumption is not true, such as when one distribution is more skewed than the other, linear linking may not provide an adequate linking between the two populations.

However, comparisons of the distributions of NAEP and TIMSS show that the two distributions have a similar shape for both mathematics and science at grade 8. The panels in Figure 3 show comparisons of the NAEP and TIMSS distributions for grade 8 mathematics and grade 8 science based on a graphical technique called suspended rootograms (Wainer 1974). The TIMSS scale for a given subject was divided into 25-point intervals, and the percentage of students in each interval was estimated. The matching NAEP scale for that subject was transformed to have the same mean and standard deviation as the TIMSS scale, and the percentage of students with transformed NAEP plausible values within each of the 25-point intervals was estimated. Following Tukey (1977), the square roots of these two percentages were compared.¹

The heights of each of the unshaded bars in each panel of Figure 3 correspond to the square root of the percentage of students in the TIMSS sample in each 25-point interval.

¹The square root transformation allows for more effective comparisons of percentages when the percentage is expected to vary over the range of intervals.

Figure 3.—Rootograms comparing proficiency distributions for 1995 TIMSS and 1996 NAEP

(NAEP distributions adjusted to have same mean and standard deviation as TIMSS)

The shaded bars show the difference in root percentages between the TIMSS and the transformed NAEP distributions. Positive differences indicate intervals where the percentages from the transformed NAEP are lower than those from the TIMSS, while negative differences indicate the reverse. For both subjects, the differences in root percentages are small, suggesting that the shapes of the NAEP and TIMSS distributions are similar enough to warrant a linear linking.
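The comparison behind Figure 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the report: it assumes NumPy, a single score (or plausible value) per student with a sampling weight, and a weighted standard deviation that divides by the sum of the weights. The function names are hypothetical.

```python
import numpy as np

def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (divisor: sum of weights; an assumption)."""
    mu = np.average(values, weights=weights)
    sd = np.sqrt(np.average((values - mu) ** 2, weights=weights))
    return mu, sd

def root_percentages(values, weights, edges):
    """Square root of the weighted percentage of students in each interval."""
    counts, _ = np.histogram(values, bins=edges, weights=weights)
    return np.sqrt(100.0 * counts / np.sum(weights))

def suspended_rootogram(timss, timss_w, naep, naep_w, width=25.0):
    """Return interval edges, TIMSS root percentages, and TIMSS minus NAEP differences."""
    timss, naep = np.asarray(timss, float), np.asarray(naep, float)
    mu_t, sd_t = weighted_mean_sd(timss, timss_w)
    mu_n, sd_n = weighted_mean_sd(naep, naep_w)
    # Transform NAEP to have the same mean and standard deviation as TIMSS.
    naep_on_timss = mu_t + (naep - mu_n) * sd_t / sd_n
    # Intervals of the given width (25 points by default) covering both distributions.
    lo = width * np.floor(min(timss.min(), naep_on_timss.min()) / width)
    hi = width * np.ceil(max(timss.max(), naep_on_timss.max()) / width)
    edges = np.arange(lo, hi + width, width)
    root_t = root_percentages(timss, timss_w, edges)
    root_n = root_percentages(naep_on_timss, naep_w, edges)
    # Positive difference: transformed NAEP percentage is lower than TIMSS.
    return edges, root_t, root_t - root_n
```

In such a sketch, `root_t` would play the role of the unshaded bars and the returned differences the role of the shaded bars.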

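The linking of TIMSS to NAEP can be expressed by the following equation:

$$\hat{y}(x; \hat{A}, \hat{B}) = \hat{A}x + \hat{B} \qquad (1)$$

where $x$ is a value on the NAEP scale, $\hat{y}$ is the transformed value of $x$ onto the TIMSS scale and

$$\hat{A} = \hat{\sigma}_T / \hat{\sigma}_N, \qquad \hat{B} = \hat{\mu}_T - \hat{A}\hat{\mu}_N,$$

where $\hat{\mu}_N$ and $\hat{\sigma}_N$ are, respectively, the mean and standard deviation of the NAEP U.S. sample and $\hat{\mu}_T$ and $\hat{\sigma}_T$ are the mean and standard deviation of the matching TIMSS U.S. sample. The functional notation $\hat{y}(x; \hat{A}, \hat{B})$ is meant to stress that $\hat{y}$ is a function of $\hat{A}$ and $\hat{B}$, derived from the U.S. samples, as well as of $x$, determined from some other sample, such as from data from some state that participated in State NAEP.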
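As an illustration of the linear link just described, the following Python sketch computes a multiplicative coefficient (called `slope` here) and an additive coefficient (called `intercept` here) from the two sets of U.S. moments and applies them to a NAEP value. The function names and the example moments are hypothetical and are not the report's estimates.

```python
def linking_coefficients(mu_naep, sd_naep, mu_timss, sd_timss):
    """Coefficients of the linear link from the NAEP scale to the TIMSS scale."""
    slope = sd_timss / sd_naep              # ratio of standard deviations
    intercept = mu_timss - slope * mu_naep  # makes the NAEP mean map to the TIMSS mean
    return slope, intercept

def naep_to_timss(x, slope, intercept):
    """Transform a NAEP-scale value x onto the TIMSS scale."""
    return slope * x + intercept

# Hypothetical moments, chosen only so the arithmetic is easy to follow.
slope, intercept = linking_coefficients(mu_naep=270.0, sd_naep=36.0,
                                        mu_timss=500.0, sd_timss=90.0)
linked_mean = naep_to_timss(270.0, slope, intercept)         # the NAEP mean
linked_plus_one_sd = naep_to_timss(306.0, slope, intercept)  # one NAEP SD above the mean
```

With these made-up inputs the NAEP mean maps to the TIMSS mean, and a value one NAEP standard deviation above the mean maps to one TIMSS standard deviation above the TIMSS mean, which is the defining property of the link.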

Table 1 gives the parameter estimates of the linking functions for the two subjects. As is appropriate for such data, the estimates of the mean and standard deviation for the NAEP and TIMSS samples took the sample design into account by using the sampling weights for estimation. Additionally, as is discussed later, neither NAEP nor TIMSS provides individual proficiencies for students. Rather, both assessments provide five plausible values for each student, with each set of plausible values providing a separate, and equally good, estimate of the mean and standard deviation. Following accepted NAEP practice (see Mislevy, Johnson, and Muraki 1992), the five estimates of the NAEP mean and standard deviation, $\hat{\mu}_N$ and $\hat{\sigma}_N$, were paired with the five estimates of the TIMSS mean and standard deviation, $\hat{\mu}_T$ and $\hat{\sigma}_T$ (with the pairing made arbitrarily, in the order in which the sets of plausible values appear on the database). Five values of the slope $\hat{A} = \hat{\sigma}_T / \hat{\sigma}_N$ and the intercept $\hat{B} = \hat{\mu}_T - \hat{A}\hat{\mu}_N$ were then computed, one set for each pair of plausible values. The final values of $\hat{A}$ and $\hat{B}$ are the averages of the five values.
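The pairing-and-averaging procedure described above might look like the following Python sketch. It assumes NumPy, plausible values stored as arrays with one row per student and one column per plausible value, and a weighted standard deviation that divides by the sum of the weights. These storage and estimator details, like the function names, are assumptions made for illustration rather than the report's actual implementation.

```python
import numpy as np

def weighted_mean_sd(values, weights):
    """Weighted mean and standard deviation (divisor: sum of weights; an assumption)."""
    mu = np.average(values, weights=weights)
    sd = np.sqrt(np.average((values - mu) ** 2, weights=weights))
    return mu, sd

def link_from_plausible_values(naep_pv, naep_w, timss_pv, timss_w):
    """Average the linking coefficients over paired sets of plausible values.

    naep_pv, timss_pv: arrays of shape (n_students, n_plausible_values).
    Column k of NAEP is paired with column k of TIMSS (database order).
    """
    slopes, intercepts = [], []
    for k in range(naep_pv.shape[1]):
        mu_n, sd_n = weighted_mean_sd(naep_pv[:, k], naep_w)
        mu_t, sd_t = weighted_mean_sd(timss_pv[:, k], timss_w)
        slope = sd_t / sd_n
        slopes.append(slope)
        intercepts.append(mu_t - slope * mu_n)
    # Final coefficients: the average over the pairs of plausible values.
    return float(np.mean(slopes)), float(np.mean(intercepts))
```

Note that the coefficients are computed separately for each pair and then averaged, rather than being computed once from averaged means and standard deviations.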

Table 1.—Parameter estimates for the linking of NAEP to TIMSS for grade 8

Subject	Slope $\hat{A} = \hat{\sigma}_T / \hat{\sigma}_N$	Intercept $\hat{B} = \hat{\mu}_T - \hat{A}\hat{\mu}_N$
Mathematics	2.498	-180.13
Science	3.087	70.62
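As a worked illustration of how the Table 1 estimates would be applied, take the first column as the multiplicative coefficient and the second as the additive coefficient. The NAEP scores used below (270 in mathematics, 150 in science) are hypothetical inputs chosen only to show the arithmetic; they are not values reported in this chapter.

```latex
\begin{align*}
\text{Mathematics:}\quad \hat{y} &= 2.498 \times 270 - 180.13 = 674.46 - 180.13 = 494.33,\\
\text{Science:}\quad \hat{y} &= 3.087 \times 150 + 70.62 = 463.05 + 70.62 = 533.67.
\end{align*}
```

Because the published coefficients are rounded, results computed this way carry the same rounding.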

The difference between the two subjects in the values of the slope ($\hat{A}$) and intercept ($\hat{B}$) statistics is partly an artifact of the differences in the metrics used in the NAEP and TIMSS scales. The TIMSS scales for grade 8 mathematics and grade 8 science were set to have a mean of 500 and a standard deviation of 100 across the participating countries. The NAEP mathematics and science scales, on the other hand, differed from each other. The NAEP 1996 mathematics scale for grade 8 was linked to a 500-point scale established in 1990 across grades 4, 8, and 12. The intercept for grade 8 science has a different sign than the intercept for mathematics because the grade 8 NAEP science scales are expressed on a 300-point within-grade metric rather than a 500-point across-grade metric.