2009 Spotlight

U.S. Performance Across International Assessments of Student Achievement

Technical Notes: A.9 Comparing results from TIMSS 1995 and 1999

TIMSS 1995 scale scores

TIMSS 1995 utilized a one-parameter item response theory (IRT) model to produce score scales that summarized the achievement results in the original reports. The TIMSS 1995 data were rescaled using a three-parameter IRT model to match the procedures used to scale the 1999, 2003, and 2007 TIMSS data. The three-parameter model was preferred to the one-parameter model because it can more accurately account for the differences among items in their ability to discriminate between students of high and low ability. After careful study of the rescaling process, the International Study Center concluded that the fit between the original TIMSS data and the rescaled TIMSS data met acceptable standards. However, as a result of rescaling, the average achievement scores of some countries changed from those initially reported in 1996 and 1997 (Peak 1996; NCES 1997). The rescaled TIMSS scores are included in this special analysis.
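The sketch below compares item response functions under textbook forms of the one- and three-parameter logistic models. It illustrates the difference between the two models and is not the TIMSS scaling procedure. The parameter names `a` (discrimination), `b` (difficulty), and `c` (lower asymptote, often read as guessing), the scaling constant `D = 1.7`, and the item values are all assumptions chosen for illustration. Under the one-parameter model, two items of equal difficulty separate a higher- and a lower-ability student by the same amount. Under the three-parameter model, a more discriminating item separates them more sharply.

```python
import math


def p_1pl(theta: float, b: float) -> float:
    """One-parameter (Rasch-type) model: items differ only in difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_3pl(theta: float, a: float, b: float, c: float, D: float = 1.7) -> float:
    """Three-parameter logistic model (textbook form, assumed for illustration).

    a: discrimination (slope), b: difficulty, c: lower asymptote (guessing).
    D = 1.7 is a conventional scaling constant, treated here as an assumption.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


if __name__ == "__main__":
    low, high = -1.0, 1.0  # hypothetical lower- and higher-ability students
    # Two hypothetical items with equal difficulty (b = 0) and guessing (c = 0.2)
    # but different discrimination.
    for name, a in [("weak item", 0.5), ("sharp item", 2.0)]:
        gap = p_3pl(high, a, 0.0, 0.2) - p_3pl(low, a, 0.0, 0.2)
        print(f"{name}: 3PL probability gap = {gap:.3f}")
    # Under the 1PL model, any item with b = 0 yields the same gap.
    print(f"1PL gap (any item with b = 0): {p_1pl(high, 0.0) - p_1pl(low, 0.0):.3f}")
```

Running the script prints a larger probability gap for the hypothetical "sharp" item than for the "weak" one. The one-parameter model cannot represent that difference, which is the limitation that motivated the rescaling.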

TIMSS tests of significance in 1995 and 1999

Tests of significance used in this special analysis to make multiple country comparisons for TIMSS 1995 and TIMSS 1999 differ from those presented in their respective U.S. reports (NCES 1997; NCES 2000). As a result, some country differences discussed in this special analysis were not reported as statistically significant in the TIMSS 1995 and 1999 U.S. reports. This is because those reports applied a Bonferroni adjustment to all multiple comparisons of countries, which lowers the significance level used for each individual comparison and so makes each test more conservative. The TIMSS 2003 and 2007 U.S. reports, however, discontinued use of the Bonferroni adjustment. To maintain the comparability of results across all four TIMSS assessments, none of the tests of significance presented in this special analysis used the Bonferroni adjustment for multiple comparisons.
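The effect of dropping the Bonferroni adjustment can be sketched with a simple two-sided z-test on a difference in country means. This is an illustration under stated assumptions, not the NCES procedure. The score difference, the standard errors, and the count of 37 comparisons are hypothetical. The test uses a normal approximation, and the standard error of the difference assumes independent samples. With a Bonferroni adjustment, each of m comparisons is tested at alpha / m instead of alpha, so a difference that is significant on its own can fail the adjusted test.

```python
import math


def se_of_difference(se_a: float, se_b: float) -> float:
    """Standard error of a difference between two independent sample means."""
    return math.hypot(se_a, se_b)


def two_sided_p(diff: float, se: float) -> float:
    """Two-sided p-value for diff / se under a standard normal approximation."""
    z = abs(diff) / se
    return math.erfc(z / math.sqrt(2.0))


def is_significant(diff: float, se: float, alpha: float = 0.05,
                   n_comparisons: int = 1) -> bool:
    """Test at level alpha; n_comparisons > 1 applies a Bonferroni adjustment (alpha / m)."""
    return two_sided_p(diff, se) < alpha / n_comparisons


if __name__ == "__main__":
    diff = 10.0                      # hypothetical gap between two countries' averages
    se = se_of_difference(3.0, 3.4)  # hypothetical standard errors
    m = 37                           # hypothetical number of countries compared
    print("p-value:", round(two_sided_p(diff, se), 4))
    print("unadjusted significant:", is_significant(diff, se))
    print("Bonferroni significant:", is_significant(diff, se, n_comparisons=m))
```

With these hypothetical numbers the difference is significant at the 0.05 level when tested alone but not after alpha is divided by 37. This mirrors why some differences discussed in this special analysis were not reported as significant in the earlier U.S. reports.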