Stuart Kerachsky
Acting Commissioner, National Center for Education Statistics

NCES Statement on TIMSS 2007
December 9, 2008

Today the National Center for Education Statistics is releasing results on the performance of students in the United States on the Trends in International Mathematics and Science Study (TIMSS). The 2007 TIMSS is the fourth administration since 1995 of this cross-national comparative study. Developed through the auspices of the International Association for the Evaluation of Educational Achievement (IEA), TIMSS assesses the mathematics and science knowledge and skills of fourth- and eighth-graders.

TIMSS is designed to align broadly with mathematics and science curricula in the participating countries. The results, therefore, suggest the degree to which students have learned mathematics and science concepts and skills likely taught in school. TIMSS also collects background information on students, teachers, and schools to allow cross-national comparison of educational contexts that may be related to student achievement.

TIMSS is open to countries, as well as large subnational education systems. For example, Hong Kong, which also participated in TIMSS 1995, is now a Special Administrative Region (SAR) of the People’s Republic of China. For convenience, however, the term "country" or "nation" is used in the report to refer to all participating entities. In 2007, mathematics and science assessments and associated questionnaires were administered in 36 countries at the fourth-grade level and 48 countries at the eighth-grade level (see note 1).

The TIMSS fourth-grade assessment was implemented in 1995, 2003, and 2007, while the eighth-grade assessment was implemented in 1995, 1999, 2003, and 2007. For a number of participating countries, including the United States, changes in achievement can be documented over the last 12 years, from 1995 to 2007.

The results presented here focus on the performance of U.S. fourth- and eighth-grade students in mathematics and science relative to that of their peers in other countries in 2007 and since 1995.

How TIMSS was Conducted

In the United States, TIMSS was administered in spring 2007. The U.S. sample is representative of both public and private school students at 4th and 8th grades nationally. In total, 257 schools and 10,350 students participated at grade four, and 239 schools and 9,723 students participated at grade eight. More information about how the assessment was developed and conducted is included in the technical notes of the U.S. report on TIMSS (http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001).

How TIMSS Results are Reported

Like other large-scale assessments, TIMSS was not designed to provide individual student scores, but rather national and group estimates of performance. Achievement results from TIMSS are reported on a scale from 0 to 1,000. In order to compare performance over time, each TIMSS administration is placed on the same scale, which has a mean of 500 and standard deviation of 100. The TIMSS scale average (500) is the mean score of the original TIMSS 1995 countries (including the United States). Countries can compare their scores over time to this standardized TIMSS scale average, as well as compare their scores directly with other countries.
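
As a rough illustration of how to read a score on this scale, the sketch below expresses an average score as a distance from the scale average in standard deviation units. The function name is hypothetical and not part of any NCES or IEA tool. The mean of 500 and standard deviation of 100 come from the description above, and the example score of 529 is the 2007 U.S. fourth-grade mathematics average reported later in this statement.

```python
SCALE_MEAN = 500.0  # TIMSS scale average, as described above
SCALE_SD = 100.0    # TIMSS scale standard deviation, as described above


def sd_units_from_scale_average(score: float) -> float:
    """Return how many scale standard deviations a score lies from 500."""
    return (score - SCALE_MEAN) / SCALE_SD


# 2007 U.S. fourth-grade mathematics average (529)
print(sd_units_from_scale_average(529))  # 0.29
```

Read this way, an average of 529 sits a little under one-third of a standard deviation above the scale average.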

All differences described using TIMSS data are statistically significant at the .05 level. Differences that are not statistically significant are either not discussed or referred to as "not measurably different" or "not statistically significant."
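
A minimal sketch of the kind of test behind "measurably different," assuming two independent estimates with known standard errors and a normal approximation. All means and standard errors in the example are made-up placeholders, not TIMSS values, and the function name is hypothetical. The procedures actually used for TIMSS are described in the technical notes of the U.S. report and may differ from this simplification.

```python
import math

CRITICAL_Z = 1.96  # two-sided critical value at the .05 level (normal approximation)


def measurably_different(mean_a: float, se_a: float,
                         mean_b: float, se_b: float) -> bool:
    """Two-sided z-test for the difference between two independent estimates."""
    # Standard error of the difference, assuming independence (an assumption).
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_a - mean_b) / se_diff
    return abs(z) > CRITICAL_Z


# Hypothetical inputs, for illustration only.
print(measurably_different(510, 3.0, 500, 3.0))  # True
print(measurably_different(504, 3.0, 500, 3.0))  # False
```

The second call shows why a small gap in average scores can be reported as "not measurably different": the gap is within what sampling error alone could produce.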

In addition to numerical scale results, TIMSS includes international benchmarks at four points on the mathematics and science scales—advanced international benchmark (625), high international benchmark (550), intermediate international benchmark (475), and low international benchmark (400).
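
The benchmark cut scores above can be read as a simple lookup. The sketch below maps a hypothetical scale score to the highest benchmark it reaches. The function name and the "below low" label are illustrative assumptions; only the four cut scores come from this statement.

```python
# Cut scores from the statement, highest first.
BENCHMARKS = [
    (625, "advanced"),
    (550, "high"),
    (475, "intermediate"),
    (400, "low"),
]


def highest_benchmark_reached(score: float) -> str:
    """Return the name of the highest international benchmark a score meets."""
    for cut, name in BENCHMARKS:
        if score >= cut:
            return name
    return "below low"  # illustrative label, not TIMSS terminology


print(highest_benchmark_reached(625))  # advanced
print(highest_benchmark_reached(529))  # intermediate
print(highest_benchmark_reached(399))  # below low
```

Because the benchmarks are cumulative, a score that reaches the advanced benchmark also counts as at or above the high, intermediate, and low benchmarks.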

U.S. Performance in Mathematics

Average scores in 2007

In 2007, the average mathematics scores of both U.S. fourth-graders (529) and eighth-graders (508) were higher than the TIMSS scale average.

The average U.S. fourth-grade mathematics score was higher than those of students in 23 of the 35 other countries, lower than those in 8 countries (located in Asia or Europe), and not measurably different from those in the remaining 4 countries. Fourth-graders from Hong Kong SAR had the highest estimated mathematics score in TIMSS 2007.

At eighth grade, the average U.S. mathematics score was higher than those of students in 37 of the 47 other countries, lower than those in 5 countries (located in Asia), and not measurably different from those in the other 5 countries. Eighth-graders from Chinese Taipei had the highest estimated mathematics score in TIMSS 2007.

Trends in scores since 1995

Compared with 1995, the average mathematics scores for both U.S. fourth- and eighth-grade students were higher in 2007. At fourth grade, the U.S. average score in 2007 was 529, 11 points higher than its 1995 average. At eighth grade, the U.S. average mathematics score in 2007 was 508, 16 points higher than its 1995 average score.
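
The 1995 averages are not stated directly in the paragraph above, but they follow from the 2007 averages and the reported gains. The sketch below does that subtraction. The function name is hypothetical, and because published averages are rounded, values derived this way could differ by a point from the published 1995 figures.

```python
def earlier_average(later_average: int, gain: int) -> int:
    """Back out an earlier average from a later average and the reported gain."""
    return later_average - gain


print(earlier_average(529, 11))  # 518 (grade four, 1995)
print(earlier_average(508, 16))  # 492 (grade eight, 1995)
```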

In addition to the United States, 15 other countries participated in both the 1995 and 2007 TIMSS fourth-grade administrations. Comparing 2007 mathematics scores with those from 1995 shows that one-half of the countries (8 of 16) showed improvement in average scores, including the United States and 3 countries that scored higher than the United States in 2007: England, Hong Kong SAR, and Latvia. The size of the gains in these three countries was larger than the gain in the United States. One-quarter of the countries (4 of 16) showed declines, including one country—the Netherlands—with an average score not measurably different from the U.S. average in 2007.

At grade eight, 20 countries participated in TIMSS in both 1995 and 2007, including the United States. About one-third of the countries (6 of 20) had higher average mathematics scores in 2007 than in 1995, including the United States and 3 countries (England, Korea, and Lithuania) with average scores higher than or not measurably different from the U.S. average score in 2007. Two countries had larger gains than the United States, including Lithuania, which had an average score that was not measurably different from the U.S. average in 2007. One-half of the countries (10 of 20) showed declines in their average scores, including 4 countries that scored higher than or not measurably different from the United States in 2007: the Czech Republic, Hungary, Japan, and Singapore.

Performance on the TIMSS international benchmarks

In 2007, higher percentages of U.S. fourth-graders performed at or above each of the four TIMSS international benchmarks than the international medians (see note 2) of the percentages performing at each level. For example, 10 percent of U.S. fourth-graders performed at or above the advanced benchmark (625) compared to the international median of 5 percent. These students demonstrated an ability to apply their understanding and knowledge to a variety of relatively complex mathematical situations and explain their reasoning.

As with their fourth-grade counterparts, higher percentages of U.S. eighth-graders performed at or above each of the four TIMSS international benchmarks than the international medians of the percentages performing at each level. For example, 6 percent of U.S. eighth-graders performed at or above the advanced benchmark (625) compared to the international median of 2 percent. These students demonstrated an ability to organize and draw conclusions from information, make generalizations, and solve nonroutine problems.

Differences in mathematics performance by selected student characteristics

Scores of lower and higher performing students

In 2007, the highest-performing U.S. fourth-graders (those performing at or above the 90th percentile) scored 625 or higher in mathematics. This was higher than the 90th percentile scores for fourth-graders in 23 countries and lower than the 90th percentile score for students in 7 countries: Singapore, Hong Kong SAR, Japan, Chinese Taipei, Kazakhstan, England, and the Russian Federation. The lowest-performing U.S. fourth-graders (those performing at or below the 10th percentile) scored 430 or lower in mathematics. This was higher than the 10th percentile score in 23 countries and lower than the 10th percentile score in 6 countries: Singapore, Hong Kong SAR, Japan, Chinese Taipei, Latvia, and the Netherlands.

A comparison of 1995, when TIMSS was first administered, and 2007 shows no measurable change in the mathematics cutpoint score at the 90th percentile for U.S. fourth-graders, the point marking the top 10 percent of students. However, a comparison of data from 2003 and 2007 shows there was an increase in the 90th percentile score: from 614 to 625. The lowest-performing U.S. fourth-graders showed improvement in mathematics between 1995 and 2007 and between 2003 and 2007.

At grade eight, the highest-performing U.S. students in mathematics scored 607 or higher in 2007. The U.S. 90th percentile score was higher than that of 34 countries and lower than the 90th percentile score in 6 countries: Chinese Taipei, Korea, Singapore, Hong Kong SAR, Japan, and Hungary. The lowest-performing U.S. eighth-graders scored 408 or less. The 10th percentile score for U.S. eighth-graders in mathematics was higher than the 10th percentile score in 34 countries and lower than the 10th percentile score in 4 countries: Chinese Taipei, Korea, Singapore, and Japan.

Both the 90th and 10th percentile U.S. eighth-grade mathematics scores were higher in 2007 than in 1995. Though the U.S. 90th percentile score has been relatively stable over the last three administrations of TIMSS (1999, 2003, and 2007), the 2007 U.S. score of 607 was higher than the 1995 score of 594, showing improvement among top students. The 10th percentile score for U.S. eighth-graders was higher in 2007 than in 1995 and 1999.

Performance by sex

In 2007 in the United States, fourth-grade males outperformed females by 6 score points on average in mathematics. In the 35 other countries participating at grade four, 20 showed a significant difference in the average mathematics scores of males and females: 12 in favor of males and 8 in favor of females. The difference in average scores between males and females ranged from 37 score points in Kuwait (in favor of females) to 17 score points in Colombia (in favor of males).

At grade eight, there was no measurable difference in the average mathematics scores of U.S. males and females in 2007. Among the 47 other countries participating in TIMSS at grade eight, 24 showed a difference in the average mathematics scores of males and females: 8 in favor of males and 16 in favor of females. The difference in average scores between males and females ranged from 54 score points in Oman (in favor of females) to 32 score points in Colombia (in favor of males).

Between 1995 and 2007, the average scores of both U.S. males and females improved at both the fourth and eighth grades. Compared to 1995, U.S. fourth-grade males scored 12 points higher on average in mathematics in 2007 and U.S. fourth-grade females scored 10 points higher, on average. Compared to 1995, U.S. eighth-grade males scored 15 points higher on average in mathematics in 2007 and U.S. eighth-grade females scored 17 points higher, on average.

Performance by race/ethnicity

In the United States, students were asked whether they were of Hispanic origin and their race. Students who identified themselves as being of Hispanic origin were classified as Hispanic, regardless of race. 3

In 2007, U.S. White, Asian, and multiracial (non-Hispanic students who identified with two or more races) fourth-graders all scored higher, on average, in mathematics than the TIMSS scale average, while Black fourth-graders scored lower, on average. Hispanic fourth-graders’ average score showed no measurable difference from the TIMSS scale average.

At grade eight, the average scores of U.S. White and Asian students were higher than the TIMSS scale average in mathematics. On the other hand, the average scores of Black and Hispanic eighth-graders were lower than the TIMSS scale average. The average score of multiracial eighth-graders was not measurably different from the TIMSS scale average.

Examination of U.S. performance over the 12-year period from 1995 to 2007 shows that White, Black, and Asian students in both fourth and eighth grades, as well as Hispanic students in grade eight, have mostly improved in mathematics. Hispanic fourth-graders have improved over a shorter period, between 2003 and 2007.

Performance by school poverty level

The U.S. results are also arrayed by the concentration of low-income enrollment in the public schools, as measured by eligibility for free or reduced-price lunch, and shown in relation to the TIMSS scale average and the U.S. national average.

In comparison to the TIMSS scale average, the average mathematics score of U.S. fourth-graders in the highest poverty public schools (at least 75 percent of students eligible for free or reduced-price lunch) in 2007 was lower; the average scores of fourth-graders in each of the other categories of school poverty were higher than the TIMSS scale average.

In comparison to the U.S. national average score, fourth-graders in schools with 50 percent or more students eligible for free or reduced-price lunch scored lower, on average, while those in schools with lower proportions of poor students scored higher, on average, than the U.S. national average.

On average, U.S. eighth-graders in public schools with at least 50 percent of students eligible for free or reduced-price lunch scored lower than the TIMSS scale average in mathematics in 2007. U.S. eighth-graders attending public schools with fewer than 50 percent of students eligible for the free or reduced-price lunch program scored higher than the TIMSS scale average in mathematics.

In comparison to the U.S. national average, U.S. eighth-graders in public schools with fewer than 25 percent of students eligible scored higher in mathematics, on average, while students in public schools with at least 50 percent eligible scored lower, on average.

TIMSS did not collect information on the percentage of students eligible for the federal free or reduced-price lunch program in 1995 for either grade. Thus, comparisons over time on this measure are limited to 2003 to 2007 for fourth grade and 1999 to 2007 for eighth grade.

Comparisons of scores in 2007 to those in 2003 showed an inconsistent pattern of improvement in mathematics among U.S. fourth-graders in public schools serving students from various levels of poverty. For example, fourth-graders in public schools with the lowest level of poverty (less than 10 percent eligible) had higher average mathematics scores in 2007 than in 2003. On the other hand, there was no measurable difference detected in the average scores of students attending the highest poverty level schools.

For grade eight, students in different types of public schools categorized by poverty did not show change in performance generally. Only U.S. eighth-graders in public schools with medium levels of poverty (25 to less than 50 percent of students eligible for free or reduced-price lunch) had higher average mathematics scores in 2007 compared to 1999.

U.S. Performance in Science

Average scores in 2007

In 2007, the average science scores of both U.S. fourth-graders (539) and eighth-graders (520) were higher than the TIMSS scale average (500 at both grades).

The average U.S. fourth-grade science score was higher than those of students in 25 of the 35 other countries, lower than those in 4 countries (located in Asia), and not measurably different from those in the remaining 6 countries. Fourth-graders from Singapore had the highest estimated science score in TIMSS 2007.

At eighth grade, the average U.S. science score was higher than the average scores of students in 35 of the 47 other countries, lower than those in 9 countries (located in Asia or Europe), and not measurably different from those in the other 3 countries. Again, Singapore had the highest estimated science score.

Trends in scores since 1995

The average science scores for both U.S. fourth- and eighth-grade students in 2007 were not measurably different from those in 1995. The U.S. fourth-grade average science score in 2007 was 539 and in 1995 was 542. The U.S. eighth-grade average science score in 2007 was 520 and in 1995 was 513.

At grade four, 16 countries, including the United States, participated in both the first TIMSS in 1995 and the most recent TIMSS in 2007. Comparing 2007 with 1995, 7 of the 16 countries showed improvement in average science scores, including 5 countries that scored higher than or not measurably different from the United States in 2007: England, Hong Kong SAR, Hungary, Latvia, and Singapore. Five countries showed declines in scores, including Japan, which scored higher than the United States in 2007. Four countries, including the United States, had no measurable difference in average scores.

At grade eight, 19 countries participated in TIMSS in both 1995 and 2007, including the United States. Five countries had higher average science scores in 2007 than in 1995, including 4 countries that scored higher than or not measurably different from the United States in 2007: Hong Kong SAR, Korea, Lithuania, and Slovenia. Three countries showed declines in their average scores, including the Czech Republic, which scored higher than the United States in 2007. Eleven countries, including the United States, had no measurable difference between average science scores in 1995 and 2007.

Performance on the TIMSS international benchmarks

In 2007, there were higher percentages of U.S. fourth-graders performing at or above three of the four TIMSS international benchmarks than the international median percentage. For example, 15 percent of U.S. fourth-graders performed at or above the advanced benchmark (625) in science compared to the international median of 7 percent. These students demonstrated an ability to apply their knowledge and understanding of scientific processes and relationships in beginning scientific inquiry.

At the eighth grade, there were higher percentages of U.S. students performing at or above each of the four TIMSS international science benchmarks than the international median. For example, 10 percent of U.S. eighth-graders performed at or above the advanced benchmark (625) compared to the international median of 3 percent. These students demonstrated a grasp of some complex and abstract concepts in biology, chemistry, physics, and Earth science.

Differences in science performance by selected student characteristics

Scores of lower and higher performing students

In 2007, the highest-performing U.S. fourth-graders in science (those performing at or above the 90th percentile) scored 643 or higher. This was higher than the 90th percentile score for fourth-graders in 27 countries and lower than those in 2 of the 35 other countries. Of the 4 countries with average fourth-grade science scores higher than that of the United States in 2007, 2 had higher 90th percentile cutpoint scores than the United States: Singapore and Chinese Taipei.

The lowest-performing U.S. fourth-graders in science (those performing at or below the 10th percentile) scored 427 or less in 2007. The 10th percentile score for U.S. fourth-graders was higher than the 10th percentile score in 17 countries and lower than that in 7 countries: Singapore, Chinese Taipei, the Russian Federation, Hong Kong SAR, Japan, Latvia, and the Netherlands.

A comparison of 1995 and 2007 shows a decline in the 90th percentile cutpoint score for U.S. fourth-graders in science, the point marking the top 10 percent of students. In 2007, the 90th percentile score was 11 score points lower than the analogous score in 1995. Comparisons of the 10th percentile science scores for U.S. fourth-graders between 1995 and 2007 and between 2003 and 2007 show no measurable difference.

At grade eight, the highest-performing U.S. students in science (90th percentile or higher) scored 623 or higher in 2007. This was higher than the 90th percentile score in 34 countries and lower than in 6 countries: Singapore, Chinese Taipei, England, Japan, Korea, and Hungary.

At the other end of the scale, the lowest-performing U.S. eighth-graders in science (10th percentile or lower) scored 410 or lower in 2007. The 10th percentile score for U.S. eighth-graders was higher than the 10th percentile score in 34 countries and lower than in 8 countries: Chinese Taipei, England, Japan, Korea, Hungary, the Czech Republic, Slovenia, and the Russian Federation.

At grade eight, the 90th percentile cutpoint score in science showed no measurable differences in comparisons of 2007 to 1995 or 2003, but was lower in 2007 than in 1999 (623 v. 636). The score identifying the lowest-performing U.S. eighth-graders in science was higher in 2007 than in 1995 (410 v. 384) and in 1999 (410 v. 386).

Performance by sex

In 2007 in the United States, fourth-grade males and females showed no measurable difference in their average science performance. In 14 of the 35 other countries participating at grade four there were significant differences in the average science scores of males and females: 8 countries in favor of males and 6 in favor of females. The largest differences were 64 score points in favor of females in Kuwait and 15 score points in favor of males in Colombia.

Unlike their fourth-grade counterparts, U.S. eighth-grade males outperformed their female classmates, on average, in science in 2007. Among the 47 other countries participating in TIMSS, 24 showed a difference in the average science scores of males and females: 10 countries in favor of males and 14 in favor of females. The largest differences were 70 score points in favor of females in Qatar and 35 score points in favor of males in Colombia and Germany.

There has been no measurable change in the average scores in science of U.S. males and females at either grade four or eight over the 12-year period from 1995 to 2007.

Performance by race/ethnicity

U.S. White, Asian, and multiracial fourth-graders all scored higher in science, on average, than the TIMSS scale average, while Black fourth-graders scored lower. Hispanic fourth-graders’ average score showed no measurable difference from the TIMSS scale average.

At grade eight, U.S. White, Asian, and multiracial students scored higher, on average, than the TIMSS scale average in science. On the other hand, Black and Hispanic eighth-graders scored lower, on average, than the TIMSS scale average.

Between 1995 and 2007 in the United States, Black and Asian fourth-graders, and Black, Asian, and Hispanic eighth-graders showed an overall pattern of improvement in science. There was no measurable change in the average science scores of White and Hispanic fourth-graders, and White eighth-graders when 2007 scores were compared to those from the earlier assessments.

Performance by school poverty level

In comparison to the TIMSS scale average, the average science score of U.S. fourth-graders in the highest poverty public schools (at least 75 percent of students eligible for free or reduced-price lunch) in 2007 was lower; the average scores of fourth-graders in each of the other categories of school poverty were higher.

In comparison to the U.S. average score, fourth-graders in schools with 50 percent or more students eligible for free or reduced-price lunch scored lower in science, on average, while those in schools with lower proportions of poor students scored higher, on average, than the U.S. national average.

In comparison to the TIMSS scale average, U.S. eighth-graders attending public schools with fewer than 50 percent of students eligible for the free or reduced-price lunch program scored higher in science, on average. On the other hand, U.S. eighth-graders in public schools with the highest poverty levels scored lower in science, on average, than the TIMSS scale average.

In comparison to the U.S. national average, U.S. eighth-graders in public schools with fewer than 25 percent of students eligible scored higher in science, on average, while students in public schools with at least 50 percent eligible scored lower, on average.

Comparisons of the 2007 average science scores to those for the earlier years within each school poverty level revealed no measurable change at either grade four or eight, with one exception. At grade eight, students in public schools with the highest poverty levels had a higher average science score (466) in 2007 than in 1999 (440).

Comparisons Between TIMSS and NAEP

It is often asked how TIMSS compares with other assessments that measure similar subjects and populations, in particular, the National Assessment of Educational Progress (NAEP).

The two assessments vary in some obvious ways, such as the goals of the studies, the populations, and the sample sizes, as well as their frameworks and specifications. However, there also are differences that are less obvious and that can only be found by comparing the content of the assessments through examination of the items. In a recent comparison study, TIMSS 2007 mathematics and science items were classified using the NAEP assessment frameworks (2005/2007 for mathematics and 2005 for science). The classification categories included content topics and objectives and grade-level expectations. In mathematics, it was a matter of curricular emphasis; that is, there was a somewhat different distribution of mathematics topics across the assessments. In science, TIMSS contained items that could not be mapped onto the NAEP objectives and they measured a somewhat different balance of skills.

That said, there are broad results that are consistent across the international and national assessments over a similar time interval: 1996 to 2007 in NAEP and 1995 to 2007 in TIMSS. Both assessments showed statistically significant increases in the mathematics performance of fourth- and eighth-grade students: overall, among boys, and among girls. NAEP also reported general increases for White, Black, Hispanic, and Asian students and for students at the top and bottom of the distribution (at the 10th and 90th percentiles) at both grades. TIMSS only detected increases in mathematics performance for some of these groups (e.g., White and Black students in both grades, students at the 10th percentile in both grades) and no change for others (e.g., Hispanic fourth-grade students). This difference is likely the result of NAEP’s larger sample sizes, which make it more sensitive to small changes among nationally relevant student groups than TIMSS, which is designed primarily to detect differences among countries.

The most recent results from NAEP and TIMSS also provide trend information for fourth- and eighth-grade science, although covering a slightly shorter period in NAEP than in TIMSS. NAEP provides trends for the period 1996 to 2005 and TIMSS for the period 1995 to 2007. Compared with mathematics, the trends shown by NAEP and TIMSS in science are less consistent with one another, which is not surprising given the differing time periods and the relatively greater differences in the assessments discussed in the previous sections.

For example, in fourth grade, NAEP shows that there was an increase in students’ science performance both overall and among boys between 1996 and 2005, whereas TIMSS did not detect any change in performance for either of those groups from 1995 to 2007. At the eighth-grade level, neither NAEP nor TIMSS showed any change in science performance among students overall. But in contrast to the fourth-grade results, TIMSS reported increases for Black, Hispanic, and Asian eighth-grade students, whereas NAEP only reported increases among Black students. This suggests that Hispanic and Asian eighth-grade students performed relatively better over time on the content unique to TIMSS than unique to NAEP.

Conclusion

This TIMSS report is intended to be used by educators, policymakers, and interested members of the public. It is important to have the kind of performance data that TIMSS provides as an external perspective on the performance of our nation’s students.

For More Information

This statement covers some of the major findings from the TIMSS 2007 highlights report from the U.S. perspective available on the NCES website. Other findings are available in IEA’s report on TIMSS 2007. The TIMSS 2007 data will also be publicly available after December 9 for independent analyses.

For more information on TIMSS, please visit the TIMSS website at
http://nces.ed.gov/timss/.

For more information on NAEP, please visit the NAEP website at
http://nces.ed.gov/nationsreportcard/.

The U.S. TIMSS 2007 results are available at
http://nces.ed.gov/timss/results07.asp.

The 2007 NAEP eighth grade mathematics results are available at
http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2007494.

Information about how TIMSS compares with NAEP and the Programme for International Student Assessment (PISA) is provided in the following paper: Comparing TIMSS with NAEP and PISA in Mathematics and Science.

Acting Commissioner Stuart Kerachsky's Presentation:
Highlights From TIMSS 2007


1 The total number of countries reported here differs from the total number reported in the international TIMSS reports published by IEA. In addition to the 36 countries at grade four and 48 countries at grade eight, 8 other education systems, or "benchmarking" entities, participated: the states of Massachusetts and Minnesota; the Canadian provinces of Alberta, British Columbia, Ontario, and Quebec; Dubai, United Arab Emirates; and the Basque region of Spain.
2 The international median at each benchmark is the percentage such that half of the participating countries have at least that percentage of their students at or above the benchmark and half have a smaller percentage. For example, the low international benchmark median of 90 percent at grade four indicates that half of the countries have 90 percent or more of their students who met the low benchmark, and half have less than 90 percent of their students who met the low benchmark.
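
The international median described in note 2 can be sketched as an ordinary median taken across countries. The percentages below are invented for illustration and are not TIMSS data; the variable names are likewise hypothetical.

```python
from statistics import median

# Hypothetical percentages of students at or above one benchmark,
# one value per participating country (not real TIMSS data).
pct_at_or_above = [97, 95, 93, 90, 90, 88, 81, 60]

# The international median is the middle of the country-level percentages.
international_median = median(pct_at_or_above)
print(international_median)  # 90.0
```

Note that the median is taken over countries, not over students, so a very large country and a very small one each count once.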