The National Center for Education Statistics (NCES) is congressionally mandated to report on the state of education in the United States and other countries.2 To carry out this mission, NCES participates in several international assessments that measure how the performance of U.S. students and adults compares with that of their counterparts in other countries. This special analysis looks closely at the information NCES has gathered from recent international studies in which U.S. students have participated: the Progress in International Reading Literacy Study (PIRLS), the Program for International Student Assessment (PISA), and the Trends in International Mathematics and Science Study (TIMSS).3
This special analysis describes the most recent results from these international studies as well as trends in the results, when possible. It is organized by subject area into three parts—reading, mathematics, and science. For each subject area, the following topics are addressed:
The three international studies examined in this special analysis periodically measure one or more dimensions of student performance at different ages or grade levels. PIRLS, sponsored by the International Association for the Evaluation of Educational Achievement (IEA) and first conducted in 2001, assesses the reading performance of 4th-graders every 5 years. PISA, sponsored by the Organization for Economic Cooperation and Development (OECD) and first conducted in 2000, assesses the reading, mathematics, and science literacy of 15-year-old students every 3 years.4 And TIMSS, sponsored by the IEA and first conducted in 1995, assesses the mathematics and science performance of both 4th- and 8th-graders every 4 years.5 Although organized and run by two different international organizations, these three assessments all report scores on a scale of 0 to 1,000, with a standard deviation of 100.6 However, scores from different studies (e.g., PISA and TIMSS) cannot be compared with each other directly because of differences in each study's purpose, subject matter, and assessed grade or age. Thus, all comparisons in this special analysis are between countries that participated in the same study.
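The 0-to-1,000 reporting scale can be made concrete with a small sketch. The function below is purely illustrative; its name and interface are assumptions, not part of any study's methodology, and actual scale scores are produced by the scaling procedures described in appendix A. It simply converts a standardized distance from the scale midpoint into a scale score.

```python
def to_scale_score(z, midpoint=500.0, sd=100.0):
    """Map a standardized distance from the scale midpoint onto the
    0-to-1,000 reporting scale (midpoint 500, standard deviation 100)
    shared by PIRLS, PISA, and TIMSS."""
    return midpoint + sd * z

# A country averaging one standard deviation above the midpoint
# would score 600; half a standard deviation below, 450.
print(to_scale_score(1.0))   # 600.0
print(to_scale_score(-0.5))  # 450.0
```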
It is important to point out here that the term "country" is used for simplicity's sake throughout this special analysis as a common name for the range of political entities that have participated in each study. In most cases, participating countries represent an entire nation state, as in the case of the United States. However, in some studies, participating countries represent parts of nation states. For example, several Canadian provinces participated separately in PIRLS 2006, instead of Canada as a whole. Likewise, England and Scotland regularly participate separately (instead of the entire United Kingdom), and Belgium regularly participates as two units (Flemish-speaking and French-speaking Belgium) in PIRLS and TIMSS. Similarly, Hong Kong and Macao, which are special administrative regions (SARs) of China, participate independently.7
Not all countries have participated in all three studies or in all administrations of a single study's assessments.8 Table 1 lists the participating countries in the most recent administration of each assessment, and supplemental tables 1–8 list the participating countries in all administrations. All three studies include both developed and developing countries; however, TIMSS and PIRLS have a larger proportion of developing countries participating than PISA does, because PISA is principally a study of the member countries of the OECD, an intergovernmental organization of 30 developed countries.
Differences in the set of countries that participate in an assessment can affect how well the United States appears to do internationally when results are released. One reason is that average student performance in developed countries tends to be higher than in developing countries. As a result, the extent to which developing countries participate in an assessment can affect the international average of participating countries as well as the relative position of one country compared with the others.9 To deal with this problem, none of the international assessments calculates an international "average" score based on the results of all participating countries. Instead, PISA calculates an OECD average for each PISA subject area that is based only on the results of the OECD-member countries. All OECD-member countries participate in PISA; therefore, PISA in principle calculates this average based on a consistent group of countries.10 TIMSS and PIRLS, on the other hand, do not calculate an average based on the results of any of the participating countries; instead, they report results relative to the midpoint of each assessment's reporting scale, which they call the "scale average."11
All differences reported in this special analysis are statistically significant ("measurable") at the 95 percent confidence level. All t-tests supporting this special analysis were conducted without adjustments for multiple comparisons. It is also important to note that the purpose of this special analysis is to provide descriptive information; complex interactions and causal relationships have not been explored. Readers are cautioned not to make causal inferences based on the results presented here.
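As a minimal sketch of the kind of unadjusted t-test described above, the function below compares two published estimates using their reported standard errors. The function name and the sample figures are hypothetical, and the actual NCES procedures account for each study's complex sample design; this only illustrates the 1.96 critical value corresponding to a 95 percent confidence level, with no multiple-comparison adjustment.

```python
import math

def measurably_different(est1, se1, est2, se2, critical=1.96):
    """Unadjusted t-test for the difference between two independent
    estimates (e.g., two countries' average scores), using their
    reported standard errors. |t| > 1.96 corresponds to statistical
    significance at the 95 percent confidence level; no adjustment
    is made for multiple comparisons."""
    t = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return abs(t) > critical

# Hypothetical example: average scores of 540 (s.e. 3.0) and 529 (s.e. 2.4)
# differ measurably (t ~ 2.86), while 540 (s.e. 3.0) and 536 (s.e. 3.0)
# do not (t ~ 0.94).
print(measurably_different(540, 3.0, 529, 2.4))  # True
print(measurably_different(540, 3.0, 536, 3.0))  # False
```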
2 Most recently mandated in the Education Sciences Reform Act of 2002.
3 This special analysis does not examine the results of international assessments of adult literacy, in which the United States has also participated. The reason for this is that the results of the 2002 Adult Literacy and Lifeskills Survey (ALL), the last assessment of adult literacy, have already been described in The Condition of Education 2006 special analysis (see http://nces.ed.gov/programs/coe/2006/analysis/index.asp), and the next assessment, the Program for the International Assessment of Adult Competencies (PIAAC), is not scheduled to be conducted until 2011.
4 While PISA assesses each subject area every 3 years, each assessment cycle focuses on one particular subject. In 2000, the focus was on reading literacy; in 2003, on mathematics literacy; in 2006, on science literacy. In 2009, the focus is on reading literacy again.
5 In 1995, TIMSS also assessed students at the end of secondary school; depending on the country, these students ranged from grade 10 to grade 14. In the United States, 12th-graders were assessed.
6 For details about scale scores, see appendix A.
7 In some assessments, subnational units such as states and regions have been benchmarking participants either instead of or in addition to the entire nation-state. For a list of U.S. states that have participated in international assessments, independent of the nation as a whole, see appendix A.
Note that official designation of participating entities may differ between assessments. For example, in TIMSS, the official designation for Hong Kong is "Hong Kong SAR," while in PISA, it is "Hong Kong-China." In the text of this special analysis, shortened forms of official designations are used; but in the figures and tables, the assessment's full official designations are used.
8 Countries vary over time in the assessments in which they participate for a variety of reasons, including individual countries' perceptions of the benefits and costs of each assessment, and the specific logistic challenges of administering the assessments.
9 Specifically, the more developing countries that participate in a study, the lower the international average tends to be and the higher the participating developed countries tend to rank.
10 While all OECD-member countries' results are used to calculate PISA's OECD average, the number of countries used to calculate this average has increased over time. For example, in 2000, results for the Netherlands were not used to calculate the average because of its low response rates. In addition, between 2000 and 2003, the total number of OECD countries increased from 28 to 30 when the Slovak Republic and Turkey joined.
11 Although the IEA uses the label "scale average," this is not a calculated average: it is fixed at 500, the midpoint of the assessment's 0-to-1,000 reporting scale. For a more detailed explanation of scale scores and scale averages, see appendix A.