
Frequently Asked Questions


International Assessments
  • U.S. Participation
    • In what international education studies does the United States participate, and what do they measure?
    • Why does the United States participate in international education studies?
      • The United States participates in international studies primarily for two reasons:
        • To learn about the performance of U.S. students and adults in comparison to their peers in other countries.
        • To learn about the educational and work experiences of students and adults in other countries.
        Student assessments are a common feature of school systems that are concerned about accountability and assuring students' progress throughout their educational careers. National or state assessments enable us to know how well students are doing in a variety of subjects and at different ages and grade levels compared to other students nationally or within their own state. International assessments, on the other hand, offer a unique opportunity to benchmark our students' performance to the performance of students in other countries. Similarly, international assessments of adult literacy enable us to compare U.S. adults with their international peers on literacy skills that support productive adult lives in the workplace and society.

        International assessments of students also enable countries to learn from each other about the variety of approaches to schooling and to identify promising practices and policies to consider in their schools. International assessments of adults enable research on the relationships between adults' work and educational experiences and their skill levels, both within countries and cross-nationally.

  • Development and Administration
    • How are test and survey questions developed for the international studies?
      • There are three main components in the development of test and survey questions:
        1. Test and survey questions for each study are first developed through a collaborative, international process.
          For each study, an international subject area expert group is convened by the organization conducting the study. This expert group drafts a framework (the outline of the topics and skills that should be assessed or surveyed in a particular domain), which reflects a multinational consensus on the assessment and survey of a subject area. Based on the framework, national representatives and subject matter specialists develop the test and survey questions. National representatives from each country then review every item to ensure that each adheres to the internationally agreed-upon framework. While not every item may be equally familiar to all study participants, if any item is considered inappropriate for a participating country or an identified subgroup within a country, that item is eliminated.
        2. Test and survey items are field-tested before they are used or administered in the full-scale study.
          Before the administration of the study, a field test is conducted in the participating countries. An expert panel convenes after the field test to review the results and look at the items to see if any results were biased due to national, social or cultural differences. If such items exist, they are not included in the full study. Only after this thorough process, in which every participating country is involved, are the actual items administered to study participants.
        3. There is an extensive translation verification process.
          All participating countries are responsible for translating the assessment or survey into their own language or languages, unless the original items are in the language of the country. All countries identify translators to translate the source versions into their own language. External translation companies independently review each country's translations. Instruments are verified twice, once before the field test and again before the main data collection. Statistical analyses of the item data are then conducted to check for evidence of differences in performance across countries that could indicate a translation problem. If a translation problem with an item is discovered in the field test, it is removed for the full study. Since for TIMSS, PIRLS, PISA, PIAAC, and TALIS the items are provided to countries in English, the United States does not need to translate the assessments but does adapt the international English versions to U.S.-English when necessary and appropriate.
    • Who participates in the international studies?
      • A representative national sample of the target population in each participating country responds to each study. In the case of PIRLS, PISA, and TIMSS, the sample is drawn to be representative of students at the designated age or grade level. In the case of PIAAC, the sample is drawn to be representative of persons 16 to 65 years old living in households. In the case of TALIS, the sample is drawn to be representative of teachers.

        The international organization that conducts each study verifies that all participating countries select a nationally representative sample. To ensure comparability, target grades, ages, or populations are clearly defined. For example, in TIMSS countries are required to sample students in the grade that corresponds to the end of 8 years of formal schooling, providing that the mean age of the students at the time of testing is at least 13.5 years.

        Not all selected respondents choose to participate in the studies; and certain respondents, such as some with cognitive or physical disabilities, may not be able to participate. Thus the sponsoring international organizations check each country's participation rates and exclusion rates to ensure they meet established target rates in order for the country's results to be reported.
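
        To make such comparability checks concrete, here is a minimal sketch, with hypothetical data and helper names (nothing below is NCES or IEA code), of how a mean-age requirement like the TIMSS 13.5-year rule for grade 8 could be verified:

        ```python
        from statistics import mean

        # Hypothetical ages (in years) at the time of testing for a sampled
        # grade-8 cohort; illustrative values only, not real TIMSS data.
        sampled_ages = [13.6, 14.1, 13.9, 14.3, 13.8, 14.0]

        MIN_MEAN_AGE = 13.5  # TIMSS floor for the grade-8 target population

        def meets_age_requirement(ages, minimum=MIN_MEAN_AGE):
            """Return True if the cohort's mean age at testing meets the floor."""
            return mean(ages) >= minimum

        print(meets_age_requirement(sampled_ages))  # True for this toy cohort
        ```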
    • How can we be sure that countries administer the test or survey in the same way?
      • The short answer is that procedures for the administration of the international studies are standardized and independently verified.

        The international organizations that conduct international studies require compliance with standardized procedures. Manuals are provided to each country that specify the standardized procedures that all countries must follow on all aspects of sampling, preparation, administration, and scoring. To further ensure standardization, independent international quality control monitors visit a sample of schools (or households in the case of PIAAC) in each country. In addition, the countries themselves organize their own quality control monitors to visit an additional number of schools (or households in the case of PIAAC). Results for countries that fail to meet the international requirements are footnoted with explanations of the specific failures (e.g., "only met guidelines for sample participation rates after substitute schools were included"), are shown separately in the international reports (e.g., listed in a separate section at the bottom of a table), or are omitted from the international reports and datasets (as happened to the Netherlands' PISA results in 2000, the United Kingdom's PISA results in 2003, and Morocco's TIMSS 2007 results at grade 8).
    • Are respondents required to participate in these studies?
      • To our knowledge, no countries require all schools and students to participate in PIRLS, PISA, or TIMSS. However, some countries give more prominence to these studies than do others. In the United States, participation by respondents to international studies is voluntary.
  • Issues of Validity and Reliability
    • How different are assessment test questions from what students are expected to learn in the classroom?
      • The answer varies from study to study. Some studies, like TIMSS, are curriculum-based and are designed to assess what students have been taught in school using multiple-choice and open-ended (or short answer) test questions. Other studies, like PISA and PIAAC, are "literacy" assessments, designed to measure performance in certain skill areas at a broader level than the school curriculum.
    • How do international studies deal with the fact that education systems around the world are so different?
      • The fact that education systems are different across countries is one of the main reasons we are interested in making cross-country comparisons. However, these differences make it essential to carefully define the target populations to be compared, so that comparisons are as fair and valid as possible. For studies focusing on students, and depending in large part on when students first start school, students at a given age may have more or less schooling in different countries, and students in a given grade may be of different ages in different countries. In every case, detailed information on the comparability of the sampled populations is published for review and consideration.

        For PIRLS, the target population represents students in the grade that corresponds to 4 years of formal schooling, counting from the first year of schooling as defined by the International Standard Classification of Education (ISCED), Level 1. This corresponds to fourth grade in most countries, including the United States. This population represents an important stage in reading development.

        In TIMSS, the two target populations are defined as follows: (1) all students enrolled in the grade that corresponds to 4 years of formal schooling—fourth grade in most countries—providing that the mean age at the time of testing is at least 9.5 years, and (2) all students enrolled in the grade that corresponds to 8 years of formal schooling—eighth grade in most countries—providing that the mean age at the time of testing is at least 13.5 years. At grade four in 2007, only England, Scotland, and New Zealand included students who had 5 years of formal schooling at the time of testing. At grade eight, England, Malta, Scotland, and Bosnia and Herzegovina included students who had 9 years of formal schooling at the time of testing. In addition, at grade eight, the Russian Federation and Slovenia included some students who had fewer than 8 years of formal schooling. However, in all of these cases, the assessed students were of comparable average age to those participating in other countries.

        Another approach, used in PISA, is to designate a target population as students of a particular age (15 years in PISA), regardless of grade. Both approaches are suited to addressing the particular research questions posed by the assessments. The focus of TIMSS and PIRLS is on content as commonly expected to be taught in classrooms, while PISA emphasizes the skills and knowledge that students have acquired throughout their education both in and out of school.
    • Do international studies take into account that student and adult populations vary in participating countries—for example, the United States has higher percentages of immigrant students and adults than some other countries?
      • Each country has different population characteristics, but the point of international studies is to measure as accurately as possible the levels of achievement or proficiency of each participating country's target population. Differences in the levels of achievement or proficiency among students or adults in different countries may be associated with variations in respondent characteristics, but they may also be due in part to differences in curriculum, teacher preparation, and other educational or societal factors.
    • What if countries select only their best students to participate? Won't they look better than the rest?
      • Countries cannot independently select the students who will take the test. Students are sampled, but the sampling of schools and students is carefully planned and monitored by the sponsoring international organizations.

        Sampling within countries proceeds as follows:
        A sample of schools in each country is selected randomly from lists of all schools in the country that have students in the particular grade or of the particular age to be assessed. Samples for each country are verified by an international sampling referee. Once the sample of schools is selected, each country must contact these original schools to solicit participation in the assessment. Countries are not allowed to switch schools from the list; doing so can result in the exclusion of their data from the reports.

        Every study establishes response rate targets for selected schools (and students) that countries must meet in order to have their data reported. If the response rate target is not met, countries may be able to assess students from substitute schools following international guidelines. For example, PIRLS and TIMSS guidelines specify that substitute schools be identified at the time the original sample is selected by assigning the two schools neighboring the sampled school on the sampling frame as substitutes. If the original school declines to participate, the first of the two substitute schools is contacted to participate. If it declines, the second substitute school is contacted. If it also declines, no other substitute school may be used. If one of the two substitute schools accepts, there are still several constraints on its participation in order to prevent bias. If participation levels, even using substitute schools, still fall short of international or national guidelines, a special non-response bias analysis is conducted to determine if the schools that did not participate differ systematically from the schools that did participate. If the analysis does not show evidence of bias, then the data for a country may still be included in the reporting of results for the international assessment but the problem of participation rates is noted.
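
        As a rough illustration of the neighboring-substitute rule just described, the sketch below assigns substitutes from an ordered sampling frame. It assumes a single sorted frame and uses hypothetical school names, and it omits operational details such as stratum boundaries:

        ```python
        def assign_substitutes(frame, sampled_indices):
            """Designate the two schools neighboring each sampled school on the
            ordered sampling frame as its first and second substitutes, skipping
            positions off the ends of the frame or already in the sample."""
            sampled = set(sampled_indices)
            substitutes = {}
            for i in sampled_indices:
                neighbors = [j for j in (i + 1, i - 1)
                             if 0 <= j < len(frame) and j not in sampled]
                substitutes[frame[i]] = [frame[j] for j in neighbors[:2]]
            return substitutes

        frame = ["School A", "School B", "School C", "School D", "School E"]
        print(assign_substitutes(frame, [2]))
        # {'School C': ['School D', 'School B']}
        ```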

        Once a sample of schools agrees to participate, the schools are asked to provide a list of all students of the target age or a list of a particular kind of class (for example, all grade 4 classrooms) within the school. From those lists, a group or whole class of students is then randomly selected for the assessment. No substitutions for the students randomly selected are allowed. However, some individual students may be excluded. Each study establishes a set of guidelines for excluding individual students from assessment. Typically, if a student has a verifiable cognitive or physical disability, he or she can be excluded from assessment. However, total student exclusions (at the school level and within schools) may not exceed established levels and are reported in international publications. For example, the sampling standards used in PISA permit countries to exclude up to a total of 5 percent of the relevant population for approved reasons. In the United States, the overall exclusion rate in PISA 2006 was 4.28 percent.

        Exclusions can take place at the school level (e.g., excluding very small schools or those in remote regions) and the student level. Students can be excluded if they are functionally disabled, intellectually disabled, or have insufficient language proficiency. This determination is made on the basis of information from the school, although the contractors implementing the study also look out for ineligible students who may make it through the screening process. Students cannot be excluded solely because of low proficiency or normal discipline problems.
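
        The arithmetic behind the exclusion cap is simple: school-level and within-school exclusions are summed and compared with the eligible population. In the sketch below, only the 5 percent cap and the 4.28 percent U.S. total for PISA 2006 come from the text; the split between the two exclusion types and the population size are invented for illustration:

        ```python
        def overall_exclusion_rate(school_level, within_school, eligible_population):
            """Combined exclusion rate: school-level plus within-school exclusions
            as a share of the eligible target population."""
            return (school_level + within_school) / eligible_population

        # Illustrative decomposition only.
        rate = overall_exclusion_rate(school_level=120, within_school=94,
                                      eligible_population=5000)
        print(f"{rate:.2%}")  # 4.28%
        print(rate <= 0.05)   # True: within the PISA cap
        ```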
  • Reported Results
    • Are scores of individual students or adults reported or available for analysis?
      • No. The assessment methods used in international assessments only produce valid scores for groups, not individuals.
    • Can you use the international data to report scores for states?
      • No. The U.S. data are typically representative of the nation as a whole but not of individual states. Drawing a sample that is representative of all 50 individual states would require a much larger sample than the United States currently draws for international assessments, at considerable additional cost in time and money.

        A state may elect to participate in an international assessment as an individual jurisdiction, in which case a sample is drawn that is representative of that state. To date, several states have participated in TIMSS, PIRLS, and PISA that way.
    • Can you compare scores from one study to another?
      • Scores can be compared from one round of an assessment to another round of the same assessment (e.g., TIMSS 1999 to TIMSS 2007), but they typically cannot be directly compared from one study to another (e.g., TIMSS to PISA or NAEP) without special studies to link the different assessments.
    • Can you compare scores between grades—for example, between grade 4 and grade 8 scores on TIMSS?
      • No. The assessments for each grade are scaled separately, so the scores cannot be directly compared in a meaningful way. Only scores from different rounds of the same assessment (e.g., 2003 TIMSS grade 4 and 2007 TIMSS grade 4) can be compared.
    • Why does the United States report different findings for the same subjects from different international assessments?
      • At times, different assessments report different findings for the same subject. One obvious factor to consider when examining findings across assessments is that the grade or age levels of the students assessed may differ. Another factor is that studies differ in the specific subject matter or skills emphasized (e.g., reading, mathematics, science). A third factor that can affect the U.S. position relative to other countries is the group of countries involved in a study. The United States may appear to perform better or worse depending on the number and competitiveness of the other participating countries.
    • Why don't TIMSS, PISA, and PIRLS report differences between U.S. students and other countries' students based on race/ethnicity?
      • There are certain demographic characteristics that are not meaningful across countries, and race/ethnicity is one of them. In the United States, race and ethnicity are highly correlated with education and socio-economic status, which makes them meaningful categories for analysis. While that is also true in other countries, the racial and ethnic categories used to classify people vary from country to country.
  • About PIAAC (the Program for the International Assessment of Adult Competencies)
    • What is assessed in PIAAC?
      • PIAAC is designed to assess adults over a broad range of abilities—from simple reading to complex computer-based problem solving skills. All countries that participated in PIAAC in 2012 assessed the domains of literacy and numeracy in both a paper-and-pencil mode and a computer-administered mode. In addition, some countries assessed problem solving administered on a computer as well as components of reading (administered only in paper-and-pencil format). The U.S. assessed all four domains.
    • How valid is PIAAC?  Are assessment questions that are appropriate for the population in one country necessarily appropriate for the population in another country?
      • The assessment was designed to be valid cross-culturally and cross-nationally.

        PIAAC assessment questions are developed in a collaborative, international process and are based on frameworks developed by internationally known experts in each subject or domain. Assessment experts and developers from Ministries/Departments of Education and Labor and OECD staff participated in the conceptualization, creation, and extensive yearlong reviews of assessment questions. In addition, the PIAAC Consortium's support staff, assisted by expert panels, researchers, and working groups, developed the assessment Background Questionnaire (BQ). The PIAAC Consortium also guided the development of common standards and procedures for collecting and reporting data, as well as the international “virtual machine” software that administers the assessment uniformly across countries. All PIAAC countries follow the common standards and procedures and use the virtual machine software when conducting the survey and assessment. As a result, PIAAC can provide a reliable and comparable measure of literacy skills in the adult population of participating countries.

        Test items, instruments, and procedures are field-tested prior to administration. Before the administration of the assessment, a field test was conducted in the participating countries. The PIAAC Consortium analyzed the field test data and implemented changes to eliminate problematic items or revise procedures prior to the administration of the assessment.
    • How can you be sure that countries administer the test in the same way?
      • The design and implementation of PIAAC was guided by technical standards and guidelines developed by literacy experts to ensure that the survey yielded high-quality and internationally comparable data. For example, for their survey operations, participating countries were required to develop a quality assurance and quality control program that included information about the design and implementation of PIAAC data collection. In addition, all countries were required to adhere to recognized standards of ethical research practices in regard to respect for respondent privacy and confidentiality, the importance of ethics and scientific rigor in research involving human subjects, and avoiding practices or methods that may harm or seriously mislead survey participants. Compliance with the technical standards was mandatory and monitored throughout the development and implementation phases of the data collection through direct contact, submission of evidence that required activities were completed, and on-going collection of data from countries concerning key aspects of implementation.

        In addition, participating countries provided standardized training to the interviewers who administered the assessment in order to familiarize them with the survey procedures that would allow them to administer the assessment consistently across respondents and reduce the potential for erroneous data. After the data collection process, the quality of each participating country's data was reviewed prior to publication. The review was based on the analysis of the psychometric characteristics of the data and evidence of compliance with the technical standards.
    • What does problem solving test or measure?
      • The problem solving in technology-rich environments domain assesses the cognitive processes of problem solving:  goal setting, planning, selecting, evaluating, organizing, and communicating results. In a digital environment, these skills involve understanding electronic texts, images, graphics and numerical data, as well as locating, evaluating, and critically judging the validity, accuracy, and appropriateness of the accessed information.
    • What are “technology-rich environments”?
      • The environment in which PIAAC problem solving is assessed is meant to reflect the fact that digital technology has changed the ways in which individuals live their day-to-day lives, communicate with others, work, conduct their affairs, and access information. Information and communication technology tools, such as computer applications, the Internet, and mobile technologies, are all part of the environments in which individuals operate. In PIAAC, items for problem solving in technology-rich environments are presented on laptop computers in simulated software applications using commands and functions commonly found in email, web browsers, and spreadsheets.
    • How does PIAAC select a representative sample of adults?
      • Countries that participate in PIAAC must draw a sample of individuals ages 16 to 65 that represent the entire population of adults living in households in the country. Some countries draw their samples from their national registries of all persons in the country; others draw their samples from census data. In the United States, a nationally representative household sample was drawn from the most current Census Bureau population estimates.

        The U.S. sample design employed by the PIAAC is generally referred to as a four-stage stratified area probability sample. This method involves the selection of (1) primary sampling units (PSUs) consisting of counties or groups of contiguous counties, (2) secondary sampling units (referred to as segments) consisting of area blocks, (3) dwelling units (DUs), and (4) eligible persons (ultimate sampling unit) within DUs. Random selection methods are used, with calculable probabilities of selection at each stage of sampling. This sample design ensures the production of reliable statistics for a minimum of 5,000 completed cases.
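
        To illustrate how a four-stage design translates into selection probabilities and base weights, here is a minimal sketch. The stage probabilities below are purely illustrative assumptions, not the actual PIAAC values, which vary across strata:

        ```python
        # A person's overall selection probability is the product of the
        # conditional probabilities at each of the four stages, and the base
        # sampling weight is its reciprocal.
        stage_probabilities = {
            "psu":     0.05,  # county or county group
            "segment": 0.10,  # area block within the PSU
            "du":      0.02,  # dwelling unit within the segment
            "person":  0.50,  # eligible adult within the dwelling unit
        }

        p_overall = 1.0
        for p in stage_probabilities.values():
            p_overall *= p

        base_weight = 1.0 / p_overall
        print(p_overall)    # ~5e-05
        print(base_weight)  # ~20000
        ```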
    • Who was included in the sample (who was not)?
      • The PIAAC main study's target population consisted of non-institutionalized adults ages 16 to 65 who resided in the United States at the time of the survey, where age was determined as part of an initial screener questionnaire. Adults were included regardless of citizenship, nationality, or language. The main study's target population included only persons living in households or group quarters; it excluded all other persons (such as persons living in shelters, the incarcerated, military personnel who live in barracks or on bases, or persons who live in institutionalized group quarters, such as hospitals or nursing homes). The target population included full-time and part-time members of the military who did not reside in military barracks or on military bases, adults in other non-institutional collective dwelling units, such as workers' quarters or halfway homes, and adults who lived at school in student group quarters, such as a dormitory, fraternity, or sorority house.

        In 2013-14, the U.S. PIAAC National Supplement repeated the administration of PIAAC with a sample of 3,600 adults. It used the same procedures, instruments, and assessments that were used for the PIAAC Main Study (2011-12). The National Supplement increased the sample size of key U.S. subgroups of interest, including unemployed adults (ages 16–65), young adults (ages 16–34), and older adults (ages 66–74). A separate Prison Study extended the PIAAC assessment to incarcerated adults in the United States. The Prison Study sample included 1,200 inmates (ages 16–74) in state, federal, or private prisons in the United States.
    • Were (immigrants/illegal immigrants/non-English speakers) assessed? / Did they bring down our scores?
      • All adults, regardless of immigration status, were part of the main study's target population for the assessment. In order to get a representative sample of the adult population currently residing in the United States, respondents were not asked about citizenship status before taking the assessment and were guaranteed anonymity for all their answers to the background questionnaire. Although the assessment was only administered in English, the background questionnaire was offered in both Spanish and English. These procedures allowed the estimates to be applicable to all adults in the United States, regardless of citizenship or legal status, and they mitigated the effects of low English-language proficiency. Non-native-born adults had, on average, lower scores than native-born adults. This was true in most participating countries. The percentage of non-native-born adults in the U.S. was 15 percent. The average percentage of non-native-born adults across all participating countries was 12 percent, and ranged from 28 percent in Australia to less than 1 percent in Japan.
    • What if countries select only their highest performing adults to participate?  Won't they look better than the rest?
      • Sampling is carefully planned and monitored. The rules of participation require that countries design a sampling plan according to the standards provided in the PIAAC Technical Standards and Guidelines and submit their plans to the PIAAC Consortium for approval. In addition to a sampling plan, countries were required to complete quality control forms to verify that their sample was selected in an unbiased and randomized way. Quality checks were performed by the PIAAC Consortium to ensure that submitted sampling plans were accurately followed.
    • Are adults required to participate in PIAAC?
      • No, PIAAC is a voluntary assessment.
    • How do international assessments deal with the fact that adult populations in participating countries are so different —for example, the U.S. has higher percentages of immigrants than some other countries?
      • The PIAAC results are nationally representative and therefore reflect countries as they are: highly diverse or not. PIAAC collects extensive information about respondents' backgrounds and therefore supports analyses that take into account differences in the amount of diversity across countries. The international PIAAC report produced by the OECD presents some analyses that examine issues of diversity.
    • How does PIAAC differ from the international student assessments?
      • As an international assessment of adult competencies, PIAAC differs from student assessments in several ways. PIAAC assesses a wide range of ages (16 through 65) whereas student assessments target a specific age (e.g., 15-year-olds in the case of PISA) or grade (e.g., grade 4 in PIRLS). PIAAC is a household assessment (i.e., an assessment administered in individuals' homes), whereas the international student assessments (PIRLS, PISA, and TIMSS) are conducted in schools. The skills that are measured in each assessment also differ based on the goals of the assessment. Both TIMSS and PIRLS are curriculum-based and are designed to assess what students have been taught in school in specific subjects, such as science, mathematics or reading, using multiple-choice and open-ended test questions. In contrast, PIAAC and PISA are “literacy” assessments, designed to measure performance in certain skill areas at a broader level than school curricula. So while TIMSS and PIRLS aim to assess particular academic knowledge that students are expected to be taught at particular grades, PISA and PIAAC encompass a broader set of skills that students and adults have acquired throughout life.
    • How does PIAAC differ from earlier adult literacy assessments, like IALS and ALL?
      • PIAAC has improved and expanded on the cognitive frameworks of previous large-scale literacy assessments, including NALS, NAAL, IALS, and ALL, and has added an assessment of problem solving via computer, which was not a component of these earlier surveys. In addition, PIAAC is capitalizing on prior experiences with large-scale assessments in its approach to survey design and sampling, measurement, data collection procedures, data processing, and weighting and estimation. Finally, the most significant difference between PIAAC and previous large-scale assessments is that PIAAC is administered on laptop computers, and is designed to be a computer-adaptive assessment so respondents will receive groups of items targeted to their performance levels (respondents not able to or not wishing to take the assessment were provided with an equivalent paper and pencil version of the literacy and numeracy items). Because of these differences, PIAAC introduces a new set of scales to measure adult literacy, numeracy, and problem solving. Some scales from these previous adult assessments have been mapped to the PIAAC scales so that performance can be measured over time.
    • How do PIAAC and PISA compare?
      • PISA and PIAAC both emphasize knowledge and skills in the context of everyday situations, asking students and adults to perform tasks that involve real-world materials as much as possible. PISA is designed to show the knowledge and skills 15-year-old students have accumulated within and outside of school. It is intended to provide insight into what students who are about to complete compulsory education know and are able to do. PIAAC focuses on adults who are already eligible to be in the workforce, and aims to measure the set of literacy, numeracy, and technology-based problem solving skills an individual needs in order to function successfully in society. Therefore, PIAAC is not directly measuring the academic skills or knowledge adults may have learned in school. The PIAAC assessment focuses on tasks adults may encounter in their lives at home, work, or in their community.
    • Why are PIAAC scales different than those for NAAL, IALS, and ALL?
      • As described in the previous answer, PIAAC improved and expanded on the cognitive frameworks of previous large-scale literacy assessments (NAAL, IALS, and ALL), added a computer-based problem solving domain, and moved to computer-adaptive administration on laptop computers. Because of these differences, PIAAC introduced a new set of scales to measure adult literacy, numeracy, and problem solving. Some scales from the earlier adult assessments have been mapped to the PIAAC scales so that performance can be measured over time.
    • Why aren't you reporting the change over time for more countries? / Why are there so few countries in the trend tables?
      • Trends are only reported for countries that participated in both IALS and ALL. Of the PIAAC countries, 6 participated in ALL and 14 participated in IALS. All 6 countries that participated in ALL also participated in IALS.
    • Why did U.S. adults between 45 and 65 do better than the international average, and U.S. adults between 16 and 34 do worse?
      • This pattern has been consistent in the United States across IALS, ALL, and PIAAC. It probably has a number of causes, but we do not currently know why it occurs; an answer can only be postulated after education researchers examine the rich data collected by PIAAC.
    • Why doesn't PIAAC report differences between minorities in the U.S. and minorities in other countries?
      • Each country can collect data for subgroups of the population that have national importance. In some countries, these subgroups are identified by language usage; in other countries, they are distinguished by tribal affiliation. In the United States, different racial and ethnic subgroups are of national importance. However, categories of race and ethnicity are social and cultural categories that differ greatly across countries. As a result, they cannot be accurately compared across countries.
    • Can you use the international data to report scores for states?
      • In the United States, PIAAC results can only be reported at the national level because of the size of the U.S. sample: 5,000 adults participated in PIAAC, which is not enough respondents to produce accurate estimates at the state or county level. However, NCES is in the process of reviewing plans for producing state-level (synthetic) estimates.
    • How do international assessments deal with the fact that educational systems are so different across countries?
      • PIAAC collects extensive information on educational attainment and years of schooling. For the purposes of cross-country comparison, the education level classifications of each country are standardized on what is known as the International Standard Classification of Education (ISCED). This classification allows for cross-country comparisons of educational attainment. For example, the ISCED level for short-cycle tertiary education (ISCED level 5) is equivalent to an Associate degree in the United States, and therefore comparisons of adults with Associate degrees or their equivalents can be made across countries using this classification.
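
        As a sketch of how such standardization can be represented in practice, the mapping below crosswalks a few U.S. credentials to ISCED levels. The entries are illustrative examples consistent with the Associate degree example above, not the official crosswalk:

        ```python
        # Illustrative crosswalk from selected U.S. credentials to ISCED levels,
        # of the kind used to standardize attainment across countries.
        US_TO_ISCED = {
            "High school diploma": 3,  # upper secondary
            "Associate degree":    5,  # short-cycle tertiary, as in the example above
            "Bachelor's degree":   6,
            "Master's degree":     7,
            "Doctorate":           8,
        }

        print(US_TO_ISCED["Associate degree"])  # 5
        ```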
    • Why are you measuring literacy in English only?
      • PIAAC assesses in the official language or languages of each participating country. Based on a 1988 congressional mandate and the 1991 National Literacy Act, the U.S. Department of Education is required to evaluate the status and progress of adults' literacy in English. However, in order to obtain background information from a wide range of respondents in the United States, the background questionnaire was administered in both English and Spanish.
    • What countries participated in PIAAC in 2012?
      • The following 23 countries or regions participated in PIAAC 2012 and are included in the U.S. national report published by NCES: Australia, Austria, Belgium, Canada, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Ireland, Italy, Japan, Republic of Korea, Netherlands, Norway, Poland, Slovak Republic, Spain, Sweden, United Kingdom, and the United States.

        The Russian Federation also participated in 2012, but its data are not included in the NCES report or data explorer due to technical issues.
  • About PIRLS (the Progress in International Reading Literacy Study)
    • What aspects of reading literacy are assessed in PIRLS?
      • PIRLS focuses on three aspects of reading literacy:
        • purposes of reading;
        • processes of comprehension; and
        • reading behaviors and attitudes.
        The first two form the basis of the written test of reading comprehension. The student background questionnaire addresses the third aspect.

        In PIRLS, purposes of reading refers to the two types of reading that account for most of the reading done by young students, both in and out of school: (1) reading for literary experience, and (2) reading to acquire and use information. In the assessment, narrative fiction is used to assess students' ability to read for literary experience, while a variety of informational texts are used to assess students' ability to acquire and use information while reading. The PIRLS assessment devotes roughly equal proportions to these two purposes.

        Processes of comprehension refer to ways in which readers construct meaning from the text. Readers focus on and retrieve information; make inferences; interpret and integrate ideas and information; and examine and evaluate content, language, and textual elements.

        For more information on the purposes for reading and processes of comprehension, see the PIRLS 2011 Assessment Framework.
    • How many U.S. schools and students participated in previous PIRLS cycles?
      • Assessment year | Participating students | Participating schools | Overall weighted response rate (percent)
        2001 | 3,763 | 174 | 83
        2006 | 5,190 | 183 | 82
        2011 | 12,726 | 370 | 81

        NOTE: The overall weighted response rate is the product of the school participation rate, after replacement, and the student participation rate, after replacement.
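
        A quick worked example of the NOTE's product rule, with hypothetical component rates (only the multiplication rule comes from the NOTE):

        ```python
        # The overall weighted response rate is the product of the school and
        # student participation rates, both after replacement. The component
        # rates below are illustrative, not the published U.S. figures.
        school_rate = 0.90
        student_rate = 0.90
        overall = school_rate * student_rate
        print(f"{overall:.0%}")  # 81%, in line with the 2011 figure above
        ```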
    • How does PIRLS select a representative sample of students?
      • Each participating country agrees to select a sample that is representative of the target population as a whole. In 2001, the target population was the upper of the two adjacent grades with the most 9-year-olds. For PIRLS 2006 and 2011, the definition was refined to represent students in the grade that corresponds to four years of schooling, counting from the first year of International Standard Classification of Education (ISCED) Level 1—4th grade in most countries, including the United States. This population represents an important stage in the development of reading: by this point, children generally have learned to read and are reading to learn. IEA's Trends in International Mathematics and Science Study (TIMSS) has also chosen to assess this target population of students.

        In each administration of PIRLS, schools are randomly selected first (with a probability proportional to the estimated number of students enrolled in the target grade) and then one or two classrooms are randomly selected within each school. In 2001, a nationally representative sample of 3,763 U.S. 4th-grade students was selected from a sample of 174 schools. In 2006, a nationally representative sample of 5,190 U.S. 4th-grade students was selected from a sample of 183 schools. In 2011, a nationally representative sample of 12,726 U.S. 4th-grade students was selected from a sample of 370 schools.

        The sample was larger in 2011 than in previous administrations of PIRLS because TIMSS and PIRLS coincided in the same year. The decision was made to draw a larger sample of schools and to request that both studies be administered in the same schools (where feasible), albeit to separate classroom samples of students. Thus, TIMSS (grade 4) and PIRLS in the United States were administered in the same schools but to separately sampled classrooms of students.
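
        For readers curious about the mechanics, the sketch below implements a generic systematic probability-proportional-to-size (PPS) selection of the kind described above. It is a simplification with made-up school names and enrollments; the operational PIRLS design also sorts the frame into strata first:

        ```python
        import random

        def pps_systematic_sample(schools, n):
            """Systematic probability-proportional-to-size (PPS) selection:
            schools with more students in the target grade get proportionally
            higher selection probabilities. `schools` is a list of
            (name, estimated_enrollment) pairs."""
            total = sum(size for _, size in schools)
            interval = total / n
            start = random.uniform(0, interval)
            points = [start + k * interval for k in range(n)]
            sample, cumulative, i = [], 0, 0
            for name, size in schools:
                cumulative += size
                while i < n and points[i] <= cumulative:
                    sample.append(name)
                    i += 1
            return sample

        schools = [("A", 120), ("B", 45), ("C", 300), ("D", 80), ("E", 200)]
        print(pps_systematic_sample(schools, 2))  # e.g. ['C', 'E']
        ```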
    • Have there been changes in the countries participating in PIRLS?
      • Yes. The composition of participating countries in PIRLS changed somewhat from 2001 to 2011, as some countries dropped out and others joined.

        The table below lists the total number of education systems that have participated in each of the three administrations of PIRLS at grade 4. This number includes both countries and subnational entities, such as Canadian provinces, U.S. states, England, and Hong Kong. For more information on participating education systems, visit the PIRLS Country Page.

        Year | Education systems participating in PIRLS at grade 4
        2001 | 36
        2006 | 45
        2011 | 53
    • If the makeup of the countries changes across the years, how can one compare countries to the PIRLS scale average?
      • PIRLS scores are reported on a scale from 0 to 1,000, with the scale average fixed at 500 and a standard deviation of 100. The PIRLS scale average was set in 2001 and reflects the combined proficiency distribution of all students in all jurisdictions participating in 2001. To allow comparisons between 2001 and 2006, scores of students in jurisdictions that participated in both 2001 and 2006 were used to scale the 2006 results. The 2006 scores were linked to the 2001 scale using items common to both assessments. Likewise, scores of students in jurisdictions that participated in both 2006 and 2011 were used to scale the 2011 results. Once scores from the 2011 assessment were scaled to the 2006 scale, scores of students in jurisdictions that participated in 2011 but not in 2006 were placed on the PIRLS scale.
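
        PIRLS links scales through IRT calibration using items common to adjacent assessments. As a highly simplified stand-in for that machinery, the mean-sigma sketch below illustrates the general idea of a linear common-item link; all item values are hypothetical:

        ```python
        from statistics import mean, stdev

        def mean_sigma_link(new_params, base_params):
            """Linear transformation theta_base = A * theta_new + B that aligns
            a new calibration to the base scale using the difficulty estimates
            of items common to both assessments."""
            A = stdev(base_params) / stdev(new_params)
            B = mean(base_params) - A * mean(new_params)
            return A, B

        # Hypothetical difficulty estimates for five common items.
        base = [-1.2, -0.4, 0.1, 0.8, 1.5]
        new  = [-1.0, -0.3, 0.2, 0.7, 1.4]
        A, B = mean_sigma_link(new, base)
        print(round(A, 3), round(B, 3))
        ```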
    • How does PIRLS compare to the NAEP fourth-grade reading assessment?
      • Several studies have compared PIRLS and NAEP in terms of their measurement frameworks and the reading passages and questions included in the assessments. The studies found the following similarities and differences:

        Similarities include:
        • PIRLS and NAEP call for students to develop interpretations, make connections across text, and evaluate aspects of what they have read.
        • PIRLS and NAEP use literary passages drawn from children's storybooks and informational texts as the basis for the reading assessment.
        • PIRLS and NAEP use multiple-choice and constructed-response questions with similar distributions of these types of questions.
        Differences include:
        • PIRLS reading passages are, on average, shorter than fourth grade NAEP reading passages.
        • Results of readability analyses suggest that the PIRLS reading passages are easier than the NAEP passages (by about one grade level on average).

        PIRLS calls for more text-based interpretation than NAEP. NAEP places more emphasis on having students connect what they have read to other readings or knowledge and critically evaluate it.

        Downloadable comparison studies with NAEP can be found at http://nces.ed.gov/surveys/international/cross-study-comparisons.asp
    • When are PIRLS data collected?
      • In both hemispheres, PIRLS is conducted near the end of the school year. Thus, for PIRLS 2011, countries in the Southern Hemisphere conducted the study between October and December 2010, and countries in the Northern Hemisphere conducted the study between March and June 2011.
    • Where can I get a copy of the PIRLS U.S. Report?
    • When is PIRLS scheduled to be administered next?
      • The next PIRLS assessment is scheduled for 2016. For more information on the 2016 assessment schedule, visit the PIRLS Schedule & Plans Page.
  • About PISA (the Program for International Student Assessment)
    • What does PISA measure?
      • PISA measures student performance in mathematics, reading, and science literacy. Conducted every 3 years, each PISA data cycle assesses one of the three core subject areas in depth (considered the major domain), although all three core subjects are assessed in each cycle (the other two subjects are considered minor subject areas for that assessment year). Assessing all three subjects every 3 years allows countries to have a consistent source of achievement data in each of the three subjects while rotating one area as the primary focus over the years. More information on the PISA assessment frameworks can be found at: www.oecd.org/pisa/pisaproducts.

        Mathematics was the major subject area in 2012, as it was in 2003, since each subject is a major subject area once every three cycles. In 2012, mathematics, science, and reading literacy were assessed primarily through a paper-and-pencil assessment, and problem solving was administered via a computer-based assessment. In addition to these core assessments, education systems could participate in optional paper-based financial literacy and computer-based mathematics and reading assessments. The United States participated in these optional assessments.

        PISA administration cycle
        Assessment year | Subjects assessed
        2000 | READING, Mathematics, Science
        2003 | Reading, MATHEMATICS, Science, Problem solving
        2006 | Reading, Mathematics, SCIENCE
        2009 | READING, Mathematics, Science
        2012 | Reading, MATHEMATICS, Science, Problem solving, Financial literacy
        2015 | Reading, Mathematics, SCIENCE, Collaborative problem solving, Financial literacy

        NOTE: Reading, mathematics, and science literacy are all assessed in each assessment cycle of the Program for International Student Assessment (PISA). A separate problem-solving assessment was administered in 2003 and 2012. The subject in all capital letters is the major subject area for that cycle. Problem solving was assessed on computer in 2012. As of PISA 2015, PISA will be administered entirely on computer. Financial literacy is an optional assessment for countries.
    • What are the components of PISA?
      • Assessments
        PISA 2012 consisted of a paper-based assessment of students' mathematics, science, and reading literacy and a computer-based assessment of problem solving. Countries could also opt to participate in an assessment of financial literacy and computer-based assessments in mathematics and reading. In each participating school, sampled students sat for a two-hour paper-based assessment. A subsample returned for a second session in which they completed a 40-minute computer-based assessment of problem solving, mathematics or reading, or a combination of these subjects.

        Questionnaires
        In 2012, students completed a 30-minute student questionnaire providing information about their background, attitudes towards mathematics, and learning strategies. In addition, the principal of each participating school completed a 30-minute school questionnaire providing information on the school's demographics and learning environment. PISA also includes a contextual questionnaire for the students' parents or guardians, though the United States has not administered this questionnaire. The PISA questionnaires used in the United States are available at: http://nces.ed.gov/surveys/pisa/questionnaire.asp.
    • How many U.S. schools and students participate in PISA?
      • Assessment year | Participating students | Participating schools | School response rate, original schools (%) | School response rate, with substitutes (%) | Overall student response rate (%)
        2000 | 3,700 | 145 | 56 | 70 | 85
        2003 | 5,456 | 262 | 65 | 68 | 83
        2006 | 5,611 | 166 | 69 | 79 | 91
        2009 | 5,233 | 165 | 68 | 78 | 87
        2012 | 6,111 | 161 | 67 | 77 | 89
    • How does PISA select a representative sample of students?
      • Step 1
        To provide valid estimates of student achievement and characteristics, PISA selects a sample of students that represents the full population of 15-year-old students in each participating country or education system. This population is defined internationally as 15-year-olds (15 years and 3 months to 16 years and 2 months at the beginning of the testing period) attending both public and private schools in grades 7-12. Each country or education system submits a sampling frame to the consortium of organizations responsible for the implementation of PISA 2012 internationally. Westat, a survey research firm in Rockville, Maryland, contracted by the OECD, then validates each country or education system's frame.
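
        As an illustration of the age-window definition in Step 1, here is a minimal sketch (a whole-month simplification, with hypothetical dates) of an eligibility check:

        ```python
        from datetime import date

        def pisa_age_eligible(birth_date, testing_start):
            """True if a student is between 15 years 3 months and 16 years
            2 months old at the beginning of the testing period (whole-month
            simplification of the international definition)."""
            months = ((testing_start.year - birth_date.year) * 12
                      + (testing_start.month - birth_date.month))
            return 15 * 12 + 3 <= months <= 16 * 12 + 2

        # Hypothetical birth dates against an October 1 testing start.
        print(pisa_age_eligible(date(1996, 12, 15), date(2012, 10, 1)))  # True
        print(pisa_age_eligible(date(1998, 1, 15), date(2012, 10, 1)))   # False
        ```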

        Step 2
        Once a sampling frame is validated, Westat draws a scientific random sample of a minimum of 150 schools from each frame, with two replacement schools for each original school, unless there are fewer than 150 schools, in which case all schools are sampled. A minimum of 50 schools were sampled for benchmarking participants (e.g., U.S. states that participated in 2012). The list of selected schools, both original and replacement, is delivered to each education system's PISA national center. Countries and education systems do not draw their own samples.

        Step 3
        Each country/education system is responsible for recruiting the sampled schools. They begin with the original sample and only use the replacement schools if an original school refuses to participate. In accordance with PISA guidelines, replacement schools are identified by assigning the two schools neighboring the sampled school in the frame as substitutes to be used in instances where an original sampled school refuses to participate. Replacement schools are required to be in the same implicit stratum (i.e., have similar demographic characteristics) as the sampled school. A minimum participation rate of 65 percent of schools from the original sample of schools is required for a country or education system's data to be included in the international database.

        Step 4
        After schools are sampled and agree to participate, students are sampled. Each country/education system submits student listing forms containing all age-eligible students for each of their schools to ACER, an education research firm in Australia and the lead organization of the PISA 2012 international consortium, for student level sampling.

        Step 5
        ACER carefully reviews the student lists and uses sophisticated software to perform data validity checks to compare each list against what is known of the schools (e.g., expected enrollment, gender distribution) and PISA eligibility requirements (e.g., grade and birthday ranges). The selected student samples are then sent back to each national center. Unlike school sampling, students are not sampled with replacement.

        Step 6
        Schools inform students of their selection to participate on assessment day. Student participation must be at least 80 percent for a country's/education system's data to be reported by the OECD.
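
        Pulling together the thresholds in Steps 3 and 6, a minimal sketch of the reporting check might look like the following. The rates passed in are hypothetical, and the operational rules are more nuanced (for example, rates after replacement schools are also considered):

        ```python
        def data_reportable(original_school_rate, student_rate,
                            school_floor=0.65, student_floor=0.80):
            """Check the two reporting thresholds described above: at least 65
            percent of originally sampled schools and at least 80 percent of
            sampled students must participate."""
            return (original_school_rate >= school_floor
                    and student_rate >= student_floor)

        print(data_reportable(0.67, 0.89))  # True
        print(data_reportable(0.56, 0.85))  # False: school rate below 65 percent
        ```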
    • Which countries participate in PISA?
      • Countries and education systems within countries participate in PISA.

        • PISA 2000: 43 countries and education systems participated (11 of these administered PISA 2000 in 2001/2002).
        • PISA 2003: 41 countries and education systems participated.
        • PISA 2006: 57 countries and education systems participated.
        • PISA 2009: 75 countries and education systems participated (10 of these administered PISA 2009+ in 2010).
        • PISA 2012: 65 countries and education systems participated.
        The list of countries and education systems that participated in each PISA cycle is available at: http://nces.ed.gov/surveys/pisa/countries.asp.
    • How does the performance of U.S. students in mathematics and science on PISA compare with U.S. student performance on TIMSS?
      • Before talking about how the TIMSS results compare with the PISA results, it is important to recognize the ways in which TIMSS and PISA differ.

        While TIMSS and PISA both assess mathematics and science, they differ with respect to which students are assessed, what is measured, and which countries participate.
        • TIMSS assesses younger students (4th- and 8th-graders) on their knowledge of specific mathematics and science topics and cognitive skills that are closely linked to the curricula of the participating countries. PISA assesses older students (15-year-old students) in mathematics literacy and science literacy, or how well they can apply their knowledge and skills to problems set in real-world contexts.
        • While there is some overlap in content, each assessment may have unique topics or different emphases and the nature of the items may differ as well, given their different focuses.
        • Not all countries have participated in TIMSS and PISA, or in all administrations of either assessment. Both TIMSS and PISA include developed and developing countries; however, TIMSS has a larger proportion of developing countries participating than PISA does because PISA is principally a study of the member countries of the OECD—an intergovernmental organization of developed countries. All 34 OECD countries participate in PISA, but not all of those countries participate in TIMSS.
        On TIMSS, U.S. students at grades 4 and 8 performed above the TIMSS scale average in both mathematics and science. On PISA 2012, by contrast, U.S. 15-year-olds performed below the OECD average in mathematics and not measurably different from it in science. Five East Asian countries and education systems (Singapore, Korea, Hong Kong-China, Chinese Taipei, and Japan) outperformed the United States in mathematics and science in both TIMSS and PISA.
        • Mathematics. The 2011 TIMSS results showed that U.S. students' average mathematics scores for both 4th-graders and 8th-graders were above the TIMSS scale average, which is set at 500 for every administration of TIMSS at both grades. At the 8th grade, students in 6 countries and 5 states or provinces had higher mathematics scores than U.S. students on average; these were Singapore, Korea, Hong Kong-China, Chinese Taipei, Japan, the Russian Federation, Indiana, Massachusetts, Minnesota, North Carolina, and Quebec.
        • Science. The 2011 TIMSS results showed that U.S. students' average science scores for both 4th-graders and 8th-graders were above the TIMSS scale average, which is set at 500 for every administration of TIMSS at both grades. However, students in 8 countries and 4 states or provinces outperformed U.S. students at the 8th grade level: Chinese Taipei, Finland, Japan, Korea, Singapore, Russian Federation, Slovenia, Hong Kong-China, Colorado, Massachusetts, Minnesota, and Alberta, Canada.
    • When are PISA data collected in the United States?
      • PISA operates on a 3-year cycle, with 2000 being the first assessment year. For PISA 2000, the U.S. data collection began in April and ended in May. For PISA 2003, the U.S. data collection was conducted in the spring (the same as in 2000) and again in the fall, beginning in September and ending in November. For PISA 2006 and 2009, the U.S. data collection was conducted only in the fall (September–November). The PISA 2012 data collection was administered between October and November of 2012.
    • Where can I get a copy of the U.S. PISA reports?
    • When is PISA next scheduled to be administered?
      • The next administration of PISA is in 2015. Results will be reported at the end of 2016.
    • How is the OECD Test for Schools related to PISA?
      • In 2012, the OECD piloted a new test, based on the PISA assessment frameworks and statistically linked to the PISA scales, for individual schools. The purpose of this test, called the OECD Test for Schools in the United States, is for individual schools to benchmark their performance internationally. More information about this is available from the OECD at: http://www.oecd.org/pisa/aboutpisa/pisa-basedtestforschools.htm.
    • How does PISA differ from other international assessments?
      • PISA differs from other assessments, such as TIMSS and NAEP, in several ways:

        Content
        PISA is designed to measure "literacy" broadly, while other studies, such as TIMSS and NAEP, have a stronger link to curriculum frameworks and seek to measure students' mastery of specific knowledge, skills, and concepts. The content of PISA is drawn from broad content areas, such as space and shape for mathematics, in contrast to more specific curriculum-based content such as geometry or algebra.

        Tasks
        In addition to the differences in purpose and age coverage between PISA and other international comparative studies, PISA differs from other assessments in what students are asked to do. PISA focuses on assessing students' knowledge and skills in reading, mathematics, and science literacy in the context of everyday situations. That is, PISA emphasizes the application of knowledge to everyday situations by asking students to perform tasks that involve interpretation of real-world materials as much as possible. Analyses based on expert panels' reviews of mathematics and science items from PISA, TIMSS, and NAEP indicate that PISA items require multi-step reasoning more often than either TIMSS or NAEP. These analyses also show that PISA mathematics and science literacy items often involve the interpretation of charts and graphs or other "real world" material. These tasks reflect the underlying assumption of PISA: as 15-year-olds begin to make the transition to adult life, they need not only to comprehend what they read or to retain particular mathematical formulas or scientific concepts; they also need to know how to apply their knowledge and skills in the many different situations they will encounter in their lives.

        Moreover, NAEP and PISA take different underlying approaches to mathematics that play out in how items are operationalized. NAEP focuses more closely on school-based curricular attainment, whereas PISA focuses on literacy, or the use of mathematics in real-world situations. The implication of this difference is that while the NAEP assessment is not devoid of real-world contexts, it does not specifically require them; thus it includes computation items as well as problem-solving items U.S. students are likely to encounter in school. PISA does not include any computation items, or any other items, that are not placed within a real-world context, and in that way it may seem more unconventional to some students. PISA items also may have a heavier reading load, use a greater diversity of visual representations, and require students to make assumptions or sift through information that is irrelevant to the problem (i.e., "mathematize"), whereas NAEP items typically do not. These differences help explain divergent trend results. A study comparing the PISA and NAEP (grades 8 and 12) reading assessments found that both view reading as a constructive process and both measure similar cognitive skills. There are differences between them, though, reflecting in part the different purposes of the assessments. First, NAEP has longer reading passages than PISA and, because of that length, asks more questions about each passage. With regard to cognitive skills, NAEP places more emphasis on critiquing and evaluating text, while PISA places more emphasis on locating information.

        NAEP also measures students' understanding of vocabulary in context, whereas PISA does not include any questions of this nature. Finally, NAEP places a greater emphasis on multiple-choice items than PISA does, and the nature of the open-ended items differs: PISA's open-ended items call for less elaboration and support from the text than do NAEP's.

        To learn more about the differences in the respective approaches to the assessment of mathematics, science, and reading among PISA, TIMSS, and NAEP, see the following papers (a paper comparing NAEP and PISA 2012 is forthcoming):

        Age-based sample
        The goal of PISA is to represent outcomes of learning rather than outcomes of schooling. By placing the emphasis on age, PISA intends to show what 15-year-olds have learned inside and outside of school throughout their lives, not just in a particular grade. Focusing on age 15 provides an opportunity to measure broad learning outcomes while all students across the many participating nations are still required to be in school. Finally, because years of education vary among countries and education systems, choosing an age-based sample makes comparisons across countries and education systems somewhat easier.

        Information collected
        The kind of information PISA collects also reflects a policy purpose somewhat different from the other assessments. PISA collects only background information related to general school context and student demographics. This differs from other international studies such as TIMSS, which also collects background information related to how teachers in different countries approach the task of teaching and how the approved curriculum is implemented in the classroom. The TIMSS video studies further extend this work by capturing images of instruction across countries. The results of PISA will certainly inform education policy and spur further investigation into differences within and between countries and education systems, but PISA is not intended to provide direct information about improving instructional practice in the classroom. The purpose of PISA is to generate useful indicators to benchmark performance and inform policy.
    • Are PISA scores of individual students reported or available for analysis?
      • Student- and school-level data are available for download and analysis. However, the assessment methods used in international assessments produce valid scores only for groups, not for individuals. Data from PISA 2012 for all countries, including the United States, can be obtained from the OECD website at www.pisa.oecd.org. Data collected in the United States for PISA can be downloaded from: http://nces.ed.gov/pubsearch/getpubcats.asp?sid=098 (2012 data forthcoming).
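        As an illustration of group-level analysis with these files, the sketch below computes a weighted group mean from a student-level file. The column names W_FSTUWT (final student weight) and PV1MATH (first mathematics plausible value) follow the public PISA 2012 layout; a full analysis would combine all five plausible values and use the replicate weights for standard errors, per the OECD technical documentation.

        ```python
        # Minimal sketch of a group-level estimate from a PISA student file.
        # A full analysis would combine all five plausible values and use
        # the replicate weights for standard errors.
        import pandas as pd

        def weighted_group_mean(df: pd.DataFrame, value_col: str, weight_col: str) -> float:
            """Weighted mean of one plausible value for a group of students."""
            return (df[value_col] * df[weight_col]).sum() / df[weight_col].sum()

        # Hypothetical usage with a downloaded U.S. student file:
        # students = pd.read_csv("pisa2012_usa_students.csv")
        # print(weighted_group_mean(students, "PV1MATH", "W_FSTUWT"))
        ```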
    • Can you report PISA results for states?
      • Yes and no. The U.S. national PISA results are representative of the nation as a whole but not of individual states. Drawing a sample that is representative of all 50 individual states would require a much larger sample than the United States currently draws for international assessments, along with considerable additional time and money. A state may elect to participate in PISA as an individual education system, as Connecticut, Florida, and Massachusetts did in 2012; in that case, a sample is drawn that is representative of that state.
  • About TIMSS (the Trends in International Mathematics and Science Study)
    • Who is in charge of TIMSS?
      • The National Center for Education Statistics (NCES), part of the U.S. Department of Education, is responsible for administering TIMSS in the United States and for representing the United States in international collaboration on TIMSS.

        The International Association for the Evaluation of Educational Achievement (IEA) coordinates TIMSS internationally. The IEA is an independent international cooperative of national research institutions and government agencies with nearly 70 member countries worldwide. The IEA has a permanent secretariat based in Amsterdam, and a data processing and research center in Hamburg, known as the IEA Data Processing Center (DPC).

        The IEA contracts with the TIMSS & PIRLS International Study Center at Boston College to lead the design and implementation of TIMSS. The TIMSS & PIRLS International Study Center works with country representatives, called National Research Coordinators, to design and implement TIMSS, assure quality control and international comparability, and report results. The U.S. National Research Coordinator is Stephen Provasnik of NCES. Data collection for TIMSS 2015 within the United States is done under contract with Westat, Inc.
    • Can my school sign up to participate in TIMSS?
      • Schools cannot sign up to participate in TIMSS as part of the national U.S. sample. It is important for fair comparisons across countries that each country only include in its national sample those schools and students scientifically sampled by the international contractor to fairly represent the country.
    • How does TIMSS select a representative sample of students?
      • To provide valid estimates of student achievement and characteristics, TIMSS selects a random sample of students that represents the full population of students in the target grades. This population is defined internationally as the following:
        Fourth-grade: all students enrolled in the grade that represents four years of formal schooling, counting from the first year of International Standard Classification of Education (ISCED) Level 1, provided that the mean age at the time of testing is at least 9.5 years.

        Eighth-grade: all students enrolled in the grade that represents eight years of formal schooling, counting from the first year of ISCED Level 1, provided that the mean age at the time of testing is at least 13.5 years.

        Twelfth-grade: all students in the final year of secondary schooling who are taking or have taken advanced mathematics or physics courses.
        TIMSS guidelines call for a minimum of 150 schools to be sampled per grade, with a minimum of 4,000 students assessed per grade. The school response rate target is 85 percent for all countries. A minimum participation rate of 50 percent of schools from the original sample of schools is required for a country's data to be included in the international database. The response rate target for classrooms is 95 percent, and the target student response rate is set at 85 percent, from both original and substitute schools.
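        Purely as an illustration, these thresholds can be collected into a single check. The sketch below is a simplified summary of the targets quoted above, not the official TIMSS adjudication procedure.

        ```python
        # Illustrative check of the TIMSS sampling targets quoted above.
        # This is a simplified summary, not the official adjudication rules.
        def meets_timss_guidelines(n_schools: int, n_students: int,
                                   school_rate: float, class_rate: float,
                                   student_rate: float, original_school_rate: float) -> bool:
            """Return True if a country's sample meets the quoted targets.

            Rates are proportions in [0, 1]; original_school_rate is the
            participation rate among originally sampled schools (before
            substitutes), which must be at least 50 percent for inclusion
            in the international database.
            """
            return (n_schools >= 150 and
                    n_students >= 4000 and
                    original_school_rate >= 0.50 and
                    school_rate >= 0.85 and
                    class_rate >= 0.95 and
                    student_rate >= 0.85)
        ```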

        Countries are allowed to use substitute schools (selected during the sampling process) to increase the response rate, provided the 50 percent minimum participation rate among originally sampled schools has been met. In accordance with TIMSS guidelines, the two schools neighboring a sampled school in the frame are assigned as its substitutes, to be used when the originally sampled school refuses to participate. Because substitute schools are required to be in the same implicit stratum as the sampled school, they have similar demographic characteristics.
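        A minimal sketch of this neighbor-based assignment, assuming the frame is already sorted by the implicit strata; which neighbor is tried first, and the handling at the ends of the frame, are assumptions of the sketch:

        ```python
        # Illustrative rendering of the substitute rule described above: the
        # two schools adjacent to a sampled school in the sorted frame serve
        # as its substitutes. The ordering of first/second and the edge
        # handling are assumptions of this sketch.
        def assign_substitutes(frame: list, sampled_index: int) -> tuple:
            """Return the (first, second) substitute schools for a sampled school.

            `frame` is the full school frame sorted by the implicit strata, so
            neighboring schools share similar demographic characteristics.
            """
            first = frame[sampled_index + 1] if sampled_index + 1 < len(frame) else None
            second = frame[sampled_index - 1] if sampled_index > 0 else None
            return first, second
        ```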

        U.S. sampling frame
        The TIMSS U.S. sample is drawn from the Common Core of Data (CCD) listing of public schools supplemented with the Private School Universe Survey (PSS) listing of private schools. The combination of these national listings has proven to be close to 100 percent complete.

        U.S. sampling design
        The U.S. TIMSS sample uses a stratified two-stage cluster sampling design. The U.S. sampling frame, or list of schools from which the sample is selected, is both explicitly and implicitly stratified (that is, sorted for sampling).

        The U.S. sampling frame is explicitly stratified by three categorical stratification variables: (1) the percentage of students eligible for free or reduced-price lunch, (2) school control (public or private), and (3) region of the country (Northeast, Central, West, Southeast). Explicit stratification completely controls the sample size for a specific variable or variables, so that the proportion of schools in that variable's subgroups exactly matches the proportion in the population.

        The U.S. sampling frame is implicitly stratified by two categorical stratification variables:  community type (city, suburb, town, or rural) and minority status (i.e., above or below 15 percent of the student population). Implicit stratification controls the sample size for a specific variable or variables, but it does not do so completely because it does not rely on independent random draws within each stratum, as occurs with explicit stratification.  Instead, implicit stratification entails sorting the list of all schools by the implicit stratification variable(s), and taking a systematic sample. The sample's proportion of schools in the specific variable's subgroups will then be close to that of the population. The variability of the sample sizes in the subgroups will be reduced considerably by systematic sampling, but it will not be reduced to zero as in explicit stratification.

        Once the sampling frame has been stratified, the first stage of the sampling design uses a systematic "probability proportional to size" (PPS) technique to select schools for the original sample that are representative of the United States as a whole. The second stage of the sampling design consists of selecting intact mathematics classes within each participating school. All students in sampled classrooms are selected for assessment. In this way, the overall sample design for the United States is intended to approximate a self-weighting sample of students as much as possible, with each fourth- or eighth-grade student having an equal probability of selection.
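        A minimal sketch of the first-stage selection under these rules, assuming grade enrollment serves as the measure of size; the field names, and the simplification of ignoring very large "certainty" schools, are assumptions of the sketch.

        ```python
        # Minimal sketch of systematic PPS (probability proportional to size)
        # first-stage school selection. Real designs handle very large
        # "certainty" schools (size > interval) separately, which this
        # sketch does not.
        import random

        def systematic_pps_sample(frame: list, n_schools: int) -> list:
            """Select schools systematically with probability proportional to size.

            `frame` is a list of dicts with a "size" key (e.g., grade
            enrollment), already sorted by the implicit stratification
            variables so that the systematic pass spreads the sample
            across community type and minority status.
            """
            total = sum(school["size"] for school in frame)
            interval = total / n_schools          # selection interval on the cumulative-size scale
            start = random.uniform(0, interval)   # random starting point
            targets = [start + i * interval for i in range(n_schools)]

            sample, cumulative, t = [], 0.0, 0
            for school in frame:
                cumulative += school["size"]
                # Select this school for every target falling in its size range.
                while t < n_schools and targets[t] < cumulative:
                    sample.append(school)
                    t += 1
            return sample
        ```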
    • How many U.S. schools and students participated in previous TIMSS cycles?
      • At grade 4
        Assessment year | Number of participating schools | Number of participating students | Overall weighted response rate (percent)
        1995            | 182                             | 7,296                            | 80
        2003            | 248                             | 9,829                            | 78
        2007            | 257                             | 7,896                            | 84
        2011            | 369                             | 12,569                           | 80

        At grade 8
        Assessment year | Number of participating schools | Number of participating students | Overall weighted response rate (percent)
        1995            | 183                             | 7,087                            | 78
        1999            | 221                             | 9,072                            | 85
        2003            | 232                             | 8,912                            | 73
        2007            | 239                             | 7,377                            | 77
        2011            | 501                             | 10,477                           | 81

        At grade 12
        Assessment year | Subject              | Number of participating schools | Number of participating students | Overall weighted response rate (percent)
        1995            | Advanced mathematics | 199                             | 2,349                            | 67
        1995            | Physics              | 203                             | 2,678                            | 68

        NOTE: The overall weighted response rate is the product of the school participation rate, after replacement, and the student participation rate, after replacement. There was no grade 4 assessment in 1999.
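        To make the note concrete, a quick illustrative calculation; the component rates below are hypothetical, not the published figures.

        ```python
        # The note above defines the overall weighted response rate as the
        # product of the school participation rate and the student
        # participation rate, both after replacement. Rates are hypothetical.
        school_rate = 0.87   # hypothetical school participation rate, after replacement
        student_rate = 0.92  # hypothetical student participation rate, after replacement
        overall = school_rate * student_rate
        print(f"Overall weighted response rate: {overall:.0%}")  # -> 80%
        ```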
    • Have there been changes in the countries participating in TIMSS?
      • Yes. Please follow this link to a table of all TIMSS participating countries and non-national education systems for each of the TIMSS years of administration.
    • If the makeup of the countries changes across the years, how can one compare countries to the TIMSS scale average?
      • Achievement results from TIMSS are reported on a scale from 0 to 1,000, with a TIMSS scale average of 500 and standard deviation of 100. The scale is based on the 1995 results, and the results of all subsequent TIMSS administrations have been placed on this same scale. This allows countries to compare their performance over time as well as to compare with a set standard, the TIMSS scale average.
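        In other words, the 1995 administration fixed the metric: proficiency estimates were standardized so that the 1995 international mean maps to 500 and one 1995 standard deviation maps to 100 scale points, and later administrations are linked to that fixed metric. A minimal sketch of such a linear transformation, with illustrative variable names:

        ```python
        # Illustrative linear standardization onto the anchored TIMSS metric:
        # the 1995 international mean maps to 500 and one 1995 standard
        # deviation maps to 100 scale points. Later administrations are
        # linked to this fixed metric, so the scale average stays at 500.
        def to_timss_scale(theta: float, mean_1995: float, sd_1995: float) -> float:
            """Map a proficiency estimate onto the anchored 500/100 scale."""
            return 500.0 + 100.0 * (theta - mean_1995) / sd_1995

        # Example: an estimate one 1995 standard deviation above the 1995
        # mean lands at 600 on the reporting scale.
        assert to_timss_scale(1.0, 0.0, 1.0) == 600.0
        ```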
    • What areas of mathematics and science are assessed in TIMSS?
      • At grade 4, TIMSS focuses on three domains of mathematics:
        • numbers (manipulating whole numbers and place values; performing addition, subtraction, multiplication, and division; and using fractions and decimals),
        • geometric shapes and measures, and
        • data display.
        At grade 8, TIMSS focuses on four domains of mathematics:
        • numbers,
        • algebra,
        • geometry, and
        • data and chance.
        At grade 12, TIMSS focuses on three domains of advanced mathematics:
        • algebra,
        • calculus, and
        • geometry.
        At grade 4, TIMSS focuses on three domains of science:
        • life science,
        • physical science, and
        • Earth science.
        At grade 8, TIMSS focuses on four domains of science:
        • biology,
        • chemistry,
        • physics, and
        • Earth science.
        At grade 12, TIMSS focuses on three domains of advanced physics:
        • mechanics and thermodynamics,
        • electricity and magnetism, and
        • wave phenomena and atomic/nuclear physics.
    • How do the results of TIMSS compare with the results in PISA?
      • The TIMSS 2011 results at 8th grade, the grade closest to the age of the PISA students, showed U.S. average scores higher than the TIMSS scale average in both mathematics and science. In PISA 2009, the average scores of U.S. 15-year-old students were below (in mathematics) or not measurably different (in science) from the OECD average—the average score of students in the 34 Organization for Economic Cooperation and Development countries. How do we reconcile the apparent differences?

        The results from TIMSS and PISA are difficult to compare because the assessments are so different in at least three key ways that could influence results. First, TIMSS assesses 8th- and 4th-graders, while PISA is an assessment of 15-year-old students, regardless of grade level. (In the United States, PISA data collection occurs in the autumn, when most 15-year-olds are in 10th grade.) So, the grade levels of students in PISA and TIMSS differ. Second, the knowledge and skills measured in the two assessments differ. TIMSS is intended to measure how well students have learned the mathematics and science curricula in participating countries, whereas PISA is focused on application of knowledge to “real-world” situations.  Third, the participating countries in the two assessments differ. Both assessments cover much of the world, but they do not overlap neatly. Only 25 of the 42 participating education systems in TIMSS 2011 at the 8th-grade level participated in the PISA 2009 assessment of 15-year-olds. Both assessments include key economic competitors and partners, but the overall makeups of the countries participating in the two assessments differ markedly.  Thus, the “averages” used by the two assessments are in no way comparable, and the “rankings” often reported in media coverage of these two assessments are based on completely different sets of countries.
    • How does the mathematics and science achievement of U.S. students on TIMSS compare with achievement on NAEP?
      • Both TIMSS and NAEP provide a measure of fourth- and eighth-grade mathematics and science learning. It is natural to compare them, but the distinctions described below need to be kept in mind in understanding the converging or diverging results.

        Mathematics
        The most recent results from NAEP and TIMSS include information on trends over time in fourth- and eighth-grade mathematics achievement for a similar time interval: in NAEP between 1996 and 2011 and in TIMSS between 1995 and 2011.
        Both assessments showed statistically significant increases in the mathematics performance of fourth- and eighth-grade students between these years.
        Science
        The most recent results from TIMSS provide trend information for fourth- and eighth-grade science achievement between 1995 and 2011. In contrast, NAEP provides trends only for fourth grade between 1996 and 2005, and for eighth grade between 1996 and 2005 and between 2009 and 2011. (Due to a major revision of the NAEP Science Framework in 2009, no trend comparison can be made between 1996 and 2011.) Compared with mathematics, the available trends shown by NAEP and TIMSS in science are less consistent with one another.
        In fourth grade, NAEP shows that there was an increase in students' science performance overall between 1996 and 2005, whereas TIMSS did not detect any change in performance from 1995 to 2007. In eighth grade, NAEP detected no measurable change between 1996 and 2005 but shows an increase in students' science performance overall between 2009 and 2011. In contrast, TIMSS detected an increase between 1995 and 2003 and between 1995 and 2011, but detected no measurable change between 2003 and 2011 or between 2007 and 2011.
    • Can you directly compare TIMSS scores at grade 4 to scores at grade 8?
      • The scaling of TIMSS data is conducted separately for each grade and each content domain. While the scales were created to each have a mean of 500 and a standard deviation of 100, the subject matter and the level of difficulty of items necessarily differ between the assessments at both grades. Therefore, direct comparisons between scores across grades should not be made.
    • On TIMSS, why do U.S. boys outperform girls in mathematics at grade 4 but not at grade 8, and U.S. boys outperform girls in science at grade 8 but not at grade 4? Why aren't differences between the sexes more consistent?
      • The seeming inconsistencies between the achievement scores of U.S. boys and girls in mathematics and science are not easily explainable. Research into differences in achievement by sex has been unable to offer any definitive explanation for these differences. For example, Xie and Shauman (2003)i, in examining sex differences primarily at the high school level, find that "differences in math and science achievement cannot be explained by the individual and familial influences that we examine." Indeed, that sex differences vary in the participating TIMSS countries—some in favor of males and others in favor of females—would appear to support the idea that the factors related to sex differences in mathematics and science achievement are complicated.

        i Xie, Y., & Shauman, K. (2003). Women in Science: Career Processes and Outcomes. Cambridge, MA: Harvard University Press.
    • When are TIMSS data collected?
      • TIMSS operates on a 4-year cycle, with 1995 being the first year it was administered. Countries in the Northern Hemisphere conduct the assessment between April and June of the assessment year, while countries in the Southern Hemisphere conduct the assessment in October and November of the assessment year. In both hemispheres the assessment is conducted near the end of the school year.
    • Where can I get a copy of the TIMSS U.S. Report?
    • When is TIMSS scheduled to be administered next?
      • TIMSS is scheduled to be administered next in 2015, with results to be reported at the end of 2016.
    • Can my state or school district or school sign up to obtain its own TIMSS results, independent of the U.S. results?
      • Yes, states, school districts, and schools can sign up to obtain their own TIMSS results at their own cost. Sample size restrictions apply. Please contact Stephen Provasnik, the U.S. TIMSS National Research Coordinator, for more information.
[Show All] International Exchange Programs and Foreign Study
