Methodology and Technical Notes
This document describes characteristics of the Program for International Student Assessment (PISA) 2022 methodology, including sample design, test design, and scoring, with focus on U.S. implementation. In 2022, the United States took part in the core PISA assessment for mathematics, reading, and science literacy as well as the optional domain of financial literacy. For further details about the assessment and any of the topics discussed here, see the Organization for Economic Cooperation and Development's (OECD 2023) PISA 2022 Technical Report.
These Methodology and Technical Notes provide an overview, with a particular focus on the U.S. implementation, of the following technical aspects of PISA 2022:
- International Requirements
- Sampling and Data Collection in the United States
- Test Development
- Weighting
- Scaling of Data
- Proficiency Levels
- Data Limitations
- Descriptions of Background Variables
- Confidentiality and Disclosure Limitations
- Statistical Procedures
- International and U.S. Response Rates
International Requirements
The OECD required all participating education systems (countries and subnational regions) to adhere to the PISA 2022 technical standards (OECD 2020), which provided detailed information about the target population, sampling, response rates, translation and adaptation, assessment administration, and data submission. According to the standards, the international desired population in each education system consisted of 15-year-olds attending publicly and privately controlled schools in grade 7 and higher. To provide valid estimates of student achievement and characteristics, the sample of PISA students had to be selected in a way that represented the full population of 15-year-old students in each education system. The sample design for PISA 2022 was a stratified systematic sample, with sampling probabilities proportional to the estimated number of 15-year-old students in the school based on grade enrollments. Samples were drawn using a two-stage sampling process: the first stage was a sample of schools, and the second stage was a sample of students within schools. The PISA international contractors responsible for the design and implementation of PISA internationally (hereafter referred to as the PISA consortium) drew the sample of schools for each education system from school frames supplied by the national study centers of each education system.
Sample Size. Each education system administering the computer-based assessment was required to have at least 6,300 assessed students from at least 150 participating schools. For a country sampling the minimum of 150 schools, it was recommended that at least 42 students be assessed per school to meet the 6,300-student threshold, with a minimum set at 25 students per school.[1] Following the PISA consortium guidelines, replacement schools were identified at the same time the PISA sample was selected by assigning the two schools neighboring each sampled school in the frame as replacements. For countries administering financial literacy, an additional sample of students was selected. In the United States, up to 52 students were sampled within each school. Students were selected with equal probability unless fewer than 52 15-year-old students were available (in which case all 15-year-old students were selected).
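The two-stage selection described above can be sketched in code. The following is a minimal illustration of the first stage, probability-proportional-to-size (PPS) systematic sampling over a sorted frame, with hypothetical school IDs and enrollment counts; the operational sampling was done by the PISA consortium with its own software and stratification.

```python
import random

def pps_systematic_sample(frame, n_schools, seed=None):
    """Select n_schools with probability proportional to size using
    systematic sampling over a sorted frame.
    frame: list of (school_id, estimated_enrollment) tuples."""
    rng = random.Random(seed)
    total = sum(size for _, size in frame)
    interval = total / n_schools                 # sampling interval
    start = rng.uniform(0, interval)             # random start
    targets = [start + k * interval for k in range(n_schools)]
    picks, cum, t_idx = [], 0.0, 0
    for school_id, size in frame:
        cum += size
        # a school spanning several targets (larger than the interval)
        # would be selected more than once
        while t_idx < len(targets) and targets[t_idx] < cum:
            picks.append(school_id)
            t_idx += 1
    return picks

# hypothetical frame: (school_id, estimated number of 15-year-olds)
frame = [(f"S{i:03d}", random.Random(i).randint(20, 400)) for i in range(500)]
sample = pps_systematic_sample(frame, n_schools=150, seed=42)
print(len(sample))  # 150
```

In the second stage, students within a selected school are drawn with equal probability, so the two stages together give each age-eligible student in the population a known selection probability.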
Age Guidelines. Each education system collected its own data, following international guidelines and specifications. The technical standards required that students in the sample be 15 years and 3 months to 16 years and 2 months old at the beginning of the testing period (hereafter referred to as "15-year-olds" or "15-year-old students"). The testing period could be no longer than eight consecutive weeks for computer-based testing participants and no longer than six consecutive weeks for paper-based testing participants. Most education systems conducted testing from March through August 2022.[2]
Response Rates. International guidelines were given for both school-level as well as student-level response rates.
- For the purpose of calculating school response rates, a participating school is defined as a sampled school in which more than 33 percent of sampled eligible, non-excluded students respond. The weighted school response-rate target was a minimum of 85 percent for all education systems. A minimum of 65 percent of schools from the original sample of schools was required to participate for an education system's data to be included in the international database. Education systems could use replacement schools (pre-selected during the sampling process) to increase the response rate once the 65 percent of original schools' benchmark had been reached. Note that schools were required to reach a student response rate greater than 33 percent to be included in the PISA dataset and in calculations for the school-level response rate.
- The technical standards also required a minimum participation rate of 80 percent of sampled, non-excluded students from schools (sampled and replacement) within each education system. This target applied in aggregate, not to each individual school. A student was considered a participant if he or she participated in the first testing session or a follow-up or makeup testing session. Follow-up sessions were required in schools where too few students participated in the originally scheduled test sessions to ensure a high overall student response rate. Replacement students within a school were not allowed. Data from education systems not meeting this requirement are excluded from international reports.
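The participation rules above can be expressed as simple checks. The sketch below follows the definitions in the text (the function names and inputs are illustrative): a school counts as participating only if more than 33 percent of its sampled eligible students respond, and the school response rate is computed on a weighted basis.

```python
def school_participates(n_sampled_eligible, n_assessed):
    """A sampled school counts as a participant only if more than
    33 percent of its sampled eligible, non-excluded students respond."""
    return n_assessed / n_sampled_eligible > 0.33

def weighted_school_response_rate(schools):
    """schools: list of (weight, participated) pairs, where the weight
    reflects the school's estimated enrollment of 15-year-olds."""
    total = sum(w for w, _ in schools)
    responding = sum(w for w, ok in schools if ok)
    return responding / total

# a school with 18 of 52 sampled students assessed clears the 33 percent floor
print(school_participates(52, 18))  # True
```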
Exclusion Rate. PISA 2022 is designed to be as inclusive as possible. The guidelines allowed schools to be excluded for approved reasons (for example, schools in remote regions, very small schools, or special education-only schools). Schools used the following international guidelines on student exclusions:
Students with functional disabilities. These are students with a moderate to severe permanent physical disability such that they cannot perform in the PISA testing environment.
Students with intellectual disabilities. These are students with a cognitive, behavioral, or emotional disability, confirmed by qualified staff, such that they cannot follow even the general instructions of the assessment and therefore cannot take the PISA test.
Students with insufficient language experience. These are students who meet the three criteria of not being native speakers in the assessment language, having limited proficiency in the assessment language, and having less than 1 year of instruction in the assessment language.
Students not attending in-person classes. These are students who receive all of their instruction online or virtually and do not come to the school in person for either instruction or assessments. This exclusion category was added exceptionally for PISA 2022.
Students can also be excluded if there are no materials available in the language in which the student is taught and if they cannot be assessed for some other reason as agreed upon.
Overall estimated exclusions (including both school and student exclusions) were to be under 5 percent of the PISA target population. To keep PISA as inclusive as possible and to keep the exclusion rate down, the United States used the UH ('Une Heure') instrument designed for students with special education needs. See the description of the UH instrument in the next section.
[1] Four countries—Cambodia, Guatemala, Paraguay, and Vietnam—assessed their students' knowledge and skills in PISA 2022 using paper-based instruments. These countries needed to have a minimum of 35 assessed students in 150 schools for a total of 5,250 assessed students in the PISA sample.
[2] Ireland, the Netherlands, Scotland, the United States, and the United Kingdom were given permission to move the testing dates to the fall in an effort to improve response rates. The range of eligible birth dates was adjusted so that the mean age remained the same (i.e., 15 years and 3 months to 16 years and 2 months at the beginning of the testing period). In 2003, the United States conducted PISA in the spring and fall and found no significant difference in student performance between the two time points. The United States has collected data in the fall in every PISA cycle since 2003.
Sampling and Data Collection in the United States
The PISA 2022 school sample was drawn for the United States by the PISA consortium. The U.S. PISA sample was stratified into 8 explicit groups based on census region of the country (Northeast, Midwest, South, and West)[3] and control of school (public or private). Within each stratum, the frame was sorted for sampling by five categorical stratification variables: grade range of the school (five categories); type of location relative to populous areas (city, suburb, town, rural);[4] combined percentage of Black, Hispanic, Asian, Native Hawaiian/Pacific Islander, and American Indian/Alaska Native students (above or below 15 percent); gender composition (mostly female, i.e., percent female ≥ 95 percent; mostly male, i.e., percent female < 5 percent; or other); and state.
The United States took part in the core PISA assessment for mathematics, reading, and science literacy as well as the optional domain of financial literacy. To obtain an adequate sample of students in the United States that also took into consideration historical rates of nonresponse, 52 students aged 15 were randomly sampled within each U.S. school. If fewer than 52 age-eligible students were enrolled in a school, all 15-year-old students in the school were selected. Thus, in each school, each age-eligible student had an equal probability of being selected. To be eligible for PISA, students had to be born between July 1, 2006, and June 30, 2007.
In the United States, of the 52 students who were randomly sampled within each school, 41 students took the mathematics, science, and reading literacy assessments and 11 students took the optional financial literacy assessment with reading or mathematics. The group of students who took the financial literacy assessment are referred to as the "financial literacy sample." This sampling approach used in both 2018 and 2022 cycles is different from the approach used in 2015 when financial literacy was administered to a subset of the students in the main PISA sample. As in past rounds of PISA, the United States planned to assess schools within the maximum testing period of eight consecutive weeks, from October to November 2022.
The U.S. PISA 2022 national school sample consisted of 297 schools. This number represents an increase from the international minimum requirement of 150 and was implemented to offset anticipated school nonresponse and reduce design effects. Schools were selected with probability proportionate to the school's estimated enrollment of 15-year-olds. The data for public schools were from the 2019–20 Common Core of Data (CCD) and the data for private schools were from the 2019–20 Private School Universe Survey (PSS). Any school containing at least one of grades 7 through 12 was included in the school sampling frame. Participating schools provided a list of 15-year-old students (typically in August or September 2022) from which the student sample was drawn using sampling software provided by the international contractor.
In the United States, 4,552 15-year-old students took part in the core PISA 2022 assessment and 1,109 15-year-old students took part in the financial literacy assessment.
In addition to the international response rate standards described in the prior section, the U.S. sample had to meet the statistical standards of the National Center for Education Statistics (NCES) of the U.S. Department of Education. For an assessment like PISA, NCES requires that a nonresponse bias analysis be conducted when the response rate for schools falls below 85 percent or the response rate for students falls below 85 percent.
In order to keep PISA as inclusive as possible and to keep the exclusion rate down, the United States used the UH ('Une Heure') instrument designed for students with special education needs. The UH instrument was available to special education needs students within mainstream schools and contained about half as many items as the regular test instrument. These testing items were deemed more suitable for students with special education needs. A UH student questionnaire was also administered, which only contained trend items from the regular student questionnaire. The timing structure of both the UH test instrument and UH student questionnaire allowed more time per question than the regular instruments and UH sessions were generally held in small groups.
[3] The Northeast region consists of Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, and Vermont. The Midwest region consists of Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin. The South region consists of Alabama, Arkansas, Delaware, the District of Columbia, Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, and West Virginia. The West region consists of Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming.
[4] These types are defined as follows: (1) "city" is a territory inside an urbanized area with a core population of 50,000 or more and inside a principal city; (2) "suburb" is a territory inside an urbanized area with a core population of 50,000 or more and outside a principal city; (3) "town" is a territory inside an urban cluster that is more than 10 miles and less than or equal to 35 miles from an urbanized area; and (4) "rural" is Census-defined rural territory that is less than or equal to 5 miles from an urbanized area, as well as less than or equal to 2.5 miles from an urban cluster.
Test Development
The 2022 assessment instruments were developed by international experts and PISA consortium test developers and included items submitted by participating education systems. In 2022, the major focus of PISA was on mathematics literacy, with reading and science literacy treated as minor domains. Financial literacy was an optional domain administered by 20 education systems including the United States.
All reading and science items in the 2022 assessment instrument were trend items from previous assessments. Mathematics literacy and financial literacy included both trend items and new items developed for 2022.[5] Items were reviewed by representatives of each country and the PISA subject-matter expert groups for possible bias and relevance to PISA's goals. To further examine potential biases and design issues in the PISA assessment, all participating education systems field-tested the assessment items in spring and fall of 2021. After the field trial, items that did not meet the established measurement criteria or were otherwise found to include intrinsic biases were dropped for the main assessment.
For the PISA main assessment, the mathematics literacy instrument contained 160 new items and 74 trend items. Reading literacy contained 197 trend items and science literacy contained 115 trend items; neither domain had new items, since neither was the major domain for PISA 2022. Financial literacy had 5 new items and 41 trend items. For the 2022 cycle, the number of assessment items by domain is shown in table 1:
Table 1. Number of assessment items by domain: 2022

| Domain | New | Trend | Total |
| --- | --- | --- | --- |
| Mathematics literacy | 160 | 74 | 234 |
| Reading literacy | 0 | 197 | 197 |
| Science literacy | 0 | 115 | 115 |
| Financial literacy | 5 | 41 | 46 |
NOTE: The number of new and trend items shown in this table reflects the design for the computer-based PISA assessment only.
SOURCE: Organization for Economic Cooperation and Development (OECD), Program for International Student Assessment (PISA), 2022.
PISA Test Design. To provide the most comprehensive measure of mathematics literacy, PISA would have to present each student with the complete set of test items. Asking students to answer all such items would be the best way to eliminate any gaps or biases in the assessment. However, this would result in a test that would take more than six hours to complete.
To make it feasible to measure student proficiency in all domains, the test material in all PISA cycles, up to and including PISA 2022, was divided into several smaller testlets for mathematics and reading and into 30-minute clusters for science and financial literacy. A hybrid multistage adaptive test (MSAT) design was adopted for the mathematics assessment in PISA 2022 (more details on the PISA multistage adaptive test in mathematics literacy are provided below). Materials equivalent to fifteen 30-minute clusters, but organized into units rather than clusters, were used for the mathematics adaptive design (7 trend clusters and approximately 12 new clusters). Reading was a minor domain in 2022 and used a reduced version of the multistage adaptive design introduced in 2018.
These clusters were linked across domains and organized into test forms, which were then randomly allocated to students. Students received two 30-minute clusters of test material in the major domain along with two clusters of test material in one or two of the other domains. Each student saw only a small subset of the test material and was thus assessed on only a selection of the skills and competencies that comprise each domain. Nonetheless, students in an education system, when taken as a group, were examined on the complete set of skills.
PISA Multistage Adaptive Test in Mathematics Literacy. A hybrid MSAT was used for mathematics in PISA 2022. The design was "hybrid" because it combined an adaptive testing design with non-adaptive random-rotation design (in the latter, item assignment is not conditional on prior performance).
The MSAT design for mathematics partitioned the item pool of 234 items (99 units) into three mutually exclusive item sets, each with 78 items. For each item set, Stage 1 "core" testlets of medium difficulty, Stage 2 high- or low-difficulty testlets, and Stage 3 high-, medium-, or low-difficulty testlets were assembled, each comprising 9 or 10 items. The sequence of the item sets was rotated in the final instruments (each instrument comprising one Stage 1 core, one Stage 2 testlet, and one Stage 3 testlet) to constitute three sets of equivalent instruments, assigned to three groups of randomly selected students (A, B, and C).
From each item set, 16 testlets of either 9 or 10 items were created within each stage. Therefore, across the three stages and three item sets, there were a total of 144 testlets (16 × 3 × 3). Each student took one testlet in each stage; the total number of mathematics items administered to each student ranged from 28 to 30.
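As an illustration of how a hybrid multistage design routes students, the sketch below assigns Stage 2 and Stage 3 testlet difficulties based on earlier performance, with a random (non-adaptive) branch mixed in. The cut points, the share of non-adaptive routing, and the scores are hypothetical, not the operational rules.

```python
import random

def msat_path(stage1_pct, stage2_pct, rng):
    """Toy routing through the three mathematics MSAT stages.
    stage1_pct / stage2_pct: fraction correct on the earlier stages."""
    path = ["core (medium)"]                      # Stage 1: medium-difficulty core
    if rng.random() < 0.1:                        # hypothetical non-adaptive share
        path.append(rng.choice(["high", "low"]))  # random-rotation branch
    else:
        path.append("high" if stage1_pct >= 0.5 else "low")
    overall = (stage1_pct + stage2_pct) / 2       # Stage 3: three difficulty bands
    if overall >= 0.67:
        path.append("high")
    elif overall >= 0.33:
        path.append("medium")
    else:
        path.append("low")
    return path

rng = random.Random(7)
print(msat_path(0.8, 0.7, rng))
```

The "hybrid" character shows up in the Stage 2 branch: most students are routed by performance, while a random share receives an unconditional assignment.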
PISA Multistage Adaptive Test in Reading Literacy. The PISA 2022 Reading MSAT design was a reduced version of the PISA 2018 Main Survey Design. It used the same adaptive structure (e.g., number of stages) as in 2018, but the 2018 Main Survey Reading item pool was reduced by approximately 25 percent.
In the PISA 2022 reading assessment, there were three stages: the Core stage, Stage 1, and Stage 2. At the Core stage, six testlets were assembled. At each of Stage 1 and Stage 2, twelve testlets were assembled (the six more difficult testlets were labelled "high" and the six easier testlets were labelled "low"). For more details on the multistage adaptive test design, see the PISA 2022 Technical Report (OECD 2023).
Reading Fluency. In addition to the typical reading literacy items, the 2022 reading literacy instrument included a measure of reading fluency in the form of sentence processing. This measure required students to make a sensibility judgment about sentences of increasing complexity and was designed to provide additional information about the reading skills of students at the lower end of the proficiency range. Information from this task, combined with the typical reading literacy items, allows for a more thorough understanding of how students differ at various levels of the proficiency scale. In the main survey, there were 65 reading fluency sentences organized into 5 clusters of 11 sentences and 1 cluster of 10 sentences. Each student was assigned two fluency clusters, for a total of 21 or 22 sentences, right before the reading literacy clusters. These reading fluency tasks were administered within a 3-minute timed session; any sentences not completed within the session were skipped. Reading fluency items were considered in the computation of students' overall score but were not included in the computation of subscale scores (neither the text-source subscale nor the reading-process subscale).
PISA Test in Financial Literacy. The assessment of financial literacy was offered as an international option in PISA 2022. Financial literacy was administered only as a computer-based assessment to an additional sample of students at the same schools sampled for PISA. Countries/economies participating in the financial literacy assessment were required to assess 1,650 additional students.
The cognitive instruments for financial literacy included trend items from the PISA 2012, PISA 2015, and PISA 2018 assessments, plus a few new units developed for PISA 2022. The total testing time for each student was two hours (120 minutes). Each student who took the financial literacy assessment took 60 minutes of financial literacy items and then either mathematics or reading items. Students taking the financial literacy assessment did not take any of the science items and therefore do not have science literacy proficiency estimates. The PISA assessment instruments included 46 financial literacy items, of which 41 were trend items and 5 were new items. These items were organized into two 30-minute clusters of financial literacy that were rotated into eight forms, each containing 60 minutes of financial literacy items and 60 minutes of either MSAT mathematics or MSAT reading items.
Background Questionnaires
After the cognitive assessment, students also completed a 43-minute questionnaire designed to provide information about their backgrounds, attitudes, and experiences in school. Principals in schools where PISA was administered also completed a 45-minute questionnaire, administered online, designed to provide information on their school's structure, resources, instruction, climate, and policies.
For both the student and school questionnaires, a set of items were developed for the 2022 administration as an additional module with a focus on effects of the COVID-19 pandemic on student learning and well-being and the degree of interruptions or changes to education across participating countries/economies.
A 10-minute computer-based financial literacy questionnaire was administered to all participating students in countries/economies that were taking the assessment on computer and administered the financial literacy assessment. It included questions about students' access to financial information and education as well as their practical financial experiences.
PISA included a teacher questionnaire that was optional for countries/economies, but the United States did not implement the survey in 2022.
Translation and Adaptation
Source versions of all instruments (the assessment booklets, questionnaires, and operations manuals) were prepared in English and French and translated into the primary language or languages of instruction in each education system. The PISA consortium recommended a double translation design and provided precise translation guidelines that included a description of the features each item was measuring and statistical analysis from the field trial. This entailed having two independent translations, one from each of the source languages (English and French), and reconciliation by a third party. When double translation was not possible, single translation was accepted. In addition, the PISA consortium verified the instrument translation when more than 10 percent of an education system's PISA population used a national language that was neither French nor English.
Instrument adaptation was necessary even in nations such as the United States that use English as the primary language of instruction. These adaptations were primarily for cultural purposes. For example, words such as "lift" might be adapted to "elevator" for the United States. The PISA consortium verified and approved the national adaptation of all instruments, including that of the United States.
Test Administration and Quality Assurance
The PISA consortium emphasized the use of standardized procedures in all education systems. Each education system collected its own data, based on detailed manuals provided by the PISA consortium that explained the survey's implementation, including precise instructions for the work of school coordinators and test administrators and scripts for test administrators to use in testing sessions. Test administration in the United States was conducted by professional staff trained in accordance with the international guidelines. Students could use calculators, and U.S. students were provided calculators.
In each education system, a PISA Quality Monitor (PQM) who was engaged independently by the PISA consortium observed test administrations in a subsample of participating schools. The schools in which the independent observations were conducted were selected jointly by the PISA consortium and the PQM. In the United States, there were three PQMs who observed 15 schools. The PQM's primary responsibility was to document the extent to which testing procedures in schools were implemented in accordance with test administration procedures. The PQM's observations in U.S. schools indicated that international procedures for data collection were applied consistently.
[5] In the vast majority of participating countries, PISA 2022 was a computer-based assessment. However, four countries—Cambodia, Guatemala, Paraguay, and Vietnam—assessed their students' knowledge and skills in PISA 2022 using paper-based instruments. These paper-based tests were offered to countries that were not ready, or did not have the resources, to transition to a computer-based assessment. The paper-based tests comprise a subset of the tasks included in the computer-based version of the tests, all of which were developed in earlier cycles of PISA.
Weighting
The use of sampling weights is necessary for computing statistically sound, nationally representative estimates. Survey weights adjust for the probabilities of selection of individual schools and students, for school or student nonresponse, and for errors in estimating the size of the school or the number of 15-year-olds in the school at the time of sampling. Survey weighting for all education systems participating in PISA 2022 was coordinated by Westat, as part of the international PISA consortium.
The school base weight was defined as the reciprocal of the school's probability of selection multiplied by the number of eligible students in the school. (For replacement schools, the school base weight was set equal to the original school it replaced.) The student base weight was given as the reciprocal of the probability of selection for each selected student from within a school.
The product of these base weights was then adjusted for school and student nonresponse. The school nonresponse adjustment was done individually for each education system by cross classifying the explicit and implicit stratification variables defined as part of the sample design.
The student nonresponse adjustment was done within cells based first on their school nonresponse cell and their explicit stratum; within that, grade and gender were used when possible. All PISA analyses were conducted using these adjusted sampling weights. For more information on the nonresponse adjustments, see PISA 2022 Technical Report (OECD 2023).
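A simplified sketch of the weight construction described above, with hypothetical selection probabilities and counts. The operational adjustments also account for enrollment-estimation error and are computed within the full cross-classified adjustment cells.

```python
def base_weight(p_school, n_eligible, n_sampled):
    """Product of the school base weight (reciprocal of the school's
    selection probability) and the student base weight (reciprocal of
    the within-school selection probability)."""
    w_school = 1.0 / p_school
    w_student = n_eligible / n_sampled
    return w_school * w_student

def nonresponse_factor(weight_sum_eligible, weight_sum_responding):
    """Within an adjustment cell, inflate respondents' weights so they
    also represent the weight of nonrespondents in the cell."""
    return weight_sum_eligible / weight_sum_responding

# hypothetical student: school selected with probability 0.12,
# 80 eligible 15-year-olds, 52 sampled, cell adjustment of 400/320
w = base_weight(0.12, 80, 52) * nonresponse_factor(400, 320)
print(round(w, 2))  # 16.03
```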
Scaling of Student Test Data
Each test form had a different subset of items. Because each student completed only a subset of all possible items, classical test scores, such as the percentage correct, are not accurate measures of student performance. Instead, scaling techniques were used to establish a common scale for all students. For PISA 2022, item response theory (IRT) was used to estimate average scores in mathematics, reading, and science literacy for each education system, as well as scores on four mathematics process and four mathematics content subscales. For education systems participating in the optional financial literacy assessment, the financial literacy items were scaled separately and assigned separate scores. IRT uses statistical models to predict the probability of answering an item correctly as a function of the student's proficiency, estimated from the student's pattern of responses to other items. With this method, the performance of a sample of students in a subject area or subarea can be summarized on a single scale or series of scales, even when students are administered different items.
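The operational PISA scaling uses a mix of IRT models and draws plausible values from a latent regression, as described below. As a minimal illustration of the core idea only, the two-parameter logistic (2PL) item response function gives the probability of a correct answer as a function of student proficiency and item parameters:

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL item response function: probability that a student with
    proficiency theta answers an item with discrimination a and
    difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# a student whose proficiency equals the item's difficulty has a
# 50 percent chance of answering correctly
print(round(p_correct_2pl(0.0, 1.0, 0.0), 2))  # 0.5
```

Because the item parameters are on the same scale for every form, responses to different subsets of items can be placed on one common proficiency continuum.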
Scores for students were estimated as plausible values because each student completed only a subset of items. Ten plausible values were estimated for each student for each scale. These values represented the distribution of potential scores for all students in the population with similar characteristics and identical patterns of item response. Statistics describing performance on the PISA mathematics, reading, science, and financial literacy scales are based on plausible values. In PISA, the mathematics, reading, science, and financial literacy scales range from 0 to 1,000.
Proficiency Levels
In addition to using a range of scale scores as the basic form of measurement, PISA describes student proficiency in terms of levels of proficiency. Higher levels represent the knowledge, skills, and capabilities needed to perform tasks of increasing complexity. PISA results are reported in terms of percentages of the student population at each of the predefined levels.
To determine the performance levels and cut scores on the literacy scales, IRT techniques were used. With IRT techniques, it is possible to simultaneously estimate the ability of all students taking the PISA assessment and the difficulty of all PISA items. Estimates of student ability and item difficulty can then be mapped onto a single continuum. The relative ability of students taking a particular test can be estimated by considering the percentage of test items they get correct, and the relative difficulty of items in a test can be estimated by considering the percentage of students getting each item correct. In PISA, all students within a level are expected to answer at least half of the items from that level correctly. Specifically:
- Students at the bottom of a level are able to provide the correct answers to about 52 percent of all items from that level; they have a 62 percent chance of success on the easiest items from that level and a 42 percent chance of success on the most difficult items from that level.
- Students in the middle of a level have a 62 percent chance of correctly answering items of average difficulty for that level (an overall response probability of 62 percent).
- Students at the top of a level are able to provide the correct answers to about 70 percent of all items from that level; they have a 78 percent chance of success on the easiest items from that level and a 62 percent chance of success on the most difficult items from that level.
Students just below the top of a level would score less than 50 percent on an assessment at the next higher level. Students at a particular level demonstrate not only the knowledge and skills associated with that level but also the proficiencies defined by lower levels.
Patterns of responses for students in the proficiency levels labeled below level 1c for reading and mathematics literacy, below level 1b for science literacy, and below level 1a for financial literacy suggest that these students are unable to answer at least half of the items from those levels correctly. For details about the approach to defining and describing the PISA proficiency levels and establishing the cut scores, see the PISA 2022 Technical Report (OECD 2023). Table 2 shows the cut scores for each proficiency level for reading, science, mathematics, and financial literacy.
Proficiency level | Mathematics | Reading | Science | Financial literacy |
---|---|---|---|---|
Level 1 (1c) | 233.17 to less than 295.47 | 189.33 to less than 262.04 | — | — |
Level 1 (1b) | 295.47 to less than 357.77 | 262.04 to less than 334.75 | 260.54 to less than 334.94 | — |
Level 1 (1a) | 357.77 to less than 420.07 | 334.75 to less than 407.47 | 334.94 to less than 409.54 | 325.57 to less than 400.33 |
Level 2 | 420.07 to less than 482.38 | 407.47 to less than 480.18 | 409.54 to less than 484.14 | 400.33 to less than 475.10 |
Level 3 | 482.38 to less than 544.68 | 480.18 to less than 552.89 | 484.14 to less than 558.73 | 475.10 to less than 549.86 |
Level 4 | 544.68 to less than 606.99 | 552.89 to less than 625.61 | 558.73 to less than 633.33 | 549.86 to less than 624.63 |
Level 5 | 606.99 to less than 669.30 | 625.61 to less than 698.32 | 633.33 to less than 707.93 | 624.63 to less than 1000 |
Level 6 | 669.30 to less than 1000 | 698.32 to less than 1000 | 707.93 to less than 1000 | — |
— Not available.
NOTE: For reading and mathematics literacy, proficiency level 1 is composed of three levels: 1a, 1b, and 1c. For science literacy, proficiency level 1 is composed of two levels: 1a and 1b. The score range for below level 1 refers to scores below level 1c for reading and mathematics, below level 1b for science, and below level 1a for financial literacy.
SOURCE: Organization for Economic Cooperation and Development (OECD), Program for International Student Assessment (PISA), 2022.
Data Limitations
As with any study, there are limitations to PISA 2022 that should be taken into consideration. Estimates produced using data from PISA 2022 are subject to two types of error: nonsampling errors and sampling errors.
Nonsampling error is a term used to describe variations in the estimates that may be caused by population coverage limitations, nonresponse bias, and measurement error, as well as data collection, processing, and reporting procedures. For example, if the study had been unsuccessful in getting permission from many rural schools in a certain region of the country, reports of means for rural schools in that region could be biased; such a coverage problem did not occur in PISA in the United States. The sources of nonsampling error are typically problems such as unit and item nonresponse, differences in respondents' interpretations of the meaning of survey questions, and mistakes in data preparation.
Sampling errors arise when a sample of the population, rather than the whole population, is used to estimate some statistic. Different samples from the same population would likely produce somewhat different estimates of the statistic in question. This means that there is a degree of uncertainty associated with statistics estimated from a sample. This uncertainty is referred to as sampling variance and is usually expressed as the standard error of a statistic estimated from sample data. The approach used for calculating standard errors in PISA is the Fay method of balanced repeated replication (BRR) (Judkins 1990). This method of producing standard errors uses information about the sample design to produce more accurate standard errors than would be produced using simple random sample assumptions.
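The Fay method of BRR can be sketched as follows. PISA uses 80 replicate weights and a Fay coefficient of 0.5, so the variance of an estimate is the sum of squared deviations of the replicate estimates from the full-sample estimate, divided by G(1 − k)². The function name and the replicate values below are illustrative, not taken from PISA data.

```python
import math

def fay_brr_se(full_estimate, replicate_estimates, fay_k=0.5):
    """Standard error via Fay's balanced repeated replication:
    Var = (1 / (G * (1 - k)^2)) * sum_r (est_r - est_full)^2,
    where G is the number of replicates (80 in PISA) and k is the
    Fay coefficient (0.5 in PISA)."""
    g = len(replicate_estimates)
    var = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    var /= g * (1.0 - fay_k) ** 2
    return math.sqrt(var)

# Toy example: a full-sample mean of 465 with 80 replicate means
# that alternate 2 points above and below it.
replicates = [465 + (-1) ** i * 2.0 for i in range(80)]
print(round(fay_brr_se(465.0, replicates), 3))  # 4.0
```

With k = 0.5 and G = 80, the formula reduces to the sum of squared deviations divided by 20, which is why each replicate deviation of 2 points yields a standard error of 4.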
Standard errors can be used as a measure of the precision expected from a particular sample. Standard errors for all statistics in this report are available in the downloadable tables that accompany each online figure and table at the PISA webpage on NCES' website.[6]
Confidence intervals provide a way to make inferences about population statistics in a manner that reflects the sampling error associated with the statistic. Assuming a normal distribution and a 95 percent confidence interval, the population value of this statistic can be inferred to lie within the confidence interval in 95 out of 100 replications of the measurement on different samples drawn from the same population.
[6] The link to the PISA webpage is https://nces.ed.gov/surveys/pisa/pisa2022.
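The confidence-interval construction described above can be sketched directly. Assuming a normal distribution, a 95 percent confidence interval is the estimate plus or minus 1.96 standard errors; the values below use the U.S. 2022 mathematics mean and standard error reported later in this document, purely as an illustration.

```python
# U.S. 2022 mathematics literacy: mean 465, standard error 4.01
# (reported in the Statistical Procedures section of this document).
est, se = 465.0, 4.01

# 95 percent confidence interval under a normal approximation.
lo, hi = est - 1.96 * se, est + 1.96 * se
print(round(lo, 2), round(hi, 2))  # 457.14 472.86
```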
Descriptions of Background Variables
In this report, PISA 2022 results are provided for groups of students with different demographic characteristics. Definitions of student population groups are as follows:
Gender: Results are reported separately for male students and female students.
Race/ethnicity: In the United States, students' race/ethnicity was obtained through student responses to a two-part question in the student questionnaire. Students were asked first whether they were Hispanic or Latino and then whether they were members of the following racial groups: White (non-Hispanic), Black (non-Hispanic), Asian (non-Hispanic), American Indian or Alaska Native (non-Hispanic), or Native Hawaiian/Other Pacific Islander (non-Hispanic). Multiple responses to the race classification were allowed. Results are shown separately for White (non-Hispanic), Black (non-Hispanic), Hispanic, Asian (non-Hispanic), and non-Hispanic students who selected more than one race (labeled as Two or More Races). Students identifying themselves as Hispanic and one or more races were included in the Hispanic group, rather than in a racial group.
PISA index of economic, social, and cultural status (ESCS): PISA uses a composite measure that combines into a single score the financial, social, cultural, and human capital resources available to students. ESCS has been computed and used in analyses since the first cycle of PISA in 2000. Currently, the ESCS index is derived from three indices: highest parental occupation (HISEI), highest parental education (PARED), and an IRT scale based on student reports on home possessions, including books in the home (HOMEPOS).
The three indices on which it is based are described below:
- HISEI (Index of highest parental occupational status): This derived variable represents the higher of the (typically two) parental occupational status variables, which are quantifications of the parents' occupations as reported by the students participating in PISA. The coding of occupations is based on the International Standard Classification of Occupations (ISCO-08).
- PARED (Index of highest parental education in years of schooling): This derived variable is based on HISCED, which in turn represents the higher ISCED classification of the (typically two) levels of schooling attained by the students' parents, as reported by the students. The HISCED codes are then mapped to (approximate) years of schooling; this mapping of ISCED levels to years of education is specific to each participating country.
- HOMEPOS (Home possessions): This derived variable represents the availability of possessions at home and is based on 25 items asked of students in the background questionnaire, including country-specific indicator items.
Eligibility for Free or Reduced-price Lunch (FRPL): The percentage of students eligible for free or reduced-price lunch is often used as a proxy measure for the percentage of students living in poverty. While this percentage can provide some information about relative poverty, it is not the actual percentage of students in poverty enrolled in school. The National School Lunch Program provides meals to millions of children each school day. All lunches provided through the National School Lunch Program are subsidized to some extent because meal-service programs at schools must operate as nonprofit programs. While all students at participating schools are eligible for regular-priced lunches through the program, there are multiple ways in which a student can become eligible for a free or reduced-price lunch; traditionally, family income has been used to establish eligibility. Despite its limitations, FRPL data are often used by education researchers as a proxy for school poverty because this count is generally available at the school level, while the poverty rate typically is not. In the U.S. version of the PISA school questionnaire, principals were asked for the percentage of students in their school eligible for FRPL.
Confidentiality and Disclosure Limitations
Confidentiality analyses for the United States were designed to provide reasonable assurance that public-use data files issued by the PISA consortium would not allow identification of individual U.S. schools or students when compared against other public-use data collections. Disclosure limitations included identifying and masking potential disclosure risk to PISA schools and including an additional measure of uncertainty to school and student identification through random swapping of data elements within the student and school file. Swapping was designed to not significantly affect estimates of means and variances for the whole sample or reported subgroups (Krenzke et al. 2006).
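The random-swapping idea can be illustrated with a heavily simplified sketch. The actual PISA swapping procedure, swap rates, and targeted fields are not public, so everything below (function name, field names, records) is hypothetical; the sketch only shows why swapping a field between randomly paired records leaves that field's marginal distribution, and hence its mean and variance, unchanged.

```python
import random

def swap_values(records, field, n_pairs, seed=0):
    """Swap the value of one field between randomly chosen pairs of
    records. Illustrative disclosure-limitation sketch only; not the
    actual PISA procedure."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(records)), 2 * n_pairs)
    for a, b in zip(idx[::2], idx[1::2]):
        records[a][field], records[b][field] = records[b][field], records[a][field]
    return records

# Hypothetical student records.
students = [{"id": i, "grade": g} for i, g in enumerate([9, 10, 10, 11, 10, 9])]
swapped = swap_values(students, "grade", n_pairs=2)

# The multiset of grades is unchanged, so the mean and variance of the
# swapped field, taken alone, are unaffected.
print(sorted(s["grade"] for s in swapped))  # [9, 9, 10, 10, 10, 11]
```

What swapping does perturb is the association between the swapped field and the rest of each record, which is what frustrates matching against external data files.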
Statistical Procedures
Comparisons made in the text of this report have been tested for statistical significance. For example, in the commonly made comparison of OECD averages to U.S. averages, tests of statistical significance were used to establish whether the observed differences from the U.S. average were statistically significant.
In almost all instances, the tests for significance used were standard t tests. These fell into three categories according to the nature of the comparison being made: comparisons of independent samples, comparisons of nonindependent samples, and comparisons of performance over time. In PISA, education system groups are independent. A difference is judged "significant" if the probability associated with the t test is less than .05. If a test is significant, this implies that the difference in the observed means in the selected samples likely represents a real difference in the population.[7] No adjustments were made for multiple comparisons.
In simple comparisons of independent averages, such as the average score of education system 1 with that of education system 2, the following formula was used to compute the t statistic:

t = (est1 − est2) / √(se1² + se2²)

where est1 and est2 are the estimates being compared (e.g., averages of education system 1 and education system 2) and se1² and se2² are the corresponding squared standard errors of these averages. The PISA 2022 data are hierarchical and include school and student data from the participating schools. The standard errors for each education system take into account the clustered nature of the sampled data. These standard errors are not adjusted for correlations between groups since the groups are independent.
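The independent-samples t statistic can be computed directly from the two estimates and their standard errors. The education system values below are made up for illustration.

```python
import math

def t_independent(est1, se1, est2, se2):
    """t statistic for the difference between two independent
    estimates: (est1 - est2) / sqrt(se1^2 + se2^2)."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical means and standard errors for two education systems.
t = t_independent(489.0, 2.1, 478.0, 3.2)
print(round(t, 2))  # 2.87
```

With |t| above roughly 1.96, the difference would be judged significant at the .05 level used in this report.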
The second type of comparison occurs when evaluating differences between nonindependent groups within the education system. Because of the sampling design in which schools and students within schools are randomly sampled, the data within the education system from mutually exclusive sets of students (for example, males and females) are not independent. For example, to determine whether the performance of females differs from that of males would require estimating the correlation between females' and males' scores. A BRR procedure, mentioned above, was used to estimate the standard errors of differences between nonindependent samples within the United States. Use of the BRR procedure implicitly accounts for the correlation between groups when calculating the standard errors.
To test comparisons between nonindependent groups, the following t statistic formula was used:

t = (estgrp1 − estgrp2) / se(grp1−grp2)

where estgrp1 and estgrp2 are the nonindependent group estimates being compared and se(grp1−grp2) is the standard error of the difference, calculated using BRR to account for the correlation between the estimates for the two nonindependent groups.
A third type of comparison—the addition of a standard error term to the standard t test shown above for simple comparisons of independent averages—was also used when analyzing change in performance over time. The transformation performed to equate the 2022 data with previous data depends upon the change in difficulty of each of the individual link items; as a consequence, the sample of link items chosen influences the transformation, and an alternative set of link items would have yielded a slightly different one. The result is an uncertainty in the transformation due to the sampling of the link items, just as there is an uncertainty in values such as country means due to the use of a sample of students. This uncertainty is referred to as "linking error," and it must be taken into account when making certain comparisons between previous rounds of PISA (2000, 2003, 2006, 2009, 2012, 2015, and 2018) and PISA 2022 results. Just as with the error introduced through the process of sampling students, the exact magnitude of this linking error cannot be determined; we can, however, estimate its likely range of magnitudes and take it into account when interpreting PISA results. As with sampling errors, the likely range of magnitude for the error is represented as a standard error.
Comparison | Mathematics | Reading | Science | Financial literacy |
---|---|---|---|---|
2000 vs. 2022 | † | 6.67 | † | † |
2003 vs. 2022 | 5.54 | 5.25 | † | † |
2006 vs. 2022 | 4.09 | 8.56 | 3.68 | † |
2009 vs. 2022 | 4.28 | 4.66 | 5.92 | † |
2012 vs. 2022 | 3.58 | 6.01 | 5.20 | 4.05 |
2015 vs. 2022 | 2.74 | 3.63 | 1.38 | 3.47 |
2018 vs. 2022 | 2.24 | 1.47 | 1.61 | 2.20 |
† Not applicable.
NOTE: Comparisons between PISA 2022 scores and previous assessments can only be made back to when the subject first became a major domain. As a result, comparisons of reading can be made as far back as PISA 2000; mathematics comparisons can be made as far back as PISA 2003; and science comparisons can be made as far back as PISA 2006. Financial literacy comparisons can be made as far back as 2012, when it was first administered as an optional domain in the United States.
SOURCE: Organization for Economic Cooperation and Development (OECD), Program for International Student Assessment (PISA), 2022.
In PISA, in each of the three subject matter areas, a common transformation was estimated from the link items, and this transformation was applied to all participating education systems when comparing achievement scores over time. It follows that any uncertainty that was introduced through the linking is common to all students and all education systems. Thus, for example, suppose the unknown linking error (between PISA 2018 and PISA 2022) in reading literacy resulted in an over-estimation of student scores by about four points on the PISA 2018 scale. It follows that every student's score will be over-estimated by four score points. This over-estimation will have effects on certain, but not all, summary statistics computed from the PISA 2022 data. For example, consider the following:
- each education system's mean will be over-estimated by an amount equal to the link error (in our example this is four score points);
- the mean performance of any student subgroup will be over-estimated by an amount equal to the link error (in our example this is four score points);
- the standard deviation of student scores will not be affected because the overestimation of each student by a common error does not change the standard deviation;
- the difference between the mean scores of two education systems in PISA 2022 will not be influenced because the over-estimation of each student by a common error will have distorted each system's mean by the same amount;
- the difference between the mean scores of two student groups (e.g., males and females) in PISA 2022 will not be influenced because the over-estimation of each student by a common error will have distorted each group's mean by the same amount;
- the difference between the performance of a group of students (e.g., an education system) between PISA 2018 and PISA 2022 will be influenced because each student's score in PISA 2022 will be influenced by the error; and
- a change in the difference in performance between two groups from PISA 2018 to PISA 2022 will not be influenced. This is because neither of the components of this comparison, which are differences in scores in 2018 and 2022 respectively, is influenced by a common error that is added to all student scores in PISA 2022.
In general terms, the linking error need only be considered when comparisons are being made between PISA 2018 and PISA 2022 results, and then usually only when group means are being compared. The most obvious example of a situation where there is a need to use linking error is in the comparison of the mean performance for a single education system between PISA 2018 and PISA 2022. For example, consider a comparison between 2018 and 2022 of the performance of the United States in mathematics. The mean performance of the United States in 2018 was 478 with a standard error of 3.24, while in 2022 the mean was 465 with a standard error of 4.01. Using rounded mean values and the 2018-to-2022 mathematics linking error of 2.24, the standardized difference in the U.S. means is 2.313, computed as follows:

t = (478 − 465) / √(3.24² + 4.01² + 2.24²) = 2.313

and is statistically significant.
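The worked U.S. mathematics example can be verified directly; the 2.24 linking error for the 2018-to-2022 mathematics comparison comes from the linking-error table above.

```python
import math

# U.S. mathematics: 2018 mean 478 (SE 3.24), 2022 mean 465 (SE 4.01);
# the 2018 vs. 2022 mathematics linking error is 2.24.
diff = 478 - 465
se_diff = math.sqrt(3.24 ** 2 + 4.01 ** 2 + 2.24 ** 2)
print(round(diff / se_diff, 3))  # 2.313
```

Note that the linking error enters the denominator alongside the two sampling standard errors, which is the "addition of a standard error term" described above.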
ESCS quarters: The ESCS index is a student-level, internationally comparable measure of socioeconomic status. For comparisons made in the text of this report, students in each participating country/economy were grouped into four quarters using the quartiles of the ESCS distribution in the United States. This makes each of the four groups comparable in ESCS level to the corresponding group in the United States. Students in the bottom ESCS quarter report the highest levels of poverty, while those in the top quarter report the lowest.
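The quarter assignment described above can be sketched as follows. The ESCS scores are hypothetical, and the simple sorted-index percentile rule is an illustrative stand-in for whatever quantile definition the operational analysis uses; the point is that every country's students are cut at the U.S. quartile points.

```python
# Hypothetical U.S. ESCS scores used to set the cut points.
us_escs = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5, -0.9, 0.6]

def quartile_cuts(values):
    """25th/50th/75th percentile cut points via a simple
    sorted-index rule (illustrative, not the operational method)."""
    s = sorted(values)
    n = len(s)
    return [s[int(n * q)] for q in (0.25, 0.50, 0.75)]

def assign_quarter(score, cuts):
    """Quarter 1 = bottom (highest poverty) ... quarter 4 = top."""
    return 1 + sum(score >= c for c in cuts)

cuts = quartile_cuts(us_escs)
# Students from any country/economy are placed using the U.S. cuts.
print([assign_quarter(x, cuts) for x in [-1.0, 0.2, 1.4]])  # [1, 2, 4]
```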
[7] A .05 probability implies that the t statistic is among the 5 percent most extreme values one would expect if there were no difference between the means. The decision rule is that when t statistics are this extreme, the samples represent populations that likely have different means.
Response Rates and Population Coverage
This section describes the success of participating education systems in meeting the international technical standards on data collection. Information is provided for all participating education systems on their coverage of the target population, exclusion rates, and response rates. Table 3 provides information on weighted school participation rates before and after school replacement and the number of participating schools after replacement for each participating education system. Table 4 provides information on coverage of the target population, overall exclusion rates, weighted student response rates after school replacement, and the number of participating students after replacement for each participating education system.
In the United States, 125 original schools and 29 replacement schools participated in the 2022 administration of PISA. This resulted in 154 participating schools and an overall weighted school response rate of 63 percent. In the United States, 4,552 15-year-old students took part in the core PISA 2022 assessment. The U.S. overall student exclusion rate was 6.1 percent.
See the International Requirements section above for PISA international sampling guidelines and requirements regarding accommodations, exclusions, and response rates, as well as response rates of all participating education systems (OECD 2023).
Education system | Weighted school participation before replacement (percent) | Weighted school participation after replacement (percent) | Number of participating schools after replacement |
---|---|---|---|
Albania | 94.7 | 94.7 | 274 |
Argentina | 98.3 | 99.2 | 457 |
Australia | 92.5 | 95.6 | 743 |
Austria | 95.7 | 96.3 | 302 |
Baku (Azerbaijan) | 100.0 | 100.0 | 178 |
Belgium | 80.3 | 91.4 | 285 |
Brazil | 80.9 | 95.6 | 599 |
Brunei Darussalam | 100.0 | 100.0 | 54 |
Bulgaria | 84.5 | 97.7 | 202 |
Cambodia | 99.6 | 100.0 | 183 |
Canada | 81.3 | 85.6 | 867 |
Chile | 84.3 | 94.2 | 230 |
Chinese Taipei | 82.6 | 83.8 | 182 |
Colombia | 96.6 | 99.2 | 262 |
Costa Rica | 99.0 | 99.0 | 198 |
Croatia | 99.8 | 99.8 | 180 |
Cyprus | 97.5 | 97.5 | 101 |
Czech Republic | 100.0 | 100.0 | 430 |
Denmark | 90.1 | 96.2 | 347 |
Dominican Republic | 98.5 | 99.4 | 253 |
El Salvador | 99.6 | 99.9 | 290 |
Estonia | 99.4 | 99.4 | 196 |
Finland | 99.5 | 99.5 | 241 |
France | 99.6 | 99.6 | 282 |
Georgia | 93.6 | 99.8 | 267 |
Germany | 92.9 | 98.2 | 257 |
Greece | 90.1 | 96.1 | 230 |
Guatemala | 85.0 | 92.6 | 290 |
Hong Kong (China) | 59.6 | 79.9 | 163 |
Hungary | 88.8 | 98.6 | 270 |
Iceland | 96.4 | 96.4 | 134 |
Indonesia | 99.3 | 99.8 | 410 |
Ireland | 99.4 | 100.0 | 170 |
Israel | 90.7 | 92.9 | 193 |
Italy | 96.0 | 99.4 | 345 |
Jamaica | 89.8 | 90.9 | 147 |
Japan | 91.9 | 91.9 | 182 |
Jordan | 100.0 | 100.0 | 260 |
Kazakhstan | 98.5 | 100.0 | 571 |
Korea | 88.9 | 99.7 | 186 |
Kosovo | 96.1 | 96.1 | 229 |
Latvia | 83.9 | 88.7 | 225 |
Lithuania | 99.6 | 100.0 | 292 |
Macao (China) | 100.0 | 100.0 | 46 |
Malaysia | 99.7 | 99.7 | 199 |
Malta | 100.0 | 100.0 | 46 |
Mexico | 95.9 | 98.9 | 280 |
Moldova | 99.7 | 99.7 | 265 |
Mongolia | 100.0 | 100.0 | 195 |
Montenegro | 98.8 | 98.8 | 63 |
Morocco | 99.8 | 100.0 | 178 |
Netherlands | 65.5 | 89.6 | 154 |
New Zealand | 61.4 | 72.4 | 169 |
North Macedonia | 100.0 | 100.0 | 111 |
Norway | 98.7 | 99.1 | 267 |
Palestinian Authority | 99.0 | 100.0 | 273 |
Panama | 84.1 | 91.3 | 215 |
Paraguay | 98.7 | 99.6 | 281 |
Peru | 94.0 | 99.9 | 337 |
Philippines | 100.0 | 100.0 | 188 |
Poland | 88.6 | 96.1 | 240 |
Portugal | 94.7 | 99.2 | 224 |
Qatar | 100.0 | 100.0 | 229 |
Romania | 100.0 | 100.0 | 262 |
Saudi Arabia | 91.9 | 99.6 | 193 |
Serbia | 98.7 | 98.7 | 183 |
Singapore | 98.5 | 98.5 | 164 |
Slovak Republic | 90.5 | 95.5 | 288 |
Slovenia | 97.2 | 97.3 | 345 |
Spain | 97.7 | 99.1 | 966 |
Sweden | 97.8 | 98.9 | 262 |
Switzerland | 95.1 | 98.2 | 259 |
Thailand | 98.8 | 99.5 | 279 |
Turkey | 99.4 | 100.0 | 196 |
Ukrainian regions (18 of 27) | 79.8 | 91.0 | 164 |
United Arab Emirates | 99.8 | 99.8 | 840 |
United Kingdom | 67.3 | 81.8 | 451 |
United States | 51.4 | 63.3 | 154 |
Uruguay | 99.4 | 99.9 | 222 |
Uzbekistan | 100.0 | 100.0 | 202 |
Vietnam | 100.0 | 100.0 | 178 |
NOTE: In calculating school participation rates, each school received a weight equal to the product of its base weight (the reciprocal of its probability of selection) and the number of age-eligible students enrolled in the school, as indicated on the sampling frame. Weighted school participation before replacement refers to the sum of weights of the original sample schools with PISA-assessed students and a student response rate of at least 33 percent over the sum of weights of all eligible original sample schools. Weighted school participation after replacement refers to the sum of weights of the original and replacement schools with PISA-assessed students and a student response rate of at least 33 percent over the sum of weights of responding original sample schools, responding replacement schools, and eligible refusing original sample schools. Italics indicate non-OECD countries and education systems.
SOURCE: Organization for Economic Cooperation and Development (OECD), Program for International Student Assessment (PISA), 2022.
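The weighting rule in the note to the table above can be sketched directly: each school's weight is its base weight (1 / selection probability) times its age-eligible enrollment, and the participation rate is the weighted share of eligible schools that participated. The field names and the three-school sample below are hypothetical.

```python
def weighted_participation(schools):
    """Weighted school participation rate (percent).

    Each school is a dict with hypothetical keys:
      'prob'         - probability of selection,
      'enrollment'   - age-eligible enrollment from the frame,
      'participated' - whether it met the participation standard.
    """
    num = den = 0.0
    for s in schools:
        w = (1.0 / s["prob"]) * s["enrollment"]  # base weight x enrollment
        den += w
        if s["participated"]:
            num += w
    return 100.0 * num / den

sample = [
    {"prob": 0.5, "enrollment": 120, "participated": True},
    {"prob": 0.5, "enrollment": 80, "participated": False},
    {"prob": 0.25, "enrollment": 50, "participated": True},
]
print(round(weighted_participation(sample), 1))  # 73.3
```

Weighting by enrollment means the rate reflects the share of age-eligible students, not the share of schools, that the responding sample represents.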
Education system | Total population of 15-year-olds (number) | Coverage of 15-year-old population (percent) | Coverage of national desired population (percent) | Overall student exclusion rate (percent) | Weighted student participation after replacement (percent) | Number of participating students |
---|---|---|---|---|---|---|
Albania | 29,039 | 79.2 | 99.3 | 0.7 | 86.5 | 6,129 |
Argentina | 688,260 | 83.7 | 98.4 | 1.6 | 85.8 | 12,111 |
Australia | 285,436 | 89.5 | 93.1 | 6.9 | 76.1 | 13,437 |
Austria | 81,024 | 88.8 | 96.5 | 3.5 | 88.8 | 6,151 |
Baku (Azerbaijan) | 28,475 | 73.3 | 95.8 | 4.2 | 87.8 | 7,720 |
Belgium | 125,100 | 99.1 | 97.6 | 2.4 | 86.6 | 8,286 |
Brazil | 2,692,533 | 76.1 | 96.8 | 3.2 | 84.2 | 10,798 |
Brunei Darussalam | 6,633 | 98.0 | 99.1 | 0.9 | 93.2 | 5,576 |
Bulgaria | 56,061 | 80.0 | 97.3 | 2.7 | 88.8 | 6,107 |
Cambodia | 201,962 | 36.3 | 99.3 | 0.7 | 99.4 | 5,279 |
Canada | 374,753 | 92.2 | 94.2 | 5.8 | 77.0 | 23,073 |
Chile | 224,344 | 86.5 | 97.1 | 2.9 | 84.0 | 6,488 |
Chinese Taipei | 199,619 | 92.8 | 98.5 | 1.5 | 82.3 | 5,857 |
Colombia | 685,175 | 72.9 | 99.4 | 0.6 | 91.8 | 7,804 |
Costa Rica | 64,582 | 77.6 | 99.9 | 0.1 | 92.0 | 6,113 |
Croatia | 37,552 | 89.2 | 94.6 | 5.4 | 85.2 | 6,135 |
Cyprus | 9,113 | 94.3 | 95.5 | 4.5 | 83.8 | 6,515 |
Czech Republic | 101,450 | 91.5 | 98.0 | 2.0 | 91.2 | 8,460 |
Denmark | 65,490 | 83.6 | 88.4 | 11.6 | 84.2 | 6,200 |
Dominican Republic | 136,830 | 64.3 | 98.6 | 1.4 | 92.7 | 6,868 |
El Salvador | 75,000 | 61.1 | 98.9 | 1.1 | 93.6 | 6,705 |
Estonia | 13,640 | 93.9 | 94.1 | 5.9 | 88.2 | 6,392 |
Finland | 60,913 | 95.2 | 96.7 | 3.3 | 88.7 | 10,239 |
France | 795,091 | 93.4 | 96.3 | 3.7 | 90.7 | 6,770 |
Georgia | 43,737 | 86.3 | 95.1 | 4.9 | 98.1 | 6,583 |
Germany | 729,330 | 91.9 | 97.5 | 2.5 | 88.0 | 6,116 |
Greece | 101,556 | 91.4 | 98.5 | 1.5 | 92.4 | 6,403 |
Guatemala | 168,154 | 47.7 | 99.9 | 0.1 | 91.4 | 5,190 |
Hong Kong (China) | 54,429 | 81.4 | 95.7 | 4.3 | 75.3 | 5,907 |
Hungary | 91,101 | 86.2 | 95.3 | 4.7 | 92.3 | 6,198 |
Iceland | 4,577 | 94.1 | 95.2 | 4.8 | 80.1 | 3,360 |
Indonesia | 4,008,391 | 84.9 | 98.5 | 1.5 | 95.2 | 13,439 |
Ireland | 63,204 | 102.3 | 96.4 | 3.6 | 76.8 | 5,569 |
Israel | 137,723 | 89.9 | 96.2 | 3.8 | 84.1 | 6,251 |
Italy | 527,307 | 86.7 | 96.9 | 3.1 | 91.9 | 10,552 |
Jamaica | 50,760 | 58.4 | 99.1 | 0.9 | 67.6 | 3,873 |
Japan | 1,043,449 | 92.0 | 97.5 | 2.5 | 91.9 | 5,760 |
Jordan | 141,443 | 94.0 | 98.8 | 1.2 | 97.5 | 7,799 |
Kazakhstan | 286,244 | 93.4 | 95.8 | 4.2 | 98.3 | 19,769 |
Korea | 414,550 | 102.4 | 98.5 | 1.5 | 94.4 | 6,454 |
Kosovo | 24,136 | 86.3 | 99.4 | 0.6 | 91.1 | 6,027 |
Latvia | 18,507 | 85.0 | 92.1 | 7.9 | 88.5 | 5,373 |
Lithuania | 25,225 | 92.5 | 93.5 | 6.5 | 92.7 | 7,257 |
Macao (China) | 4,453 | 98.3 | 99.6 | 0.4 | 99.1 | 4,384 |
Malaysia | 421,552 | 74.9 | 98.5 | 1.5 | 93.5 | 7,069 |
Malta | 4,125 | 92.6 | 96.1 | 3.9 | 79.1 | 3,127 |
Mexico | 1,582,817 | 63.5 | 98.6 | 1.4 | 94.9 | 6,288 |
Moldova | 29,633 | 97.4 | 98.3 | 1.7 | 94.1 | 6,235 |
Mongolia | 43,266 | 87.1 | 99.2 | 0.8 | 97.9 | 6,999 |
Montenegro | 6,735 | 92.9 | 96.0 | 4.0 | 94.6 | 5,793 |
Morocco | 480,823 | 76.2 | 99.5 | 0.5 | 98.1 | 6,867 |
Netherlands | 180,190 | 78.6 | 91.6 | 8.4 | 80.9 | 5,046 |
New Zealand | 57,876 | 90.3 | 94.2 | 5.8 | 71.7 | 4,682 |
North Macedonia | 17,919 | 90.7 | 96.3 | 3.7 | 89.6 | 6,610 |
Norway | 63,504 | 91.0 | 92.7 | 7.3 | 86.7 | 6,611 |
Palestinian Authority | 94,729 | 78.2 | 99.7 | 0.3 | 96.2 | 7,905 |
Panama | 64,812 | 57.7 | 98.9 | 1.1 | 76.8 | 4,544 |
Paraguay | 91,143 | 71.9 | 98.5 | 1.5 | 92.0 | 5,084 |
Peru | 520,109 | 86.3 | 96.7 | 3.3 | 97.5 | 6,968 |
Philippines | 1,709,495 | 83.3 | 98.7 | 1.3 | 95.2 | 7,193 |
Poland | 346,226 | 89.2 | 95.2 | 4.8 | 81.0 | 6,011 |
Portugal | 101,878 | 92.5 | 96.0 | 4.0 | 86.1 | 6,793 |
Qatar | 19,126 | 93.7 | 97.3 | 2.7 | 89.0 | 7,676 |
Romania | 169,172 | 76.2 | 97.1 | 2.9 | 97.4 | 7,364 |
Saudi Arabia | 336,717 | 81.5 | 96.8 | 3.2 | 97.1 | 6,928 |
Serbia | 64,948 | 86.9 | 96.2 | 3.8 | 91.2 | 6,413 |
Singapore | 42,626 | 95.3 | 98.1 | 1.9 | 91.4 | 6,606 |
Slovak Republic | 48,108 | 95.6 | 97.5 | 2.5 | 90.9 | 5,824 |
Slovenia | 19,294 | 99.6 | 97.2 | 2.8 | 82.5 | 6,721 |
Spain | 485,188 | 90.4 | 96.0 | 4.0 | 86.3 | 30,800 |
Sweden | 119,747 | 89.1 | 92.6 | 7.4 | 85.1 | 6,072 |
Switzerland | 78,108 | 90.8 | 94.2 | 5.8 | 90.9 | 6,829 |
Thailand | 699,541 | 74.6 | 98.5 | 1.5 | 96.4 | 8,495 |
Turkey | 1,109,307 | 73.7 | 94.4 | 5.6 | 98.0 | 7,250 |
Ukrainian regions (18 of 27) | 244,954 | 41.6 | 63.9 | 36.1 | 86.9 | 3,876 |
United Arab Emirates | 64,029 | 93.5 | 97.4 | 2.6 | 92.9 | 24,600 |
United Kingdom | 726,937 | 96.9 | 95.1 | 4.9 | 75.2 | 12,972 |
United States | 4,120,742 | 86.4 | 93.9 | 6.1 | 79.9 | 4,552 |
Uruguay | 43,774 | 84.5 | 99.7 | 0.3 | 86.7 | 6,618 |
Uzbekistan | 509,948 | 88.1 | 95.8 | 4.2 | 98.1 | 7,293 |
Vietnam | 1,156,735 | 68.4 | 99.3 | 0.7 | 99.4 | 6,068 |
NOTE: In calculating student participation rates, each student received a weight (student base weight) equal to the product of the school base weight—for the school in which the student was enrolled—and the reciprocal of the student selection probability within the school. Coverage of 15-year-old population refers to the extent to which the weighted participants covered the target population of all enrolled students in grades 7 and above. Coverage of national desired population refers to the extent to which the weighted participants covered the national desired target population of 15-year-olds under the non-excluded portion of the student sample. Overall student exclusion rate is the percentage of students excluded for intellectual or functional disabilities, or insufficient assessment language experience, at either the school level or within schools. Weighted student participation rate after replacement refers to the sum of weights of assessed students over the sum of weights of all sampled students. Italics indicate non-OECD countries and education systems.
SOURCE: Organization for Economic Cooperation and Development (OECD), Program for International Student Assessment (PISA), 2022.
U.S. Nonresponse Bias Analysis
NCES statistical standards call for a nonresponse bias analysis to be conducted on a sample with a response rate below 85 percent. Nonresponse bias analysis is aimed at evaluating whether the responding sample is representative of the population of inference in terms of student and school characteristics.
School-level Non-Response Bias Analysis
In PISA 2022, school response rates for the U.S. were 51 percent before replacement and 63 percent after replacement.
To evaluate whether the responding schools were representative of U.S. schools enrolling 15-year-old students, they were compared to the population of U.S. schools in terms of the following categorical and continuous variables.
The following categorical variables were available in the sampling frame for all schools:
- School control—indicates whether the school is under public control (operated by publicly elected or appointed officials) or private control (operated by privately elected or appointed officials and deriving its major source of funds from private sources);
- Locale—urban-centric locale code (i.e., city, suburb, town, rural);
- Census region—Northeast, Midwest, South, and West;
- Poverty level—for public schools, a high-poverty school is defined as one in which 50 percent or more of the students are eligible for participation in the FRPL program, and a low-poverty school is defined as one in which less than 50 percent are eligible; and
- School size—age-eligible enrollment of the school (as shown on the school sampling frame) divided into three equally sized categories (small, medium, and large).
The following continuous variables were available in the sampling frame for all schools:
- Estimated number of age-eligible students enrolled;
- Total number of students;
- Percentage of students in seven race/ethnicity categories (White; Black; Hispanic; Asian; American Indian or Alaska Native; Hawaiian/Pacific Islander; and Two or More Races); and
- Percentage of students eligible to participate in the FRPL (only for public schools).
For categorical variables, the distribution of frame characteristics for participating schools was compared with the distribution for all eligible schools. For continuous variables, summary means were calculated, and the difference between means was tested using a t test. After nonresponse adjustments were applied as part of the analytic weighting process, these analyses showed little evidence of resulting potential bias.
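As an illustration of the continuous-variable check described above, the sketch below compares the mean of a sampling-frame variable (e.g., percentage of FRPL-eligible students) between participating schools and all eligible schools using a Welch two-sample t test. This is a hypothetical sketch, not the actual NCES analysis code: all data are invented, and the `welch_t` helper is an assumption introduced here for illustration.

```python
# Illustrative sketch only (not the actual NCES procedure): test the
# difference in means of a continuous frame variable between the
# participating schools and all eligible schools. All data are made up.
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (hypothetical helper)."""
    na, nb = len(a), len(b)
    se = math.sqrt(variance(a) / na + variance(b) / nb)
    return (mean(a) - mean(b)) / se

random.seed(42)
# Hypothetical frame values (percent FRPL-eligible) for all eligible schools,
# clamped to the valid 0-100 range
eligible = [min(100, max(0, random.gauss(45, 15))) for _ in range(400)]
# Hypothetical responding subset, drawn at random here (so little bias expected)
participating = random.sample(eligible, 200)

t = welch_t(participating, eligible)
print(f"mean difference: {mean(participating) - mean(eligible):+.2f}, t = {t:.2f}")
```

A small t statistic, as expected under random nonresponse, suggests the responding schools resemble the frame on this characteristic; the actual analysis also compared categorical distributions across participating and eligible schools.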
Student-Level Nonresponse Bias Analysis
In addition, the student response rate in PISA 2022 was 80 percent for the United States, again requiring a nonresponse bias analysis. To evaluate whether the responding students were representative of all 15-year-old students in the United States, the student-level analysis used a method similar to the school-level analysis, with the following student variables also included:
- Gender;
- Grade level;
- Special education needs; and
- Average age.
These analyses showed that after nonresponse adjustments were applied as part of the analytic weighting procedures, two characteristics remained statistically significant: grade level and special education needs status. Specifically, 10th graders were slightly overrepresented (weighted eligible 10th graders: 73.1 percent vs. weighted participating 10th graders: 73.7 percent), and students without special education needs were slightly overrepresented (weighted eligible non-special-education-needs students: 95.0 percent vs. weighted participating non-special-education-needs students: 95.4 percent).
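The two statistically significant gaps are small in absolute terms. A quick arithmetic check using the weighted percentages quoted in the text (the dictionary structure here is just for illustration):

```python
# Percentage-point gaps between weighted participating and weighted eligible
# students, using the figures reported in the text
gaps = {
    "10th grade": (73.7, 73.1),                  # participating vs. eligible
    "no special education needs": (95.4, 95.0),  # participating vs. eligible
}
for label, (participating, eligible) in gaps.items():
    print(f"{label}: {participating - eligible:+.1f} percentage points")
# → 10th grade: +0.6 percentage points
# → no special education needs: +0.4 percentage points
```

Both gaps are under 1 percentage point, which is why the overrepresentation is characterized as slight despite reaching statistical significance.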
In summary, the investigation into nonresponse bias at the student level in the U.S. PISA 2022 sample provides evidence of some potential for nonresponse bias among participating students based on the characteristics studied. There remains a possibility of unobserved bias in variables not included in this evaluation.
References
Judkins, D.R. (1990). Fay's method for variance estimation. Journal of Official Statistics 6 (3), 223–239.
Krenzke, T., Roey, S., Dohrmann, S.M., Mohadjer, L., Haung, W-C., Kaufman, S., and Seastrom, M. (2006). Tactics for Reducing the Risk of Disclosure Using the NCES DataSwap Software. Proceedings of the American Statistical Association: Survey Research Methods Section. Philadelphia: American Statistical Association.
Organization for Economic Cooperation and Development (OECD). (2020). PISA 2022 Technical Standards. Paris: Author. Available online at https://www.oecd.org/pisa/pisaproducts/PISA-2022-Technical-Standards.pdf
Organization for Economic Cooperation and Development (OECD). (2023). PISA 2022 Technical Report. Paris: Author.