Methodology and Technical Notes - Test Development


The 2012 assessment instruments were developed by international experts and PISA consortium test developers and included items submitted by participating education systems. Items were reviewed by representatives of each country for possible bias and relevance to PISA’s goals and the PISA subject-matter expert groups. All participating education systems field-tested the assessment items in spring 2011.

The final paper-based assessment consisted of 85 mathematics items, 44 reading items, 53 science items, and 40 financial literacy items allocated to 17 test booklets (in education systems that did not administer the optional financial literacy assessment there were 13 test booklets). Each booklet was made up of four test clusters. Altogether there were seven mathematics clusters, three reading clusters, three science clusters, and two financial literacy clusters. The mathematics, science, and reading clusters were allocated in a rotated design to 13 booklets. The financial literacy clusters, in conjunction with mathematics and reading clusters, were allocated in a rotated design to four booklets. The average number of items per cluster was 12 items for mathematics, 15 items for reading, 18 items for science, and 20 items for financial literacy. Each cluster was designed to average 30 minutes of test material. Each student took one booklet, with about 2 hours' worth of testing material.

Approximately half of the items were multiple-choice, about 20 percent were closed or short response types (for which students wrote an answer that was simply either correct or incorrect), and about 30 percent were open constructed responses (for which students wrote answers that were graded by trained scorers using an international scoring guide). In PISA 2012, with the exception of students participating in the financial literacy assessment, every student answered mathematics items. Not all students answered reading, science, and/or financial literacy items.
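The per-cluster averages above follow directly from the item and cluster counts; a quick sketch (using only the figures reported in this section) reproduces them:

```python
# Item and cluster counts reported for the paper-based design
counts = {
    "mathematics":        (85, 7),  # 85 items across 7 clusters
    "reading":            (44, 3),
    "science":            (53, 3),
    "financial literacy": (40, 2),
}

for domain, (items, clusters) in counts.items():
    avg = items / clusters
    print(f"{domain}: {items} items / {clusters} clusters "
          f"= about {round(avg)} items per cluster")
```

Rounding each quotient to the nearest whole item gives the 12, 15, 18, and 20 figures cited in the text.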

A subset of students who took the paper-based assessment also took a 40-minute, computer-based assessment. In the United States, the computer-based assessment consisted of problem solving and the optional computer-based assessment of mathematics and reading. The computer-based assessment consisted of 168 problem-solving items, 164 mathematics items, and 144 reading items allocated to 24 forms. Each form was made up of two clusters that together contained 18 to 22 items. Altogether there were four clusters of problem solving, four clusters of mathematics, and two clusters of reading. The problem-solving, mathematics, and reading clusters were allocated in a rotated design to the 24 forms. Each cluster was designed to average 20 minutes of test material. (Not all education systems participated in the computer-based assessment, and some education systems administered only the computer-based problem-solving assessment. Education systems that administered only the problem-solving assessment followed a different rotation design.)
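A rotated design of this kind cycles clusters through form positions so that every cluster appears in multiple forms and in different orders. The sketch below is purely illustrative: the cluster labels and the rotation rule are invented here, not taken from the actual PISA 2012 allocation, which is specified in the OECD technical report.

```python
# Hypothetical rotated allocation of ten clusters (four problem-solving,
# four mathematics, two reading) to 24 two-cluster forms. Labels and the
# cycling rule are illustrative assumptions, not the actual PISA design.
clusters = ["PS1", "PS2", "PS3", "PS4", "M1", "M2", "M3", "M4", "R1", "R2"]

def rotated_forms(clusters, n_forms, per_form=2):
    """Assign `per_form` clusters to each form by cycling through the
    cluster list with a shifting offset, so each cluster recurs across
    forms and occupies different positions."""
    return [
        [clusters[(f + k) % len(clusters)] for k in range(per_form)]
        for f in range(n_forms)
    ]

for i, form in enumerate(rotated_forms(clusters, n_forms=24), start=1):
    print(f"Form {i:2d}: {form}")
```

The balancing property that matters for the assessment is that position effects (fatigue, time pressure) average out across clusters because no cluster is tied to a single slot.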

In addition to the cognitive assessment, students also completed a 30-minute questionnaire designed to provide information about their backgrounds, attitudes, and experiences in school. Principals in schools where PISA was administered also completed a 30-minute questionnaire about their schools.

Source versions of all instruments (assessment booklets, computer-based assessment forms, questionnaires, and manuals) were prepared in English and French and translated into the primary language or languages of instruction in each education system. The PISA consortium recommended that education systems prepare and consolidate independent translations from both source versions and provided precise translation guidelines that included a description of the features each item was measuring and statistical analysis from the field trial. In cases where only one source language was used, independent translations were required and discrepancies were reconciled. In addition, it was sometimes necessary to adapt the instrument for cultural purposes, even in nations such as the United States that use English as the primary language of instruction. For example, words such as "lift" might be adapted to "elevator" for the United States. The PISA consortium verified the national adaptation of all instruments. Electronic copies of printed materials were sent to the PISA consortium for a final visual check prior to data collection.

The PISA consortium emphasized the use of standardized procedures in all education systems. Each education system collected its own data, based on a manual provided by the PISA consortium (ACER 2011) that explained the survey's implementation, including precise instructions for the work of school coordinators and scripts for test administrators to use in testing sessions. Test administration in the United States was conducted by professional staff trained in accordance with the international guidelines. Students were allowed to use calculators, and U.S. students were provided with calculators.

In a sample of schools in each education system, a PISA Quality Monitor (PQM) who was engaged by the PISA consortium observed test administrations. The sample schools were selected jointly by the PISA consortium and the PQM. In the United States, there were two PQMs who each observed seven schools from the national and state samples. The PQM’s primary responsibility was to document the extent to which testing procedures in schools were implemented in accordance with test administration procedures. The PQM’s observations in U.S. schools indicated that international procedures for data collection were applied consistently.