The 2015 assessment instruments were developed by international experts and PISA consortium test developers and included items submitted by participating education systems. All mathematics and reading items in the 2015 assessment instrument were trend items from previous assessments. Science items included both trend items and new items developed for 2015. Items were reviewed for possible bias and relevance to PISA's goals by representatives of each education system and by the PISA subject-matter expert groups. To further examine potential biases and design issues in the PISA assessment, all participating education systems field-tested the assessment items in spring 2014. After the field trial, items that did not meet the established measurement criteria or were otherwise found to include intrinsic biases were dropped from the main assessment.
The field trial also served as a mode effect study for the transition from paper-based to computer-based assessments. It evaluated the efficiency, accuracy, and effectiveness of the computer platform in capturing information, as well as the psychometric properties of the items in both modes. The mode effect study found little to no overall systematic difference between computer-based and paper-based student achievement results at the country level. Although some differences between the computer-based and paper-based results were found among students who used computers infrequently or not at all, this group accounted for only 10 percent of students. Overall, the mode effect portion of the field trial found that, across countries, paper-based items were comparable to their computer-based counterparts and that an item's level of difficulty stayed largely the same between the two modes (OECD forthcoming). This finding gave the test designers reasonable assurance that they could compare scores from prior cycles with the 2015 cycle and place scores derived from the paper-based version of PISA on the same scale as scores derived from the computer-based version.
The final 2015 main study computer-based assessment included six clusters from each of the trend domains of science, reading, and mathematics literacy; six clusters of new science literacy test items; and three clusters of new collaborative problem solving materials. The clusters were allocated in a rotated design using 66 test forms arranged in six groups. According to the design, 33 percent of students within each school were assigned to one of 12 science literacy and reading literacy test forms; 33 percent were assigned to one of 12 science literacy and mathematics literacy test forms; 22 percent were assigned to one of six science literacy and collaborative problem solving test forms; 4 percent were assigned to one of 12 science literacy, mathematics literacy, and collaborative problem solving test forms; 4 percent were assigned to one of 12 science literacy, reading literacy, and collaborative problem solving test forms; and 4 percent were assigned to one of 12 science literacy, reading literacy, and mathematics literacy test forms. Every student taking the assessment answered science items, but not all students answered mathematics literacy, reading literacy, or collaborative problem solving items.
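The rotated allocation above can be sketched as a weighted draw over the six form groups. This is an illustrative reconstruction only: the group shares (33/33/22/4/4/4 percent) and form counts come from the text, while the group labels and the assignment function are hypothetical, not PISA's actual operational procedure.

```python
import random

# Each entry: (form-group label, target share of students, number of forms).
# Shares and form counts are from the design described in the text;
# labels are illustrative ("CPS" = collaborative problem solving).
FORM_GROUPS = [
    ("science + reading",                       0.33, 12),
    ("science + mathematics",                   0.33, 12),
    ("science + CPS",                           0.22,  6),
    ("science + mathematics + CPS",             0.04, 12),
    ("science + reading + CPS",                 0.04, 12),
    ("science + reading + mathematics",         0.04, 12),
]

# Sanity checks: the shares cover all students, and the design uses 66 forms.
assert abs(sum(share for _, share, _ in FORM_GROUPS) - 1.0) < 1e-9
assert sum(n_forms for _, _, n_forms in FORM_GROUPS) == 66

def assign_form(rng):
    """Draw a form group by its target share, then a form within the group."""
    name, _, n_forms = rng.choices(
        FORM_GROUPS, weights=[share for _, share, _ in FORM_GROUPS]
    )[0]
    return name, rng.randrange(1, n_forms + 1)

# Every assigned form includes science, matching the design's key property.
rng = random.Random(2015)
for _ in range(1000):
    group, form = assign_form(rng)
    assert "science" in group
```

The checks make the design's arithmetic explicit: the six shares sum to 100 percent, the form counts sum to 66, and science appears in every group, which is why all students answered science items while the other domains rotated.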
Approximately 65 percent of science items were multiple choice and 35 percent were open response. For reading and mathematics, approximately 40 percent of items were multiple choice and 60 percent were open response. Open-response items were graded by trained scorers.
For education systems administering the paper-based version of PISA, as in the case of Puerto Rico, the assessment included six clusters from each of the trend domains of science, reading, and mathematics literacy only. The clusters were allocated in a rotated design to create three groups of test booklets. According to the design, 44 percent of students within each school were assigned to one of 12 science literacy and reading literacy test booklets; 44 percent were assigned to one of 12 science literacy and mathematics literacy test booklets; and 12 percent were assigned to one of six science literacy, reading literacy, and mathematics literacy test booklets. Every student taking the paper-based assessment answered science items, but not all students answered mathematics literacy and/or reading literacy items.
After the cognitive assessment, students also completed a 30-minute questionnaire designed to provide information about their backgrounds, attitudes, and experiences in school. Principals in schools where PISA was administered also completed a 30-minute questionnaire designed to provide information on their school's structure, resources, instruction, climate, and policies.
In addition, for the U.S. PISA 2015 national school sample, Massachusetts school sample, and North Carolina school sample, a sample of teachers within each school was selected to complete a 30-minute computer-based questionnaire. (Puerto Rico did not administer the teacher questionnaire.) The questionnaire was designed to provide information on teachers' backgrounds, education and professional development, and teaching practices. Ten science teachers and fifteen non-science teachers eligible to teach the modal grade (10th grade in the United States) were sampled in each school. As with the main assessment items, student, school, and teacher questionnaire items that did not meet the established measurement criteria or were otherwise found to include intrinsic biases in the field trial were dropped from the main assessment.
Source versions of all instruments (the assessment booklets, questionnaires, and operations manuals) were prepared in English and French and translated into the primary language or languages of instruction in each education system. The PISA consortium recommended a double translation design, which entailed two independent translations, one from each of the source languages (English and French), followed by reconciliation by a third party. The consortium also provided precise translation guidelines that included a description of the features each item was measuring and statistical analysis from the field trial. When double translation was not possible, single translation was accepted. In addition, the PISA consortium verified the instrument translation when more than 10 percent of an education system's PISA population used a national language that was neither French nor English.
Instrument adaptation was necessary even in nations such as the United States that use English as the primary language of instruction. These adaptations were primarily for cultural purposes. For example, words such as "lift" might be adapted to "elevator" for the United States. The PISA consortium verified and approved the national adaptation of all instruments, including that of the United States.
The PISA consortium emphasized the use of standardized procedures in all education systems. Each education system collected its own data, based on detailed manuals provided by the PISA consortium (Westat 2014) that explained the survey's implementation, including precise instructions for the work of school coordinators and test administrators and scripts for test administrators to use in testing sessions. Test administration in the United States was conducted by professional staff trained in accordance with the international guidelines. Students were allowed to use calculators, and U.S. students were provided with calculators.
In each education system, a PISA Quality Monitor (PQM) who was engaged independently by the PISA consortium observed test administrations in a subsample of participating schools. The schools in which the independent observations were conducted were selected jointly by the PISA consortium and the PQM. In the United States, five PQMs observed 15 schools from the national sample and 5 schools from each of the two U.S. states and Puerto Rico. A PQM's primary responsibility was to document the extent to which testing procedures in schools were implemented in accordance with test administration procedures. The PQMs' observations in U.S. schools indicated that international procedures for data collection were applied consistently.