Technical Notes: A.8 Comparing results from PISA 2000, 2003, and 2006
The PISA 2000, 2003, and 2006 assessments of reading, mathematics, and science are linked assessments. That is, the sets of items used to assess reading, mathematics, and science in PISA 2000, 2003, and 2006 include a subset of common items. For example, 20 common mathematics items were used in both PISA 2000 and PISA 2003. To establish common reporting metrics for PISA, the difficulty of each link item is measured on different occasions and compared. Using procedures that are detailed in the PISA 2006 Technical Report (OECD 2008), the change in the difficulty of each of the individual link items is used to determine a score transformation that allows the reporting of the data on a common scale. As each item provides slightly different information about the link transformation, it follows that the chosen sample of link items will influence the estimated transformation. Thus, if an alternative set of link items had been chosen, the resulting transformation would be slightly different. The consequence is an uncertainty in the transformation due to the sampling of the link items, just as there is uncertainty in values such as country means due to the sampling of students.
The uncertainty that results from the sampling of link items is referred to as linking error, and this error must be taken into account when making certain comparisons between PISA 2000, 2003, and 2006 results.43 Just as with the error that is introduced through the process of sampling students, the exact magnitude of this linking error can only be estimated. As with sampling errors, the likely range of magnitude for this error is represented as a standard error, referred to as the standard error of linking.
When comparing two country means from PISA taken at different times (e.g., 2000 and 2003), the calculation of the standard error of the difference includes the standard errors of the two individual scores in addition to the linking error, making the resulting test of statistical significance more conservative than if there were no linking error. For example, to calculate the standard error of the difference between scores obtained for a country in 2000 and 2003, the following formula is applied, where $SE_{2000}$ and $SE_{2003}$ represent the standard errors for the results of PISA 2000 and PISA 2003, respectively, and $LE_{2000,2003}$ represents the linking error between PISA 2000 and PISA 2003:

$$SE_{dif} = \sqrt{SE_{2000}^{2} + SE_{2003}^{2} + LE_{2000,2003}^{2}}$$
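The combination of sampling and linking errors described above can be sketched in a few lines of code. This is a minimal illustration, not part of the PISA methodology documents; the function name and the numeric values are hypothetical.

```python
import math

def se_of_difference(se_2000, se_2003, linking_error):
    """Standard error of a score-point difference across two PISA cycles:
    the two cycle-specific sampling standard errors and the linking error
    are combined in quadrature (root sum of squares)."""
    return math.sqrt(se_2000**2 + se_2003**2 + linking_error**2)

# Hypothetical values: sampling standard errors of 3.5 and 3.0 score
# points for the two cycles, and a linking error of 3.7 score points.
se_dif = se_of_difference(3.5, 3.0, 3.7)
```

Because the three error terms enter as squares under the square root, the combined standard error is always at least as large as the largest single component, which is why tests that include the linking error are more conservative.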
Because linking error should be taken into account when comparing means from different PISA assessment cycles, the results of simple t-tests that do not include the linking error will differ from the results published in the official PISA reports and in this special analysis. For example, without adjusting for linking error, significance tests comparing reading literacy scores between PISA 2000 and PISA 2003 indicate that 15 jurisdictions measurably changed. After adjusting for linking error, however, only 9 jurisdictions are shown to have measurably changed at the .05 level of significance.
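The effect described above, in which a change that looks significant under a naive test is no longer significant once the linking error is included, can be demonstrated with a simple two-sided test against the .05 critical value. The score change and standard errors here are hypothetical, chosen only to show the flip; they are not taken from the PISA data.

```python
import math

def is_significant(diff, se_a, se_b, linking_error=0.0, critical=1.96):
    """Two-sided test at the .05 level: compare |diff| to 1.96 times its
    standard error, optionally folding the linking error into that
    standard error."""
    se_diff = math.sqrt(se_a**2 + se_b**2 + linking_error**2)
    return abs(diff) / se_diff > critical

# Hypothetical 10-point score change with sampling standard errors of
# 3.5 and 3.0, and an assumed linking error of 3.7 score points.
naive = is_significant(10, 3.5, 3.0)                      # linking error ignored
adjusted = is_significant(10, 3.5, 3.0, linking_error=3.7)
```

With these illustrative numbers the naive test declares the change significant while the adjusted test does not, mirroring the drop from 15 to 9 jurisdictions noted above.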
PISA tests of significance in 2000
Results from PISA 2000 summarized in this special analysis have been updated from what was presented in the PISA 2000 U.S. report (Lemke et al. 2001). Some country differences discussed in this report were not reported as statistically significant in the PISA 2000 U.S. report. In that report, a Bonferroni adjustment was used in all multiple comparisons of countries. No such adjustment was used when the PISA 2003 and 2006 data were analyzed and reported, which makes results from the PISA 2000 U.S. report difficult to compare with results from the PISA 2003 and 2006 U.S. reports. The use of the Bonferroni adjustment for multiple comparisons was discontinued in order to avoid the possibility that comparisons of achievement between countries could be interpreted differently depending on the number of countries compared.
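The dependence on the number of countries compared, which motivated discontinuing the adjustment, is easy to see from the Bonferroni rule itself: the nominal significance level is divided by the number of comparisons, so the per-comparison threshold tightens as more countries are included. A minimal sketch (the function name is illustrative, not from the PISA reports):

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni adjustment:
    the family-wise level alpha is divided by the number of comparisons."""
    return alpha / n_comparisons

# The same pair of countries is held to a stricter per-comparison
# threshold when 30 countries are compared than when 10 are.
alpha_10 = bonferroni_alpha(0.05, 10)   # .005 per comparison
alpha_30 = bonferroni_alpha(0.05, 30)   # roughly .0017 per comparison
```

This is the interpretive problem noted above: whether a given country-to-country difference is flagged as significant would depend on how many other countries happened to be in the comparison set.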
43 Because PIRLS and TIMSS are designed differently, there is no need to account for linking error.