Comparisons made in the text of this report have been tested for statistical significance. For example, in the commonly made comparison of OECD averages to U.S. averages, tests of statistical significance were used to establish whether the observed differences from the U.S. average were statistically significant.

In almost all instances, the tests for significance used were standard *t* tests. These fell into three categories according to the nature of the comparison being made: comparisons of independent samples, comparisons of nonindependent samples, and comparisons of performance over time. In PISA, education system groups are independent. We judge a difference to be "significant" if the probability associated with the *t* test is less than .05. A significant test implies that the difference in the observed sample means reflects a real difference in the population.^{6} No adjustments were made for multiple comparisons.

In simple comparisons of independent averages, such as the average score of education system 1 with that of education system 2, the following formula was used to compute the *t* statistic:

*t* = (*est*_{1} – *est*_{2}) / SQRT[*se*_{1}^{2} + *se*_{2}^{2}]

where *est*_{1} and *est*_{2} are the estimates being compared (e.g., the two system means) and *se*_{1} and *se*_{2} are their corresponding standard errors.
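As a sketch, the independent-samples statistic *t* = (*est*_{1} – *est*_{2}) / SQRT[*se*_{1}^{2} + *se*_{2}^{2}] can be computed directly; the means and standard errors below are invented for illustration and are not from the report:

```python
import math

def t_independent(est1, se1, est2, se2):
    """t statistic for the difference between two independent estimates."""
    return (est1 - est2) / math.sqrt(se1**2 + se2**2)

# Hypothetical example: system 1 mean 505 (se 3.0), system 2 mean 496 (se 2.5)
t = t_independent(505, 3.0, 496, 2.5)
print(round(t, 3))  # 2.305; |t| > 1.96 corresponds to p < .05 (two-sided)
```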

The second type of comparison occurs when evaluating differences between nonindependent groups within the education system. Because of the sampling design in which schools and students within schools are randomly sampled, the data within the education system from mutually exclusive sets of students (for example, males and females) are not independent. For example, to determine whether the performance of females differs from that of males would require estimating the correlation between females' and males' scores. A BRR procedure, mentioned above, was used to estimate the standard errors of differences between nonindependent samples within the United States. Use of the BRR procedure implicitly accounts for the correlation between groups when calculating the standard errors.

To test comparisons between nonindependent groups the following *t* statistic formula was used:

*t* = (*est*_{grp1} – *est*_{grp2}) / *se*(*est*_{grp1} – *est*_{grp2})

where *est*_{grp1} and *est*_{grp2} are the nonindependent group estimates being compared and *se*(*est*_{grp1} – *est*_{grp2}) is the BRR-estimated standard error of their difference.
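A minimal sketch of the BRR variance computation is below. It assumes the 80 Fay-adjusted replicate weights (Fay factor *k* = 0.5) that PISA provides; the full-sample gap and the replicate gaps are invented for illustration:

```python
import math

FAY_K = 0.5          # Fay adjustment factor (assumed, per PISA's BRR design)
N_REPLICATES = 80    # number of replicate weights provided in PISA

def brr_se(full_est, replicate_ests, fay_k=FAY_K):
    """BRR standard error: sampling variance from replicate estimates."""
    g = len(replicate_ests)
    var = sum((r - full_est) ** 2 for r in replicate_ests) / (g * (1 - fay_k) ** 2)
    return math.sqrt(var)

# Hypothetical: a full-sample female-male gap of 12.0 and 80 replicate gaps
replicate_gaps = [12.0 + ((i % 5) - 2) * 0.8 for i in range(N_REPLICATES)]
print(round(brr_se(12.0, replicate_gaps), 3))  # 2.263
```

Because each replicate re-estimates the gap from a half-sample perturbation, the variability across replicates captures the correlation between the two groups automatically, which is why no explicit covariance term appears.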

A third type of comparison—the addition of a standard error term to the standard
*t* test shown above for simple comparisons of independent averages—was
also used when analyzing change in performance over time. The transformation that
was performed to equate the 2015 data with previous data depends upon the change
in difficulty of each of the individual link items and as a consequence the sample
of link items that have been chosen will influence the choice of transformation.
This means that if an alternative set of link items had been chosen the resulting
transformation would be slightly different. The consequence is an uncertainty in
the transformation due to the sampling of the link items, just as there is an uncertainty
in values such as country means due to the use of a sample of students. This uncertainty
that results from the link item sampling is referred to as "linking error," and
this error must be taken into account when making certain comparisons between previous
rounds of PISA (2003, 2006, 2009, and 2012) and PISA 2015 results. Just as with
the error that is introduced through the process of sampling students, the exact
magnitude of this linking error cannot be determined. We can, however, estimate
the likely range of magnitudes for this error and take this error into account when
interpreting PISA results. As with sampling errors, the likely range of magnitude
for the errors is represented as a standard error. The standard errors of linking
for the various PISA rounds and subjects are:

| Comparison | Mathematics | Reading | Science |
|---|---|---|---|
| 2003–2015 | 5.6080 | 5.3907 | † |
| 2006–2015 | 3.5111 | 6.6064 | 4.4821 |
| 2009–2015 | 3.7853 | 3.4301 | 4.5016 |
| 2012–2015 | 3.5462 | 5.2535 | 3.9228 |

† Not applicable. Science trend comparisons can only be made as far back as 2006 due to a change in the framework.

In PISA, in each of the three subject matter areas, a common transformation was estimated from the link items, and this transformation was applied to all participating education systems when comparing achievement scores over time. It follows that any uncertainty introduced through the linking is common to all students and all education systems. Thus, for example, suppose the unknown linking error (between PISA 2012 and PISA 2015) in reading literacy resulted in an over-estimation of student scores by five and one-fourth points on the PISA 2012 scale. It follows that every student's score will be over-estimated by five and one-fourth score points. This over-estimation will have effects on certain, but not all, summary statistics computed from the PISA 2015 data. For example, consider the following:

- each education system's mean will be over-estimated by an amount equal to the link error (in our example this is five and one-fourth score points);
- the mean performance of any student subgroup will be over-estimated by an amount equal to the link error (in our example this is five and one-fourth score points);
- the standard deviation of student scores will not be affected because the overestimation of each student by a common error does not change the standard deviation;
- the difference between the mean scores of two education systems in PISA 2015 will not be influenced because the over-estimation of each student by a common error will have distorted each system's mean by the same amount;
- the difference between the mean scores of two student groups (e.g., males and females) in PISA 2015 will not be influenced because the over-estimation of each student by a common error will have distorted each group's mean by the same amount;
- the difference between the performance of a group of students (e.g., an education system) between PISA 2012 and PISA 2015 will be influenced, because every PISA 2015 score, and hence the 2015 group mean, is shifted by the common linking error while the 2012 scores are not; and
- a change in the difference in performance between two groups from PISA 2012 to PISA 2015 will not be influenced. This is because neither of the components of this comparison, which are differences in scores in 2012 and 2015 respectively, is influenced by a common error that is added to all student scores in PISA 2015.
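The bulleted properties above can be checked numerically. The scores below are invented for illustration, and the 5.25-point shift echoes the five-and-one-fourth-point example in the text:

```python
import statistics

scores_2015 = [480.0, 495.0, 510.0, 520.0]  # hypothetical student scores
link_error = 5.25                            # the common shift from the example

shifted = [s + link_error for s in scores_2015]

# The mean shifts by exactly the link error...
print(statistics.mean(shifted) - statistics.mean(scores_2015))  # 5.25
# ...but the standard deviation is unchanged...
print(statistics.stdev(shifted) == statistics.stdev(scores_2015))  # True
# ...and within-year differences between students (or groups) are unchanged.
print((shifted[1] - shifted[0]) == (scores_2015[1] - scores_2015[0]))  # True
```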

In general terms, the linking error need only be considered when comparisons are being made between PISA 2012 and PISA 2015 results, and then usually only when group means are being compared. Because the linking error need only be used in a limited range of situations, we have chosen not to report the linking error in the tables included in this report. The general formula for comparing a mean from a previous round with the corresponding PISA 2015 mean is:

*t* = (*est*_{2015} – *est*_{prev}) / SQRT[*se*_{2015}^{2} + *se*_{prev}^{2} + *LE*^{2}]

where *LE* is the standard error of linking for the relevant comparison and subject.

The most obvious example of a situation where there is a need to use linking error is in the comparison of the mean performance for a single education system between PISA 2012 and PISA 2015. For example, let us consider a comparison between 2012 and 2015 of the performance of the United States in reading. The mean performance of the United States in 2012 was 498 with a standard error of 3.7, while in 2015 the mean was 497 with a standard error of 3.4. Using rounded mean values, the standardized difference in the U.S. means is 0.138, which is computed as follows:

0.138 = (498 – 497) / SQRT[3.7^{2} + 3.4^{2} + 5.2535^{2}]

and is not statistically significant.
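This worked example can be reproduced in code, using the 2012–2015 reading linking error of 5.2535 from the table above with the U.S. means and standard errors given in the text:

```python
import math

def t_trend(est_new, se_new, est_old, se_old, link_error):
    """t statistic for a change over time, with linking error in the denominator."""
    return (est_new - est_old) / math.sqrt(se_new**2 + se_old**2 + link_error**2)

# U.S. reading: 497 (se 3.4) in 2015 vs. 498 (se 3.7) in 2012
t = t_trend(497, 3.4, 498, 3.7, 5.2535)
print(round(abs(t), 3))  # 0.138, well below the ~1.96 needed for p < .05
```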

^{6} A .05 probability implies that the *t* statistic is among the 5 percent most extreme values one would expect if there were no difference between the means. The decision rule is that when *t* statistics are this extreme, we conclude they were sampled from a population in which there is a real difference between the means.