In PIRLS 2016, results are generally reported in two ways: scale scores and international benchmarks of achievement. Scores on PIRLS for each administration since 2001 have been scaled to range from 0 to 1,000, with an international centerpoint of 500 and a standard deviation of 100. By centering results in this manner, comparisons can be made from 2001 to 2006, to 2011, and now to 2016. The ePIRLS scale also ranges from 0 to 1,000, with an international centerpoint of 500 and a standard deviation of 100. This scaling facilitates comparisons across education systems of overall printed reading achievement and overall online reading achievement.
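As a rough illustration of what a scale with a centerpoint of 500 and a standard deviation of 100 means, the sketch below linearly maps a value onto that metric. It is a simplification for intuition only, not the actual PIRLS scaling procedure; the function name `to_reporting_scale` and the inputs `ref_mean` and `ref_sd` (the mean and standard deviation of some reference distribution) are hypothetical.

```python
def to_reporting_scale(value, ref_mean, ref_sd, center=500.0, spread=100.0):
    """Linearly map `value` onto a scale on which the reference
    distribution has mean `center` and standard deviation `spread`.

    Illustrative assumption: a plain linear transformation; the
    operational PIRLS scaling is described in the technical notes.
    """
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    z = (value - ref_mean) / ref_sd  # distance from the reference mean, in SD units
    return center + spread * z


# A value one reference standard deviation above the reference mean
print(to_reporting_scale(1.0, ref_mean=0.0, ref_sd=1.0))  # 600.0
```

On this metric, a 50-point difference corresponds to half a standard deviation of the reference distribution.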
Along with scale scores, PIRLS has international benchmarks that group achievement into four levels: Advanced, High, Intermediate, and Low. The distribution of student scores, together with the kinds of skills and knowledge that students demonstrate, determines the score cut-points for these benchmarks. The benchmarks offer an interpretation of what the scale scores mean. The items that students at each benchmark are likely to answer correctly are identified; experts then examine these items to describe what students know and can do. See exhibit 2 for a description of each of the PIRLS benchmarks, and exhibit 3 for a description of the ePIRLS benchmarks. Examples of items at each benchmark level are provided at https://timssandpirls.bc.edu/pirls2016/frameworks.html.
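The way cut-points turn a scale score into a benchmark level can be sketched as follows. The cut-point values used here (625, 550, 475, and 400) are not stated in this section; they are assumed from the published PIRLS benchmark definitions and should be checked against exhibits 2 and 3. The function name `highest_benchmark` is hypothetical.

```python
# Assumed cut-points (not given in this section); highest level first.
BENCHMARKS = [
    (625, "Advanced"),
    (550, "High"),
    (475, "Intermediate"),
    (400, "Low"),
]


def highest_benchmark(score):
    """Return the highest benchmark whose cut-point the score reaches,
    or None if the score is below the lowest cut-point."""
    for cut_point, label in BENCHMARKS:
        if score >= cut_point:
            return label
    return None


print(highest_benchmark(560))  # High
print(highest_benchmark(390))  # None
```

Because the levels are cumulative, a student who reaches the High benchmark has also reached the Intermediate and Low benchmarks.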
The comparisons presented in The PIRLS and ePIRLS Results from 2016 pages and in the PIRLS and ePIRLS 2016 First Look report have been tested for statistical significance. For example, in the commonly made comparison of international averages to U.S. averages, tests of statistical significance were used to establish whether the observed differences from the U.S. average were statistically significant. In all instances, the tests for significance used were standard t tests. A difference is "significant" if the probability associated with the t test is less than .05. A significant result indicates that the difference observed in the sample is unlikely to be due to sampling error alone and likely reflects a real difference in the population. No adjustments were made for multiple comparisons.
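A comparison of two averages of this kind can be sketched as below. This is a minimal illustration under stated assumptions, not the exact procedure used for the report: it treats the two estimates as independent, takes their standard errors as given, and uses a large-sample normal approximation for the two-sided probability. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def t_statistic(est_a, se_a, est_b, se_b):
    """Difference between two estimates divided by the standard error
    of the difference (assumes the estimates are independent)."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def two_sided_p(t):
    """Two-sided probability under a standard normal approximation,
    which is reasonable only for large samples (an assumption here)."""
    return math.erfc(abs(t) / math.sqrt(2))


def is_significant(est_a, se_a, est_b, se_b, alpha=0.05):
    """True if the two-sided probability is below alpha."""
    return two_sided_p(t_statistic(est_a, se_a, est_b, se_b)) < alpha


# Hypothetical averages and standard errors, for illustration only
print(t_statistic(520, 3.0, 510, 4.0))     # 2.0
print(is_significant(520, 3.0, 510, 4.0))  # True
print(is_significant(512, 3.0, 510, 4.0))  # False
```

Because no adjustment is made for multiple comparisons, each comparison is judged against the .05 threshold on its own.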
For additional information on scaling, reporting, and statistical procedures, see the Methodology and Technical Notes.