Methodology and Technical Notes - Data Limitations


As with any study, there are limitations to PISA 2012 that should be taken into consideration. Estimates produced using data from PISA 2012 are subject to two types of error: nonsampling errors and sampling errors.

Nonsampling error is a term used to describe variations in the estimates that may be caused by population coverage limitations, nonresponse bias, and measurement error, as well as data collection, processing, and reporting procedures. For example, suppose the study was unsuccessful in getting permission from many rural schools in a certain region of the country. In that case, reports of means for rural schools for that region may be biased. Fortunately, such a coverage problem did not occur in PISA in the United States. The sources of nonsampling errors are typically problems such as unit and item nonresponse, the differences in respondents’ interpretations of the meaning of survey questions, and mistakes in data preparation.

Sampling errors arise when a sample of the population, rather than the whole population, is used to estimate some statistic. Different samples from the same population would likely produce somewhat different estimates of the statistic in question. This fact means that there is a degree of uncertainty associated with statistics estimated from a sample. This uncertainty is referred to as sampling variance and is usually expressed as the standard error of a statistic estimated from sample data. The approach used for calculating standard errors in PISA was the Fay method of balanced repeated replication (BRR) (Judkins 1990). This method of producing standard errors uses information about the sample design to produce more accurate standard errors than would be produced using simple random sample assumptions.
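The replication idea can be illustrated with a minimal sketch. It assumes the statistic has already been computed once with the full-sample weights and once with each set of replicate weights. The function name, the default Fay factor of 0.5, and the example numbers are illustrative assumptions and are not taken from this section; the factor and the number of replicates should be checked against the technical documentation for the data file in use.

```python
import math


def fay_brr_standard_error(full_estimate, replicate_estimates, fay_factor=0.5):
    """Standard error of a statistic under Fay's variant of BRR.

    full_estimate       -- statistic computed with the full-sample weights
    replicate_estimates -- the same statistic computed once per replicate weight
    fay_factor          -- Fay perturbation factor k, 0 <= k < 1 (assumed 0.5 here)
    """
    n_replicates = len(replicate_estimates)
    if n_replicates == 0:
        raise ValueError("at least one replicate estimate is required")
    if not 0 <= fay_factor < 1:
        raise ValueError("fay_factor must satisfy 0 <= k < 1")

    # Sum of squared deviations of each replicate estimate from the full estimate.
    sum_sq = sum((r - full_estimate) ** 2 for r in replicate_estimates)

    # Fay's adjustment divides by G * (1 - k)^2, where G is the replicate count.
    variance = sum_sq / (n_replicates * (1 - fay_factor) ** 2)
    return math.sqrt(variance)


# Hypothetical numbers for illustration only (not PISA results).
example_se = fay_brr_standard_error(500.0, [498.0, 502.0, 498.0, 502.0])
print(example_se)
```

Because the replicate weights are built from the sample design, the spread of the replicate estimates carries the effect of stratification and clustering that a simple random sample formula would ignore.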

Standard errors can be used as a measure of the precision expected from a particular sample: the smaller the standard error, the more precise the estimate.

Confidence intervals provide a way to make inferences about population statistics in a manner that reflects the sampling error associated with the statistic. Assuming a normal distribution, a 95 percent confidence interval extends approximately 1.96 standard errors on either side of the sample estimate. The population value of the statistic can be inferred to lie within such an interval in 95 out of 100 replications of the measurement on different samples drawn from the same population.
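As a rough sketch of that calculation, the interval can be computed from an estimate and its standard error under the normal approximation. The function name and the input values are hypothetical and chosen only for illustration; they are not PISA results.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Return (lower, upper) bounds of a normal-approximation confidence interval."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")

    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical estimate of 500 with a standard error of 4.
lower, upper = confidence_interval(500.0, 4.0)
print(round(lower, 2), round(upper, 2))
```

A wider interval results from either a larger standard error or a higher confidence level.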