- Executive Summary
- Introduction
- How Do U.S. Students Compare With Their Peers in Other Countries?
- Focus Points
- Summary
- List of Tables
- List of Figures
- References
- Appendix A: Technical Notes
- A.1 Limitations of sampled data
- A.2 International requirements for sampling, data collection, and response rates
- A.3 Test development
- A.4 Scoring
- A.5 Data entry and cleaning
- A.6 Weighting and scaling
- A.7 Cutpoint scores and achievement levels
- A.8 Comparing results from PISA 2000, 2003, and 2006
- A.9 Comparing results from TIMSS 1995 and 1999
- A.10 Confidentiality and disclosure limitations
- A.11 Nonresponse bias analysis
- A.12 State participation in international assessments

Technical Notes: A.1 Limitations of sampled data

Estimating the achievement of a total population or of subpopulations from an assessment administered to only a sample of that population requires consideration of several factors before the results become meaningful. However conscientious an organization may be in collecting assessment data from a sample of a population, there will always be the possibility of *nonsampling errors* (errors made in the collection and processing of data) and *sampling errors* (the margin of error in estimating the achievement of the actual total population or subpopulation because data are available from only a portion of the total population).

**Nonsampling errors**

"Nonsampling error" is a term used to describe variations in the estimates that may be caused by population coverage limitations, nonresponse bias, and measurement error, as well as data collection, processing, and reporting procedures. The sources of nonsampling errors are typically problems such as unit and item nonresponse, the differences in respondents' interpretations of the meaning of questions, response differences related to the particular time the assessment was conducted, and mistakes in data preparation. Sections A.2 through A.5 describe the international policies and procedures put in place to minimize nonsampling errors. Section A.11 describes NCES's policy of nonresponse bias analysis.

**Sampling errors**

Sampling errors occur when a discrepancy between a population characteristic and the sample estimate arises because not all members of the target population are sampled for the survey. The margin of error or the magnitude of sampling error depends on several factors, such as the amount of variation in the responses, the size and representativeness of the sample, and the size of the subgroup for which the estimate is computed. The magnitude of this margin of error is measured by what statisticians call the *standard error* of an estimate.
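The dependence of the standard error on response variation and sample size can be illustrated with a minimal sketch. It assumes simple random sampling, where the standard error of a mean is the sample standard deviation divided by the square root of the sample size. That assumption is for illustration only: the assessments discussed here use complex sample designs, and this sketch does not attempt the design-based variance estimation they require. The function name and the scores are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(scores):
    """Standard error of the mean under simple random sampling: s / sqrt(n).

    Illustrative only; not the variance estimator used for complex samples.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(scores) / math.sqrt(n)


# Hypothetical scores, invented for this example.
small_sample = [480, 520, 500, 460, 540]
# Same spread of scores, four times as many observations.
large_sample = small_sample * 4

se_small = standard_error_of_mean(small_sample)
se_large = standard_error_of_mean(large_sample)

print(f"n={len(small_sample)}: SE = {se_small:.2f}")
print(f"n={len(large_sample)}: SE = {se_large:.2f}")
```

With the spread of scores held roughly constant, the larger sample yields the smaller standard error, which is the relationship the paragraph above describes.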

A standard error was calculated for each estimate in this special analysis in order to determine its "margin of error." An estimate with a smaller standard error provides a more reliable estimate of the true value than an estimate with a larger standard error. The standard errors for all the estimated average scores, cutpoint scores, and percentages reported in the figures and tables of the special analysis can be found on *The Condition of Education* website at http://nces.ed.gov/programs/coe.
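One common way to turn a standard error into a margin of error is an approximate 95 percent confidence interval, the estimate plus or minus 1.96 standard errors. The sketch below assumes a normal approximation. The function name and the numbers are hypothetical and are not taken from the special analysis.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Approximate confidence interval: estimate +/- z * standard error.

    z = 1.96 corresponds to roughly 95 percent coverage under a normal
    approximation (an assumption of this sketch).
    """
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical average score of 500 with a standard error of 3.
low, high = confidence_interval(500.0, 3.0)
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
```

A smaller standard error produces a narrower interval around the same estimate.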

**Analysis and interpretation**

Because every estimate carries a standard error, caution is warranted when comparing the achievement results estimated for one population with those of another, or when judging whether a time series of achievement results is increasing, decreasing, or staying about the same. Although one estimate may be larger than another, a statistical test may reveal that there is no measurable difference between the two once their uncertainty is taken into account. Whether differences in averages (means) or cutpoint scores are statistically significant can be determined by using the standard errors of the estimates. When a difference is statistically significant, the probability that it occurred by chance is usually small (about 5 times out of 100). For this special analysis, differences between means or cutpoint scores (including increases or decreases) are stated only when they are statistically significant. To determine whether reported differences are statistically significant, two-tailed t tests at the .05 level of significance were used. In addition, the t test formula was adjusted when a linking error term needed to be accounted for (see section A.8 for more on linking errors). No adjustments for multiple comparisons (such as Bonferroni adjustments) were used in this special analysis (see sections A.8 and A.9 for more on past significance tests).
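The comparison described above can be sketched as follows. This section does not give the formula, so the sketch makes three assumptions: the two estimates are independent, the standard error of their difference is the square root of the sum of the squared standard errors, and a linking error, when present, is added as one more squared term under the square root. It also uses 1.96 as the large-sample critical value for a two-tailed test at the .05 level. The function name and all numbers are hypothetical.

```python
import math


def compare_estimates(est_a, se_a, est_b, se_b,
                      linking_error=0.0, critical_value=1.96):
    """Two-tailed test of the difference between two independent estimates.

    Assumes the standard error of the difference is
    sqrt(se_a^2 + se_b^2 + linking_error^2); the linking error term is
    zero unless the estimates come from different assessment cycles.
    Returns the t statistic and whether |t| exceeds the critical value.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 + linking_error ** 2)
    t = (est_a - est_b) / se_diff
    return t, abs(t) > critical_value


# Hypothetical averages of 500 (SE 3) and 490 (SE 4), no linking error.
t_plain, sig_plain = compare_estimates(500.0, 3.0, 490.0, 4.0)

# Same estimates, with a hypothetical linking error of 5 score points.
t_linked, sig_linked = compare_estimates(500.0, 3.0, 490.0, 4.0,
                                         linking_error=5.0)

print(f"without linking error: t = {t_plain:.2f}, significant = {sig_plain}")
print(f"with linking error:    t = {t_linked:.2f}, significant = {sig_linked}")
```

In this invented example the same 10-point difference is significant without the linking error term but not with it. This shows how one estimate can be larger than another without the difference being measurable.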