# Statistical Significance and Sample Size

When the National Center for Education Statistics (NCES) reports differences in results, the reported differences reflect statistical significance. Understanding statistical significance in large-scale assessments, how results are estimated, and how sample size influences them is important when interpreting NAEP data in The Nation's Report Card. This guide explains NAEP results and the use of statistical significance in NAEP data.

## Influence of Sample Size

### Sampling Variance

Like any survey based on a sample, NAEP results are subject to uncertainty. This uncertainty is reflected by the standard error of NAEP estimates; the more precise the estimate, the smaller the standard error.

The first source of uncertainty arises from the fact that NAEP assesses only a sample of students, rather than every eligible student (a census). The sample consists of a number of randomly selected students. Carefully constructed surveys can yield very precise estimates of population quantities. But a different, equally good sample of students could have been selected, and the results based on that second sample would be slightly different. Thus, the first component of the standard error is due to the sampling of students, termed "sampling variance."
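The idea that two equally good samples give slightly different results can be illustrated with a small simulation. This is only a sketch: the population below is synthetic (normally distributed scores with an assumed mean of 250 and standard deviation of 35, neither taken from NAEP), and the helper `sample_mean` is a hypothetical name introduced for illustration.

```python
import random
import statistics

# Synthetic population of 100,000 scores (assumed values, not NAEP data).
rng = random.Random(0)
population = [rng.gauss(250, 35) for _ in range(100_000)]

def sample_mean(scores, n, rng):
    """Mean of a simple random sample of n scores drawn without replacement."""
    return statistics.fmean(rng.sample(scores, n))

# Two equally valid samples of the same size from the same population.
first = sample_mean(population, 2_000, rng)
second = sample_mean(population, 2_000, rng)
census = statistics.fmean(population)

print(f"census mean: {census:.2f}")
print(f"sample 1:    {first:.2f}")
print(f"sample 2:    {second:.2f}")
```

Both sample means should land close to the census mean but not on it, and not on each other; that spread between samples is what sampling variance describes.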

In a good sampling design, the sampling variance decreases as the number of students selected increases. Large groups will tend to have smaller standard errors than small groups. A NAEP national assessment typically includes about 10,000 students. Some NAEP assessments include separate, state-level samples of over 2,000 students per state, which are combined to produce national results. These state-national assessments result in total samples of approximately 140,000 students. Thus, results for the nation based on NAEP state-national assessments will have much smaller standard errors than results from NAEP national-only assessments.
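To get a rough sense of scale, the sketch below applies the textbook formula for the standard error of a mean under simple random sampling, `sd / sqrt(n)`, to the sample sizes mentioned above. This is an assumption made for illustration: real survey designs with clustering and weighting generally do not follow this formula exactly, and the score standard deviation of 35 is a hypothetical value, not a NAEP figure.

```python
import math

def srs_standard_error(sd, n):
    """Standard error of a mean under simple random sampling: sd / sqrt(n)."""
    return sd / math.sqrt(n)

ASSUMED_SD = 35.0  # hypothetical score standard deviation, not a NAEP figure

for label, n in [("state sample", 2_000),
                 ("national-only sample", 10_000),
                 ("state-national sample", 140_000)]:
    se = srs_standard_error(ASSUMED_SD, n)
    print(f"{label:>22}: n={n:>7,}  SE={se:.3f}")

# How much smaller is the standard error with 140,000 students than with 10,000?
ratio = (srs_standard_error(ASSUMED_SD, 10_000)
         / srs_standard_error(ASSUMED_SD, 140_000))
print(f"SE ratio (10,000 vs 140,000): {ratio:.2f}")
```

Under this simplified model the ratio depends only on the sample sizes: a sample 14 times larger gives a standard error smaller by a factor of the square root of 14, a little under 4, whatever standard deviation is assumed.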

### Measurement Variance

The second source of uncertainty in NAEP results is due to "measurement." Measurement variance arises from the fact that a student’s proficiency in a subject (e.g., how good the student is at mathematics) is not directly observed, but has to be estimated from the answers that the student provides to the items on the assessment. It is possible that, were the assessment given on a different day, the student might provide slightly different answers. Similarly, a different version of the assessment, composed of different but equally valid items, would give slightly different estimates of students’ proficiency. These two factors give rise to what is typically termed "measurement variance."
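A toy simulation can make this concrete. The sketch below assumes a deliberately simple model, not the one NAEP uses: a student with a fixed underlying proficiency answers each item correctly with the same probability, independently of the other items. The values `P_CORRECT` and `N_ITEMS` and the helper `simulate_score` are hypothetical.

```python
import random

def simulate_score(p_correct, n_items, rng):
    """Number of items answered correctly when each item is answered
    correctly with probability p_correct, independently of the others."""
    return sum(rng.random() < p_correct for _ in range(n_items))

rng = random.Random(1)
P_CORRECT = 0.7  # hypothetical: the student's fixed chance of success per item
N_ITEMS = 30     # hypothetical test length

# The same student, with unchanged proficiency, on five separate occasions.
scores = [simulate_score(P_CORRECT, N_ITEMS, rng) for _ in range(5)]
print(scores)
```

Even though the student's proficiency never changes in this model, the observed scores vary from one administration to the next, which is the kind of variation that measurement variance captures.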

NAEP assessments contain a third, related source of measurement uncertainty, due to the sampling of items. The contents of all NAEP assessments are created according to the specifications of a framework, which is developed by the National Assessment Governing Board. NAEP frameworks are quite broad and multifaceted, and the resulting assessments are long. Taking the full assessment would require approximately 5 to 6 hours for each student, which is unreasonable to ask of students. To limit the burden on individual students, NAEP items are grouped into blocks requiring 25 to 30 minutes to complete. Each student receives a book of two blocks. The fact that students do not take the entire assessment is an additional source of measurement uncertainty.
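The arithmetic behind the block design can be sketched as follows. The pool size of 10 blocks is a hypothetical value (roughly 5 hours at 30 minutes per block), and enumerating every possible pair of blocks is an assumption made for illustration; the text does not describe how blocks are actually paired into books.

```python
from itertools import combinations

N_BLOCKS = 10        # hypothetical pool size, not a NAEP specification
BLOCKS_PER_BOOK = 2  # from the text: each student receives two blocks

blocks = [f"B{i}" for i in range(1, N_BLOCKS + 1)]

# Every unordered pair of distinct blocks that could form a book.
books = list(combinations(blocks, BLOCKS_PER_BOOK))

# Share of the full item pool that any one student is exposed to.
share_seen = BLOCKS_PER_BOOK / N_BLOCKS

print(f"{len(books)} possible two-block books")
print(f"each student sees {share_seen:.0%} of the blocks")
```

Under these assumptions each student answers only a fifth of the item pool, so an individual's results rest on a sample of items as well as on a sample of students.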