Cautions in Interpreting NAEP Results

Users of this website are cautioned against interpreting NAEP results in a causal sense. For example, inferences related to student performance by race or ethnicity, or to the effectiveness of public and nonpublic schools, should take into consideration the many socioeconomic and educational factors that may also impact performance.

The NAEP scale scores make it possible to examine relationships between students' performance and various factors measured by NAEP. However, a relationship that exists between student achievement and another variable does not reveal its underlying cause, which may be influenced by a number of other variables. Similarly, the assessments do not reflect the influence of unmeasured variables. The results are most useful when they are considered in combination with other knowledge about the student population and the educational system, such as trends in instruction, changes in the school-age population, and societal demands and expectations.
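
To illustrate why an observed relationship need not be causal, the minimal sketch below simulates entirely synthetic data (not NAEP data). A hypothetical background variable z drives both a factor x and a score y, while x itself has no effect on y. The raw correlation between x and y is substantial (about 0.5 in expectation under this model), yet it largely disappears once z is controlled for through a partial correlation. The variable names and the data-generating model are assumptions chosen for illustration, and the sketch ignores NAEP's sampling design entirely.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear influence of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def simulate(n=10_000, seed=1):
    """Hypothetical model: confounder z drives x and y; x has no effect on y."""
    rng = random.Random(seed)
    zs = [rng.gauss(0, 1) for _ in range(n)]
    xs = [z + rng.gauss(0, 1) for z in zs]  # hypothetical "factor" measured by a survey
    ys = [z + rng.gauss(0, 1) for z in zs]  # hypothetical "score"; depends only on z
    return xs, ys, zs

xs, ys, zs = simulate()
print(f"raw correlation of x and y:        {pearson(xs, ys):.2f}")
print(f"partial correlation, controlling z: {partial_correlation(xs, ys, zs):.2f}")
```

A large raw correlation paired with a near-zero partial correlation is exactly the pattern that an unmeasured variable can produce; with real data, z may not be measured at all, which is why the relationship alone cannot establish a cause.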

Read about how achievement levels are set, and about the status of achievement levels.

Statistical Significance

The differences between scale scores and between percentages discussed in the results on this website take into account the standard errors associated with the estimates. Comparisons are based on statistical tests that consider both the magnitude of the difference between the group average scores or percentages and the standard errors of those statistics. Throughout the results, differences between scores or between percentages are discussed only when they are statistically significant.
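
As a rough illustration of how a test can weigh a difference against its standard errors, the sketch below assumes the two estimates are independent and approximately normally distributed, so that the standard error of their difference is the square root of the sum of the squared standard errors. This is a simplified textbook z-test, not NAEP's exact procedure (which may, for example, account for dependence between the estimates being compared), and the scores and standard errors in the usage lines are hypothetical.

```python
import math

def compare_groups(mean_a, se_a, mean_b, se_b):
    """Simplified z-test for the difference between two independent estimates.

    Assumes independence and approximate normality, so the standard error of
    the difference is sqrt(se_a**2 + se_b**2). Returns the difference, its
    standard error, the z statistic, and a two-sided p-value.
    """
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a**2 + se_b**2)
    z = diff / se_diff
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, z, p

# Hypothetical example: the same 3-point difference with larger, then smaller, standard errors.
for se_a, se_b in [(1.0, 1.2), (0.5, 0.6)]:
    diff, se_diff, z, p = compare_groups(240.0, se_a, 237.0, se_b)
    print(f"diff={diff:.1f}  SE={se_diff:.2f}  z={z:.2f}  p={p:.4f}")
```

The same 3-point gap falls short of significance at the .05 level when the standard errors are larger and clears it when they are smaller, which is why the size of a difference alone does not determine whether it is reported.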

All differences reported are significant at the .05 level with appropriate adjustments for multiple comparisons. The term "significant" is not intended to imply a judgment about the absolute magnitude or the educational relevance of the differences. It is intended to identify statistically dependable population differences to help inform dialogue among policy makers, educators, and the public.
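
The paragraph above does not name the multiple-comparison adjustment. One widely used option is the Benjamini-Hochberg procedure, which controls the false discovery rate across a family of comparisons; the sketch below assumes that method purely for illustration and should not be read as NAEP's exact implementation. The p-values in the usage line are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which hypotheses are rejected under the Benjamini-Hochberg
    false discovery rate procedure at level alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Hypothetical family of five comparisons.
print(benjamini_hochberg([0.001, 0.012, 0.025, 0.041, 0.20]))
```

In this hypothetical family, four comparisons have p-values below .05 on their own, but after the adjustment only the three smallest are flagged, illustrating how an adjustment can keep a borderline difference out of the reported results.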

Results Are Estimates

The average scores and percentages presented on this website are estimates because they are based on representative samples of students rather than on the entire population of students. Moreover, the collection of subject-area questions used at each grade level is only a sample of the many questions that could have been asked to measure the NAEP subject frameworks. As a result, NAEP results are subject to a measure of uncertainty, reflected in the standard error of the estimates. The standard errors for the estimated scale scores and percentages in the figures and tables presented on this website are available through the NAEP Data Explorer.
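
The paragraph above names two sources of uncertainty: sampling students and sampling questions. One standard way to fold both into a single standard error is Rubin's combining rules for multiply imputed values: average the sampling variance across M imputed sets, then add (1 + 1/M) times the variance of the estimates between sets. The minimal sketch below assumes that approach, with one estimate and one sampling variance per imputed set; it is a general illustration rather than NAEP's exact computation, and the input numbers are hypothetical.

```python
import math
import statistics

def combined_standard_error(estimates, sampling_variances):
    """Combine M estimates from imputed data sets using Rubin's rules.

    estimates: one point estimate per imputed set (hypothetical inputs).
    sampling_variances: the sampling variance of each of those estimates.
    Returns the combined point estimate and its total standard error.
    """
    m = len(estimates)
    point = statistics.fmean(estimates)
    within = statistics.fmean(sampling_variances)  # average sampling variance
    between = statistics.variance(estimates)       # spread across imputed sets (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return point, math.sqrt(total)

# Hypothetical: five imputed versions of an average score, each with sampling variance 1.0.
point, se = combined_standard_error([250.0, 251.0, 249.0, 250.5, 249.5], [1.0] * 5)
print(f"estimate {point:.1f}, standard error {se:.2f}")
```

The between-set term is what captures uncertainty from measuring each student with only a sample of questions; if every imputed set gave the same estimate, the standard error would reduce to the sampling component alone.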

Influence of Sample Size

Beginning in 2002, the NAEP national sample was obtained by aggregating the samples of public school students from each state and jurisdiction and then supplementing that aggregate with a nationally representative sample of students from nonpublic schools, rather than by selecting an independent national sample. As a consequence, the national sample size increased, so smaller differences between years or between groups of students can be detected as statistically significant than could have been detected in previous assessments.
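
The effect of a larger sample follows from how standard errors shrink with sample size. Under simple random sampling, the standard error of a mean is proportional to 1/√n, so quadrupling the sample halves the smallest difference between two groups that can reach significance. The sketch below assumes simple random sampling, with an optional hypothetical design-effect factor standing in for the inflation caused by clustered sampling; the standard deviation and sample sizes are illustrative, not NAEP figures.

```python
import math

def standard_error_of_mean(sd, n, design_effect=1.0):
    """Approximate SE of a mean; a design_effect above 1 inflates the variance
    to mimic a clustered sample (hypothetical adjustment)."""
    return math.sqrt(design_effect) * sd / math.sqrt(n)

def smallest_significant_difference(sd, n_per_group, z_crit=1.96, design_effect=1.0):
    """Difference between two independent, equal-sized groups at which the
    z statistic just reaches the two-sided critical value (about .05 level).
    This ignores statistical power; it is only the significance threshold."""
    se = standard_error_of_mean(sd, n_per_group, design_effect)
    return z_crit * math.sqrt(2) * se

# Hypothetical score standard deviation of 35 points.
for n in (2_500, 10_000):
    print(f"n={n:>6}: smallest significant difference = "
          f"{smallest_significant_difference(35.0, n):.2f} points")
```

With these illustrative inputs, the threshold drops by half when the per-group sample grows fourfold, which mirrors why the post-2002 national samples can flag smaller differences than earlier, smaller samples could.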

A caution is also warranted for some estimates for small population groups. At times in the results pages, smaller population groups show very large increases or decreases in average scores across years. For example, fourth-grade Hispanic students in Delaware are reported as having a 36-point score increase in mathematics between 1998 and 2002. Such gains often need to be interpreted with extreme caution. First, the effects of changes in exclusion rates may be more marked for small groups than for the whole population: to continue with the Delaware example, 2 percent of Hispanic students were excluded in 1998, and this percentage increased to 21 percent in 2002. Second, the standard errors around the score estimates for small groups are often quite large, which in turn means that the standard error of the gain is also large. While the Delaware Hispanic students' scores went up 36 points, the standard error of the gain is almost 12 points (11.75). A range of one standard error on either side of the estimate (36 ± 11.75 points) spans 23.5 points, and an approximate 95 percent confidence interval, about two standard errors on either side, runs from roughly 13 to 59 points.
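
The interval arithmetic for the Delaware example can be checked with a short sketch. It uses only the two figures given in the paragraph (a 36-point gain with a standard error of 11.75 points) and assumes a normal approximation with the conventional 1.96 critical value for 95 percent confidence; it does not account for multiple-comparison adjustments or for the change in exclusion rates.

```python
import math

def gain_confidence_interval(gain, se_gain, z_crit=1.96):
    """Normal-approximation confidence interval for a score gain."""
    half_width = z_crit * se_gain
    return gain - half_width, gain + half_width

def two_sided_p(gain, se_gain):
    """Two-sided p-value for testing whether the gain differs from zero."""
    z = gain / se_gain
    return math.erfc(abs(z) / math.sqrt(2))

# Figures from the Delaware example above.
low, high = gain_confidence_interval(36, 11.75)
print(f"approximate 95% interval for the gain: {low:.1f} to {high:.1f} points")
print(f"p-value for a nonzero gain: {two_sided_p(36, 11.75):.4f}")

# One-standard-error band for comparison (roughly 68 percent coverage).
low1, high1 = gain_confidence_interval(36, 11.75, z_crit=1.0)
print(f"one-standard-error band: {low1:.2f} to {high1:.2f} points")
```

The gain is distinguishable from zero, but the interval is wide: the data are consistent with a true gain of anywhere from about 13 to about 59 points, which is why a single large change for a small group says little about its precise size.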

Last updated 21 December 2016 (AA)