April 2008

**National Center for Education Statistics**

Download the complete report or portions of the reports as PDF files for viewing and printing.

In late January through early March of 2003, the National Assessment of Educational Progress (NAEP) grade 4 and 8 reading and mathematics assessments were administered to representative samples of students in approximately 100 public schools in each state. The results of these assessments were announced in November 2003. Each state also carried out its own reading and mathematics assessments in the 2002-2003 school year, most including grades 4 and 8. This report addresses the question of whether the results published by NAEP are comparable to the results published by individual state testing programs.

Comparisons to address the following four questions are based purely on results of testing and do not compare the content of NAEP and state assessments.

- How do states' achievement standards compare with each other and with NAEP?
- Are NAEP and state assessment results correlated across schools?
- Do NAEP and state assessments agree on achievement trends over time?
- Do NAEP and state assessments agree on achievement gaps between subgroups?

Both NAEP and State Education Agencies have set achievement, or performance, standards for mathematics and have identified test score criteria for determining the percentages of students who meet the standards. Most states have multiple performance standards, and these can be categorized into a *primary standard*, which, since the passage of *No Child Left Behind*, is generally the standard used for reporting adequate yearly progress (AYP), and standards that are above or below the primary standard. Most states refer to their primary standard as *proficient* or *meets the standard*.

By matching percentages of students reported to be meeting state standards in schools participating in NAEP with the distribution of performance of students in those schools on NAEP, cutpoints on the NAEP scale can be identified that are equivalent to the scores required to meet a state's standards.
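The matching idea can be illustrated with a minimal sketch. It assumes a simplified, pooled version of the procedure: given weighted NAEP scores for students in a set of schools and the percentage reported as meeting the state standard in those schools, it finds the NAEP score at or above which the same weighted percentage of students falls. The function name, the inputs, and the pooling across schools are assumptions made for illustration and are not taken from the report.

```python
def naep_equivalent_cutpoint(naep_scores, weights, pct_meeting_state):
    """Return the NAEP score at or above which the weighted share of
    students first reaches pct_meeting_state (a percentage, 0-100).

    Hypothetical helper: a pooled simplification, not the report's
    actual estimation procedure.
    """
    if not naep_scores or len(naep_scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")

    # Walk down from the highest score, accumulating student weight.
    pairs = sorted(zip(naep_scores, weights), key=lambda p: p[0], reverse=True)
    target = sum(weights) * pct_meeting_state / 100.0
    cumulative = 0.0
    for score, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return score
    return pairs[-1][0]


# Made-up scores: ten equally weighted students, 30 percent meet the state standard.
scores = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290]
print(naep_equivalent_cutpoint(scores, [1] * 10, 30))  # → 270
```

In this toy example, three of the ten students score 270 or higher, so 270 is the NAEP equivalent of a state standard that 30 percent of students meet.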

From the analyses presented in chapter 2, we find:

- The median of the states' primary mathematics standards, as reflected in their NAEP equivalents, is between the NAEP *basic* and *proficient* levels in both grades 4 and 8.
- The primary standards vary greatly in difficulty across states, as reflected in their NAEP equivalents. In fact, among states, there is more variation in placement of primary mathematics standards than in average NAEP performance.
- As a corollary, states with high primary standards tend to see few students meet their standards, while states with low primary standards tend to see most students meet their standards.
- There is no evidence that setting a higher state standard is correlated with higher performance on NAEP. Students in states with high primary standards score just about the same on NAEP as students in states with low primary standards.

An essential criterion for the comparison of NAEP and state assessment results in a state is that the two assessments agree on which schools are high achieving and which are not. The critical statistic for testing this criterion is the correlation between schools' percentages achieving their primary standard, as measured by NAEP and the state assessment. Generally, a correlation of at least .7 is important for confidence in linkages between them.^{1} In 2003, correlations between NAEP and state assessment measures of mathematics achievement were greater than .7 in 41 out of 46 states for grade 8 and in 30 out of 49 states for grade 4. Several factors other than similarity of the assessments depress this correlation.

One of these factors is a disparity between the standards: the correlation between the percentage of students meeting a high standard on one test and a low standard on the other test is bound to be lower than the correlation between percentages of students meeting standards of equal difficulty on the two tests. To be fair and unbiased, comparisons of percentages meeting standards on two tests must be based on equivalent standards for both tests. To remove the bias of different standards, NAEP was rescored in terms of percentages meeting the state's standard. Nevertheless, as discussed in chapter 3, other factors also depressed the correlations:

- Correlations are biased downward by schools with small enrollments, by use of scores for an adjacent grade rather than the same grade, and by standards set near the extremes of a state's achievement distribution.
- Estimates of what the correlations would have been if they were all based on scores on non-extreme standards in the same grade in schools with 30 or more students per grade were greater than .7 in 42 of 43 states for grade 8 and in 37 of 46 states for grade 4.^{2}
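The school-level correlation criterion can be sketched as follows. This is an unweighted Pearson correlation between two lists of school percentages, together with the squared correlation that underlies the .7 rule of thumb. The function names are hypothetical, and the sketch ignores the school weights and the adjustments for small schools, adjacent grades, and extreme standards described above.

```python
import math


def pearson_r(x, y):
    """Unweighted Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def shared_variance(r):
    """Proportion of variance in one variable predictable from the other."""
    return r * r


# Made-up percentages meeting the standard in five schools (illustration only).
naep_pct = [35.0, 48.0, 52.0, 61.0, 77.0]
state_pct = [40.0, 45.0, 58.0, 60.0, 80.0]
r = pearson_r(naep_pct, state_pct)
print(round(r, 2), round(shared_variance(r), 2))
```

A correlation of .7 corresponds to a shared variance of .49, which is why .7 serves as the approximate threshold for one measure predicting half of the variance in the other.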

Comparisons are made between NAEP and state assessment mathematics achievement trends between 2000 and 2003. Achievement trends are measured by both NAEP and state assessments as gains in school-level percentages meeting the state's primary standard.^{3}

From the analyses presented in chapter 4, we find:

- For mathematics achievement trends from 2000 to 2003, there are significant differences between NAEP and state assessment trends in 14 of 24 states in grade 4 and 11 of 22 states in grade 8.
- In aggregate, in grade 4 but not in grade 8, mathematics achievement gains from 2000 to 2003 measured by NAEP are significantly larger than those measured by state assessments.
- Across states, there was a positive correlation between gains measured by NAEP and gains measured by state assessments (*r* = .52 at grade 4 and *r* = .36 at grade 8).
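As a rough illustration of how a trend comparison of this kind can be set up, the sketch below computes a weighted statewide percentage meeting the standard in each year from school-level percentages, then compares the NAEP gain with the state assessment gain. The function names and the data layout are assumptions made for illustration, and the sketch omits the standard errors that any significance test of the difference would require.

```python
def weighted_pct(school_pcts, school_weights):
    """Weighted mean of school-level percentages meeting the standard."""
    total = sum(school_weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(p * w for p, w in zip(school_pcts, school_weights)) / total


def trend_discrepancy(naep_start, naep_end, state_start, state_end):
    """Each argument is a (school_pcts, school_weights) pair for one year.

    Returns (NAEP gain, state gain, NAEP gain minus state gain),
    all in percentage points.
    """
    naep_gain = weighted_pct(*naep_end) - weighted_pct(*naep_start)
    state_gain = weighted_pct(*state_end) - weighted_pct(*state_start)
    return naep_gain, state_gain, naep_gain - state_gain


# Made-up data for two equally weighted schools (illustration only).
result = trend_discrepancy(
    naep_start=([40, 60], [1, 1]),
    naep_end=([50, 70], [1, 1]),
    state_start=([45, 65], [1, 1]),
    state_end=([50, 70], [1, 1]),
)
print(result)  # → (10.0, 5.0, 5.0)
```

Here NAEP shows a gain of 10 percentage points and the state assessment a gain of 5, so the discrepancy is 5 points in NAEP's favor.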

Comparisons are made between NAEP and state assessment measurement of mathematics achievement gaps in grades 4 and 8 in 2003. Comparisons are based on school-level percentages of Black, Hispanic, White, and economically disadvantaged and non-disadvantaged students achieving the state's primary mathematics achievement standard in the NAEP schools in each state.

From the analyses presented in chapter 5, we find:

- In 34 of 70 gap comparisons at grade 4 and 17 of 62 gap comparisons at grade 8, NAEP found significantly larger gaps than the state assessment did. In only two of the comparisons (both at grade 8) did the state assessment record a significantly larger gap.
- The tendency for NAEP to find larger gaps in mathematics achievement than state assessments did was equally strong with respect to Black-White and Hispanic-White gaps and slightly weaker for poverty gap comparisons.
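A gap comparison can be sketched in the same spirit. The example below assumes that each school supplies, for each subgroup, a student count and a percentage meeting the state's primary standard. It computes the enrollment-weighted percentage for each group and takes the difference as the gap, once from NAEP-based percentages and once from state assessment percentages. All names and numbers are hypothetical, and no significance test is included.

```python
def group_percentage(counts, pcts):
    """Enrollment-weighted percentage meeting the standard for one subgroup."""
    total = sum(counts)
    if total == 0:
        raise ValueError("subgroup has no students")
    return sum(n * p for n, p in zip(counts, pcts)) / total


def achievement_gap(ref_counts, ref_pcts, focal_counts, focal_pcts):
    """Gap in percentage points: reference group minus focal group."""
    return group_percentage(ref_counts, ref_pcts) - group_percentage(focal_counts, focal_pcts)


# Made-up data for two schools (illustration only).
ref_counts, focal_counts = [50, 50], [25, 75]

# Percentages meeting the standard as measured by each assessment.
naep_gap = achievement_gap(ref_counts, [60, 80], focal_counts, [40, 60])
state_gap = achievement_gap(ref_counts, [62, 78], focal_counts, [50, 66])

print(naep_gap, state_gap, naep_gap - state_gap)  # → 15.0 8.0 7.0
```

A positive final value means NAEP found the larger gap in this toy example, which is the direction most of the significant differences in the report took.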

This report makes use of test score data for 48 states and the District of Columbia from two sources: (1) NAEP plausible value files for the states participating in the 2000 and 2003 mathematics assessments, augmented by imputations of plausible values for the achievement of excluded students;^{4} and (2) state assessment files of school-level statistics compiled in the National Longitudinal School-Level State Assessment Score Database (NLSLSASD).^{5}

All comparisons in the report are based on NAEP and state assessment results in schools that participated in NAEP, weighted to represent the states. Across states in 2003, the median percentage of NAEP schools for which state assessment records were matched was greater than 99 percent. However, results in this report represent about 96 percent of the regular public school population, because for confidentiality reasons state assessment scores are not available for the smallest schools in most states.

In most states, comparisons with NAEP grade 4 and 8 results are based on state assessment scores for the same grades, but in a few states for which tests were not given in grades 4 and 8, assessment scores from adjacent grades are used.

Because NAEP and state assessment scores were not available from all states prior to 2003, trends could not be compared in all states. Furthermore, in eight of the states with available scores, either assessments or performance standards were changed between 2000 and 2003, precluding trend analysis in those states for some years. As a result, comparisons of trends from 2000 to 2003 are possible in 24 states for grade 4 and 21 states for grade 8.

Because subpopulation achievement scores were not systematically acquired for the NLSLSASD prior to 2002, achievement gap comparisons are limited to gaps in 2003. In addition, subpopulation data are especially subject to suppression due to small sample sizes, so achievement gap comparisons are not possible for groups consisting of fewer than ten percent of the student population in a state.

Black-White gap comparisons for 2003 are possible in 25 states for grade 4 and 20 states for grade 8; Hispanic-White gap comparisons in 14 states for both grades 4 and 8; and poverty gap comparisons in 31 states for grade 4 and 28 states for grade 8.

Although this report brings together a large amount of information about NAEP and state assessments, there are significant limitations on the conclusions that can be reached from the results presented.

First, this report does not address questions about the content, format, or conduct of state assessments, as compared to NAEP. The only information presented in this report concerns the results of the testing—the achievement scores reported by NAEP and state mathematics assessments.

Second, this report does not represent all public school students in each state. It does not represent students in home schooling, private schools, or many special education settings. State assessment scores based on alternative tests are not included in the report, and no adjustments for non-standard test administrations (i.e., accommodations) are applied to scores. Student exclusion and nonparticipation are statistically controlled for NAEP data, but not state assessment data.

Third, this report is based on school-level percentages of students, overall and in demographic subgroups, who meet standards. As such, it has nothing to say about measurement of individual student variation in achievement within these groups or differences in achievement that fall within the same discrete achievement level.

Finally, this report is not an evaluation of state assessments. State assessments and NAEP are designed for different, although overlapping, purposes. In particular, state assessments are designed to provide important information about individual students to their parents and teachers, while NAEP is designed for summary assessment at the state and national level. Findings of different standards, different trends, and different gaps are presented without suggestion that they be considered as deficiencies either in state assessments or in NAEP.

There are many technical reasons for different assessment results from different assessments of the same skill domain. The analyses in this report have been designed to eliminate some of these reasons, by (1) comparing NAEP and state results in terms of the same performance standards, (2) basing the comparisons on scores in the same schools, and (3) removing the effects of NAEP exclusions on trends. However, other differences remain untested, due to limitations on available data.

The findings in this report must necessarily raise more questions than they answer. For each state in which the correlation between NAEP and state assessment results is not high, a variety of alternative explanations must be investigated before reaching conclusions about the cause of the relatively low correlation. The report evaluates some explanations but leaves others to be explained when more data become available.

Similarly, the explanations of differences in trends in some states may involve differences in populations tested, differences in testing accommodations, or other technical differences, even though the assessments may be testing the same domain of skills. Only further study will yield explanations of differences in measurement of achievement gaps. This report lays a foundation for beginning to study the effects of differences between NAEP and state assessments of mathematics achievement.

Download the entire Volume I report in a PDF file for viewing and printing. (1906K PDF)

Download the entire Volume II report in a PDF file for viewing and printing. (4377K PDF)

Download Volume I Chapter 1 in a PDF file for viewing and printing. (799K PDF)

Download Volume I Chapter 2 in a PDF file for viewing and printing. (635K PDF)

Download Volume I Chapter 3 in a PDF file for viewing and printing. (529K PDF)

Download Volume I Chapter 4 in a PDF file for viewing and printing. (533K PDF)

Download Volume I Chapter 5 in a PDF file for viewing and printing. (620K PDF)

Download Volume I Chapter 6 in a PDF file for viewing and printing. (579K PDF)

Download Volume I Appendix A in a PDF file for viewing and printing. (544K PDF)

Download Volume I Appendix B in a PDF file for viewing and printing. (612K PDF)

Download Volume I Appendix C in a PDF file for viewing and printing. (548K PDF)

Download Volume II Appendix D.1 in a PDF file for viewing and printing. (697K PDF)

Download Volume II Appendix D.2 in a PDF file for viewing and printing. (1430K PDF)

Download Volume II Appendix D.3 in a PDF file for viewing and printing. (1445K PDF)

Download Volume II Appendix D.4 in a PDF file for viewing and printing. (1413K PDF)

Download Volume II Appendix D.5 in a PDF file for viewing and printing. (1206K PDF)

**NCES 2008-475**

**Suggested Citation**
McLaughlin, D.H., Bandeira de Mello, V., Blankenship, C., Chaney, K., Esra, P., Hikawa, H., Rojas, D., William, P., and Wolman, M. (2008).

See more information on comparing NAEP and state proficiency standards on the NAEP website.

^{1}A correlation of at least .7 implies that about 50% or more of the variance of one variable can be predicted from the other variable (.7² = .49).

^{2}Three states for which state reports of percentages meeting standards were unavailable were not included in the computations of these estimates.

^{3}To provide an unbiased trend comparison, NAEP was rescored in terms of the percentages meeting the state's primary standard in the earliest trend year.

^{4}Estimations of NAEP scale score distributions are based on an estimated distribution of *possible scale scores* (or plausible values), rather than point estimates of a single scale score.

^{5}Most states have made school-level achievement statistics available on state websites since the late 1990s; these data have been compiled into a single database, the NLSLSASD, for use by educational researchers. These data can be downloaded from http://www.schooldata.org. However, 2003 school-level state mathematics assessment results were not available for Nebraska and West Virginia when this report was prepared.

Last updated 06 August 2009 (EP)