Jack Buckley
Commissioner, National Center for Education Statistics

Mapping State Proficiency Standards Onto the NAEP Scales: Variation and Change in State Standards for Reading and Mathematics, 2005-2009
August 10, 2011

Good morning. Today I am presenting our new report on mapping state proficiency standards onto NAEP scales. This is our fourth report on state mapping. We have released three earlier reports, using state data for reading and mathematics at grades 4 and 8 from 2003, 2005, and 2007. The current report is based on data from 2009.

I will give you an overview of the mapping procedure and the overall results for 2009, and also discuss the states that made changes in their assessments from 2005 or 2007 to 2009 and the results of those changes. In all cases, our results are for public school students only.

NCES developed our state mapping methods to address the following issue. Each state designs its own tests and sets its own standards for fourth- and eighth-grade reading and mathematics proficiency. They report the percent proficient by subject and grade to the U.S. Department of Education. We want to see how the states are doing in comparison to one another, but because they use different tests and different standards for defining “proficiency,” we can’t directly use the results they report to make comparisons. The NAEP scale provides a common yardstick to make these comparisons.

The problem of state comparability is illustrated by the results from two states, Arizona and New Hampshire, in grade 8 reading. In each state 70 percent of students scored at or above the proficiency standard or cutpoint for grade 8 reading in 2009 on the state test. On NAEP, however, 27 percent of students in Arizona scored at or above the NAEP Proficient achievement level set by the National Assessment Governing Board, while 39 percent did so in New Hampshire.

It is clear that the NAEP standard for “Proficient” is different from either state standard, but, using NAEP as a common yardstick, it is also clear that there is a higher percentage of high-performing students in New Hampshire than in Arizona. By the states’ own test results, it would appear that student achievement in the two states was the same. It is also clear that the states’ standards are different and a method is needed to compare these varying standards.

Mapping Methodology

To illustrate the mapping method, again consider the two states, Arizona and New Hampshire. I have noted that these two states appear to have similar proficiency ratings while they differ in the levels at which their students perform.

As noted above, in Arizona 70 percent of students scored at or above the proficiency standard, or cutpoint, on the state assessment. To link Arizona’s proficiency standard to the 0-500 point NAEP scale, we find the score on the NAEP scale at or above which 70 percent of Arizona’s students scored on the NAEP assessment. That score is 241, making 241 the “NAEP equivalent score” of Arizona’s proficiency cutpoint.

In New Hampshire, where 70 percent of students also scored at or above New Hampshire’s own proficiency standard, we identify the top 70 percent in New Hampshire’s NAEP results and find that 256 is New Hampshire’s NAEP equivalent score.

The NAEP equivalent score for each state’s proficiency standard serves as the basis for making comparisons from state to state. Using this measure, it is possible to compare the performance of students who did not take the same state assessment, as well as to compare the stringency of state proficiency standards. In this example, the average student performance on NAEP is higher in New Hampshire than in Arizona, despite similar percentages of students scoring at or above proficient on their respective state assessments.
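For readers who want to see the lookup concretely, the equipercentile-style step just described can be sketched in a few lines of Python. This is an illustrative simplification with made-up data and a hypothetical function name; the actual NCES procedure works with NAEP's weighted samples and plausible values, not a plain list of scores.

```python
def naep_equivalent_score(naep_scores, pct_proficient_on_state_test):
    """Return the NAEP score at or above which the given percentage of
    the state's NAEP sample scored.

    Simplified sketch: unweighted scores, no interpolation between
    adjacent score points.
    """
    scores = sorted(naep_scores, reverse=True)          # highest first
    k = int(round(len(scores) * pct_proficient_on_state_test / 100.0))
    return scores[k - 1]                                # k-th highest score

# With uniform scores 1..100, exactly 70 percent score at or above 31:
print(naep_equivalent_score(range(1, 101), 70))         # prints 31
```

With Arizona's 70 percent plugged in against that state's NAEP score distribution, this lookup would return the 241 cited above.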

In some of the charts in the report, we indicate the margin of error surrounding the estimate of an equivalent score. The margin of error is equal to twice the standard error associated with the score. Because NAEP scores are based on samples, there is always a margin of error both above and below a NAEP score. To be statistically significant, the difference between the NAEP equivalent scores of any two states must be greater than the margin of error surrounding the estimates for the two states.
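Stated as code, the rule of thumb just described might look like the sketch below. The function names and the standard errors in the example are illustrative assumptions, not values from the report.

```python
def margin_of_error(standard_error):
    """The report defines the margin of error as twice the standard error."""
    return 2.0 * standard_error

def significantly_different(score_a, se_a, score_b, se_b):
    """Per the rule of thumb above, two NAEP equivalent scores are treated
    as significantly different only when the gap between them exceeds the
    combined margins of error of the two estimates."""
    gap = abs(score_a - score_b)
    return gap > margin_of_error(se_a) + margin_of_error(se_b)

# Arizona (241) vs. New Hampshire (256), with hypothetical standard errors:
print(significantly_different(241, 1.2, 256, 1.5))      # prints True
```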

There is an additional source of error involved in mapping student scores on state assessments to NAEP, because NAEP and a state assessment may not measure exactly the same construct in exactly the same way—that is, the underlying concepts of what is important for students to know and be able to do may be measured differently by the two assessments. This source of error is called “relative error.”

Relative error for each state is measured as a fraction of the total variation in the percentages of students meeting the proficiency standard in each school in the state that participated in NAEP. If the mapping is valid, the procedure should reproduce the individual school percentages fairly accurately, and the fraction should be low. If the relative error is greater than 0.5—if it accounts for more than half of the total variation—then it is considered to be too large to support useful inferences from the placement of the state standard on the NAEP scale without additional evidence. In such cases, a black triangle is placed under a state’s NAEP equivalent score in the report charts.
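In rough terms, the criterion compares the school-level percentages predicted from the mapped cutpoint with the percentages actually observed. The sketch below is a simplified, unweighted version of that idea; the report's actual computation accounts for NAEP's sampling design.

```python
def relative_error(observed_pcts, predicted_pcts):
    """Fraction of the school-to-school variation in percent proficient
    that the mapping fails to reproduce. Values above 0.5 are flagged
    with a black triangle in the report charts.

    Simplified sketch: unweighted schools, mean-squared residual over
    total variance.
    """
    n = len(observed_pcts)
    mean_obs = sum(observed_pcts) / n
    total_var = sum((o - mean_obs) ** 2 for o in observed_pcts) / n
    resid_var = sum((o - p) ** 2
                    for o, p in zip(observed_pcts, predicted_pcts)) / n
    return resid_var / total_var

obs = [55.0, 62.0, 70.0, 81.0]    # observed percent proficient per school
pred = [54.0, 63.0, 69.0, 82.0]   # predicted from the mapped cutpoint
print(relative_error(obs, pred) > 0.5)   # prints False: mapping is usable
```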

For the rest of this presentation I’ll be referring to the NAEP equivalent score as the state proficiency standard.

The NAEP equivalent scores for the grade 4 reading proficiency standards of the 50 jurisdictions for which we had data in 2009 are shown in Figure 2 of the report, arrayed from lowest to highest in order of their stringency, on the NAEP scale. (Nebraska was not included, as it does not have state-wide assessments. Instead, each school district develops its own.)

The two states with the lowest and highest proficiency standards, Tennessee at 170 and Massachusetts at 234, are separated by a range of 64 points. Thirty-five states had proficiency standards that mapped below NAEP’s Basic achievement level cutpoint. The remaining states lay within the Basic range. According to the National Assessment Governing Board, students at the Basic achievement level show “partial mastery of the prerequisite knowledge and skills that are fundamental for proficient work at each grade.” None of the state proficiency standards were in the NAEP Proficient range. According to the Governing Board, students at the Proficient achievement level demonstrate competency over challenging subject matter.

Comparing the various state proficiency standards to the NAEP achievement levels gives context to our discussion, but I do not mean to imply that the NAEP achievement levels are more valid than the state standards. There are a wide variety of policy considerations involved in setting proficiency standards, and what is appropriate for NAEP may not be the best fit for a given state.

There is also wide variation among state standards for grade 8 reading. The range between the lowest state standard, that of Texas (201), and the highest, Missouri (267), was 66 points. Sixteen states had proficiency standards for grade 8 reading below NAEP’s Basic achievement level cutpoint, and the remaining 34 states had standards within the Basic range. As with grade 4, no state standards were in the Proficient range.

State Standards for Proficient Performance: Mathematics Grade 4, 2009

The variation in mathematics standards at grade 4 between the lowest state standard, for Tennessee (195), and the highest, for Massachusetts (255), was 60 points. Seven states had proficiency standards that were below Basic, and one state, Massachusetts, was in the Proficient range. The remaining 42 states fell in the Basic range.

State Standards for Proficient Performance: Mathematics Grade 8, 2009

Grade 8 mathematics has a variation of 71 points between the lowest and highest states, Tennessee (229) and Massachusetts (300). Twelve states had state proficiency standards that were below NAEP Basic, 36 states’ standards were in NAEP’s Basic range, and one state—Massachusetts—had a standard that was in the Proficient range. We do not have results for California for grade 8 mathematics because the state does not assess general mathematics at grade 8.

State Standards versus NAEP Achievement

In any given year, how well a state’s students perform on NAEP is not necessarily associated with the rigor of the state’s proficiency standards. Three neighboring states—Oklahoma, Arkansas, and Tennessee—illustrate this point. They all had similar average NAEP scores on the eighth-grade mathematics assessment, either 275 or 276.

However, their NAEP equivalent scores for grade 8 mathematics were different. The state proficiency standards for Oklahoma and Arkansas, 269 and 267 respectively, were near their students’ average NAEP scores, which, for both states, happened to be 276. On the other hand, Tennessee’s state proficiency standard, at 229, was well below the state’s average NAEP score. Even though grade 8 students in all three states, on average, had comparable scores on NAEP, students in Tennessee were more likely to score “Proficient” on their state grade 8 mathematics assessment than were students in Oklahoma or Arkansas on their state assessments.

Changes in State Assessments

You’ve just seen the way state proficiency standards aligned on the NAEP scales in 2009. We will now shift our focus to states that changed their assessments from 2005 or 2007 to 2009 and the effects those changes had on their proficiency standards.

A number of states reported that they made changes to key aspects of their assessments prior to 2009. For each of the four assessments, 8 or 9 states reported that they made such changes between 2007 and 2009, either modifying the assessment or changing the standard itself. At least 17 states have reported changes to an assessment since 2005.

In all of these cases, the changes were significant enough to prevent comparisons of student performance on the assessment from 2005 or 2007 to 2009. Using our mapping methodology, however, we were able to see whether changes in a state’s assessment or in its standard affected the rigor of the standard.

State Proficiency Standards 2009 vs. 2007: Reading, Grade 4

In seven states—Indiana, Mississippi, New Jersey, North Carolina, Oklahoma, South Dakota, and West Virginia—the NAEP equivalent scores increased significantly. For example, Mississippi’s reading grade 4 NAEP equivalent score rose from 163 in 2007 to 210 in 2009.

For Illinois, the NAEP equivalent scores did not change significantly.

In one state, South Carolina, the NAEP equivalent score decreased significantly.

The average equivalent score for all the state proficiency standards for grade 4 reading in 2007 was 199, which was 9 points below the NAEP Basic achievement level cutpoint of 208. This average does not reflect a consensus or a goal that all the states should be moving toward; I mention it simply to give you an idea of where these states stand in comparison.

State Proficiency Standards 2009 vs. 2005: Reading, Grade 4

The state proficiency standard increased significantly in seven of these states.

In five states, it did not change significantly either way.

In the remaining five, it decreased significantly.

The average of the state proficiency standards for grade 4 reading in 2005 was 197, also below NAEP’s Basic cutpoint.

State Proficiency Standards 2009 vs. 2007: Reading, Grade 8

In six states the NAEP equivalent scores increased significantly. These states were Indiana, Mississippi, North Carolina, Oklahoma, South Dakota, and West Virginia.

For Illinois, the NAEP equivalent scores did not change significantly.

The NAEP equivalent scores decreased in New Jersey and South Carolina.

The average NAEP equivalent score for all the state proficiency standards for grade 8 reading in 2007 was 245, which was 2 points above the NAEP Basic achievement level cutpoint for grade 8—243 points.

State Proficiency Standards 2009 vs. 2005: Reading, Grade 8

The state proficiency standard increased significantly in five of these states.

In one state, Connecticut, it did not change significantly either way.

In the remaining fourteen, it decreased significantly.

The average of the state proficiency standards for grade 8 reading in 2005 was 247.

State Proficiency Standards 2009 vs. 2007: Mathematics, Grade 4

At grade 4, eight states made changes in the mathematics assessment from 2007 to 2009.

In five states—Georgia, Mississippi, New Jersey, Oklahoma, and West Virginia—the state proficiency standard increased.

In Illinois and Indiana, it did not change significantly.

In South Carolina, it decreased significantly.

The average of the state proficiency standards for grade 4 mathematics in 2007 was 223, compared to the NAEP Mathematics Basic achievement level cutpoint of 214.

State Proficiency Standards 2009 vs. 2005: Mathematics, Grade 4

Comparing 2009 to 2005, 19 states made changes in the grade 4 mathematics assessment.

The state proficiency standard increased significantly in eight of these states.

In four states, it did not change significantly either way.

In the remaining seven, it decreased significantly.

The average of the state proficiency standards for grade 4 mathematics in 2005 was 224, within NAEP’s Basic range.

State Proficiency Standards 2009 vs. 2007: Mathematics, Grade 8

At grade 8, eight states made changes in the mathematics assessment from 2007 to 2009.

The state proficiency standard increased in three of these states—Indiana, Oklahoma, and West Virginia.

In four states—Georgia, Illinois, Mississippi, and New Jersey—the state standard did not change significantly.

In South Carolina, it decreased significantly.

The average of the state proficiency standards for grade 8 mathematics in 2007 was 270, compared to the NAEP Basic cutpoint of 262.

State Proficiency Standards 2009 vs. 2005: Mathematics, Grade 8

Comparing 2009 to 2005, 23 states made changes in the grade 8 mathematics assessment.

The state proficiency standard increased significantly in five of these states.

In four states, it did not change significantly either way.

In the remaining fourteen, it decreased significantly.

The average of the state proficiency standards for grade 8 mathematics in 2005 was 272, within NAEP’s Basic range.

Changes in standards from 2007-2009: Summary

For those states in which assessments or standards were changed between 2007 and 2009, increases in standards were more common than decreases. South Carolina’s standards decreased on all four assessments.

In some states, such as Illinois, a change in the state assessment did not affect the state’s proficiency standard.

Changes in standards from 2005-2009: Summary

Shifting the focus to changes occurring from 2005 to 2009, changes in state assessments more often resulted in decreases in standards than in increases.

As with the shorter 2007-2009 period, in a number of states a change in the assessment did not affect the state’s proficiency standard.

NAEP Corroborating Progress Reported by States

We will now examine how progress reported on state assessments may be confirmed using NAEP.

Our mapping methodology allows a comparison of the measures of achievement change from one assessment year to the next, as shown by state assessments and NAEP, in states where assessments did not change significantly over the time period. In some cases, NAEP results corroborate a state’s finding of achievement change, and in some cases NAEP shows a different result.

Idaho is an example of a state for which change in student achievement was not corroborated by NAEP. The baseline for this comparison is 2007, when 86.1 percent of Idaho’s students scored at or above the state’s proficiency standard, as reflected on both Idaho’s assessment and NAEP.

In 2009, 91.6 percent of Idaho’s students met the state’s proficiency standard, an increase of 5.5 percentage points. However, on NAEP in 2009, 84.4 percent of Idaho’s students scored at or above the state’s 2007 proficiency standard, which was 233 on the NAEP scale, a decline of 1.6 percentage points. Subtracting the state change from the NAEP change yields a difference of minus 7.1 points. This difference was statistically significant, meaning that NAEP did not corroborate the change shown on the state assessment. The minus sign indicates that the state result showed a “more positive” change than NAEP—either a larger increase or a smaller decrease on the state assessment as compared to NAEP.

Colorado offers a contrasting example, in which NAEP did corroborate the state’s reported progress. In 2007, 87 percent of Colorado’s students scored at or above the state’s proficiency standard. In 2009, that percentage rose to 88.1 percent, an increase of 1.1 percentage points.

On NAEP, the percentage did not change, remaining at 87 percent. Since the difference between the two measures—1.1 percentage points—was not statistically significant, NAEP’s result was consistent with Colorado’s.
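The corroboration arithmetic in the Idaho and Colorado examples reduces to a simple difference of changes, sketched below. The function name is illustrative, and the significance test that decides whether a gap matters is not shown.

```python
def corroboration_gap(state_change, naep_change):
    """NAEP change minus state change, in percentage points.

    A negative gap means the state assessment showed the more positive
    change (a larger increase or smaller decrease than NAEP). Whether a
    gap is meaningful depends on a statistical significance test.
    """
    return naep_change - state_change

# Idaho: state assessment +5.5 points, NAEP -1.6 points
print(round(corroboration_gap(5.5, -1.6), 1))   # prints -7.1
```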

Comparing Measures of Change in Reading Achievement from 2007 to 2009

In reading, from 2007 to 2009, at grade 4, NAEP and state measurements of change were not statistically different in 14 states. In 22 states, the state measure was more positive than NAEP, while in the remaining 4 states, the NAEP measure of change was more positive than the state measure.

At grade 8, NAEP corroborated the changes reported by state measurements in 17 states. In 20 states, the state measure was more positive than NAEP, while in the remaining 3 states, the NAEP measure was more positive than the state measure.

A number of these states had relative errors of greater than 0.5, which means that we can’t be highly confident that our analysis produced the correct comparison without additional evidence. At grade 4, these states are Maryland, Texas, and Vermont, for which NAEP corroborated state changes; Georgia, Kansas, and Virginia, for which the state results indicated more positive change than did NAEP; and Wyoming, for which NAEP reported more positive change than the state assessment. At grade 8, relative error was greater than 0.5 for Washington, for which NAEP corroborated the change according to the state assessment results; and Maine and Virginia, where the state assessments indicated more positive change than NAEP.

Comparing Measures of Change in Mathematics Achievement from 2007 to 2009

In mathematics from 2007 to 2009, we see similar results. For many states, there is either no significant difference in the results shown by the state assessment and NAEP or else the state assessment shows more positive change than NAEP. In a few states, NAEP shows more positive change. At grade 4, 21 state assessments showed more positive change than NAEP. At grade 8, 17 state assessments showed more positive change than NAEP.

Identifying the states with relative errors greater than 0.5: at grade 4, these were North Dakota and Tennessee, for which NAEP corroborated change according to state assessments, and Michigan, Texas, and Virginia, where state assessment results indicated more positive change than NAEP. At grade 8, there was high relative error in the analysis for North Dakota, whose change was corroborated by NAEP, and for Hawaii and Virginia, where the state assessments indicated more positive change than NAEP.

Summary: 2009 State Standards

A major finding of the study is that there is wide variation among state proficiency standards, with a range of at least 60 points on the NAEP scale, depending on subject and grade. With such a wide range, a student considered proficient in one state may not be considered proficient in another.

Secondly, almost all state standards are mapped at or below NAEP’s Basic achievement level, which represents partial mastery of knowledge and skills fundamental for proficient work at each grade according to NAEP’s achievement level definitions.

Summary: Change in State Standards

Among states that made substantive changes in an assessment between 2007 and 2009, most moved toward more rigorous standards.

Between 2005 and 2009, on the other hand, there were more decreases in rigor than increases.

Summary: Confirming State Progress Over Time

Changes from 2007 to 2009 in the proportion of students meeting states’ standards for proficiency were corroborated by the proportion of students meeting proficiency as measured by NAEP in fewer than half the cases, regardless of grade or subject.

For example, in grade 4 reading, NAEP corroborated the state results in 14 out of 40 cases. In 22 cases, the state result was more positive than NAEP, while in 4 cases, the NAEP result was more positive.