Understanding the 2009 Reading Trend Study

NAEP frameworks provide the basis for the content of the assessments in each subject area, and describe the types of questions that should be included, as well as how those questions should be designed and scored. The NAEP frameworks are developed under the guidance of the National Assessment Governing Board. Frameworks are periodically updated or redeveloped in order to reflect current educational practice, such as changes in standards or coursework.

A new framework was developed for the 2009 NAEP reading assessment at grades 4, 8, and 12. The framework describes how reading is defined for the 2009 assessment and how this differs from the previous reading framework. The previous reading framework was first implemented in 1992 and was used for subsequent assessments through 2007 (or 2005 at grade 12). Because these assessments are based on the same framework, and therefore on a common definition of and approach to assessing reading comprehension, the results can be directly compared and reported as a trend line.

Past NAEP practice has been to start a new trend line when a new framework is introduced. However, special analyses were conducted in 2009 to determine if the results from the 2009 reading assessment could be compared to results from earlier years despite being based on a new framework.

The first step was to conduct a content alignment study to closely examine and compare the "new" (2009) and "old" (1992–2007) reading frameworks. Alignment studies are often used to help determine the extent to which two assessments are similar with respect to their purpose, characteristics, and content. A panel of content experts, such as reading teachers and teacher educators, looked at reading passages and questions from both the old and new assessments and judged how well they aligned with the specifications of each framework. It was determined that the old and new reading passages and questions were sufficiently similar to continue to the next stage of the special analyses: a reading trend study in 2009.

The purpose of the 2009 reading trend study was to compare results based on the 2009 and 2007 reading assessment instruments. Trend studies, also referred to as bridge studies, have previously been used in NAEP to evaluate the impact of changes to the assessment on the comparability of scores. For instance, results of a 2004 bridge study demonstrated that trend lines could be continued after a number of changes were made to the long-term trend assessments in reading and mathematics.

In the 2009 reading trend study, students were randomly assigned to take the old (2007) assessment, the new (2009) assessment, or a specially designed "mixed" assessment that contained material from both the old and new assessments. By administering both the old and new assessments in 2009, and by having some students answer questions from both assessments, it was possible to examine empirically the relationship between the old and new assessments. If analyses showed that the old and new assessments were similar, then it would be possible to compare the 2009 results directly to those from previous years.

The special analyses into the relationship between the old and new assessments focused on three main questions:

1. How do the blocks of old and new assessment questions compare in terms of difficulty, student nonresponse rates, student ability to complete all questions in the block, and reliability?

2. What is the relationship between the old and new assessment scales?

3. Do the old and new assessments produce similar results (scale scores and percentages of students reaching the NAEP achievement levels) for major reporting groups?
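The block-level comparisons in the first question come down to simple item statistics. As an illustrative sketch (the response data and the `block_stats` helper are invented for this example; NAEP's actual procedures are more involved), block difficulty can be summarized as the proportion of answered questions that were correct, and nonresponse as the proportion of questions omitted:

```python
# Sketch of block-level statistics used to compare old and new question blocks.
# Responses: 1 = correct, 0 = incorrect, None = omitted (nonresponse).
# All data here are invented for illustration.

def block_stats(responses):
    """Return (difficulty, nonresponse rate) for one block of responses.

    Difficulty is the proportion correct among answered items;
    nonresponse is the proportion of items left unanswered.
    """
    answered = [r for r in responses if r is not None]
    difficulty = sum(answered) / len(answered)
    nonresponse = responses.count(None) / len(responses)
    return difficulty, nonresponse

old_block = [1, 0, 1, 1, None, 0, 1, 1, 0, 1]
new_block = [1, 1, 0, 1, 0, None, None, 1, 1, 0]

print(block_stats(old_block))
print(block_stats(new_block))
```

Comparable difficulty and nonresponse values across the old and new blocks would be one piece of evidence that the two instruments behave similarly.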

Overall, the results of the special analyses suggested that the old and new assessments were similar in terms of their item and scale characteristics and the results they produced for important demographic groups of students. It was determined that the results of the 2009 reading assessment could still be compared to those from earlier assessment years, thereby maintaining the trend lines established in 1992. The results reported for 2009 are based on the total pool of questions administered to students in 2009—that is, the reading scales are based on the performance of students who took the old, new, and mixed assessments in 2009.

Although the reading trend lines are maintained, the implementation of the new framework for 2009 and for future assessment years meant that some of the differences between the old and new frameworks had to be addressed in the analysis of the 2009 data. Two such changes are discussed here:

  • At grades 8 and 12, "reading to perform a task" is no longer assessed.

From 1992 to 2007, results were reported for three reading subscales that correspond to different contexts for reading: reading for literary experience, reading for information, and reading to perform a task. The 2009 framework, however, focuses exclusively on the literary and informational text types at all three grades. As a result, a separate scale for reading to perform a task was not estimated in 2009 for grades 8 and 12, and the trend line for this scale ends in 2007. Trend lines for the literary scale and the information scale at grade 8, however, are maintained. At grade 12, trend lines could not be maintained for the information scale because of changes to the framework specifications for reading for information at this grade. Grade 12 texts that would have been categorized under the task subscale in the 1992–2007 framework may now be included under the information subscale in the 2009 framework. In contrast, texts formerly associated with the task scale at grade 8 do not have a corresponding place in the 2009 framework. Therefore, the 2009 information scale at grade 12 is not comparable to previous information scale results.

  • At all three grades, the composite reading scale is defined differently in the 2009 framework than in previous years.

In addition to reporting results for the separate reading subscales, a composite NAEP reading scale is also reported. The composite reading scale is a weighted combination of the subscales, with the weights reflecting the relative proportion of total testing time dedicated to each subscale. The distribution of testing time across the reading subscales is specified by the framework, and is different in the 2009 framework compared to the 1992–2007 framework. At grade 4, although the same two subscales are reported, the proportion of testing time dedicated to each changed under the 2009 framework. At grades 8 and 12, the composite scale is affected by the change in the number of subscales, as discussed above. The table below summarizes the differences in subscales and allocation of total testing time between the 2009 framework and the previous framework.

Percentage of total testing time allocated to each reading context/subscale, by grade: 1992–2007 and 2009 reading frameworks

                     1992–2007 Framework           2009 Framework
Context/Subscale     Grade 4  Grade 8  Grade 12    Grade 4  Grade 8  Grade 12
Literary                45       40       35          50       45       30
Informational           55       40       45          50       55       70
Task                     †       20       20           †        †        †

† Not applicable. Reading to perform a task is not assessed.
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 1992–2009 Reading Assessments.
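As a minimal sketch of how such a composite works, the computation below forms a testing-time-weighted average of subscale scores. The weights are the grade 8 percentages from the 2009 column of the table above; the subscale scale scores themselves are hypothetical values invented for illustration, not actual NAEP results:

```python
# Sketch: composite reading scale as a weighted combination of subscale scores,
# with weights equal to the share of testing time per subscale.

def composite_score(subscale_scores, weights):
    """Weighted combination of subscale scores; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[s] * subscale_scores[s] for s in weights)

# Grade 8, 2009 framework: literary 45%, informational 55% of testing time.
weights_g8_2009 = {"literary": 0.45, "informational": 0.55}

# Hypothetical subscale scale scores, for illustration only.
scores = {"literary": 260.0, "informational": 268.0}

print(composite_score(scores, weights_g8_2009))
```

Because the weights changed between the 1992–2007 and 2009 frameworks (and, at grades 8 and 12, the set of subscales changed), the same subscale scores would combine into a different composite under each framework; this is why the comparability of the 2009 composite to earlier years had to be checked empirically.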

The reading composite scale results reported for 2009 are based on the 2009 framework specifications. Analyses affirmed that the composite scales reported for grades 4, 8, and 12 in 2009 could be validly compared to composite results from previous years despite the changes in framework specifications.

For more information on NAEP scaling procedures in general, see the technical documentation on the estimation of scale scores.

Last updated 14 July 2010 (RF)