Interpreting NAEP Reading Results

Overview of the Assessment

NAEP assesses student performance in reading by administering assessments to samples that are representative of the nation's students. The content of the NAEP reading assessment is determined by a framework developed with input from researchers, policymakers, and the interested public, and informed by expert perspectives on reading and its measurement. Read more about what the assessment measures, how it was developed, who took the assessment, and how the assessment was administered.

The 2003 reading results presented on the website are based on representative samples of students for the nation and for participating states and other jurisdictions. Approximately 343,000 students from 14,000 schools were assessed at grades 4 and 8. The national results reflect the performance of students attending both public and nonpublic schools, while the state and jurisdiction results reflect only the performance of students attending public schools.

Beginning in 2002, the NAEP national sample was obtained by aggregating the samples from each state, rather than by obtaining an independently selected national sample. As a consequence, the national sample size increased, and differences between years or between types of students that would have been too small to detect in previous assessments were found to be statistically significant. In keeping with past practice, all statistically significant differences are indicated in the current web results pages.

Comparisons are made to results from previous years in which the assessment was administered. In addition to the 2003 results, national results are reported from the 1992, 1994, 1998, and 2002 assessments and, for fourth grade only, from the 2000 assessment. Fourth-grade state and/or jurisdiction results are reported from the 1992, 1994, 1998, 2002, and 2003 assessments. Eighth-grade results are reported from the 1998, 2002, and 2003 assessments.
Results from 1998 or later are based on administration procedures in which testing accommodations were permitted for students with disabilities and limited-English-proficient students. Accommodations were not permitted in earlier assessments. Comparisons between results from 2003 and those from assessment years in which both types of administration procedures were used (i.e., 1998 and 2000 at grade 4 and 1998 at grade 8) are based on the results when accommodations were permitted. Changes in student performance across years or differences between groups of students in 2003 are discussed only if they have been determined to be statistically significant.

Reporting the Assessment: Scale Scores and Achievement Levels

The results of student performance on the NAEP reading assessment are presented on this website in two ways: as average scores on the NAEP reading scale and as the percentages of students attaining NAEP reading achievement levels. The average scale scores represent how students performed on the assessment. The achievement levels represent how that performance measured up against set expectations for achievement. Thus, the average scale scores represent what students know and can do, while the achievement-level results indicate the degree to which student performance meets expectations of what they should know and be able to do.

Average reading scale score results are based on the NAEP reading scale, which ranges from 0 to 500. The NAEP reading assessment scale is a composite combining separate scales for each reading context specified by the reading framework (at grade 4, reading for literary experience and reading for information, and at grade 8, those contexts and reading to perform a task). Average scale scores are computed for groups, not for individual students. The average scores are based on analyses of the percentages of students who answered each item successfully.
While the score ranges at each grade in reading are identical, the scale was derived independently at each grade. Therefore, average scale scores across grades cannot be compared. For example, equal scale scores on the grade 4 and grade 8 scales do not imply equal levels of reading achievement.

Achievement-level results are presented in terms of reading achievement levels adopted by the National Assessment Governing Board (NAGB), and are intended to measure how well students' actual achievement matches the achievement desired of them. For each grade tested, NAGB has adopted three achievement levels: Basic, Proficient, and Advanced. NAGB's overall goal for students is performance at the Proficient level or higher. For reporting purposes, the achievement-level cut scores are placed on the reading scales, resulting in four ranges: below Basic, Basic, Proficient, and Advanced.

NAGB established its achievement levels in 1996 based upon the reading content framework and a standard-setting process. A cross section of educators and interested citizens from across the nation was asked to judge what students should know and be able to do relative to the content reflected in the NAEP reading framework. As provided by law, NCES has determined that the achievement levels are to be considered on a trial basis and should be interpreted and used with caution. However, both NCES and NAGB believe these performance standards are useful for understanding trends in student achievement.

Description of Reading Performance by Item Maps for Each Grade

The performance of fourth- and eighth-graders can be illustrated by maps that position question descriptions along the NAEP reading scale. The descriptions used on these maps focus on the reading skill or knowledge needed to answer the question.
For multiple-choice questions, the description indicates the skill or knowledge demonstrated by selection of the correct option; for constructed-response questions, the description takes into account the skill or knowledge specified by the different levels of scoring criteria for that question. Approximately 28 reading questions per grade have been selected and placed on the item maps for grade 4 and grade 8.

Results are Estimates

The average scores and percentages presented on this website are estimates because they are based on representative samples of students rather than on the entire population of students. Moreover, the collection of subject-area questions used at each grade level is but a sample of the many questions that could have been asked. As such, NAEP results are subject to a measure of uncertainty, reflected in the standard error of the estimates. The standard errors for the estimated scale scores and percentages in the figures and tables presented on this website are available through the NAEP Data Tool.

NAEP Reporting Groups

Results are provided for groups of students defined by shared characteristics: gender, race or ethnicity, school's type of location, Title I participation, eligibility for free/reduced-price school lunch, and type of school. Based on participation rate criteria, results are reported for subpopulations only when sufficient numbers of students and adequate school representation are present. The minimum requirement is at least 62 students in a particular subgroup from at least five primary sampling units (PSUs). However, the data for all students, regardless of whether their subgroup was reported separately, were included in computing overall results. Explanations of the reporting groups are presented below.

Gender

Results are reported separately for males and females.

Race/Ethnicity

In all NAEP assessments, data about student race/ethnicity are collected from two sources: school records and student self-reports.
Before 2002, NAEP used students' self-report of their race and ethnicity on a background questionnaire as the source of race/ethnicity data. In 2002, it was decided to change the student race/ethnicity variable highlighted in NAEP reports. Starting in 2002, NAEP reports of students' race and ethnicity are based on the school records, with students' self-report used only if school data are missing. Information based on student self-reported race/ethnicity will continue to be reported in the NAEP Data Tool.

In order to allow comparisons across years, assessment results presented are based on school-reported information for six mutually exclusive racial/ethnic categories: White, Black, Hispanic, Asian/Pacific Islander, American Indian (including Alaska Native), and Other. Students who identified with more than one of the first five categories or had a background other than the ones listed were categorized as Other.

Type of Location

Results from the 2003 assessment are reported for students attending schools in three mutually exclusive location types:

Central city: The U.S. Census Bureau defines "central city" as the largest city of a Metropolitan Statistical Area (MSA) or a Consolidated Metropolitan Statistical Area (CMSA). An MSA is an area defined by the federal government for the purposes of presenting general-purpose statistics for metropolitan areas. Typically, an MSA contains a city with a population of at least 50,000 and includes its adjacent areas. An MSA becomes a CMSA if it meets the requirements to qualify as a metropolitan statistical area, has a population of 1,000,000 or more, has component parts that are recognized as primary metropolitan statistical areas, and local opinion favors the designation.

Urban fringe/large town: The urban fringe category includes any incorporated place, census-designated place, or non-place territory within a CMSA or MSA of a large or mid-sized city that is defined as urban by the U.S. Census Bureau, but that does not qualify as central city. A large town is defined as a place outside a CMSA or MSA with a population greater than or equal to 25,000.

Rural/small town: Rural includes all places and areas with populations of less than 2,500 that are classified as rural by the U.S. Census Bureau. A small town is defined as a place outside a CMSA or MSA with a population of less than 25,000 but greater than or equal to 2,500.

Starting in 2000, NAEP adopted new methods to identify the type of location assigned to each school in the Common Core of Data (CCD). The new methods were put into place by NCES in order to improve the quality of the assignments, and they take into account more information about the exact physical location of the school. Therefore, the results for type of location are not comparable to those for years before 2000.

To provide a context for the data collected in the 2003 Trial Urban District Assessment in reading (TUDA), results are presented for students attending public schools in the nation as a whole, as well as for public schools located in large central cities across the nation. Large central cities are defined by NCES as cities with a population of 250,000 or more.

Title I Participation

Based on available school records, students were classified as either currently participating in a Title I program or receiving Title I services, or as not participating. The classification applies only to the school year when the assessment was administered (i.e., the 2002-03 school year) and is not based on participation in previous years. If the school does not offer any Title I programs or services, all students in that school would be classified as not participating.

Eligibility for Free/Reduced-Price School Lunch

As part of the Department of Agriculture's National School Lunch Program, schools can receive cash subsidies and donated commodities in return for offering free or reduced-price lunches to eligible children.
Based on available school records, students were classified as either currently eligible for the free/reduced-price school lunch or not eligible. Eligibility for free and reduced-price lunches is determined by students' family income in relation to the federally established poverty level. Students whose family income is at or below 130 percent of the poverty level qualify to receive free lunch, and students whose family income is between 130 percent and 185 percent of the poverty level qualify to receive reduced-price lunch. The classification applies only to the school year when the assessment was administered (i.e., the 2002-03 school year) and is not based on eligibility in previous years. If school records were not available, the student was classified as "Information not available." If the school did not participate in the program, all students in that school were classified as "Information not available."

Type of School

Results are reported by the type of school that the student attends: public or nonpublic. Nonpublic schools include Catholic and other private schools. Bureau of Indian Affairs (BIA) schools, Department of Defense Dependents Schools (Overseas), and Department of Defense Domestic Dependent Elementary and Secondary Schools (DDESS) are funded by federal authorities rather than state or local governments, so they are not included in either the public or nonpublic category; they are included in the overall national results.

Exclusion Rates and Assessment Results

All 50 states and 3 other jurisdictions participated in the 2003 reading assessment. To ensure that the samples in each state are representative, NAEP has established policies and procedures to maximize the inclusion of all students in the assessment. Every effort is made to ensure that all selected students who are capable of participating meaningfully in the assessment are assessed.
While some students with disabilities (SD) and/or limited-English-proficient (LEP) students can be assessed without any special procedures, others require accommodations to participate in NAEP. Still other SD and/or LEP students selected by NAEP may not be able to participate. Local school authorities determine whether SD/LEP students require accommodations or should be excluded because they cannot be assessed. The percentage of SD and/or LEP students who are excluded from NAEP assessments varies from one jurisdiction to another and within a jurisdiction over time. If excluded students are less proficient readers, variations in exclusion rates could have an impact on average reading scores or score gains within jurisdictions.

NCES is sponsoring ongoing research on the potential impact of changes in exclusion rates on changes in average reading performance. The preliminary findings from the research suggest that the potential impact on reading scores is minimal. For example, one model examined what the results might have been if no fourth-grade students had been excluded in 1998 and 2003. For 21 of 38 jurisdictions that participated in both 1998 and 2002, the change in average reading scores might have differed by up to one point in either direction from what is being reported. Thirty-five of the 38 jurisdictions might have differed by up to three points, and another three jurisdictions might have differed by three points or more. Further discussion of this research is presented in appendix A of the 2002 Reading Report Card.

Statistical Significance

Differences between scale scores and between percentages that are discussed in the results on this website take into account the standard errors associated with the estimates. Comparisons are based on statistical tests that consider both the magnitude of the difference between the group average scores or percentages and the standard errors of those statistics.
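As a rough illustration of how a test can weigh the size of a difference against the standard errors of the two estimates, the following minimal sketch compares two group averages. It is not NAEP's actual procedure: it assumes the two estimates are independent, uses a normal approximation with the two-sided 5 percent critical value of 1.96, and applies no adjustment for multiple comparisons. The function name and all scores and standard errors are hypothetical.

```python
import math


def difference_is_significant(est_a, se_a, est_b, se_b, critical_value=1.96):
    """Compare two independent estimates (hypothetical helper, not a NAEP API).

    Returns the difference, its standard error, and whether the absolute
    difference exceeds critical_value times that standard error.
    """
    diff = est_a - est_b
    # For independent estimates, the variances (squared standard errors) add.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, abs(diff) > critical_value * se_diff


# Hypothetical average scale scores and standard errors for two groups.
diff, se_diff, significant = difference_is_significant(222.0, 1.2, 217.0, 1.1)
print(f"difference = {diff:.1f}, SE = {se_diff:.2f}, significant = {significant}")
```

With the same standard errors, a 3-point difference (for example 220 versus 217) would not pass this check, which shows why the size of a difference alone does not determine whether it is reported.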
Throughout the results, differences between scores or between percentages are discussed only when they are significant from a statistical perspective. All differences reported are significant at the 0.05 level with appropriate adjustments for multiple comparisons. The term "significant" is not intended to imply a judgment about the absolute magnitude or the educational relevance of the differences. It is intended to identify statistically dependable population differences to help inform dialogue among policy makers, educators, and the public.

Cautions in Interpretations

Users of this website are cautioned against interpreting NAEP results as implying causal relations. Inferences related to subgroup performance or to the effectiveness of public and nonpublic schools, for example, should take into consideration the many socioeconomic and educational factors that may also affect performance.

The NAEP reading scale makes it possible to examine relationships between students' performance and various background factors measured by NAEP. However, a relationship that exists between achievement and another variable does not reveal its underlying cause, which may be influenced by a number of other variables. Similarly, the assessments do not reflect the influence of unmeasured variables. The results are most useful when they are considered in combination with other knowledge about the student population and the educational system, such as trends in instruction, changes in the school-age population, and societal demands and expectations.

Beginning in 2002, the NAEP national sample was obtained by aggregating the samples from each state, rather than by obtaining an independently selected national sample. As a consequence, the national sample size increased, and differences between years or between types of students that would have been too small to detect in previous assessments were found to be statistically significant.
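The link between a larger sample and the detection of smaller differences can be illustrated with a simplified calculation. The sketch below assumes simple random sampling, two groups of equal size, and a hypothetical score standard deviation of 35 points; NAEP's real design uses complex sampling, so its standard errors are computed differently. The function name, standard deviation, and sample sizes are illustrative assumptions only.

```python
import math


def smallest_detectable_difference(std_dev, n_per_group, critical_value=1.96):
    """Approximate smallest difference between two group means that would be
    flagged as significant, under simple random sampling (an assumption)."""
    se_mean = std_dev / math.sqrt(n_per_group)  # standard error of one mean
    se_diff = math.sqrt(2) * se_mean            # two independent, equal-size groups
    return critical_value * se_diff


# Hypothetical sample sizes: each fourfold increase halves the threshold.
for n in (2_500, 10_000, 40_000):
    threshold = smallest_detectable_difference(35.0, n)
    print(f"n per group = {n:>6}: detectable difference ≈ {threshold:.2f} points")
```

Because the standard error shrinks with the square root of the sample size, quadrupling the sample halves the smallest difference that reaches statistical significance.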
A caution is also warranted for some small population group estimates. At times in the results pages, smaller population groups show very large increases or decreases across years in average scores. For example, fourth-grade Hispanic students in Delaware are reported as having a 36-point score increase between 1998 and 2002. However, it is often necessary to interpret such score gains with extreme caution. For one thing, the effects of exclusion-rate changes may be more marked for small groups than they are for the whole population. To continue with the Delaware example, 2 percent of Hispanic students were excluded in 1998. This number increased to 21 percent in 2002. Also, the standard errors are often quite large around the score estimates for small groups, which in turn means the standard error around the gain is also large. While the Delaware Hispanic student scores went up 36 points, the standard error of the gain is almost 12 points, which means that statisticians are confident only that the estimate is correct within 23.5 points (i.e., 36 ± 23.5 points).
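The 23.5-point margin in the Delaware example is consistent with multiplying the standard error of the gain by 1.96, the usual normal-approximation multiplier for a 95 percent confidence interval. The minimal sketch below reproduces that arithmetic; it assumes the standard error is exactly 12 (the text says "almost 12"), and the function name is hypothetical.

```python
def confidence_interval(estimate, standard_error, critical_value=1.96):
    """Return (low, high, margin) for a normal-approximation interval."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin, margin


# Gain of 36 points; standard error rounded to 12.0 (an assumption).
low, high, margin = confidence_interval(36.0, 12.0)
print(f"36 ± {margin:.1f} points, i.e. roughly {low:.1f} to {high:.1f}")
```

Under these assumptions the plausible range for the true gain runs from about 12 to about 60 points, which is why such a large reported increase still calls for cautious interpretation.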