
Help for the NAEP 1999 Long-Term Trend Summary Data Tables
The NAEP 1999 long-term trend summary data tables present mathematics, reading, and science trend results from the 1999 National Assessment of Educational Progress (NAEP). A long-term trend writing assessment was also administered in 1999; however, the results of that assessment are undergoing evaluation for possible later release.
The NAEP long-term trend assessments are separate from a series of newer NAEP assessments (called "main" assessments) that involve more recently developed instruments. While the long-term trend assessments have used the same sets of questions and tasks so that trends across time can be measured, the main assessments in each subject area have been developed to reflect current educational content and assessment methodology. In some cases, the main assessment in a particular subject area has been administered in more than one year, providing short-term trend results (e.g., mathematics in 1990, 1992, and 1996; and reading in 1992, 1994, and 1998). The use of both long-term trend and main assessments allows NAEP to provide information about students' achievement over time and to evaluate their attainment of more contemporary educational objectives. Because they are based on a different set of questions and tasks, scale score results and students' reports of educationally related experiences from the long-term trend assessments cannot be directly compared to the main assessments.
Help is available for the following topics:
The Mathematics Trend Assessment
NAEP has assessed the mathematics achievement of in-school 9-, 13-, and 17-year-olds nine times, in the school years ending in 1973, 1978, 1982, 1986, 1990, 1992, 1994, 1996, and 1999. The trend assessment, which forms the basis of the results provided in these tables, uses procedures established in 1973. The assessments were presented in paced-tape administrations, and for each of the assessments, 13-year-olds were assessed in the fall, 9-year-olds were assessed in the winter, and 17-year-olds were assessed in the spring of the assessment school year. The same assessment booklets were used in 1986, 1990, 1992, 1994, 1996, and 1999; these booklets contained blocks of mathematics questions and blocks of science questions, as well as background questions.
The mathematics trend assessments contained a range of constructed-response and multiple-choice questions measuring performance on sets of objectives developed by nationally representative panels of mathematics specialists, educators, and other interested parties. The 1986, 1990, 1992, 1994, 1996, and 1999 assessments shared common objectives. The objectives for each assessment prior to 1990 were based on the framework used for the previous assessment with some revisions that reflected changes in the contents of mathematics education. Although changes were made from assessment to assessment before 1990, some questions were retained from one assessment to the next in order to measure trends in achievement across time. This allows comparisons across all of the available assessments, other than the 1973 assessment, to be made using item response theory (IRT) scaling. Results from the 1973 assessment were placed on the same scale using mean proportion correct extrapolation.
The 1986, 1990, 1992, 1994, 1996, and 1999 mathematics trend assessments included 71 questions, including 28 constructed-response questions, at age 9; 127 questions, including 27 constructed-response questions, at age 13; and 132 questions, including 29 constructed-response questions, at age 17. The questions covered a range of content, including numbers and operations, measurement, geometry, and algebra. The process areas included knowledge, understanding, skills, applications, and problem solving.
The Reading Trend Assessment
NAEP has assessed students' reading performance at age 9 or in grade 4, at age 13 or in grade 8, and at age 17 or in grade 11 in ten reading assessments conducted during the school years ending 1971, 1975, 1980, 1984, 1988, 1990, 1992, 1994, 1996, and 1999. For each assessment, 13-year-olds and eighth graders were assessed in the fall, 9-year-olds and fourth graders were assessed in the winter, and 17-year-olds and eleventh graders were assessed in the spring of the assessment school year. Because data from both the age samples and the grade samples were used to establish the reading trend scale in 1984 when scaling of the trend assessments was first done, this practice has been replicated in all subsequent trend assessments. Results reported in this document, however, are results for the 9-, 13-, and 17-year-olds assessed each year.
The same assessment booklets, containing blocks of reading, writing, and background questions, were used in 1984, 1988, 1990, 1992, 1994, 1996, and 1999. The assessments since 1984 were administered in printed form; before that time, the assessments were paced using audiotapes. In 1984, the assessment was administered in both modes. The reading tasks required students to read and answer questions based on a variety of materials, including informational passages, literary text, and documents. Although some tasks required students to provide written responses, most questions were multiple-choice questions.
The assessment was designed to evaluate students' ability to locate specific information, make inferences based on information in two or more parts of a passage, or identify the main idea in a passage. For the most part, these questions measured students' ability to read either for specific information or for general understanding. Although the reading assessments conducted through the 1970s underwent some changes from one administration to the next, the set of reading passages and questions included in the trend assessments has been kept essentially the same since 1984, and most closely reflects the objectives developed for that assessment.
The reading trend assessment administered at age 9/grade 4 included 45 passages and 105 questions, including eight that required students to construct written responses. At age 13/grade 8, the assessment included 43 passages and 107 questions, seven of them requiring constructed responses. At age 17/grade 11, the assessment contained 36 passages and 95 questions, eight of them requiring constructed responses.
The Science Trend Assessment
NAEP conducted trend assessments of the science achievement of in-school 9-, 13-, and 17-year-olds during the school years ending in 1970, 1973, 1977, 1982, 1986, 1990, 1992, 1994, 1996, and 1999. In the first assessment, the 17-year-olds were assessed during the spring of the school year ending in 1969, rather than 1970. For each of the other assessments, 13-year-olds were assessed in the fall, 9-year-olds were assessed in the winter, and 17-year-olds were assessed in the spring of the assessment school year. Identical assessment booklets, containing blocks of science, math, and background questions, were used in 1986, 1990, 1992, 1994, 1996, and 1999. The assessments were administered using an audiotape that guided the students through the assessment questions. The use of audiotape minimized the dependence of the science results on reading ability.
The science trend assessments measured student achievement based on assessment objectives developed by nationally representative panels of scientists, science educators, and concerned citizens. The objectives which formed the basis for the 1986, 1990, 1992, 1994, 1996, and 1999 trend assessments replicated the objectives used in previous assessments. The objectives for each assessment prior to 1986 were based on the framework used for the previous assessment with some revisions that reflected changes in content and trends in school science. Since 1986, the objectives have been identical from assessment to assessment. Although changes were made in the content of the assessment before 1990, some questions were retained from one assessment to the next in order to measure trends in achievement across time. This allows comparisons across all of the available assessments to be made. All of the trend assessments from 1977 onward contained enough common questions to put the results from these assessments on the same scale using item response theory (IRT) scaling. The 1970 and 1973 assessments had too few questions in common with subsequent assessments to have results put directly on the IRT scale; results from these assessments were placed on the trend scale using mean proportion correct extrapolation for the common questions. The 1999 science trend assessment contained 63 multiple-choice questions at age 9, 83 multiple-choice questions at age 13, and 82 multiple-choice questions at age 17. The assessment covered a range of science content areas, including topics from the life sciences, physical sciences, and earth and space sciences. Questions assessed students' abilities to understand basic scientific facts and principles, solve problems in scientific contexts, design experiments, interpret data and read tables and graphs, and understand the nature of science. 
Types of Summary Data Tables
In 1999, NAEP examined long-term trends in the ability of nationally representative samples of students at ages 9, 13, and 17 in mathematics, reading, and science. In each subject area, the same sets of questions and tasks used in previous trend assessments were administered again using procedures replicated from previous assessments. A key component of the program was the contextual information collected from students, who were asked a series of questions about demographic information, their home environment, and experiences and instruction in the particular subject area being assessed.
The long-term trend summary data tables are based on responses to these questions. The results include average scale scores and percentages for each response alternative, scale score percentiles, performance level percentages, and percentages for test questions. The results are enumerated for important demographic groups such as student gender, race/ethnicity, and parental education level. Three types of summary data tables are provided for each subject area. Additionally, a few special tables are provided for certain subject areas. The tables are described below.
1) Scale Scores and Performance Levels Data Tables
The Scale Scores and Performance Levels Data Tables present results based on data from the background questions that were administered to each student and on data derived from school-level information contained in the sampling frame. This information is presented overall and for percentages of students at each performance level. The left-hand side of the tables shows the categories for each of the student background variables. In the tables titled "Percentages by Reporting Subgroup," the columns contain, by trend year, the estimated percentage of students corresponding to each category of the background variable.
In the tables titled "Average Scale Scores by Reporting Subgroup," the columns contain, for each trend year, the estimated average scale scores that correspond to each category. The remainder of the tables show the percentages of students in each category who received scale scores at or above the various NAEP performance levels. Standard errors for each of these statistics are shown in parentheses.
In all of the tables, a "c" next to a value indicates that the value is significantly different from the value for 1999 at about the 95 percent certainty level. In tables that show scale scores and percentages at or above performance levels, the rightmost column indicates significant linear and quadratic trends. "L" indicates a significant positive linear trend; "l" indicates a significant negative linear trend. Correspondingly, "Q" and "q" respectively indicate a significant positive or negative quadratic trend. "NA" means that linear and quadratic trends were not tested due to a lack of at least 5 trend points with sufficient sample size to estimate the statistics.
Select "View Table" to view and print these tables through your Web browser; select "Download PDF" to use the tables with Acrobat Reader; or select "Export Table" to use the table data in a spreadsheet.
2) Percentile Data Tables
The Percentile Data Tables provide estimated percentiles for the NAEP scales. The following information is provided for the total sample and for subgroups defined by gender and race/ethnicity: 1) estimated mean; 2) estimated standard deviation; and 3) estimates of the 10th, 25th, 50th (or median), 75th, and 90th percentiles. All estimates are followed in parentheses by their estimated standard errors. Percentile Data Tables are available only in PDF format.
3) Test Question Data Tables
In these tables, the weighted percentage correct for each test question administered in the assessment is shown for each trend year for the total sample and for gender and race/ethnicity subgroups.
Percentages are followed by their estimated standard errors, shown in parentheses. Test Question Data Tables are available only in PDF format.
4) Extrapolated Data Tables for Mathematics and Science
The initial long-term trend scaling did not include the 1970 or 1973 science assessments or the 1973 mathematics assessment because these assessments had too few questions in common with subsequent assessments to have results put directly on the IRT scale. To provide a link to the early assessment results for the nation and for subgroups defined by race/ethnicity, gender, and region at each of three age levels, estimates of average scale scores were extrapolated from previous analyses. An additional set of summary data tables for mathematics and science (labeled "Extrapolated Data") shows the extrapolated results for these early years juxtaposed with the data for subsequent years. These tables are similar in form to the Scale Score data tables, but present results only for subgroups defined by gender, race/ethnicity, and region. Extrapolated Data Tables are available only in PDF format.
Notations Used in the Summary Data Tables
NAEP Reporting Groups
The summary data tables provide results for the nation and for groups of students defined by shared characteristics. Based on statistically determined criteria, results are reported for subpopulations only when sufficient numbers of students and adequate school representation are present. The minimum requirement is at least 62 students in a particular subgroup from at least 5 primary sampling units (PSUs). A PSU is a selected geographical region -- a county, group of counties, or a metropolitan statistical area. However, the data for all students, regardless of whether their subgroup was reported separately, were included in computing the overall national results. Definitions of the subpopulations referred to in the summary data tables are presented below.
1) Gender
Results are reported separately for males and females.
2) Race/ethnicity
Results are presented for students of different racial/ethnic groups according to five mutually exclusive categories: White, Black, Hispanic, Asian/Pacific Islander, and American Indian (including Alaskan Native).
3) Grade
Results are presented for students who are below, at, and above the modal grade (the grade attended by most students at the assessed age).
4) Region
Results are reported for four regions of the nation: Northeast, Southeast, Central, and West.
Northeast: Connecticut, Delaware, District of Columbia, Maine, Maryland, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, Virginia (DC metropolitan statistical area only)
Southeast: Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Virginia (other than DC metro area), West Virginia
Central: Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin
West: Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oklahoma, Oregon, Texas, Utah, Washington, Wyoming
5) Type of Location (1994, 1996, and 1999 only)
Results are provided for students attending public schools in three mutually exclusive location types -- central city, urban fringe/large town, and rural/small town -- as defined below. The type of location variable is defined in such a way as to indicate the geographical location of a student's school. The intention is not to indicate, or imply, social or economic meanings for these location types. The type of location variable, on which the current NAEP sampling is based, does not support the reporting of regional results. Therefore, only national results are presented.
Central City: The Central City category includes central cities of all metropolitan statistical areas (MSAs). Central City is a geographic term and is not synonymous with "inner city."
Urban Fringe/Large Town: An Urban Fringe includes all densely settled places and areas within MSAs that are classified as urban by the Bureau of the Census. A Large Town is defined as places outside MSAs with a population greater than or equal to 25,000.
Rural/Small Town: Rural includes all places and areas with a population of less than 2,500 that are classified as rural by the Bureau of the Census. A Small Town is defined as places outside MSAs with a population of less than 25,000 but greater than or equal to 2,500.
6) Parents' Education Level
Students were asked to indicate the extent of schooling for each of their parents -- did not finish high school, graduated high school, had some education after high school, or graduated college. The response indicating the higher level of education was selected for reporting. Note that a substantial number of fourth-graders/9-year-olds indicated that they did not know their parents' education level.
7) Type of School
Results are presented for public schools and nonpublic schools.
8) Quartiles
Results are presented for students who were in the upper quartile (upper 25 percent), middle two quartiles (middle 50 percent), and the lower quartile (lower 25 percent) of student performance.
NAEP Scales
For the 1999 mathematics, reading, and science trend assessments, separate IRT scales were constructed within each grade. These scales were linked to the previously established scales within each subject area via a common population linking procedure. The reading trend scale was constructed based on the 1984 assessment and included all previous reading assessments. The science and mathematics trend scales were developed based on the 1986 science and mathematics assessments, respectively, and also included previous assessments. The initial trend scaling, however, did not include the 1969-70 or 1973 science assessments, or the 1973 mathematics assessment, because these assessments had too few questions in common with subsequent assessments. To provide a link to the early assessment results for the nation and for subgroups defined by race/ethnicity, gender, and region at each of three age levels, estimates of average scale scores were extrapolated from previous analyses.
The extrapolated estimates were obtained by assuming that within a given age level the relationship between the logit transformation of a subgroup's average p-value (i.e., average proportion correct) for common questions and its respective scale score average was linear and that the same line held for all assessment years and for all subgroups within the age level. Because of the necessity for the use of extrapolation of the average scale scores for these early assessments, caution should be used in interpreting the patterns of trends across those assessment years.
Performance Levels
To facilitate interpretation of the NAEP results, the scales were divided into successive levels of performance and a "scale anchoring" process was used to define what it means to score in each of these levels. NAEP's scale anchoring follows an empirical procedure whereby the scaled assessment results are analyzed to delineate sets of questions that discriminate between adjacent performance levels on the scales. For the science, mathematics, and reading trend scales, these levels are 150, 200, 250, 300, and 350. For these five levels, questions were identified that were likely to be answered correctly by students performing at a particular level on the scale and much less likely to be answered correctly by students performing at the next lower level.
The guidelines used to select such questions were as follows: students at a given level must have at least a 65 percent probability of success with the questions, while students at the next lower level have a much lower probability of success (that is, lower than 50 percent); and the difference in probabilities between adjacent levels must exceed 30 percent. For each of the three curriculum areas, subject-matter specialists examined these empirically selected question sets and used their professional judgment to characterize each level.
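The three question-selection guidelines for scale anchoring can be sketched as a simple check. This is only an illustration: the function names and the example probabilities are hypothetical, the probabilities are expressed as fractions between 0 and 1, and the lowest level (150) is skipped on the assumption that it has no lower level to compare against, which the text above does not address.

```python
# Performance levels on the trend scales, as listed in the text.
LEVELS = (150, 200, 250, 300, 350)


def meets_anchoring_guidelines(p_at_level, p_at_next_lower):
    """Apply the three guidelines to one question at one level.

    Both arguments are probabilities of success as fractions (0 to 1).
    """
    return (
        p_at_level >= 0.65                        # at least 65 percent at the level
        and p_at_next_lower < 0.50                # lower than 50 percent one level down
        and p_at_level - p_at_next_lower > 0.30   # gap exceeds 30 percentage points
    )


def anchor_levels(prob_by_level):
    """Return the levels at which a question meets the guidelines.

    prob_by_level maps each level in LEVELS to a success probability.
    Level 150 is never returned (assumption: no lower level to compare).
    """
    return [
        level
        for lower, level in zip(LEVELS, LEVELS[1:])
        if meets_anchoring_guidelines(prob_by_level[level], prob_by_level[lower])
    ]


# Hypothetical success probabilities for a single question (not NAEP data).
example_question = {150: 0.10, 200: 0.25, 250: 0.70, 300: 0.90, 350: 0.97}
print(anchor_levels(example_question))
```

In the hypothetical example, the question qualifies only at level 250: students at 250 succeed 70 percent of the time, while students at 200 succeed 25 percent of the time.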
The reading scale anchoring was conducted on the basis of the 1984 assessment, and the scale anchoring for mathematics and science trend reporting was based on the 1986 assessments.
Minimum Sample Sizes for Reporting
Results for mathematics, reading, and science performance and for background variables were tabulated and reported for groups defined by gender, race/ethnicity, region, type of location, parental education, and type of school. NAEP collects data for five racial/ethnic subgroups (White, Black, Hispanic, Asian/Pacific Islander, and American Indian/Alaskan Native) and four levels of parents' education (Graduated From College, Some Education After High School, Graduated From High School, and Did Not Finish High School) plus the category "I Don't Know." In some instances, the number of students in some of these groups was not sufficiently high to permit accurate estimation of performance and/or background variable results. As a result, data are not provided for the subgroups with students from very few schools or for the subgroups with very small sample sizes. For results to be reported for any national assessment subgroup, at least 5 PSUs must be represented in the subgroup. In addition, a minimum sample of 62 students per subgroup is required. For statistical tests pertaining to subgroups, the sample size for both groups has to meet the minimum sample size requirements. In the summary data tables, the notation "****" appears in place of a result whenever minimum sample size requirements are not met.
Drawing Inferences and Analyzing Subgroup Differences
Because the percentages of students in the reporting groups and their average scale scores are based on samples -- rather than on entire populations -- the numbers reported are necessarily estimates. As such, they are subject to a measure of uncertainty, reflected in the standard error of the estimate.
When the percentages or average scale scores of certain groups are compared, it is essential to take the standard error into account, rather than to rely solely on observed similarities or differences. Therefore, the comparisons provided in these summary tables are based on statistical tests that consider both the magnitude of the difference between the averages or percentages and the standard errors of those statistics.
One of the goals of the assessment program is to estimate scale score distributions and percentages of students in the standard reporting groups based on the particular samples of students assessed. The use of confidence intervals, based on the standard errors, provides a way to make inferences about the population averages and percentages in a manner that reflects the uncertainty associated with the sample estimates. An estimated sample scale score average plus or minus 2 standard errors represents about a 95 percent confidence interval for the corresponding population quantity. This means that with 95 percent certainty, the average performance of the entire population of interest is within about plus or minus 2 standard errors of the sample average.
Similar confidence intervals can be constructed for percentages, provided that the percentages are not extremely large or extremely small. For percentages, confidence intervals constructed in the above manner work best when sample sizes are large and the percentages being tested have magnitude relatively close to 50 percent. Statements about group differences should be interpreted with caution if at least one of the groups being compared is small in size and/or if "extreme" percentages are being compared. Percentages, $P$, were treated as "extreme" if $P < P_{\mathrm{lim}} = 200 / (N_{\mathrm{EFF}} + 2)$, where the effective sample size $N_{\mathrm{EFF}}$ is equal to $P(100 - P) / SE^2$ and $SE$ is the jackknife standard error of $P$.
This "rule of thumb" cutoff leads to flagging a large proportion of confidence intervals that would otherwise include values < 0 or > 1. Similarly, at the other end of the 0-to-100 scale, a percentage is deemed extreme if $100 - P < P_{\mathrm{lim}}$. In either extreme case, the confidence intervals described above are not appropriate, and procedures for obtaining accurate confidence intervals are quite complicated. For these cases in the summary data tables, the value of P is reported, but the standard error is indicated by (****).
To determine whether there is a real difference between the average scale score (or percentage of a certain attribute) for two groups in the population, one needs to obtain an estimate of the degree of uncertainty associated with the difference between the average scale scores or percentages of these groups for the sample. This estimate of the degree of uncertainty -- called the standard error of the difference between the groups -- is obtained by squaring each group's standard error, summing these squared standard errors, and then taking the square root of this sum. This procedure produces a conservative estimate of the standard error of the difference, since the estimates of the group averages or percentages will be positively correlated to an unknown extent due to the sampling plan. Direct estimation of the standard errors of all reported differences would involve a heavy computational burden.
Similar to the manner in which the standard error for an individual group average or percentage is used, the standard error of the difference can be used to help determine whether differences between assessment years are real. If zero is within the confidence interval for the difference, there is no statistically significant difference between the groups.
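The calculations described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than NAEP's actual software: the function names and example numbers are hypothetical, percentages and their standard errors are expressed on the 0-to-100 scale, and the multiplier of 2 is the "about 95 percent" approximation used in the text.

```python
import math


def confidence_interval(estimate, se, multiplier=2.0):
    """Approximate 95 percent interval: estimate plus or minus 2 standard errors."""
    return (estimate - multiplier * se, estimate + multiplier * se)


def is_extreme_percentage(p, se):
    """Apply the rule of thumb for "extreme" percentages.

    p and se are in percentage points; se must be greater than zero.
    """
    n_eff = p * (100.0 - p) / se ** 2      # effective sample size
    p_lim = 200.0 / (n_eff + 2.0)          # cutoff at either end of the scale
    return p < p_lim or (100.0 - p) < p_lim


def se_of_difference(se_a, se_b):
    """Square each standard error, sum the squares, and take the square root."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def difference_is_significant(est_a, se_a, est_b, se_b, multiplier=2.0):
    """True when zero lies outside the interval around the difference."""
    low, high = confidence_interval(
        est_a - est_b, se_of_difference(se_a, se_b), multiplier
    )
    return not (low <= 0.0 <= high)


# Hypothetical values for illustration only (not NAEP results).
print(confidence_interval(230.0, 1.0))                   # interval for one average
print(difference_is_significant(230.0, 1.0, 225.0, 1.0)) # small standard errors
print(difference_is_significant(230.0, 3.0, 225.0, 3.0)) # larger standard errors
print(is_extreme_percentage(1.0, 1.0))                   # percentage near zero
```

With the hypothetical numbers above, the same 5-point gap is significant when each standard error is 1.0 but not when each is 3.0, which is why observed differences alone are not enough to draw conclusions.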
The descriptions of trend results are based on the results of statistical tests that consider both the estimates of average performance in each assessment year and the degree of uncertainty associated with these estimates. The purpose of basing descriptions on such tests is to restrict the discussion of observed trends and group differences to those that are statistically dependable. Hence, the patterns of results that are discussed are unlikely to be due to the chance factors associated with the inevitable sampling and measurement errors inherent in any large-scale survey effort like NAEP. Throughout the report NAEP 1999 Trends in Academic Progress, all descriptions of trend patterns, differences between assessment years, and differences between subgroups of students which are cited are statistically significant at the .05 level.
Two distinct sets of statistical tests were applied to the trend results. The purpose of the first set of tests was to determine whether the results of the series of assessments in a given subject could be generally characterized by a line or a simple curve. Simple linear and curvilinear (or quadratic) patterns do not always provide a satisfactory summary description of the patterns of trend results. Hence, a second set of statistical tests was conducted that compared results from each previous assessment year to the 1999 results.
It should be noted that statistically significant changes in student performance across a two-year period may be unlikely, and in fact, are not evident in the overall results or in the results for most subgroups of students presented in this report. Changes in the average achievement of populations or subpopulations are more likely to occur over extended periods of time.
In addition, the inherent uncertainty associated with estimates of performance based on samples rather than entire populations necessitates consideration of standard errors in comparing assessment results, further constraining the likelihood that the magnitude of change which may occur between two years will be statistically significant.
File Formats
Tables are presented in three different file formats: HTML, CSV, and PDF. Select "View Table" to view and print HTML versions through your Web browser; select "Export Table" to export the table in CSV (comma separated values) format for use in spreadsheets or data analysis applications; or select "Download PDF" to view or print the tables with Acrobat Reader.
Portable Document Format (PDF) is a cross-platform, fully searchable, open file format that retains the fidelity of the original documents. To use PDF files, you must install Adobe Acrobat Reader from Adobe Systems, Inc. With Acrobat Reader software, you can view, navigate, zoom in on, search, print, and extract information from PDF files. Versions of the Reader for Windows, Macintosh, MS-DOS, and Unix systems are available free of charge from Adobe's web site (http://www.adobe.com/products/acrobat/readstep2_allversions.html). Installation instructions and system requirements are provided at the Adobe web site.
For More Information about NAEP and the Summary Data Tables
For questions about the NAEP Data Tool, contact: