The Nation's Report Card — NAEP 2004 Trends in Academic Progress

Dr. Peggy G. Carr: Hello, and welcome to today's StatChat on the NAEP 2004 long-term trend report. I hope you've had time to examine the results. There were many interesting findings, including higher scores by 9- and 13-year-olds than in any previous mathematics assessment year and a smaller white-black achievement gap, at all three ages and both subjects, than in the prior assessment years. I'm interested in hearing what you want to talk about. So, let's get right to your questions...

Sylvia from Englewood, New Jersey asked:
In the announcement about this chat, it was stated that in 1972, 6 percent of students enrolled in public schools were Hispanic and 78 percent were White. In 2003 those numbers shifted to 19 percent and 58 percent, respectively (Condition of Education, 2005). Could you please give the corresponding percentage of African Americans enrolled during the same time span?
Dr. Peggy G. Carr: The numbers for long-term trend (LTT) are a little different from those in the Condition of Education numbers. In 1971, approximately 5 percent of the LTT population were Hispanic and 84 percent were White, while in 2004, the numbers shifted to 16 and 64 percent at age 13. The percentage of African Americans has not changed much over that time. In 1971, Black students made up approximately 14, 15, and 11 percent of the population at ages 9, 13, and 17, respectively. In 2004, these numbers were 17, 15, and 12 percent.

Philip R. Fletcher from Rockville, MD asked:
What role do changing demographics--particularly the influx of people whose native language is not English--play in setting national norms for achievement tests such as those used in NCLB, for example? Should people whose native language is not English be excluded from the norming sample in order to maintain comparability with earlier test results? Or should they be included in the norming sample to reflect current population conditions? What are the implications of norming decisions for trend analysis and for policy decisions in the education sector?
Dr. Peggy G. Carr: All NAEP assessments assess the student population as it is in the year of the assessment. NAEP's scale scores are not normed but linked to past year scores: the scale itself does not change. Therefore, an increase in English language learners can result in a decrease in the overall scale score, which could show in the trend results, but has no bearing on norms. Achievement levels are only used in main NAEP, but to set the achievement levels (Basic, Proficient, and Advanced), NAEP does use a reference sample in combination with expert judgments about the difficulty of the test. These samples are carefully drawn at the beginning of the trend and are used throughout the trend. Therefore, changing populations are held against the same achievement norms and scale. When a new trend line is started, new standards are defined, and past comparisons are no longer valid. Under this scheme, NAEP fulfills the charge to show what elementary and secondary students know and can do and how this changes over time.

Russell Farnen from West Hartford, CT asked:
As these long-term trends in National Assessment show, surrogates/proxies for SES such as parental education, books in the home, reading and TV habits, as well as race and ethnicity, etc. are highly and positively correlated with national assessment results at the individual and group levels. Why is this so, and why does the National Assessment methodology not contain better measures of student SES such as parental income levels themselves? Better yet, what can U.S. education do to bridge this serious income gap in test results since your measures are being used to validate NCLB test results?
Dr. Peggy G. Carr: These are some great questions, and, unfortunately, I'm probably not going to be able to give you the answer you're seeking. None of the NAEP Assessments are designed to answer "why" questions. The LTT assessments identify the level of student performance and changes in that performance, but they do not explain the results of the assessment. While there are correlations, as you mentioned, we strongly caution against making causal statements about the results. Further research is needed to determine why a correlation exists. The long-term trend does not contain a parent question about parental income and research shows that students often do not know their parents' income. While NAEP results are often used in discussions about No Child Left Behind, NAEP doesn't really validate NCLB test results as all of those are done at the individual student level.

Glynis Joseph from St. Louis, MO asked:
Are NAEP scores disaggregated by gender within the various ethnic groups? If so, what trends have been discerned for African American females in science, mathematics and reading from K-12? How is this impacted by the socioeconomic background of the student?
Dr. Peggy G. Carr: NAEP LTT no longer includes a science assessment and provides results in reading and mathematics for ages 9, 13, and 17, not K-12. It is possible to examine performance by race/ethnicity, gender, and SES (using proxies), and the tool for doing so will be released sometime this fall. In the meantime, our researchers ran a quick analysis on the performance of African American females and found that they scored higher, on average, in 2004 than in any previous assessment year at age 9 in both reading and mathematics. At age 17, the average scores in 2004 were higher than in the base year but not significantly different from the average scores in the 1990s. For more information, look for the upcoming NAEP data tool or contact NCES directly.

Peter from Westminster, Maryland asked:
What is the largest percentage point increase in any 12 year period for a particular grade (such as 8th grade or 12th grade) for Mathematics and/or for Reading? The ESEA of 2001 is hopeful that proficiency rates will increase dramatically in 12 years (i.e. - from 2002 to 2014). Perhaps success can be claimed if NAEP data over this time period shows an increase in the percentage of students in the proficient and advanced categories that exceeds that of any other 12 year period in the history of NAEP.
Dr. Peggy G. Carr: The LTT assessment provides scale scores and uses a metric called "performance levels" to describe what students know and can do at certain points on the scale. So, over the 30+ years of the assessment, the largest scale score increase over a 12-year period was 18 points. The average score for Hispanic 9-year-olds increased from 212 to 230 between 1992 and 2004. To look at the percent Proficient, we have to go back to the 2003 main NAEP data. From 1990 to 2003, the percentage of students scoring at or above Proficient in mathematics increased from 12 to 29 percent at grade 4 and from 13 to 23 percent at grade 8. In reading, the percentage of students scoring at or above Proficient increased only 2-3 percentage points between 1992 and 2003 at grades 4 and 8. Stay tuned for the 2005 results, which should be released sometime in October.
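
The gains quoted in this answer are simple differences, and a minimal Python sketch can make the arithmetic explicit. The figures are the ones stated above; the dictionary layout and the `gain` helper are illustrative names, not part of any NAEP tool. Note that the first entry is a gain in scale-score points, while the other two are gains in percentage points, so the numbers are not directly comparable to each other.

```python
# Start and end values as quoted in the answer above.
# The labels and structure here are illustrative assumptions.
changes = {
    "LTT scale score, Hispanic 9-year-olds, 1992 to 2004": (212, 230),
    "Main NAEP math, percent at or above Proficient, grade 4, 1990 to 2003": (12, 29),
    "Main NAEP math, percent at or above Proficient, grade 8, 1990 to 2003": (13, 23),
}


def gain(start, end):
    """Return the simple difference between the end and start values."""
    return end - start


for label, (start, end) in changes.items():
    print(f"{label}: {start} to {end}, a gain of {gain(start, end)}")
```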

Robert from Montclair, Virginia asked:
To what extent may exclusions and special accommodations of test-takers skew the results?
Dr. Peggy G. Carr: In 2004, approximately 7 to 8 percent of test takers were excluded from the sample upon which these results are based. Students are excluded if their schools judge that they cannot be meaningfully assessed, either because of a severe disability, or because they do not speak English. No accommodations were permitted on the current assessment, although a study was conducted in 2004 that allowed us to choose another random sample and allow accommodations for that sample. In the sample in which accommodations were allowed, the exclusion rate dropped to about 5 percent in reading and 3 percent in math. There were few significant differences in the results of the sample in which accommodations were allowed and the sample in which they were not.

Holly from Sacramento, CA asked:
Did this report disaggregate the data based on students in bilingual vs. English immersion programs? If so, what were the differences in performance level?
Dr. Peggy G. Carr: No, NAEP only collects data on whether or not the student is an English language learner and not on what type of program they have been placed in.

Dan from Cambridge, MA asked:
Are the standard deviations of the results (total sample for each sample and grade) available? Without that, it is hard to compare the changes to those on other assessments. Thanks.
Dr. Peggy G. Carr: The standard deviations are not in the report. They range from 35 to 45 for the total sample at each age. However, some caution is in order when comparing this assessment to other assessments as the content can be very different.
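
One common way to compare changes across assessments, which may be what the question has in mind, is to divide a score change by the standard deviation. The sketch below assumes that approach; it is not a method described in the report. The 10-point change is a made-up input for illustration, not a NAEP result, and `standardized_change` is a hypothetical helper name. Only the 35 and 45 endpoints come from the answer above.

```python
def standardized_change(score_change, standard_deviation):
    """Express a score change in standard deviation units."""
    if standard_deviation <= 0:
        raise ValueError("standard_deviation must be positive")
    return score_change / standard_deviation


# Hypothetical change in scale-score points (NOT a reported NAEP figure).
hypothetical_change = 10.0

# Endpoints of the standard deviation range quoted in the answer above.
for sd in (35.0, 45.0):
    units = standardized_change(hypothetical_change, sd)
    print(f"SD = {sd:.0f}: change of {hypothetical_change:.0f} points = {units:.2f} SD units")
```

As the answer cautions, a standardized change still does not make two assessments with different content directly comparable.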

Amber from Tifton, GA asked:
I have noticed that many of my fellow high school peers have trouble learning basic grammatical skills. For example, they can speak correctly, but they have trouble identifying verbs and adverbs in sentences. Has the quality of education really increased over the past few decades?
Dr. Peggy G. Carr: That's a really interesting observation. However, NAEP is not designed to directly answer such questions.

Gail from Cottonwood, Arizona asked:
What trends are occurring in the enrollment numbers and achievement of second language students?
Dr. Peggy G. Carr: The NAEP long-term trend assessment did not collect data on English language learners (ELLs) in previous years. In the 2004 assessment, 9 percent of the sample of 9-year-olds were identified as ELLs, as well as 5 percent of 13- and 17-year-olds. However, their numbers in the sample were not large enough to produce reliable performance data for ELLs. You can find information on the performance of ELLs from the "main" NAEP, which was last conducted in 2003.

Priscilla D. Marcial from Miami, Florida asked:
Ms. Carr: I saw a distribution of NAEP's findings by gender and ethnicity, its comparisons in both Reading and Math, and the gaps. However, I was not able to view these same results from a geographic location, i.e. results per state, can you please advise. Thank You, Priscilla
Dr. Peggy G. Carr: The long-term trend assessment only provides results at the national level, not the state level. However, summary data tables have data broken down by geographic region. These tables are available on our website at

Bernard R. Gifford from Palo Alto, CA asked:
Will the report allow one to compare student performance on the exams across different states?
Dr. Peggy G. Carr: No. It was designed to represent the nation but not to represent each of the 50 states individually.

Jeff from Sacramento, CA asked:
How do you see these improved results in younger students (9 years) carrying over into high school? In other words, will this success (in terms of closing the gap) continue into later years, or is there something inherently divisive in secondary education that leads to less improvement?
Dr. Peggy G. Carr: NAEP is not really designed to answer a question like this. Each time the assessment is given, it is administered to a different group of students. We don't know how they progress after we have assessed them. Of course we all hope that 17-year-olds in the future will show the same significant gains we're seeing with our younger students.

Bill from Harrisburg PA asked:
On page 28 of the Trends Report we find: " differences were found in the United States between male and female students' scores in mathematics, but there were gender gaps in reading in which females scored higher than males in the United States (Lemke et al. 2002). So, although much of the nation's attention has shifted to the performance gaps between different racial/ethnic groups, it is important to continue to examine the trends in the male-female score gap." Have you given consideration to disaggregating the NAEP data by gender by racial/ethnic group? If so why is it not done?
Dr. Peggy G. Carr: In the 2004 long-term trend assessment, as in the previous assessment, the percent of constructed response questions has not changed. The long-term trend assessment differs from the main NAEP assessments in that the items are not released and replaced.

Deborah from Akron, Ohio asked:
It seems like some scores are inching up over time. How does this translate to our (the U.S.) standing in the world community? Are we likely to still be lagging behind other countries?
Dr. Peggy G. Carr: We would not necessarily expect that the results we see in the NAEP long-term trend assessment would be reflected in international assessments because the content and type of test questions vary from one program to another. For the latest international results see the reports for TIMSS and PISA on the NCES website.

Debbie K. from Rockville, MD asked:
Peggy - the answer to Bill from Harrisburg doesn't match his question.
Dr. Peggy G. Carr: We apologize for the confusion. In general, NAEP does provide disaggregated data by gender and racial/ethnic groups. However, the long-term trend sample is relatively small and therefore the data cannot be disaggregated into most interactions (e.g., Hispanic males or White females), since the size of these groups becomes too small to report.

Samuel from Laredo, Texas asked:
Good afternoon. Based on these results can we now say that No Child Left Behind is a success? Thanks for taking my question.
Dr. Peggy G. Carr: This is certainly a viable hypothesis.

Bill from Harrisburg PA asked:
Is there any indication that the increase in the percentage of constructed response questions on the Trends assessment influenced student performance? If so, how?
Dr. Peggy G. Carr: In the 2004 long-term trend assessment, as in the previous assessment, the percent of constructed response questions has not changed. The long-term trend assessment differs from the main NAEP assessments in that the items are not released and replaced.

Erin from Chapel Hill, NC asked:
I am uncomfortable with the language describing the achievement gap as "at an all time low." So people are not misled by this statement, could you please provide specific information regarding the achievement gap numbers, as well as whether the change in the gap is statistically significant, since you report that scores for all populations are rising?
Dr. Peggy G. Carr: All specific information is provided in the report card in the summary data tables posted on the Web. You're right--not all numerical differences are statistically significant. Look for the asterisks to determine which previous assessment years are significantly different from 2004.

Peter from Westminster, Maryland asked:
How can you reconcile these two facts: (1) Both reading and math scores have risen for 9- and 13-year-olds, and (2) Reading and math scores have not risen for 17-year-olds? It seems that if students are entering high school with greater knowledge, then they ought to achieve at higher levels in high school than their counterparts from 30 years ago.
Dr. Peggy G. Carr: We certainly hope that when these 9- and 13-year-olds enter high school they will be better prepared than students in the past. However, NAEP is not designed to track the students we once tested to see how much they learn afterwards. Each time the test is given, it is administered to a different group of students. We don't know how they progress years after we tested them.

Chuck from Springfield, IL asked:
Hi, since these long-term trends are in Math and Reading can we expect more of the same for other time periods for other subjects? Also has anything changed in the assessment giving that could account for differences in scores? Thanks.
Dr. Peggy G. Carr: Chuck, we would hope that improvement in mathematics and reading would be reflected in other subjects. Watch for the upcoming science results, which we are planning to release in the spring of 2006. On your second point, the long-term trend assessment has remained essentially the same over its 30-year lifetime. The demographics of the nation's students have changed, however. For example, Hispanics were about 4 percent of 17-year-old students in the early 1970s, but are closer to 14 percent now.

Joe from Newton, Mass. asked:
If the same exact questions have been asked over a period of 30 years, how can you be sure the knowledge measured is still relevant? Do you constrain the questions to measure understanding of very broad concepts or core principles?
Dr. Peggy G. Carr: The content of the long-term trend assessments has remained essentially constant so that we can continue to report trends. If the questions were changed, we would not be able to do this. In contrast, our main NAEP assessments are updated periodically to reflect changes in the curriculum. The long-term trend reading assessment is largely focused on locating specific information, making inferences, and identifying the main idea of a passage. The long-term trend mathematics assessment focuses on basic computational skills and applications of mathematics to straightforward everyday situations. Most educators would probably agree that these concepts and skills remain relevant today.

Ted from Durango, Colorado asked:
Too bad this good news will get lost in the news competing with Bastille Day. That's a joke. Seriously though, how can we make sure news like this, which on the surface appears good, gets some well-deserved attention?
Dr. Peggy G. Carr: Ted, I agree that bad news often gets more attention than good news. However, in this case, you can be assured that many of the interested public will be aware of these results, as the major news organizations have covered this story.

Paula from Newport News, VA asked:
Can you describe how schools and states have elected to participate in the NAEP and how new guidelines regarding school, district, and state participation in the assessments might impact future findings?
Dr. Peggy G. Carr: For the long-term trend assessments, NAEP selects random samples of schools and students. The states and districts with schools in the sample are then asked to participate in it. Once they agree, schools and then students are asked to participate. The new participant guidelines do not directly apply to long-term trend assessments. In other recent NAEP assessments, these guidelines have already resulted in very high participation rates.

Bill from Harrisburg PA asked:
Table 5-2 shows 19 of 121 Bridge assessment math questions were constructed response. 34 of the 162 Modified assessment questions were CR, an increase from 16% to 21%. Does that represent an increase in the portion of the assessment result that depends on students' ability to answer constructed response type math questions?
Dr. Peggy G. Carr: Yes, this does represent a small increase in the proportion of that item pool made up of constructed response questions. However, this does not necessarily translate directly into a similar increase in the contribution of these types of items to the performance scale. I have time for one more question...
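
The percentages in Bill's question can be checked with a short calculation. The Python sketch below uses the item counts quoted from Table 5-2 in the question; `share_percent` is a hypothetical helper name used only for illustration. It covers item counts only and, as the answer notes, says nothing about how much each item type contributes to the performance scale.

```python
def share_percent(part, total):
    """Return part/total as a percentage rounded to the nearest whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total)


# Item counts as quoted in the question (constructed response items, all items).
bridge = (19, 121)
modified = (34, 162)

print(f"Bridge assessment: {share_percent(*bridge)} percent constructed response")
print(f"Modified assessment: {share_percent(*modified)} percent constructed response")
```

The rounded results agree with the 16 and 21 percent cited in the question.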

Bob from Madison, WI asked:
Today's LTT results provide some optimism for closing the achievement gap, especially in the elementary grades. How should education folks in individual states use the LTT information for school improvement initiatives?
Dr. Peggy G. Carr: One of the most important benefits of the long-term trend assessment is to provide independent verification of school improvements. However, the survey is not designed to provide the kind of information to answer your question directly. In addition, there is considerable contextual data available in the LTT's data set that might be helpful.

Thanks for all the excellent questions. Unfortunately, I could not get to all of them, but please feel free to contact members of the NAEP staff if you need further assistance. Please visit the website to learn more about the upcoming fall release of 2005 national, state and district reports in reading and mathematics from the main NAEP assessment.

National Center for Education Statistics -
U.S. Department of Education