NAEP 1996 Trends in Writing: Fluency and Writing Conventions

April 1999

Authors: Nada Ballator, Marisa Farnum, and Bruce Kaplan

PDF Download the complete report in a PDF file for viewing and printing. 2,088K

Introduction: The NAEP Long-Term Trend Writing Assessment

Note: The Introduction section of NAEP 1996 Trends in Writing refers to content located in other sections of the report. The full report is provided as a PDF file and may be viewed using the link above. To view or print any of the Appendices, Tables 1.1 - 1.3 (Chapter 1), or Tables 2.1 - 2.3 (Chapter 2), use the PDF file link.

The NAEP long-term trend writing assessment provides an important picture of students' progress over time because it compares performance on the same writing tasks, administered in identical fashion to comparable samples of students and yielding comparable scores. There have been six national assessments of writing conducted during the school years ending in 1984, 1988, 1990, 1992, 1994, and 1996. The 1996 assessment included the same set of 12 writing tasks that had been administered in the five previous assessments. Each of these trend assessments was administered to nationally representative samples of students in grades 4, 8, and 11.

Over the past three decades, many teacher educators and classroom teachers have been emphasizing the writing process. The writing process approach focuses on the iterative nature of writing, in which writers plan, write, and revise their ideas in several drafts before a final version is produced. It is during the revision or editing stages of this process that writers focus on correcting grammatical and mechanical errors. Grammatical and mechanical correctness is not viewed as an end in and of itself, but eliminating these errors is an important part of improving the final draft. This report focuses on what changes, if any, have occurred in student writing between 1984 and 1996, the period examined by the NAEP long-term trend writing assessment.

This Report

Results of the 1996 long-term trend writing assessment are reported in two publications. This report describes two aspects of writing for which change has been measured since 1984: writing fluency, as determined by holistic scoring; and mastery of the conventions of written English (spelling, punctuation, grammar) as determined by mechanics scoring. This report is supplementary to NAEP 1996 Trends in Academic Progress,[1] the main report for the NAEP long-term trend assessment. That document reports trends in writing scores since 1984 as determined by primary trait scoring. This report presents the results of the holistic scoring of a subgroup of four of the 12 writing tasks, and the mechanics scoring of two of these four tasks.

The report is organized as follows: Chapter 1 compares student performance on writing tasks in 1984 and 1996 as measured by holistic scoring. Chapter 2 compares students' mastery of the conventions of writing (grammar, punctuation, and spelling) in 1984 and 1996. The brief Summary offers conclusions, and is followed by three appendices. Appendix A contains information about sample sizes and scoring procedures, and Appendix B contains the guides for holistic and mechanics scoring. Appendix C provides the standard errors for the data in the tables contained in the body of the report.

The NAEP long-term trend writing assessments discussed here and in Trends should not be confused with the main NAEP writing assessments. The long-term trend writing assessment was begun in 1984, and has presented students with the same writing tasks in the five ensuing assessments. These writing tasks are completely different from the prompts in the main NAEP assessment.[2] The use of different writing prompts, as well as other procedural differences, precludes direct comparison of the results of the long-term trend assessment discussed here with those of the main assessments.

Multiple Tasks and Multiple Measures of Writing

In order to assess students' abilities to write in a variety of formats and genres, the NAEP long-term trend writing assessment asks them to respond to several different tasks in each of three types of writing:

  • informative tasks ask students to write descriptions, reports, and analyses;
  • persuasive tasks ask students to write convincing letters and arguments; and
  • narrative tasks ask students to write stories.

The NAEP long-term trend instrument consists of 12 distinct writing tasks; however, each student who participated in the assessment responded to only a few (usually two) of the 12 tasks. These tasks are assessed using three types of measures:

  • primary trait scoring, as described in NAEP 1996 Trends in Academic Progress, measures success in accomplishing the specific task, e.g., writing persuasively;
  • holistic scoring, reported here, measures fluency in a subgroup of four of the 12 tasks; and
  • mechanics scoring, also reported here, measures conventions of written English using a subgroup of two of the four holistically scored tasks.

Primary trait scoring is based on established criteria that reflect the success of the student in accomplishing the specific writing task; for primary trait scoring, a unique scoring guide was used for each of the tasks. Student responses to all 12 writing tasks received primary trait scoring as reported in the principal 1996 long-term trend report, NAEP 1996 Trends in Academic Progress.

However, other aspects of writing are also important to assess. For instance, general writing quality or fluency -- the student's capacity to organize and develop a written piece, to use correct syntax, and to observe the conventions of standard written English -- is important. These aspects of written communication, taken together, are what holistic evaluation of writing addresses.[3]

Scoring for the long-term trend writing assessment consisted of three distinct parts: primary trait, holistic, and mechanics scoring. First, all 12 of the long-term trend writing tasks were scored using primary trait scoring criteria. The results are reported in NAEP 1996 Trends in Academic Progress in Chapters 7 and 8 (pages 151-197).[4]

Next, a subgroup of four of these tasks was scored holistically -- two tasks at each grade level. Two of the writing tasks were administered at grade 4 only, while the two other tasks were both administered at grades 8 and 11. One of the four is an informative task, one is a narrative task, and two are persuasive tasks. A brief description of each writing task and the grades at which the task was administered are provided in Figure I.1 below. Holistic scoring of these tasks yielded information about students' level of writing fluency, as seen in Tables 1.1 - 1.3. Different scoring guides were used for holistic scoring of narrative, informative, and persuasive tasks, as described in Appendix B.

Figure I.1:  Task by type of writing and summary of writing tasks scored for fluency (H) and for mechanics (M)

Lastly, to gain information about students' mastery of the conventions of written English, a subgroup of two of the holistic tasks was scored for mechanics -- one at each grade level (see the figure above and Tables 2.1 - 2.3). The mechanics scoring involved assessing students' use of standard English sentence structure, rules of agreement, word choice, spelling, and punctuation. It also captured information about the overall length of the students' responses and the number and complexity of the sentences that they used. For mechanics scoring, the same criteria were used to evaluate all tasks. See Appendix B of this report for the mechanics and holistic scoring guides.

Measuring the Fluency of Writing

Holistic scoring is the most commonly used method for evaluating students' writing performance in the United States today. Holistic scoring for NAEP focuses on the writer's fluency in responding to a task relative to the performance of other students at that grade level.[5] Fluency reflects a writer's facility with language both in terms of the development and organization of ideas and in the use of syntax, diction, and grammar. Holistic scoring methods were specifically designed to assess writing fluency. The underlying assumption of holistic scoring is that the whole piece of writing is greater than the sum of its parts. In holistic scoring, readers do not make separate judgments about specific aspects of a written response, but rather consider the overall effect, rating each paper on the basis of its general fluency.

In the NAEP long-term trend assessment, responses to four tasks are scored holistically, two tasks at each of the three grades (the same two tasks are administered at both eighth and eleventh grades). The characteristics of general fluency are assessed on a six-point scale, and described in the holistic scoring guides for narrative, informative, and persuasive writing tasks in Appendix B. In order to make comparisons of students' writing fluency across all six years of the assessment, all papers from the previous years were scored holistically, along with all of the 1996 papers. For each year, approximately 1200 papers[6] from each grade are scored.

As is typical with all holistic scorings, raters are trained on a particular task immediately before scoring the papers written in response to that task (as described in Appendix A). For each task, the papers from all years are randomly mixed and then assigned one of six scores. To detect changes in fluency from one assessment to another, the percentages of papers from each year within a given score category are compared. The comparisons reported here are for the first or base year and the current year, as in previous reports.[7]

Thus, while primary trait scoring is based on specific constant criteria and so permits year-to-year and grade-to-grade comparisons, holistic scoring allows within-grade comparisons of relative fluency over all years according to contemporaneous criteria.
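The comparison of score-category percentages described above can be sketched in a few lines of Python. This is only an illustration, not the NAEP procedure: the function name score_distribution and the scores are invented, and the sketch ignores the sampling weights that a national sample would require.

```python
from collections import Counter

SCALE = range(1, 7)  # the six-point holistic scale


def score_distribution(scores):
    """Return the percentage of papers at each score point (1-6)."""
    if not scores:
        raise ValueError("need at least one scored paper")
    counts = Counter(scores)
    # Counter returns 0 for score points that no paper received.
    return {point: 100.0 * counts[point] / len(scores) for point in SCALE}


# Hypothetical holistic scores, invented for illustration only.
papers_by_year = {
    1984: [1, 2, 3, 3, 4, 4, 5, 3],
    1996: [2, 3, 3, 4, 4, 4, 5, 6],
}

for year, scores in papers_by_year.items():
    dist = score_distribution(scores)
    print(year, {point: round(pct, 1) for point, pct in dist.items()})
```

Placing the two distributions side by side shows whether the share of papers in a given score category rose or fell between the base year and the current year. Whether such a change is meaningful depends on the standard errors.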

Measuring the Mechanics of Writing

Another set of analyses, applied to papers written for two of the tasks (see Figure I.1 above), focused on the mechanics of students' writing. While error counts do not fully reflect a writer's fluency and competency, many educators, policy makers, and parents are interested in the kinds of surface errors students make as they write.[8] These analyses examined students' mastery of the sentence-level and word-level conventions of English, as well as their use of correct spelling and punctuation. (See Appendix A for procedures used in scoring, and Appendix B for the mechanics scoring guide.) In order to examine changes in students' success in using the conventions of written English, one task at each grade was selected for a detailed analysis of writing mechanics, including spelling, word choice, punctuation, and syntactic errors.

Expressing the Differences in Performance

Because the analysis is conducted using papers written by students who are part of a sample (rather than from the entire population of fourth, eighth, or eleventh graders in the nation), the numbers reported are necessarily estimates. As such, they are subject to a measure of uncertainty. This measure of uncertainty is reflected in the standard error of the estimate, which can be seen in Appendix C, in tables paralleling those in the main body of the report. In comparing student performances on a particular characteristic by either number or percentage, it is essential to take into account the standard error, rather than to rely solely on observed similarities or differences. The comparisons discussed in this report and marked with asterisks in the tables are based on statistical tests that consider both the magnitude of the difference between the averages and the standard errors of those statistics.

The statistical tests determine whether the evidence -- based on data from the two years -- is strong enough to conclude that there is an actual difference. If the evidence is strong (i.e., the difference is statistically significant), statements comparing 1996 with 1984 use terms such as higher, lower, increased, or decreased. The reader is cautioned to rely on the results of the statistical tests, as expressed in the text or as indicated in the tables, rather than on the apparent magnitude of the differences.[9]
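A rough sketch of such a test follows. It is not the report's actual computation. It assumes that the two years' samples are independent, so the standard error of the difference is the square root of the sum of the squared standard errors. It also assumes a two-sided normal test at the 5 percent level (critical value 1.96) with no multiple-comparison adjustment. The function name and all numbers are hypothetical.

```python
import math


def compare_estimates(est_base, se_base, est_current, se_current, critical=1.96):
    """Return (difference, standard error of difference, significant?).

    Assumes independent samples and a normal approximation; the
    default critical value of 1.96 is an unadjusted two-sided 5% test.
    """
    diff = est_current - est_base
    se_diff = math.sqrt(se_base ** 2 + se_current ** 2)
    return diff, se_diff, abs(diff) > critical * se_diff


# Hypothetical estimates: the same 10-point difference, with small
# and then large standard errors.
print(compare_estimates(50.0, 3.0, 60.0, 4.0))
print(compare_estimates(50.0, 6.0, 60.0, 8.0))
```

The two calls have the same apparent difference, but only the first clears the threshold. This is why the apparent magnitude of a difference is not a reliable guide.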

The statistical tests employed here used Bonferroni procedures to form confidence intervals for the differences for sets of comparisons. Bonferroni procedures are appropriate for sets or "families" of comparisons, allowing adjustments according to family size to keep the certainty or significance level as specified (that is, a 95 percent certainty or 5 percent significance level). For comparisons in this report, several family sizes were used. Consider, for example, Table 2.1, which presents overall averages in 1984 papers compared with those in 1996 papers. For these across-year comparisons, the family size is 1, and consequently no adjustment is needed. Table 2.1 also presents across-year comparisons for papers in the lower and upper halves of the holistic scale; in this case, each half contributes one comparison, so a Bonferroni adjustment is made for a family size of 2. Further information on statistical tests and adjustment procedures is in the NAEP 1996 Technical Report.
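The effect of family size on the threshold can be illustrated with a minimal sketch. It assumes a two-sided test under a normal approximation, with the 5 percent significance level divided evenly across the comparisons in a family. The function name is hypothetical, and the NAEP 1996 Technical Report remains the authority on the procedures actually used.

```python
from statistics import NormalDist


def bonferroni_critical_z(family_size, alpha=0.05):
    """Two-sided critical z value after dividing alpha by the family size."""
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    adjusted_alpha = alpha / family_size
    # Half of the adjusted alpha goes in each tail of the distribution.
    return NormalDist().inv_cdf(1 - adjusted_alpha / 2)


for size in (1, 2):
    print(size, round(bonferroni_critical_z(size), 3))
```

With a family size of 1 the critical value is the familiar 1.96. With a family size of 2 it rises to about 2.24, so each individual comparison must show a larger difference relative to its standard error to be declared significant.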

  1. Campbell, J. R., Voelkl, K. E., & Donahue, P. L. (1997). NAEP 1996 trends in academic progress: Achievement of U.S. students in science, 1969 to 1996; mathematics, 1973 to 1996; reading, 1971 to 1996; and writing, 1984 to 1996 (Publication No. NCES 97-985). Washington, DC: National Center for Education Statistics. This report is frequently referred to as Trends in this report. It is available on the Web at

  2. The NAEP long-term trend assessments have been administered in mathematics, science, reading, and writing, to national samples of students. Eighth graders are assessed in the fall, fourth graders in the winter, and eleventh graders in the spring, and the test booklets remain the same over all assessments. In contrast, the main NAEP 1992 Writing Assessment was conducted in the first quarter of 1992 at grades 4, 8, and 12, and the main NAEP 1998 Writing Assessment (based on a new framework) was conducted at grades 4, 8, and 12 in the first quarter of 1998. The 1998 main writing assessment was also administered to students in participating states at grade 8.

  3. It should be noted that holistic evaluation depends in part on aspects of writing measured in mechanics scores; Table A.3 and associated text discuss this relationship.

  4. Previous years of the Trends report also contain results from holistic and mechanics scoring of the tasks presented here. The 1994 Trends is also on the Web, as is the 1996 edition.

  5. Cooper, C. R. (1977). Holistic evaluation of writing. In C. R. Cooper & L. Odell (Eds.), Evaluating writing: Describing, measuring, judging. Urbana, IL: National Council of Teachers of English.

  6. For the first or base year of the assessment (1984), the number of papers was about half the quantity of that in ensuing years.

  7. For instance, see Campbell, J., Reese, C., O'Sullivan, C., & Dossey, J. (1996). NAEP 1994 trends in academic progress: Achievement of U.S. students in science, 1969 to 1994; mathematics, 1973 to 1994; reading, 1971 to 1994; and writing, 1984 to 1994 (Publication No. NCES 97-095). Washington, DC: National Center for Education Statistics.

  8. Shaughnessy, M. P. (1977). Errors and expectations: A guide for the teacher of basic writing. New York, NY: Oxford University Press.

  9. Standard errors measure the uncertainty that another sample drawn from the same population could have yielded somewhat different results.

NCES 1999-456

Suggested Citation
U.S. Department of Education. Office of Educational Research and Improvement. National Center for Education Statistics. NAEP 1996 Trends in Writing: Fluency and Writing Conventions, NCES 1999-456, by N. Ballator, M. Farnum, and B. Kaplan. Washington, DC: 1999.

Last updated 14 March 2001 (RH)
