Education Statistics Quarterly
Vol 1, Issue 3, Topic: Methodology
The NAEP 1996 Technical Report
By: Nancy L. Allen, James E. Carlson, and Christine A. Zelenak
 
This article was excerpted from the Introduction to the technical report of the same name. The report describes the design and data analysis procedures of the 1996 National Assessment of Educational Progress (NAEP).
 
 

The 1996 National Assessment of Educational Progress (NAEP) monitored the performance of students in American schools in the subject areas of reading, mathematics, science, and writing. The purpose of this technical report is to provide details on the instrument development, sample design, data collection, and data analysis procedures of the 1996 national assessment. Detailed substantive results are not presented here but can be found in a series of NAEP reports on the status of and trends in student performance; several other reports provide additional information on how the assessment was designed and implemented.

The national sample involved nearly 124,000 public and nonpublic school students who were 9, 13, or 17 years old or in grades 4, 8, or 12. Additional samples of approximately 125,000 fourth- and 125,000 eighth-graders in 48 jurisdictions were assessed in the 1996 state assessment in mathematics. Also, a sample of approximately 125,000 fourth-graders in 47 states and jurisdictions was assessed as part of the 1996 state assessment in science. A representative sample of about 2,500 students was selected in each jurisdiction for each subject at each grade level. The state-level sampling plan allowed for cross-state comparisons and comparisons with the nation in fourth-grade science and fourth- and eighth-grade mathematics achievement. Technical details of the state assessments are not presented in this technical report but can be found in the state technical reports.



For the 1996 assessment, NAEP researchers continued to build on the original design technology outlined in NAEP Reconsidered: A New Design for a New Era (Messick, Beaton, and Lord 1983). In order to maintain its links to the past and still implement innovations in measurement technology, NAEP continued its multistage sampling approach. Long-term trend and main assessment (short-term trend) samples use the same methodology and population definitions as in previous assessments. Main assessment samples use innovations associated with new NAEP technology and address current educational issues. Long-term trend data are used to estimate changes in performance from previous assessments; main assessment sample data are used primarily for analyses involving the current student population, but also to estimate short-term trends for a small number of recent assessments. In continuing to use this two-tiered approach, NAEP reaffirms its commitment to maintaining long-term trends while at the same time implementing the latest in measurement technology.

A major new design feature was introduced for 1996 to permit the introduction of new inclusion rules for students with disabilities (SD) and limited English proficient (LEP) students, and the introduction of testing accommodations for those students. The 1996 national NAEP incorporated a multiple sampling plan that allowed for studies of the effects of these changes in NAEP inclusion and accommodation procedures. Under this sampling plan, students from different samples were administered the NAEP instruments using different sets of inclusion rules and accommodation procedures. In certain samples, testing accommodations were provided for SD and LEP students who could be assessed, but not with standard instruments or administration procedures.

In the 1996 assessment, many of the innovations that were implemented for the first time in 1988 were continued and enhanced. For example, a variant of the focused balanced incomplete block (focused-BIB) booklet design, first used in 1988 and retained in subsequent assessment years, was used in the 1996 main assessment samples in mathematics and science. In the focused-BIB design, an individual receives blocks of cognitive items in the same subject area. The focused-BIB design allows for improved estimation within a particular subject area, and estimation continues to be optimized for groups rather than individuals.
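To make the structure of such a design concrete, the following sketch lays out a small balanced incomplete block arrangement in Python. The seven block labels and the three-blocks-per-booklet layout are illustrative assumptions rather than the actual 1996 configuration; the checks simply confirm the defining property that every pair of blocks appears together in exactly one booklet, which is what allows relationships among items in different blocks to be estimated even though no student takes every block.

```python
# A minimal sketch of a focused balanced incomplete block (BIB) booklet design.
# The block labels and the 7-block/3-per-booklet layout are illustrative only;
# the 1996 NAEP designs used their own (larger) block and booklet configurations.
from itertools import combinations
from collections import Counter

# Seven hypothetical cognitive blocks, all from the same subject ("focused").
blocks = [f"M{i}" for i in range(1, 8)]

# A (7, 3, 1) balanced incomplete block design: 7 booklets of 3 blocks each,
# in which every pair of blocks appears together in exactly one booklet.
booklets = [
    ("M1", "M2", "M3"), ("M1", "M4", "M5"), ("M1", "M6", "M7"),
    ("M2", "M4", "M6"), ("M2", "M5", "M7"), ("M3", "M4", "M7"),
    ("M3", "M5", "M6"),
]

# Each block appears in the same number of booklets ...
block_counts = Counter(b for bk in booklets for b in bk)
assert set(block_counts.values()) == {3}

# ... and every pair of blocks is paired exactly once, which is what lets
# relationships among items in different blocks be estimated even though no
# student sees every block.
pair_counts = Counter(pair for bk in booklets for pair in combinations(sorted(bk), 2))
assert all(n == 1 for n in pair_counts.values())
assert len(pair_counts) == len(list(combinations(blocks, 2)))

print("booklets per block:", dict(block_counts))
```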

In 1996, NAEP continued to apply the plausible values approach to estimating means for demographic as well as curriculum-related subgroups. Proficiency estimates were based on draws from a posterior distribution that optimally weighted two sets of information: students' responses to cognitive items and students' demographic and associated educational process variables. This Bayesian procedure was developed by Mislevy (see chapter 11 of the complete report or Mislevy 1991). The 1996 procedures continued to use an improvement that was first implemented in 1988 and refined for the 1994 assessment: a multivariate procedure that uses information from all scales within a given subject area in the estimation of the proficiency distribution on any one scale in that subject area.
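As a rough, univariate illustration of the plausible values idea (not the operational NAEP procedure), the sketch below forms one student's posterior on a grid by combining an item response theory likelihood for that student's answers with a normal prior whose mean is predicted from background variables, and then draws plausible values from it. The item parameters, regression weights, and responses are invented; the operational procedure is the multivariate latent regression described in chapter 11 of the complete report and in Mislevy (1991).

```python
# A toy, univariate sketch of the plausible values idea: a student's posterior
# for proficiency theta combines (a) the IRT likelihood of that student's item
# responses and (b) a normal prior whose mean is predicted from background
# (conditioning) variables.  The item parameters, regression weights, and
# responses below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
theta_grid = np.linspace(-4, 4, 401)

# Hypothetical 2PL item parameters (discrimination a, difficulty b) and
# one student's scored responses (1 = correct, 0 = incorrect).
a = np.array([1.0, 0.8, 1.3, 0.9])
b = np.array([-0.5, 0.2, 0.7, 1.1])
x = np.array([1, 1, 0, 0])

def likelihood(theta):
    """2PL likelihood of the response vector at each grid point."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return np.prod(np.where(x == 1, p, 1.0 - p), axis=1)

# Conditioning step (illustrative): prior mean from background variables.
background = np.array([1.0, 0.0, 1.0])       # e.g. dummy-coded demographics
gamma = np.array([0.3, -0.2, 0.1])           # hypothetical regression weights
prior_mean, prior_sd = background @ gamma, 0.9
prior = np.exp(-0.5 * ((theta_grid - prior_mean) / prior_sd) ** 2)

# Posterior on the grid, then five plausible values drawn from it.
posterior = likelihood(theta_grid) * prior
posterior /= posterior.sum()
plausible_values = rng.choice(theta_grid, size=5, p=posterior)
print(plausible_values)
```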

A major improvement used in the 1992 and 1994 assessments, and continued in 1996, was the use of the generalized partial credit model for item response theory (IRT) scaling. This allowed the incorporation of constructed-response questions that are scored on a multipoint rating scale into the NAEP scale in a way that utilizes the information available in each response category.
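The sketch below shows how the generalized partial credit model assigns a probability to each score level of a multipoint constructed-response item as a function of proficiency. The discrimination and step parameters are invented for illustration and are not NAEP estimates.

```python
# A minimal sketch of the generalized partial credit model (GPCM) used for
# scaling multipoint constructed-response items.  The discrimination and step
# parameters below are invented for illustration; NAEP estimates these from
# the assessment data.
import numpy as np

def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) for one GPCM item.

    theta : float, proficiency
    a     : float, item discrimination
    steps : sequence of m step (threshold) parameters b_1..b_m
    """
    # Cumulative sums of a*(theta - b_v), with the score-0 term fixed at 0.
    z = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(steps)))))
    ez = np.exp(z - z.max())          # subtract the max for numerical stability
    return ez / ez.sum()

# A hypothetical 4-category (0-3 point) constructed-response item.
probs = gpcm_probs(theta=0.5, a=1.1, steps=[-0.8, 0.1, 1.2])
print(probs, probs.sum())             # probabilities over the four score levels
```

Because each score level retains its own probability curve, a student's partial credit contributes information to the scale rather than being collapsed to right or wrong.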

One important innovation in reporting the 1990 assessment data that was continued through 1996 was the use of simultaneous comparison procedures in carrying out significance tests for the differences across assessment years. Methods such as the Bonferroni procedure control the type I error rate for a fixed number of comparisons. In 1996, more powerful new procedures that control the false discovery rate were implemented for some comparisons. Tests for linear and quadratic trends were also applied to the national trend data in reading, mathematics, science, and writing.
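The difference between the two strategies can be seen in a short sketch with invented p-values: a Bonferroni correction compares each p-value with the significance level divided by the number of comparisons, while the Benjamini-Hochberg step-up procedure, a standard method for controlling the false discovery rate, typically rejects more hypotheses at the same nominal level. The sketch is a generic example and is not a description of the specific procedures NAEP applied in 1996.

```python
# A hedged sketch contrasting the two multiple-comparison strategies mentioned
# above: a Bonferroni familywise correction and the Benjamini-Hochberg
# false discovery rate (FDR) step-up procedure.  The p-values are invented;
# NAEP's operational procedures are described in the technical report.
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the familywise error rate)."""
    p = np.asarray(p_values)
    return p <= alpha / p.size

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up FDR procedure: reject the k smallest p-values, where k is the
    largest i with p_(i) <= (i / m) * alpha."""
    p = np.asarray(p_values)
    order = np.argsort(p)
    m = p.size
    thresholds = (np.arange(1, m + 1) / m) * alpha
    passed = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[: passed[-1] + 1]] = True
    return reject

p_vals = [0.001, 0.008, 0.020, 0.041, 0.090, 0.320]
print("Bonferroni rejections:", bonferroni(p_vals))   # rejects only the smallest
print("FDR rejections:       ", benjamini_hochberg(p_vals))  # rejects the three smallest
```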



Part I of this report describes the design of the 1996 National Assessment, beginning with a summary. Individual chapters then present in more detail the development of the objectives and the items used in the assessment, the sample selection procedures, the assessment booklets and questionnaires, the administration of the assessment in the field, the processing of the data from the assessment instruments into computer-readable form, the professional scoring of constructed-response items, and the methods used to create a complete NAEP database.

The 1996 NAEP data analysis procedures are described in Part II of the report. Following a summary of the analysis steps, individual chapters provide a general discussion of the weighting and variance estimation procedures used in NAEP, an overview of NAEP scaling methodology, and details of the trend and main assessment analyses performed for each subject area in the 1996 assessment. Basic data from the 1996 assessment, including the properties of the measuring instruments and characteristics of the sample, are also presented.
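As a generic illustration of the replicate-weight approach that underlies NAEP variance estimation, the sketch below computes a jackknife standard error for a weighted mean. The scores, full-sample weights, and the way replicate groups are formed are invented; operational NAEP replicate weights are built from the actual sampling strata and primary sampling unit pairings described in the report.

```python
# A minimal sketch of variance estimation with jackknife replicate weights.
# The scores, full-sample weights, and replicate groups here are invented for
# illustration; operational NAEP replicate weights reflect the real design.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(250, 35, size=200)          # hypothetical student scores
weights = rng.uniform(0.5, 2.0, size=200)       # hypothetical full-sample weights

def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)

# Form hypothetical replicate weights by dropping one random group of students
# at a time and rescaling the remaining weights (a drop-one-group jackknife).
n_reps = 20
groups = rng.integers(0, n_reps, size=scores.size)
replicate_weights = []
for r in range(n_reps):
    w_r = weights.copy()
    w_r[groups == r] = 0.0
    w_r *= weights.sum() / w_r.sum()            # rescale to the full-sample total
    replicate_weights.append(w_r)

full_estimate = weighted_mean(scores, weights)
rep_estimates = np.array([weighted_mean(scores, w) for w in replicate_weights])

# Jackknife variance: squared deviations of replicate estimates from the
# full-sample estimate (the multiplier depends on the replication scheme;
# this uses the common drop-one-group factor (n_reps - 1) / n_reps).
jk_variance = (n_reps - 1) / n_reps * np.sum((rep_estimates - full_estimate) ** 2)
print(full_estimate, np.sqrt(jk_variance))
```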



Messick, S.J., Beaton, A.E., and Lord, F.M. (1983). NAEP Reconsidered: A New Design for a New Era (NAEP Report 83-1). Princeton, NJ: Educational Testing Service.

Mislevy, R.J. (1991). Randomization-Based Inference About Latent Variables From Complex Samples. Psychometrika, 56: 177-196.

For technical information, see the complete report:

Allen, N.L., Carlson, J.E., and Zelenak, C.A. (1999). The NAEP 1996 Technical Report (NCES 1999-452).

Author affiliations: N.L. Allen, J.E. Carlson, and C.A. Zelenak, Educational Testing Service.

For questions about content, contact Arnold Goldstein (arnold.goldstein@ed.gov).

To obtain the complete report (NCES 1999-452), call the toll-free ED Pubs number (877-433-7827), visit the NCES Web Site (http://nces.ed.gov), or contact GPO (202-512-1800).

