Authors: Nancy L. Allen, John R. Donoghue, and Terry L. Schoeps
The 1998 National Assessment of Educational Progress (NAEP) monitored the performance of students in United States schools in the subject areas of reading, writing, and civics. The national main sample involved public- and nonpublic-school students who were in grades 4, 8, or 12. State assessments were also conducted at grades 4 and 8 in reading and at grade 8 in writing. Nearly 448,000 students were assessed in the national and state samples. Although a special study was done comparing 1998 civics results with those for 1988, no NAEP long-term trend (LTT) assessments of reading, writing, math, or science national samples were conducted in 1998.
For previous assessments in which there were both national (main and/or long-term trend) and state components, separate technical reports were produced for the national assessment and each state component (subject area). For 1998, this publication contains technical information about both the state and national components. Information common to both national and state components is presented in the first two parts, while later chapters contain detailed information for each subject area and for the national and state components.
The purpose of this technical report is to provide details on the instrument development, sample design, data collection, and data analysis procedures for the 1998 assessment. This document provides the information necessary to show adherence to the Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999) and to the Educational Testing Service (ETS) Standards for Quality and Fairness (ETS, 1987).
Detailed substantive results are not presented here but can be found in a series of NAEP reports covering the status of and trends in student performance; several additional reports provide information on how the assessment was designed and implemented. The reader is directed to the following reports for 1998 results:
The Report Card publications highlight results for the nation, states, and selected subgroups. The frameworks for the 1998 assessment content areas are in:
Other technical information is in:
The NAEP 1998 Reading Data Companion, the NAEP 1998 Writing Data Companion, and the NAEP 1998 Civics Data Companion (all Rogers, Kokolis, Stoeckel, & Kline, 2000) provide the information needed to analyze the 1998 NAEP results, and The NAEP Guide: A Description of the Content and Methods of the 1997 and 1998 Assessments (Calderone, King, & Horkay, 1997) describes the content and methods used in both the main and state components of the 1998 assessments.
Many of the NAEP reports, including summary data tables, are available on the Internet at http://nces.ed.gov/nationsreportcard. For information about ordering printed copies of these reports, go to the Department of Education web page http://edpubs.ed.gov, call toll free 1-877-4ED PUBS (877-433-7827), or write to: Education Publications Center (ED Pubs)
The Frameworks are descriptions and plans for subject-area assessment content. For ordering information on these reports, write to: National Assessment Governing Board
The Frameworks and other NAGB documents are also available through the Internet at http://www.nagb.org.
NAEP strives to maintain its links to the past while implementing innovations in measurement technology. To that end, long-term trend samples use the same methodology and population definitions as previous assessments, while main assessment samples incorporate innovations associated with new NAEP technology and address current educational issues. Both long-term trend and main assessment samples are nationally representative. The main assessment sample data are used primarily for analyses involving the current student population, but also to estimate short-term trends across a small number of recent assessments. Some of the assessment materials administered to the main assessment samples are periodically administered to state as well as national samples. In continuing this two-tiered approach, NAEP reaffirms its commitment to studying trends while implementing the latest in measurement technology and educational advances.
In succeeding assessments, many of the innovations that were implemented for the first time in 1988 were continued and enhanced. For example, a focused balanced incomplete block (focused BIB) booklet design was used in 1988; since then, either focused BIB or focused partially balanced incomplete block (focused PBIB) designs have been used. Variants of the focused PBIB design were used with the 1998 main national and state assessment samples in reading and writing, and a focused BIB design was used in the 1998 main national civics assessment. Both the BIB and PBIB designs provide for booklets of interlocking blocks of items, so that no student receives too many items, but all students receive groups of items that are also presented to other students. The booklet design is focused because each student receives blocks of cognitive questions in the same subject area. The focused BIB or PBIB design allows for improved estimation within a particular subject area, and estimation continues to be optimized for groups rather than individuals.
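The balance property of a BIB design can be illustrated with a minimal sketch. The example below spirals seven item blocks into seven booklets of three blocks each, a (7, 3, 1)-design in which every pair of blocks is administered together exactly once; the block labels and booklet layout are purely illustrative and are not NAEP's actual 1998 design.

```python
# Illustrative balanced incomplete block (BIB) booklet design:
# 7 item blocks spiraled into 7 booklets of 3 blocks each, so that
# every pair of blocks appears together in exactly one booklet.
from itertools import combinations

BLOCKS = list(range(7))  # seven blocks of cognitive items in one subject area

# Each booklet presents three of the seven blocks; no student sees them all.
BOOKLETS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]

def pair_counts(booklets):
    """Count how often each pair of blocks appears together in a booklet."""
    counts = {pair: 0 for pair in combinations(BLOCKS, 2)}
    for booklet in booklets:
        for pair in combinations(sorted(booklet), 2):
            counts[pair] += 1
    return counts

if __name__ == "__main__":
    counts = pair_counts(BOOKLETS)
    # Balance: every pair of blocks is administered together exactly once,
    # so relationships among items in different blocks can still be estimated.
    print(all(c == 1 for c in counts.values()))  # True
```

Because every pair of blocks co-occurs in some booklet, covariances among items in different blocks can be estimated even though no student answers every item, which is what makes group-level estimation possible.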
Since 1984, NAEP has applied the plausible values approach to estimating means for demographic as well as curriculum-related subgroups. Scale score estimates are drawn from a posterior distribution based on an optimal weighting of two sets of information: the student's responses to cognitive questions, and his or her demographic and associated educational process variables. This Bayesian procedure was developed by Mislevy (1991). An improvement first implemented in 1988 and refined for the 1994 assessment continues to be used: a multivariate procedure that uses information from all scales within a given subject area in the estimation of the scale score distribution on any one scale in that subject area.
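The general idea can be sketched with a toy grid-based example: a normal prior (standing in for the conditioning model based on background variables) is combined with the likelihood of a student's item responses, and several "plausible values" are drawn from the resulting posterior. This is a minimal illustration of the concept only, not NAEP's operational models or software; every function name and parameter value below is invented.

```python
# Toy sketch of the plausible-values idea: posterior = prior x likelihood,
# evaluated on a grid of scale scores, with several random draws taken
# from the posterior. Not NAEP's operational procedure.
import math
import random

def irt_likelihood(theta, responses, difficulties):
    """Likelihood of a right/wrong response vector under a simple
    one-parameter logistic item response model."""
    lik = 1.0
    for x, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        lik *= p if x == 1 else (1.0 - p)
    return lik

def draw_plausible_values(responses, difficulties, prior_mean, prior_sd,
                          n_draws=5, rng=None):
    rng = rng or random.Random(12345)
    grid = [i / 50.0 for i in range(-200, 201)]  # theta grid over [-4, 4]
    # Posterior is proportional to the prior (representing information from
    # background variables) times the response likelihood.
    post = [math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)
            * irt_likelihood(t, responses, difficulties) for t in grid]
    total = sum(post)
    weights = [p / total for p in post]
    return [rng.choices(grid, weights=weights)[0] for _ in range(n_draws)]

if __name__ == "__main__":
    pvs = draw_plausible_values(responses=[1, 1, 0, 1],
                                difficulties=[-0.5, 0.0, 0.5, 1.0],
                                prior_mean=0.2, prior_sd=1.0)
    print(len(pvs))  # 5
```

Drawing several values per student, rather than a single point estimate, is what allows subgroup statistics to reflect the uncertainty in each student's latent scale score.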
To shorten the timetable for reporting results, the period for national main assessment data collection in 1992, 1994, 1996, and 1998 was reduced from the five-month period (January through May) used in 1990 and earlier assessments to a three-month winter period (January through March), corresponding to the period used for the winter half-sample of the 1990 national main assessment.
A major improvement introduced in the 1992 assessment, and continued in succeeding assessments, was the use of the generalized partial-credit model for item response theory (IRT) scaling. This allowed the incorporation of constructed-response questions that are scored on a multipoint rating scale into the NAEP scale in a way that utilizes the information available in each response category.
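To make concrete how a multipoint rating scale enters the scaling, the sketch below computes category probabilities for a single polytomous item under the generalized partial credit model; the discrimination and step parameter values are invented for illustration.

```python
# Hedged sketch of generalized partial credit model (GPCM) category
# probabilities for one item scored in categories 0..m.
import math

def gpcm_probs(theta, a, b_steps):
    """P(X = k | theta) for k = 0..m under the GPCM.

    a: item discrimination; b_steps: step parameters b_1..b_m.
    The log-numerator for category k is sum_{v<=k} a*(theta - b_v),
    with the empty sum for k = 0 taken as 0.
    """
    logits = [0.0]
    for b in b_steps:
        logits.append(logits[-1] + a * (theta - b))
    denom = sum(math.exp(z) for z in logits)
    return [math.exp(z) / denom for z in logits]

if __name__ == "__main__":
    # A four-category item (scores 0-3) with illustrative parameters.
    probs = gpcm_probs(theta=0.5, a=1.2, b_steps=[-0.8, 0.1, 0.9])
    print(round(sum(probs), 6))  # 1.0
```

Because each response category has its own probability curve, a partially correct response contributes information about the student's location on the scale rather than being collapsed to right/wrong.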
One important innovation in reporting the assessment data that has been continued since 1990 is the use of simultaneous comparison procedures in carrying out significance tests for differences across assessment years. Methods such as the Bonferroni procedure control the type I error rate for a fixed number of comparisons. Beginning with the 1996 assessment, a more powerful procedure that controls the false discovery rate (FDR), as proposed by Benjamini and Hochberg (1995), was used for comparisons involving a large number of groups (e.g., state comparisons). In 1998 the FDR procedure was used for all comparisons in NAEP. While the Bonferroni procedure controls the probability of making even one false rejection, the FDR procedure used in NAEP controls the expected proportion of falsely rejected hypotheses. The Bonferroni procedure is more conservative than the Benjamini–Hochberg procedure for large families of comparisons.
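The contrast between the two procedures can be sketched on a small family of p-values; the p-values below are illustrative, not NAEP results.

```python
# Sketch comparing Bonferroni with the Benjamini-Hochberg false discovery
# rate (FDR) procedure on an illustrative vector of p-values.

def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the familywise
    probability of even one false rejection)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the hypotheses with the k smallest p-values, where k is the
    largest rank with p_(k) <= (k/m) * alpha (controls the expected
    proportion of false rejections among all rejections)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

if __name__ == "__main__":
    ps = [0.001, 0.008, 0.012, 0.030, 0.200]
    print(sum(bonferroni(ps)))          # 2
    print(sum(benjamini_hochberg(ps)))  # 4
```

On this family, Bonferroni's fixed threshold of alpha/m rejects only the two smallest p-values, while the FDR procedure's rank-dependent thresholds reject four, illustrating why the FDR approach is the more powerful choice for large families of comparisons such as state-by-state contrasts.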
This report begins with the details of the design of the 1998 main and state assessments, summarized in Chapter 1. Chapters 2 through 8 provide an overview of the objectives and frameworks for items used in the assessment, the sample selection procedures, the administration of the assessment in the field, the processing of the data from the assessment instruments into computer-readable form, the professional scoring of constructed-response items, and the methods used to create a complete NAEP database.
The 1998 NAEP data analysis procedures are described in Chapters 9 through 13. Chapter 9 provides a summary of the analysis steps. Subsequent chapters provide a general discussion of the weighting and variance estimation procedures used in NAEP, an overview of NAEP scaling methodology, and information about the conventions used in significance testing and reporting NAEP results.
Details of the reading assessment data analysis are provided in Chapters 14 through 17. These chapters describe assessment frameworks and instruments, student samples, items, booklets, scoring, DIF analysis, weights, and item analyses of the main and state assessments. Similar details are provided for the writing assessment (Chapters 18 through 21) and the civics assessment (Chapters 22 through 24).
The appendices provide detailed information on a variety of procedural and statistical topics. Appendices I and J explain how achievement levels for the subject areas were set by the National Assessment Governing Board (NAGB). The last appendix (Appendix K) provides lists of committee members who contributed to the development of objectives and items.