
Design Goals: NAEP 2002 and Beyond

In preparation for the 2002 assessment, NCES commissioned several authors to propose ways of streamlining and redesigning NAEP so that it would become possible to report results within 6 months of the completion of data collection. Based on the ideas originally developed for these design papers, NAEP implemented several design changes in 2002. These changes are too new to have been documented in NAEP’s technical reports or other publications and are briefly summarized below.

With the expansion of NAEP under the Elementary and Secondary Education Act, NAEP began conducting biennial state-level assessments, administered by contractor staff (not local teachers). The newly redesigned NAEP has four important features. First, NAEP now administers tests for different subjects (such as mathematics, science, and reading) in the same classroom, thereby simplifying and speeding up sampling, administration, and weighting. Second, NAEP conducts pilot tests of candidate items for the next assessment two years in advance and field tests of items for pre-calibration one year in advance of data collection, thereby speeding up the scaling process. Third, NAEP conducts bridge studies, administering tests both under the new and the old conditions, thereby providing the possibility of linking old and new findings. Finally, NAEP is adding more test questions at the upper and lower ends of the difficulty spectrum, thereby increasing NAEP’s power to measure performance gaps.

Testing different subjects in the same classroom

Previously, NAEP tests in different subjects were designed independently. For some subjects, the short questionnaire was given before the test, while for others, it was given afterward. For some subjects, the test was given in three 15-minute blocks; for others, it was given in two 20-, 25-, or 30-minute blocks. As a result, tests for different subjects could not be administered in the same room without instructions for one test interrupting administration of another, and the process of data collection and analysis was unnecessarily complicated.

To solve this administrative problem, NAEP has adopted a standard test structure for all subjects—two 25-minute blocks of test questions, followed by two short blocks of background questions. Common block timings permit assessing different subjects in the same classroom, reduce the number of classrooms required, require fewer students per subject in each school (increasing the precision of the findings), permit simultaneous pretesting of questions that are not yet operational, and simplify the development of sampling weights. In U.S. history, geography, and reading, the only required change is shifting the order of the contextual information and test question blocks; but in mathematics and science, the blocks of test questions must be reconfigured.

Shortened time for report production

Previously, NAEP conducted weighting, scoring, and scaling only after the completion of data collection, and no further reductions in reporting time could be squeezed out of that design. Under the new design, weighting has been sped up by reducing the number of different sets of sample weights. Scoring is now conducted in parallel at distributed scoring centers. To speed scaling, NAEP will pretest questions twice, two years and one year in advance of data collection. The latter pretest, with larger, representative samples, will permit calibration of items prior to operational testing and thereby accelerate scaling. NCES will look for ways to streamline its checking and approval of draft reports. Pretesting questions two years in advance will, however, require a longer lead time for development of tests in each subject.

Bridge studies to ensure comparability

Reconfiguring NAEP’s test questions into blocks of different lengths in mathematics and science, and changing the order of the noncognitive and assessment questions in reading, could change scale parameters, reducing comparability of current NAEP scores with those of past assessments. The solution to this problem is to conduct supplemental NAEP surveys in which the same test questions are administered under both the old and the new designs. The resulting data permit measurement of the impact of the design changes and link past NAEP results with those of the new design.

Capacity to measure gaps in achievement

The accuracy of NAEP scores for a subgroup depends principally on two factors: the size of the subgroup sample and the accuracy of the test in the range in which the subgroup scores. NAEP ensures adequate sample sizes for groups whose achievement gaps it measures by targeting those students for oversampling and, if necessary, by increasing state sample sizes. In addition, NAEP ensures adequate precision in the upper and lower ranges of the NAEP tests by adding more test questions at both ends of the difficulty range.
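The first of these factors reflects the familiar square-root relationship between sample size and sampling error: quadrupling a subgroup's sample roughly halves the standard error of its mean score. A minimal sketch of that relationship follows (the numbers are illustrative only; actual NAEP estimation uses jackknife variance methods and plausible values, which are not modeled here):

```python
import math

def subgroup_standard_error(sd: float, n: int) -> float:
    """Simple-random-sample standard error of a subgroup's mean score.

    sd -- standard deviation of scores within the subgroup (illustrative value)
    n  -- number of sampled students in the subgroup
    """
    return sd / math.sqrt(n)

# Quadrupling the subgroup sample halves the standard error.
se_small = subgroup_standard_error(sd=35.0, n=400)    # 35 / 20  = 1.75
se_large = subgroup_standard_error(sd=35.0, n=1600)   # 35 / 40  = 0.875
```

This is why oversampling targeted subgroups, rather than relying on their natural share of a state sample, is the direct lever for shrinking the uncertainty around measured achievement gaps.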

For more detailed information on each of the above design principles implemented for 2002 under the expansion of NAEP, view Design Principles: 2002 and Beyond (120K PDF File) by Andrew Kolstad.

Last updated 10 December 2009 (JM)