Overview of the NAEP Assessment Design

NAEP Technical Documentation

To meet the goals of measuring trends reliably and responding to changes in the current thinking about subject areas, the National Assessment of Educational Progress (NAEP) has instituted a multi-component assessment system, in which each component is itself a set of assessments designed to accomplish a specific goal. There are six components in the NAEP design:

  • national main assessments (using samples representative of the nation);

  • national long-term trend assessments in reading and mathematics (using samples representative of the nation; writing and science were also assessed through 1999);

  • state assessments (using samples representative of each jurisdiction);

  • trial urban district assessments (TUDAs)—to allow reporting of NAEP results for large urban school districts;

  • special studies (e.g., mathematics online special study, oral reading study); and

  • pre-tests (pilots and field tests).

While NAEP has collected national assessment data since 1969, data specific to the long-term trend assessment booklets have been collected only since 1986. The long-term trend scales connect the data from the long-term trend assessments with the data from the national assessments prior to 1986. Beginning in 1986, the national main assessments were updated to reflect current thinking about what students should know and be able to do in specific subject areas. The main assessment sample data are used not only for analyses involving the current assessment, but also to estimate trends across years for a number of recent assessments. By retaining the two-tiered approach of national main and national long-term trend assessments, NAEP reaffirms its commitment to studying trends while implementing the latest in measurement technology and educational advances.

Some of the assessment materials administered to the main assessment samples are periodically administered to state as well as national samples. Trial state assessment data were collected in volunteer jurisdictions in 1990, 1992, and 1994 with a design intended to affect the national components of NAEP as little as possible. Since that time, state assessments have occurred at specific grades for specific subjects concurrently with the corresponding national main assessments. From 1990 until 2002, the state assessments were administered separately from the national main and national long-term trend assessments. During that time, administrators from the states gave the assessment to students participating in the state assessments, while contractor-hired administrators continued to administer the main assessments. This arrangement required special analyses to link the results of the state and national main assessments. From 2002 forward, national and state data have been collected as a single sample, using the same administration procedures. Currently, contractor-hired administrators give the assessment to students whose responses contribute to both national and state results.

In addition to the collection of data specifically for long-term trend assessments, several other changes in NAEP design occurred in the mid-1980s. Since its inception, NAEP has assessed 9-year-olds, 13-year-olds, and in-school 17-year-olds. In 1984, NAEP began gathering data for samples defined by grade as well as by age, a practice that continued in national main assessments up to 1994. Since 1994, the national main assessments have included data for samples defined by grade only. In early years, NAEP also routinely assessed out-of-school 17-year-olds or young adults. (A separate assessment of young adults ages 21 to 25 was conducted in 1985 under a separate grant.)

It should be noted that somewhat different age definitions were used in the 1984, 1986, and 1988 assessments. In the 1984 assessment, the two younger ages were defined on a calendar-year basis, while the 17-year-olds were defined on an October 1 to September 30 basis. This resulted in modal grades of 4, 8, and 11. To allow for age cohorts that were exactly four years apart, in the 1986 national main assessment all ages were defined on an October 1 to September 30 basis, resulting in modal grades of 3, 7, and 11. Special studies (Kaplan, Beaton, Johnson, and Johnson 1988) were conducted to measure the effect of the changes in age definition. Because of problems encountered in assessing third-graders, in 1988 the ages were redefined on a calendar-year basis, with the modal grades being 4, 8, and 12. These were the age definitions used in the 1990, 1992, and 1994 national main assessments. However, results for 1992 and 1994 were primarily reported in terms of grades, rather than age. From 1996 through 2003, NAEP most often assessed both fourth- and eighth-grade students in the national main and state assessments, and twelfth-grade students in the national main assessments. Currently, NAEP assesses fourth- and eighth-grade students in national main and combined-state-and-national-main assessments, and twelfth-grade students in national main assessments in years when each of these grades is assessed. See specific grades assessed in each year in the table NAEP Grade and Age Cohorts Assessed. Long-term trend samples continue to be defined on the basis of student age (9-year-olds, 13-year-olds, and 17-year-olds).
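The two kinds of age definition described above (calendar-year versus October 1 to September 30) can be compared with a short sketch. The function names, the `basis` labels, and the start year 1974 are hypothetical and chosen only for illustration; the text above does not state which birth years defined any particular cohort.

```python
from datetime import date

def birth_window(basis, start_year):
    """Return the (first, last) birth dates of a 12-month age cohort.

    basis is a hypothetical label: "calendar" for January 1 to December 31,
    "october" for October 1 to September 30 of the following year.
    """
    if basis == "calendar":
        return date(start_year, 1, 1), date(start_year, 12, 31)
    if basis == "october":
        return date(start_year, 10, 1), date(start_year + 1, 9, 30)
    raise ValueError(f"unknown basis: {basis!r}")

def in_cohort(birth_date, basis, start_year):
    """True if birth_date falls inside the cohort's birth window."""
    first, last = birth_window(basis, start_year)
    return first <= birth_date <= last

# Illustrative birth dates only; 1974 is an arbitrary start year.
spring_1974 = date(1974, 3, 1)
spring_1975 = date(1975, 3, 1)

for basis in ("calendar", "october"):
    print(basis,
          in_cohort(spring_1974, basis, 1974),
          in_cohort(spring_1975, basis, 1974))
```

Both windows span twelve months, but the October-based window starts nine months later, so the same student can fall inside one cohort definition and outside the other. A shift of this kind is consistent with the change in modal grades that the text reports.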

Another change in the mid-1980s concerned the way booklets were administered and the way they were formed. Until the 1984 assessment, NAEP was administered using matrix sampling and tape recorders; that is, by administering booklets of items using an aurally presented stimulus that paced groups of students through the individual items in a common booklet. Different booklets with different taped instructions and text were administered in different sessions. The booklets did not share items. In the 1984 assessment, a balanced incomplete block (BIB) design, which does not include aural pacing, was introduced in place of taped matrix sampling. Since that time, either focused BIB or focused partially balanced incomplete block (focused pBIB) designs have been used for all but the long-term trend assessments. Both the BIB and pBIB designs provide for several booklet types containing different blocks of items, so that no student receives too many items. In these designs, items are common to more than one booklet and are assigned using a special design that controls booklet format. The booklet design is called focused because each student receives blocks of cognitive questions in the same subject area. The focused BIB or pBIB design allows for improved estimation within a particular subject area, and estimation continues to be optimized for groups rather than individuals.
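The balance property behind a BIB design can be illustrated with a small sketch. It assumes 7 hypothetical item blocks and 3 blocks per booklet, and builds the booklets by cyclically shifting the base set {0, 1, 3} modulo 7. This is a textbook construction, not NAEP's actual booklet map. The function names are also illustrative.

```python
from collections import Counter
from itertools import combinations

def cyclic_bib(base_block, num_blocks):
    """Build booklets by shifting a base set of block indices modulo num_blocks."""
    return [sorted((b + shift) % num_blocks for b in base_block)
            for shift in range(num_blocks)]

def pair_counts(booklets):
    """Count how many booklets contain each unordered pair of blocks."""
    counts = Counter()
    for booklet in booklets:
        # Booklets are sorted, so each pair is always keyed as (low, high).
        counts.update(combinations(booklet, 2))
    return counts

# Assumption for illustration: 7 blocks, 3 per booklet, base set {0, 1, 3}.
booklets = cyclic_bib(base_block=(0, 1, 3), num_blocks=7)
counts = pair_counts(booklets)
appearances = Counter(block for booklet in booklets for block in booklet)

for number, booklet in enumerate(booklets, start=1):
    print(f"booklet {number}: blocks {booklet}")

print("distinct block pairs covered:", len(counts))
print("booklets per pair:", set(counts.values()))
print("booklets per block:", set(appearances.values()))
```

In this sketch each student would see only 3 of the 7 blocks, yet every block appears in the same number of booklets and every pair of blocks shares a booklet equally often. A partially balanced (pBIB) design relaxes the second condition, so that pairs need not all co-occur the same number of times.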

In a more recent change, intended to shorten the timetable for reporting results, the period for national main assessment data collection was shortened beginning with the 1992 assessment. A five-month period (January through May) was used as the assessment period in 1990 and earlier assessments; a three-month period in the winter (January through March) has been used since 1992. The three-month winter period corresponds to the first half of the assessment period of earlier national main assessments. Until 2000, state assessments occurred during February and the early part of March. Currently, national and state assessments are administered together during the January through March assessment window. Long-term trend assessment data collection periods match those of the early NAEP assessments: age 9 in winter, age 13 in fall, and age 17 in spring.

In 2004, changes were made to revitalize the long-term trend (LTT) instrument, with only reading and mathematics being assessed. During this effort, 1999 assessment data were linked to a Bridge sample using concurrent calibration. Equating was used to link the 2004 Bridge sample results and the 2004 operational results. In the Bridge sample, the 1999 instrument was administered, and in the operational sample, a new long-term trend instrument was administered. The new LTT instrument contains blocks with items assessed in 1999 (although reconfigured from the 1999 design) and blocks with new items. An age sample was assessed in both subjects; therefore, concurrent calibration of the 1999 sample and the 2004 Bridge sample was based on an age sample only (the reading sample was a grade-age sample in 1999).

Other changes to LTT in 2004 included:

  • In the NAEP assessments after 2002, parental education (PARED) will no longer be asked at age 9 or grade 4. However, to strengthen the link between 1999 and 2004, this question will be asked in the Bridge sample instrument.

  • Accommodations are permitted (by regulation) in the operational samples for students identified as SD and/or LEP. However, to fortify the link between 1999 and the Bridge study, no accommodations are permitted in the Bridge sample.

  • The new instrument (i.e., the 2004 operational sample) was administered under the following new conditions, in addition to those mentioned above:

    • a focused BIB,
    • no paced tape,
    • no “I don’t know” option for multiple-choice items, and
    • background questions moved to the back of the booklet.

  • Because Science and Writing are no longer assessed for LTT, blocks in the Bridge instrument that appear after the last Reading or Mathematics item in that booklet do not need to be administered. They are replaced with blocks containing new items from the 2004 instrument to strengthen the link between the 2004 Bridge and the 2004 Operational results. These blocks will not have background questions mixed in. In addition, the paced tape will be turned off at the start of these new blocks.

  • In Mathematics, the calculator block administered previously at all ages will no longer be assessed. In all cases this block appears together with Science and/or Reading blocks only and therefore the associated booklets will be terminated in the Bridge sample. This reduces the number of booklets in the Mathematics Bridge Study for ages 9 and 13 from three to two and for age 17 from two to one.

Further changes occurred in 2005 for the grade 12 mathematics assessment. These included both administrative and framework changes. The two administrative changes in the 2005 math assessment (as compared to 2000) were:

  • In the 2005 assessment, students were allowed to use their own calculators rather than the four-function calculators provided by NAEP; and

  • In 2005, the test booklets were reconfigured to a format of two 25-minute blocks, rather than the three 15-minute blocks used previously, and the two background sections were moved to the end of the booklet.

With regard to the framework, major modifications were made at grade 12 to accommodate changes in the math curricula. Many items from the previous assessment were retained because they met the criteria of both the old and new frameworks.

Last updated 01 March 2011 (JL)