Arts Items and Instruments
NAEP assessments include cognitive items (i.e., test questions), which are designed to assess what students know and can do, and non-cognitive items (i.e., background questions). Cognitive items are based on the framework and specifications documents for each assessment subject. Cognitive item types include multiple-choice items, constructed-response items scored dichotomously, and constructed-response items scored polytomously. Non-cognitive items gather background information, such as time spent studying. NAEP subject areas include the arts, civics, economics, foreign language, geography, mathematics, reading, writing, science, U.S. history, and world history.
The item-development steps¹ for each subject area are as follows:
The National Assessment Governing Board provides content frameworks and item specifications in each subject area.
Instrument development committees in each subject area provide guidance to NAEP staff about how the objectives can be measured given the constraints of resources and the feasibility of measurement technology. The committees make recommendations about priorities for the assessment (within the context of the assessment framework) and the types of items to be developed.
Specialists with subject-matter expertise and experience in creating items according to specifications develop and review the assessment questions.
NAEP test development staff and external test specialists review and revise the items and accompanying scoring guides.
Representatives from the state education agencies meet and review all items and background questionnaires that are scheduled to be part of the state assessment.
Editorial and fairness reviews are conducted as required by NCES.
A pilot test is conducted in many of the states and jurisdictions slated to participate in the following year's operational assessment.
Based on the pilot test analyses, items are selected for inclusion in the operational assessment.
Each subject-area instrument development committee approves the selection of items to include in the following year's operational assessment.
Each subject-area instrument is submitted to the National Assessment Governing Board for approval.
After a final review, the booklets are printed.
¹ Item-development steps for the non-cognitive items (background questionnaires) are different.
Each administration of the NAEP assessment requires a new configuration of the student booklets and of how they are distributed to schools. To allow for wide content coverage within the limited testing time for each student, the instrument configuration entails a three-step design process for the subject areas to be assessed:
In the first step, the booklet block designs are created so that no student receives too many items and all students receive interlocking blocks of items in a focused balanced incomplete block (BIB) or partially balanced incomplete block (pBIB) design. The focused BIB or pBIB design improves estimation within a particular subject area (relative to a non-focused BIB), while estimation continues to be optimized for groups rather than individuals.
Second, the spiraling scheme is designed. Spiraling refers to interleaving booklets systematically so that when they are handed out in the specified order, any group of students will receive approximately the target proportions of different types of booklets.
The third step is the bundling design. In 2003, NAEP test developers introduced an enhanced bundling design, referred to as vertical bundling. Vertical bundling offers flexibility in bundle length, reduces the required number of different bundles, decreases booklet wastage, and improves the balance of within-session booklet pairings.
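The first two design steps can be illustrated with a small sketch. The numbers here are assumptions chosen for illustration, not NAEP's actual configuration: 7 item blocks, 3 blocks per booklet, and the base set (0, 1, 3) shifted cyclically to build the booklets. The function names `cyclic_bib`, `pair_counts`, and `spiral` are hypothetical. The sketch checks the defining BIB property (every pair of blocks shares a booklet equally often) and then spirals the booklets with a simple round-robin, which is a simplification of spiraling to target proportions. Bundling is not modeled.

```python
from collections import Counter
from itertools import combinations, cycle, islice


def cyclic_bib(num_blocks=7, base=(0, 1, 3)):
    """Build one booklet per shift by adding the shift to a base set, modulo num_blocks.

    The defaults (7 blocks, base set (0, 1, 3)) are illustrative assumptions.
    """
    return [tuple((b + shift) % num_blocks for b in base)
            for shift in range(num_blocks)]


def pair_counts(booklets):
    """Count how many booklets contain each unordered pair of blocks."""
    counts = Counter()
    for booklet in booklets:
        counts.update(combinations(sorted(booklet), 2))
    return counts


def spiral(booklet_ids, num_students):
    """Interleave booklet types in a fixed rotating order (simple round-robin)."""
    return list(islice(cycle(booklet_ids), num_students))


booklets = cyclic_bib()
counts = pair_counts(booklets)

# Balanced: all 21 possible pairs of the 7 blocks occur, each the same number of times.
is_balanced = len(counts) == 21 and len(set(counts.values())) == 1

# Hand out booklets to a hypothetical session of 30 students.
handout = spiral(range(len(booklets)), 30)
handout_counts = Counter(handout)

if __name__ == "__main__":
    for i, booklet in enumerate(booklets):
        print(f"booklet {i}: blocks {booklet}")
    print("balanced:", is_balanced)
    print("booklets per type in session:", dict(handout_counts))
```

Because the booklets are handed out in a rotating order, any consecutive group of students receives the booklet types in nearly equal proportions, which is the property the spiraling step is meant to secure. In this particular construction each block also appears once in each booklet position, so no block is always seen first or last.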
Note: Until the 1984 assessment, NAEP was administered using matrix sampling and tape recorders; that is, by administering booklets of exercises using paced audio tapes that walked groups of students through the individual assessment exercises in a common booklet. In the 1984 assessment, a balanced incomplete block booklet design, which does not include audio tape pacing, was introduced in place of taped matrix sampling.