For every subject assessed, the NAEP program reports how well students in various demographic groups performed. (Note that NAEP does not report individual student scores.) For example, results are reported for male students and female students, for students of various racial or ethnic categories, and for students in schools in different regions.
How does NAEP summarize what students in these groups know and can do, in order to be able to compare how the groups performed?
In reading, NAEP creates a scale ranging from 0 to 500 using Item Response Theory (IRT), a set of statistical procedures for summarizing student performance across a collection of test exercises that require similar knowledge and skills. All NAEP subject-area scales are produced using these procedures.
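To make the idea of an IRT model concrete, the following is a minimal sketch of one common IRT form, the three-parameter logistic (3PL) item response function. It is an illustration under stated assumptions, not a description of the exact models NAEP fits: the function name `prob_correct`, the item parameters, and the scaling constant `D = 1.7` are all chosen here for the example. The proficiency value `theta` is on a latent scale, not on the 0 to 500 reporting scale, and the conversion between the two is not shown.

```python
import math


def prob_correct(theta, a, b, c=0.0, D=1.7):
    """Three-parameter logistic (3PL) item response function.

    theta: student proficiency on the latent scale
    a:     item discrimination (how sharply the item separates students)
    b:     item difficulty (location on the latent scale)
    c:     lower asymptote (chance of a correct answer by guessing)
    D:     conventional scaling constant (an assumption of this sketch)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item parameters, for illustration only.
item = {"a": 1.0, "b": 0.0, "c": 0.25}

# The probability of a correct answer rises with proficiency.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(prob_correct(theta, **item), 3))
```

When `theta` equals the difficulty `b`, the function returns `c + (1 - c) / 2`, which is 0.625 for these hypothetical parameters.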
The 2011 reading data are scaled separately for the two types of reading (literary and informational), resulting in two separate subscales at each grade. The composite scale is a weighted combination of these subscales. Although the composite scale has been defined differently since the 2009 framework, special analyses determined that the 2009 and 2011 results could be compared with those from earlier assessment years. IRT information functions are strictly comparable only when the item parameters are estimated together. Because the composite scale is based on two separate estimation runs, there is no direct way to compare the information provided by the questions on the composite scale.
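The weighted combination of subscales can be sketched as follows. The weights and subscale scores below are hypothetical placeholders, not the actual NAEP weights or results, and the helper name `composite_score` is introduced only for this example.

```python
def composite_score(subscale_scores, weights):
    """Return the weighted combination of subscale scores.

    Both arguments are dicts keyed by subscale name.
    The weights are expected to sum to 1.
    """
    if set(subscale_scores) != set(weights):
        raise ValueError("subscales and weights must have the same keys")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * subscale_scores[name] for name in weights)


# Hypothetical values: NOT actual NAEP weights or results.
scores = {"literary": 220.0, "informational": 210.0}
weights = {"literary": 0.5, "informational": 0.5}

composite = composite_score(scores, weights)
print(composite)  # 215.0
```

Because each subscale comes from its own estimation run, the composite here is built from the two subscale scores after scaling, rather than from a single joint model of all questions.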
To give meaning to the levels of the scale, it is useful to create an "item map." An item map is a representation of the skills and abilities demonstrated by students at various levels of the NAEP reading scale. The map indicates which kinds of questions students are likely to answer correctly at each level on the scale. To get a more complete sense of the reading scale, take a look at the reading item maps.
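One way to picture how an item map is built is to place each question at the scale point where a student has a chosen probability of answering it correctly, then list the questions in order. The sketch below assumes a 3PL item response model and a response-probability criterion `rp`; the criterion value, the item descriptions, the item parameters, and the function name `map_location` are all hypothetical, and the locations are on a latent scale rather than the 0 to 500 reporting scale.

```python
import math


def map_location(a, b, c=0.0, rp=0.65, D=1.7):
    """Latent-scale point where P(correct) equals rp under a 3PL model.

    Solves rp = c + (1 - c) / (1 + exp(-D * a * (theta - b))) for theta.
    The criterion rp must lie strictly between c and 1.
    """
    if not c < rp < 1.0:
        raise ValueError("rp must be greater than c and less than 1")
    return b + math.log((rp - c) / (1.0 - rp)) / (D * a)


# Hypothetical items: (description, a, b, c). Not actual NAEP questions.
items = [
    ("Locate a stated detail", 1.2, -1.0, 0.25),
    ("Identify the main idea", 1.0, 0.0, 0.25),
    ("Evaluate the author's argument", 0.8, 1.5, 0.0),
]

# Highest location first, as on a map read from top to bottom.
item_map = sorted(
    ((map_location(a, b, c), description) for description, a, b, c in items),
    reverse=True,
)

for location, description in item_map:
    print(f"{location:6.2f}  {description}")
```

Students whose proficiency is at or above a question's location are likely, in the sense of the chosen criterion, to answer that question correctly.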