Items whose responses do not appear to fit the item response models used to describe the data may receive special treatment. Rarely, such items are deleted from NAEP scales. For instance, dichotomous items with non-monotonic responses, that is, items for which the likelihood of a correct response is lower for students who do well on the other items than for students who do poorly on them, would be deleted from NAEP scales. Items that cannot be modified to fit the item response models and whose empirical and theoretical item response functions differ markedly might also be deleted.
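The non-monotonicity check described above can be illustrated with a small data-based sketch. The function below (a hypothetical helper, not NAEP's operational procedure) flags a dichotomous item when the proportion correct among high scorers on the remaining items is lower than among low scorers; the quantile cutoffs are illustrative assumptions.

```python
import numpy as np

def flag_non_monotonic(responses, low_q=0.27, high_q=0.73):
    """Flag dichotomous items whose proportion correct is lower for
    high-scoring examinees than for low-scoring ones.

    responses: (n_examinees, n_items) array of 0/1 item scores.
    Returns a boolean array, True where an item looks non-monotonic.
    """
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    flags = np.zeros(n_items, dtype=bool)
    for j in range(n_items):
        # Rest score: total on all other items, avoiding part-whole overlap.
        rest = responses.sum(axis=1) - responses[:, j]
        lo_cut, hi_cut = np.quantile(rest, [low_q, high_q])
        p_low = responses[rest <= lo_cut, j].mean()
        p_high = responses[rest >= hi_cut, j].mean()
        flags[j] = p_high < p_low
    return flags
```

For example, an item that low scorers tend to answer correctly while high scorers miss it would be flagged, while items that track overall proficiency would not.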
Information about the way specific items are treated in NAEP scales is carefully tracked. Related items might be combined into cluster items and treated as a single item during scaling. Related items can be identified by test developers when items are written or scored, or by analysis staff when item responses are prepared for scaling or when items are scaled. Combining two or more items into a single item reduces the dependencies among them, making the assumption of local independence more realistic.
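The clustering step can be sketched as summing the scores of the related items into one polytomous cluster item that is then scaled as a single item. This is a minimal illustration with a hypothetical helper name, not NAEP's operational code.

```python
import numpy as np

def cluster_items(responses, item_indices):
    """Combine related dichotomous items into one polytomous cluster item.

    responses: (n_examinees, n_items) array of 0/1 item scores.
    item_indices: columns to merge; their scores are summed, so a
    two-item cluster yields a 0/1/2 polytomous item.
    Returns the response matrix with the merged items replaced by the
    cluster item (appended as the last column).
    """
    responses = np.asarray(responses)
    cluster = responses[:, item_indices].sum(axis=1)
    keep = [j for j in range(responses.shape[1]) if j not in set(item_indices)]
    return np.column_stack([responses[:, keep], cluster])
```

Because the cluster score is the sum of its parts, any dependence between the combined items is absorbed into a single polytomous response rather than violating local independence across separate items.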
Polytomous items whose response categories do not fit the generalized partial credit model, as judged by comparing empirical and theoretical item response functions, may fit the model more closely if some response categories are combined. Therefore, the response data for some polytomous items might be recoded to reflect the combination of response categories. In some instances, the three-parameter logistic model is applied to an item after the score categories have been recoded, to achieve a reasonable fit.
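The two models named above can be written out directly. The sketch below gives the generalized partial credit model's category probabilities, the three-parameter logistic item response function, and a category-recoding helper; the function names and the example recoding map are illustrative assumptions, not NAEP's operational code.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities under the generalized partial credit model.

    theta: proficiency; a: discrimination; b: step parameters, one per
    step between adjacent categories. Returns P(X = k) for k = 0..len(b).
    """
    steps = np.concatenate([[0.0], a * (theta - np.asarray(b))])
    z = np.cumsum(steps)            # cumulative logits, z_0 = 0
    expz = np.exp(z - z.max())      # stabilized softmax
    return expz / expz.sum()

def p3pl(theta, a, b, c):
    """Three-parameter logistic model: probability of a correct response,
    with lower asymptote (guessing parameter) c."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def collapse_categories(scores, mapping):
    """Recode polytomous scores, e.g. {0: 0, 1: 1, 2: 1, 3: 2} merges
    categories 1 and 2 into a single middle category."""
    return np.vectorize(mapping.get)(np.asarray(scores))
```

After a recoding such as `{0: 0, 1: 1, 2: 1, 3: 2}`, the collapsed item has fewer, better-populated categories, which is what allows the empirical and theoretical item response functions to align more closely.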
Finally, when an item administered in more than one assessment year draws responses that differ across those years, it might be scaled separately by assessment year. Such items are most often identified by empirical item response functions for the different assessment years that differ markedly from one another and from the overall theoretical item response function.
| Year | Subject area |
|---|---|
| 2022 | Civics |
| | Mathematics |
| | Reading |
| | U.S. history |
| 2019 | Mathematics |
| | Reading |
| | Science |
| 2018 | Civics |
| | Geography |
| | Technology and engineering literacy (TEL) |
| | U.S. history |
| 2017 | Mathematics |
| | Reading |
| 2016 | Arts |
| 2015 | Mathematics |
| | Reading |
| | Science |
| | Vocabulary |
| 2014 | Civics |
| | Geography |
| | Technology and engineering literacy (TEL) |
| | U.S. history |
| 2013 | Mathematics |
| | Reading |
| 2012 | Economics |
| 2011 | Mathematics |
| | Reading |
| | Reading vocabulary |
| | Science |
| 2010 | Civics |
| | Geography |
| | U.S. history |
| 2009 | Mathematics |
| | Reading |
| | Reading vocabulary |
| | Science |
| 2008 | Arts |
| 2007 | Mathematics |
| | Reading |
| | Writing |
| 2006 | Civics |
| | Economics |
| | U.S. history |
| 2005 | Mathematics |
| | Reading |
| | Science |
| 2003 | Mathematics |
| | Reading |
| 2002 | Reading |
| | Writing |
| 2001 | Geography R3 / R2 |
| | U.S. history R3 / R2 |
| 2000 | Mathematics R3 / R2 |
| | Reading R3 / R2 |
| | Science R3 / R2 |
NOTE: Because preliminary analyses of students' writing performance in the 2017 NAEP writing assessments at grades 4 and 8 revealed potentially confounding factors in measuring performance, results will not be publicly reported. In NAEP, vocabulary, reading vocabulary, and meaning vocabulary refer to the same reporting scale. R2 is the non-accommodated reporting sample; R3 is the accommodated reporting sample. If sampled students are classified as students with disabilities (SD) or English learners (EL), and school officials, using NAEP guidelines, determine that they can meaningfully participate in the NAEP assessment with accommodation, those students are included in the NAEP assessment with accommodation, along with other sampled students, including SD/EL students who do not need accommodations. The R3 sample is more inclusive than the R2 sample and excludes a smaller proportion of sampled students; it is the only reporting sample used in NAEP after 2001. The block naming conventions used in the 2018 civics, geography, and U.S. history assessments are described in the document 2018 Block Naming Conventions in Data Products and TDW.
| Year | Subject area |
|---|---|
| 2022/2023 | Mathematics long-term trend |
| | Reading long-term trend |
| 2020 | Mathematics long-term trend |
| | Reading long-term trend |
| 2012 | Mathematics long-term trend |
| | Reading long-term trend |
| 2008 | Mathematics long-term trend |
| | Reading long-term trend |
| 2004 | Mathematics long-term trend |
| | Reading long-term trend |
NOTE: The block naming conventions used in the 2020 mathematics and reading long-term trend assessments are described in the document 2020 Block Naming Conventions in Data Products and TDW. The block naming conventions used in the 2022/2023 mathematics and reading long-term trend assessments are described in the document 2022/2023 Block Naming Conventions in Data Products and TDW. In 2020 and 2022/2023, age 17 was not assessed.
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), various years, 2004–2023 Mathematics and Reading Long-Term Trend Assessments.