
Statistical Software

Watch a module: Using Software to Analyze NAEP Data.

Analyzing the National Assessment of Educational Progress (NAEP) restricted-use datasets requires special statistical methods due to their scope and complexity. The following analysis software for NAEP restricted-use datasets can assist you in conducting analyses for your education research:

  • EdSurvey R Package,
  • AM Statistical Software, and
  • the NAEP Data Toolkit.

Restricted-Use Data

Restricted-use micro-level data can be obtained on CD-ROM for approved purposes of secondary analysis, provided the user and organization have been granted a license.


EdSurvey R Project

EdSurvey is an R statistical package designed to analyze national and international education data from the National Center for Education Statistics (NCES). EdSurvey version 2.7 supports the following data sources:

  • National Assessment of Educational Progress (NAEP) – up to 2019
  • Trends in International Mathematics and Science Study (TIMSS) and TIMSS Advanced – up to 2019
  • Progress in International Reading Literacy Study (PIRLS) and ePIRLS – up to 2016
  • International Computer and Information Literacy Study (ICILS) – up to 2018
  • International Civic and Citizenship Education Study (ICCS) – up to 2016
  • 1999 Civic Education Study (CivEd)
  • Programme for International Student Assessment (PISA) – up to 2018
  • PISA Young Adult Follow-up Study
  • Programme for the International Assessment of Adult Competencies (PIAAC) – up to Cycle 1, Rounds 1 to 3 (2017)
  • Teaching and Learning International Survey (TALIS) – up to 2018
  • Early Childhood Longitudinal Studies (ECLS-K: 1998, ECLS-K: 2011, ECLS-B)
  • Education Longitudinal Study of 2002 (ELS)
  • High School Longitudinal Study of 2009 (HSLS)
  • Beginning Teacher Longitudinal Study (BTLS)

EdSurvey gives users the ability to process and analyze these data efficiently, taking into account their complex sample survey design and the use of plausible values. The key functions of EdSurvey version 2.7 include:

  • data processing, including downloading publicly available data and reading data in R;
  • data manipulation, such as the subsetting and merging of data, as well as renaming and recoding variables;
  • data exploration, including methods to better understand survey attributes and search for variables and levels in codebooks;
  • summary statistics, including unweighted and weighted totals, conditional means, the percentage of respondents in a category (conditional on an ancillary categorical variable or on the interactions of an arbitrary number of categorical variables), and estimation of scale score means based on plausible values;
  • linear regression with or without plausible values as the dependent variable;
  • logistic regression that allows either a discrete variable or dichotomized plausible values as the dependent variable;
  • multilevel models that use weights at multiple levels and allow plausible values in the dependent variable;
  • direct estimation, an alternative to the plausible values approach that estimates student scale scores using the marginal maximum likelihood regression estimation method;
  • gap analysis that compares the average, percentile, achievement level, or percentage of survey responses between two groups that potentially share members;
  • percentile that calculates the percentiles of a numeric variable or plausible values;
  • analysis of achievement levels and benchmarks for NAEP and international assessment data;
  • correlations, including Pearson, Spearman, polyserial, polychoric, and correlation between plausible values, with or without weights applied;
  • multivariate regression that extends multiple linear regression to include models with multiple outcome variables;
  • quantile regression that fits a quantile regression model that uses weights and variance estimates appropriate for the data; and
  • a NAEP linking error method that incorporates linking error into variance estimation for NAEP assessments during the transition year from paper-based to digitally based assessment.
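
To make the idea of plausible values more concrete, the following is a minimal Python sketch of how a weighted scale score mean might be combined across several plausible values. It is not EdSurvey code and does not reproduce the package's estimation procedures: the function names, the toy data, and the weights are hypothetical, and the sketch computes only the imputation (between-plausible-value) component of the variance. The sampling component, which depends on the survey's replicate weights, is deliberately left out.

```python
from statistics import mean, variance


def weighted_mean(values, weights):
    """Weighted mean of one column of scores."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def combine_plausible_values(pv_columns, weights):
    """Return (point estimate, imputation variance) across plausible values.

    pv_columns: list of M lists, one per plausible value, each of length n.
    weights:    list of n sampling weights (hypothetical here).
    """
    m = len(pv_columns)
    # Analyze each plausible value separately, as if it were the observed score.
    pv_means = [weighted_mean(column, weights) for column in pv_columns]
    # The point estimate is the average of the per-plausible-value estimates.
    estimate = mean(pv_means)
    # Between-plausible-value variance with the usual (1 + 1/M) correction.
    # The sampling variance is NOT included in this sketch.
    imputation_variance = (1 + 1 / m) * variance(pv_means)
    return estimate, imputation_variance


# Hypothetical toy data: 3 students, 2 plausible values each.
pvs = [[200.0, 250.0, 300.0], [210.0, 250.0, 310.0]]
wts = [1.0, 2.0, 1.0]
estimate, imp_var = combine_plausible_values(pvs, wts)
print(estimate, imp_var)
```

In a real analysis the total variance would add a sampling component to the imputation variance shown here, which is one reason purpose-built software is recommended for these datasets.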

AM Statistical Software

AM is a statistical software package for analyzing data from complex samples, especially large-scale assessments such as NAEP and the Trends in International Mathematics and Science Study (TIMSS). AM was developed by the American Institutes for Research (AIR) with funding, in part, from NCES. Learn more about AM software and download it for free from the AM website.


NAEP Data Toolkit

The NAEP Data Toolkit contains data analysis tools for restricted-use data. One of the tools within the toolkit is NAEPEX, a data extraction program for choosing variables, extracting data, and generating SPSS control statements. The other tools are cross-tabulation and regression analysis modules that operate in stand-alone mode and require SPSS system files as input. These modules perform best on small and medium-sized datasets (up to 30,000 cases) and are not recommended for use on the large NAEP data files that have been available since 2002.

To use the NAEP Data Toolkit,

  • request a disc from NCES,
  • make a separate request to obtain a license to access restricted-use data, and
  • read the Toolkit procedures.


Last updated 01 November 2021 (AA)