What is the difference between restricted-use data and public-use data files?

Several modifications are made to the data on the public-use files to reduce the likelihood that any respondent could be identified.

  • Outlier data (i.e., unusual or rare responses) are top- or bottom-coded on the public-use files. For example, so few kindergarten teachers lacked at least a bachelor's degree that these teachers are grouped in the same category as teachers who have a bachelor's degree. Top- and bottom-coding prevent identification of schools, teachers, parents, or children who have unique characteristics without affecting overall data quality. Outlier data appear in their original form on the restricted files.
  • Certain variables that have too few cases with valid data, or that have a sparse distribution, are suppressed in the public-use files (i.e., no data are reported for those variables) but are available in the restricted-use files.
  • Certain continuous variables are transformed into categorical variables, and certain categorical variables have their categories collapsed in the public-use files. This categorization and collapsing reduce disclosure risk while still providing data with enough variability to support many kinds of analyses, such as regression analysis. Data that are modified in this way on the public files appear in their original form on the restricted files.
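To make the three modifications above concrete, here is a minimal sketch of top/bottom-coding, categorizing a continuous variable, and collapsing sparse categories. It is illustrative only: the function names, the data, and the cutoffs (a floor of 1 and a ceiling of 30, bins of 0-4, 5-14, and 15+) are hypothetical assumptions, not the actual procedures or thresholds NCES used. The real recodes for each variable are documented in the user's manuals.

```python
from bisect import bisect_right

def top_bottom_code(values, lower, upper):
    """Bottom-code values below `lower` and top-code values above `upper`."""
    return [min(max(v, lower), upper) for v in values]

def categorize(values, cutpoints, labels):
    """Turn a continuous variable into ordered categories.

    `cutpoints` must be sorted ascending and `labels` must have one more
    entry than `cutpoints`; a value equal to a cutpoint goes to the higher bin.
    """
    if len(labels) != len(cutpoints) + 1:
        raise ValueError("labels must have len(cutpoints) + 1 entries")
    return [labels[bisect_right(cutpoints, v)] for v in values]

def collapse(values, mapping):
    """Merge sparse categories into a broader one; unmapped values pass through."""
    return [mapping.get(v, v) for v in values]

# Hypothetical data and cutoffs, for illustration only.
years_teaching = [0, 3, 12, 28, 45]
print(top_bottom_code(years_teaching, lower=1, upper=30))  # [1, 3, 12, 28, 30]
print(categorize(years_teaching, [5, 15], ["0-4", "5-14", "15+"]))
# ['0-4', '0-4', '5-14', '15+', '15+']

# Mirrors the bachelor's-degree example: the rare lower category is merged upward.
degree = ["Less than bachelor's", "Bachelor's", "Master's"]
merge_low = {"Less than bachelor's": "Bachelor's or less",
             "Bachelor's": "Bachelor's or less"}
print(collapse(degree, merge_low))
# ["Bachelor's or less", "Bachelor's or less", "Master's"]
```

Every original value still maps to some released value, so distributions remain usable for analysis. Only the extreme or rare values lose detail.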

Additionally, ECLS-K restricted-use files are cross-sectional; most ECLS-K public-use files are longitudinal.

How will the difference in public-use files and restricted-use files impact analysts?

For most users, the public-use files provide all the data needed for their analyses. Both the public- and restricted-use files provide data at the individual child level; for the kindergarten round, data are also provided at the teacher and school levels. Overall, few variables have been suppressed on the public files. (Information about which variables have been suppressed can be found in the data file user's manuals. In addition, every case of a suppressed variable has a value of -2 in the data file.)
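Because every case of a suppressed variable carries the value -2, an analyst can flag suppressed variables in an extract by checking for columns that contain nothing else. The sketch below assumes the extract has already been read into a list of dicts (one per child). The variable names and the function name are hypothetical, and the user's manuals remain the authoritative list of suppressed variables.

```python
SUPPRESSED = -2  # value carried by every case of a suppressed public-use variable

def suppressed_variables(records):
    """Return the names of variables whose values are all the suppression code.

    `records` is a list of dicts (one per child) mapping variable name to value.
    """
    if not records:
        return []
    return sorted(
        name for name in records[0]
        if all(row.get(name) == SUPPRESSED for row in records)
    )

# Hypothetical variable names, for illustration only.
rows = [
    {"CHILDID": "0001", "VAR_A": 3, "VAR_B": -2},
    {"CHILDID": "0002", "VAR_A": -2, "VAR_B": -2},
]
print(suppressed_variables(rows))  # ['VAR_B']
```

Requiring all values to equal -2, rather than any single value, keeps a variable such as `VAR_A` from being flagged merely because one case happens to hold -2.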

Some users may find that only the restricted files have the specific data they need. For example, researchers examining groups of children whose representation in the population is relatively small, such as children in special education or children who speak a specific non-English language at home, or researchers interested in the types of kindergarten programs offered in schools, will find that the restricted files have more variables related to their topics than the public files do. In many cases, however, even when the detailed information on the restricted-use files is of interest, the sample sizes are too small for detailed analyses. Before requesting restricted-use data, NCES recommends examining the public-use files to determine whether they meet the researcher's needs.
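One quick way to act on that recommendation is to count the unweighted cases in each subgroup of interest in the public-use data before deciding that restricted detail would be usable. The sketch below is a hypothetical illustration: the variable name `HOME_LANG`, the group labels, and the default minimum of 30 cases are assumptions chosen for the example, not an NCES guideline.

```python
from collections import Counter

def small_subgroups(records, group_var, min_n=30):
    """Return {group: unweighted n} for groups with fewer than `min_n` cases.

    `min_n` is an analyst-chosen threshold (hypothetical default of 30).
    """
    counts = Counter(row[group_var] for row in records)
    return {group: n for group, n in counts.items() if n < min_n}

# Hypothetical extract: 40 children in group "A" and 5 in group "B".
rows = [{"HOME_LANG": "A"}] * 40 + [{"HOME_LANG": "B"}] * 5
print(small_subgroups(rows, "HOME_LANG"))  # {'B': 5}
```

A subgroup that is already very small in the public-use data is unlikely to support detailed breakdowns even with the finer categories available on the restricted files.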

The modifications used to reduce the likelihood that any respondent could be identified in the data do not affect the overall data quality.

Will you be collecting and releasing any more data?

At this time, NCES does not plan to collect any more data from the students in the ECLS-K cohort or their families. The final round of data was released in the longitudinal kindergarten through eighth-grade data file.

NCES is continuing its program of longitudinal studies of young children with the Early Childhood Longitudinal Study, Kindergarten Class of 2010-2011 (ECLS-K:2011). For more information on the ECLS-K:2011, please visit