Statistical Standards
Statistical Standards Program
 

DISSEMINATION OF DATA

SUBJECT: MACHINE READABLE PRODUCTS

NCES STANDARD: 7-1

PURPOSE: To ensure the utility of data files created by NCES staff and contractors, all NCES data files must be accompanied by easily accessible documentation that clearly describes the metadata necessary for users to access and manipulate the data.

KEY TERMS: confidentiality, confidentiality edit, edits, imputation, metadata, reference year, response rates, survey system, survey year, universe, and variance.


STANDARD 7-1-1: Machine-readable products must be released in ASCII format. Machine-readable products include flat files, relational databases, and spreadsheets. Each record must contain a unique case identifier such as ID. Files with multiple records per case must also contain unique record type identifiers (e.g., record number, year of data). Data files must be in one of two acceptable formats:

  1. Delimited, text-quoted file format that is importable, or
     
  2. Positional files where the locations of all variables are identified (i.e., file, record within file, and position within record).
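As an illustration of the two acceptable formats, the following minimal sketch writes the same records as a comma-delimited, text-quoted file and as a positional file. The records, variable names (ID, STATE, ENROLL), column positions, and function names are hypothetical and are not taken from any NCES file.

```python
import csv
import io

# Hypothetical records; each carries a unique case identifier (ID).
records = [
    {"ID": "0001", "STATE": "AL", "ENROLL": 1250},
    {"ID": "0002", "STATE": "AK", "ENROLL": 87},
]

# Hypothetical positional layout: (variable, start, end), 1-based and inclusive.
LAYOUT = [("ID", 1, 4), ("STATE", 5, 6), ("ENROLL", 7, 12)]


def to_delimited(rows):
    """Return comma-delimited text in which every text field is quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[name for name, _, _ in LAYOUT],
        quoting=csv.QUOTE_NONNUMERIC,  # quote text values, leave numbers bare
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    text = buf.getvalue()
    text.encode("ascii")  # raises UnicodeEncodeError if the text is not ASCII
    return text


def to_positional(rows):
    """Return fixed-width text in which each variable occupies its stated columns."""
    lines = []
    for row in rows:
        parts = []
        for name, start, end in LAYOUT:
            width = end - start + 1
            value = row[name]
            # Right-justify numbers and left-justify text within the field.
            if isinstance(value, int):
                parts.append(str(value).rjust(width))
            else:
                parts.append(str(value).ljust(width))
        lines.append("".join(parts))
    text = "\n".join(lines) + "\n"
    text.encode("ascii")
    return text


print(to_delimited(records))
print(to_positional(records))
```

In the positional output every line has the same length, so the documented positions are enough to locate any variable.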

    GUIDELINE 7-1-1A: Data producers are invited to provide additional data sets in alternate formats that may be helpful to users. For guidance on Web-based formats, see the NCES public Web publishing standards; request a copy by sending an e-mail to NCESWebmaster@ed.gov.

    GUIDELINE 7-1-1B: To facilitate the sharing and use of data elements, national and international standards organizations have produced drafts of several standards for the creation of metadata on data elements. Examples are the International Organization for Standardization "Specification and Standardization of Data Elements" standard (ISO/IEC 11179) and the more detailed American National Standards Institute "Metadata for the Management of Shareable Data" standard (ANSI X3.285; see www.ansi.org). These standards continue to be refined. Data producers should determine which metadata standards are current at the time data files are prepared and produce associated metadata for their files that comply with the applicable standards.


STANDARD 7-1-2: A file description and record layout must be provided for each file. The file information/metadata header must include the following:

  1. Title of the survey (survey name, part, and year as applicable);
     
  2. Name(s) of each file;
     
  3. Reference year for the data;
     
  4. Version number and date of release;
     
  5. Logical record length (in positional files) or number of variables on the file (delimited files);
     
  6. Number of records per case or observation; and
     
  7. Number of cases in the data file. For delimited files also include the delimiters (e.g., comma, space).
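One way to keep the header items above together is a small structured record with a completeness check. This is only a sketch: the class name, field names, and example values are assumptions made for illustration, not a format prescribed by this standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileHeader:
    """Hypothetical container for the file information/metadata header."""
    survey_title: str            # survey name, part, and year as applicable
    file_names: List[str]
    reference_year: str
    version: str
    release_date: str
    records_per_case: int
    number_of_cases: int
    logical_record_length: Optional[int] = None  # positional files
    number_of_variables: Optional[int] = None    # delimited files
    delimiter: Optional[str] = None              # delimited files

    def missing_items(self) -> List[str]:
        """List header items that are absent for this kind of file."""
        missing = []
        if not self.survey_title:
            missing.append("survey_title")
        if not self.file_names:
            missing.append("file_names")
        if self.logical_record_length is None and self.number_of_variables is None:
            missing.append("logical_record_length or number_of_variables")
        if self.number_of_variables is not None and not self.delimiter:
            missing.append("delimiter")
        return missing


# Example with made-up values: a delimited file whose delimiter was not recorded.
header = FileHeader(
    survey_title="Example Survey, Part A, 2001",
    file_names=["exsurv01.csv"],
    reference_year="2000-01",
    version="1.0",
    release_date="2002-06-01",
    records_per_case=1,
    number_of_cases=2,
    number_of_variables=3,
)
print(header.missing_items())
```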


STANDARD 7-1-3: For each variable on the file, the file description must include the following:

  1. Variable name;
     
  2. Data type (alpha or numeric);
     
  3. Record number (if multiple records per case);
     
  4. Position within the record (beginning-end, or variable number if delimited), field length, and variable label; and
     
  5. The survey question wording and response categories.
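To show how a user might apply such a variable-level description, the sketch below stores the items for each variable and uses them to read one positional record. The class, the layout, the labels, and the question wording are all hypothetical; the record number is omitted because the example assumes one record per case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical record-layout entry for one variable."""
    name: str
    data_type: str  # "alpha" or "numeric"
    start: int      # beginning position, 1-based
    end: int        # ending position, inclusive
    label: str
    question: str   # survey question wording and response categories

    @property
    def length(self) -> int:
        return self.end - self.start + 1


LAYOUT = [
    VariableSpec("ID", "alpha", 1, 4, "Case identifier", "Not applicable"),
    VariableSpec("STATE", "alpha", 5, 6, "State abbreviation", "In which state is the school located?"),
    VariableSpec("ENROLL", "numeric", 7, 12, "Total enrollment", "How many students were enrolled?"),
]


def parse_record(line, layout):
    """Slice one positional record into a dictionary keyed by variable name."""
    values = {}
    for spec in layout:
        raw = line[spec.start - 1:spec.end]
        if spec.data_type == "numeric":
            # A blank numeric field is returned as None rather than as zero.
            values[spec.name] = int(raw) if raw.strip() else None
        else:
            values[spec.name] = raw.rstrip()
    return values


print(parse_record("0001AL  1250", LAYOUT))
```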


STANDARD 7-1-4: Data set naming conventions must be standardized and must conform to Information Systems Security Organization (ISSO) (or more recent) standards for pressing a CD, which currently requires a name with the following format: "xxxxxxxx.xxx".
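A naming check along these lines could be automated. The sketch below assumes that "xxxxxxxx.xxx" means a base name of at most eight characters followed by an extension of at most three, and that only letters, digits, and underscores are allowed; both assumptions, and the function name, are illustrative and should be checked against the naming standard actually in force.

```python
import re

# Assumed interpretation of "xxxxxxxx.xxx": 1-8 characters, a period, 1-3 characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,8}\.[A-Za-z0-9]{1,3}")


def is_valid_dataset_name(name):
    """Return True if the whole name fits the assumed 8.3 pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["ccd2001a.dat", "verylongname.dat", "ccd2001a.data"]:
    print(candidate, is_valid_dataset_name(candidate))
```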


STANDARD 7-1-5: Jewel box covers and Web links or URL links must identify the survey system (e.g., HS&B, CCD), component, survey year, and version number.


STANDARD 7-1-6: All variables must be clearly identified and described.

  1. The description of variables must include the universe for the variable.
     
  2. In the case of composite variables, the description must identify all survey items used to construct the variables and must include the algorithm used to construct the variables.
     
  3. Upper and lower case labels that clearly describe the variables must be used.
     
  4. For all categorical variables, each value must be associated with a label, a frequency, and a percentage of total cases. In public-use and restricted-use file documentation, unweighted frequencies must be included (see Standard 4-2-10 for public-use files without confidentiality edits).
     
  5. For all continuous variables, the distribution of values (e.g., minimum, maximum, mean, and standard deviation) must be provided.
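The summaries called for in items 4 and 5 can be produced with a few lines of code. The following is a minimal sketch using unweighted counts; the function names, the value labels, and the sample data are hypothetical.

```python
import statistics
from collections import Counter


def categorical_summary(values, labels):
    """Unweighted frequency, percentage of total cases, and label for each value."""
    counts = Counter(values)
    total = len(values)
    return [
        {
            "value": value,
            "label": labels.get(value, "UNLABELED"),
            "frequency": counts[value],
            "percent": round(100.0 * counts[value] / total, 1),
        }
        for value in sorted(counts)
    ]


def continuous_summary(values):
    """Minimum, maximum, mean, and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
    }


# Made-up data for illustration only.
print(categorical_summary([1, 1, 2, 1], {1: "Public", 2: "Private"}))
print(continuous_summary([2, 4, 4, 6]))
```

In practice, reserve codes (such as those for legitimate skips or nonresponse) would be excluded before the continuous summary is computed, so that they do not distort the minimum and mean.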

    GUIDELINE 7-1-6A: FIPS Standards should be used where applicable. NCES standard definitions and codes should be used where applicable (see Standard 1-4).

    GUIDELINE 7-1-6B: Variable names should be consistent across surveys within a survey system, within and across years.

    GUIDELINE 7-1-6C: In a printable record layout file, line length should be specified so that it prints correctly without wrapping and without special modification (e.g., 72 characters, 12 point type).


STANDARD 7-1-7: Data file documentation must be complete for all data files. This includes an abstract or summary that cites the methodology report or technical notes associated with the survey and a description of survey methodology that is consistent with the NCES standard for survey system documentation (see Standard 3-4). In general, survey methodology documentation for data files must include the following:

  1. Description of data collection methods;
     
  2. Weighting and imputation procedures;
     
  3. Description of editing, error resolution, and imputation flags;
     
  4. Guidelines for processing the data;
     
  5. The reference year for the data;
     
  6. Unweighted frequency counts and response rates;
     
  7. Information on how to use replicate weights or PSUs and strata for variance estimation; and
     
  8. Procedures for using weights to produce estimates.
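Documentation for items 7 and 8 is often easier to follow with a worked example. The sketch below computes a weighted mean and a replicate-weight variance as a scaled sum of squared deviations of the replicate estimates from the full-sample estimate. The default multiplier of 1/R is an assumption that corresponds to balanced repeated replication without adjustment; the correct multiplier depends on the replication method and must come from the survey's own documentation. Function names and data are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def replicate_variance(values, full_weights, replicate_weights, scale=None):
    """Variance estimate from replicate weights.

    scale is the method-specific multiplier; it defaults to 1/R, which is
    an assumption (BRR-style) and not a rule stated in this standard.
    """
    full_estimate = weighted_mean(values, full_weights)
    replicate_estimates = [weighted_mean(values, w) for w in replicate_weights]
    if scale is None:
        scale = 1.0 / len(replicate_estimates)
    return scale * sum((r - full_estimate) ** 2 for r in replicate_estimates)


# Made-up data: four cases, a full-sample weight, and two replicate weights.
y = [1, 2, 3, 4]
w_full = [1, 1, 1, 1]
w_reps = [[2, 0, 2, 0], [0, 2, 0, 2]]

print(weighted_mean(y, w_full))
print(replicate_variance(y, w_full, w_reps))
```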


STANDARD 7-1-8: The following data element conventions must be used:

  1. Numeric fields must contain only numbers or blanks. Reserve codes for numeric fields should be extreme negative values (e.g., lower than the lowest real value).
     
  2. "0" must represent zeros. Blanks or "-" may not be used to represent 0s.
     
  3. Unique values must be used to distinguish between legitimate skips and nonresponse.
     
  4. Suppression symbols must be removed from numeric fields and stored in associated "flag" fields.
     
  5. Separate record locations must be used for all data items.
     
  6. Imputed data must be flagged in associated "flag" fields. Imputation methods must be identified in the flag. Blanks are not legitimate values for flags.

    GUIDELINE 7-1-8A: When practical, numeric data fields containing continuous variables should be identical in length.
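Several of these conventions lend themselves to an automated check before release. The sketch below tests a numeric field and its associated flag field as they appear in a file. The reserve codes (-8 and -9) and the function names are hypothetical choices made for illustration; a real file would use the codes defined in its own documentation.

```python
import re

# Optional leading/trailing blanks, optional minus sign, digits, optional decimals.
NUMERIC_FIELD = re.compile(r" *-?\d+(\.\d+)? *")

# Hypothetical reserve codes: distinct values for legitimate skip and nonresponse.
LEGITIMATE_SKIP = -8
NONRESPONSE = -9


def check_field(numeric_text, flag_text):
    """Return a list of convention violations for one numeric field and its flag."""
    problems = []
    # An all-blank numeric field is allowed; anything else must be a number.
    if numeric_text.strip(" ") and not NUMERIC_FIELD.fullmatch(numeric_text):
        problems.append("numeric field contains non-numeric characters")
    # Blanks are not legitimate values for flags.
    if not flag_text.strip():
        problems.append("flag is blank")
    return problems


def is_reserve_code(value):
    """True if the value is one of the assumed reserve codes."""
    return value in (LEGITIMATE_SKIP, NONRESPONSE)


print(check_field("   0", "R"))  # zero stored as "0"
print(check_field("   -", "R"))  # "-" used in place of zero
print(check_field("12* ", "R"))  # suppression symbol left in the numeric field
print(check_field("  -9", " "))  # reserve code with a blank flag
```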