
Using Library Data Files

NCES provides data files from many of the surveys it conducts. These data files are available for downloading from the NCES website, through the Electronic Catalog.

These pages give you information about finding and using the data files.

Why download data files?

Advantages of Using Data Files

Researchers use data files to perform customized data analysis not available in the web tools and publications. For example, publications and web tools may not make available an analysis using the particular variables the researcher needs.

The "Find Public Libraries, Branches, Bookmobiles" tool does not allow users to export or download data. The "Compare Libraries" tools (available for both public and academic libraries) do provide export or download capabilities.

In the web tools and publications, ratios (e.g., per capita) are calculated by NCES. These calculations may use a different formula than the researcher requires. A researcher who downloads the data files and works with them directly controls the calculations and aggregations.

Data files contain some data fields not used in the web tools, and therefore not included in a file downloaded or exported from the web tools. These include:

  • All imputation flags. (See below for a discussion of Imputation.)
  • Unduplicated population. (See below for a discussion of population fields.)

Disadvantages of Using Data Files

Users must download the entire data file, then sort records as desired and delete records not wanted. On the other hand, the web tools allow users to download just those records they may be interested in.

Fields containing calculated values for the web tools aren't included in the data files. However, we have provided information on how these fields are calculated below, so the user can perform his/her own calculations.


Documentation files

What's in them?

  • Information on survey design, imputation, and data suppression
  • Record layouts
  • Appendices containing the survey questionnaire and data definitions

The record layouts (usually in one or more Appendices) give you:

  • Variable name - This is the field name (also called the data element name).
  • Field length and start position - These are needed to import data from the ASCII-format file into other software applications, so the user will know where each field starts in the record and how long it is.
  • Data type - This is the type of data (numeric, text, date, etc.) in the field.
  • Description - Gives the definition or more detail about the field or variable.
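
The start position and field length from a record layout can be used to split each line of an ASCII-format file into fields. The sketch below is only an illustration: the layout (names, positions, and lengths) and the sample record are made up for this example, and the real values must be taken from the documentation file for the survey year being used. It assumes start positions are 1-based, as record layouts usually list them.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str    # variable name from the record layout
    start: int   # start position, assumed to be 1-based
    length: int  # field length in characters


# Hypothetical layout for illustration only; it is not the actual
# record layout of any NCES data file.
LAYOUT = [
    Field("STABR", 1, 2),
    Field("LIBNAME", 3, 20),
    Field("ZIP", 23, 5),
    Field("POPU_LSA", 28, 8),
]


def parse_record(line: str, layout: list[Field]) -> dict[str, str]:
    """Slice one fixed-width record into a dict of stripped text values."""
    record = {}
    for field in layout:
        begin = field.start - 1  # convert 1-based position to 0-based index
        record[field.name] = line[begin:begin + field.length].strip()
    return record


# A made-up 35-character record that matches the hypothetical layout.
sample = "MA" + "EXAMPLE PUBLIC LIB".ljust(20) + "02139" + "105162".rjust(8)
row = parse_record(sample, LAYOUT)
print(row)
```

Every value is kept as text at this stage, so codes such as ZIP codes keep their leading zeros. Numeric fields can be converted afterward, according to the data type given in the record layout.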


File formats

Since different applications require files in different formats, data files are often available in several formats.

Files are often available in MS Access, ASCII, or SAS format (not all data files are available in SAS format). A number of older files are available only on diskette or magnetic tape.

The NCES Electronic Catalog data page for each data file will give you specific information on what format(s) the data file is available in, and how to obtain the data file.

Many data files are Zipped using WinZIP, for faster downloads.

Most documentation files are in the Adobe .PDF format.

Using MS Access format files

Data files that are in MS Access format (with .MDB extension) can be used directly in Microsoft's Access database application, and any application that can import or read MS Access database files.

Using ASCII format files

ASCII-format files (with .TXT extension) can be viewed and edited using any text editor (MS WordPad, TextPad, etc.), and imported into many software applications:

  • most word-processing applications (MS Word, WordPerfect)
  • most spreadsheet applications (MS Excel, Lotus 1-2-3)
  • most database applications (MS Access, dBase, PARADOX)
  • most statistical applications (SPSS, SAS)

Some tips for using ASCII-format files:

  • Use "text" type for IDs, ZIP codes, phone numbers, and other codes, so that leading 0 characters aren't dropped. This is especially important when importing into MS Excel, which treats any field made up of digits as a number. ZIP codes that start with 0, such as New England ZIP codes, will then be incorrect.
  • Many import functions allow you to skip fields you don't need.
  • Many software applications have a limit on the number of fields. In most cases this limit is 256 fields (library data has fewer than 256 fields). But this could be a problem if the software application you are using has a field limit that is smaller than 256; data in fields beyond the limit will probably not be imported.
  • Numeric fields may contain negative numbers, such as "-1", "-2", or "-3". These indicate that the value is "null" (or "blank"), and may mean that the data are missing, not applicable, or suppressed to protect confidentiality. See the data file documentation for more information. Check for fields that contain negative numbers before using averages, sums, or other calculations or aggregations of data.
  • Newer versions of many software applications (such as MS Excel) allow users to open the data files directly, and the file-import process starts automatically. You may need to tell the application what type of file to look for; when in doubt, choose All files (*.*). For older versions, please refer to the documentation for the software application for details.
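
Two of the tips above (keeping codes as text, and checking for negative "null" codes before aggregating) can be sketched in a few lines. This is a minimal illustration, not an NCES procedure. The records and the field name TOTSTAFF are made up for the example, and the sketch assumes that no numeric field in the file legitimately holds a negative value. The data file documentation states what each negative code means for a given survey year.

```python
from statistics import mean
from typing import Optional


def to_number(text: str) -> Optional[float]:
    """Convert a numeric field to float; blanks and negative codes become None."""
    text = text.strip()
    if not text:
        return None
    value = float(text)
    # Assumption: any negative value is a "null" code, not real data.
    return None if value < 0 else value


# Illustrative records only. ZIP is kept as a string so the leading 0 survives.
records = [
    {"ZIP": "02139", "TOTSTAFF": "12.50"},
    {"ZIP": "03301", "TOTSTAFF": "-1"},
    {"ZIP": "20500", "TOTSTAFF": "7.50"},
]

staff = [to_number(r["TOTSTAFF"]) for r in records]
reported = [v for v in staff if v is not None]

# Average over reported values only; the "-1" record is left out.
average_staff = mean(reported)
print(average_staff)
```

Leaving the "-1" in the calculation would pull the average down, which is why the check comes before any sum or average.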


Restricted- and Public-Use Files

Two types of data files are available: restricted-use data files and public-use data files.

Restricted-use data files contain all data as they were collected, edited, corrected, and imputed for non-response.

Restricted-use data files contain individually identifiable information, which is confidential and protected by law. The terms restricted-use data and "subject data" are synonymous.

The Education Sciences Reform Act of 2002 requires NCES to follow special procedures to protect the privacy of individual respondents.

From the "Restricted-Use Data Procedures Manual":

The goal is to maximize the use of statistical information, while protecting individually identifiable information from disclosure. The Restricted-Use Data Procedures Manual was created to provide a guide to the restricted-use data application process, as well as to explain the laws and regulations governing these data.

Researchers requiring access to the restricted-use data must obtain a license from NCES to use the data on loan. To obtain a license, the following information is necessary:

  1. The title of the database(s) the organization wants to access;
  2. A description of the statistical research project necessitating access to the restricted-use database;
  3. The name and title of the senior official having authority to bind the organization to the provisions of the license agreement;
  4. The name and title of the principal project officer(s) who will oversee the daily operations;
  5. The names, titles, and telephone numbers of the professional/technical and support staff who will have access to the data;
  6. The estimated loan period (not to exceed five years) for accessing the data; and
  7. The desired computer media format.

Click here for more information on how to obtain a Restricted-use license.

Public-use data files are the same as restricted-use data files, but they have had some data removed to protect the confidentiality of individually identifiable survey respondents.

Public-use data files are publicly available without restriction, and do not require a license. Survey data are coded or aggregated without individually identifiable information. Data that could be directly identified with one individual (salaries and wages for librarians for a library with one librarian, for example) are removed.

The library web tools use the public-use data files; that is, some of the data used by the tools have been removed as described above.


Differences between public- and restricted-use files in the Public Libraries Survey

From the PLS documentation titled "Data File, Public-Use: Public Libraries Survey: Fiscal Year 2001" PDF File (1,036 KB):

Public-use data. On the public-use Public Library Data File, selected expenditures data (i.e., Salaries, Benefits, Total Staff Expenditures, and Other Operating Expenditures) for public libraries have been removed (i.e., the field is blank) when total full-time equivalent (FTE) staff is less than or equal to 2.00, to protect the confidentiality of respondents. These data may also be suppressed for other libraries to ensure that all states that have suppressed data have a minimum of 3 suppressed records. The library's Total Operating Expenditures are not affected by the suppression of these data. No data are suppressed on the public-use State Summary/State Characteristics Data File or the Public Library Outlet Data File.

Restricted-use data. No data are suppressed on the restricted-use Public Library Data File. The inclusion of all expenditures data irrespective of the number of employees enables the identification of individual salary data at some libraries.


Data Notes

Calculated fields in the web tools

Several fields used by the web tools are calculated from other fields in the data files. These calculated fields are not included in the data files downloaded directly from the NCES website. Calculated fields include per-capita and per-1,000-enrolled values, percent-of-total values, etc.



Calculation of Enrollments for Academic Libraries

Enrollment figures are calculated from the Integrated Postsecondary Education Data System (IPEDS) Fall Enrollment Survey data for each postsecondary institution having a record in the Academic Libraries Survey data. Fall enrollment data for 1999-2000 were used with the Fiscal Year 2000 Academic Libraries Survey data. (The 1999-2000 fall enrollment data are not currently available for download; for IPEDS data availability, see Integrated Postsecondary Education Data System (IPEDS), Data Files.)

Total full-time equivalent enrollment is used to calculate several "per-person-enrolled" values, such as Total library expenditures per person enrolled (FTE) and Total library staff per 1,000 enrolled.

"Full-time equivalent" enrollment is calculated as the full-time enrollment plus one-third the part-time enrollment. See Calculated Data Fields for more information.

Enrollment figures are calculated for undergraduate and post-baccalaureate enrollments and the total of the two. For purposes of the "Compare Academic Libraries" tool only, "Post-baccalaureate" means enrollment in any program for which a baccalaureate degree is required for admission, including graduate studies (M.A. and Ph.D. programs), professional school studies (e.g., M.D. or J.D. programs), and post-baccalaureate certificate studies.
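
The full-time equivalent formula described above (full-time enrollment plus one-third of part-time enrollment) and the per-person-enrolled values built on it can be sketched as follows. The function names and the sample figures are made up for illustration. The sketch does no rounding, and any rounding applied by the web tools is not covered here.

```python
def fte_enrollment(full_time: int, part_time: int) -> float:
    """Full-time enrollment plus one-third of part-time enrollment."""
    return full_time + part_time / 3


def per_enrolled(value: float, fte: float, per: int = 1) -> float:
    """Ratio of a library value to FTE enrollment, scaled by `per` (e.g., 1000)."""
    if fte <= 0:
        raise ValueError("FTE enrollment must be positive")
    return value * per / fte


# Illustrative figures only, not taken from any survey record.
fte = fte_enrollment(full_time=9000, part_time=3000)
expenditures_per_person = per_enrolled(2_500_000, fte)
staff_per_1000 = per_enrolled(85, fte, per=1000)

print(fte, expenditures_per_person, staff_per_1000)
```

With these sample figures, 9,000 full-time and 3,000 part-time students give an FTE enrollment of 10,000, so $2,500,000 in expenditures is $250 per person enrolled and 85 staff is 8.5 staff per 1,000 enrolled.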


Population fields - Differences between "Population of Legal Service Area" and "Unduplicated Population" for Public Libraries

From the PLS documentation titled "Data File, Public-Use: Public Libraries Survey: Fiscal Year 2001" PDF File (1,036 KB):

Survey Population Items

The PLS has three population items: (1) Population of Legal Service Area (reported for each public library by the state library agency), (2) Total Unduplicated Population of Legal Service Areas (a single figure, reported by the state library agency), and (3) Official State Total Population Estimate (reported by the state library agency). The total Population of Legal Service Area for all public libraries in a state may exceed the state's Total Unduplicated Population of Legal Service Areas or the Official State Total Population Estimate. This occurs when the state has one or more geographically adjacent libraries (for example, a county library and a city library within the county) that serve, and therefore count, the same population. Twenty-six states had such overlapping service areas in FY 2001.

In order to do meaningful analysis using Population of Legal Service Area data (for example, the number of books/serial volumes per capita), the data were adjusted to eliminate duplicative reporting in states with overlapping service areas. The Public Library Data File has a derived unduplicated population of legal service area for each library for this purpose, called POPU_UND. This value was prorated for each library by calculating the ratio of a library's Population of Legal Service Area to the total Population of Legal Service Area for all libraries in the state, and applying the ratio to the state's Total Unduplicated Population of Legal Service Areas. (The latter item is a single, state-reported figure. It is on the State Summary/State Characteristics Data File and is also called POPU_UND.)
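
The proration described above can be sketched as follows. This is a minimal illustration of the ratio calculation only. The function name, the library names, and the population figures are made up, and the sketch does no rounding, so results may differ slightly from the POPU_UND values on the data file.

```python
def prorate_unduplicated(
    lsa_populations: dict[str, int], state_unduplicated: int
) -> dict[str, float]:
    """Share out the state's unduplicated total in proportion to each
    library's Population of Legal Service Area."""
    total_lsa = sum(lsa_populations.values())
    if total_lsa <= 0:
        raise ValueError("total Population of Legal Service Area must be positive")
    return {
        library: population * state_unduplicated / total_lsa
        for library, population in lsa_populations.items()
    }


# Illustrative state with two overlapping libraries: the city library's
# service area lies inside the county library's service area.
lsa = {"County Library": 60_000, "City Library": 40_000}
unduplicated = prorate_unduplicated(lsa, state_unduplicated=80_000)

print(unduplicated)
```

In this example the libraries report 100,000 people between them, but the state's unduplicated total is 80,000. The county library's share is 60 percent, or 48,000, and the city library's share is 40 percent, or 32,000, so the prorated values add up to the state's unduplicated total.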


Imputation

Imputation is a statistical means for providing a valid value for missing data. Note that data files, both public- and restricted-use, have had imputation applied, but the data used by the Library Statistics Program web tools have not.

Imputation in the Public Libraries Survey

From the PLS documentation titled "Data File, Public-Use: Public Libraries Survey: Fiscal Year 2001" PDF File (1,036 KB):

All libraries, including nonresponding libraries, were sorted into imputation cells based on the region and size of population served. Item imputation was performed on each record with nonresponse variables. The data are identified as either imputed (estimated) or reported (actual) on the survey data file, through the use of imputation codes.
  • See pages 16-19 of 78 in the Survey Methodology section for information on the imputation rules and post-imputation findings
  • See appendices I through L for information on imputation tags, their definitions, and frequencies

Imputation in Other Surveys

Other documents that have more information about imputation and how it is applied:
