Appendix A: Technical Notes
Editing and Imputation
The Web-based data collection application included internal edit checks. An edit check tool alerted the respondent to questionable data through interactive "edit check warnings" during data entry and through edit check reports that could be viewed on screen or printed. The edit check program enabled the respondent to submit edited data to NCES, which usually required little or no follow-up for data problems. The edit check tool included seven types of edits:
- Summations - Reported totals are compared with the sums of the component data items. If they are not equal, a warning message is generated.
- Relational edit checks - The program compares responses entered in one section of the questionnaire with responses entered in another section for consistency. For example, if a librarian reports that books and bound serials were added during the fiscal year, the program looks for some expenditure on books and bound serials. If additions are reported without a corresponding expenditure, an error message is generated. Similarly, the number of volumes of print materials added during the fiscal year cannot exceed the total number of volumes held at the end of the fiscal year.
- Range checks - An error message is generated if a response is above or below the expected range. For example, an error message is generated if the number of e-books added during the fiscal year is greater than 250,000, or if the reported hours of service exceed 168 hours per week (the total number of hours in a week).
- Current year/prior year comparisons - Current year data must fall within an "acceptable range" of the data reported for the prior reporting period.
- Ratios - The ratio of one item to another must not exceed an upper limit. For example, a ratio of program attendance to number of presentations that exceeds the upper limit is not reasonable.
- Item comparison - One item should not exceed another. For example, books added during the year should not exceed books held at the end of the year.
- Missing or blank items - The program flags items the respondent did not report, for example, a missing value for total expenditures.
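The seven edit types can be illustrated with a small Python function that checks one institution's record. This is a minimal sketch, not the actual edit check program: the field names (for example, `books_expenditure` and `other_operating`) are hypothetical, and only the 250,000 e-book and 168-hour limits come from the text; the ratio limit and the 50 percent prior year tolerance are placeholder assumptions.

```python
from typing import Optional

# Hypothetical field names; the real questionnaire uses numbered items.
REQUIRED = ["total_expenditures", "books_held", "hours_per_week"]
EXPENDITURE_PARTS = ["salaries", "materials", "other_operating"]

MAX_EBOOKS_ADDED = 250_000   # range limit quoted in the text
MAX_HOURS_PER_WEEK = 168     # range limit quoted in the text
MAX_ATTENDANCE_RATIO = 500   # hypothetical upper limit for the ratio check
PRIOR_YEAR_TOLERANCE = 0.5   # hypothetical: flag changes of more than 50%


def run_edit_checks(cur: dict, prior: Optional[dict] = None) -> list:
    """Return edit check messages for one institution's record."""
    msgs = []

    # Missing or blank items.
    for key in REQUIRED:
        if cur.get(key) is None:
            msgs.append(f"missing: {key}")

    # Summation: a reported total must equal the sum of its components.
    total = cur.get("total_expenditures")
    parts = [cur.get(k) for k in EXPENDITURE_PARTS]
    if total is not None and None not in parts and total != sum(parts):
        msgs.append("summation: total_expenditures != sum of components")

    # Relational: books added implies some expenditure on books.
    if (cur.get("books_added") or 0) > 0 and not cur.get("books_expenditure"):
        msgs.append("relational: books added but no books expenditure")

    # Item comparison: items added cannot exceed items held at year end.
    added, held = cur.get("books_added"), cur.get("books_held")
    if added is not None and held is not None and added > held:
        msgs.append("item comparison: books_added > books_held")

    # Range checks.
    if (cur.get("ebooks_added") or 0) > MAX_EBOOKS_ADDED:
        msgs.append("range: ebooks_added > 250,000")
    if (cur.get("hours_per_week") or 0) > MAX_HOURS_PER_WEEK:
        msgs.append("range: hours_per_week > 168")

    # Ratio: program attendance per presentation must not exceed a limit.
    att, pres = cur.get("program_attendance"), cur.get("presentations")
    if att is not None and pres and att / pres > MAX_ATTENDANCE_RATIO:
        msgs.append("ratio: attendance per presentation above limit")

    # Current year/prior year: flag large changes from the prior period.
    for key, old in (prior or {}).items():
        new = cur.get(key)
        if new is not None and old and abs(new - old) > PRIOR_YEAR_TOLERANCE * abs(old):
            msgs.append(f"prior year: {key} outside acceptable range")

    return msgs
```

Each check appends a message instead of rejecting the record, so every problem in a record can be listed together, much like an edit check report.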
After responses were received, the U.S. Census Bureau reviewed the data and contacted respondents with questionable data to request verification or correction of those data. Data records were then aggregated into preliminary draft tables, which NCES and the U.S. Census Bureau reviewed for data quality issues.
Survey responses sent to the Census Bureau for processing do not always include an answer for every data item. To produce complete datasets for estimating totals, values must be imputed for the missing items. This section describes the imputation methods used to fill in the missing data.
For imputation, institutions were grouped into 27 imputation cells based on the institution's sector and full-time equivalent (FTE) enrollment. The sector categories were (1) public, 4-year or above; (2) private non-profit, 4-year or above; (3) private for-profit, 4-year or above; (4) public, 2-year; (5) private non-profit, 2-year; and (6) private for-profit, 2-year. The following imputation methods were used:
- If values were missing, prior year (FY 2006) data were available, and the values were not expected to change, then the missing values were filled with the prior year value. For example, the number of Branches and Independent Libraries (Item 100) is likely to remain constant from year to year, so the prior year value was carried forward.
- If values were missing, a prior year value was available, and the value was expected to change from the prior year, then the missing value was filled with the prior year value multiplied by the median growth rate within the imputation cell. For example, the number of Full-time Librarians (Item 200, column 1) is expected to change from year to year, so this method was used.
- If values were missing and prior year data were unavailable, then a value was imputed using the current year median cell distribution ratio. For example, E-books Added (Item 401, column 1) was imputed as the value of E-books Held (Item 401, column 2) multiplied by the median cell distribution ratio of E-books Added to E-books Held.
- If missing current year data prevented the use of a current year median cell distribution ratio, then the current year cell median was used. For example, if E-books Added could not be imputed with a current year median cell distribution ratio because E-books Held was missing, then E-books Added was imputed with the current year cell median.
- After imputation, if a total was missing or needed adjustment, it was recalculated to equal the sum of its detail items.
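The cascade of imputation methods above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Census Bureau's processing code: each record is assumed to already carry its imputation cell label (the FTE enrollment boundaries that define the 27 cells are not given here); item names such as `ft_librarians` and the `total_staff` total are hypothetical stand-ins for the numbered survey items; the median growth rate is taken as the median of current-to-prior ratios among institutions in the cell that reported both years; and all medians are computed from reported, not imputed, values.

```python
from statistics import median
from typing import Optional

# Hypothetical item names standing in for numbered survey items
# (Item 100 branches, Item 200 librarians, Item 401 e-books).
ITEMS = ["branches", "ft_librarians", "ebooks_held", "ebooks_added"]
STABLE = {"branches"}                       # not expected to change year to year
RELATED = {"ebooks_added": "ebooks_held"}   # item -> item it is imputed from
TOTALS = {"total_staff": ["ft_librarians", "other_staff"]}  # hypothetical total


def _median(values: list) -> Optional[float]:
    return median(values) if values else None


def _cell_stats(cell: list) -> dict:
    """Medians computed from the values reported by institutions in one cell."""
    stats = {}
    for item in ITEMS:
        cur = [r["cur"].get(item) for r in cell]
        pri = [(r.get("prior") or {}).get(item) for r in cell]
        # Median growth rate: current / prior among institutions reporting both.
        stats[("growth", item)] = _median(
            [c / p for c, p in zip(cur, pri) if c is not None and p])
        stats[("median", item)] = _median([c for c in cur if c is not None])
        base = RELATED.get(item)
        if base:
            # Median distribution ratio, e.g. e-books added / e-books held.
            stats[("ratio", item)] = _median(
                [r["cur"][item] / r["cur"][base] for r in cell
                 if r["cur"].get(item) is not None and r["cur"].get(base)])
    return stats


def impute_cell(cell: list) -> None:
    """Fill None values in each record's "cur" dict, in place, for one cell."""
    stats = _cell_stats(cell)  # computed before anything is filled in
    for r in cell:
        cur, prior = r["cur"], r.get("prior") or {}
        reported = dict(cur)   # snapshot of reported (non-imputed) values
        for item in ITEMS:
            if cur.get(item) is not None:
                continue
            p = prior.get(item)
            base_key = RELATED.get(item)
            base = reported.get(base_key) if base_key else None
            ratio = stats.get(("ratio", item))
            if p is not None and item in STABLE:
                cur[item] = p                                  # carry forward
            elif p is not None and stats[("growth", item)] is not None:
                cur[item] = p * stats[("growth", item)]        # prior x growth rate
            elif base is not None and ratio is not None:
                cur[item] = base * ratio                       # distribution ratio
            else:
                cur[item] = stats[("median", item)]            # cell median (None if no reporters)
        # Recompute each total whose detail items are all present.
        for total, details in TOTALS.items():
            if all(cur.get(d) is not None for d in details):
                cur[total] = sum(cur[d] for d in details)


def impute(records: list) -> None:
    """Group records by their precomputed imputation cell and impute each cell."""
    cells = {}
    for r in records:
        cells.setdefault(r["cell"], []).append(r)
    for cell in cells.values():
        impute_cell(cell)
```

Calling `impute(records)` on dictionaries shaped like `{"cell": ..., "cur": {...}, "prior": {...} or None}` fills missing current year values in place. The fixed order of the `if`/`elif` chain mirrors the fallback in the list: a prior year value is carried forward or grown when one exists, a related current year item is used when it does not, and the cell median is the last resort.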
Institution FTE enrollment was not used to define imputation cells, and medians were not used in place of means, until 2002; this represents a change from previous survey cycles. Although research indicates that the change in imputation procedure did not have a large effect, caution should be exercised when comparing these data with those in reports from 2000 or earlier.