
Projections of Education Statistics to 2008 / Appendix C



Appendix C. Data Sources

Sources and Comparability of Data

The information in this report was obtained from many sources, including Federal and state agencies, private research organizations, and professional associations. The data were collected by many methods, including surveys of a universe (such as all colleges) or of a sample, and compilations of administrative records. Care should be used when comparing data from different sources. Differences in procedures, such as timing, phrasing of questions, and interviewer training, mean that the results from different sources are not strictly comparable. More extensive documentation of one survey's procedures than of another's does not imply more problems with the data, only that more information is available.

Accuracy of Data

The accuracy of any statistic is determined by the joint effects of "sampling" and "nonsampling" errors. Estimates based on a sample will differ from the figures that would have been obtained if a complete census had been taken using the same survey instruments, instructions, and procedures. Besides sampling errors, both universe and sample surveys are subject to errors of design, reporting, and processing, and to errors due to nonresponse. To the extent possible, these nonsampling errors are kept to a minimum by methods built into the survey procedures. In general, however, the effects of nonsampling errors are more difficult to gauge than those produced by sampling variability.

Sampling Errors

The standard error is the primary measure of sampling variability. It provides a range, with a stated level of confidence, within which the figure from a complete census would be expected to lie. The chances that a complete census would differ from the sample estimate by less than one standard error are about 68 out of 100. The chances that the difference would be less than 1.65 times the standard error are about 90 out of 100. The chances that the difference would be less than 1.96 times the standard error are about 95 out of 100. The chances that it would be less than 2.58 times as large are about 99 out of 100.
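The probabilities quoted above are those of a normal distribution, which is the usual large-sample approximation for survey estimates. This minimal Python sketch assumes normality and recovers the probabilities from the standard-library error function. The function name `two_sided_coverage` is illustrative and does not come from any NCES tool.

```python
import math


def two_sided_coverage(multiplier: float) -> float:
    """Probability that a standard normal deviate falls within +/- `multiplier`."""
    return math.erf(multiplier / math.sqrt(2.0))


for k in (1.0, 1.65, 1.96, 2.58):
    chance = round(100 * two_sided_coverage(k))
    print(f"within {k:.2f} standard errors: about {chance} out of 100")
```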

The standard error can help assess how valid a comparison between two estimates might be. The standard error of a difference between two sample estimates that are uncorrelated is approximately equal to the square root of the sum of the squared standard errors of the estimates. The standard error (se) of the difference between sample estimate "a" and sample estimate "b" is:

$$se_{a-b} = \sqrt{se_a^2 + se_b^2}$$
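The formula for the standard error of a difference can be applied directly when testing whether two uncorrelated estimates differ. This is a minimal sketch. The estimates and standard errors in it are hypothetical, and the 1.96 cutoff corresponds to the 95 percent level mentioned earlier.

```python
import math


def se_of_difference(se_a: float, se_b: float) -> float:
    """Approximate standard error of (a - b) for two uncorrelated estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def z_statistic(a: float, b: float, se_a: float, se_b: float) -> float:
    """Difference between two estimates expressed in standard errors of the difference."""
    return (a - b) / se_of_difference(se_a, se_b)


# Hypothetical estimates and standard errors, for illustration only.
z = z_statistic(50.0, 38.0, 3.0, 4.0)
print(f"z = {z:.2f}; significant at the 95 percent level: {abs(z) > 1.96}")
```

If the two estimates are correlated (for example, two years from an overlapping sample), the variance of the difference also includes a term of minus twice their covariance, so this simple formula no longer applies exactly.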

Note that most of the standard errors in subsequent sections and in the original documents are approximations. That is, to derive estimates of standard errors that would be applicable to a wide variety of items and could be prepared at a moderate cost, a number of approximations were required. As a result, most of the standard errors presented provide a general order of magnitude rather than the exact standard error for any specific item.

Nonsampling Errors

Both universe and sample surveys are subject to nonsampling errors. Nonsampling errors are of two kinds: random and nonrandom. Random nonsampling errors may arise when respondents or interviewers interpret questions differently, when respondents must estimate values, or when coders, keyers, and other processors handle answers differently. Nonrandom nonsampling errors result from total nonresponse (no usable data obtained for a sampled unit), partial or item nonresponse (only a portion of a response may be usable), inability or unwillingness on the part of respondents to provide information, difficulty interpreting questions, mistakes in recording or keying data, errors of collection or processing, and overcoverage or undercoverage of the target universe. Random nonsampling errors usually, but not always, result in an understatement of sampling errors and thus an overstatement of the precision of survey estimates. Because estimating the magnitude of nonsampling errors would require special experiments or access to independent data, these magnitudes are seldom available.

To compensate for suspected nonrandom errors, adjustments of the sample estimates are often made. For example, adjustments are frequently made for nonresponse, both total and partial. An adjustment made for either type of nonresponse is often referred to as an imputation, that is, substitution of the "average" questionnaire response for the nonresponse. Imputations are usually made separately within various groups of sample members that have similar survey characteristics. Imputation for item nonresponse is usually made by substituting for a missing item the response to that item of a respondent having characteristics that are similar to those of the nonrespondent.
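The two adjustments described above can be sketched in two ways. The first replaces a missing value with the mean of respondents in the same group (the "average" response). The second is a hot-deck substitution that borrows the value from the most similar respondent in the same group. The record layout, the field names, and the similarity measure (closest enrollment) are hypothetical choices made for illustration.

```python
from statistics import mean


def cell_mean_impute(records, group_key, item):
    """Fill missing `item` values with the mean of respondents in the same group."""
    group_means = {}
    for group in {r[group_key] for r in records}:
        observed = [r[item] for r in records
                    if r[group_key] == group and r[item] is not None]
        group_means[group] = mean(observed) if observed else None
    return [dict(r, **{item: group_means[r[group_key]]}) if r[item] is None else dict(r)
            for r in records]


def hot_deck_impute(records, group_key, match_key, item):
    """Give each nonrespondent the `item` value of the most similar respondent in its group."""
    result = []
    for r in records:
        donors = [d for d in records
                  if d[group_key] == r[group_key] and d[item] is not None]
        if r[item] is not None or not donors:
            result.append(dict(r))  # respondent, or no donor available: leave as is
            continue
        donor = min(donors, key=lambda d: abs(d[match_key] - r[match_key]))
        result.append(dict(r, **{item: donor[item]}))
    return result


# Hypothetical school records; "teachers" is missing for two schools.
schools = [
    {"level": "elementary", "students": 200, "teachers": 12},
    {"level": "elementary", "students": 400, "teachers": 22},
    {"level": "elementary", "students": 380, "teachers": None},
    {"level": "secondary", "students": 600, "teachers": 40},
    {"level": "secondary", "students": 650, "teachers": None},
]
print(cell_mean_impute(schools, "level", "teachers"))
print(hot_deck_impute(schools, "level", "students", "teachers"))
```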

Although the magnitude of nonsampling errors in the data used in this edition of Projections of Education Statistics is frequently unknown, idiosyncrasies that have been identified are noted in the appropriate tables.

Federal Agency Sources

National Center for Education Statistics (NCES)

Common Core of Data

NCES uses the Common Core of Data (CCD) survey to acquire and maintain statistical data on the 50 states, the District of Columbia, and the outlying areas from the universe of state-level education agencies. Information about staff and students is collected annually at the school, LEA (local education agency or school district), and state levels. Information about revenues and expenditures is also collected at the state and school district levels.

Data are collected for a particular school year (July 1 through June 30) via survey instruments sent to the states by October 15 of the subsequent school year. States have 2 years in which to modify the data originally submitted.

Since the CCD is a universe survey, the CCD information presented in this edition of Projections of Education Statistics is not subject to sampling errors. However, nonsampling errors could come from two sources--nonreturn and inaccurate reporting. Almost all of the states submit the CCD survey instruments each year, but submissions are sometimes incomplete or too late for publication.

Understandably, when 57 education agencies compile and submit data for over 85,000 public schools and approximately 15,000 local school districts, misreporting can occur. Typically, this results from varying interpretations of NCES definitions and differing recordkeeping systems. NCES attempts to minimize these errors by working closely with the Council of Chief State School Officers (CCSSO).

The state education agencies report data to NCES from data collected and edited in their regular reporting cycles. NCES encourages the agencies to incorporate into their own survey systems the NCES items they do not already collect so that those items will also be available for the subsequent CCD survey. Over time, this has meant fewer missing data cells in each state's response, reducing the need to impute data.

NCES subjects data from the education agencies to a comprehensive edit. Where data are determined to be inconsistent, missing, or out of range, NCES contacts the education agencies for verification. NCES-prepared state summary forms are returned to the state education agencies for verification. States are also given an opportunity to revise their state-level aggregates from the previous survey cycle.

Questions concerning the Common Core of Data can be directed to:

John Sietsema
Surveys and Cooperative Systems Group
National Center for Education Statistics
1900 K Street NW, Suite 9000
Washington DC 20006

Private School Early Estimates System: 1992-93.
Early in September 1992, advance questionnaires were mailed to a national probability sample of 1,167 private elementary and secondary schools. Telephone collection of the data began in early October and was completed in mid-October. The telephone data collection used Computer Assisted Telephone Interviewing (CATI) technology to collect the data and perform preliminary edits. The overall response rate was 93.3 percent: 1,045 of the 1,120 eligible schools. Some 47 of the original 1,167 schools in the sample were determined to be out-of-scope. After adjusting for out-of-scope schools, the weighted estimate of private schools is 26,011.
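The response rate above is a simple ratio of responding schools to eligible schools. As a quick arithmetic check, the stated counts reproduce 93.3 percent. This is an unweighted calculation; the text does not say whether the published rate was weighted.

```python
sampled_schools = 1167
out_of_scope = 47
responding_schools = 1045

eligible_schools = sampled_schools - out_of_scope          # schools remaining in scope
response_rate = 100 * responding_schools / eligible_schools
print(f"{eligible_schools} eligible schools; response rate {response_rate:.1f} percent")
```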

The sampling frame used for the Private School Early Estimates Survey was the 1991-92 NCES Private School Survey (PSS). This survey collected information on the number of teachers and students in private schools by religious orientation and level of school, as well as actual and projected counts of high school graduates. The PSS, and therefore the early estimates survey, uses two nonoverlapping frames: the list frame of approximately 24,000 eligible schools (the universe list), and an area frame developed by the Census Bureau, consisting of 355 schools identified in 124 sampled geographic areas (Primary Sampling Units or PSUs). The area frame is constructed from a sample survey designed to capture those schools not included in the universe list and is repeated every 2 years. The 355 schools identified in the sampled areas are weighted to a national estimate of the number of private schools not included in the universe list. This weighted number is then added to the universe count to produce an estimate of the total number of private schools in the United States.
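This minimal sketch shows how the two frames might combine. It assumes that every school found in a sampled PSU is counted with a weight equal to the inverse of that PSU's selection probability. The function names and the PSU probabilities are hypothetical. The 24,000 figure echoes the approximate list-frame size given in the text. Real PSS weights also carry nonresponse and other adjustments that are not shown here.

```python
def area_frame_weight(psu_selection_prob: float) -> float:
    """Assumed base weight for a school found in a sampled PSU: 1 / P(PSU selected)."""
    return 1.0 / psu_selection_prob


def estimated_total_schools(list_frame_count, area_psu_probs):
    """List-frame (universe) count plus the weighted count of schools missing from the list."""
    return list_frame_count + sum(area_frame_weight(p) for p in area_psu_probs)


# Hypothetical: three area-frame schools found in PSUs selected with these probabilities.
total = estimated_total_schools(24_000, [0.01, 0.02, 0.05])
print(round(total))
```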

For the early estimates, the list frame was stratified by level of school (elementary, secondary, and combined) and religious orientation (Catholic, other religious, and nonsectarian). Within strata, schools were further sorted by Census region (Northeast, Midwest, South, and West), by urbanicity (urban, suburban, and rural) within region, and by student membership size within urbanicity. Each school in the sorted frame was assigned a sampling measure of size equal to the square root of student membership.
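The list-frame preparation can be sketched as a sort followed by a size measure. The field names are hypothetical. The text does not say whether membership was sorted in ascending or descending order, so ascending is assumed. When a systematic sample is drawn from a list sorted this way, the sort typically acts as implicit stratification.

```python
import math

# Category orders follow the text; sorting membership in ascending order is an assumption.
REGIONS = ("Northeast", "Midwest", "South", "West")
URBANICITY = ("urban", "suburban", "rural")


def stratum(school):
    """Explicit stratum: level of school crossed with religious orientation."""
    return (school["level"], school["orientation"])


def frame_sort_key(school):
    """Stratum, then region, then urbanicity within region, then membership within urbanicity."""
    return (stratum(school),
            REGIONS.index(school["region"]),
            URBANICITY.index(school["urbanicity"]),
            school["membership"])


def prepare_list_frame(schools):
    """Return the sorted frame with a measure of size equal to sqrt(student membership)."""
    return [dict(s, mos=math.sqrt(s["membership"]))
            for s in sorted(schools, key=frame_sort_key)]


frame = prepare_list_frame([
    {"level": "elementary", "orientation": "Catholic", "region": "South",
     "urbanicity": "rural", "membership": 144},
    {"level": "elementary", "orientation": "Catholic", "region": "Northeast",
     "urbanicity": "urban", "membership": 400},
    {"level": "combined", "orientation": "nonsectarian", "region": "West",
     "urbanicity": "suburban", "membership": 81},
])
for s in frame:
    print(s["level"], s["orientation"], s["region"], s["mos"])
```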

The area frame was stratified by level of school (elementary, secondary, and combined) and religious orientation (Catholic, other religious, and nonsectarian). Within strata, schools were further sorted by FIPS (Federal Information Processing Standards) state code, by PSU within state, and by student membership within PSU. Samples were selected with probabilities proportionate to size from each stratum. The measure of size used for this purpose was the square root of student membership multiplied by the inverse of the probability of selection of the PSU in which the school is located.
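This sketch computes the area-frame measure of size and then applies systematic PPS sampling, one common way to select with probability proportionate to size. The text says only that selection was PPS within strata. The systematic method, the function names, and the toy inputs are assumptions.

```python
import math
import random


def area_frame_mos(membership, psu_selection_prob):
    """sqrt(student membership) times the inverse of the PSU's selection probability."""
    return math.sqrt(membership) / psu_selection_prob


def systematic_pps(sizes, n, rng):
    """Select n positions from an ordered list of sizes by systematic PPS sampling."""
    total = sum(sizes)
    interval = total / n
    point = rng.random() * interval          # random start in [0, interval)
    selected, cumulative = [], 0.0
    for index, size in enumerate(sizes):
        cumulative += size
        while len(selected) < n and point < cumulative:
            selected.append(index)           # this unit's size range contains the point
            point += interval
    while len(selected) < n:                 # guard against floating-point round-off at the end
        selected.append(len(sizes) - 1)
    return selected


# Hypothetical stratum: (student membership, PSU selection probability) for five schools.
sizes = [area_frame_mos(m, p) for m, p in [(100, 0.5), (400, 0.5), (25, 0.1), (900, 0.25), (64, 0.2)]]
print(systematic_pps(sizes, 2, random.Random(12345)))
```

Under this method, a school's chance of selection is about n times its share of the stratum's total size. A school whose size exceeds the sampling interval is selected with certainty and may be hit more than once.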

The estimation procedure is a two-step process. The first step is to produce estimates based on the NCES frame for private schools (1991-92 Private School Survey [PSS]). These estimates are adjusted for total school nonresponse, as well as item nonresponse. The second step is to update the PSS-based estimates, using the data collected in the 1992 Early Estimates Survey (EES). This EES update is a ratio estimate of the 1992 estimate from EES divided by the 1991 estimate based on the 1991 PSS data for the EES sample. The estimates in the tables are the PSS-based estimates times the EES update. The early estimates in this report incorporate the relevant estimates from the PSS and update them using data collected in the EES.
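The second step of the estimator reduces to a multiplication by a ratio. This is a minimal sketch with hypothetical function names and toy numbers. The source does not describe how the ratio is formed for each individual table cell.

```python
def ees_update_factor(ees_current, pss_prior_same_sample):
    """Current-year EES estimate divided by the prior-year PSS-based estimate for the same EES sample."""
    return ees_current / pss_prior_same_sample


def early_estimate(pss_based_estimate, ees_current, pss_prior_same_sample):
    """PSS-based (prior-year) estimate carried forward by the EES ratio update."""
    return pss_based_estimate * ees_update_factor(ees_current, pss_prior_same_sample)


# Hypothetical: the EES sample schools report 5 percent more students than the PSS showed for them.
print(round(early_estimate(1_000_000, 52_500, 50_000)))
```

The numerator and denominator of the update come from the same schools, so much of their sampling variability tends to cancel. That is the usual motivation for a ratio update of this kind.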

The private school early estimates are based on a sample; these estimates may differ somewhat from figures that would have been obtained if a complete census of private schools had been taken using the same questionnaire and procedures. The standard error indicates the magnitude of the sampling error, the variability due to sampling when estimating a statistic. It indicates how much variance there is in the population of possible estimates of a parameter for a given sample size. Standard errors can be used as a measure of the precision expected from a particular sample. If all possible samples were surveyed under similar conditions, intervals of 1.96 standard errors below to 1.96 standard errors above a particular statistic would include the true population parameter being estimated in about 95 percent of the samples. This is a 95 percent confidence interval. For example, for the ratio of private school pupils to private school teachers in 1992-93, the estimate for all private schools is 14.9 and the standard error is 0.2. The 95 percent confidence interval for this statistic extends from 14.9 - (0.2 times 1.96) to 14.9 + (0.2 times 1.96) or from 14.5 to 15.3. The standard error for the 4,964,258 students in private schools is 116,612. The 95 percent confidence interval for this statistic extends from 4,735,698 to 5,192,818.
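The confidence intervals in the paragraph above follow from the estimate plus or minus 1.96 standard errors. This sketch reproduces both examples using the figures quoted in the text. The function name is illustrative.

```python
def confidence_interval(estimate, standard_error, multiplier=1.96):
    """Interval of +/- multiplier standard errors around an estimate (95 percent by default)."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


low, high = confidence_interval(14.9, 0.2)              # pupil/teacher ratio, 1992-93
print(f"{low:.1f} to {high:.1f}")                        # 14.5 to 15.3
low, high = confidence_interval(4_964_258, 116_612)     # private school students
print(f"{low:,.0f} to {high:,.0f}")                      # 4,735,698 to 5,192,818
```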

Estimates of standard errors were computed using a variance estimation procedure for complex sample survey data known as balanced repeated replication (BRR), a technique that splits the sample into several different half-samples. Weight-adjusted estimates are computed from each half-sample. The variability of these half-sample estimates around the full-sample estimate is then used to approximate the standard error of the full-sample estimate. The standard errors for private school early estimates for school years 1991-92 and 1992-93 are shown in the table below.

------------------------------------------------------------------------
Students (1992-93)      Teachers (1992-93)      Graduates (1991-92)
------------------------------------------------------------------------
116,612.2               8,714.8                 6,071.4
------------------------------------------------------------------------
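The balanced repeated replication procedure described above can be sketched under common textbook assumptions. The design has two PSUs per stratum. Balanced half-samples are chosen from the rows of a Sylvester-type Hadamard matrix, skipping its all-ones first column. The retained PSU's weight is doubled, and the variance is the average squared deviation of the half-sample estimates from the full-sample estimate. The source does not give the number of replicates, the half-sample construction, or the weight adjustments NCES actually used, and all names here are hypothetical.

```python
import math


def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2), built by Sylvester's doubling."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def brr_standard_error(strata, estimator):
    """
    strata: list of (psu_1, psu_2) pairs, one pair per stratum; each PSU is a list of
            (weight, y, x) records.
    estimator: function mapping a list of (weight, y, x) records to a statistic.
    """
    order = 1
    while order < len(strata) + 1:          # one column per stratum, skipping column 0
        order *= 2
    hadamard = sylvester_hadamard(order)
    full_sample = [rec for psu_1, psu_2 in strata for rec in psu_1 + psu_2]
    theta = estimator(full_sample)
    squared_deviations = []
    for row in hadamard:
        half_sample = []
        for h, (psu_1, psu_2) in enumerate(strata):
            kept = psu_1 if row[h + 1] == 1 else psu_2
            # Doubling the retained PSU's weight keeps the half-sample on the full-sample scale.
            half_sample.extend((2.0 * w, y, x) for w, y, x in kept)
        squared_deviations.append((estimator(half_sample) - theta) ** 2)
    return math.sqrt(sum(squared_deviations) / len(hadamard))


def weighted_total(records):
    return sum(w * y for w, y, _ in records)


def weighted_ratio(records):
    """For example, pupils (y) per teacher (x)."""
    return sum(w * y for w, y, _ in records) / sum(w * x for w, _, x in records)


# Hypothetical toy design: two strata, two PSUs each, one (weight, pupils, teachers) record per PSU.
toy = [([(1.0, 5.0, 1.0)], [(1.0, 2.0, 1.0)]),
       ([(1.0, 10.0, 1.0)], [(1.0, 6.0, 1.0)])]
print(brr_standard_error(toy, weighted_total), brr_standard_error(toy, weighted_ratio))
```

For a linear statistic such as a weighted total, this balanced design reproduces the familiar two-PSU-per-stratum variance estimate. For nonlinear statistics such as the pupil/teacher ratio, replication avoids having to derive a separate variance formula.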

Survey estimates are also subject to errors of reporting and errors made in the collection and processing of the data. These errors, called nonsampling errors, can sometimes bias the data. While general sampling theory can be used to estimate the sampling variability of an estimate, nonsampling errors are not easy to measure and usually require either an experiment conducted as part of the data collection procedure or use of data external to the study.

Nonsampling errors may include such things as differences in the respondents' interpretation of the meaning of the questions, differences related to the particular time the survey was conducted, or errors in data preparation. The content of the survey was developed in consultation with representatives of private school associations attending NCES meetings for users of private school data. The questionnaire and instructions were reviewed extensively by NCES staff. The CATI instrument provided online internal consistency checks (i.e., totals equal sum of parts) as well as consistency checks with 1991 data for the sample school. Interviewers resolved discrepancies with the school during the course of the interview. Machine editing of the questionnaires was conducted to check the data for accuracy and consistency. Data inputs into the CATI system were transferred directly to processing, avoiding potential keying errors.
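The two kinds of CATI consistency checks described above (totals equal to the sum of their parts, and comparison with the school's prior-year data) can be sketched as simple edit rules. The record layout, the field names, and the 25 percent tolerance for year-to-year change are hypothetical; the source does not state the actual edit rules or thresholds.

```python
def cati_edit_checks(record, prior=None, max_relative_change=0.25):
    """Return a list of edit failures for one school's responses."""
    failures = []
    # Internal consistency: reported total must equal the sum of its parts.
    parts = sum(record["students_by_grade"].values())
    if parts != record["total_students"]:
        failures.append(
            f"total students {record['total_students']} != sum of grades {parts}")
    # Consistency with the prior year: flag unusually large changes for follow-up.
    if prior is not None and prior["total_students"] > 0:
        change = abs(record["total_students"] - prior["total_students"]) / prior["total_students"]
        if change > max_relative_change:
            failures.append(f"total students changed {change:.0%} since the prior year")
    return failures


current = {"total_students": 310,
           "students_by_grade": {"K": 50, "1": 60, "2": 55, "3": 70, "4": 65}}
prior_year = {"total_students": 240}
print(cati_edit_checks(current, prior_year))
```

In an interview setting, each failure would prompt the interviewer to resolve the discrepancy with the respondent before moving on.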

Undercoverage in the list and area frames is another possible source of nonsampling error. The area frame was used to complement the list frame through the identification of schools missing from the list frame. The area frame represents approximately 10 percent of the total number of private schools. The 1991-92 list and area frame updates to the PSS were reflected in this year's early estimates, and so schools newly opened since 1989 are included in those new estimates.

Questions concerning the Private School Early Estimates System can be directed to:

Frank H. Johnson
Surveys and Cooperative Systems Group
National Center for Education Statistics
1900 K Street NW, Suite 9000
Washington DC 20006

Integrated Postsecondary Education Data System

The Integrated Postsecondary Education Data System (IPEDS) surveys all postsecondary institutions, including universities and colleges, as well as institutions offering technical and vocational education beyond the high school level. This survey, which began in 1986, replaces and supplements the Higher Education General Information Survey (HEGIS).

The IPEDS consists of several integrated components that obtain information on who provides postsecondary education (institutions), who participates in it and completes it (students), what programs are offered and what programs are completed, and both the human and financial resources involved in the provision of institutionally based postsecondary education. Specifically, these components include: institutional characteristics, including institutional activity; fall enrollment, including age and residence; fall enrollment in occupationally specific programs; completions; finance; staff; salaries of full-time instructional faculty; and academic libraries.

The higher education portion of this survey is a census of accredited 2- and 4-year colleges. Prior to 1993, data from the technical and vocational institutions were collected through a sample survey. Beginning in 1993, all data are gathered in a census of all postsecondary institutions. Thus, some portions of the data will be subject to sampling and nonsampling errors, while some portions will be subject only to nonsampling errors.

Prior to the establishment of IPEDS in 1986, HEGIS acquired and maintained statistical data on the characteristics and operations of institutions of higher education. Implemented in 1966, HEGIS was an annual universe survey of institutions listed in the latest NCES Education Directory, Colleges and Universities.

The information presented in this report draws on IPEDS surveys that solicited information concerning institutional characteristics, enrollment, degrees, and finances. The higher education portion of this system is a census of accredited 2- and 4-year colleges. Since these surveys cover all institutions in the universe, the data are not subject to sampling error.

However, they are subject to nonsampling error, the sources of which vary with the survey instrument. Each survey will therefore be discussed separately. Information concerning the nonsampling error of the enrollment and degrees surveys is drawn extensively from the HEGIS Post-Survey Validation Study conducted in 1979.

Institutional Characteristics. This survey provided the basis for the universe of institutions presented in the Education Directory, Colleges and Universities. The universe comprised institutions that met certain accreditation criteria and offered at least a 1-year program of college-level studies leading toward a degree. All of these institutions were certified as eligible by the U.S. Department of Education's Division of Eligibility and Agency Evaluation. Each fall, institutions listed in the previous year's Directory were asked to update a computer printout of their information.

Fall Enrollment. This survey has been part of the IPEDS or HEGIS series since 1966. The enrollment survey response rate was relatively high; the 1995 response rate was 97.0 percent. Major sources of nonsampling error for this survey were classification problems, the unavailability of needed data, interpretation of definitions, the survey due date, and operational errors. Of these, the classification of students appears to have been the main source of error. Institutions had problems in correctly classifying first-time freshmen, other first-time students, and unclassified students for both full-time and part-time categories. These problems occurred most often at 2-year institutions (private and public) and private 4-year institutions. In the 1977-78 HEGIS validation studies, the classification problem led to an estimated overcount of 11,000 full-time students and an undercount of 19,000 part-time students. Although the ratio of error to the grand total was quite small (less than 1 percent), the percentage of errors was as high as 5 percent for detailed student levels and even higher at certain aggregation levels.

Beginning with fall 1986, the survey system was redesigned with the introduction of the Integrated Postsecondary Education Data System (IPEDS) (see above). The new survey system comprises all postsecondary institutions, but also maintains comparability with earlier surveys by allowing HEGIS institutions to be tabulated separately. The new system also provides for preliminary and revised data releases. This allows the Center flexibility to release early data sets while still maintaining a more accurate final database.

Completions. This survey was part of the HEGIS series throughout its existence. However, the degree classification taxonomy was revised in 1970-71, 1982-83, and 1991-92. Collection of degree data has been maintained through the IPEDS system.

Though information from survey years 1970-71 through 1981-82 is directly comparable, care must be taken if information before or after that period is included in any field of study comparison. The nonresponse rate did not appear to be a significant source of nonsampling error for this survey. The return rate over the years was high, with the response rate for the 1994-95 survey at 97 percent. Because of the high return rate, nonsampling error caused by imputation was also minimal.

The major sources of nonsampling error for this survey were differences between the NCES program taxonomy and taxonomies used by the colleges, classification of double majors and double degrees, operational problems, and survey timing. In the 1979 HEGIS validation study, these sources of nonsampling error were found to contribute to an error rate of 0.3 percent overreporting of bachelor's degrees and 1.3 percent overreporting of master's degrees. The differences, however, varied greatly among fields. Over 50 percent of the fields selected for the validation study had no errors identified. Categories of fields that had large differences were business and management, education, engineering, letters, and psychology. It was also shown that differences in proportion to the published figures were less than 1 percent for most of the selected fields that had some errors. Exceptions to these were: master's and doctor's programs in labor and industrial relations (20 percent and 8 percent); bachelor's and master's programs in art education (3 percent and 4 percent); bachelor's and doctor's programs in business and commerce, and in distributive education (5 percent and 9 percent); master's programs in philosophy (8 percent); and doctor's programs in psychology (11 percent).

Financial Statistics. This survey was part of the HEGIS series and has been continued under the IPEDS system. Changes were made in the financial survey instruments in fiscal years (FY) 1976, 1982, and 1987. The FY 76 survey instrument contained numerous revisions to earlier survey forms and made direct comparisons of line items very difficult. Beginning in FY 82, Pell Grant data were collected in Federal restricted grants and contracts revenues and restricted scholarships and fellowships expenditures. The introduction of the Integrated Postsecondary Education Data System (IPEDS) in the FY 87 survey included several important changes to the survey instrument and data processing procedures. While these changes were significant, considerable effort has been made to present only comparable information on trends in this report and to note inconsistencies. Finance tables for this publication have been adjusted by subtracting the largely duplicative Pell Grant amounts from the later data to maintain comparability with pre-FY 82 data.

Possible sources of nonsampling error in the financial statistics include nonresponse, imputation, and misclassification. The response rate has been about 85 to 90 percent for most of the years reported. The response rate for the FY 1995 survey was 94 percent.

Two general methods of imputation were used in HEGIS. If the prior years' data were available for a nonresponding institution, these data were inflated using the Higher Education Price Index and adjusted according to changes in enrollments. If there were no data for the previous four years, current data were used from peer institutions selected for location (state or region), control, level, and enrollment size of institution. In most cases, estimates for nonreporting institutions in IPEDS were made using data from peer institutions.
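This sketch illustrates the two HEGIS imputation methods described above. The text says prior-year data were inflated by the Higher Education Price Index (HEPI) and adjusted for enrollment change, but it does not give the formula. The multiplicative form below, the pooled per-student peer estimate, and all names and figures are assumptions made for illustration.

```python
def inflate_prior_year(prior_value, hepi_prior, hepi_current, enroll_prior, enroll_current):
    """Prior-year amount inflated by the HEPI ratio and scaled by the enrollment ratio (assumed form)."""
    return prior_value * (hepi_current / hepi_prior) * (enroll_current / enroll_prior)


def peer_estimate(peers, enroll_current):
    """peers: (current_value, current_enrollment) pairs for matched peer institutions.
    Applies the peers' pooled amount per student to the nonrespondent's enrollment."""
    total_value = sum(value for value, _ in peers)
    total_enrollment = sum(enrollment for _, enrollment in peers)
    return total_value / total_enrollment * enroll_current


def impute_finance_item(prior, peers, enroll_current, hepi_current):
    """prior: (value, enrollment, hepi) from the most recent of the previous four years, or None."""
    if prior is not None:
        value, enroll_prior, hepi_prior = prior
        return inflate_prior_year(value, hepi_prior, hepi_current, enroll_prior, enroll_current)
    return peer_estimate(peers, enroll_current)


# Hypothetical figures for illustration only.
print(round(impute_finance_item((1_000_000, 5_000, 200.0), [], 5_100, 210.0)))
print(round(impute_finance_item(None, [(2_000_000, 4_000), (3_000_000, 6_000)], 5_000, 210.0)))
```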

Beginning with FY 87, the new system (IPEDS) comprises all postsecondary institutions, but also maintains comparability with earlier surveys by allowing 2- and 4-year HEGIS institutions to be tabulated separately. The finance data tabulated for this publication reflect totals for the HEGIS or higher education institutions only.

To reduce reporting error, NCES used national standards for reporting finance statistics. These standards are contained in College and University Business Administration: Administrative Services (1974 Edition) and the Financial Accounting and Reporting Manual for Higher Education (1990 Edition) published by the National Association of College and University Business Officers; Audits of Colleges and Universities (as amended August 31, 1974), by the American Institute of Certified Public Accountants; and HEGIS Financial Reporting Guide (1980), by NCES. Wherever possible, definitions and formats in the survey are consistent with those in these four accounting texts.

Questions concerning the surveys used as data sources for this report or other questions concerning HEGIS and IPEDS can be directed to:

Surveys and Cooperative Systems Group
National Center for Education Statistics
1900 K Street NW, Suite 9000
Washington DC 20006

Bureau of the Census

Current Population Survey

Current estimates of school enrollment, as well as social and economic characteristics of students, are based on data collected in the Census Bureau's monthly survey of about 60,000 households. The monthly Current Population Survey (CPS) sample consists of 729 areas comprising 1,973 counties, independent cities, and minor civil divisions throughout the 50 states and the District of Columbia. The sample was initially selected from the 1980 census files and is periodically updated to reflect new housing construction.

The monthly CPS deals primarily with labor force data for the civilian noninstitutional population (i.e., excluding military personnel and their families living on posts and inmates of institutions). In addition, in October of each year, supplemental questions are asked about highest grade completed, level of current enrollment, attendance status, number and types of courses, degree or certificate objective, and type of organization offering instruction for each member of the household.

The estimation procedure used for the monthly CPS data involves inflating weighted sample results to independent estimates of characteristics of the civilian noninstitutional population in the United States by age, sex, and race. These independent estimates are based on statistics from decennial censuses; statistics on births, deaths, immigration, and emigration; and statistics on the population in the armed services. Generalized standard error tables are in the Current Population Reports. The data are subject to both nonsampling and sampling errors.
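Inflating weighted sample results to independent population estimates is a form of ratio adjustment. This sketch shows a single-step post-stratification under simplified assumptions: one set of cells, with hypothetical field names and control totals. The actual CPS weighting involves several stages that are not described here.

```python
from collections import defaultdict


def poststratify(records, controls):
    """
    records: dicts with a "cell" (e.g., an age-sex-race group) and a "weight".
    controls: dict mapping each cell to its independent population estimate.
    Returns copies of the records with weights scaled so each cell sums to its control.
    """
    cell_weight = defaultdict(float)
    for r in records:
        cell_weight[r["cell"]] += r["weight"]
    return [dict(r, weight=r["weight"] * controls[r["cell"]] / cell_weight[r["cell"]])
            for r in records]


# Hypothetical sample: two cells with initial (base) weights.
sample = [
    {"cell": "female 18-24", "weight": 1500.0, "enrolled": True},
    {"cell": "female 18-24", "weight": 2500.0, "enrolled": False},
    {"cell": "male 18-24", "weight": 3000.0, "enrolled": True},
]
controls = {"female 18-24": 5000.0, "male 18-24": 4500.0}
adjusted = poststratify(sample, controls)
enrolled = sum(r["weight"] for r in adjusted if r["enrolled"])
print(enrolled)
```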

More information is available in the Current Population Reports, Series P-20, or by contacting:

Education and Social Stratification Branch
Bureau of the Census
U.S. Department of Commerce
Washington, DC 20233

School Enrollment. Each October, the Current Population Survey (CPS) includes supplemental questions on the enrollment status of the population 3 years old and over. The main sources of nonsampling variability in the responses to the supplement are those inherent in the survey instrument. The question concerning educational attainment may be sensitive for some respondents who may not want to acknowledge their lack of a high school diploma. The question of current enrollment may not be answered accurately for various reasons. Some respondents may not know current grade information for every student in the household, a problem especially prevalent for households with members in college or in nursery school. Confusion over college credits or hours taken by a student may make it difficult to determine the year in which the student is enrolled. Problems may occur with the definition of nursery school (a group or class organized to provide educational experiences for children) where respondents' interpretations of "educational experiences" vary.

Questions concerning the CPS "School Enrollment" survey may be directed to:

Education and Social Stratification Branch
Bureau of the Census
U.S. Department of Commerce
Washington, DC 20233

State population projections. These state population projections were prepared using a cohort-component method by which each component of population change--births, deaths, state-to-state migration flows, international in-migration, and international out-migration--was projected separately for each birth cohort by sex, race, and Hispanic origin. The basic framework was the same as in past Census Bureau projections. Detailed components necessary to create the projections were obtained from vital statistics, administrative records, census data, and national projections.

The cohort-component method is based on the traditional demographic accounting system:

$$P_1 = P_0 + B - D + \mathrm{DIM} - \mathrm{DOM} + \mathrm{IIM} - \mathrm{IOM}$$

where:

$P_1$ = population at the end of the period

$P_0$ = population at the beginning of the period

$B$ = births during the period

$D$ = deaths during the period

$\mathrm{DIM}$ = domestic in-migration during the period

$\mathrm{DOM}$ = domestic out-migration during the period

$\mathrm{IIM}$ = international in-migration during the period

$\mathrm{IOM}$ = international out-migration during the period
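The accounting identity above can be applied state by state. This sketch uses hypothetical figures. It also checks a property of the identity: every domestic out-migrant from one state is an in-migrant to another, so the DIM and DOM terms cancel when states are summed, leaving P1 = P0 + B - D + IIM - IOM for the nation.

```python
def end_of_period_population(p0, b, d, dim, dom, iim, iom):
    """Demographic accounting identity: P1 = P0 + B - D + DIM - DOM + IIM - IOM."""
    return p0 + b - d + dim - dom + iim - iom


def domestic_migration(flows, state):
    """(DIM, DOM) for `state`, from a dict {(origin, destination): number of movers}."""
    dim = sum(n for (orig, dest), n in flows.items() if dest == state and orig != state)
    dom = sum(n for (orig, dest), n in flows.items() if orig == state and dest != state)
    return dim, dom


# Hypothetical three-state example.
p0 = {"X": 1000, "Y": 500, "Z": 250}
births = {"X": 14, "Y": 7, "Z": 3}
deaths = {"X": 9, "Y": 5, "Z": 2}
intl_in = {"X": 6, "Y": 2, "Z": 1}
intl_out = {"X": 2, "Y": 1, "Z": 0}
flows = {("X", "Y"): 20, ("Y", "X"): 15, ("Y", "Z"): 8, ("Z", "X"): 4}

p1 = {}
for s in p0:
    dim, dom = domestic_migration(flows, s)
    p1[s] = end_of_period_population(p0[s], births[s], deaths[s], dim, dom, intl_in[s], intl_out[s])

# Domestic flows cancel when states are summed, leaving the national identity.
national = (sum(p0.values()) + sum(births.values()) - sum(deaths.values())
            + sum(intl_in.values()) - sum(intl_out.values()))
print(p1, sum(p1.values()) == national)
```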

To generate population projections with this model, the Census Bureau created separate data sets for each of these components. In general, the assumptions concerning the future levels of fertility, mortality, and international migration are consistent with the assumptions developed for the national population projections of the Census Bureau.

Once the data for each component were developed, it was a relatively straightforward process to apply the cohort-component method and produce the projections. For each projection year, the base population for each state was disaggregated into eight race and Hispanic categories (non-Hispanic white; non-Hispanic black; non-Hispanic American Indian, Eskimo, and Aleut; non-Hispanic Asian and Pacific Islander; Hispanic white; Hispanic black; Hispanic American Indian, Eskimo, and Aleut; and Hispanic Asian and Pacific Islander), by sex, and by single year of age (ages 0 to 85+). The next step was to survive each age-sex-race-ethnic group forward 1 year using the pertinent survival rate. The internal redistribution of the population was accomplished by applying the appropriate state-to-state migration rates to the survived population in each state. The projected out-migrants were subtracted from the state of origin and added to the state of destination (as in-migrants). Next, the appropriate number of immigrants from abroad was added to each group. The populations under age 1 were created by applying the appropriate age-race-ethnic-specific birth rates to females of childbearing age. The resulting births by sex and race/ethnicity were survived forward and exposed to the appropriate migration rate to yield the population under age 1. The final results of the projection process were adjusted to be consistent with the national population projections by single years of age, sex, race, and Hispanic origin. The entire process was then repeated for each year of the projection.
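One projection year of the cohort-component steps described above can be sketched for a single population group. The steps are: survive each age forward, create the under-1 population from births, redistribute survivors between states, and add immigrants. This is a simplification under stated assumptions. It uses one group instead of sex-by-race/Hispanic detail, applies a single assumed female share to every age, treats the oldest age as open-ended, and omits the final adjustment to national projections. All names and numbers are hypothetical.

```python
def project_one_year(pop, survival, migration, immigrants, fertility,
                     female_share, infant_survival):
    """
    pop[state][age]        population by single year of age; the last index is open-ended (e.g., 85+)
    survival[age]          probability of surviving one year (last entry applies to the open-ended group)
    migration[o][d]        share of state o's survived population that moves to state d
    immigrants[state][age] international in-migrants added after internal migration
    fertility[age]         births per female during the year
    female_share           assumed female fraction of every age group (simplification)
    infant_survival        share of births surviving to the end of the year
    """
    top = len(survival) - 1
    survived = {}
    for state, p in pop.items():
        nxt = [0.0] * (top + 1)
        for age in range(top):
            nxt[age + 1] += p[age] * survival[age]       # age each cohort forward 1 year
        nxt[top] += p[top] * survival[top]               # open-ended group stays at the top
        births = sum(p[age] * female_share * fertility[age] for age in range(top + 1))
        nxt[0] = births * infant_survival                # population under age 1
        survived[state] = nxt

    projected = {state: list(values) for state, values in survived.items()}
    for origin, destinations in migration.items():
        for dest, rate in destinations.items():
            for age in range(top + 1):
                movers = survived[origin][age] * rate
                projected[origin][age] -= movers          # out-migrants leave the origin
                projected[dest][age] += movers            # and arrive in the destination

    for state, by_age in immigrants.items():
        for age, count in enumerate(by_age):
            projected[state][age] += count
    return projected


# Hypothetical two-state example with three age groups (0, 1, 2+).
result = project_one_year(
    pop={"A": [10.0, 20.0, 30.0], "B": [5.0, 5.0, 5.0]},
    survival=[0.99, 0.995, 0.9],
    migration={"A": {"B": 0.1}},
    immigrants={"B": [0.0, 1.0, 2.0]},
    fertility=[0.0, 0.2, 0.0],
    female_share=0.5,
    infant_survival=0.99,
)
print(result)
```

Repeating the function with its output as the next year's input mirrors the year-by-year loop described in the text.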

More information is available in the Census Bureau Population Paper Listing 47 (PPL-47) and Current Population Report P25-1130. These reports may be obtained from:

Statistical Information Staff
Bureau of the Census
U.S. Department of Commerce
Washington, DC 20233
(301) 457-2422
INTERNET: http://www.census.gov

Other Sources

National Education Association Estimates of School Statistics

The National Education Association (NEA) reports teacher, revenue, and expenditure data in its annual publication, Estimates of School Statistics. Each year, NEA prepares regression-based estimates of financial and other education statistics and submits them to the states for verification. Generally, about 30 states adjust these estimates based on their own data. These preliminary data are published by NEA along with revised data from previous years. States are asked to revise previously submitted data as final figures become available. The most recent publication contains all changes reported to the NEA.

Additional information is available from:

National Education Association--Research
1201 16th Street NW
Washington, DC 20036

WEFA Group

WEFA provides a broad range of services that includes forecasts for over 90 economies; over 2 million time series, including data on 152 countries; and consultation on a wide variety of business and government issues. One service is the Mark 11 Quarterly Macro Model of the U.S. Economy, which contains projections of U.S. economic and financial conditions, including forecasts for the federal government, incomes, population, prices, wages, and state and local government, over a long-term (25-year) forecast period.

Additional information is available from:

WEFA Group
Headquarters
800 Baldwin Tower
Eddystone, PA 19022




1990 K Street, NW
Washington, DC 20006, USA
Phone: (202) 502-7300