## Appendix C Data Sources

Sources and Comparability of Data

The information in this report was obtained from many sources, including federal and state agencies, private research organizations, and professional associations. The data were collected by many methods, including surveys of a universe (such as all colleges) or of a sample, and compilations of administrative records. Care should be used when comparing data from different sources. Differences in procedures, such as timing, phrasing of questions, and interviewer training, mean that the results from the different sources are not strictly comparable. More extensive documentation of one survey's procedures than of another's does not imply more problems with the data, only that more information is available.

Accuracy of Data

The accuracy of any statistic is determined by the joint effects of "sampling" and "nonsampling" errors. Estimates based on a sample will differ from the figures that would have been obtained if a complete census had been taken using the same survey instruments, instructions, and procedures. Besides sampling errors, both universe and sample surveys are subject to errors of design, reporting, and processing, and to errors due to nonresponse. To the extent possible, these nonsampling errors are kept to a minimum by methods built into the survey procedures. In general, however, the effects of nonsampling errors are more difficult to gauge than those produced by sampling variability.

Sampling Errors

The standard error is the primary measure of sampling variability. It provides a specific range, with a stated confidence, within which a given estimate would lie if a complete census had been conducted. The chances that a complete census would differ from the sample by less than the standard error are about 68 out of 100. The chances that the difference would be less than 1.65 times the standard error are about 90 out of 100. The chances that the difference would be less than 1.96 times the standard error are about 95 out of 100. The chances that the difference would be less than 2.58 times the standard error are about 99 out of 100.
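The multipliers above translate directly into interval bounds. The following is a minimal sketch in Python; the function name and the example estimate (40.0 percent with a standard error of 1.5) are hypothetical and are not taken from any table in this report.

```python
# Multipliers and approximate confidence levels (in percent) quoted in the text.
MULTIPLIERS = {68: 1.00, 90: 1.65, 95: 1.96, 99: 2.58}


def confidence_interval(estimate, standard_error, level=95):
    """Return (lower, upper) bounds for the given confidence level in percent."""
    half_width = MULTIPLIERS[level] * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical example: an estimate of 40.0 percent with a standard error of 1.5.
for level in sorted(MULTIPLIERS):
    low, high = confidence_interval(40.0, 1.5, level)
    print(f"{level} percent confidence: {low:.2f} to {high:.2f}")
```

A wider interval buys more confidence: the same estimate has a narrower range at the 68 percent level than at the 99 percent level.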

The standard error can help assess how valid a comparison between two estimates might be. The standard error of a difference between two sample estimates that are uncorrelated is approximately equal to the square root of the sum of the squared standard errors of the estimates. The standard error (se) of the difference between sample estimate "a" and sample estimate "b" is:

$$se_{a-b} = \left(se_a^{2} + se_b^{2}\right)^{1/2}$$
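As an illustration, the formula for the standard error of a difference can be applied as follows. This is a sketch only: the function names and the example figures are hypothetical, and the check against 1.96 standard errors is one common way to use the result, assuming uncorrelated estimates and the 95 percent level described earlier.

```python
import math


def se_difference(se_a, se_b):
    """Approximate standard error of (a - b) for two uncorrelated sample estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def differs_at_95(a, se_a, b, se_b):
    """True if |a - b| exceeds 1.96 standard errors of the difference."""
    return abs(a - b) > 1.96 * se_difference(se_a, se_b)


# Hypothetical estimates: 52.0 (standard error 3.0) versus 45.0 (standard error 4.0).
print(se_difference(3.0, 4.0))
print(differs_at_95(52.0, 3.0, 45.0, 4.0))
```

In this hypothetical case the difference of 7.0 is smaller than 1.96 times the standard error of the difference, so the two estimates would not be judged different at the 95 percent level.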

Note that most of the standard errors in subsequent sections and in the original documents are approximations. That is, to derive estimates of standard errors that would be applicable to a wide variety of items and could be prepared at a moderate cost, a number of approximations were required. As a result, most of the standard errors presented provide a general order of magnitude rather than the exact standard error for any specific item.

Nonsampling Errors

Both universe and sample surveys are subject to nonsampling errors. Nonsampling errors are of two kinds: random and nonrandom. Random nonsampling errors may arise when respondents or interviewers interpret questions differently, when respondents must estimate values, or when coders, keyers, and other processors handle answers differently. Nonrandom nonsampling errors result from total nonresponse (no usable data obtained for a sampled unit), partial or item nonresponse (only a portion of a response may be usable), inability or unwillingness on the part of respondents to provide information, difficulty interpreting questions, mistakes in recording or keying data, errors of collection or processing, and overcoverage or undercoverage of the target universe. Random nonresponse errors usually, but not always, result in an understatement of sampling errors and thus an overstatement of the precision of survey estimates. Because estimating the magnitude of nonsampling errors would require special experiments or access to independent data, these magnitudes are seldom available.

To compensate for suspected nonrandom errors, adjustments of the sample estimates are often made. For example, adjustments are frequently made for nonresponse, both total and partial. Imputations are usually made separately within various groups of sample members that have similar survey characteristics. Imputation for item nonresponse substitutes an acceptable value for missing or inconsistent data in a data set.
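The idea of imputing within groups of similar sample members can be sketched as follows. The specific rule used here (substituting the mean of the reported values in the same group) is an assumption chosen for illustration; the surveys described in this appendix use their own procedures, and the record layout and field names are hypothetical.

```python
from collections import defaultdict


def impute_by_group(records, group_key, value_key):
    """Fill missing (None) values with the mean of reported values in the same group.

    Assumption: group-mean substitution stands in for whatever rule a survey uses.
    Records with no reporting peers in their group are left unfilled.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if rec[value_key] is not None:
            totals[rec[group_key]] += rec[value_key]
            counts[rec[group_key]] += 1

    filled = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records are not modified
        group = new_rec[group_key]
        if new_rec[value_key] is None and counts[group] > 0:
            new_rec[value_key] = totals[group] / counts[group]
            new_rec["imputed"] = True
        else:
            new_rec["imputed"] = False
        filled.append(new_rec)
    return filled


# Hypothetical institutions grouped by level; one enrollment value is missing.
institutions = [
    {"id": 1, "level": "2-year", "enrollment": 1000},
    {"id": 2, "level": "2-year", "enrollment": 3000},
    {"id": 3, "level": "2-year", "enrollment": None},
    {"id": 4, "level": "4-year", "enrollment": 8000},
]
for row in impute_by_group(institutions, "level", "enrollment"):
    print(row)
```

Flagging imputed values, as the sketch does, keeps substituted figures distinguishable from reported ones.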

Although the magnitude of nonsampling errors in the data used in this edition of the Projections of Education Statistics is frequently unknown, idiosyncrasies that have been identified are noted on the appropriate tables.

Federal Agency Sources

National Center for Education Statistics (NCES)

Common Core of Data

NCES uses the Common Core of Data (CCD) survey to acquire and maintain statistical data from each of the 50 states, the District of Columbia, the Bureau of Indian Affairs, Department of Defense Dependents' Schools (overseas) and the outlying areas. Information about staff and students is collected annually at the school, local education agency or school district (LEA), and state levels. Information about revenues and expenditures is also collected at the state and LEA levels.

Data are collected for a particular school year (October 1 through September 30) via survey instruments sent to the state education agencies during the school year. States have 1 year in which to modify the data originally submitted.

Since the CCD is a universe survey, the CCD information presented in this edition of the Projections of Education Statistics is not subject to sampling errors. However, nonsampling errors could come from two sources: nonreturn and inaccurate reporting. Almost all of the states submit the six CCD survey instruments each year, but submissions are sometimes incomplete or too late for publication.

Understandably, when 58 education agencies compile and submit data for approximately 90,000 public schools and 16,000 local school districts, misreporting can occur. Typically, this results from varying interpretations of NCES definitions and differing recordkeeping systems. NCES attempts to minimize these errors by working closely with the state education agencies through the National Forum on Education Statistics.

The state education agencies report data to NCES from data collected and edited in their regular reporting cycles. NCES encourages the agencies to incorporate into their own survey systems the NCES items they do not already collect so that those items will also be available for the subsequent CCD survey. Over time, this has meant fewer missing data cells in each state's response, reducing the need to impute data.

NCES subjects data from the education agencies to a comprehensive edit. Where data are determined to be inconsistent, missing, or out of range, NCES contacts the education agencies for verification. NCES-prepared state summary forms are returned to the state education agencies for verification. States are also given an opportunity to revise their state-level aggregates from the previous survey cycle.

Further information on CCD may be obtained from:

John Sietsema
Elementary/Secondary Cooperative System and Institutional Studies Division (ESCSISD)
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/ccd/

Private School Universe Survey

The purposes of Private School Universe Survey (PSS) data collection activities are to build an accurate and complete list of private schools to serve as a sampling frame for NCES sample surveys of private schools, and to report data on the total number of private schools, teachers, and students in the survey universe. The PSS is conducted every 2 years, with collections in the 1989-90, 1991-92, 1993-94, 1995-96, 1997-98, and 1999-2000 school years. The next survey will be in the 2001-02 school year.

The PSS produces data similar to that of the CCD for the public schools, and can be used for public-private comparisons. The data are useful for a variety of policy and research-relevant issues, such as the growth of religiously affiliated schools, the number of private high school graduates, the length of the school year for various private schools, and the number of private school students and teachers.

The target population for the universe survey consists of all private schools in the United States that meet the NCES criteria for a school (e.g., a private school is an institution that provides instruction for any of grades K through 12, has one or more teachers to give instruction, is not administered by a public agency, and is not operated in a private home). The survey universe is composed of schools identified from a variety of sources. The main source is a list frame, initially developed for the 1989-90 PSS. The list is updated regularly by matching it with lists provided by nationwide private school associations, state departments of education, and other national guides and sources that list private schools. The other source is an area frame search in approximately 120 geographic areas, conducted by the Bureau of the Census.

Further information on PSS may be obtained from:

Steve Broughman
Elementary/Secondary and Libraries Studies Division
Elementary/Secondary Sample Survey Studies Program
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
Stephen.broughman@ed.gov
http://nces.ed.gov/surveys/pss/

Integrated Postsecondary Education Data System

The Integrated Postsecondary Education Data System (IPEDS) surveys approximately 10,000 postsecondary institutions, including universities and colleges, as well as institutions offering technical and vocational education beyond the high school level. This survey, which began in 1986, replaced the Higher Education General Information Survey (HEGIS).

IPEDS consists of several integrated components that obtain information on who provides postsecondary education (institutions), who participates in it and completes it (students), what programs are offered and what programs are completed, and both the human and financial resources involved in the provision of institutionally based postsecondary education. Specifically, these components include: Institutional Characteristics, including instructional activity; Fall Enrollment, including age and residence; Completions; Finance; Staff; Salaries of Full-Time Instructional Faculty; and Graduation Rate.

The degree-granting institutions portion of this survey is a census of colleges awarding associate's or higher degrees and that were eligible to participate in Title IV financial aid programs. Prior to 1993, data from the technical and vocational institutions were collected through a sample survey. Beginning in 1993, all data are gathered in a census of all postsecondary institutions. The tabulations on "Institutional Characteristics" developed for this edition of the Projections of Education Statistics are based on lists of all institutions and are not subject to sampling errors.

The definition of institutions generally thought of as offering college and university education has been changed in recent years. The old standard for higher education institutions included those institutions that had courses that led to an associate degree or higher, or were accepted for credit towards those degrees. The higher education institutions were accredited by an agency or association that was recognized by the U.S. Department of Education or recognized directly by the Secretary of Education. The current category includes institutions which award associate or higher level degrees that are eligible to participate in Title IV federal financial aid programs. Tables that contain any data according to this standard are titled as "degree-granting" institutions. The impact of this change has generally not been large. For example, tables on faculty salaries and benefits were only affected to a very small extent. Also, degrees awarded at the bachelor's level or higher were not heavily affected. Most of the data on public 4-year colleges has been affected only to a minimal extent. The impact on enrollment in public 2-year colleges was noticeable in certain states, but relatively small at the national level. The largest impact has been on private 2-year college enrollment. Overall, enrollment for all institutions was about one-half a percent higher for degree-granting institutions compared to the total for higher education institutions.

Prior to the establishment of IPEDS in 1986, HEGIS acquired and maintained statistical data on the characteristics and operations of institutions of higher education. Implemented in 1966, HEGIS was an annual universe survey of institutions accredited at the college level by an agency recognized by the Secretary of the U.S. Department of Education. These institutions were listed in NCES' Education Directory, Colleges and Universities.

HEGIS surveys solicited information concerning institutional characteristics, faculty salaries, finances, enrollment, and degrees. Since these surveys were distributed to all higher education institutions, the data presented are not subject to sampling error. However, they are subject to nonsampling error, the sources of which varied with the survey instrument. Information concerning the nonsampling error of the enrollment and degrees surveys draws extensively on the HEGIS Post-Survey Validation Study conducted in 1979.

Further information on IPEDS may be obtained from:

Susan Broyles
Postsecondary Institutional Studies Division (PSD)
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
Susan.Broyles@ed.gov
http://nces.ed.gov/ipeds/

Institutional Characteristics. This survey provides the basis for the universe of institutions presented in the Directory of Postsecondary Institutions. The survey collects basic information necessary to classify the institutions, including control, level, and kinds of programs; information on tuition, fees, and room and board charges; and unduplicated full-year enrollment counts and instructional activity. The overall response rate was 96.6 percent for 1998.

Further information may be obtained from:

Patricia Brown
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
Patricia.Brown@ed.gov
http://nces.ed.gov/ipeds/

Fall Enrollment. This survey has been part of the HEGIS and IPEDS series since 1966. The enrollment survey response rate is relatively high. The 1998 overall response rate was 91.8 percent for degree-granting institutions. Major sources of nonsampling error for this survey as identified in the 1979 report were classification problems, the unavailability of needed data, interpretation of definitions, the survey due date, and operational errors. Of these, the classification of students appears to have been the main source of error. Institutions had problems in correctly classifying first-time freshmen and other first-time students for both full-time and part-time categories. These problems occurred most often at 2-year institutions (private and public) and private 4-year institutions. In the 1977-78 HEGIS validation studies, the classification problem led to an estimated overcount of 11,000 full-time students and an undercount of 19,000 part-time students. Although the ratio of error to the grand total was quite small (less than 1 percent), the percentage of errors was as high as 5 percent for detailed student levels and even higher at certain aggregation levels.

Beginning in fall 1986, the survey system was redesigned with the introduction of IPEDS (see above). The survey allows (in alternating years) for the collection of age and residence data.

Further information may be obtained from:

Frank Morgan
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
Frank.Morgan@ed.gov
http://nces.ed.gov/ipeds/

Completions. This survey was part of the HEGIS series throughout its existence. However, the degree classification taxonomy was revised in 1970-71, 1982-83, and 1991-92. Collection of degree data has been maintained through the IPEDS system.

Though information from survey years 1970-71 through 1981-82 is directly comparable, care must be taken if information before or after that period is included in any comparison. The "Degrees-conferred" trend tables arranged by the 1991-92 classification are included in the Projections of Education Statistics to provide consistent data from 1970-71 to the most recent year. Data in this edition on associate and other formal awards below the baccalaureate, by field of study, cannot be made comparable with figures prior to 1982-83. The nonresponse rate did not appear to be a significant source of nonsampling error for this survey. The return rate over the years has been high, with the degree-granting institutions response rate for the 1997-98 survey at 92.3 percent. Because of the high return rate for degree-granting institutions, nonsampling error caused by imputation is also minimal. The overall response rate that includes the non-degree granting institutions was 73.8 percent in 1997-98.

The major sources of nonsampling error for this survey were differences between the NCES program taxonomy and taxonomies used by the colleges, classification of double majors, operational problems, and survey timing. In the 1979 HEGIS validation study, these sources of nonsampling error contributed to an error rate of 0.3 percent overreporting of bachelor's degrees and 1.3 percent overreporting of master's degrees. The differences, however, varied greatly among fields. Over 50 percent of the fields selected for the validation study had no errors identified. Categories of fields that had large differences were business and management, education, engineering, letters, and psychology. It was also shown that differences in proportion to the published figures were less than 1 percent for most of the selected fields that had some errors. Exceptions to these were: master's and Ph.D. programs in labor and industrial relations (20 percent and 8 percent); bachelor's and master's programs in art education (3 percent and 4 percent); bachelor's and Ph.D. programs in business and commerce, and in distributive education (5 percent and 9 percent); master's programs in philosophy (8 percent); and Ph.D. programs in psychology (11 percent).

Further information on IPEDS Completions surveys may be obtained from:

Frank Morgan
Postsecondary Studies Division (PSD)
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
Frank.Morgan@ed.gov
http://nces.ed.gov/ipeds/

Financial Statistics. This survey was part of the HEGIS series and has been continued under the IPEDS system. Changes were made in the financial survey instruments in fiscal years (FY) 1976, 1982, and 1987. The FY 76 survey instrument contained numerous revisions to earlier survey forms and made direct comparisons of line items very difficult. Beginning in FY 82, Pell Grant data were collected in the categories of federal restricted grants and contracts revenues and restricted scholarships and fellowships expenditures. The introduction of IPEDS in the FY 87 survey included several important changes to the survey instrument and data processing procedures. While these changes were significant, considerable effort has been made to present only comparable information on trends in this report and to note inconsistencies. Finance tables for this publication have been adjusted by subtracting the largely duplicative Pell Grant amounts from the later data to maintain comparability with pre-FY 82 data.

Possible sources of nonsampling error in the financial statistics include nonresponse, imputation, and misclassification. The response rate has been about 85 to 90 percent for most of the years reported. The response rate for the FY 97 survey was 95.1 percent for degree-granting institutions.

Two general methods of imputation were used in HEGIS. If the prior year's data were available for a nonresponding institution, these data were inflated using the Higher Education Price Index and adjusted according to changes in enrollments. If no previous year's data were available, current data were used from peer institutions selected for location (state or region), control, level, and enrollment size of institution. In most cases estimates for nonreporting institutions in IPEDS were made using data from peer institutions.
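The two imputation approaches can be sketched as follows. The text says only that prior-year data were inflated using the Higher Education Price Index and adjusted for enrollment changes, and that peer data were used otherwise; the multiplicative form and the use of a simple peer average below are assumptions made for illustration, and all names and figures are hypothetical.

```python
def impute_from_prior_year(prior_value, hepi_prior, hepi_current,
                           enrollment_prior, enrollment_current):
    """Carry a prior-year amount forward for a nonresponding institution.

    Assumption: the price and enrollment adjustments are applied as simple ratios.
    """
    price_factor = hepi_current / hepi_prior
    enrollment_factor = enrollment_current / enrollment_prior
    return prior_value * price_factor * enrollment_factor


def impute_from_peers(peer_values):
    """Fallback when no prior-year data exist.

    Assumption: the peer institutions (already matched on location, control,
    level, and enrollment size) are summarized by a simple average.
    """
    if not peer_values:
        raise ValueError("at least one peer value is required")
    return sum(peer_values) / len(peer_values)


# Hypothetical institution: $10,000,000 last year, prices up 5 percent,
# enrollment up from 5,000 to 5,100.
print(round(impute_from_prior_year(10_000_000, 100.0, 105.0, 5000, 5100)))
print(impute_from_peers([9_500_000, 10_500_000, 11_000_000]))
```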

Beginning with FY 87, the IPEDS survey system included all postsecondary institutions, but maintained comparability with earlier surveys by allowing 2- and 4-year institutions to be tabulated separately. For FY 87 through FY 91, in order to maintain comparability with the historical time series of HEGIS institutions, data were combined from two of the three different survey forms that make up the IPEDS survey system. The vast majority of the data were tabulated from Form 1, which was used to collect information from public and private nonprofit 2- and 4-year colleges. Form 2, a condensed form, was used to gather data for the 2-year proprietary institutions. Because of the differences in the data requested on the two forms, several assumptions were made about the Form 2 reports so that their figures could be included in the institutions of higher education totals.

In IPEDS, the Form 2 institutions were not asked to separate appropriations from grants and contracts, nor state from local sources of funding. For the Form 2 institutions, all the federal revenues were assumed to be federal grants and contracts and all of the state and local revenues were assumed to be restricted state grants and contracts. All other Form 2 sources of revenue, except for tuition and fees and sales and services of educational activities, were included under "other." Similar adjustments were made to the expenditure accounts. The Form 2 institutions reported instruction and scholarship and fellowship expenditures only. All other educational and general expenditures were allocated to academic support.
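The revenue assumptions for Form 2 institutions amount to a simple recoding rule, sketched below. The category keys are hypothetical labels invented for this example, not the actual item names on either survey form.

```python
# Sources that keep their own category (hypothetical key names).
PASSTHROUGH = {"tuition_and_fees", "sales_and_services_of_educational_activities"}


def map_form2_revenues(form2):
    """Recode a Form 2 revenue report (source -> amount) into the combined categories."""
    mapped = {}
    for source, amount in form2.items():
        if source == "federal":
            target = "federal_grants_and_contracts"
        elif source == "state_and_local":
            target = "restricted_state_grants_and_contracts"
        elif source in PASSTHROUGH:
            target = source
        else:
            target = "other"  # every remaining source is pooled
        mapped[target] = mapped.get(target, 0) + amount
    return mapped


# Hypothetical Form 2 report, in thousands of dollars.
report = {
    "federal": 100,
    "state_and_local": 50,
    "tuition_and_fees": 400,
    "private_gifts": 25,
    "investment_income": 5,
}
print(map_form2_revenues(report))
```

The recoding changes only how amounts are labeled; the total revenue for each institution is unchanged.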

To reduce reporting error, NCES uses national standards for reporting finance statistics. These standards are contained in College and University Business Administration: Administrative Services (1974 Edition), and the Financial Accounting and Reporting Manual for Higher Education (1990 Edition), published by the National Association of College and University Business Officers; Audits of Colleges and Universities (as amended August 31, 1974), by the American Institute of Certified Public Accountants; and HEGIS Financial Reporting Guide (1980), by NCES. Wherever possible, definitions and formats in the survey form are consistent with those in these four accounting texts.

Further information on IPEDS Financial Statistics surveys may be obtained from:

Postsecondary Institutional Studies Program (PSD)
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/ipeds/

Bureau of the Census

Current Population Survey

Current estimates of school enrollment rates, as well as social and economic characteristics of students, are based on data collected in the Census Bureau's monthly household survey of about 50,000 dwelling units. The monthly Current Population Survey (CPS) sample consists of 729 areas comprising 1,973 counties, independent cities, and minor civil divisions throughout the 50 states and the District of Columbia. The samples are initially selected based on the decennial census files and are periodically updated to reflect new housing construction.

The monthly CPS deals primarily with labor force data for the civilian noninstitutional population (i.e., excluding military personnel and their families living on post and inmates of institutions). In addition, in October of each year, supplemental questions are asked about highest grade completed, level and grade of current enrollment, attendance status, number and type of courses, degree or certificate objective, and type of organization offering instruction for each member of the household. In March of each year, supplemental questions on income are asked. The responses to these questions are combined with answers to two questions on educational attainment: highest grade of school ever attended, and whether that grade was completed.

The estimation procedure employed for monthly CPS data involves inflating weighted sample results to independent estimates of characteristics of the civilian noninstitutional population in the United States by age, sex, and race. These independent estimates are based on statistics from decennial censuses; statistics on births, deaths, immigration, and emigration; and statistics on the population in the armed services. Generalized standard error tables are provided in the Current Population Reports. The data are subject to both nonsampling and sampling errors.
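The idea of inflating weighted sample results to independent population estimates can be sketched as a single ratio adjustment within age-sex-race cells. This is a deliberately simplified illustration and should not be read as the Census Bureau's actual estimation procedure; the cell labels and all figures are hypothetical.

```python
def ratio_adjust(weighted_population, control_totals, weighted_estimate):
    """Scale a weighted estimate in each cell so the cell's weighted population
    matches the independent population estimate for that cell."""
    adjusted = {}
    for cell, sample_population in weighted_population.items():
        factor = control_totals[cell] / sample_population
        adjusted[cell] = weighted_estimate[cell] * factor
    return adjusted


# Hypothetical cells keyed by (age group, sex); figures in thousands.
weighted_population = {("16-17", "F"): 800.0, ("16-17", "M"): 1000.0}
control_totals = {("16-17", "F"): 1000.0, ("16-17", "M"): 1000.0}
weighted_enrolled = {("16-17", "F"): 600.0, ("16-17", "M"): 900.0}

print(ratio_adjust(weighted_population, control_totals, weighted_enrolled))
```

In the hypothetical first cell the sample underrepresents the population, so the enrollment estimate is scaled up by the same factor of 1.25.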

Further information on CPS may be obtained from:

Education and Social Stratification Branch
Population Division
Bureau of the Census
U.S. Department of Commerce
Washington, DC 20233
http://www.bls.census.gov/cps/cpsmain.htm

School Enrollment. Each October, the Current Population Survey (CPS) includes supplemental questions on the enrollment status of the population 3 years old and over, in addition to the monthly basic survey on labor force participation. The main sources of nonsampling variability in the responses to the supplement are those inherent in the survey instrument. The question of current enrollment may not be answered accurately for various reasons. Some respondents may not know current grade information for every student in the household, a problem especially for households with members in college or in nursery school. Confusion over college credits or hours taken by a student may make it difficult to determine the year in which the student is enrolled. Problems may occur with the definition of nursery school (a group or class organized to provide educational experiences for children), where respondents' interpretations of "educational experiences" vary.

The 1997 CPS sample was selected from the 1990 Decennial Census files with coverage in all 50 states and the District of Columbia. The sample is continually updated to account for new residential construction. The United States was divided into 2,007 geographic areas. In most states, a geographic area consists of a county or several contiguous counties. In some areas of New England and Hawaii, minor civil divisions are used instead of counties. A total of 754 geographic areas were selected for the sample. About 50,000 occupied households are eligible for interview every month. Interviewers are unable to obtain interviews at about 3,200 of these units. This occurs when the occupants are not found at home after repeated calls or are unavailable for some other reason. For the October 1997 basic CPS, the nonresponse rate was 6.3 percent and for the school enrollment supplement the nonresponse rate was an additional 4.7 percent for a total school supplement nonresponse rate of 10.7 percent.
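The total of 10.7 percent is smaller than the simple sum of 6.3 and 4.7. One reading that reproduces the published figure, offered here as an assumption rather than as the documented calculation, is that the supplement nonresponse rate applies only to households that responded to the basic survey:

```python
def combined_nonresponse(basic_rate, supplement_rate):
    """Total nonresponse when the supplement rate applies only to basic-survey respondents."""
    return 1 - (1 - basic_rate) * (1 - supplement_rate)


# Rates quoted in the text for October 1997.
total = combined_nonresponse(0.063, 0.047)
print(f"{total:.1%}")  # prints 10.7%
```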

Further information on CPS methodology may be obtained from:

Further information on CPS "School Enrollment" may be obtained from:

Education and Social Stratification Branch
Bureau of the Census
U.S. Department of Commerce
Washington, DC 20233
http://www.census.gov/population/www/socdemo/school.html

State population projections. These state population projections were prepared using a cohort-component method by which each component of population change (births, deaths, state-to-state migration flows, international in-migration, and international out-migration) was projected separately for each birth cohort by sex, race, and Hispanic origin. The basic framework was the same as in past Census Bureau projections.

Detailed components necessary to create the projections were obtained from vital statistics, administrative records, census data, and national projections.

The cohort-component method is based on the traditional demographic accounting system:

P1 = P0 + B - D + DIM - DOM + IIM - IOM

where:

P1 = population at the end of the period

P0 = population at the beginning of the period

B = births during the period

D = deaths during the period

DIM = domestic in-migration during the period

DOM = domestic out-migration during the period

IIM = international in-migration during the period

IOM = international out-migration during the period
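The accounting identity can be written as a one-line function. The sketch below uses hypothetical figures for a single state and a single period; the parameter names simply spell out the abbreviations defined above.

```python
def end_population(p0, births, deaths, domestic_in, domestic_out,
                   international_in, international_out):
    """P1 = P0 + B - D + DIM - DOM + IIM - IOM"""
    return (p0 + births - deaths
            + domestic_in - domestic_out
            + international_in - international_out)


# Hypothetical state with 1,000,000 residents at the start of the period.
p1 = end_population(
    p0=1_000_000,
    births=14_000,
    deaths=9_000,
    domestic_in=30_000,
    domestic_out=25_000,
    international_in=6_000,
    international_out=1_000,
)
print(p1)  # prints 1015000
```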

To generate population projections with this model, the Census Bureau created separate data sets for each of these components. In general, the assumptions concerning the future levels of fertility, mortality, and international migration are consistent with the assumptions developed for the national population projections of the Census Bureau.

Once the data for each component were developed, it was a relatively straightforward process to apply the cohort-component method and produce the projections. For each projection year the base population for each state was disaggregated into eight race and Hispanic categories (non-Hispanic white; non-Hispanic black; non-Hispanic American Indian, Eskimo, and Aleut; non-Hispanic Asian and Pacific Islander; Hispanic white; Hispanic black; Hispanic American Indian, Eskimo, and Aleut; and Hispanic Asian and Pacific Islander), by sex, and single year of age (ages 0 to 85+). The next step was to survive each age-sex-race-ethnic group forward 1 year using the pertinent survival rate. The internal redistribution of the population was accomplished by applying the appropriate state-to-state migration rates to the survived population in each state. The projected out-migrants were subtracted from the state of origin and added to the state of destination (as in-migrants). Next, the appropriate number of immigrants from abroad were added to each group. The populations under age 1 were created by applying the appropriate age-race-ethnic-specific birth rates to females of childbearing age. The number of births by sex and race/ethnicity were survived forward and exposed to the appropriate migration rate to yield the population under age 1. The final results of the projection process were adjusted to be consistent with the national population projections by single years of age, sex, race, and Hispanic origin. The entire process was then repeated for each year of the projection.
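The sequence of steps described above can be sketched for one projection year. This is a heavily simplified illustration and not the Census Bureau's implementation: it tracks a single population group rather than separate sex, race, and Hispanic-origin groups, applies birth rates to the whole group rather than to females of childbearing age, and omits the final adjustment to national totals. All names, rates, and figures are hypothetical.

```python
def project_one_year(pop, survival, out_rates, immigrants, birth_rates, infant_survival):
    """Advance state populations by one year.

    pop[state]              list of population by single year of age (last age is open-ended)
    survival[age]           probability of surviving one more year
    out_rates[origin][dest] share of the origin state's survivors moving to dest
    immigrants[state]       list of arrivals from abroad by age
    birth_rates[age]        births per person of that age (simplification)
    infant_survival         share of births surviving to the end of the year
    """
    states = list(pop)
    n_ages = len(survival)
    last = n_ages - 1

    def migrate(by_state):
        # Subtract movers from the state of origin and add them to the destination.
        result = {s: list(by_state[s]) for s in states}
        for origin in states:
            for dest, rate in out_rates[origin].items():
                for age, people in enumerate(by_state[origin]):
                    movers = people * rate
                    result[origin][age] -= movers
                    result[dest][age] += movers
        return result

    # Step 1: survive each age group forward one year.
    survived = {}
    for s in states:
        aged = [0.0] * n_ages
        for age, people in enumerate(pop[s]):
            aged[min(age + 1, last)] += people * survival[age]
        survived[s] = aged

    # Step 2: redistribute survivors among states.
    projected = migrate(survived)

    # Step 3: add immigrants from abroad.
    for s in states:
        for age in range(n_ages):
            projected[s][age] += immigrants[s][age]

    # Step 4: create the population under age 1 from births, survived and migrated.
    births = {}
    for s in states:
        born = sum(p * r for p, r in zip(pop[s], birth_rates))
        births[s] = [born * infant_survival] + [0.0] * last
    infants = migrate(births)
    for s in states:
        projected[s][0] += infants[s][0]

    return projected


# Hypothetical two-state example with three age groups (0, 1, and 2+).
result = project_one_year(
    pop={"A": [100.0, 100.0, 100.0], "B": [0.0, 0.0, 0.0]},
    survival=[1.0, 1.0, 0.5],
    out_rates={"A": {"B": 0.1}, "B": {}},
    immigrants={"A": [0.0, 0.0, 0.0], "B": [0.0, 0.0, 0.0]},
    birth_rates=[0.0, 0.1, 0.0],
    infant_survival=1.0,
)
print(result)
```

Repeating the call with each year's output as the next year's input mirrors the statement that the process was repeated for each year of the projection.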

More information is available in the Census Bureau Population Paper Listing 47 (PPL-47) and Current Population Report P25-1130. These reports may be obtained from:

Statistical Information Staff

Bureau of the Census
U.S. Department of Commerce
Washington, DC 20233
(301) 457-2422
INTERNET: http://www.census.gov

National population projections. The method used to produce projections of the United States population for future reference dates from a current base population reflects three fundamental principles. First, the projections are demographic. Future populations are derived from a base population through the projection of population change by its major demographic components: births, deaths, and migration. Second, the projection of the demographic components of change is driven by the composition of the population by age, sex, race, Hispanic origin, and nativity, and the way these variables determine the propensity to bear children, die, and migrate to or from the United States. Third, the definition of the population with respect to who is included and the characteristics of included people remains the same throughout the projection period. We refer to these definitions collectively throughout the work as the "population universe." This concept embraces such issues as the inclusion or exclusion of people uncounted by a census, the rule defining residency in the United States, and the way we classify people by age, race, and Hispanic origin.

For more information, see "Methodology and Assumptions for the Population Projections of the United States: 1999 to 2100," Population Division Working Paper No. 38. This report is available on the INTERNET at http://www.census.gov.

Other Sources

National Education Association

Estimates of School Statistics

The National Education Association (NEA) reports teacher, revenue, and expenditure data in its annual publication, Estimates of School Statistics. Each year, NEA prepares regression-based estimates of financial and other education statistics and submits them to the states for verification. Generally, about 30 states adjust these estimates based on their own data. These preliminary data are published by NEA along with revised data from previous years. States are asked to revise previously submitted data as final figures become available. The most recent publication contains all changes reported to the NEA.

National Education Association-Research
1201 16th Street NW
Washington, DC 20036
http://www.nea.org

DRI-WEFA, Inc.

DRI-WEFA, Inc. provides an information system that includes more than 125 databases; simulation and planning models; regular publications and special studies; data retrieval and management systems; and access to experts on economic, financial, industrial, and market activities. One service is the DRI U.S. Annual Model Forecast Data Bank, which contains annual projections of U.S. economic and financial conditions, including forecasts for the federal government, incomes, population, prices and wages, and state and local government, over a long-term (10- to 25-year) forecast period.