NCES 2011-017
March 2011

## Data Sources

### Sources and Comparability of Data

The information in this report was obtained from many sources, including federal and state agencies, private research organizations, and professional associations. The data were collected by many methods, including surveys of a universe (such as all colleges) or of a sample, and compilations of administrative records. Care should be used when comparing data from different sources. Differences in procedures, such as timing, phrasing of questions, and interviewer training, mean that the results from the different sources are not strictly comparable. More extensive documentation of one survey’s procedures than of another’s does not imply more problems with the data, only that more information is available on the survey.

### Accuracy of Data

The accuracy of any statistic is determined by the joint effects of “sampling” and “nonsampling” errors. Estimates based on a sample will differ from the figures that would have been obtained if a complete census had been taken using the same survey instruments, instructions, and procedures. Besides sampling errors, both universe and sample surveys are subject to errors of design, reporting, and processing, and errors due to nonresponse. To the extent possible, these nonsampling errors are kept to a minimum by methods built into the survey procedures. In general, however, the effects of nonsampling errors are more difficult to gauge than those produced by sampling variability.

### Sampling Errors

The standard error is the primary measure of the sampling variability of an estimate. Standard errors can be used to produce confidence intervals. For example, from table A-10, an estimated 91.8 percent of public school teachers reported that they worked full time in 2007–08. This figure has an estimated standard error of 0.29 percentage points. Therefore, the estimated 95 percent confidence interval for this statistic is approximately 91.2 to 92.4 percent (91.8 ± 1.96 × 0.29). That is, if the processes of selecting a sample, collecting the data, and constructing the confidence interval were repeated, it would be expected that in 95 out of 100 samples from the same population, the confidence interval would contain the true full-time working rate.
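The interval arithmetic in this example can be reproduced in a few lines. The sketch below is illustrative only: the function name `confidence_interval` and the default critical value of 1.96 (the normal approximation for a 95 percent interval) are assumptions made for the example, not part of any NCES tool.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) bounds of a normal-approximation confidence interval."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Figures quoted in the text: 91.8 percent with a standard error of 0.29.
lower, upper = confidence_interval(91.8, 0.29)
print(f"{lower:.1f} to {upper:.1f} percent")
```

Passing a different `z` (for example 1.645) would give a 90 percent interval under the same normal approximation.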

Analysis of standard errors can help assess how valid a comparison between two estimates might be. The standard error of a difference between two independent sample estimates is equal to the square root of the sum of the squared standard errors of the estimates. The standard error (se) of the difference between independent sample estimates a and b is

$$se_{a-b} = \left(se_a^2 + se_b^2\right)^{1/2}$$
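As a worked illustration of this formula, the following sketch computes the standard error of a difference and applies it in a simple comparison. The function names, the input values, and the 1.96 threshold are assumptions chosen for the example; the source states only the formula itself.

```python
import math


def se_of_difference(se_a, se_b):
    """Standard error of (a - b) for two independent sample estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def differs_significantly(a, se_a, b, se_b, z=1.96):
    """True if |a - b| exceeds z standard errors of the difference."""
    return abs(a - b) > z * se_of_difference(se_a, se_b)


# Hypothetical estimates, not taken from any table in the report.
print(se_of_difference(0.3, 0.4))
print(differs_significantly(50.0, 0.3, 51.5, 0.4))
```

The formula applies only to independent estimates; estimates drawn from overlapping samples would also need a covariance term, which this sketch does not handle.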

Note that most of the standard errors in the original documents are approximations. That is, to derive estimates of standard errors that would be applicable to a wide variety of items and could be prepared at a moderate cost, a number of approximations were required. As a result, most of the standard errors presented provide a general order of magnitude rather than the exact standard error for any specific item.

### Nonsampling Errors

Both universe and sample surveys are subject to nonsampling errors. Nonsampling errors are of two kinds—random and nonrandom. Random nonsampling errors may arise when respondents or interviewers interpret questions differently, when respondents must estimate values, or when coders, keyers, and other processors handle answers differently. Nonrandom nonsampling errors result from total nonresponse (no usable data obtained for a sampled unit), partial or item nonresponse (only a portion of a response may be usable), inability or unwillingness on the part of respondents to provide information, difficulty interpreting questions, mistakes in recording or keying data, errors of collection or processing, and overcoverage or undercoverage of the target universe. Random nonresponse errors usually, but not always, result in an understatement of sampling errors and thus an overstatement of the precision of survey estimates. Because estimating the magnitude of nonsampling errors would require special experiments or access to independent data, these magnitudes are seldom available.

To compensate for suspected nonrandom errors, adjustments of the sample estimates are often made. For example, adjustments are frequently made for nonresponse, both total and partial. Imputations are usually made separately within various groups of sample members that have similar survey characteristics. Imputation for item nonresponse is usually made by substituting for a missing item the response to that item of a respondent having characteristics similar to those of the respondent.
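The donor-based imputation described above is often called hot-deck imputation. A minimal sketch follows, assuming records are dictionaries, a missing item is stored as `None`, and the donor is drawn at random from respondents in the same group. The function name, field names, and random donor selection are illustrative assumptions and do not describe the procedure of any particular survey.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, item, group_keys, seed=0):
    """Fill missing `item` values with a donor's value from the same group.

    records    : list of dicts; a missing item is represented by None
    item       : name of the field to impute
    group_keys : fields that define groups of similar respondents
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    donors = defaultdict(list)
    for rec in records:
        if rec[item] is not None:
            donors[tuple(rec[k] for k in group_keys)].append(rec[item])

    imputed = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[item] is None:
            pool = donors.get(tuple(rec[k] for k in group_keys))
            if pool:  # leave the gap if no similar respondent exists
                new_rec[item] = rng.choice(pool)
                new_rec["imputed"] = True
        imputed.append(new_rec)
    return imputed
```

Flagging imputed values, as the hypothetical `imputed` key does here, keeps reported and substituted responses distinguishable in later analysis.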

Although the magnitude of nonsampling errors in the data used in Projections of Education Statistics is frequently unknown, idiosyncrasies that have been identified are noted on the appropriate tables.

### Federal Agency Sources

#### National Center for Education Statistics (NCES)

### Common Core of Data

NCES uses the Common Core of Data (CCD) to acquire and maintain statistical data from each of the 50 states, the District of Columbia, the Bureau of Indian Education, Department of Defense Dependents’ Schools (overseas), and the outlying areas (American Samoa, Guam, Northern Marianas, Puerto Rico, and U.S. Virgin Islands). Information about staff and students is collected annually at the school, local education agency (LEA) or school district, and state levels. Information about revenues and expenditures is also collected at the state and LEA levels.

Data are collected for a particular school year via an on-line reporting system open to state education agencies during the school year. Beginning with the 2006–07 school year, nonfiscal CCD data are collected through the Department of Education’s Education Data Exchange Network (EDEN). Since the CCD is a universe collection, CCD data are not subject to sampling errors. However, nonsampling errors could come from two sources: nonresponse and inaccurate reporting. Almost all of the states submit the five CCD survey instruments each year, but submissions are sometimes incomplete.

Misreporting can occur when 58 education agencies compile and submit data for approximately 97,000 public schools and over 17,000 local education agencies. Typically, this results from varying interpretations of NCES definitions and differing record-keeping systems. NCES attempts to minimize these errors by working closely with the state education agencies through the National Forum on Education Statistics.

The state education agencies report data to NCES from data collected and edited in their regular reporting cycles. NCES encourages the agencies to incorporate into their own survey systems the NCES items they do not already collect so that these items will also be available for the subsequent CCD survey. Over time, this has meant fewer missing data cells in each state’s response, reducing the need to impute data.

NCES subjects data from the state education agencies to a comprehensive edit. Where data are determined to be inconsistent, missing, or out of range, NCES contacts the agencies for verification. NCES-prepared state summary forms are returned to the agencies for verification. Each year, states are also given an opportunity to revise their state-level aggregates from the previous survey cycle.
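The three kinds of problems named above (missing, out of range, inconsistent) can be illustrated with a small edit check. This is a sketch under stated assumptions: the field names, the valid ranges, and the rule that detail counts must sum to the total are hypothetical and are not the actual CCD edit specifications.

```python
def edit_check(record, valid_ranges):
    """Return a list of (field, problem) flags for one agency's submission.

    record       : dict of reported values; None marks a missing item
    valid_ranges : dict mapping field name -> (minimum, maximum)
    """
    flags = []
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if value is None:
            flags.append((field, "missing"))
        elif not low <= value <= high:
            flags.append((field, "out of range"))

    # Hypothetical consistency rule: detail counts should sum to the total.
    parts = [record.get("male_students"), record.get("female_students")]
    total = record.get("total_students")
    if None not in parts and total is not None and sum(parts) != total:
        flags.append(("total_students", "inconsistent"))
    return flags
```

Each flagged item would then be a candidate for follow-up with the reporting agency rather than for automatic correction.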

Further information on the nonfiscal CCD may be obtained from

Chen-Su Chen
Elementary/Secondary and Libraries Studies Division
Elementary/Secondary Cooperative System and Institutional Studies Program
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/ccd/

Further information on the fiscal CCD data may be obtained from

Frank H. Johnson
Elementary/Secondary and Libraries Studies Division
Elementary/Secondary Cooperative System and Institutional Studies Program
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/ccd/

### Private School Universe Survey

The purposes of Private School Universe Survey (PSS) data collection activities are to build an accurate and complete list of private schools to serve as a sampling frame for NCES sample surveys of private schools, and to report data on the total number of private schools, teachers, and students in the survey universe. The PSS is conducted every 2 years, with collections in the 1989–90, 1991–92, 1993–94, 1995–96, 1997–98, 1999–2000, 2001–02, 2003–04, 2005–06, and 2007–08 school years.

The PSS produces data similar to that of the CCD for public schools and can be used for public-private comparisons. The data are useful for a variety of policy and research-relevant issues, such as the growth of religiously affiliated schools, the number of private high school graduates, the length of the school year for various private schools, and the number of private school students and teachers.

The target population for this universe survey is all private schools in the United States that meet the NCES criteria of a school (i.e., a private school is an institution that provides instruction for any of grades K through 12, has one or more teachers to give instruction, is not administered by a public agency, and is not operated in a private home). The survey universe is composed of schools identified from a variety of sources. The main source is a list frame, initially developed for the 1989–90 PSS. The list is updated regularly, matching it with lists provided by nationwide private school associations, state departments of education, and other national guides and sources that list private schools. The other source is an area frame search in approximately 124 geographic areas, conducted by the U.S. Census Bureau.

Further information on the PSS may be obtained from

Steve Broughman
Elementary/Secondary and Libraries Studies Division
Elementary/Secondary Sample Survey Studies Program
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/surveys/pss/

### Integrated Postsecondary Education Data System

The Integrated Postsecondary Education Data System (IPEDS) surveys approximately 6,700 postsecondary institutions, including universities and colleges, as well as institutions offering technical and vocational education beyond the high school level. IPEDS, which began in 1986, replaced the Higher Education General Information Survey (HEGIS).

IPEDS consists of nine integrated components that obtain information on who provides postsecondary education (institutions), who participates in it and completes it (students), what programs are offered and what programs are completed, and both the human and financial resources involved in the provision of institutionally-based postsecondary education. Until 2000 these components included: institutional characteristics, fall enrollment, completions, salaries, finance, and fall staff. Since 2000, data are collected in the fall for institutional characteristics and completions; in the winter for employees by assigned position (EAP), salaries, and fall staff; and in the spring for enrollment, student financial aid, finances, and graduation rates. With the winter 2005–06 survey the employees by assigned position, fall staff, and salaries components were merged into the human resources component. In 2007–08, the enrollment component was broken into two separate components: 12-month enrollment (collected in the fall) and fall enrollment (collected in the spring). The Graduation rates 200 percent survey is new to the Spring 2010 collection. Data are collected for the number of students who completed their program within 200 percent of the normal time period. This survey was developed to fulfill requirements in the Higher Education Opportunity Act of 2008.

The degree-granting institutions portion of IPEDS is a census of colleges that award associate’s or higher degrees and are eligible to participate in Title IV financial aid programs. Prior to 1993, data from technical and vocational institutions were collected through a sample survey. Beginning in 1993, all data were gathered in a census of all postsecondary institutions. The IPEDS tabulations developed for this edition of Projections of Education Statistics are based on lists of all institutions and are not subject to sampling errors.

The definition of institutions generally thought of as offering college and university education has changed in recent years. The old standard for higher education institutions included those institutions that had courses leading to an associate degree or higher, or that had courses accepted for credit toward those degrees. The higher education institutions were accredited by an agency or association that was recognized by the U.S. Department of Education, or were recognized directly by the Secretary of Education. The current category includes institutions that award associate or higher level degrees and that are eligible to participate in Title IV federal financial aid programs. The impact of this change has generally not been large. For example, tables on degrees awarded at the bachelor’s level or higher were not heavily affected. Most of the data on public 4-year colleges have been affected only to a minimal extent. The impact on enrollment in public 2-year colleges was noticeable in certain states, but relatively small at the national level. The largest impact has been on private 2-year college enrollment. Overall, total enrollment for all institutions was about one-half of a percent higher for degree-granting institutions than for higher education institutions.

Prior to the establishment of IPEDS in 1986, HEGIS acquired and maintained statistical data on the characteristics and operations of institutions of higher education. Implemented in 1966, HEGIS was an annual universe survey of institutions accredited at the college level by an agency recognized by the Secretary of the U.S. Department of Education. These institutions were listed in the NCES publication Education Directory, Colleges and Universities.

HEGIS surveys collected information concerning institutional characteristics, faculty salaries, finances, enrollment, and degrees. Since these surveys were distributed to all higher education institutions, the data presented are not subject to sampling error. However, they are subject to nonsampling error, the sources of which varied with the survey instrument. Information concerning the nonsampling error of the HEGIS enrollment and degrees surveys can be obtained from the HEGIS Post Survey Validation Study conducted in 1979.

Further information may be obtained from

Elise Miller
Postsecondary Studies Division
Postsecondary Institutional Studies Program
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/ipeds/

#### Fall (Institutional Characteristics)

This survey collects the basic information necessary to classify institutions, including control, level, and types of programs offered, as well as information on tuition, fees, and room and board charges. Beginning in 2000, the survey collected institutional pricing data from institutions with first-time, full-time, degree/certificate-seeking undergraduate students. Unduplicated full-year enrollment headcounts and instructional activity are now collected in a separate component (12-month Enrollment), part of the fall collection. The overall response rate was almost 100 percent for Title IV degree-granting institutions in reporting fall 2008 data.

Further information may be obtained from

Tara Lawley
Postsecondary Studies Division
Postsecondary Institutional Studies Program
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/ipeds/

#### Spring (Fall Enrollment)

This survey has been part of the HEGIS and IPEDS series since 1966. Response rates for this survey have been relatively high, generally exceeding 85 percent. Beginning in 2000, with web-based data collection, higher response rates were attained. For fall 2008, the overall response rate was 99.9 percent for degree-granting institutions. The response rate was 99.9 percent for 4-year private not-for-profit institutions and 99.8 percent for 4-year public institutions; 4-year private for-profit, 2-year public, 2-year private not-for-profit, and 2-year private for-profit institutions all had response rates of 100.0 percent. Imputation methods and the response bias analysis for the 2007–08 survey are discussed in Knapp, Kelly-Reid, and Ginder (2010).

Public institutions made the majority of changes to enrollment data during the 2004 revision period (Jackson et al. 2005). The majority of changes were made to unduplicated headcount data, with the net differences between the original data and the revised data at about 1 percent. Part-time students in general and enrollment in private not-for-profit institutions were often underestimated. The fewest changes by institutions were to Classification of Instructional Programs (CIP) code data.

Further information about the Spring (Fall Enrollment) survey may be obtained from

Jessica Shedd
Postsecondary Studies Division
Postsecondary Institutional Studies Program
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/ipeds/

#### Fall (Completions)

This survey was part of the HEGIS series throughout its existence. Collection of degree data has been maintained through IPEDS. However, the degree classification taxonomy was revised in 1970–71, 1982–83, 1991–92, and 2002–03.

Nonresponse does not appear to be a significant source of nonsampling error for this survey. The response rate over the years has been high, with the overall response rate for 2008 at 100 percent for degree-granting institutions. The response rate was 99.9 percent for 4-year private not-for-profit institutions and 100 percent for all other types of institutions. Because of the high response rate for degree-granting institutions, nonsampling error caused by imputation is also minimal. Imputation methods and the response bias analysis for the fall 2008 survey are discussed in Knapp, Kelly-Reid, and Ginder (2009).

Most Title IV institutions supplying revised data on completions in 2003–04 were able to supply missing data for the prior year (Jackson et al. 2005). The size of the differences between imputed data for the prior year and the revised actual data supplied by the institution indicated that the imputed values produced by NCES were acceptable.

Further information on the IPEDS Completions surveys may be obtained from

Michelle Coon
Postsecondary Studies Division
Postsecondary Institutional Studies Program
National Center for Education Statistics
1990 K Street NW
Washington, DC 20006
http://nces.ed.gov/ipeds/

### Census Bureau

#### Current Population Survey

Prior to July 2001, estimates of school enrollment rates, as well as social and economic characteristics of students, were based on data collected in the Census Bureau’s monthly household survey of about 50,000 dwelling units. Beginning in July 2001, this sample was expanded to 60,000 dwelling units. The monthly Current Population Survey (CPS) sample consists of 754 areas comprising 2,007 geographic areas, independent cities, and minor civil divisions throughout the 50 states and the District of Columbia. The samples are initially selected based on the decennial census files and are periodically updated to reflect new housing construction.

The monthly CPS deals primarily with labor force data for the civilian noninstitutional population (i.e., excluding military personnel and their families living on post and inmates of institutions). In addition, in October of each year, supplemental questions are asked about highest grade completed, level and grade of current enrollment, attendance status, number and type of courses, degree or certificate objective, and type of organization offering instruction for each member of the household. In March of each year, supplemental questions on income are asked. The responses to these questions are combined with answers to two questions on educational attainment: highest grade of school ever attended and whether that grade was completed.

The estimation procedure employed for monthly CPS data involves inflating weighted sample results to independent estimates of characteristics of the civilian noninstitutional population in the United States by age, sex, and race. These independent estimates are based on statistics from decennial censuses; statistics on births, deaths, immigration, and emigration; and statistics on the population in the armed services. Generalized standard error tables are provided in the Current Population Reports, and methods for deriving standard errors can be found within the CPS technical documentation at http://www.census.gov/apsd/techdoc/cps/cps-main.html. The CPS data are subject to both nonsampling and sampling errors.
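Inflating weighted sample results to independent population estimates is, at its simplest, a ratio adjustment within demographic cells. The sketch below shows that single step under simplifying assumptions: the function name, the cell labels, and the one-pass adjustment are illustrative, whereas the actual CPS weighting involves several stages that are documented in the CPS technical documentation.

```python
from collections import defaultdict


def ratio_adjust(sample, controls):
    """Scale weights so each cell's weighted total equals its control total.

    sample   : list of (cell, weight) pairs; a cell might be (age, sex, race)
    controls : dict mapping cell -> independent population estimate
    """
    weighted_totals = defaultdict(float)
    for cell, weight in sample:
        weighted_totals[cell] += weight
    return [
        (cell, weight * controls[cell] / weighted_totals[cell])
        for cell, weight in sample
    ]


# Hypothetical weights and control totals for two cells.
sample = [("a", 100.0), ("a", 300.0), ("b", 50.0)]
controls = {"a": 800.0, "b": 60.0}
print(ratio_adjust(sample, controls))
```

Because every weight in a cell is scaled by the same factor, a change in the control totals shifts estimated levels directly while leaving within-cell distributions unchanged.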

Caution should also be used when comparing data between Census years. With the release of the January 2003 CPS data, population controls that reflect the results of Census 2000 were used in the monthly CPS estimation process. The new controls increased the size of the civilian noninstitutional population by about 3.5 million in May 2002. This adjustment usually occurs 3 to 4 years after the census, and, if the adjustment is substantial, historical data will be revised. Data from January 2000 through December 2002 were revised to reflect these new controls. Over and above these revisions, the U.S. Census Bureau introduced another large upward adjustment to the controls as part of its annual update of population estimates for 2003. The prior change in population controls occurred in March 1993: data after this date were based on the 1990 census-based population controls, and data before this date were based on 1980 or earlier census-based population controls. This change from 1980-based to 1990-based population controls had relatively little impact on summary measures, such as means, medians, and percentage distributions. It did, however, have a significant impact on levels. For example, use of 1990-based population controls resulted in about a 1 percent increase in the civilian noninstitutional population and in the number of families and households. Thus, estimates of levels for data collected in 1994 and later years differed from those for earlier years by more than what could be attributed to actual changes in the population. These differences could be disproportionately greater for certain subpopulation groups than for the total population.

In addition to the changes in population controls, two other relevant changes were introduced into the CPS with the release of the January 2003 data. First, the questions on race and Hispanic origin in the CPS were modified to comply with the new standards for maintaining, collecting, and presenting Federal data on race and ethnicity for Federal statistical agencies. A major change under those standards is that respondents may select more than one race when answering the survey. Respondents continued to be asked a separate question to determine if they are Hispanic, which is considered an ethnicity rather than a race. The ethnicity question was reworded to ask directly whether the respondent was Hispanic. Persons who report they are Hispanic also are classified separately in the race (or races) they consider themselves to be. Second, improvements were introduced to both the second stage and composite weighting procedures. These changes adapt the weighting procedures to the new race/ethnic classification system and enhance the stability over time of national and state/substate labor force estimates for demographic groups. These two changes, in addition to the change in population controls discussed above, benchmark the CPS data to the results of Census 2000, improve the estimation procedures, and ensure that the data series produced from the survey reflect the evolving composition of the U.S. population.

Further information on CPS may be obtained from

Education and Social Stratification Branch
Population Division
Census Bureau
U.S. Department of Commerce
Washington, DC 20233
http://www.census.gov/cps

##### School Enrollment

Each October, the Current Population Survey (CPS) includes supplemental questions on the enrollment status of the population 3 years old and over, in addition to the monthly basic survey on labor force participation. Prior to 2001, the October supplement consisted of approximately 47,000 interviewed households. Beginning with the October 2001 supplement, the sample was expanded by 9,000 to a total of approximately 56,000 interviewed households. The main sources of nonsampling variability in the responses to the supplement are those inherent in the survey instrument. The question of current enrollment may not be answered accurately for various reasons. Some respondents may not know current grade information for every student in the household, a problem especially prevalent for households with members in college or in nursery school. Confusion over college credits or hours taken by a student may make it difficult to determine the year in which the student is enrolled. Problems may occur with the definition of nursery school (a group or class organized to provide educational experiences for children), where respondents’ interpretations of “educational experiences” vary.

The October 2007 basic CPS household-level response rate was 92.0 percent and the school enrollment supplement person-level response rate was 94.1 percent. Since these rates are determined at different levels they cannot be combined to derive an overall response rate.

Further information on CPS methodology may be obtained from

http://www.census.gov/cps

Further information on CPS "School Enrollment" may be obtained from

Education and Social Stratification Branch
Census Bureau
U.S. Department of Commerce
Washington, DC 20233
http://www.census.gov/population/www/socdemo/school.html

#### National Population Projections

The 2008 National Population Projections provide projections of resident population and demographic components of change (births, deaths, and net international migration) through 2050. Population projections are available by age, sex, race and Hispanic origin. The following is a general description of the methods used to produce the 2008 National Population Projections.

The projections originated with a base population from Census 2000 and were produced using a cohort-component method. Many of the characteristics of the U.S. resident population, as measured by Census 2000, were preserved as demographic patterns that worked their way through the projection period. Using the cohort-component method, the components of population change (births, deaths, and net international migration) were projected for each birth cohort (persons born in a given year). For each passing year, the population was advanced one year of age. The new age categories were updated using survival rates and levels of net international migration projected for the passing year. A new birth cohort was added to form the population under one year of age by applying projected age-specific fertility rates to the female population aged 15 to 49, and updating the new cohort for the effects of mortality and net international migration.
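One year of the cohort-component procedure described above might be sketched as follows. This is a simplified illustration, not the Census Bureau's implementation: the function and parameter names are hypothetical, fertility is applied to the start-of-year female population, an even split of births by sex is assumed by default, and race and Hispanic origin detail is omitted.

```python
def advance_one_year(pop, survival, newborn_survival, net_migration,
                     fertility, female_share=0.5):
    """Advance a population one year with a simplified cohort-component step.

    pop, survival, net_migration : dict sex ("F"/"M") -> list indexed by age;
        the last index is an open-ended top age group
    newborn_survival : dict sex -> share of births surviving to year's end
    fertility        : dict age -> births per woman (ages 15 to 49 in the text)
    female_share     : assumed share of births that are female (illustrative)
    """
    top = len(pop["F"]) - 1
    new_pop = {}
    for sex in ("F", "M"):
        aged = [0.0] * (top + 1)
        for age, count in enumerate(pop[sex]):
            # Survivors move up one year; the top group keeps its own survivors.
            aged[min(age + 1, top)] += count * survival[sex][age]
        for age in range(1, top + 1):
            aged[age] += net_migration[sex][age]
        new_pop[sex] = aged

    # New birth cohort: age-specific fertility applied to start-of-year women,
    # then updated for mortality and net international migration.
    births = sum(pop["F"][age] * rate for age, rate in fertility.items())
    shares = {"F": female_share, "M": 1.0 - female_share}
    for sex in ("F", "M"):
        new_pop[sex][0] = (births * shares[sex] * newborn_survival[sex]
                           + net_migration[sex][0])
    return new_pop
```

Calling the function repeatedly, feeding each result back in as `pop` with that year's projected rates, would carry the base population through the projection period.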

The assumptions for the components of change were based on time series analysis. Initially, demographic models were used to summarize historical trends. The forecast parameters obtained from these models were utilized in the models’ framework to create fertility, mortality, and migration schedules required for the cohort-component method. Because of limited data about racial characteristics in the fertility and mortality historical series, the assumptions were first developed for three mutually exclusive and exhaustive groups: Hispanic origin (any race), non-Hispanic Black alone, and non-Hispanic all other races. These assumptions were then applied to their respective detailed race/ethnic categories to project the population, allowing presentation of the race categories described above.

Further information on the National Population Projections may be obtained from

Population Division
Census Bureau
U.S. Department of Commerce
Washington, DC 20233
http://www.census.gov

#### State Population Projections

These state population projections were prepared using a cohort-component method by which each component of population change (births, deaths, state-to-state migration flows, international in-migration, and international out-migration) was projected separately for each birth cohort by sex, race, and Hispanic origin. The basic framework was the same as in past Census Bureau projections.

Detailed components necessary to create the projections were obtained from vital statistics, administrative records, census data, and national projections.

The cohort-component method is based on the traditional demographic accounting system:

$$P_1 = P_0 + B - D + DIM - DOM + IIM - IOM$$

where:

- $P_1$ = population at the end of the period
- $P_0$ = population at the beginning of the period
- $B$ = births during the period
- $D$ = deaths during the period
- $DIM$ = domestic in-migration during the period
- $DOM$ = domestic out-migration during the period
- $IIM$ = international in-migration during the period
- $IOM$ = international out-migration during the period
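The accounting identity translates directly into code. In the sketch below, the function name and the two-state figures are made up for illustration. It also shows a property that follows from the definitions: because one state's domestic out-migrants are another state's in-migrants, the domestic terms cancel when states are summed.

```python
def end_of_period_population(p0, births, deaths, dom_in, dom_out, intl_in, intl_out):
    """Apply P1 = P0 + B - D + DIM - DOM + IIM - IOM for one area."""
    return p0 + births - deaths + dom_in - dom_out + intl_in - intl_out


# Hypothetical two-state system: the 30 people leaving X arrive in Y,
# and the 25 people leaving Y arrive in X.
state_x = end_of_period_population(1000, 20, 10, dom_in=25, dom_out=30,
                                   intl_in=5, intl_out=2)
state_y = end_of_period_population(2000, 40, 15, dom_in=30, dom_out=25,
                                   intl_in=8, intl_out=3)

# The combined total matches the identity with no domestic migration terms.
combined = end_of_period_population(3000, 60, 25, 0, 0, 13, 5)
print(state_x, state_y, combined)
```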

To generate population projections with this model, the Census Bureau created separate datasets for each of these components. In general, the assumptions concerning the future levels of fertility, mortality, and international migration are consistent with the assumptions developed for the national population projections of the Census Bureau.

Once the data for each component were developed, it was a relatively straightforward process to apply the cohort-component method and produce the projections. For each projection year, the base population for each state was disaggregated into eight race and Hispanic categories (non-Hispanic White; non-Hispanic Black; non-Hispanic American Indian, Eskimo, and Aleut; non-Hispanic Asian and Pacific Islander; Hispanic White; Hispanic Black; Hispanic American Indian, Eskimo, and Aleut; and Hispanic Asian and Pacific Islander), by sex, and single year of age (ages 0 to 85+). The next step was to survive each age-sex-race-ethnic group forward 1 year using the pertinent survival rate. The internal redistribution of the population was accomplished by applying the appropriate state-to-state migration rates to the survived population in each state. The projected out-migrants were subtracted from the state of origin and added to the state of destination (as in-migrants). Next, the appropriate number of immigrants from abroad was added to each group. The population under age 1 was created by applying the appropriate age-race-ethnic-specific birth rates to females of childbearing age. The number of births by sex and race/ethnicity were survived forward and exposed to the appropriate migration rate to yield the population under age 1. The final results of the projection process were adjusted to be consistent with the national population projections by single years of age, sex, race, and Hispanic origin. The entire process was then repeated for each year of the projection.
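The sequence of steps above (survive, redistribute among states, add immigrants, adjust to the national total) can be sketched for a single age-sex-race-ethnic group. The function and parameter names are hypothetical, one survival rate is assumed for the whole group, and the creation of the population under age 1 is left out for brevity.

```python
def project_group(pop, survival_rate, migration_rates, immigrants, national_control):
    """Project one age-sex-race-ethnic group forward 1 year across states.

    pop              : dict state -> count at the start of the year
    survival_rate    : share of the group surviving the year
    migration_rates  : dict (origin, destination) -> share of survivors moving
    immigrants       : dict state -> immigrants from abroad
    national_control : national projection for this group
    """
    # 1. Survive the group forward one year.
    survived = {state: count * survival_rate for state, count in pop.items()}

    # 2. Redistribute: subtract movers from the origin, add to the destination.
    result = dict(survived)
    for (origin, destination), rate in migration_rates.items():
        movers = survived[origin] * rate
        result[origin] -= movers
        result[destination] += movers

    # 3. Add immigrants from abroad.
    for state, count in immigrants.items():
        result[state] += count

    # 4. Scale so the states sum to the national projection.
    factor = national_control / sum(result.values())
    return {state: count * factor for state, count in result.items()}
```

Applying migration rates to the survived population, rather than updating states one flow at a time, keeps the result independent of the order in which flows are processed.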

Further information on the State Population Projections may be obtained from

Population Division
Census Bureau
U.S. Department of Commerce
Washington, DC 20233
http://www.census.gov

### Other Sources

#### IHS Global Insight

IHS Global Insight provides an information system that includes databases of economic and financial information; simulation and planning models; regular publications and special studies; data retrieval and management systems; and access to experts on economic, financial, industrial, and market activities. One service is the IHS Global Insight Model of the U.S. Economy, which contains annual projections of U.S. economic and financial conditions, including forecasts for the federal government, incomes, population, prices and wages, and state and local governments, over a long-term (10- to 25-year) forecast period.

Further information may be obtained from

IHS Global Insight
1000 Winter Street
Suite 4300N
Waltham, MA 02451-124
http://www.ihsglobalinsight.com/

National Center for Education Statistics - http://nces.ed.gov
U.S. Department of Education