Digest of Education Statistics: 2022

NCES 2024-009
February 2024

Reader's Guide


Data Sources

The data in the Digest of Education Statistics are obtained from many different sources—including students and teachers, state education agencies, local elementary and secondary schools, and colleges and universities—using surveys and compilations of administrative records. Users should be cautious when comparing data from different sources. Differences in aspects such as data collection procedures, timing, question phrasing, and interviewer training can affect the comparability of results across data sources.

Most of the tables present data from surveys conducted by the National Center for Education Statistics (NCES) or by other agencies and organizations with support from NCES. Some tables also include other data published by federal and state agencies, private research organizations, or professional organizations. Brief descriptions of the surveys and other data sources used in this volume can be found in Appendix A: Guide to Sources. For each NCES and non-NCES data source, the Guide to Sources also provides information on where to obtain further details about that source.

Data are obtained primarily from two types of surveys: universe surveys and sample surveys. In universe surveys, information is collected from every member of the population. For example, in a survey of public elementary and secondary school expenditures, data would be obtained from each public school district in the United States. When data from an entire population are available, estimates of the total population or a subpopulation are made by simply summing the units in the population or subpopulation.

Since universe surveys are often expensive and time-consuming, many surveys collect data from a sample of the population of interest (sample surveys). For example, the National Assessment of Educational Progress (NAEP) assesses a representative sample of students rather than the entire population of students. When a sample survey is used, statistical uncertainty is introduced because the data come from only a portion of the population. This statistical uncertainty must be considered when reporting estimates and making comparisons. For information about how NCES accounts for statistical uncertainty when reporting sample survey results, see "Data Analysis and Interpretation" later in this Reader's Guide.

Common Measures and Indexes

The Digest reports various types of statistics derived from universe and sample surveys. Many tables report the size of a population or a subpopulation, and often the size of a subpopulation is expressed as a percentage of the total population. Totals reported in the Digest are for the 50 states and the District of Columbia unless otherwise noted.

In addition, the average (or mean) value of some characteristic of the population or subpopulation may be reported. The average is obtained by summing the values for all members of the population and dividing the sum by the size of the population. An example is the average annual salary of full-time instructional faculty at degree-granting postsecondary institutions (table 316.10). Another measure that is sometimes used is the median. The median is the midpoint value of a characteristic at or above which 50 percent of the population is estimated to fall, and at or below which 50 percent of the population is estimated to fall. An example is the median annual earnings of young adults who are full-time year-round workers (table 502.30). Some tables also present an average per capita, or per person, which represents an average computed for every person in a specified group or population. It is derived by dividing the total for an item (such as income or expenditures) by the number of persons in the specified population. An example is the per capita expenditure on education in each state (table 106.50).
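As a rough illustration of the three measures described above, the following sketch computes an average, a median, and a per capita amount. All figures are hypothetical values chosen for the example and are not taken from any Digest table.

```python
from statistics import mean, median

# Hypothetical annual salaries (in dollars) for a five-member population.
salaries = [42000, 48000, 51000, 55000, 120000]

# Average (mean): sum of the values divided by the size of the population.
average_salary = mean(salaries)

# Median: the midpoint value, with half of the population at or above it
# and half at or below it.
median_salary = median(salaries)

# Per capita: the total for an item divided by the number of persons
# in the specified population (both numbers are hypothetical).
total_expenditure = 1_500_000
population = 60_000
per_capita_expenditure = total_expenditure / population

print(average_salary, median_salary, per_capita_expenditure)
```

In this example one unusually high salary pulls the average well above the median, which is one reason some tables report the median instead.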

Many tables report financial data in dollar amounts. Unless otherwise noted, financial data are in current dollars, meaning they are not adjusted for changes in the purchasing power of the dollar over time due to inflation. For example, 1990–91 teacher salaries in current dollars are the amounts that the teachers earned in 1990–91, without any adjustments to account for inflation (table 211.20). Constant dollar adjustments attempt to remove the effects of price changes (inflation) from statistical series reported in dollars, allowing valid comparisons of dollar amounts across years. For example, if teacher salaries over a 20-year period are adjusted to constant 2021–22 dollars, the salaries for all years are adjusted to the dollar values that presumably would exist if prices in each year were the same as in 2021–22 (in other words, as if the dollar had constant purchasing power over the entire period). Any changes in the constant dollar amounts would reflect only changes in real values. Constant dollar amounts are computed using price indexes. Price indexes for inflation adjustments can be found in tables 106.70 and 106.75. Each table that presents constant dollars includes a note indicating which index was used for the inflation adjustments; in most cases, the Consumer Price Index was used.
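A constant dollar adjustment can be sketched as follows. The sketch assumes the usual price-index approach, in which a current dollar amount is multiplied by the ratio of the base-year index to the index for the year in which the amount was earned. The function name and the index values are hypothetical; actual index values appear in tables 106.70 and 106.75.

```python
def to_constant_dollars(current_amount, index_in_year, index_in_base_year):
    """Convert a current dollar amount to constant (base-year) dollars.

    Assumes a simple ratio adjustment with a single price index; the
    function name and arguments are illustrative, not an NCES interface.
    """
    if index_in_year <= 0:
        raise ValueError("price index must be positive")
    return current_amount * index_in_base_year / index_in_year


# Hypothetical example: prices doubled between the earlier year
# (index 100.0) and the base year (index 200.0).
adjusted_salary = to_constant_dollars(30000, 100.0, 200.0)
print(adjusted_salary)
```

Under these hypothetical index values, a salary of 30,000 current dollars corresponds to 60,000 constant base-year dollars.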

When presenting data for a time series, some tables include both actual and projected data. Actual data are data that have already been collected. Projected data can be used when data for a recent or future year are not yet available. Projections are estimates that are based on recent trends in relevant statistics and patterns associated with correlated variables. Unless otherwise noted, all data in this volume are actual.

Standard Errors

Using estimates calculated from data based on a sample of the population requires consideration of several factors before the estimates can be interpreted. When using data from a sample, some margin of error will always be present in estimations of characteristics of the total population or subpopulation because the data are available from only a portion of the population. Consequently, data from samples can provide only an approximation of the true or actual value. The margin of error of an estimate, or the range of potential true or actual values, depends on several factors such as the amount of variation in the responses, the size and representativeness of the sample, and the size of the subgroup for which the estimate is computed. The magnitude of this margin of error is measured by what statisticians call the standard error of an estimate.

When data from sample surveys are reported, the standard error is calculated for each estimate. In the tables, the standard error generally appears in parentheses next to the estimate to which it applies. In order to caution the reader when interpreting findings, estimates from sample surveys are flagged with "!" when the standard error is between 30 and 50 percent of the estimate and suppressed with "‡" when the standard error is 50 percent of the estimate or greater. The term coefficient of variation (CV) refers to the ratio of the standard error to the estimate; for example, if an estimate has a CV of 30 percent, this means that the standard error is equal to 30 percent of the value of the estimate.
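The flagging rule above can be expressed as a short sketch. The function name and return labels are hypothetical, and treating a CV of exactly 30 percent as flagged is an assumption, since the text says only "between 30 and 50 percent."

```python
def reliability_label(estimate, standard_error):
    """Classify an estimate by its coefficient of variation (CV).

    Returns "suppress" (shown as the double dagger in tables) when the
    CV is 50 percent or greater, "flag" (shown as "!") when the CV is
    at least 30 percent but below 50 percent, and "ok" otherwise.
    Including exactly 30 percent in "flag" is an assumption.
    """
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    cv = standard_error / abs(estimate)
    if cv >= 0.50:
        return "suppress"
    if cv >= 0.30:
        return "flag"
    return "ok"


# Hypothetical estimates of 10.0 with different standard errors.
print(reliability_label(10.0, 2.0))  # CV of 20 percent
print(reliability_label(10.0, 4.0))  # CV of 40 percent
print(reliability_label(10.0, 6.0))  # CV of 60 percent
```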

Nonsampling Errors

In addition to standard errors, which apply only to sample surveys, all surveys are subject to nonsampling errors. Nonsampling errors may arise when individual respondents or interviewers interpret questions differently; when respondents must estimate values; when coders, keyers, and other processors handle answers differently; when people who should be included in the universe are not; or when people fail to respond, either totally or partially. Total nonresponse (or unit nonresponse) means that people do not respond to the survey at all, while partial nonresponse (or item nonresponse) means that people who participate in a survey do not respond to specific survey items. To compensate for nonresponse, adjustments are often made. For universe surveys, an adjustment made for either type of nonresponse, total or partial, is often referred to as an imputation, which is often a substitution of the "average" questionnaire response for the nonresponse. For universe surveys, imputations are usually made separately within various groups of sample members that have similar survey characteristics. For sample surveys, total nonresponse is handled through nonresponse adjustments to the sample weights. For sample surveys, imputation for item nonresponse is usually made by substituting for a missing item the response to that item of a respondent having characteristics that are similar to those of the nonrespondent. For additional general information about imputations, see the NCES Statistical Standards (NCES 2014-097). Standard 4-1, available at https://nces.ed.gov/statprog/2012/pdf/Chapter4.pdf, provides information about imputation for item nonresponse. Appendix A: Guide to Sources includes some information about specific surveys' response rates, nonresponse adjustments, and other efforts to reduce nonsampling error. Although the magnitude of nonsampling error is frequently unknown, idiosyncrasies that have been identified are noted in the appropriate tables.
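As a simplified illustration of imputing an "average" response within groups of similar members, the sketch below fills each missing item with the mean of the reported values in the same group. The field names and data are hypothetical, and actual survey procedures are typically more elaborate than this.

```python
from collections import defaultdict
from statistics import mean


def impute_group_mean(records, group_key, item_key):
    """Fill missing items (None) with the mean of the same group's responses.

    Assumes every group has at least one reported value; a group with
    no responses at all would need a different rule.
    """
    reported = defaultdict(list)
    for record in records:
        if record[item_key] is not None:
            reported[record[group_key]].append(record[item_key])
    group_means = {group: mean(values) for group, values in reported.items()}

    filled = []
    for record in records:
        new_record = dict(record)  # leave the input records unchanged
        if new_record[item_key] is None:
            new_record[item_key] = group_means[new_record[group_key]]
        filled.append(new_record)
    return filled


# Hypothetical school records with one missing enrollment count.
schools = [
    {"locale": "city", "enrollment": 400},
    {"locale": "city", "enrollment": 600},
    {"locale": "city", "enrollment": None},
    {"locale": "rural", "enrollment": 100},
]
completed = impute_group_mean(schools, "locale", "enrollment")
print(completed[2]["enrollment"])
```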

Data Analysis and Interpretation

The data presented in the Digest are descriptive: the tables do not investigate more complex hypotheses, account for interrelationships among characteristics, or support causal inferences. We encourage readers who are interested in more complex questions and in-depth analysis to explore other NCES resources, including other publications, online data tools, and public- and restricted-use datasets at https://nces.ed.gov.

A number of considerations influence the selection of data years to include in the tables. To provide the most timely data possible, the latest year of available data is shown. The choice of comparison years in tables is often based on the need to show the earliest available survey year, as in the case of NAEP and the international assessment surveys. In the case of surveys with long time frames, such as surveys measuring enrollment, either all available data years are shown in the tables or intervening years are selected in increments in order to show the general trend.

Rounding and Other Considerations

Digest tables are the foundation of the congressionally mandated Condition of Education (COE), which contains key indicators that describe and visualize important educational developments and trends. All COE calculations using Digest data are based on unrounded estimates. Therefore, the reader may find that a calculation, such as a difference or a percentage change, cited in a COE text or a figure may not be identical to the calculation obtained by using the rounded values shown in the accompanying Digest table. Although values reported in the tables are generally rounded to one decimal place (e.g., 76.5 percent), values reported in COE text are generally rounded to whole numbers (with any value of 0.50 or above rounded to the next highest whole number). Due to rounding, cumulative percentages may sometimes equal 99 or 101 percent rather than 100 percent.
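The effect of rounding on a calculated difference can be seen in a small sketch. The two percentages are hypothetical, and the helper function is illustrative only. Decimal arithmetic is used so that values ending in 5 round up, as described above, rather than being affected by binary floating-point representation.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, step="0.1"):
    """Round a Decimal to the given step, with halves rounded up."""
    return value.quantize(Decimal(step), rounding=ROUND_HALF_UP)


# Hypothetical unrounded estimates (percentages).
first = Decimal("76.44")
second = Decimal("70.46")

# Difference computed from the unrounded estimates, then rounded.
difference_from_unrounded = round_half_up(first - second)

# Difference computed from the rounded values a table would show.
difference_from_rounded = round_half_up(first) - round_half_up(second)

# Rounding to a whole number, with 0.50 or above rounded up.
whole_number = round_half_up(Decimal("76.50"), "1")

print(difference_from_unrounded, difference_from_rounded, whole_number)
```

Here the difference based on unrounded estimates is 6.0, while the difference between the rounded table values (76.4 and 70.5) is 5.9.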

Race and Ethnicity

The Office of Management and Budget (OMB) is responsible for the standards that govern the categories used to collect and present federal data on race and ethnicity. The OMB revised the guidelines on racial/ethnic categories used by the federal government in October 1997, with a January 2003 deadline for implementation. The revised standards require a minimum of five categories for data on race: American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White. The standards also require the collection of data on the ethnicity categories of Hispanic or Latino and Not Hispanic or Latino. It is important to note that Hispanic origin is an ethnicity rather than a race, and therefore persons of Hispanic origin may be of any race. Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person's parents or ancestors before their arrival in the United States. The race categories American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White exclude persons of Hispanic origin unless otherwise noted.

For a description of each racial/ethnic category, please see the "Racial/ethnic group" entry in Appendix B: Definitions. Some of the category labels are shortened for more concise presentation in tables. American Indian or Alaska Native is denoted as American Indian/Alaska Native (except when separate estimates are available for American Indians alone or Alaska Natives alone); Black or African American is shortened to Black; and Hispanic or Latino is shortened to Hispanic. When discussed separately from Asian estimates, Native Hawaiian or Other Pacific Islander is shortened to Pacific Islander.

In earlier editions of the Digest, racial/ethnic categories were generally presented in order of subgroup size, from largest to smallest. During the 2022 Digest update, racial/ethnic categories instead began to be presented in alphabetical order.

Many of the data sources used for this volume are federal surveys that collect data using the OMB standards for racial/ethnic classification described above; however, some tables include historical data collected prior to the adoption of the OMB standards. Asians and Pacific Islanders are combined into a single category for years in which the data were not collected separately for the two groups. The combined category can sometimes mask significant differences between the two subgroups. For example, prior to 2011, NAEP collected data that did not allow for separate reporting of estimates for Asians and Pacific Islanders. The population counts presented in table 101.20, based on the U.S. Census Bureau's Current Population Reports, indicate that 96 percent of all Asian/Pacific Islander 5- to 17-year-olds were Asian in 2010. Thus, the combined category for Asians/Pacific Islanders is more representative of Asians than of Pacific Islanders.

Some surveys give respondents the option of selecting more than one race category, an "other" race category, or a "Two or more races" or "more than one race" category. Where possible, tables present data on the "Two or more races" category; however, in some cases this category may not be separately shown because the information was not collected or due to other data issues. Some tables include the "other" category. Any comparisons made between persons of one racial/ethnic group and persons of "all other racial/ethnic groups" include only the racial/ethnic groups shown in the reference table. In some surveys, respondents are not given the option to select more than one race category and also are not given an option such as "other" or "more than one race." In these surveys, respondents of Two or more races must select a single race category. Any comparisons between data from surveys that give the option to select more than one race and surveys that do not offer such an option should take into account the fact that there is a potential for bias if members of one racial group are more likely than members of the others to identify themselves as "Two or more races."1 For some postsecondary data, data on race and ethnicity are not collected for nonresidents.

In addition to the major racial/ethnic categories, several tables include Hispanic ancestry subgroups (such as Cuban, Dominican, Mexican, Puerto Rican, Salvadoran, Other Central American, and South American) and Asian ancestry subgroups (such as Asian Indian, Chinese, Filipino, Japanese, Korean, and Vietnamese). In addition, selected tables include "Two or more races" subgroups (such as American Indian/Alaska Native and White, Asian and White, and Black and White).

Limitations of the Data

Due to large standard errors, some differences that seem substantial are not statistically significant. This situation often applies to estimates based on small samples, for example, those involving American Indians/Alaska Natives or Pacific Islanders. Even in larger surveys, the number of sampled individuals from groups that represent a smaller portion of the overall population, such as American Indians/Alaska Natives or Pacific Islanders, is often small. When sample sizes are small, standard errors tend to be larger, indicating that the estimates may be less reliable measures of population-level characteristics for those smaller groups. Readers should keep these limitations in mind when comparing estimates presented in the tables.
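To illustrate how a large standard error can make a seemingly substantial difference nonsignificant, the sketch below applies a common two-sided test for the difference between two independent estimates. This Reader's Guide does not specify a testing procedure, so the test statistic, the 1.96 critical value (a 5 percent significance level), and the example figures are all assumptions made for illustration.

```python
import math


def is_significant_difference(estimate_1, se_1, estimate_2, se_2, critical_value=1.96):
    """Test whether two independent estimates differ significantly.

    Assumes independent samples and a normal approximation; the test
    statistic is the difference divided by its standard error.
    """
    se_difference = math.sqrt(se_1 ** 2 + se_2 ** 2)
    t_statistic = (estimate_1 - estimate_2) / se_difference
    return abs(t_statistic) > critical_value


# Hypothetical estimates that differ by 8 percentage points.
print(is_significant_difference(50.0, 4.0, 42.0, 5.0))  # large standard errors
print(is_significant_difference(50.0, 1.0, 42.0, 1.0))  # small standard errors
```

With the larger standard errors, the 8-point difference falls within the range that sampling variation alone could produce; with the smaller standard errors, the same difference is significant.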

As mentioned, caution should be exercised when comparing data from different sources. Differences in sampling, data collection procedures, coverage of the target population, timing, phrasing of questions, scope of nonresponse, interviewer training, and data processing and coding mean that results from different sources may not be strictly comparable. For example, the response categories presented to a respondent, and the way in which the question is asked, can influence the response. In the case of questions asking about race and ethnicity, this may be especially true for individuals who consider themselves to be of more than one race or ethnicity.


1 For discussion of such bias in responses to the 2000 Census, see Parker, J., Schenker, N., Ingram, D.D., Weed, J.A., Heck, K.E., & Madams, J.H. (2004). Bridging Between Two Standards for Collecting Information on Race and Ethnicity: An Application to Census 2000 and Vital Rates. Public Health Reports, 119(2): 192–205. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1497618/.
