School Choice in the United States: 2019

Appendix A: Technical Notes

Estimates

Data for indicators are obtained from two types of surveys: universe surveys and sample surveys. In universe surveys, information is collected about every member of the population. When data from an entire population are available, estimates of the total population or a subpopulation are made by simply summing the units in the population or subpopulation. As a result, there is no sampling error, and observed differences are reported as true.

Since a universe survey is often expensive and time consuming, many surveys collect more detailed data from a sample of the population of interest (sample survey). For example, the National Assessment of Educational Progress (NAEP) assesses a representative sample of students rather than the entire population of students. When a sample survey is used, statistical uncertainty is introduced, because the data come from only a portion of the entire population. This statistical uncertainty must be considered when reporting estimates and making comparisons.

Various types of statistics derived from universe and sample surveys are reported in the indicators. Many indicators report the size of a population or a subpopulation, and often the size of a subpopulation is expressed as a percentage of the total population. In addition, the average (or mean) value of some characteristic of the population or subpopulation may be reported. The average is obtained by summing the values for all members of the population and dividing the sum by the size of the population.

Standard Errors

Using estimates calculated from data based on a sample of the population requires consideration of several factors before differences in the estimates can be described as meaningful. When using data from a sample, some margin of error will always be present in estimates of characteristics of the total population or subpopulation because the data are available from only a portion of the total population. Consequently, data from samples can provide only an approximation of the true or actual value. The margin of error of an estimate, or the range of potential true or actual values, depends on several factors, such as the amount of variation in the responses, the size and representativeness of the sample, and the size of the subgroup for which the estimate is computed. The magnitude of this margin of error is measured by what statisticians call the “standard error” of an estimate.

When data from sample surveys are reported, the standard error is calculated for each estimate. The standard errors for all estimated totals, means, or percentages are reported in the Reference tables.

In order to caution the reader when interpreting findings in the indicators that may be unstable, estimates from sample surveys are flagged with a “!” when the standard error is between 30 and 50 percent of the estimate, and suppressed and replaced with a “‡” when the standard error is 50 percent of the estimate or greater.
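The flagging rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure NCES uses in production: the function name `flag_estimate` is hypothetical, the treatment of exactly 30 percent as flagged is an assumption (the text says only "between 30 and 50 percent"), and the handling of a zero estimate is an arbitrary choice made so the function does not divide by zero.

```python
def flag_estimate(estimate: float, standard_error: float) -> str:
    """Return "", "!" or "‡" for a sample-survey estimate.

    The ratio of the standard error to the estimate decides the flag:
      ratio >= 0.50         -> "‡" (estimate is suppressed)
      0.30 <= ratio < 0.50  -> "!" (interpret with caution)
      ratio < 0.30          -> ""  (no flag)
    Assumption: a ratio of exactly 0.30 is treated as flagged.
    """
    if estimate == 0:
        # Assumption: the ratio is undefined for a zero estimate, so no flag
        # is returned here; the source does not address this case.
        return ""
    ratio = standard_error / abs(estimate)
    if ratio >= 0.50:
        return "‡"
    if ratio >= 0.30:
        return "!"
    return ""


# Hypothetical estimates and standard errors, for illustration only.
print(flag_estimate(10.0, 2.0))  # ratio 0.2: no flag
print(flag_estimate(10.0, 4.0))  # ratio 0.4: "!"
print(flag_estimate(10.0, 5.0))  # ratio 0.5: "‡"
```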

Data Analysis and Interpretation

When estimates are from a sample, caution is warranted when drawing conclusions about one estimate in comparison to another, or about whether a time series of estimates is increasing, decreasing, or staying the same. Although one estimate may appear to be larger than another, a statistical test may find that the apparent difference between them is not reliably measurable due to the uncertainty around the estimates. In this case, the estimates will be described as having no measurable difference, meaning that the difference between them is not statistically significant. Conversely, statistically significant differences may be referred to as “measurably different” in the text.

For all indicators that report estimates based on samples, differences between estimates are stated only when they are statistically significant. Findings described in this report with comparative language (e.g., higher, lower, increase, and decrease) are statistically significant. To determine whether differences reported are statistically significant, two-tailed t tests at the .05 level are typically used. The t test formula for determining statistical significance is adjusted when the samples being compared are dependent. The t test formula is not adjusted for multiple comparisons. When the variables to be tested are postulated to form a trend, the relationship may be tested using linear regression, logistic regression, or ANOVA trend analysis instead of a series of t tests. These alternative methods of analysis test for specific relationships (e.g., linear, quadratic, or cubic) among variables. For more information on data analysis, please see the NCES Statistical Standards, Standard 5-1, available at https://nces.ed.gov/statprog/2012/pdf/Chapter5.pdf.
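As a rough illustration of the two-tailed t test mentioned above, the sketch below compares two independent estimates using their standard errors. It rests on several assumptions that are not stated in this appendix: the test statistic is taken to be the difference between the estimates divided by the square root of the sum of the squared standard errors, the critical value of 1.96 is the large-sample value for a two-tailed test at the .05 level, and the function names and figures are hypothetical. The adjustment for dependent samples is not shown.

```python
import math


def t_statistic(est1: float, se1: float, est2: float, se2: float) -> float:
    """t statistic for the difference between two independent estimates."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


def measurably_different(est1: float, se1: float, est2: float, se2: float,
                         critical_value: float = 1.96) -> bool:
    """True when |t| exceeds the critical value.

    Assumption: 1.96 is the large-sample critical value for a two-tailed
    test at the .05 level; small samples would call for a larger value.
    """
    return abs(t_statistic(est1, se1, est2, se2)) > critical_value


# Hypothetical percentages and standard errors, for illustration only.
print(t_statistic(50.0, 3.0, 40.0, 4.0))           # 10 / sqrt(9 + 16) = 2.0
print(measurably_different(50.0, 3.0, 40.0, 4.0))  # True
print(measurably_different(50.0, 3.0, 46.0, 4.0))  # False (t = 0.8)
```

In the second comparison, the 4-point gap is smaller than the uncertainty around the two estimates can support, so it would be described as no measurable difference.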

In general, only statistically significant findings are discussed in the text. However, statistically nonsignificant differences between groups may be highlighted for clarification purposes. Statistically nonsignificant differences may also be discussed when they relate to a primary focus of the report, such as if the percentage of students enrolled in a certain type of school remained unchanged over time.

Multivariate analyses, such as ordinary least squares (OLS) regression models, provide information on whether the relationship between an independent variable and an outcome measure (such as group differences in the outcome measure) persists after taking into account other variables (such as student, family, and school characteristics). For indicators that include a regression analysis, multiple categorical or continuous independent variables are entered simultaneously. A significant regression coefficient indicates an association between the dependent (outcome) variable and the independent variable, after controlling for other independent variables included in the regression model.

Data presented in the indicators typically do not investigate more complex hypotheses or support causal inferences. We encourage readers who are interested in more complex questions and in-depth analysis to explore other NCES resources, including publications, online data tools, and public- and restricted-use datasets at https://nces.ed.gov.

A number of considerations influence the ultimate selection of the data years to feature in the indicators. To make analyses as timely as possible, the latest year of available data is shown. In the case of indicators discussing trends and using surveys with long time frames, such as surveys measuring enrollment, a decade’s beginning year (e.g., 2000–01) often starts the trend line. The narrative for the indicators typically compares the most current year’s data with those from the initial year. Where applicable, the narrative may also note years in which the data begin to diverge from previous trends.

Rounding and Other Considerations

All calculations within the indicators are based on unrounded estimates. Therefore, the reader may find that a calculation cited in the text or a figure, such as a difference or a percentage change, is not identical to the result obtained by using the rounded values shown in the accompanying tables. Although values reported in the Reference tables are generally rounded to one decimal place (e.g., 76.5 percent), values reported in each indicator are generally rounded to whole numbers (with any fractional part of 0.50 or above rounded up to the next higher whole number). Due to rounding, cumulative percentages may sometimes equal 99 or 101 percent rather than 100 percent. While the data labels on the figures have been rounded to whole numbers for most indicators, the graphical presentation of these data is based on the unrounded estimates.
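The rounding convention above (0.50 or above rounds up) differs from the default behavior of some software. As a minimal sketch, assuming values are supplied as decimal strings and are non-negative, the following Python reproduces the round-half-up rule and shows how rounded percentages can sum to 99. The helper name `round_half_up` and the sample figures are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str) -> int:
    """Round a non-negative decimal string to a whole number, with .5 rounding up."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(round_half_up("76.5"))   # 77
print(round_half_up("76.49"))  # 76

# Python's built-in round() uses round-half-to-even, so it gives 76 here
# rather than the 77 that the convention described above would produce.
print(round(76.5))             # 76

# Hypothetical shares that sum to 100.0 before rounding.
shares = ["33.4", "33.3", "33.3"]
print(sum(round_half_up(s) for s in shares))  # 99
```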

Race and Ethnicity

The Office of Management and Budget (OMB) is responsible for the standards that govern the categories used to collect and present federal data on race and ethnicity. The OMB revised the guidelines on racial/ethnic categories used by the federal government in October 1997, with a January 2003 deadline for implementation. The revised standards require a minimum of these five categories for data on race: American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White. The standards also require the collection of data on ethnicity using, at a minimum, two categories: Hispanic or Latino and Not Hispanic or Latino. It is important to note that Hispanic origin is an ethnicity rather than a race, and therefore persons of Hispanic origin may be of any race. Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before their arrival in the United States. The race categories White, Black, Asian, Native Hawaiian or Other Pacific Islander, and American Indian or Alaska Native, as presented in these indicators, exclude persons of Hispanic origin unless noted otherwise.

The categories are defined as follows:

American Indian or Alaska Native: A person having origins in any of the original peoples of North and South America (including Central America) and maintaining tribal affiliation or community attachment.
Asian: A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent, including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
Black or African American: A person having origins in any of the black racial groups of Africa.
Native Hawaiian or Other Pacific Islander: A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
White: A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.
Hispanic or Latino: A person of Mexican, Puerto Rican, Cuban, South or Central American, or other Spanish culture or origin, regardless of race.

Within these indicators, some of the category labels have been shortened in the text, tables, and figures for ease of reference. American Indian or Alaska Native is denoted as American Indian/Alaska Native (except when separate estimates are available for American Indians alone or Alaska Natives alone); Black or African American is shortened to Black; Hispanic or Latino is shortened to Hispanic; and Native Hawaiian or Other Pacific Islander is shortened to Pacific Islander.

The indicators in this report draw from a number of different data sources, and some indicators include data collected prior to the adoption of the OMB standards. This report focuses on the six categories that are the most common among the various data sources used: White, Black, Hispanic, Asian, Pacific Islander, and American Indian/Alaska Native. Asians and Pacific Islanders are combined into one category in indicators for which the data were not collected separately for the two groups.

Surveys from which data are presented in these indicators generally give respondents the option of selecting either an “other” race category, a “Two or more races” or “multiracial” category, or both. Where possible, indicators present data on the “Two or more races” category; however, in some cases this category may not be separately shown because the information was not collected or because of other data issues. In general, the “other” category is not separately shown. Any comparisons of persons of one racial/ethnic group with “all other racial/ethnic groups” include only the racial/ethnic groups shown in the indicator. In earlier administrations of some surveys, prior to the implementation of the OMB guidelines, respondents were not given the option to select more than one race. In these surveys, respondents of Two or more races had to select a single race category. Any comparisons between data from surveys that give the option to select more than one race and surveys that do not offer such an option should take into account the potential for bias if members of one racial group are more likely than members of the others to identify themselves as “Two or more races.”¹

Locale

Federal departments and agencies use various classification systems to define community types. Indicators in this report use the National Center for Education Statistics (NCES) system of locale codes. These locale codes are based on an address’s proximity to an urbanized area.

City: Territory inside an urbanized area and inside a principal city.
Suburb: Territory outside a principal city and inside an urbanized area.
Town: Territory inside an urban cluster.
Rural: Census-defined rural territory that is apart from an urbanized area or an urban cluster.
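
The four locale definitions above amount to a simple decision rule, sketched below. This is a simplified illustration, not the NCES locale assignment procedure: the function `locale_type` and its three boolean inputs are hypothetical, and determining whether an address actually falls inside an urbanized area, a principal city, or an urban cluster requires Census geography that is not modeled here.

```python
def locale_type(in_urbanized_area: bool,
                in_principal_city: bool,
                in_urban_cluster: bool) -> str:
    """Map three hypothetical geography indicators to a locale type."""
    if in_urbanized_area:
        # Inside an urbanized area: City if also inside a principal city,
        # otherwise Suburb.
        return "City" if in_principal_city else "Suburb"
    if in_urban_cluster:
        return "Town"
    # Apart from both an urbanized area and an urban cluster.
    return "Rural"


print(locale_type(True, True, False))    # City
print(locale_type(True, False, False))   # Suburb
print(locale_type(False, False, True))   # Town
print(locale_type(False, False, False))  # Rural
```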

Symbols

In accordance with the NCES Statistical Standards, many tables in this report use a series of symbols to alert the reader to special statistical notes. These symbols, and their meanings, are as follows:

— Not available.
† Not applicable.
# Rounds to zero.
! Interpret data with caution. The coefficient of variation (CV) for this estimate is between 30 and 50 percent.
‡ Reporting standards not met. Either there are too few cases for a reliable estimate or the coefficient of variation (CV) for this estimate is 50 percent or greater.

¹ Such bias was found by a National Center for Health Statistics study that examined race/ethnicity responses to the 2000 Census. This study found, for example, that as the percentage of multiple-race respondents in a county increased, the likelihood of respondents stating Black as their primary race increased among Black/White respondents but decreased among American Indian or Alaska Native/Black respondents.