IES Blog

Institute of Education Sciences

Statistical Concepts in Brief: Embracing the Errors

By Lauren Musu-Gillette

EDITOR’S NOTE: This is part of a series of blog posts about statistical concepts that NCES uses as a part of its work.

Many of the important findings in NCES reports are based on data gathered from samples of the U.S. population. These sample surveys provide an estimate of what the data would look like if the full population had participated in the survey, but at a great savings in both time and cost. However, because the entire population is not included, there is always some degree of uncertainty associated with an estimate from a sample survey. For those using the data, knowing the size of this uncertainty is important both for evaluating the reliability of an estimate and for statistical testing to determine whether two estimates are significantly different from one another.

NCES reports standard errors for all data from sample surveys. In addition to providing these values to the public, NCES uses them for statistical testing. Within annual reports such as the Condition of Education, Indicators of School Crime and Safety, and Trends in High School Dropout and Completion Rates in the United States, NCES uses statistical testing to determine whether estimates for certain groups are statistically significantly different from one another. Specific language is tied to the results of these tests. For example, in comparing male and female employment rates, the Condition of Education states that in 2014 the overall employment rate for young males 20 to 24 years old was higher than the rate for young females of the same age (72 vs. 66 percent). Use of the term “higher” indicates that statistical testing was performed to compare these two groups and that the difference was statistically significant.
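
The kind of test behind wording such as “higher” can be sketched as a comparison of two independent estimates: divide their difference by the combined standard error and compare the result with a critical value. This is a minimal illustration, not necessarily NCES's exact procedure. It assumes the two estimates are independent and uses the large-sample critical value of 1.96 for a two-sided test at the .05 level. The function name is hypothetical, and because the standard errors for the employment rates are not given here, the values of 1.0 used below are placeholders.

```python
import math

def compare_estimates(est1, se1, est2, se2, critical=1.96):
    """Test whether two independent estimates differ (hypothetical helper).

    Computes t = (est1 - est2) / sqrt(se1**2 + se2**2) and compares |t|
    with a critical value (1.96: two-sided, .05 level, large samples).
    """
    t = (est1 - est2) / math.sqrt(se1**2 + se2**2)
    return t, abs(t) > critical

# Employment rates from the text (percent). The standard errors are
# HYPOTHETICAL placeholders, not published NCES values.
t, significant = compare_estimates(72, 1.0, 66, 1.0)
print(f"t = {t:.2f}, significant: {significant}")
```

With these placeholder standard errors, the 6-point gap would be significant; with standard errors of 4.0 for each group, the same gap would not be.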

If differences between groups are not statistically significant, NCES uses the phrases “no measurable differences” or “no statistically significant differences at the .05 level.” This is because we do not know for certain that differences do not exist at the population level, only that our statistical tests of the available data were unable to detect them. The absence of a detected difference could mean that there is in fact no difference, but it could also reflect other factors, such as a small sample size or large standard errors for a particular group. Heterogeneity, or large amounts of variability, within a sample can also contribute to larger standard errors.
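
To see how sample size and variability drive the standard error, consider the textbook formula for the standard error of a proportion under simple random sampling, sqrt(p(1 - p) / n). This is a simplification: NCES surveys use complex sample designs, and their published standard errors are estimated with methods that account for that design. The function name, proportion, and sample sizes below are hypothetical.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Same HYPOTHETICAL proportion (30 percent) at different sample sizes.
for n in (100, 400, 1600):
    print(n, round(se_proportion(0.30, n), 4))

# Variability matters too: p(1 - p) is largest when p = 0.5.
print(round(se_proportion(0.50, 100), 4), round(se_proportion(0.10, 100), 4))
```

Quadrupling the sample size halves the standard error, and for a fixed sample size, a more variable population (p closer to 0.5) yields a larger standard error.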

Some of the populations of interest to education stakeholders are quite small, for example, Pacific Islander or American Indian/Alaska Native students. As a consequence, these groups are typically represented by relatively small samples, and their estimates are often less precise than those of larger groups. This lower precision is reflected in larger standard errors for these groups. For example, in the table above, the standard error for White students who reported having been in 0 physical fights anywhere is 0.70, whereas the standard error is 4.95 for Pacific Islander students and 7.39 for American Indian/Alaska Native students. This means that the uncertainty around the estimates for Pacific Islander and American Indian/Alaska Native students is much larger than it is for White students. Because of these larger standard errors, differences between these groups that seem large may not be statistically significant. When this occurs, NCES analysts may note that large apparent differences are not statistically significant. NCES data users can use standard errors to help make valid comparisons using the data that we release to the public.
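
The standard errors cited above also show how large a gap must be before it can be called significant. The sketch below computes the smallest difference between two estimates that would reach significance at the .05 level. It assumes, as a simplification, that the two estimates are independent, uses the large-sample critical value of 1.96, and treats the estimates as percentages so the results are in percentage points. The function name is hypothetical, and NCES's own tests may differ in detail.

```python
import math

CRITICAL = 1.96  # two-sided test at the .05 level, large-sample approximation

def min_detectable_difference(se1, se2, critical=CRITICAL):
    """Smallest gap between two independent estimates that would be significant."""
    return critical * math.sqrt(se1**2 + se2**2)

se_white = 0.70  # standard errors from the text
for group, se in [("Pacific Islander", 4.95), ("American Indian/Alaska Native", 7.39)]:
    gap = min_detectable_difference(se_white, se)
    print(f"White vs. {group}: {gap:.1f} points")
```

Under these assumptions, the White and Pacific Islander estimates would need to differ by about 9.8 percentage points, and the White and American Indian/Alaska Native estimates by about 14.5, before the difference would be detected.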

Another example of how standard errors can affect whether sample differences are statistically significant can be seen when comparing changes in NAEP scores by state. Between 2013 and 2015, mathematics scores for fourth-grade public school students changed by 3 points in both Mississippi and Louisiana. However, this change was statistically significant only for Mississippi. This is because the standard error for the change in scale scores was 1.2 for Mississippi but 1.6 for Louisiana. The larger standard error, and therefore the larger degree of uncertainty around the estimate, factors into the statistical tests that determine whether a difference is statistically significant. This difference in standard errors could reflect the size of the samples in Mississippi and Louisiana, or other factors such as the degree to which the assessed students are representative of the population of their respective states.
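
Using the rounded values reported above, the logic of the test can be sketched by dividing each change by its standard error and comparing the result with 1.96. This is a simplified, hypothetical illustration rather than NAEP's procedure: NAEP's published tests use unrounded estimates and may include additional adjustments. The function name is an assumption.

```python
CRITICAL = 1.96  # two-sided test at the .05 level, large-sample approximation

def change_is_significant(change, se_change, critical=CRITICAL):
    """Return the test statistic and whether a score change differs from zero."""
    z = change / se_change
    return z, abs(z) > critical

# Rounded 3-point change and standard errors of the change, from the text.
for state, se in [("Mississippi", 1.2), ("Louisiana", 1.6)]:
    z, significant = change_is_significant(3, se)
    print(f"{state}: z = {z:.2f}, significant: {significant}")
```

A 3-point change is 2.5 standard errors for Mississippi but only about 1.9 for Louisiana, which falls short of 1.96.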

Researchers may also be interested in using standard errors to compute confidence intervals for an estimate. Stay tuned for a future blog where we’ll outline why researchers may want to do this and how it can be accomplished.

Statistical concepts in brief: How and why does NCES use sample surveys?

By Lauren Musu-Gillette

EDITOR’S NOTE: This is the first in a series of blog posts about statistical concepts that NCES uses as a part of its work. 

The National Center for Education Statistics (NCES) collects data in two main ways: through universe surveys and through sample surveys.

Some NCES statistics, such as the number of students enrolled in public schools or postsecondary institutions, come from administrative data collections. These data represent a nearly exact count of a population because information is collected from all potential respondents (e.g., all public schools in the U.S.). These types of data collections are also known as universe surveys because they involve the collection of data covering all known units in a population. The Common Core of Data (CCD), the Private School Survey (PSS) and the Integrated Postsecondary Education Data System (IPEDS) are the key universe surveys collected by NCES.

While universe surveys provide a wealth of important data on education, data collections of this magnitude are not realistic for every potential variable or outcome of interest to education stakeholders. That is why, in some cases, we use sample surveys, which select smaller subgroups that are representative of a broader population of interest. Using sample surveys can reduce the time and expense that would be associated with collecting data from all members of a particular population of interest. 


Example of selecting a sample from a population of interest

The example above shows a simplified version of how a representative sample could be drawn from a population. The population shown here has 60 people, with 2/3 males and 1/3 females. The smaller sample of 6 individuals is drawn from this larger population, but remains representative with 2/3 males and 1/3 females included in the sample.
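
One common way to draw a sample like the one in the example is proportionate stratified sampling: divide the population into groups (strata), then sample from each group in proportion to its size. The sketch below assumes this method and a hypothetical population of 40 males and 20 females. The figure does not say how its sample was drawn, the function name is hypothetical, and actual NCES designs are considerably more complex.

```python
import random
from collections import Counter

def proportionate_stratified_sample(population, key, sample_size, seed=None):
    """Draw a sample whose stratum shares match the population's shares.

    Assumes each stratum's allocation (its share times sample_size) is a
    whole number, as in the 60-person example; real designs need a rounding rule.
    """
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(key(person), []).append(person)
    sample = []
    for members in strata.values():
        n_stratum = len(members) * sample_size // len(population)
        sample.extend(rng.sample(members, n_stratum))
    return sample

# HYPOTHETICAL population matching the example: 40 males and 20 females.
population = [("M", i) for i in range(40)] + [("F", i) for i in range(20)]
sample = proportionate_stratified_sample(population, key=lambda p: p[0], sample_size=6, seed=1)
print(Counter(sex for sex, _ in sample))
```

Whatever individuals are drawn, the sample always contains 4 males and 2 females, preserving the population's 2/3 to 1/3 split.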


For instance, the National Postsecondary Student Aid Study (NPSAS), Baccalaureate and Beyond (B&B), and the Beginning Postsecondary Students Longitudinal Study (BPS) select institutions from the entire universe of institutions contained in the Integrated Postsecondary Education Data System (IPEDS) database. Then, a sample of students within those institutions is selected for inclusion in the study.
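
The two-stage selection described above (institutions first, then students within them) can be sketched as follows. This is a minimal illustration under stated assumptions: simple random sampling at both stages, a hypothetical sampling frame, and a hypothetical function name; the actual NPSAS, B&B, and BPS designs are more elaborate. Each student's overall chance of selection is the product of the two stage probabilities, and its inverse is a base weight, the number of population students that the sampled student represents.

```python
import random

def two_stage_sample(institutions, n_institutions, n_students_each, seed=None):
    """Select institutions, then students within each selected institution.

    `institutions` maps an institution ID to its list of student IDs.
    Simple random sampling is assumed at both stages. Returns
    (student, selection probability) pairs, where the probability is
    P(institution selected) * P(student selected | institution).
    """
    rng = random.Random(seed)
    p_inst = n_institutions / len(institutions)
    chosen = rng.sample(sorted(institutions), n_institutions)
    result = []
    for inst in chosen:
        students = institutions[inst]
        k = min(n_students_each, len(students))
        p_student = k / len(students)
        for student in rng.sample(students, k):
            result.append((student, p_inst * p_student))
    return result

# HYPOTHETICAL frame: four institutions of different sizes.
frame = {
    "inst_A": [f"A{i}" for i in range(100)],
    "inst_B": [f"B{i}" for i in range(50)],
    "inst_C": [f"C{i}" for i in range(200)],
    "inst_D": [f"D{i}" for i in range(20)],
}
for student, prob in two_stage_sample(frame, n_institutions=2, n_students_each=10, seed=7):
    print(student, round(prob, 4), "base weight:", round(1 / prob, 1))
```

Students at larger institutions have lower selection probabilities and therefore larger base weights, because each one stands in for more students in the population.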

Schools and students are selected so that they are representative of the entire population of postsecondary institutions and students. Some types of institutions can be sampled at higher rates than their share of the population to increase the precision of survey estimates for those groups. Through scientific design of the sample of institutions and appropriate weighting of the sample respondents, data from these surveys are nationally representative without requiring that all schools or all students be included in the data collection.
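
Weighting is what keeps oversampling from distorting national estimates. In the hypothetical example below, a small group B is sampled at a much higher rate than a large group A. Each respondent receives a base weight equal to the inverse of the selection probability (real survey weights typically include further adjustments, such as for nonresponse). All numbers and the function name are hypothetical.

```python
def weighted_proportion(records):
    """Weighted share of records whose outcome is True; records are (weight, outcome)."""
    total = sum(w for w, _ in records)
    return sum(w for w, y in records if y) / total

# HYPOTHETICAL population: group A has 9,000 students, group B has 1,000.
pop_size = {"A": 9000, "B": 1000}
sample_size = {"A": 300, "B": 300}  # B is oversampled: 30% of B vs. about 3% of A
yes_count = {"A": 60, "B": 180}     # hypothetical respondents reporting the outcome

records = []
for g in pop_size:
    weight = pop_size[g] / sample_size[g]  # inverse of the selection probability
    records += [(weight, True)] * yes_count[g]
    records += [(weight, False)] * (sample_size[g] - yes_count[g])

unweighted = sum(y for _, y in records) / len(records)
weighted = weighted_proportion(records)
print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

The unweighted estimate (40 percent) overstates the outcome because group B is overrepresented in the sample; the weighted estimate (24 percent) matches the population value implied by these hypothetical numbers.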

Many NCES surveys are sample surveys. For example, NCES longitudinal surveys include nationally representative data for cohorts of students in the elementary grades (Early Childhood Longitudinal Study), the middle grades (Middle Grades Longitudinal Study), high school (High School Longitudinal Study), and college (Beginning Postsecondary Students). The National Household Education Surveys gather information on parental involvement in education, early childhood programs, and other topics, using households rather than schools as the basis for sampling. The National Postsecondary Student Aid Study gathers nationally representative information on college students and their participation in student aid programs. Additionally, data on the characteristics of teachers and principals and the schools in which they work are obtained through the Schools and Staffing Survey and the National Teacher and Principal Survey.

By taking samples of the population of interest, NCES is able to study trends on a national level without needing to collect data from every student or every school. However, the structure and size of the sample can affect the precision of the results for some population groups. This means that statistical testing is necessary to make inferences about differences between groups in the population. Stay tuned for future blogs about how this testing is done, and how NCES provides the data necessary for researchers or the public to do testing of their own.

Dropout rates: Measuring high school non-completion

By Lauren Musu-Gillette

High school dropouts face increasingly high rates of unemployment and low annual earnings. It is therefore important to accurately measure the number of high school dropouts in the U.S.


Median annual earnings of full-time year-round wage and salary workers ages 25-34, by educational attainment: 2013


1 Total represents median annual earnings of all full-time year-round wage and salary workers ages 25–34.
2 Total represents median annual earnings of young adults with a bachelor's degree or higher.
NOTE: Full-time year-round workers are those who worked 35 or more hours per week for 50 or more weeks per year.
SOURCE: U.S. Department of Commerce, Census Bureau, Current Population Survey (CPS); see Digest of Education Statistics 2014, table 502.30.


There are several different ways to measure the number and percentage of high school dropouts. The status dropout rate measures the percentage of individuals in a given age range who are not enrolled in school and have not earned a high school diploma or alternative credential. The Condition of Education uses the Census Bureau’s Current Population Survey (CPS) to provide an annual update on the percentage of 16- to 24-year-olds who meet these criteria.

Another way of looking at high school non-completion is to examine the event dropout rate. This rate estimates the percentage of high school students who left high school between the beginning of one school year and the beginning of the next without earning a high school diploma or alternative credential. While the definition of dropout is similar in both measures, the populations are different. The event dropout rate includes only students who left high school over the course of a given year, whereas the status dropout rate can include those who dropped out over many years and those who may never have attended high school at all. The event dropout rate can be calculated from the CPS or from data reported by state education agencies to NCES through the CCD collection.
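
The difference in populations can be made concrete with a few hypothetical person records. The sketch below applies simplified versions of the two definitions: the status rate looks at everyone aged 16 to 24, while the event rate looks only at people who were in high school at the start of the prior school year. The record fields and function names are hypothetical, and the official CPS and CCD definitions include details (such as which grades and credentials count) that are omitted here.

```python
from dataclasses import dataclass

@dataclass
class Person:
    age: int
    enrolled_in_hs_last_year: bool  # in high school at the start of the prior school year
    enrolled_now: bool              # enrolled in school at the start of this year
    has_credential: bool            # diploma or alternative credential

def status_dropout_rate(people):
    """Share of 16- to 24-year-olds who are not enrolled and lack a credential."""
    group = [p for p in people if 16 <= p.age <= 24]
    dropouts = [p for p in group if not p.enrolled_now and not p.has_credential]
    return len(dropouts) / len(group)

def event_dropout_rate(people):
    """Share of last year's high school students who left without a credential."""
    group = [p for p in people if p.enrolled_in_hs_last_year]
    left = [p for p in group if not p.enrolled_now and not p.has_credential]
    return len(left) / len(group)

# HYPOTHETICAL records.
people = [
    Person(15, True, True, False),    # still in high school (too young for the status rate)
    Person(16, True, True, False),    # still in high school
    Person(17, True, True, False),    # still in high school
    Person(18, True, False, True),    # graduated during the year
    Person(17, True, False, False),   # left during the year: an event dropout
    Person(22, False, False, False),  # left years ago: counted only in the status rate
    Person(24, False, False, False),  # never attended high school: status rate only
    Person(20, False, True, True),    # enrolled in college
]
print(f"Status dropout rate: {status_dropout_rate(people):.1%}")
print(f"Event dropout rate: {event_dropout_rate(people):.1%}")
```

Here the status rate (3 of 7 people aged 16 to 24) exceeds the event rate (1 of 5 prior-year high school students) because it also counts the 22- and 24-year-olds, who left school, or never attended, before the year in question.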

As an example of how these rates can differ, the CCD event dropout rate from October 2011 to October 2012 was 3.4 percent, while the 2012 CPS status dropout rate for 16- to 24-year-olds was 6.6 percent. Because the status dropout rate covers a broader age range and a longer time period, it captures a larger percentage of high school non-completers. Both rates are important because they offer different types of information about high school dropouts, but they can also be complementary. For example, both the event dropout rate and the status dropout rate have declined since the 1990s.

See Trends in High School Dropout and Completion Rates in the United States for more information on current dropout statistics. 

Free or reduced price lunch: A proxy for poverty?

By Tom Snyder and Lauren Musu-Gillette

The percentage of students receiving free or reduced price lunch is often used as a proxy measure for the percentage of students living in poverty. While this percentage can provide some information about relative poverty, it should not be confused with the actual percentage of enrolled students who live in poverty. In 2012, just over half of public school children were eligible for free/reduced price lunches. In contrast, the actual poverty rate of public school students was 22 percent. Although the two measures are correlated, they differ in important ways, and the difference between them is growing.

As the largest federal program for elementary and secondary schools, the National School Lunch Program provided meals to more than 31 million children each school day in 2012. All lunches provided by the National School Lunch Program are considered subsidized to some extent because meal-service programs at schools must operate as non-profit programs. While all students at participating schools are eligible for regular-priced lunches through the National School Lunch Program, there are multiple ways in which a child can become eligible for a free/reduced price lunch. Traditionally, family income has been used to establish eligibility for free/reduced price lunch.

One way the percentage of students in poverty and the percentage eligible for free/reduced price lunch differ is that many students eligible for free/reduced price lunch fall above the federal poverty threshold. A student from a household with an income at or below 130 percent of the poverty threshold is eligible for free lunch. A student from a household with an income above 130 percent and at or below 185 percent of the poverty threshold is eligible for reduced price lunch.
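
The income rules above can be expressed as a small classification function. This sketch covers only the income test, not categorical eligibility or the Community Eligibility option. The function name, threshold, and incomes are hypothetical; the program itself applies annually published income eligibility guidelines.

```python
def frpl_eligibility_by_income(household_income, poverty_threshold):
    """Classify income-based school lunch eligibility (income test only).

    At or below 130% of the poverty threshold -> free lunch;
    above 130% and at or below 185% -> reduced price lunch;
    otherwise -> not eligible on the basis of income.
    """
    ratio = household_income / poverty_threshold
    if ratio <= 1.30:
        return "free"
    if ratio <= 1.85:
        return "reduced price"
    return "not income-eligible"

# HYPOTHETICAL threshold and household incomes, for illustration only.
threshold = 20000
for income in (18000, 26000, 30000, 40000):
    print(income, frpl_eligibility_by_income(income, threshold))
```

A household at exactly 130 percent of the threshold still qualifies for free lunch, and one at exactly 185 percent still qualifies for reduced price lunch, matching the "at or below" wording.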

In addition, some groups of children, such as foster children, children participating in Head Start and Migrant Education Programs, and children receiving services under the Runaway and Homeless Youth Act, are eligible for free/reduced price lunch. Also, under the Community Eligibility option, some non-poor children may be included in the program if their district decides that it would be more efficient, from an administrative or service delivery perspective, to provide free lunches to all children in the school. Thus, the percentage of students receiving free or reduced price lunch includes all students at or below 185 percent of the poverty threshold, plus some additional non-poor children who meet other eligibility criteria, plus other students in schools and districts that have exercised the Community Eligibility option. The result is a percentage that is more than double the official poverty rate.

Despite their limitations, free/reduced price lunch data are frequently used by education researchers as a proxy for school poverty, since this count is generally available at the school level while the poverty rate typically is not. Because free/reduced price lunch eligibility is derived from the federal poverty level, and is therefore highly related to it, the free/reduced price lunch percentage is useful to researchers from an analytic perspective.

In reports such as the Condition of Education, NCES has characterized a school as a high poverty school when more than 75 percent of its students are eligible for a free/reduced price lunch. In 2012-13, about 24 percent of students attended public schools that were classified as high poverty. Using this high poverty definition enables us to identify important differences among students: 45 percent of Black students and 45 percent of Hispanic students attended high poverty schools, compared to 8 percent of White students.


Percentage of public school students in low-poverty and high-poverty schools, by race/ethnicity: School year 2012-13

This bar chart shows the percentage of children attending low-poverty and high-poverty schools, by race/ethnicity, in school year 2012-13. The bars are in two groups: one for low-poverty schools and one for high-poverty schools. The first group shows that 21 percent of total children, 29 percent of White children, 7 percent of Black children, 8 percent of Hispanic children, 38 percent of Asian children, 12 percent of Pacific Islander children, 8 percent of American Indian/Alaska Native children, and 22 percent of children of two or more races were in low-poverty schools in 2012-13. The second group shows that 24 percent of total children, 8 percent of White children, 45 percent of Black children, 45 percent of Hispanic children, 16 percent of Asian children, 26 percent of Pacific Islander children, 36 percent of American Indian/Alaska Native children, and 17 percent of children of two or more races were in high-poverty schools in 2012-13.

NOTE: High-poverty schools are defined as public schools where more than 75.0 percent of the students are eligible for free or reduced-price lunch (FRPL), and low-poverty schools are defined as public schools where 25.0 percent or less of the students are eligible for FRPL. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Common Core of Data (CCD), "Public Elementary/Secondary School Universe Survey," 2012-13.
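
The high- and low-poverty definitions in the note above can be sketched as a classification rule, together with the student-weighted share behind statistics such as the finding that about 24 percent of students attended high poverty schools. The school records, function names, and the label for schools between the two cutoffs are hypothetical.

```python
def poverty_level(frpl_eligible, enrollment):
    """Classify a school by the percentage of its students eligible for FRPL.

    More than 75.0 percent -> high poverty; 25.0 percent or less -> low poverty.
    Schools in between get a placeholder label (hypothetical, not an NCES term).
    """
    pct = 100 * frpl_eligible / enrollment
    if pct > 75.0:
        return "high poverty"
    if pct <= 25.0:
        return "low poverty"
    return "neither high nor low poverty"

def share_of_students(schools, level):
    """Percentage of all enrolled students who attend schools at `level`."""
    total = sum(enrollment for _, enrollment in schools)
    in_level = sum(
        enrollment for eligible, enrollment in schools
        if poverty_level(eligible, enrollment) == level
    )
    return 100 * in_level / total

# HYPOTHETICAL schools as (FRPL-eligible students, total enrollment).
schools = [(400, 500), (100, 800), (300, 600), (760, 1000)]
for level in ("high poverty", "low poverty"):
    print(f"{level}: {share_of_students(schools, level):.1f}% of students")
```

The share is computed over students rather than schools, so a large high-poverty school counts more than a small one.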


One of the important limitations of the free/reduced price lunch count is that the change in eligibility requirements under the Community Eligibility option has meant that more children are qualifying for free/reduced price lunches. Between 2000-01 and 2012-13, the percentage of children eligible for a free/reduced price lunch increased from 38 percent to 50 percent, an increase of 12 percentage points. Over the same period, the percentage of public school children who lived in poverty increased by only 6 percentage points, from 17 to 23 percent.
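
The percentages in the paragraph above also quantify the widening gap between the two measures mentioned at the start of this post. The arithmetic below uses only figures reported in the text; the function name is hypothetical.

```python
def percentage_point_gaps(frpl, poverty):
    """Gap, in percentage points, between FRPL eligibility and poverty for each year."""
    return {year: frpl[year] - poverty[year] for year in frpl}

# Percentages of public school students, from the text.
frpl = {"2000-01": 38, "2012-13": 50}
poverty = {"2000-01": 17, "2012-13": 23}

for year, gap in percentage_point_gaps(frpl, poverty).items():
    print(f"{year}: FRPL {frpl[year]}%, poverty {poverty[year]}%, gap {gap} points")
```

The gap grew from 21 to 27 percentage points because the free/reduced price lunch percentage rose by 12 points while the poverty rate rose by 6.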

While free/reduced price lunch percentages can serve as a useful indicator of the relative numbers of poor children, they do not substitute for a measure of the level of child poverty or of changes in poverty rates over time. It is also important to keep in mind that neither free/reduced price lunch eligibility nor poverty should be considered a measure of socioeconomic status (SES), which reflects a broader spectrum of family characteristics (e.g., parental education and occupation) that may be related to student performance. Some NCES surveys already collect SES data, while others are investigating options for collecting better indicators of students’ SES. These efforts will be detailed in a future blog post.

For more information on recent changes to free/reduced price lunch eligibility data in EDFacts, see Free and Reduced-Price Lunch Eligibility Data in EDFacts: A White Paper on Current Status and Potential Changes.