Dropout Rates in the United States: 2004

NCES 2007-024
November 2006

Appendix A: Current Population Survey

The Current Population Survey (CPS) provides nationally representative data for the civilian, noninstitutionalized population of the United States. The survey is conducted in a sample of 50,000–60,000 households each month. Households are interviewed for 4 successive monthly interviews, are not interviewed for the next 8 months, and then are re-interviewed for the following 4 months. Typically, the 1st and the 5th interviews are conducted in person, with the remaining conducted via computer-assisted telephone interviewing. The sample frame is a complete list of dwelling-unit addresses at the time of the decennial Census updated by demolitions and new construction and field listings. The population surveyed excludes members of the armed forces, inmates of correctional institutions, and patients in long-term medical or custodial facilities; it is referred to as the civilian, noninstitutionalized population. For the October 2004 core CPS, the unweighted response rate was 92.3 percent, and the unweighted response rate for the school enrollment supplement was 96.0 percent. Because the school enrollment supplement is dependent on the core collection, the overall unweighted response rate for the supplement is the product of core and supplement response rates, or 88.6 percent in 2004.
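The overall supplement response rate described above is a simple product of the two stage rates. A minimal sketch follows; the function name is illustrative and not part of any CPS tooling.

```python
def overall_response_rate(core_rate: float, supplement_rate: float) -> float:
    """Overall rate for a supplement that depends on the core interview.

    Both arguments and the result are proportions between 0 and 1.
    """
    return core_rate * supplement_rate


# October 2004 unweighted rates quoted in the text.
overall_2004 = overall_response_rate(0.923, 0.960)
print(f"{overall_2004:.1%}")  # prints 88.6%
```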

An adult member of each household serves as the informant for that household, supplying basic monthly data for each member of the household. In addition, in October of each year, supplementary questions regarding school enrollment are asked about eligible household members 3 years old and over. Data are collected about individuals who attend or attended public schools or private schools, who were homeschooled, or who never attended school in the United States.

CPS data on educational attainment and enrollment status in the current year and prior year are used to identify dropouts and completers, and additional items in the CPS data are used to describe some of their basic characteristics. The CPS is the only source of national time series data on dropout and completion rates. However, because CPS collects no information on school characteristics and experiences, its usefulness in addressing dropout and completion issues is primarily for providing insights on who drops out and who completes. Sample sizes in the CPS collections do not support stable state-level estimates.

There are important differences in data collection procedures between the CPS and the CCD.² First, the CCD collection includes data only for public schools, whereas the CPS counts include students who were enrolled in either public or private schools, and some individuals who never enrolled in school in the United States. Second, the CCD collects data about students from a given state’s public school system. CPS data are based on where individuals currently reside, so the state of residence may differ from the state or country of earlier school attendance. Third, the CCD collection includes dropouts in grades 7 through 12 versus grades 10 through 12 in the CPS (although CCD event rates are reported for grades 9 through 12 in this report). Fourth, the CCD collection is based on administrative records rather than on individual self-reports from household surveys, as in the CPS.

Defining and Calculating Dropout and Completion Rates Using the CPS

Event dropout rates

The October Supplement to the CPS is the only national data source that currently can be used to estimate annual national dropout rates. As a measure of recent dropout experiences, the event dropout rate measures the proportion of students who dropped out over a 1-year interval.

The numerator of the event dropout rate for October 2004 is the number of persons 15 through 24 years old surveyed in 2004 who were enrolled in grades 10–12 in October 2003, were not enrolled in high school in October 2004, and who also did not complete high school (that is, had not received a high school diploma or an alternative credential such as an equivalency certificate) between October 2003 and October 2004.

The denominator of the event dropout rate for 2004 is the sum of the dropouts (that is, the numerator) and all persons 15 through 24 years old who were attending grades 10–12 in October 2003, who were still enrolled in October 2004, or who graduated or completed high school between October 2003 and October 2004.

The dropout interval is defined to include the previous summer (in this case, the summer of 2004) and the previous school year (in this case, the 2003–04 school year), so that once a grade is completed, the student is then at risk of dropping out of the next grade. Given that the data collection is tied to each person’s enrollment status in October of 2 consecutive years, any student who drops out and returns within the 12-month period is not counted as a dropout.
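The numerator and denominator defined above can be sketched as a weighted tabulation over person records. This is an illustration only: the record layout, field names, and sample values are hypothetical and do not correspond to actual CPS variable names.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    age: int
    enrolled_grades_10_12_prev_oct: bool  # in grades 10-12 the previous October
    enrolled_high_school_now: bool        # enrolled in high school this October
    completed_since_prev_oct: bool        # diploma or alternative credential
    weight: float                         # survey weight (hypothetical values)


def event_dropout_rate(records):
    """Return the weighted event dropout rate, in percent."""
    dropouts = 0.0
    at_risk = 0.0
    for r in records:
        # Only 15- through 24-year-olds who were in grades 10-12 a year ago.
        if not (15 <= r.age <= 24 and r.enrolled_grades_10_12_prev_oct):
            continue
        # Denominator: dropouts plus those still enrolled or who completed.
        at_risk += r.weight
        if not r.enrolled_high_school_now and not r.completed_since_prev_oct:
            dropouts += r.weight
    if at_risk == 0:
        raise ValueError("no eligible records")
    return 100.0 * dropouts / at_risk


sample_records = [
    PersonRecord(17, True, True, False, 1500.0),   # still enrolled
    PersonRecord(18, True, False, True, 1000.0),   # completed
    PersonRecord(16, True, False, False, 500.0),   # dropped out
    PersonRecord(14, True, False, False, 800.0),   # outside the age range
    PersonRecord(19, False, False, False, 700.0),  # not in grades 10-12 last year
]
event_rate = event_dropout_rate(sample_records)
```

Because the classification relies only on status in two consecutive Octobers, a student who left and returned within the year appears here as still enrolled, which matches the treatment described in the text.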

Status dropout rates

The status dropout rate reflects the percentage of individuals who are dropouts, regardless of when they dropped out. The numerator of the status dropout rate for 2004 is the number of individuals ages 16 through 24 years who, as of October 2004, had not completed high school and were not currently enrolled. The denominator is the total number of 16- through 24-year-olds in October 2004. Those who received a GED are not considered dropouts for this rate.
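A minimal sketch of this calculation, assuming each person is represented by a hypothetical tuple of age, completion status, enrollment status, and survey weight:

```python
def status_dropout_rate(people):
    """Return the weighted status dropout rate, in percent.

    `people` is an iterable of (age, completed_high_school, enrolled, weight)
    tuples. A GED or other equivalency credential counts as completion.
    """
    dropouts = 0.0
    population = 0.0
    for age, completed, enrolled, weight in people:
        if not 16 <= age <= 24:
            continue
        population += weight
        if not completed and not enrolled:
            dropouts += weight
    if population == 0:
        raise ValueError("no 16- through 24-year-olds in the data")
    return 100.0 * dropouts / population


status_sample = [
    (17, False, True, 1200.0),   # still enrolled: not a dropout
    (19, True, False, 1000.0),   # diploma or GED: not a dropout
    (22, False, False, 800.0),   # no credential, not enrolled: dropout
    (30, False, False, 900.0),   # outside the age range: ignored
]
status_rate = status_dropout_rate(status_sample)
```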

Status completion rates

The numerator of the high school status completion rate is the number of 18- through 24-year-olds who had received a high school diploma or an alternative credential such as an equivalency certificate. The denominator is the number of 18- through 24-year-olds who are no longer in elementary or secondary school.
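The completion rate differs from the status dropout rate mainly in its age range and in excluding people still in elementary or secondary school from the denominator. A short sketch under the same hypothetical tuple layout:

```python
def status_completion_rate(people):
    """Return the weighted status completion rate, in percent.

    `people` is an iterable of
    (age, has_credential, in_elementary_or_secondary, weight) tuples.
    """
    completers = 0.0
    base = 0.0
    for age, has_credential, in_k12, weight in people:
        # Keep 18- through 24-year-olds who have left elementary/secondary school.
        if not 18 <= age <= 24 or in_k12:
            continue
        base += weight
        if has_credential:
            completers += weight
    if base == 0:
        raise ValueError("no eligible 18- through 24-year-olds in the data")
    return 100.0 * completers / base


completion_sample = [
    (18, False, True, 500.0),    # still in high school: excluded
    (19, True, False, 1500.0),   # diploma
    (21, True, False, 1000.0),   # equivalency certificate
    (23, False, False, 500.0),   # no credential
    (17, True, False, 700.0),    # outside the age range
]
completion_rate = status_completion_rate(completion_sample)
```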

General Educational Development (GED) credentials and the status completion rate. Prior to 2000, editions of this series of dropout reports presented estimates of overall status completion rates and estimates of the method of completion: graduation by diploma or completion by taking an alternative exam such as the GED test. Examination of the changes in the CPS GED items in the October 2000 and subsequent surveys has indicated that GED estimates for 2000 and later years are not comparable with earlier data and may not be reliable estimates of high school equivalency completions (table A-1). Therefore, CPS estimates of the method of high school completion were not presented in some recent dropout reports. Because the method of high school completion remains of interest, an estimate of those who passed the GED exam was developed using GED Testing Service (GEDTS) data.

Table A-1. Number of 18- through 24-year-olds who received a GED, by data source: 1990 through 2004

Data on GED testing are collected by the GED Testing Service and reported in a series of annual statistical reports (American Council on Education, GED Testing Service 1990 through 2004). These reports indicate the number of people passing the GED test, by age group. Tabulation of data presented in GED Testing Service reports from 1998 through 2004 permits an estimate of the number of persons ages 18–24 in 2004 (the most recent year for which data are available) who ever passed the GED test. The source data from the GEDTS reports are presented in table A-2.

GED Testing Service reports present the number of GED passers³ in the United States and the percentage of passers in each age group for persons age 16 (or age 16 and under⁴), 17, 18, 19, 20–24, and higher age groups. The number of people in 2004 who were ages 18–24 and who passed the GED test equals the sum of the number of people who passed the GED test since 1998 at specific ages. The GEDTS reports present grouped data for persons ages 20–24. As a result, a count of the number of passers at each specific age from 20 through 24 is not available. Analysis of GEDTS data on GED passers from 2001 and 2002 indicates that approximately 8 percent of all GED passers are age 20, 6 percent are age 21, 5 percent are age 22, 4 percent are age 23, and 3 percent are age 24. It was assumed that the distribution of passers ages 20–24 follows this distribution for all years from 1998 through 2004.
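The cohort tabulation described above can be sketched as follows. This is one possible reading, not the report's actual procedure: it assumes each year's reported 20–24 share is apportioned across single ages in the ratio 8:6:5:4:3, and all function names and input structures are hypothetical.

```python
# Assumed relative weights for ages 20-24, taken from the 2001-02 analysis.
SPLIT_20_24 = {20: 8, 21: 6, 22: 5, 23: 4, 24: 3}


def single_year_shares(grouped):
    """Expand {16, 17, 18, 19, "20-24"} percentages to single ages 16-24."""
    shares = {age: grouped[age] for age in (16, 17, 18, 19)}
    total_weight = sum(SPLIT_20_24.values())
    for age, weight in SPLIT_20_24.items():
        # Assumption: apportion the grouped share in proportion to the weights.
        shares[age] = grouped["20-24"] * weight / total_weight
    return shares


def ever_passed_18_24(passers_by_year, pct_by_year, target_year=2004):
    """Estimate how many people ages 18-24 in target_year ever passed the GED.

    passers_by_year: {year: total number of passers}
    pct_by_year: {year: {16: pct, 17: pct, 18: pct, 19: pct, "20-24": pct}}
    """
    count = 0.0
    for year, passers in passers_by_year.items():
        shares = single_year_shares(pct_by_year[year])
        for age_at_pass, pct in shares.items():
            # Age this person would have reached by the target year.
            age_now = age_at_pass + (target_year - year)
            if 18 <= age_now <= 24:
                count += passers * pct / 100.0
    return count
```

For example, someone who passed at age 16 in 1998 is 22 in 2004 and is counted, while someone who passed at age 19 in 1998 is 25 in 2004 and is not.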

Table A-2. Percentage distribution of recipients of a GED, by age group 16 and above: 1998 through 2004

Data considerations for CPS

Over the last several decades, data collection procedures, items, and data preparation processes have changed in the CPS. Some of these changes were introduced to ensure CPS estimates were comparable to decennial Census collections, some were introduced to reflect changes in the concepts under study, some were introduced to improve upon measures, and some were introduced to develop measures for new phenomena. The effects of the various changes have been studied to help ensure they did not disrupt trend data from CPS. For a summary of these studies, please see appendix C of Dropout Rates in the United States: 2001 (Kaufman, Alt, and Chapman 2004).

CPS data include weights to help make estimates from the data representative of the civilian, noninstitutionalized population in the United States. These weights are based on decennial Census data that are adjusted for births, deaths, immigration, emigration, etc., over time.

Imputation for item nonresponse in CPS. For many key items in the October CPS, the U.S. Census Bureau imputes data for cases with missing data due to item nonresponse. However, the Census Bureau did not impute data regarding the method of high school completion before 1997. Special imputations were conducted for these items using a sequential hot deck procedure implemented through the PROC IMPUTE computer program developed by the American Institutes for Research. Three categories of age, two categories of race, two categories of sex, and two categories of citizenship were used as imputation cells.
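To illustrate the general idea of a sequential hot deck, the sketch below fills a missing item with the most recently reported value from the same imputation cell. It is a generic illustration, not the PROC IMPUTE program; the field names are hypothetical, and records with no earlier donor in their cell are simply left missing.

```python
def sequential_hot_deck(records, item, cell_keys):
    """Impute missing `item` values from the last donor seen in the same cell.

    records: list of dicts processed in file order; a missing value is None.
    cell_keys: names of the fields that define an imputation cell.
    """
    last_donor = {}
    imputed = []
    for record in records:
        record = dict(record)  # copy so the input is not modified
        cell = tuple(record[key] for key in cell_keys)
        if record[item] is None:
            if cell in last_donor:
                record[item] = last_donor[cell]
        else:
            # A reported value becomes the current donor for its cell.
            last_donor[cell] = record[item]
        imputed.append(record)
    return imputed


cells = ("age_group", "race", "sex", "citizen")
hot_deck_sample = [
    {"age_group": 1, "race": 1, "sex": 1, "citizen": 1, "method": "diploma"},
    {"age_group": 1, "race": 1, "sex": 1, "citizen": 1, "method": None},
    {"age_group": 2, "race": 1, "sex": 2, "citizen": 1, "method": None},
    {"age_group": 2, "race": 1, "sex": 2, "citizen": 1, "method": "GED"},
]
hot_deck_result = sequential_hot_deck(hot_deck_sample, "method", cells)
```

With three age categories and two categories each of race, sex, and citizenship, the cell definition described in the text would give 3 × 2 × 2 × 2 = 24 cells.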

Age and grade ranges in CPS estimates. The age and grade ranges used in the CPS measures of dropout rates are constrained by available data. Ideally, the estimates would be able to capture reliable estimates of children in grades as low as grade 9. However, the CPS asks the question about enrollment the previous October only about individuals age 15 and older. Many 9th-graders are younger than age 15, so 10th grade was selected as the lower boundary of grade ranges in the event dropout rate.

Accuracy of CPS estimates. CPS estimates in this report are derived from samples and are subject to two broad classes of error—sampling and nonsampling error. Sampling errors occur because the data are collected from a sample of a population rather than from the entire population. Estimates based on a sample will differ somewhat from the values that would have been obtained from a universe survey using the same instruments, instructions, and procedures. Nonsampling errors come from a variety of sources and affect all types of surveys, universe as well as sample surveys. Examples of sources of nonsampling error include design, reporting, and processing errors and errors due to nonresponse. The effects of nonsampling errors are more difficult to evaluate than those that result from sampling variability. As much as possible, procedures are built into surveys in order to minimize nonsampling errors.

The standard error is a measure of the variability due to sampling when estimating a parameter. It indicates how much variance there is in the population of possible estimates of a parameter for a given sample size. Standard errors can be used as a measure of the precision expected from a particular sample. The probability that a sample statistic would differ from a population parameter by less than the standard error is about 68 percent. The chances that the difference would be less than 1.65 times the standard error are about 90 out of 100; and that the difference would be less than 1.96 times the standard error, about 95 out of 100.
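The probabilities quoted above follow from a normal approximation to the sampling distribution, which is an assumption of this sketch. They can be checked with the standard library:

```python
import math


def coverage_probability(k: float) -> float:
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2.0))


for multiple in (1.0, 1.65, 1.96):
    print(f"within {multiple} standard errors: {coverage_probability(multiple):.3f}")
```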

Standard errors for percentages and number of persons based on CPS data were calculated using the following formulas:

Percentage:

$$se = \sqrt{\left(\frac{b}{N}\right)(p)(100 - p)}$$

where
p = the percentage (0 < p < 100),
N = the population on which the percentage is based, and
b = the regression parameter based on a generalized variance formula and associated with the characteristic.
For 2004, b is equal to 2,131 for the total or White population; 2,410 for the Black population; 2,744 for the Hispanic population; and 2,410 for the Asian/Pacific Islander and “more than one race” populations ages 14 through 24. The b values for regional estimates are 0.90 for the Northeast, 0.93 for the Midwest, 1.14 for the South, and 1.14 for the West.
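The percentage formula above translates directly into code. In this sketch the b value for the total population is taken from the text, while the percentage and population base in the example call are hypothetical.

```python
import math

B_TOTAL_OR_WHITE = 2131  # generalized variance parameter quoted in the text


def se_percentage(p: float, n: float, b: float) -> float:
    """Standard error of a percentage p (0 < p < 100) based on population n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    return math.sqrt((b / n) * p * (100.0 - p))


# Hypothetical inputs: a 10 percent rate on a base of 36 million persons.
example_se_pct = se_percentage(10.0, 36_000_000, B_TOTAL_OR_WHITE)
```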

CPS documentation explains the purpose and process for the generalized variance parameter:

Experience has shown that certain groups of estimates have similar relations between their variances and expected values. Modeling or generalizing may provide more stable variance estimates by taking advantage of these similarities. The generalized variance function is a simple model that expresses the variance as a function of the expected value of a survey estimate. The parameters of the generalized variance function are estimated using direct replicate variances. (Cahoon 2005, p. 7)

Number of persons:

$$se = \sqrt{(bx)\left(1 - \frac{x}{T}\right)}$$

where
x = the number of persons (i.e., dropouts),
T = population in the category (e.g., Blacks ages 16 through 24), and
b = as above.
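A matching sketch for the number-of-persons formula follows. The function name and the example counts are hypothetical; only the b value comes from the text.

```python
import math


def se_count(x: float, total: float, b: float) -> float:
    """Standard error of an estimated number of persons x out of `total`."""
    if not 0 <= x <= total:
        raise ValueError("x must lie between 0 and the category total")
    return math.sqrt(b * x * (1.0 - x / total))


# Hypothetical inputs: 500,000 dropouts in a category of 5 million persons,
# using the b value the text gives for the Black population.
example_se_count = se_count(500_000, 5_000_000, 2410)
```

The term (1 - x/T) shrinks the standard error as the estimate approaches the size of the whole category.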

Statistical Procedures for Analyzing CPS-Based Estimates

Because CPS data are collected from samples of the population, statistical tests are employed to determine whether differences between estimates are larger than would be expected from sampling error alone. The descriptive comparisons in this report were tested using Student’s t statistic. Differences between estimates are tested against the probability of a type I error, or significance level. The significance levels were determined by calculating the Student’s t values for the differences between each pair of means or proportions and comparing these with published tables of significance levels for two-tailed hypothesis testing.

Student’s t values may be computed to test the difference between percentages with the following formula:

$$t = \frac{P_1 - P_2}{\sqrt{se_1^2 + se_2^2}}$$

where P1 and P2 are the estimates to be compared and se1 and se2 are their corresponding standard errors.
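A short sketch of this comparison follows. The default critical value of 1.96 is an assumption corresponding to a two-tailed test at the .05 level with large samples; the report itself refers to published tables.

```python
import math


def t_statistic(p1: float, se1: float, p2: float, se2: float) -> float:
    """Student's t for the difference between two independent estimates."""
    return (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)


def differs_at_05(p1, se1, p2, se2, critical_value=1.96):
    """True if the two estimates differ at the assumed critical value."""
    return abs(t_statistic(p1, se1, p2, se2)) > critical_value


# Hypothetical percentages and standard errors.
print(differs_at_05(12.0, 0.5, 9.0, 0.5))
```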

Several points should be considered when interpreting t statistics. First, comparisons based on large t statistics may appear to merit special attention. This can be misleading since the magnitude of the t statistic is related not only to the observed differences in means or proportions but also to the number of respondents in the specific categories used for comparison. Hence, a small difference compared across a large number of respondents would produce a large t statistic.

Second, there is a possibility that one can report a “false positive” or type I error. In the case of a t statistic, this false positive would result when a difference measured with a particular sample showed a statistically significant difference when there was no difference in the underlying population. Statistical tests are designed to control this type of error. These tests are set to different levels of tolerance or risk known as alphas. The alpha level of .05 selected for findings in this report indicates that a difference of a certain magnitude or larger would be produced no more than one time out of twenty when there was no actual difference in the quantities in the underlying population. When p values are smaller than the .05 level, the null hypothesis that there is no difference between the two quantities is rejected. Finding no difference, however, does not necessarily imply the values are the same or equivalent.

Third, the probability of a type I error increases with the number of comparisons being made. Bonferroni adjustments are sometimes used to correct for this problem. Bonferroni adjustments do this by reducing the alpha level for each individual test in proportion to the number of tests being done. However, while Bonferroni adjustments help avoid type I errors, they increase the chance of making type II errors. Type II errors occur when there actually is a difference present in a population, but a statistical test applied to estimates from a sample indicates that no difference exists. Prior to the 2001 report in this series, Bonferroni adjustments were employed. Because of changes in NCES reporting standards, Bonferroni adjustments are not employed in this report.
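The trade-off described above can be made concrete with a small sketch. The familywise calculation assumes the tests are independent, which is a simplification; the function names are illustrative.

```python
def familywise_error(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-test alpha after a Bonferroni adjustment."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return family_alpha / n_comparisons


# With 10 unadjusted comparisons at .05, the familywise risk is about 40 percent.
print(round(familywise_error(0.05, 10), 3))
# The Bonferroni adjustment tests each comparison at a stricter level instead.
print(bonferroni_alpha(0.05, 10))
```

The stricter per-test level is what raises the risk of type II errors: real differences must be larger to be declared significant.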

Regression analysis was used to test for trends across age groups and over time. Regression analysis assesses the degree to which one variable (the dependent variable) is related to one or more other variables (the independent variables). The estimation procedure most commonly used in regression analysis is ordinary least squares (OLS). When studying changes in rates over time, the rates were used as dependent measures in the regressions, with a variable representing time and a dummy variable controlling for changes in the educational attainment item in 1992 (=0 for years 1972 to 1991, =1 after 1992) used as independent variables. When slope coefficients were positive and significant, rates increased over time. When slope coefficients were negative and significant, rates decreased over time. Because of varying sample sizes over time, some of the observations were less reliable than others (i.e., some years’ standard errors were larger than those for other years). In such cases, OLS estimation procedures do not apply, and it is necessary to modify the regression procedures to obtain unbiased regression parameters. Each variable in the analysis was transformed by dividing by the standard error of the relevant year’s rate. The new dependent variable was then regressed on the new time variable and new editing-change dummy variable. All statements about trend changes in this report are statistically significant at the .05 level.
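The transformation described above amounts to weighted least squares. The sketch below is a simplified illustration rather than the report's actual procedure: it assumes the constant term is also divided by the standard error, measures time in years since the first observation, sets the dummy to 1 from 1992 onward, and omits the significance tests on the coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def weighted_trend(years, rates, std_errors, change_year=1992):
    """Return (intercept, slope, shift) from the transformed regression."""
    first_year = min(years)
    rows, ys = [], []
    for year, rate, se in zip(years, rates, std_errors):
        # Assumption: the dummy equals 1 in the change year and afterward.
        dummy = 1.0 if year >= change_year else 0.0
        # Divide every variable, including the constant, by that year's se.
        rows.append([1.0 / se, (year - first_year) / se, dummy / se])
        ys.append(rate / se)
    k = 3
    # Ordinary least squares on the transformed data via the normal equations.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    intercept, slope, shift = solve_linear_system(xtx, xty)
    return intercept, slope, shift
```

Dividing by the standard error gives more precisely estimated years more influence on the fitted slope.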

2 Data in the CCD are based on data from all public schools. Data in the CPS are collected from a sample of households and not the full universe of households. As a result, CPS data have sampling errors associated with estimates whereas CCD data do not. For more information on CPS sampling errors and how to interpret them, see the section “Statistical Procedures for Analyzing CPS-Based Estimates” later in the appendix.

3 Passing the GED is a good but imperfect indicator of receiving a high school equivalency credential. Some people who pass the test may not receive the credential because they do not file necessary paperwork or pay necessary fees. People may also leave the country, die, or receive a regular high school diploma after passing the GED test.

4 The lowest standard minimum age for testing in any state is 16. Some jurisdictions grant exceptions to the minimum age on a case-by-case basis. GED Testing Service reports from 1996–98 group the small number of individuals under age 16 as 16 years old for reporting purposes.