Because CPS data are collected from samples of the population, statistical tests
are used to determine whether differences between estimates are larger than would
be expected from sampling error alone.^{8} The
descriptive comparisons in this report were tested using Student's *t* statistic.
Each difference was tested at a fixed significance level, that is, a fixed probability of a type I error.^{9} Significance
was determined by calculating Student's *t* values for the differences
between each pair of means or proportions and comparing these with published tables
of critical values for two-tailed hypothesis testing.

Student's *t* values may be computed to test the difference between percentages
with the following formula:

$$t = \frac{P_1 - P_2}{\sqrt{se_1^2 + se_2^2}}$$

where *P*_{1} and *P*_{2} are the estimates to be
compared and *se*_{1} and *se*_{2} are their corresponding
standard errors.
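
As a minimal sketch, the *t* statistic for two independent estimates is usually computed as their difference divided by the square root of the sum of their squared standard errors. The function name `t_statistic` and the numbers below are hypothetical and chosen only for illustration; they are not estimates from this report.

```python
import math


def t_statistic(p1, se1, p2, se2):
    """Student's t for the difference between two independent estimates.

    p1, p2   -- the estimates being compared (e.g., percentages)
    se1, se2 -- their corresponding standard errors
    """
    return (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical inputs: 10.0 percent (se 0.6) versus 8.0 percent (se 0.8).
t = t_statistic(10.0, 0.6, 8.0, 0.8)
print(round(t, 2))  # 2.0
```

The same 2-point difference would yield a larger *t* value if the standard errors were smaller, which is why the size of *t* alone does not indicate the size of the difference.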

Several points should be considered when interpreting *t* statistics. First,
comparisons based on large *t* statistics may appear to merit special attention.
This can be misleading since the magnitude of the *t* statistic is related
not only to the observed differences in means or proportions but also to the number
of respondents in the specific categories used for comparison. Hence, even a small difference
compared across a large number of respondents can produce a large *t* statistic,
because standard errors shrink as the number of respondents grows.

Second, there is a possibility of reporting a "false positive," or type I
error. In the case of a *t* statistic, a false positive occurs when
a difference measured in a particular sample is statistically significant
although there is no difference in the underlying population. Statistical
tests are designed to control this type of error. The level of risk that is
tolerated is known as alpha. The alpha level of .05 selected for
findings in this report indicates that a difference of a certain magnitude or larger
would be produced no more than 1 time out of 20 when there was no actual difference
in the quantities in the underlying population. When the *p* value associated with
a *t* statistic is smaller than .05, the null hypothesis that there is no difference between the
two quantities is rejected. Failing to reject the null hypothesis, however, does not necessarily
imply that the values are the same or equivalent.
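
The decision rule at the .05 level can be sketched as follows. This example assumes the standard normal distribution as a large-sample stand-in for Student's *t* distribution, which is an approximation rather than the report's documented procedure; the function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_tailed_p(t):
    """Two-tailed p value for t under a large-sample normal approximation."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(t)))


def critical_value(alpha=0.05):
    """Smallest |t| that is significant at the given two-tailed alpha."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def is_significant(t, alpha=0.05):
    """True when the null hypothesis of no difference is rejected."""
    return two_tailed_p(t) < alpha


print(round(critical_value(), 2))  # 1.96
print(is_significant(2.0))         # True
print(is_significant(1.5))         # False
```

With small samples, the critical value from Student's *t* distribution is larger than 1.96, so the normal approximation would be too lenient there.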

Third, the probability of a type I error increases with the number of comparisons being made. Bonferroni adjustments are sometimes used to correct for this problem. Bonferroni adjustments do this by reducing the alpha level for each individual test in proportion to the number of tests being done. However, while Bonferroni adjustments help avoid type I errors, they increase the chance of making type II errors. Type II errors occur when there actually is a difference present in a population, but a statistical test applied to estimates from a sample indicates that no difference exists. Prior to the 2001 report in this series, Bonferroni adjustments were employed. Because of changes in NCES reporting standards, Bonferroni adjustments are not employed in this report.
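
To illustrate why the number of comparisons matters, the sketch below computes the chance of at least one type I error across several tests, with and without a Bonferroni adjustment. It assumes the tests are independent, which is a simplification, and the function names are hypothetical. It illustrates the concept only; as noted above, Bonferroni adjustments are not employed in this report.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one type I error across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after a Bonferroni adjustment."""
    return alpha / n_tests


n_tests = 10  # hypothetical number of comparisons

# Unadjusted: each test run at alpha = .05.
print(round(familywise_error_rate(0.05, n_tests), 4))  # 0.4013

# Adjusted: each test run at alpha = .05 / 10.
adjusted = bonferroni_alpha(0.05, n_tests)
print(round(familywise_error_rate(adjusted, n_tests), 4))  # 0.0489
```

The stricter per-test alpha keeps the overall type I error rate near .05, but it also makes each individual test less likely to detect a real difference, which is the type II error trade-off described above.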

Regression analysis was used to test for trends across age groups and over time.
Regression analysis assesses the degree to which one variable (the dependent variable)
is related to one or more other variables (the independent variables). The estimation
procedure most commonly used in regression analysis is ordinary least squares (OLS).
When studying changes in rates over time, the rates were used as dependent measures
in the regressions. The independent variables were a variable representing time and
a dummy variable controlling for the change in the educational attainment item in 1992
(= 0 for years 1972 to 1991, = 1 for 1992 and later). When slope coefficients were positive
and significant, rates increased over time. When slope coefficients were negative
and significant, rates decreased over time. Because of varying sample sizes over
time, some of the observations were less reliable than others (i.e., some years'
standard errors were larger than those for other years). In such cases, the equal-variance
assumption behind OLS does not hold, and the regression procedures must be modified
to obtain efficient parameter estimates and valid significance tests. This is accomplished by using weighted
least squares regressions.^{10} Each variable
in the analysis was transformed by dividing by the standard error of the relevant
year's rate. The new dependent variable was then regressed on the new time variable,
a variable equal to 1 divided by the standard error of the year's rate (the transformed
intercept term), and the new dummy variable for the 1992 change. All statements about trend changes in this report are statistically
significant at the .05 level.
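
The transformation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the report's actual code: the function names are hypothetical, time is measured in years since the first year in the series, the dummy variable is assumed to equal 1 from 1992 onward, and the sketch returns only coefficient estimates, not their significance tests.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def wls_trend(years, rates, std_errors, change_year=1992):
    """Weighted least squares via division by each year's standard error.

    Returns [intercept, slope per year, item-change dummy coefficient].
    """
    base_year = min(years)
    rows, ys = [], []
    for year, rate, se in zip(years, rates, std_errors):
        dummy = 1.0 if year >= change_year else 0.0  # assumed coding
        # Transformed regressors: 1/se (intercept), time/se, dummy/se.
        rows.append([1.0 / se, (year - base_year) / se, dummy / se])
        ys.append(rate / se)  # transformed dependent variable
    k = 3
    # Normal equations for OLS on the transformed data, no extra intercept.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical series: rates fall 0.1 point per year, with a 0.5 point
# shift from 1992 onward; standard errors vary by year.
years = list(range(1988, 1996))
std_errors = [0.5, 0.4, 0.6, 0.3, 0.5, 0.7, 0.4, 0.6]
rates = [6.0 - 0.1 * (y - 1988) + (0.5 if y >= 1992 else 0.0) for y in years]
intercept, slope, shift = wls_trend(years, rates, std_errors)
print(round(intercept, 3), round(slope, 3), round(shift, 3))
```

Dividing every variable by the standard error gives years with more precise estimates more influence on the fitted trend. A negative slope, as in this made-up series, would correspond to rates decreasing over time.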

^{8} The CCD and GEDTS data are universe data
collections and therefore do not require statistical testing such as that used for
estimates from the CPS sample survey data.

^{9} A type I error occurs when one concludes
that a difference observed in a sample reflects a true difference in the population
from which the sample was drawn, when no such difference is present. It is sometimes
referred to as a "false positive."

^{10} For a general discussion of weighted least
squares analysis, see Gujarati, D., *Basic Econometrics*, 2nd ed. New York:
McGraw-Hill, 1998.