The descriptive comparisons were tested in this report using Student’s t statistic. Differences between estimates are tested against the probability of a Type I error,16 or significance level. The significance levels were determined by calculating the Student’s t values for the differences between each pair of means or proportions and comparing these with published tables of significance levels for two-tailed hypothesis testing.

Student’s t values may be computed to test the difference between estimates with the following formula:

$$t = \frac{E_1 - E_2}{\sqrt{se_1^2 + se_2^2}} \tag{1}$$

where E1 and E2 are the estimates to be compared and se1 and se2 are their corresponding standard errors. This formula is valid only for independent estimates. When estimates are not independent, a covariance term must be added to the formula:

$$t = \frac{E_1 - E_2}{\sqrt{se_1^2 + se_2^2 - 2r\,se_1 se_2}} \tag{2}$$

where r is the correlation between the two estimates.17 This formula is used when comparing two percentages from a distribution that adds to 100. If the comparison is between the mean of a subgroup and the mean of the total group, the following formula is used:
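
$$t = \frac{E_{\text{sub}} - E_{\text{tot}}}{\sqrt{se_{\text{sub}}^2 + se_{\text{tot}}^2 - 2p\,se_{\text{sub}}^2}} \tag{3}$$

where E_sub and E_tot are the estimates for the subgroup and the total group, se_sub and se_tot are their standard errors, and p is the proportion of the total group contained in the subgroup.18 The estimates, standard errors, and correlations can all be obtained from the DAS.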
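
The three t formulas above can be sketched in Python as follows. This is a minimal illustration, not the DAS's own computation: the function names and the input values in the usage lines are hypothetical, and in practice the estimates, standard errors, and correlations would be taken from the DAS.

```python
import math


def t_independent(e1, e2, se1, se2):
    """Equation 1: t statistic for two independent estimates."""
    return (e1 - e2) / math.sqrt(se1**2 + se2**2)


def t_correlated(e1, e2, se1, se2, r):
    """Equation 2: adds the covariance term -2 * r * se1 * se2 for correlated estimates."""
    return (e1 - e2) / math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)


def t_subgroup_vs_total(e_sub, e_tot, se_sub, se_tot, p):
    """Equation 3: a subgroup estimate vs. the total group that contains it.

    p is the proportion of the total group contained in the subgroup.
    """
    return (e_sub - e_tot) / math.sqrt(se_sub**2 + se_tot**2 - 2 * p * se_sub**2)


# Hypothetical inputs, for illustration only (not values from this report).
print(t_independent(48.0, 42.0, 1.5, 2.0))               # two independent percentages
print(t_correlated(48.0, 42.0, 1.5, 2.0, r=-0.5))        # two shares of one 100 percent distribution
print(t_subgroup_vs_total(60.0, 55.0, 2.0, 1.0, p=0.25))  # subgroup vs. total group
```

With r = 0, equation 2 reduces to equation 1. A negative correlation, as arises between two shares of the same distribution, enlarges the denominator and so shrinks t.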

There are hazards in reporting statistical tests for each comparison. First, comparisons based on large t statistics may appear to merit special attention. This can be misleading since the magnitude of the t statistic is related not only to the observed differences in means or percentages but also to the number of students in the specific categories used for comparison. Hence, a small difference compared across a large number of students would produce a large t statistic.
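
The effect of group size can be seen with a short calculation. The sketch below assumes simple random sampling, so that the standard error of a proportion is sqrt(p(1 - p)/n) with n students in each group; the report's actual standard errors come from the DAS and reflect its sample design, so this is illustrative only. The function name and group sizes are hypothetical.

```python
import math


def t_two_proportions(p1, p2, n):
    """t for two independent proportions, each estimated from n students.

    Assumes simple random sampling: se = sqrt(p * (1 - p) / n).
    """
    se1 = math.sqrt(p1 * (1 - p1) / n)
    se2 = math.sqrt(p2 * (1 - p2) / n)
    return (p1 - p2) / math.sqrt(se1**2 + se2**2)


# The same 2-point difference (52 vs. 50 percent) at three hypothetical group sizes.
for n in (500, 5_000, 50_000):
    print(f"n = {n:>6}: t = {t_two_proportions(0.52, 0.50, n):.2f}")
```

Holding the difference fixed, t grows with the square root of n: the identical 2-point gap gives t of roughly 0.6, 2.0, and 6.3 as the groups grow a hundredfold.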

A second hazard in reporting statistical tests for each comparison arises when making multiple comparisons among categories of an independent variable. For example, when making paired comparisons among racial/ethnic groups, the probability of a Type I error for these comparisons taken as a group is larger than the probability for a single comparison. When more than one difference between groups of related characteristics, or “families,” is tested for statistical significance, one must apply a standard that ensures a given level of significance for all of those comparisons taken together.
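
To see why the error rate for a family grows, suppose, as a simplifying assumption, that each of k comparisons is an independent test at the .05 level (pairwise comparisons that share a group are not actually independent). The chance of at least one Type I error is then 1 - 0.95^k. The function name below is hypothetical.

```python
def familywise_error(k, alpha=0.05):
    """P(at least one Type I error) across k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 6, 10):
    print(f"k = {k:>2}: {familywise_error(k):.3f}")
```

Under this assumption, making all 10 pairwise race/ethnicity comparisons at .05 each would give about a .40 chance of at least one spurious “significant” result.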

For comparisons in this analysis that were not based on a hypothesis being tested, significant results were reported only when p < .05/k for a particular pairwise comparison, where that comparison was one of k tests within a family. This procedure guarantees both that each individual comparison has p < .05 and that, for the k comparisons within a family, the significance levels of all the comparisons sum to .05, which limits the probability of at least one Type I error in the family to no more than .05.19
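
The decision rule above is the Bonferroni adjustment. A minimal sketch follows; the function name is hypothetical and the p values in the usage lines are invented for illustration.

```python
import math


def passes_family_test(p_value, k, alpha=0.05):
    """Report a comparison only if p < alpha / k (Bonferroni adjustment).

    k is the number of comparisons in the family that contains this test.
    """
    return p_value < alpha / k


# The per-comparison levels alpha / k add back up to alpha for the whole family.
k = 10
print(math.isclose(sum([0.05 / k] * k), 0.05))  # True

print(passes_family_test(0.004, k=10))  # True: below .05/10 = .005
print(passes_family_test(0.020, k=10))  # False: below .05 but not below .005
print(passes_family_test(0.020, k=1))   # True: a single comparison needs only p < .05
```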

For example, in a comparison of males and females, only one comparison is possible (males versus females). In this family, k=1, and the comparison can be evaluated without adjusting the significance level. When students are divided into five racial/ethnic categories (American Indian, Asian/Pacific Islander, Black, Hispanic, and White) and all possible comparisons are made, then k=10 and the significance level of each test must be p < .05/10, or p < .005. The formula for calculating family size (k) is as follows:

$$k = \frac{j(j-1)}{2} \tag{4}$$

where j is the number of categories for the variable being tested. In the case of race/ethnicity, there are five groups, so substituting 5 for j in equation 4 gives the following family size:

$$k = \frac{5(5-1)}{2} = 10$$

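Equation 4 counts the unordered pairs that can be formed from j categories. The sketch below (the function name is hypothetical) checks the formula against a direct enumeration of the five race/ethnicity pairs using Python's itertools.combinations.

```python
from itertools import combinations


def family_size(j):
    """Number of pairwise comparisons among j categories: k = j(j - 1) / 2."""
    return j * (j - 1) // 2


groups = ["American Indian", "Asian/Pacific Islander", "Black", "Hispanic", "White"]
pairs = list(combinations(groups, 2))

print(family_size(len(groups)), len(pairs))  # both count the 10 pairwise comparisons
```
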
For this report, tests of comparisons in which a specific hypothesis was not being tested were adjusted to account for family size. Most of the comparisons made were between two groups (e.g., between men and women or between two cohorts) and, therefore, had a k of 1. Comparisons among the three NPSAS studies had a k of 3. Comparisons among categories were also adjusted to take into account multiple comparisons. For example, when comparing the proportion of women among students who did not work and among students who worked more than 34 hours in 1999–2000 (59 vs. 53 percent, table 2), the work variable has four categories (did not work, 1–24 hours, 25–34 hours, more than 34 hours), so j is 4 and k is 6.
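
Putting the pieces together, the sketch below derives k for each family mentioned in this section and the corresponding per-comparison significance level. It also converts each level to a two-tailed critical value using the standard normal distribution from Python's statistics module. That conversion is an assumption: it is a large-sample approximation to the Student's t tables the report used, so the critical values are illustrative. Function and dictionary names are hypothetical.

```python
from statistics import NormalDist


def family_size(j):
    """k = j(j - 1) / 2 pairwise comparisons among j categories."""
    return j * (j - 1) // 2


def critical_value(k, alpha=0.05):
    """Two-tailed critical value at level alpha / k.

    Uses the standard normal as a large-sample stand-in for Student's t.
    """
    return NormalDist().inv_cdf(1 - (alpha / k) / 2)


# Families described in this section, keyed to their number of categories j.
families = {
    "men vs. women": 2,
    "three NPSAS studies": 3,
    "hours worked (4 categories)": 4,
    "race/ethnicity (5 categories)": 5,
}

for name, j in families.items():
    k = family_size(j)
    print(f"{name}: k = {k}, p < {0.05 / k:.4f}, |t| > {critical_value(k):.2f}")
```

The larger the family, the stricter the per-comparison threshold, so a difference that would pass as a lone comparison can fail once it is one of 6 or 10 tests.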