In NAEP, a t test for independent samples is used to compare estimates from two populations unless the two groups overlap, that is, share some of the same sampled students. The goal of the t test is to determine the probability that average estimates from two samples come from a single population (with a single, common average). If this probability is small, then the two sample average estimates are said to be significantly different.
Let $A_i$ be the statistic in question (e.g., a mean for group i) and let $S_{A_i}$ be the jackknife standard error of the statistic. The text in the reports identified the means or proportions for groups i and j as being different if:

$$\frac{|A_i - A_j|}{\sqrt{S_{A_i}^2 + S_{A_j}^2}} \geq T_\alpha$$
where $T_\alpha$ is the (1 - α) percentile of the t distribution with df degrees of freedom. In some cases where more than two groups or jurisdictions are compared, multiple comparison procedures are applied. The adjustment for multiple comparisons is based on the Benjamini and Hochberg (1995) procedure for controlling the false discovery rate (FDR).
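The decision rule and the FDR adjustment described above can be sketched in Python as follows. This is a minimal illustration, not NAEP's production code: the function names (`t_statistic`, `is_different`, `benjamini_hochberg`), the example estimates, and the p-values are all hypothetical, and the critical value is passed in by the caller rather than computed from the degrees of freedom.

```python
import math


def t_statistic(a_i, a_j, se_i, se_j):
    """Absolute difference of two estimates divided by the standard error
    of the difference, assuming the two samples are independent."""
    return abs(a_i - a_j) / math.sqrt(se_i ** 2 + se_j ** 2)


def is_different(a_i, a_j, se_i, se_j, critical_value):
    """Apply the decision rule: flag the pair when the statistic reaches
    the critical value (supplied by the caller for the chosen alpha and df)."""
    return t_statistic(a_i, a_j, se_i, se_j) >= critical_value


def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the input order, marking which
    comparisons are declared significant at the given FDR level.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda k: p_values[k])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every comparison ranked at or below the cutoff is significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical example: two group means with jackknife standard errors.
stat = t_statistic(250.0, 245.0, 1.5, 2.0)  # 5 / sqrt(2.25 + 4.0) = 2.0
# Hypothetical p-values from four pairwise comparisons.
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.20], fdr=0.05)
```

Because the procedure is step-up, a comparison whose p-value misses its own rank threshold can still be declared significant if a larger p-value further down the ordering meets its threshold.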
Many of the group comparisons explicitly discussed in the reports involved mutually exclusive sets of students. Examples include comparisons of the average scale scores for male and female students; for White and Hispanic students; for students attending schools in central city locations and those in urban fringe or large-town locations; and for students who reported watching six or more hours of television each night and those who reported watching less than one hour each night.
The current procedures used to complete most statistical tests for NAEP require the assumption that the data being compared are from independent samples. Because of the sampling design, in which primary sampling units (PSUs), schools, and students within schools are randomly sampled, the data from mutually exclusive sets of students may not be strictly independent. Therefore, the significance tests employed are, in many cases, only approximate. Another procedure, one that does not assume independence, could have been conducted. However, a more conservative stance is taken, with t tests for partly overlapping groups used when dependencies in the sample must be addressed.
A comparison of the standard errors using the independence assumption and the correlated group assumption was made using NAEP data. The estimated standard error of the difference based on independence assumptions was approximately 10 percent larger than the more complicated estimate based on correlated groups. In almost every case, the correlation of NAEP data across groups was positive. Because, in NAEP, significance tests based on assumptions of independent samples are only somewhat conservative, the approximate (assuming independence) procedure was used for most comparisons.
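The direction of this result follows from the variance of a difference: when two estimates are positively correlated, the covariance term reduces the variance of their difference, so omitting it overstates the standard error. The sketch below illustrates this with hypothetical inputs. The function `se_of_difference` and the correlation parameter `rho` are assumptions made for illustration and are not part of the NAEP procedures.

```python
import math


def se_of_difference(se_i, se_j, rho=0.0):
    """Standard error of (A_i - A_j) for estimates with correlation rho.

    With rho = 0 this reduces to the independent-samples formula.
    """
    variance = se_i ** 2 + se_j ** 2 - 2.0 * rho * se_i * se_j
    return math.sqrt(variance)


# Hypothetical standard errors and a hypothetical positive correlation.
se_independent = se_of_difference(1.5, 2.0)           # covariance ignored
se_correlated = se_of_difference(1.5, 2.0, rho=0.17)  # covariance included
ratio = se_independent / se_correlated  # greater than 1 when rho > 0
```

With these made-up inputs the independence-based standard error is the larger of the two, so a test that uses it is less likely to declare a difference significant.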