
**Tests of significance**

Comparisons made in the text of this report were tested for statistical significance. For example, in the commonly made comparison of education system averages against the average of the United States, tests of statistical significance were used to establish whether the observed differences from the U.S. average were statistically significant. The estimation of the standard errors required to undertake the tests of significance is complicated by the complex sample and assessment designs, both of which generate error variance. Together they mandate a set of statistically complex procedures to estimate the correct standard errors. As a consequence, the estimated standard errors contain a sampling variance component estimated by the jackknife repeated replication (JRR) procedure and, where the assessments are concerned, an additional imputation variance component arising from the assessment design. Details on the procedures used can be found in the *WesVar 5.0 User's Guide* (Westat 2007).

In almost all instances, the tests for significance used were standard *t* tests.^{1} These fell into two categories according to the nature of the comparison being made: comparisons of independent samples and comparisons of nonindependent samples. Before describing the *t* tests used, some background on the two types of comparisons is provided below.

The variance of a difference is equal to the sum of the variances of the two initial variables minus two times the covariance between the two initial variables. A sampling distribution has the same characteristics as any distribution, except that its units consist of sample estimates rather than observations. Therefore, the sampling variance of a difference is equal to the sum of the two initial sampling variances minus two times the covariance between the two sampling distributions of the estimates:

*var*(θ̂_{1} - θ̂_{2}) = *var*(θ̂_{1}) + *var*(θ̂_{2}) - 2*cov*(θ̂_{1}, θ̂_{2})
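As a numerical check, this rule can be verified directly on simulated data (the distributions below are hypothetical illustrations, not TIMSS estimates):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated sets of "sample estimates" (hypothetical values).
x = rng.normal(500, 10, size=100_000)
y = 0.5 * x + rng.normal(250, 8, size=100_000)

# Variance of the difference, computed directly ...
var_direct = np.var(x - y)

# ... equals var(x) + var(y) - 2 * cov(x, y).
var_rule = np.var(x) + np.var(y) - 2 * np.cov(x, y, bias=True)[0, 1]

print(round(var_direct, 4), round(var_rule, 4))  # the two agree
```

The identity holds exactly (up to floating-point error) because it is algebraic, not asymptotic.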

If one wants to determine whether girls' performance differs from boys' performance, for example, then, as for all statistical analyses, a null hypothesis has to be tested. In this particular example, it consists of computing the difference between the boys' performance mean and the girls' performance mean (or the inverse). The null hypothesis is

*H*_{0}: μ_{boys} - μ_{girls} = 0

To test this null hypothesis, the standard error of this difference is computed and then compared to the observed difference. The respective standard errors of the mean estimates for boys and girls can be easily computed.

The expected value of the covariance will be equal to 0 if the two sampled groups are independent. If the two groups are not independent, as is the case when comparing girls and boys attending the same schools within an education system, or when comparing an education system's mean with an international mean that includes that system, the expected value of the covariance might differ from 0.

In TIMSS and TIMSS Advanced, participating education systems' samples are independent. Therefore, for any comparison between two education systems, the expected value of the covariance will be equal to 0, and thus the standard error on the estimated difference is

*se*(θ̂_{1} - θ̂_{2}) = √(*se*(θ̂_{1})^{2} + *se*(θ̂_{2})^{2})

with θ being the statistic being tested.

Within a particular education system, any subsamples are considered independent only if the categorical variable used to define the subsamples was used as an explicit stratification variable.

Therefore, as for any computation of a standard error in TIMSS and TIMSS Advanced 2015, replication methods using the supplied replicate weights are used to estimate the standard error on a difference. Use of the replicate weights implicitly incorporates the covariance between the two estimates into the estimate of the standard error on the difference.
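A minimal sketch of this replicate-weight approach, assuming a JK2-style variance formula (the sum of squared deviations of replicate estimates from the full-sample estimate). The data, weights, and number of replicates below are hypothetical, and the exact variance factor in TIMSS depends on the design; consult the WesVar documentation for the production setup:

```python
import numpy as np

def weighted_mean(values, weights):
    return np.average(values, weights=weights)

def jrr_se_of_difference(values, group, full_wt, rep_wts):
    """SE of (mean of group 1 - mean of group 2) via jackknife repeated
    replication. rep_wts has one column per replicate. Uses the JK2-style
    formula sum((theta_r - theta)^2); the exact factor is an assumption."""
    def diff(w):
        return (weighted_mean(values[group == 1], w[group == 1])
                - weighted_mean(values[group == 2], w[group == 2]))
    theta = diff(full_wt)  # full-sample estimate of the difference
    reps = np.array([diff(rep_wts[:, r]) for r in range(rep_wts.shape[1])])
    # Replicating the difference itself captures any covariance between
    # the two group estimates automatically.
    return np.sqrt(np.sum((reps - theta) ** 2))

# Hypothetical toy data: 8 students, 4 replicate weight columns.
rng = np.random.default_rng(1)
scores = rng.normal(500, 50, size=8)
group = np.array([1, 1, 1, 1, 2, 2, 2, 2])
full_wt = np.ones(8)
rep_wts = np.clip(np.ones((8, 4)) + rng.normal(0, 0.3, size=(8, 4)), 0.1, None)
print(round(jrr_se_of_difference(scores, group, full_wt, rep_wts), 2))
```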

Thus, in simple comparisons of independent averages, such as the U.S. average with other education systems' averages, the following formula was used to compute the *t* statistic:

*t* = (*est*_{1} - *est*_{2}) / √(*se*_{1}^{2} + *se*_{2}^{2})

*Est*_{1} and *est*_{2} are the estimates being compared (e.g., the average of education system A and the U.S. average), and *se*_{1} and *se*_{2} are the corresponding standard errors of these averages.
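A minimal sketch of this computation (the averages and standard errors below are hypothetical, not actual TIMSS results):

```python
import math

def t_independent(est1, se1, est2, se2):
    """t statistic for two independent estimates: the difference
    divided by the root sum of the squared standard errors."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical averages: education system A vs. the United States.
t = t_independent(527.0, 3.1, 518.0, 2.8)
print(round(t, 2))  # ≈ 2.15; |t| > 1.96 indicates significance at the .05 level
```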

For TIMSS Advanced there are a small number of participating education systems. When a country is compared with the international group, there is an overlap between the samples in the sense that the country is part of the international group. These are referred to as part-whole comparisons. Such comparisons require that the standard error of the mean difference be adjusted to account for the overlap. Let *A*_{i} be the statistic in question (e.g., the mean for group *i*) and let *Ā* be the corresponding statistic for the international group. Assuming independent country samples, the standard error of the difference is

*se*(*A*_{i} - *Ā*) = √((1 - 2*p*)*se*_{i}^{2} + *p*^{2} Σ_{j} *se*_{j}^{2})

where *p* is the proportion of the 9 countries represented by each country. This proportion is 1/9, or .11.

The second type of comparison used in this report occurred when comparing differences of nonsubset, nonindependent groups (e.g., when comparing the average scores of boys versus girls within the United States). In such comparisons, the following formula was used to compute the *t* statistic:
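A sketch of this part-whole adjustment, assuming independent country samples that each contribute proportion *p* = 1/9 to the international mean (the standard errors below are hypothetical):

```python
import math

def part_whole_se(se_country, all_ses, p):
    """Standard error of (country mean - international mean) when the
    country is one of the 1/p countries averaged into the whole.
    From var(A_i - mean(A)) with independent country samples:
    (1 - 2p) * se_i^2 + p^2 * sum of all se_j^2."""
    return math.sqrt((1 - 2 * p) * se_country ** 2
                     + p ** 2 * sum(se ** 2 for se in all_ses))

# Hypothetical standard errors for the 9 participating countries.
ses = [3.1, 2.8, 4.0, 3.5, 2.2, 3.9, 2.6, 3.3, 4.4]
p = 1 / 9
print(round(part_whole_se(ses[0], ses, p), 3))
```

The (1 - 2*p*) term reflects the covariance between the country's estimate and the international mean that contains it; without the overlap the first coefficient would simply be 1.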

*t* = (*est*_{grp1} - *est*_{grp2}) / *se*(*est*_{grp1} - *est*_{grp2})

*Est*_{grp1} and *est*_{grp2} are the nonindependent group estimates being compared, and *se*(*est*_{grp1} - *est*_{grp2}) is the standard error of the difference, calculated using a JRR procedure that accounts for any covariance between the estimates for the two nonindependent groups.
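The nonindependent-groups test can be sketched the same way, with the JRR-estimated standard error of the difference supplied directly (all values below are hypothetical):

```python
def t_nonindependent(est_grp1, est_grp2, se_diff):
    """t statistic for nonindependent groups: the difference divided by
    the JRR standard error of the difference, which already reflects
    any covariance between the two group estimates."""
    return (est_grp1 - est_grp2) / se_diff

# Hypothetical example: U.S. boys vs. girls, with a JRR-estimated
# standard error of the difference of 3.5.
print(round(t_nonindependent(521.0, 514.0, 3.5), 2))  # 2.0
```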

^{1} Adjustments for multiple comparisons were not applied in any of the *t* tests undertaken.