# Statistical Procedures


## Tests of significance

Comparisons made in the text of this report were tested for statistical significance. For example, in the commonly made comparison of education system averages against the average of the United States, tests of statistical significance were used to establish whether the observed differences from the U.S. average were statistically significant. Estimating the standard errors required for these tests is complicated by the complex sample and assessment designs, both of which generate error variance. Together they mandate a set of statistically complex procedures to estimate the correct standard errors. As a consequence, the estimated standard errors contain a sampling variance component estimated by the jackknife repeated replication (JRR) procedure and, where the assessments are concerned, an additional imputation variance component arising from the assessment design. Details on the procedures used can be found in the WesVar 5.0 User's Guide (Westat 2007).

In almost all instances, the tests for significance used were standard t tests.1 These fell into two categories according to the nature of the comparison being made: comparisons of independent samples and comparisons of nonindependent samples. Before describing the t tests used, some background on the two types of comparisons is provided below.

The variance of a difference is equal to the sum of the variances of the two initial variables minus two times the covariance between them:

$$\sigma^2_{(X - Y)} = \sigma^2_{X} + \sigma^2_{Y} - 2\,\mathrm{cov}(X, Y)$$

A sampling distribution has the same characteristics as any distribution, except that its units are sample estimates rather than observations. Therefore, the sampling variance of a difference is equal to the sum of the two initial sampling variances minus two times the covariance between the two sampling distributions of the estimates:

$$\sigma^2_{(\hat{\mu}_X - \hat{\mu}_Y)} = \sigma^2_{\hat{\mu}_X} + \sigma^2_{\hat{\mu}_Y} - 2\,\mathrm{cov}(\hat{\mu}_X, \hat{\mu}_Y)$$
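The identity above can be checked numerically. The sketch below uses hypothetical, randomly generated paired scores (the group labels and parameter values are illustrative only, not TIMSS data) and verifies that the sample variance of the differences equals the sum of the two sample variances minus twice the sample covariance, as long as the same normalization (`ddof=1`) is used throughout:

```python
import numpy as np

# Illustrative only: hypothetical paired scores, not TIMSS data.
rng = np.random.default_rng(42)
x = rng.normal(500, 80, size=1000)            # e.g., one group's scores
y = 0.6 * x + rng.normal(200, 60, size=1000)  # a correlated second group

# Left side: variance of the difference, computed directly.
var_diff = np.var(x - y, ddof=1)

# Right side: Var(X) + Var(Y) - 2*Cov(X, Y), with matching ddof.
identity = (np.var(x, ddof=1) + np.var(y, ddof=1)
            - 2 * np.cov(x, y, ddof=1)[0, 1])

# The two agree up to floating-point error.
```

Because the groups here are positively correlated, the covariance term substantially shrinks the variance of the difference relative to the independent case.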

If one wants to determine whether girls' performance differs from boys' performance, for example, then, as for all statistical analyses, a null hypothesis has to be tested. In this particular example, it consists of computing the difference between the boys' performance mean and the girls' performance mean (or the inverse). The null hypothesis is

$$H_0: \hat{\mu}_{boys} - \hat{\mu}_{girls} = 0$$

To test this null hypothesis, the standard error on this difference is computed and then compared to the observed difference. The respective standard errors on the mean estimates for boys and girls can easily be computed.

The expected value of the covariance will be equal to 0 if the two sampled groups are independent. If the two groups are not independent, as is the case with girls and boys attending the same schools within an education system, or comparing an education system's mean with the international mean that includes that particular country, the expected value of the covariance might differ from 0.

In TIMSS and TIMSS Advanced, participating education systems' samples are independent. Therefore, for any comparison between two education systems, the expected value of the covariance will be equal to 0, and thus the standard error on the difference is

$$SE(\hat{\theta}_i - \hat{\theta}_j) = \sqrt{SE(\hat{\theta}_i)^2 + SE(\hat{\theta}_j)^2}$$

with $\theta$ being the statistic being tested.

Within a particular education system, any subsamples will be considered as independent only if the categorical variable used to define the subsamples was used as an explicit stratification variable.

Therefore, as for any computation of a standard error in TIMSS and TIMSS Advanced 2015, replication methods using the supplied replicate weights are used to estimate the standard error on a difference. Use of the replicate weights implicitly incorporates the covariance between the two estimates into the estimate of the standard error on the difference.
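The replicate-weight approach described above can be sketched in a few lines. In TIMSS-style JRR, the sampling variance of a statistic is the sum of squared deviations of the replicate estimates from the full-sample estimate; applying this directly to a *difference* absorbs the covariance between the two group estimates. The function name and the toy replicate values below are hypothetical, for illustration only:

```python
import numpy as np

def jrr_se(theta_full, theta_reps):
    """Hedged sketch of a JRR standard error.

    theta_full: the statistic (here, a group difference) computed with
                the full-sample weights.
    theta_reps: the same statistic recomputed with each replicate weight.
    Var(theta) is taken as the sum of squared deviations of the replicate
    estimates from the full-sample estimate, so any covariance between
    the two group means is captured automatically.
    """
    theta_reps = np.asarray(theta_reps, dtype=float)
    return float(np.sqrt(np.sum((theta_reps - theta_full) ** 2)))

# Toy usage: a score difference of 12.0 with made-up replicate estimates.
se = jrr_se(12.0, [11.5, 12.4, 12.1, 11.8])
```

In practice the replicate estimates come from re-running the full estimation once per supplied replicate weight, which is what makes the procedure computationally heavy.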

Thus, in simple comparisons of independent averages, such as the U.S. average with other education systems' averages, the following formula was used to compute the t statistic:

$$t = \frac{est_1 - est_2}{\sqrt{se_1^2 + se_2^2}}$$

where $est_1$ and $est_2$ are the estimates being compared (e.g., the average of education system A and the U.S. average), and $se_1$ and $se_2$ are the corresponding standard errors of these averages.
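As a worked illustration of the independent comparison, the sketch below computes t as the difference of two averages divided by the square root of the sum of their squared standard errors, then applies a two-sided .05-level criterion. The function name and the numbers are hypothetical, not taken from the report:

```python
import math

def t_independent(est1, se1, est2, se2):
    # t = (est1 - est2) / sqrt(se1^2 + se2^2), valid when the two
    # samples are independent so the covariance term is zero.
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical example: system A averages 521 (se 3.1) vs. a
# U.S. average of 539 (se 2.8).
t = t_independent(521, 3.1, 539, 2.8)
significant = abs(t) > 1.96  # two-sided test at the .05 level
```

Here the difference of 18 points is several standard errors wide, so the comparison would be flagged as statistically significant.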

For TIMSS Advanced there are a small number of participating education systems. When a country is compared with the international group, there is an overlap between the samples in the sense that the country is part of the international group. These are referred to as part-whole comparisons. Such comparisons require that the standard error of the mean difference be adjusted to account for the overlap. Let $A_i$ be the statistic in question (e.g., the mean for group $i$) and let $S(A_i)$ be the standard error of that statistic. Furthermore, group $i$ is the whole (e.g., the TIMSS Advanced group of nations) and group $j$ is the part (e.g., the United States). Groups $i$ and $j$ are said to be significantly different if

$$\frac{\left| A_i - A_j \right|}{\sqrt{S(A_i)^2 + (1 - 2p)\,S(A_j)^2}} > 1.96$$

where $p$ is the proportion of the 9 countries represented by each country. This proportion is 1/9, or .11.

The second type of comparison used in this report occurred when comparing differences of nonsubset, nonindependent groups (e.g., when comparing the average scores of boys versus girls within the United States). In such comparisons, the following formula was used to compute the t statistic:

$$t = \frac{est_{grp1} - est_{grp2}}{se(est_{grp1} - est_{grp2})}$$

where $est_{grp1}$ and $est_{grp2}$ are the nonindependent group estimates being compared, and $se(est_{grp1} - est_{grp2})$ is the standard error of the difference, calculated using a JRR procedure that accounts for any covariance between the estimates for the two nonindependent groups.
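The part-whole adjustment can be sketched directly from the covariance argument: when the part makes up a proportion p of the whole, the covariance between the two means shrinks the part's variance contribution, giving a standard error of sqrt(S_i² + (1 − 2p)·S_j²). The function name and all input values below are hypothetical illustrations:

```python
import math

def part_whole_significant(A_i, S_i, A_j, S_j, p, crit=1.96):
    # Part-whole comparison: group i is the whole, group j is the part,
    # p is the part's share of the whole. The (1 - 2p) factor reflects
    # the covariance between the overlapping estimates.
    se = math.sqrt(S_i ** 2 + (1 - 2 * p) * S_j ** 2)
    return abs(A_i - A_j) / se > crit

# Hypothetical example: an international average of 500 (se 2.0) vs. a
# member country's average of 485 (se 3.5), with p = 1/9.
result = part_whole_significant(500, 2.0, 485, 3.5, 1 / 9)
```

Note that because 1 − 2p < 1, the adjusted standard error is smaller than the naive independent-samples value, so ignoring the overlap would make the test unnecessarily conservative.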

1 Adjustments for multiple comparisons were not applied in any of the t tests undertaken.
