Mathematics Coursetaking and Achievement at the End of High School:
NCES 2008-319
January 2008

A.4 Statistical Procedures


A.4.1 Statistical Significance: Student t Statistics

Comparisons discussed in the text of this report have been tested for statistical significance (set at a probability of .05) to ensure that the differences are larger than those that might be expected due to sampling variation. The statistical comparisons in this report were based largely on the t statistic. Whether a difference is considered statistically significant is determined by calculating a t value for the difference between a pair of means or proportions and comparing this value to published tables of values, called critical values. The alpha level is an a priori statement of the probability of inferring that a difference exists when, in fact, it does not.

The t statistic between estimates from various subgroups presented in the tables can be computed by using the following formula:

$$t = \frac{x_1 - x_2}{\sqrt{SE_1^2 + SE_2^2}},$$

where x1 and x2 are the estimates to be compared (e.g., the means of sample members in two groups), and SE1 and SE2 are their corresponding standard errors. This formula is valid only for independent estimates.
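The t computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the report's own code: the function name, the example estimates, and the large-sample critical value of 1.96 are assumptions introduced here, and the appropriate critical value depends on the degrees of freedom available under the survey design.

```python
import math


def t_statistic(x1, se1, x2, se2):
    """Return the t value for the difference between two independent estimates.

    x1, x2 are the estimates (e.g., subgroup means); se1, se2 are their
    standard errors. Valid only when the two estimates are independent.
    """
    return (x1 - x2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical estimates and standard errors (not taken from the report's tables).
t = t_statistic(51.2, 0.4, 49.8, 0.3)  # approximately 2.8

# Assumed two-sided critical value for alpha = .05 with large samples.
CRITICAL_VALUE = 1.96
is_significant = abs(t) > CRITICAL_VALUE
```

With these made-up inputs the difference of 1.4 points is divided by a combined standard error of 0.5, so the comparison would be treated as significant at the .05 level.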

An F-test was used to compare the fit of two regression models. This test is computed using the following formula:

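$$F = \frac{(SSE_{\mathrm{Reduced}} - SSE_{\mathrm{Full}}) / (df_{\mathrm{Reduced}} - df_{\mathrm{Full}})}{SSE_{\mathrm{Full}} / df_{\mathrm{Full}}},$$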

where Reduced is the model with fewer independent variables, Full is the model with all the independent variables, SSE is the sum of squared errors, and df is the degrees of freedom. For this test the degrees of freedom are n – k, where n is the sample size and k is the number of parameters in the model. A significant F statistic indicates that the Full model fits the data better than the Reduced model.
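As a rough illustration of the nested-model comparison, the sketch below computes the F statistic from the sums of squared errors and degrees of freedom of the two models. The function name and all numeric inputs are hypothetical; obtaining a p-value would additionally require the F distribution with (df_Reduced − df_Full, df_Full) degrees of freedom, which is not shown here.

```python
def nested_f_statistic(sse_reduced, df_reduced, sse_full, df_full):
    """Return the F statistic comparing a Reduced model with a Full model.

    Each df is n - k for its model (sample size minus number of parameters),
    so df_reduced is larger than df_full.
    """
    if df_reduced <= df_full:
        raise ValueError("the Reduced model must have more degrees of freedom")
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Hypothetical example: n = 100, Reduced model with k = 3, Full model with k = 5.
n = 100
f_value = nested_f_statistic(
    sse_reduced=1200.0, df_reduced=n - 3,
    sse_full=950.0, df_full=n - 5,
)  # (250 / 2) / (950 / 95) = 12.5
```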


A.4.2 Effect Sizes

For means (which in this report are scores from the ELS:2002 mathematics assessment), an effect size (or standardized mean difference) has been calculated. The effect size serves as a measure of the magnitude of a difference. For the comparisons drawn in this report, effect sizes (Cohen's d) were calculated as the difference in mean test scores divided by their pooled standard deviation, using the following formula:

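$$d = \frac{x_1 - x_2}{SD_{\mathrm{pooled}}},$$

where x1 and x2 are the two mean test scores being compared and SD pooled is their pooled standard deviation.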

A criterion of one-fifth of a standard deviation (d = 0.2) was set as the minimum effect size. Differences were reported in the text only if the comparison met two criteria: (1) statistical significance at the .05 level; and (2) a difference greater than one-fifth of a standard deviation. For purposes of evaluating effect sizes, the proficiency probability scores, like the Item Response Theory (IRT) number-right scores, have been treated as means and are subject to the 0.2 required effect size. For proportions, this report has adopted a simple convention of reporting differences only if they are 5 percentage points or more. The effect size criterion was used because with large samples, such as the one in ELS:2002, statistical significance can be reached for differences that are small in magnitude.³
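The two reporting criteria for means, and the convention for proportions, can be expressed as a short sketch. Several pieces here are assumptions rather than details given in the report: the function names, the sample values, the 1.96 critical value, and in particular the pooled standard deviation formula, which uses the common sample-size-weighted form because the report does not spell out how the pooling was done.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation (assumed sample-size-weighted form)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1, mean2, sd_pooled):
    """Standardized mean difference: difference in means over the pooled SD."""
    return (mean1 - mean2) / sd_pooled


def report_mean_difference(t, d, critical_value=1.96, min_effect=0.2):
    """Apply both criteria: statistical significance and |d| greater than 0.2."""
    return abs(t) > critical_value and abs(d) > min_effect


def report_proportion_difference(t, pct1, pct2, critical_value=1.96, min_points=5.0):
    """For proportions: significance plus a gap of 5 percentage points or more."""
    return abs(t) > critical_value and abs(pct1 - pct2) >= min_points


# Hypothetical groups: means 52 and 49, both with SD 10 and 500 students.
sd = pooled_sd(10.0, 500, 10.0, 500)      # 10.0
d = cohens_d(52.0, 49.0, sd)              # approximately 0.3
reportable = report_mean_difference(t=2.8, d=d)
```

A difference can pass the significance test and still be withheld from the text: with the same t of 2.8 but d = 0.1, `report_mean_difference` returns False.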


A.4.3 Multivariate Analysis: Ordinary Least Squares Regression

Ordinary least squares (OLS) regression analyses were performed to describe the relationship between math coursetaking and achievement after controlling for student and school characteristics (tables 8 and 9). Each regression coefficient generated by the OLS procedure is interpreted as a slope: it indicates how many units of change in the dependent variable occur for each unit change in the independent variable, controlling for all other factors included in the model. A significant positive coefficient b means that for every unit increase in the independent variable there is an increase of b units in the dependent variable. Conversely, a significant negative coefficient means that for every unit increase in the independent variable there is a decrease in the dependent variable equal to the absolute value of b. T-test comparisons were conducted using the regression coefficients produced in the analyses. The same statistical significance criterion used in the bivariate analyses (p-value of .05 or less) was applied. For example, in Model 2 in table 8, the regression coefficient for precalculus–AP/IB calculus is 4.2 and is significant at the .05 level, meaning that students who followed a precalculus–AP/IB calculus course sequence improved on the math achievement exam by 4.2 more correct answers than their peers who followed an algebra II–no mathematics course sequence, with the other variables in the model held constant.
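To make the slope interpretation concrete, the following sketch fits a simple OLS regression with a single dummy-coded predictor, where 0 marks the reference course sequence and 1 marks the comparison sequence. It is a deliberately simplified illustration: it omits the control variables and survey weights used in the report's models, the function name is hypothetical, and the scores are invented so that the slope comes out near 4.2 to mirror the example in the text.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the sum of cross-products over the sum of squares of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: 0 = reference sequence, 1 = comparison sequence.
sequence = [0, 0, 0, 1, 1, 1]
scores = [40.0, 42.0, 44.0, 45.2, 46.2, 47.2]

intercept, slope = simple_ols(sequence, scores)
# intercept is approximately 42.0, the mean score of the reference group;
# slope is approximately 4.2, the gap between the two group means.
```

With a dummy-coded predictor and no controls, the coefficient is simply the difference between the two group means. In a multivariate model the same coefficient is read as that difference after adjusting for the other variables included.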



3 For more information about effect sizes, see Cohen (1988), Murphy and Myers (2004), and Seastrom (2003, Guideline 5-1-4F).