PEDAR: Research Methodology Distance Education Instruction by Postsecondary Faculty and Staff: Fall 1998
Statistical Procedures - Adjustment of Means to Control for Background Variation


Many of the independent variables included in the analyses in this report are related, and to some extent the pattern of differences found in the descriptive analyses reflects this covariation. For example, when examining the percentage of the faculty who taught distance classes by instructional level, it is possible that some of the observed relationship is due to differences in other factors related to instructional level, such as institution type and institution size. However, if nested tables were used to isolate the influence of these other factors, cell sizes would become too small to identify the significant differences in patterns. When the sample size becomes too small to support controls for another level of variation, one must use other methods to take such variation into account. The method used in this report estimates adjusted means with regression models, an approach sometimes referred to as communality analysis.

To overcome this difficulty for the analysis of the percentage of faculty teaching distance classes as well as additional analyses included in appendix C, multiple linear regression was used to obtain means that were adjusted for covariation among a list of control variables.9 Each independent variable is divided into several discrete categories. To find an estimated mean value on the dependent variable for each category of an independent variable, while adjusting for its covariation with other independent variables in the equation, substitute the following in the equation: (1) a one in the category’s term in the equation, (2) zeroes for the other categories of this variable, and (3) the mean proportions for all other independent variables. This procedure holds the impact of all remaining independent variables constant, and differences between adjusted means of categories of an independent variable represent hypothetical groups that are balanced or proportionately equal on all other characteristics included in the model as independent variables.

For example, consider a hypothetical case in which two variables, age and gender, are used to describe an outcome, Y (such as percentage of respondents teaching distance classes). The variables age and gender are recoded into a dummy variable representing age, A, and a dummy variable representing gender, G:

Age                       A
Less than 35 years old    1
35 years or older         0

and

Gender    G
Female    1
Male      0

The following regression equation is then estimated, using the correlation matrix output by the DAS as input data for standard regression procedures:

$$\hat{Y} = a + \beta_1 A + \beta_2 G \qquad (5)$$

To estimate the adjusted mean for any subgroup evaluated at the mean of all other variables, one substitutes the appropriate values for that subgroup's dummy variables (1 or 0) and the mean for the dummy variable(s) representing all other subgroups. For example, suppose Y represents the proportion of faculty teaching distance classes, and is being described by age (A) and gender (G), coded as shown above, with means as follows:

Variable    Mean
A           0.355
G           0.411

Next, suppose the regression equation results in:

$$\hat{Y} = 0.51 - 0.17A - 0.21G \qquad (6)$$


To estimate the adjusted value for younger faculty, one substitutes the appropriate parameter estimates and variable values into equation 6.

Variable    Parameter    Mean
a            0.51        (none)
A           -0.17        1.000
G           -0.21        0.411


This results in the following equation:

$$\hat{Y} = 0.51 + (-0.17)(1) + (-0.21)(0.411) = 0.254$$

In this case, the adjusted mean for younger faculty is 0.254 and represents the expected outcome for younger faculty who resemble the average faculty member across the other variables (in this example, gender). In other words, the adjusted percentage of younger faculty teaching distance classes, controlling for gender, is 25.4 percent (0.254 x 100 for conversion to a percentage).
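
The substitution can also be traced in a few lines of code. The following is a minimal sketch in Python that uses only the parameter estimates and means from the worked example; the function name `adjusted_mean` and the dictionary layout are illustrative assumptions, not part of the DAS.

```python
# Parameter estimates and sample means from the worked example in the text.
INTERCEPT = 0.51
COEFS = {"A": -0.17, "G": -0.21}
MEANS = {"A": 0.355, "G": 0.411}


def adjusted_mean(variable, value):
    """Predicted Y with `variable` set to `value` (1 or 0) and every
    other dummy variable held at its sample mean.

    Hypothetical helper written for this illustration only.
    """
    total = INTERCEPT
    for name, coef in COEFS.items():
        x = value if name == variable else MEANS[name]
        total += coef * x
    return total


younger = adjusted_mean("A", 1)  # faculty less than 35 years old
older = adjusted_mean("A", 0)    # faculty 35 years or older

print(f"younger: {younger:.3f}, older: {older:.3f}")
```

Because gender is held at the same mean (0.411) in both calls, the gap between the two adjusted means equals the age coefficient, -0.17.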

It is relatively straightforward to produce a multivariate model using the DAS, since one of the DAS output options is a correlation matrix, computed using pairwise deletion of missing values. In regression analysis, there are several common approaches to the problem of missing data. The two simplest are pairwise deletion of missing data and listwise deletion of missing data. In pairwise deletion, each correlation is calculated using all of the cases that have nonmissing values for the two relevant variables. For example, suppose you have a regression analysis that uses variables X1, X2, and X3. The regression is based on the correlation matrix between X1, X2, and X3. In pairwise deletion, the correlation between X1 and X2 is based on the nonmissing cases for X1 and X2; only cases missing on either X1 or X2 are excluded from the calculation of that correlation. In listwise deletion, the correlation between X1 and X2 would be based on the cases with nonmissing values for X1, X2, and X3. That is, all of the cases with missing data on any of the three variables would be excluded from the analysis.
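
The difference between the two deletion rules is easiest to see on a small example. The sketch below is plain Python on made-up data; the rows, the use of `None` for a missing value, and the helper names are all assumptions for illustration and do not come from the DAS.

```python
from math import sqrt

# Hypothetical cases (X1, X2, X3); None marks a missing value.
ROWS = [
    (1.0, 2.0, 3.0),
    (2.0, 1.0, None),
    (3.0, 4.0, 1.0),
    (4.0, 3.0, None),
    (5.0, 6.0, 2.0),
    (None, 5.0, 4.0),
]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pairwise_corr(rows, i, j):
    """Keep every case that is nonmissing on columns i and j only."""
    kept = [r for r in rows if r[i] is not None and r[j] is not None]
    return pearson([r[i] for r in kept], [r[j] for r in kept]), len(kept)


def listwise_corr(rows, i, j):
    """Keep only the cases that are nonmissing on every column."""
    kept = [r for r in rows if all(v is not None for v in r)]
    return pearson([r[i] for r in kept], [r[j] for r in kept]), len(kept)


r_pair, n_pair = pairwise_corr(ROWS, 0, 1)
r_list, n_list = listwise_corr(ROWS, 0, 1)
print(f"pairwise: r={r_pair:.3f} on {n_pair} cases")
print(f"listwise: r={r_list:.3f} on {n_list} cases")
```

In this toy data, pairwise deletion keeps five cases for the X1-X2 correlation, while listwise deletion keeps only the three cases that are complete on all three variables, so the two rules can yield different correlations from the same file.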

The correlation matrix can be used by most statistical software packages as the input data for least squares regression. That is the approach used for this report, with an additional adjustment to incorporate the complex sample design into the statistical significance tests of the parameter estimates (described below). For tabular presentation, parameter estimates and standard errors were multiplied by 100 to match the scale used for reporting unadjusted and adjusted percentages.
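
To illustrate how a correlation matrix can serve as input to least squares regression, the sketch below recovers the coefficients for a two-predictor model from correlations, means, and standard deviations. It uses the standard closed-form solution for two predictors; it is not the routine used for this report, all numeric inputs are hypothetical, and it assumes that means and standard deviations are available alongside the correlations.

```python
from math import sqrt


def ols_from_correlations(r_y1, r_y2, r_12, means, sds):
    """Least squares intercept and slopes for Y on X1 and X2.

    r_y1, r_y2, r_12: pairwise correlations.
    means, sds: (Y, X1, X2) tuples of means and standard deviations.
    Hypothetical helper for illustration only.
    """
    mean_y, mean_1, mean_2 = means
    sd_y, sd_1, sd_2 = sds
    denom = 1.0 - r_12 ** 2
    if denom <= 0.0:
        raise ValueError("X1 and X2 are perfectly collinear")
    # Standardized coefficients from the correlations alone.
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    # Rescale to the original units of each variable.
    b_1 = beta_1 * sd_y / sd_1
    b_2 = beta_2 * sd_y / sd_2
    intercept = mean_y - b_1 * mean_1 - b_2 * mean_2
    return intercept, b_1, b_2


# Hypothetical inputs. For a 0/1 dummy with mean p, the standard
# deviation is sqrt(p * (1 - p)).
p_y, p_a, p_g = 0.30, 0.355, 0.411
a, b_a, b_g = ols_from_correlations(
    r_y1=-0.15, r_y2=-0.20, r_12=0.10,
    means=(p_y, p_a, p_g),
    sds=tuple(sqrt(p * (1 - p)) for p in (p_y, p_a, p_g)),
)
print(f"a={a:.3f}, b_A={b_a:.3f}, b_G={b_g:.3f}")
```

With more than two predictors the same idea applies, but the standardized coefficients come from solving a linear system in the predictor correlation matrix rather than from a closed-form expression.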

Although the DAS simplifies the process of making regression models, it also limits the range of models. The means adjustment procedure used here relies on a least squares regression model, which is sometimes sufficient for binary outcomes (such as the outcome studied here, whether a faculty member taught distance classes). However, when the proportion of the sample participating in the outcome is very low or very high, logit or probit models are preferred.10 Because the outcomes of interest (teaching a distance education class or a non-face-to-face class) were relatively uncommon, a logit analysis was also performed on the restricted-use data using the SUDAAN software program. Variance estimation in SUDAAN is accomplished by the Taylor series method using information about the stratum and primary sampling unit of each case, available in the restricted-use dataset. The logit analysis exhibited patterns similar to the results shown in this report.
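
One common reason for preferring a logit model with rare outcomes is that a linear model can predict values outside the 0 to 1 range, whereas the logistic function cannot. The sketch below shows this with made-up coefficients; the numbers and function names are assumptions for illustration and are not estimates from this report.

```python
from math import exp


def linear_probability(intercept, coef, x):
    """Prediction from a least squares (linear probability) model."""
    return intercept + coef * x


def logistic_probability(intercept, coef, x):
    """Prediction from a logit model: inverse logit of the linear term."""
    return 1.0 / (1.0 + exp(-(intercept + coef * x)))


# Hypothetical rare outcome (about 5 percent in the reference group)
# and a subgroup with a negative coefficient.
linear_pred = linear_probability(0.05, -0.08, 1)
logit_pred = logistic_probability(-2.944, -1.0, 1)

print(f"linear model: {linear_pred:.3f}")  # can fall below zero
print(f"logit model:  {logit_pred:.3f}")   # always between 0 and 1
```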

Most statistical software packages assume simple random sampling when computing standard errors of parameter estimates. Because of the complex sampling design used for the NSOPF survey, this assumption is incorrect. A better approximation of the standard errors is obtained by multiplying each standard error by the design effect associated with the dependent variable (DEFT),11 where the DEFT is the ratio of the true standard error to the standard error computed under the assumption of simple random sampling. The DEFT is calculated by the DAS and produced with the correlation matrix.
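
As a rough sketch of this adjustment, the Python below inflates a standard error computed under simple random sampling by a DEFT and recomputes the t statistic. The coefficient, standard error, and DEFT are hypothetical, and the helper name is an assumption for illustration.

```python
def design_adjusted(estimate, srs_se, deft):
    """Return the DEFT-adjusted standard error and the resulting t statistic.

    srs_se: standard error computed under simple random sampling.
    deft: ratio of the true standard error to the SRS standard error.
    """
    adjusted_se = srs_se * deft
    return adjusted_se, estimate / adjusted_se


# Hypothetical values, not taken from the report.
estimate, srs_se, deft = -0.17, 0.04, 1.5
adj_se, t_stat = design_adjusted(estimate, srs_se, deft)

# Scaled by 100 to match the percentage scale used in the tables.
print(f"estimate: {estimate * 100:.1f}, adjusted SE: {adj_se * 100:.1f}")
print(f"t statistic: {t_stat:.2f}")
```

A DEFT above 1 widens the standard error and shrinks the t statistic, so a coefficient that looks significant under simple random sampling may not remain so after the adjustment.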