PEDAR: Research Methodology  The Road Less Traveled? Students Who Enroll in Multiple Institutions
 Statistical Procedures: Missing Data and Adjusting for Complex Sampling Design

The DAS computes the correlation matrix using pairwise deletion of missing values. In regression analysis, there are several common approaches to the problem of missing data; the two simplest are pairwise deletion and listwise deletion. In pairwise deletion, each correlation is calculated using all of the cases with nonmissing values on the two relevant variables. For example, suppose a regression analysis uses variables X1, X2, and X3. The regression is based on the correlation matrix among X1, X2, and X3. In pairwise deletion, the correlation between X1 and X2 is based on the cases that are nonmissing on both X1 and X2; only cases missing on either X1 or X2 are excluded from the calculation of that correlation. In listwise deletion, the correlation between X1 and X2 is based on the cases that are nonmissing on X1, X2, and X3. That is, every case with missing data on any of the three variables is excluded from the analysis.
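The difference between the two approaches can be illustrated with a short Python sketch. It does not reproduce the DAS itself; the data values are hypothetical, and pandas is used only because its `DataFrame.corr()` method excludes missing values pair by pair, which matches the pairwise approach described above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "X2": [2.0, 1.0, 4.0, 3.0, np.nan, 5.0],
    "X3": [1.0, np.nan, 2.0, 5.0, 4.0, 6.0],
})

# Pairwise deletion: each correlation uses every case that is
# nonmissing on the two variables involved (the pandas default).
pairwise_corr = df.corr()

# Listwise deletion: drop any case missing on X1, X2, or X3 first,
# then compute all correlations from the remaining cases.
listwise_corr = df.dropna().corr()

# Number of cases behind the X1-X2 correlation under each approach.
n_pairwise_x1x2 = int(df[["X1", "X2"]].dropna().shape[0])
n_listwise = int(df.dropna().shape[0])

print(pairwise_corr.round(3))
print(listwise_corr.round(3))
print(n_pairwise_x1x2, n_listwise)
```

In this toy data set the X1-X2 correlation rests on five cases under pairwise deletion but only four under listwise deletion, because one case that is complete on X1 and X2 is missing on X3.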

The correlation matrix produced by the DAS can be used by most statistical software packages as the input data for least squares regression. The DAS provides either the SPSS or SAS code necessary to run least squares regression models. The DAS also provides additional information to incorporate the complex sample design into the statistical significance tests of the parameter estimates. Most statistical software packages assume simple random sampling when computing standard errors of parameter estimates. Because of the complex sampling design used for the survey, this assumption is incorrect. A better approximation of the standard errors is obtained by multiplying each standard error by the design effect associated with the dependent variable (DEFT),14 where the DEFT is the ratio of the true standard error to the standard error computed under the assumption of simple random sampling. The DEFT is calculated by the DAS and displayed with the correlation matrix output.
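As a rough illustration of this adjustment, the following Python sketch fits a least squares regression directly from a correlation matrix and then scales the simple-random-sampling standard errors by the DEFT. It is not the SPSS or SAS code that the DAS supplies. The function name `regression_from_corr`, the correlation values, the sample size `n`, and the DEFT of 1.5 are all hypothetical. Because the input is a correlation matrix, the coefficients are standardized, and the sketch assumes a single sample size `n` even though pairwise deletion can leave a different number of cases behind each correlation.

```python
import numpy as np

def regression_from_corr(R, y_index, n, deft):
    """Standardized least squares fit from a correlation matrix.

    Returns the coefficients, their standard errors under simple
    random sampling (SRS), and the DEFT-adjusted standard errors.
    """
    R = np.asarray(R, dtype=float)
    x_index = [i for i in range(R.shape[0]) if i != y_index]
    Rxx = R[np.ix_(x_index, x_index)]   # predictor intercorrelations
    rxy = R[x_index, y_index]           # predictor-outcome correlations

    Rxx_inv = np.linalg.inv(Rxx)
    beta = Rxx_inv @ rxy                # standardized coefficients
    r2 = float(rxy @ beta)              # R-squared of the model
    k = len(x_index)

    # Standard errors assuming simple random sampling.
    se_srs = np.sqrt((1.0 - r2) / (n - k - 1) * np.diag(Rxx_inv))

    # Adjustment described above: multiply each SE by the DEFT.
    se_adj = se_srs * deft
    return beta, se_srs, se_adj

# Hypothetical correlation matrix; the dependent variable is row/column 0.
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.2],
     [0.3, 0.2, 1.0]]

beta, se_srs, se_adj = regression_from_corr(R, y_index=0, n=500, deft=1.5)
print(beta, se_srs, se_adj)
```

With a DEFT greater than 1, every adjusted standard error is larger than its simple-random-sampling counterpart, so the corresponding significance tests become more conservative.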
