Access to Postsecondary Education for the 1992 High School Graduates

Appendix B: Technical Notes and Methodology

The National Education Longitudinal Study of 1988

The National Education Longitudinal Study of 1988 (NELS:88) is a survey that began with a nationally representative sample of 1988 eighth graders and followed them every two years. The most recent follow-up survey occurred in 1994. Respondents' teachers and schools were also surveyed in 1988, 1990, and 1992, while parents were surveyed in 1988 and 1992. In contrast to previous longitudinal studies, NELS:88 began with eighth graders in order to collect data regarding the transition from elementary to secondary education. The first follow-up in 1990 provided the data necessary to understand the transition. Dropouts were administered a special survey to understand the dropout process more thoroughly. For the purpose of providing a comparison group to 1980 sophomores surveyed in High School and Beyond, the NELS:88 sample was also "freshened" with new participants who were tenth graders in 1990.

In spring of 1992, when most of the NELS:88 sample were twelfth graders, the second follow-up took place. This survey focused on the transition from high school to the labor force and postsecondary education. The sample was also "freshened" in order to create a representative sample of 1992 seniors for the purpose of conducting trend analyses with the 1972 and 1982 senior classes (NLS-72 and HS&B). Students identified as dropouts in the first follow-up were also resurveyed in 1992. In spring of 1994, the third follow-up was administered. Sample members were questioned about their labor force and postsecondary experiences, and family formation. For more information about the NELS:88 survey, consult the NELS:88/94 Methodology Report\47\.

Accuracy of Estimates

The statistics in this report are estimates derived from a sample. Two broad categories of error occur in such estimates: sampling and nonsampling errors. Sampling errors occur because observations are made only on samples of students, not on entire populations. Nonsampling errors occur not only in sample surveys but also in complete censuses of entire populations. Nonsampling errors can be attributed to a number of sources: inability to obtain complete information about all students in all institutions in the sample (some students or institutions refused to participate, or students participated but answered only certain items); ambiguous definitions; differences in interpreting questions; inability or unwillingness to give correct information; mistakes in recording or coding data; and other errors of collecting, processing, sampling, and imputing missing data.

Data Analysis System

The estimates presented in this report were produced using the NELS:88 and NPSAS:93 Data Analysis Systems (DAS). The DAS software makes it possible for users to specify and generate their own tables from the NELS:88 data. With the DAS, users can replicate or expand upon the tables presented in this report. In addition to the table estimates, the DAS calculates proper standard errors\48\ and weighted sample sizes for these estimates. For example, table B1, which was generated by the DAS, contains the standard errors that correspond to table 2 in the text. If the number of valid cases is too small to produce a reliable estimate (fewer than 30 cases), the DAS prints the message "low-N" instead of the estimate.

In addition to tables, the DAS will also produce a correlation matrix of selected variables to be used for linear regression models. Included in the output with the correlation matrix are the design effects (DEFTs) for each variable in the matrix. Since statistical procedures generally compute regression coefficients based on simple random sample assumptions, the standard errors must be adjusted with the design effects to take into account the NELS:88 and NPSAS:93 stratified sampling method. (See discussion under "Statistical Procedures" below for the adjustment procedure.)

For more information about the NELS:88 and NPSAS:93 Data Analysis Systems, consult the NCES DAS Website or contact:

Aurora D'Amico
NCES Data Development and Longitudinal Studies Group
1990 K Street NW
Washington, DC 20006
(202) 502-7334
Internet address: Aurora.D'amico@ed.gov

Statistical Procedures

Two types of statistical procedures were employed in this report: testing differences between means, and adjustment of means after controlling for covariation among a group of variables. Each procedure is described below.

Differences Between Means

The descriptive comparisons were tested in this report using Student's t statistic. Differences between estimates are tested against the probability of a Type I error, or significance level. The significance levels were determined by calculating the Student's t values for the differences between each pair of means or proportions and comparing these with published tables of significance levels for two-tailed hypothesis testing.

Student's t values may be computed to test the difference between estimates with the following formula:

$$t = \frac{E_1 - E_2}{\sqrt{se_1^2 + se_2^2}} \qquad (1)$$

where E1 and E2 are the estimates to be compared and se1 and se2 are their corresponding standard errors. This formula is valid only for independent estimates. When estimates are not independent, a covariance term must be added to the formula. If the comparison is between the mean of a subgroup and the mean of the total group, the following formula is used:

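$$t = \frac{E_{sub} - E_{tot}}{\sqrt{se_{sub}^2 + se_{tot}^2 - 2p\,se_{sub}^2}} \qquad (2)$$

where E_sub and E_tot are the subgroup and total-group estimates, se_sub and se_tot are their standard errors, and p is the proportion of the total group contained in the subgroup\49\.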

When comparing two percentages from a distribution that adds to 100 percent, the following formula is used:

$$t = \frac{E_1 - E_2}{\sqrt{se_1^2 + se_2^2 - 2r\,se_1\,se_2}} \qquad (3)$$

where r is the correlation between the two estimates\50\. The estimates, standard errors, and correlations can all be obtained from the DAS.
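The three tests described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the formulas are assumed to take the usual form, namely the difference between the estimates divided by the square root of the summed squared standard errors, less a covariance term (2p times the squared subgroup standard error for formula 2, and 2r times the product of the two standard errors for formula 3).

```python
import math

def t_independent(e1, se1, e2, se2):
    """Formula 1: t for two independent estimates."""
    return (e1 - e2) / math.sqrt(se1**2 + se2**2)

def t_subgroup_vs_total(e_sub, se_sub, e_tot, se_tot, p):
    """Formula 2: subgroup versus total group.

    p is the proportion of the total group contained in the subgroup.
    """
    variance = se_sub**2 + se_tot**2 - 2 * p * se_sub**2
    return (e_sub - e_tot) / math.sqrt(variance)

def t_within_distribution(e1, se1, e2, se2, r):
    """Formula 3: two percentages from one distribution summing to 100.

    r is the correlation between the two estimates.
    """
    variance = se1**2 + se2**2 - 2 * r * se1 * se2
    return (e1 - e2) / math.sqrt(variance)
```

In practice the estimates, standard errors, and correlations passed to these functions would be the values reported by the DAS.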

There are hazards in reporting statistical tests for each comparison. First, comparisons based on large t statistics may appear to merit special attention. This can be misleading, since the magnitude of the t statistic is related not only to the observed differences in means or percentages but also to the number of students in the specific categories used for comparison. Hence, a small difference compared across a large number of students would produce a large t statistic.
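A small numerical sketch makes this first hazard concrete. It assumes simple random sampling and two equally sized groups, which is a simplification that does not reflect the NELS:88 design, and the function name is hypothetical. The same one-point difference in proportions gives a negligible t statistic with 100 students per group and a very large one with a million.

```python
import math

def t_for_proportions(p1, p2, n):
    """t statistic for two proportions, each from n cases.

    Assumes simple random sampling, so se = sqrt(p * (1 - p) / n).
    """
    se1 = math.sqrt(p1 * (1 - p1) / n)
    se2 = math.sqrt(p2 * (1 - p2) / n)
    return (p1 - p2) / math.sqrt(se1**2 + se2**2)

small_sample_t = t_for_proportions(0.51, 0.50, 100)        # well below 1
large_sample_t = t_for_proportions(0.51, 0.50, 1_000_000)  # well above 10
```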

A second hazard in reporting statistical tests for each comparison occurs when making multiple comparisons among categories of an independent variable. For example, when making paired comparisons among different levels of income, the probability of a Type I error for these comparisons taken as a group is larger than the probability for a single comparison. When more than one difference between groups of related characteristics, or "families," is tested for statistical significance, one must apply a standard that assures a level of significance for all of those comparisons taken together.

Comparisons were made in this report only when p< .05/k for a particular pairwise comparison, where that comparison was one of k tests within a family. This guarantees both that the individual comparison would have p< .05 and that for k comparisons within a family of possible comparisons, the significance level for all the comparisons will sum to p< .05\51\.

For example, in a comparison of the percentages of males and females who enrolled in postsecondary education only one comparison is possible (males versus females). In this family, k=1, and the comparison can be evaluated without adjusting the significance level. When students are divided into five racial-ethnic groups and all possible comparisons are made, then k=10 and the significance level of each test must be p< .05/10, or p< .005. The formula for calculating family size (k) is as follows:

$$k = \frac{j(j-1)}{2} \qquad (4)$$

where j is the number of categories for the variable being tested. In the case of race-ethnicity, there are five racial-ethnic groups (American Indian, Asian/Pacific Islander, black non-Hispanic, Hispanic, and white non-Hispanic), so substituting 5 for j in equation 4,

$$k = \frac{5(5-1)}{2} = 10$$
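The family-size adjustment can be sketched as follows. The sketch assumes that the family size is the number of distinct pairs among j categories, j(j-1)/2, which matches the k=1 and k=10 cases given in the text. The critical value uses a large-sample normal approximation from Python's standard library rather than the published Student's t tables the report refers to, and all function names are hypothetical.

```python
from statistics import NormalDist

def family_size(j):
    """Number of pairwise comparisons among j categories."""
    if j < 2:
        raise ValueError("need at least two categories to compare")
    return j * (j - 1) // 2

def per_comparison_alpha(j, alpha=0.05):
    """Significance level each single test must meet (alpha / k)."""
    return alpha / family_size(j)

def critical_value(j, alpha=0.05):
    """Two-tailed critical value under a normal approximation."""
    return NormalDist().inv_cdf(1 - per_comparison_alpha(j, alpha) / 2)

k_sex = family_size(2)    # males versus females: 1 comparison
k_race = family_size(5)   # five racial-ethnic groups: 10 comparisons
```

With five groups, each comparison must meet p < .005, so the required test statistic rises from about 1.96 to about 2.81 under the normal approximation.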

Adjustment of Means to Control for Background Variation

Tabular results are limited by sample size when attempting to control for additional factors that may account for the variation observed between two variables. For example, when examining the percentages of those who completed a degree, it is impossible to know to what extent the observed variation is due to socioeconomic status (SES) differences and to what extent it is due to differences in other factors related to SES, such as type of institution attended, intensity of enrollment, and so on. However, if a nested table were produced showing SES within type of institution attended, within enrollment intensity, the cell sizes would be too small to identify the patterns. When the sample size becomes too small to support controls for another level of variation, one must use other methods to take such variation into account.

To overcome this difficulty, multiple linear regression was used to obtain means that were adjusted for covariation among a list of control variables\52\. Adjusted means for subgroups were obtained by regressing the dependent variable on a set of descriptive variables such as race-ethnicity, family income, etc. Substituting ones or zeros for the subgroup characteristic(s) of interest and the mean proportions for the other variables results in an estimate of the adjusted proportion for the specified subgroup, holding all other variables constant. For example, consider a hypothetical case in which two variables, race-ethnicity and income, are used to describe an outcome, Y (such as attending a four-year college). The variables race-ethnicity and family income are recoded into a dummy variable representing race-ethnicity and a dummy variable representing family income:

The following regression equation is then estimated from the correlation matrix output from the DAS:

$$Y = a + b_1 R + b_2 F \qquad (5)$$

To estimate the adjusted mean for any subgroup evaluated at the mean of all other variables, one substitutes the appropriate values for that subgroup's dummy variables (1 or 0) and the mean for the dummy variable(s) representing all other subgroups. For example, suppose we had a case where Y was being described by race-ethnicity (R) and family income (F), coded as shown above, and the means for R and F are as follows:

Suppose the regression equation results in:

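To estimate the adjusted value for black students, one substitutes the appropriate values into the estimated regression equation: the subgroup's value for the race-ethnicity dummy variable (R) and the mean for the family income dummy variable (F).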

This results in:

In this case the adjusted mean for black students is 0.48 and represents the expected outcome for black students who look like the average student across the other variables (in this example, family income). In other words, the adjusted percentage who enrolled in a four-year college is 48 percent (0.48 x 100 for conversion to a percentage).
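The substitution step can be sketched in Python. Every number below is hypothetical and chosen only to make the arithmetic easy to follow; none is taken from the report, and the result is not meant to reproduce the 0.48 above. The coding of R as 1 for black students is likewise an assumption made for illustration, as is the function name.

```python
def adjusted_mean(intercept, coefs, means, subgroup):
    """Predicted outcome with the subgroup's dummies fixed at 0 or 1
    and every other variable held at its mean."""
    values = {**means, **subgroup}  # subgroup values override the means
    return intercept + sum(coefs[name] * values[name] for name in coefs)

# Hypothetical inputs, for illustration only.
intercept = 0.40
coefs = {"R": 0.10, "F": -0.20}   # assumed regression coefficients
means = {"R": 0.30, "F": 0.25}    # assumed means of the dummy variables

# Adjusted mean for the subgroup coded R = 1, with F held at its mean:
# 0.40 + 0.10 * 1 + (-0.20) * 0.25
black_adjusted = adjusted_mean(intercept, coefs, means, {"R": 1})
```

Multiplying the result by 100 converts it to an adjusted percentage.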

It is relatively straightforward to produce a multivariate model using the DAS, since one of the DAS output options is a correlation matrix, computed using pair-wise missing values\53\. This matrix can be used by most statistical software packages as the input data for least-squares regression. That is the approach used for this report, with an additional adjustment to incorporate the complex sample design into the statistical significance tests of the parameter estimates (described below). For tabular presentation, parameter estimates and standard errors were multiplied by 100 to match the scale used for reporting unadjusted and adjusted percentages.
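To show how a correlation matrix can serve as regression input, here is a minimal sketch for the two-predictor case using the standard closed-form solution for standardized coefficients. It assumes that the means and standard deviations of the variables are available alongside the correlations, which is an assumption about the inputs rather than a description of the DAS output format. The function name and arguments are hypothetical.

```python
def ols_from_correlations(r_y1, r_y2, r_12,
                          mean_y, sd_y,
                          mean_1, sd_1,
                          mean_2, sd_2):
    """Least-squares fit of Y on X1 and X2 from summary statistics.

    r_y1, r_y2: correlations of Y with X1 and X2
    r_12:       correlation between X1 and X2
    Returns (intercept, b1, b2) on the original scale.
    """
    denom = 1 - r_12**2
    if denom == 0:
        raise ValueError("predictors are perfectly collinear")
    # Standardized coefficients
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    # Rescale to the original units
    b1 = beta1 * sd_y / sd_1
    b2 = beta2 * sd_y / sd_2
    intercept = mean_y - b1 * mean_1 - b2 * mean_2
    return intercept, b1, b2
```

General-purpose statistical packages perform the same computation for any number of predictors by solving the corresponding linear system.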

Most statistical software packages assume simple random sampling when computing standard errors of parameter estimates. Because of the complex sampling design used for the NELS:88/94 and NPSAS:93 surveys, this assumption is incorrect. A better approximation of their standard errors is to multiply each standard error by the average design effect of the independent variable (DEFT)\54\, where the DEFT is the ratio of the true standard error to the standard error computed under the assumption of simple random sampling. It is calculated by the DAS and produced with the correlation matrix.
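The adjustment described above amounts to a single multiplication. The sketch below uses hypothetical input values and a hypothetical function name. It takes a coefficient, its standard error as computed under simple random sampling, and a DEFT, and returns the design-adjusted standard error with the corresponding t statistic.

```python
def design_adjusted(coef, se_srs, deft):
    """Inflate a simple-random-sample standard error by the DEFT.

    Returns (adjusted standard error, adjusted t statistic).
    """
    se_adj = se_srs * deft
    return se_adj, coef / se_adj

# Hypothetical values: a coefficient of 0.10 with an SRS standard error
# of 0.02 would give t = 5.0 unadjusted; a DEFT of 2.0 halves that.
se_adj, t_adj = design_adjusted(0.10, 0.02, 2.0)
```

Because the DEFT is typically greater than 1 for complex samples, the adjustment widens the standard errors and makes the significance tests more conservative.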



Footnotes:

47/ U.S. Department of Education, National Center for Education Statistics, National Education Longitudinal Study (NELS:88/94) Methodology Report, NCES 96-174 (Washington, D.C.: 1996).

48/ The NELS:88/94 and NPSAS:93 samples are not simple random samples and, therefore, simple random sample techniques for estimating sampling error cannot be applied to these data. The DAS takes into account the complexity of the sampling procedures and calculates standard errors appropriate for such samples. The method for computing sampling errors used by the DAS involves approximating the estimator by the linear terms of a Taylor series expansion. The procedure is typically referred to as the Taylor series method.

49/ U.S. Department of Education, National Center for Education Statistics, A Note from the Chief Statistician, No. 2, 1993.

50/ Ibid.

51/ The standard that p<.05/k for each comparison is more stringent than the criterion that the significance level of the comparisons should sum to p<.05. For tables showing the t statistic required to ensure that p<.05/k for a particular family size and degrees of freedom, see Olive Jean Dunn, "Multiple Comparisons Among Means," Journal of the American Statistical Association 56 (1961): 52-64.

52/ For more information about regression, see Michael S. Lewis-Beck, Applied Regression: An Introduction, Vol. 22 (Beverly Hills, CA: Sage Publications, Inc., 1980); W.D. Berry and S. Feldman, Multiple Regression in Practice, Vol. 50 (Beverly Hills, CA: Sage Publications, Inc., 1987).

53/ Although the DAS simplifies the process of making regression models, it also limits the range of models. Analysts who wish to use other than pairwise treatment of missing values or to estimate probit/logit models (which are the most appropriate for models with categorical dependent variables) can apply for a restricted data license from NCES. See John H. Aldrich and Forrest D. Nelson, Linear Probability, Logit, and Probit Models, Vol. 45 (Beverly Hills, CA: Sage Publications, Inc., 1984).

54/ The adjustment procedure and its limitations are described in C.J. Skinner, D. Holt, and T.M.F. Smith, eds., Analysis of Complex Surveys (New York: John Wiley & Sons, 1989).



National Center for Education Statistics - http://nces.ed.gov
U.S. Department of Education