How Low Income Undergraduates Financed Postsecondary Education: 1992-93
The need for a nationally representative database on postsecondary student financial aid prompted the U.S. Department of Education to conduct the National Postsecondary Student Aid Study (NPSAS), a survey conducted every three years beginning in 1987. The NPSAS sample was designed to include students enrolled in all types of postsecondary education. Thus, it included students enrolled in public institutions; private, not-for-profit institutions; and private, for-profit institutions. The sample included students at 4-year and 2-year institutions, as well as students enrolled in occupationally specific programs that lasted for less than 2 years. United States service academies were not included in the institution sample because of their unique funding and tuition base, and certain other types of institutions were also excluded.
NPSAS:93 included a stratified sample of approximately 66,000 eligible students (about 52,000 of whom were undergraduates) from about 1,100 institutions. Students were included in the sample if they attended a NPSAS-eligible institution; were enrolled between July 1, 1992 and June 30, 1993; and were enrolled in one or more courses or programs including courses for credit, a degree or formal award program of at least 3 months duration, or an occupationally or vocationally specific program of at least 3 months duration. Regardless of their postsecondary status, however, students who were also enrolled in high school were excluded.
The 1992-93 NPSAS survey sample, while representative and statistically accurate, was not a simple random sample. Instead, the survey sample was selected using a more complex three-step procedure with stratified samples and differential probabilities of selection at each level. First, postsecondary institutions were initially selected within geographical strata. Once institutions were organized by zip code and state, they were further stratified by control (i.e., public; private, not-for-profit; or private, for-profit) and offering (less-than-2-year, 2- to 3-year, 4-year nondoctorate-granting, and 4-year doctorate-granting). Sampling rates for students enrolled at different institutions and levels (undergraduate or other) varied, resulting in better data for policy purposes, but at a cost to statistical efficiency.
For each student in the NPSAS sample, there were up to three sources of data. First, institution registration and financial aid records were extracted. Second, a Computer Assisted Telephone Interview (CATI) was conducted with each student. Finally, a CATI designed for the parents or guardians of a subsample of students was conducted. Data from these three sources were synthesized into a single system with an overall response rate of about 85 percent.
For more information on the NPSAS survey, consult the Methodology Report for the 1993 National Postsecondary Student Aid Study (Longitudinal Studies Branch, Postsecondary Education Statistics Division, Washington, D.C.: U.S. Department of Education, NCES 95-211).
The Beginning Postsecondary Student Longitudinal Study (BPS) follows NPSAS:90 students who enrolled in postsecondary education for the first time in 1989-90. The first followup was conducted in spring 1992 and the second in spring 1994. BPS collected information from students on their persistence, progress, and attainment and on their labor force experience using a CATI. Approximately 8,000 students were included in the BPS sample.
The statistics in this report are estimates derived from a sample. Two broad categories of error occur in such estimates: sampling and non-sampling errors. Sampling errors occur because observations are made only on samples of students, not on entire populations. Non-sampling errors occur not only in sample surveys but also in complete censuses of entire populations.
Non-sampling errors can be attributed to a number of sources: inability to obtain complete information about all students in all institutions in the sample (some students or institutions refused to participate, or students participated but answered only certain items); ambiguous definitions; differences in interpreting questions; inability or unwillingness to give correct information; mistakes in recording or coding data; and other errors of collecting, processing, sampling, and imputing missing data.
The estimates presented in this report were produced using the NPSAS:93 Undergraduate Data Analysis System (DAS) and the BPS:90/94 DAS. The DAS software makes it possible for users to specify and generate their own tables from the NPSAS data. With the DAS, users can recreate or expand upon the tables presented in this report. In addition to the table estimates, the DAS calculates proper standard errors and weighted sample sizes for these estimates. For example, table B.1 presents the standard errors that correspond to table 8 in the text. If the number of valid cases is too small to produce an estimate, the DAS prints the message low-N instead of the estimate.
In addition to tables, the DAS will also produce a correlation matrix of selected variables to be used for linear regression models. Also output with the correlation matrix are the design effects (DEFT) for all the variables identified in the matrix. Since statistical procedures generally compute regression coefficients based on simple random sample assumptions, the standard errors must be adjusted with the design effects to take into account the NPSAS stratified sampling method. (See discussion under Statistical Procedures below for the adjustment procedure.)
For more information about the NCES NPSAS:90, NPSAS:93, and BPS:90/94 Data Analysis Systems, contact:
Aurora D'Amico
(202) 502-7334
Email address: Aurora.D'Amico@ED.GOV
Two types of statistical procedures were employed in this report: testing differences between means, and adjustment of means after controlling for covariation among a group of variables. Each procedure is described below.
Differences Between Means
The descriptive comparisons were tested in this report using Student's t statistic. Differences between estimates are tested against the probability of a Type I error, or significance level. The significance levels were determined by calculating the Student's t values for the differences between each pair of means or proportions and comparing these with published tables of significance levels for two-tailed hypothesis testing.
Student's t values may be computed to test the difference between estimates with the following formula:

t = (E1 - E2) / sqrt(se1^2 + se2^2)    (1)

where E1 and E2 are the estimates to be compared and se1 and se2 are their corresponding standard errors. Note that this formula is valid only for independent estimates. When the estimates were not independent (for example, when comparing the percentages across a percentage distribution), a covariance term was added to the denominator of the t-test formula.
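For readers who want to reproduce such a test outside the DAS, the following is a minimal Python sketch of equation 1; the two estimates, their standard errors, and the 1.96 critical value for a two-tailed test at the .05 level are illustrative values only, not figures from this report.

```python
from math import sqrt

def students_t(e1, se1, e2, se2):
    """Student's t for the difference between two independent estimates (equation 1)."""
    return (e1 - e2) / sqrt(se1**2 + se2**2)

# Hypothetical estimates: two percentages and their standard errors.
t = students_t(52.4, 1.1, 47.9, 1.3)
print(round(t, 2))    # about 2.64
# For a two-tailed test at the .05 level, |t| must exceed roughly 1.96.
print(abs(t) > 1.96)  # True
```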
Table B.1. Standard errors for table 8: Percentage of low income undergraduates attending full time, full year who received various types of financial aid and the average amounts received by aided students, by dependency status and type of institution: 1992-93
                                        Total aid        Grants          Loans           Work study
                                        ---------------  --------------  --------------  ---------------
                                         Per-   Average   Per-  Average   Per-  Average   Per-   Average
                                         cent    amount   cent   amount   cent   amount   cent    amount
----------------------------------------------------------------------------------------------------------
  Total
    Total                                0.74    139.90   0.81    78.53   1.63    57.65   0.86     48.05
    Institution type
      Public less-than-4-year            2.06    120.09   2.45    67.50   2.93   134.09   1.37    132.47
      Public 4-year                      1.03    115.08   1.20    60.56   1.89    54.94   0.98     52.77
      Private, not-for-profit
        less-than-4-year                 1.81    452.10   0.95   221.23   6.74   316.37   0.83         -
      Private, not-for-profit 4-year     1.15    578.68   1.58   381.25   4.61   127.35   2.78     62.12
      Private, for-profit                1.99    425.60   2.11   152.49   6.30   178.95   2.36         -

  Dependent
    Total                                1.15    218.32   1.37   133.95   2.44    68.80   1.50     50.39
    Institution type
      Public less-than-4-year            4.89    172.65   4.75   114.94   4.80        -   4.19         -
      Public 4-year                      1.54    195.80   1.78    92.63   3.23    77.41   1.62     77.16
      Private, not-for-profit
        less-than-4-year                 0.67    438.88   0.67   118.38   5.43        -   0.41         -
      Private, not-for-profit 4-year     1.73    791.46   2.49   555.37   5.95   135.82   4.31     68.48
      Private, for-profit                1.48    420.33   2.39   113.66   5.20   189.84   0.42         -

  Single independent
    Total                                0.85    142.62   1.03    84.09   1.58    92.44   1.01     63.05
    Institution type
      Public less-than-4-year            2.73    171.71   3.58   104.67   3.61     2.29      -         -
      Public 4-year                      1.20     97.04   1.46    63.42   1.98    81.78   1.18    106.33
      Private, not-for-profit
        less-than-4-year                    -         -      -        -      -        -      -         -
      Private, not-for-profit 4-year     1.25    290.49   1.45   256.72   2.51   182.83   2.45     75.32
      Private, for-profit                2.34    469.14   2.79   130.05   5.79   381.80   1.16         -

  Independents with dependents
    Total                                1.21    177.16   1.23    75.08   2.33    77.79   1.49    125.59
    Institution type
      Public less-than-4-year            2.82    194.02   3.25    70.76   3.82   217.65   1.98         -
      Public 4-year                      1.72    122.84   1.87    89.96   2.50    93.64   1.67    107.01
      Private, not-for-profit
        less-than-4-year                 4.48    513.64   4.49   385.65   6.89        -   1.70         -
      Private, not-for-profit 4-year     1.41    350.73   1.48   287.00   3.06   234.98   2.35    160.64
      Private, for-profit                2.49    539.62   2.70   207.00   7.56   173.52   3.87         -
- Sample size was too small for a reliable estimate.
SOURCE: U.S. Department of Education, National Center for Education Statistics, 1992-93 National Postsecondary Student Aid Study (NPSAS:93), Undergraduate Data Analysis System.
There are hazards in reporting statistical tests for each comparison. First, comparisons based on large t statistics may appear to merit special attention. This can be misleading, since the magnitude of the t statistic is related not only to the observed differences in means or percentages but also to the number of students in the specific categories used for comparison. Hence, a small difference compared across a large number of students would produce a large t statistic.
A second hazard in reporting statistical tests for each comparison occurs when making multiple comparisons among categories of an independent variable. For example, when making paired comparisons among different levels of income, the probability of a Type I error for these comparisons taken as a group is larger than the probability for a single comparison. When more than one difference between groups of related characteristics or families are tested for statistical significance, one must apply a standard that assures a level of significance for all of those comparisons taken together.
Comparisons were made in this report only when p <= .05/k for a particular pairwise comparison, where that comparison was one of k tests within a family. This guarantees both that the individual comparison would have p <= .05 and that for k comparisons within a family of possible comparisons, the significance level for all the comparisons will sum to p <= .05.
For example, in a comparison of the percentages of males and females who enrolled in postsecondary education, only one comparison is possible (males versus females). In this family, k=1, and the comparison can be evaluated without adjusting the significance level. When students are divided into five racial-ethnic groups and all possible comparisons are made, then k=10 and the significance level of each test must be p <= .05/10, or p <= .005. The formula for calculating family size (k) is as follows:
k = j(j-1)/2    (2)

where j is the number of categories for the variable being tested. In the case of race-ethnicity, there are five racial-ethnic groups (American Indian, Asian/Pacific Islander, black non-Hispanic, Hispanic, and white non-Hispanic), so substituting 5 for j in equation 2 yields k = 5(5-1)/2 = 10.
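As a small illustration, the sketch below simply restates equation 2 and the race-ethnicity example above in Python; the function name is ours, not part of the DAS.

```python
def family_size(j):
    """Number of possible pairwise comparisons among j categories (equation 2)."""
    return j * (j - 1) // 2

j = 5                 # five racial-ethnic groups
k = family_size(j)    # 10 pairwise comparisons
alpha = 0.05 / k      # significance level required for each individual test
print(k, alpha)       # 10 0.005
```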
Adjustment of Means
Tabular results are limited by sample size when attempting to control for additional factors that may account for the variation observed between two variables. For example, when examining the percentages of those who completed a degree, it is impossible to know to what extent the observed variation is due to differences in low income status and to what extent it is due to differences in other factors related to income, such as type of institution attended, parents' education, and so on. However, if a table were produced showing income within type of institution within parents' education, for example, the cell sizes would be too small to identify the patterns. When the sample size becomes too small to support controls for another level of variation, one must use other methods to take such variation into account.
To overcome this difficulty, multiple linear regression was used to obtain means that were adjusted for covariation among a list of control variables. Adjusted means for subgroups were obtained by regressing the dependent variable on a set of descriptive variables such as gender, race-ethnicity, parents' education, and so on. Substituting ones or zeros for the subgroup characteristic(s) of interest and the mean proportions for the other variables results in an estimate of the adjusted proportion for the specified subgroup, holding all other variables constant. For example, consider a hypothetical case in which two variables, age and gender, are used to describe an outcome, Y (such as completing a degree). The variables age and gender are recoded into a dummy variable representing age and a dummy variable representing gender:
Age (A)
    24 years or older       1
    Under 24 years old      0

Gender (G)
    Female                  1
    Male                    0
The following regression equation is then estimated from the correlation matrix output from the DAS:
Y = a + b1(A) + b2(G) + e    (3)

To estimate the adjusted mean for any subgroup evaluated at the mean of all other variables, one substitutes the appropriate values for that subgroup's dummy variables (1 or 0) and the mean for the dummy variable(s) representing all other subgroups. For example, suppose Y was being described by age (A) and gender (G), coded as shown above, and the means for A and G are:
Variable        Mean
    A           0.355
    G           0.521
Suppose the regression equation results in:
Y = 0.15 + 0.17(A) + 0.01(G)    (4)

To estimate the adjusted value for older students, one substitutes the appropriate parameter values into equation 3.
Variable    Parameter    Value
    a          0.15
    A          0.17       1.000
    G          0.01       0.521
This results in:
Y = 0.15 + 0.17(1.000) + 0.01(0.521) = 0.325    (5)

In this case the adjusted mean for older students is 0.325 and represents the expected outcome for older students who look like the average student across the other variables (in this example, gender).
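The arithmetic in equations 3 through 5 can be reproduced with a short sketch such as the one below; it uses only the intercept, coefficients, and means from the hypothetical example above and is not the DAS procedure itself.

```python
# Adjusted mean for older students, following equations 3 through 5.
a, b_age, b_gender = 0.15, 0.17, 0.01   # intercept and coefficients from equation 4
mean_gender = 0.521                     # mean of the gender dummy variable G

# Evaluate the regression at A = 1 (24 years or older) and the mean of G.
adjusted_mean_older = a + b_age * 1.0 + b_gender * mean_gender
print(round(adjusted_mean_older, 3))    # 0.325
```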
It is relatively straightforward to produce a multivariate model using NPSAS:93 or BPS:90/94 data, since one of the output options of the DAS is a correlation matrix, computed using pairwise deletion of missing values. This matrix can be used by most commercial regression packages as the input data to produce least-squares regression estimates of the parameters. That was the general approach used for this report, with two additional adjustments described below to incorporate the complex sample design into the statistical significance tests of the parameter estimates.
Most commercial regression packages assume simple random sampling when computing standard errors of parameter estimates. Because of the complex sampling design used for NPSAS:93, this assumption is incorrect. A better approximation of the true standard errors is obtained by multiplying each standard error by the average design effect of the dependent variable (DEFT), where the DEFT is the ratio of the true standard error to the standard error computed under the assumption of simple random sampling. The DEFT is calculated by the DAS and produced with the correlation matrix.
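As a rough sketch of the adjustment described above (not the code used for this report), the standard errors produced under the simple random sampling assumption can be inflated by the DEFT before computing t statistics; the DEFT value, coefficients, and standard errors shown are hypothetical.

```python
# Hypothetical sketch: adjust SRS-based standard errors for the complex NPSAS design.
deft = 1.6  # hypothetical average design effect (DEFT) reported with the correlation matrix

# Hypothetical regression output: coefficient estimates and their SRS standard errors.
estimates = {"age": (0.17, 0.015), "gender": (0.01, 0.011)}

for name, (coef, srs_se) in estimates.items():
    adjusted_se = srs_se * deft          # inflate the standard error by the DEFT
    t = coef / adjusted_se               # t statistic based on the adjusted standard error
    print(name, round(adjusted_se, 4), round(t, 2))
```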