In chapter 8 of this report, two logistic regression analyses were conducted to
explore factors associated with students' immediate enrollment in postsecondary
education after high school and their attainment of an associate's or bachelor's
degree within 6 years of beginning postsecondary education. Multivariate analyses,
such as logistic multiple regression models, provide information on whether group
differences in immediate postsecondary enrollment and degree attainment persist
after controlling for student, family, and school/institutional characteristics.
The analysis for the first model, immediate postsecondary enrollment, was conducted
using data from the Education Longitudinal Study of 2002 (ELS:2002), including variables
from the base year (2002), first follow-up (2004), and second follow-up (2006).
The analysis for the second model, attainment of an associate's or bachelor's degree
within 6 years of beginning postsecondary education, was conducted using data from
the Beginning Postsecondary Students Longitudinal Study (BPS:04/09), including data
from the base year (2004), first follow-up (2006), and second follow-up (2009).
Descriptions of the ELS:2002 and BPS:04/09 surveys are provided in the *Guide to
Sources* section of this report.

This technical appendix provides details on the logistic regression models used with the ELS:2002 and BPS:04/09 analysis datasets. In addition, this appendix provides details on the procedures used to impute missing data for key variables used in the ELS:2002 logistic regression model. The BPS:04/09 dataset variables were imputed before release, so no additional imputation procedures were performed. The appendix concludes with a glossary of definitions of the ELS:2002 and BPS:04/09 variables used in the logistic regression models.

The analyses conducted in chapter 8 employed logistic regression
for categorical outcomes, which produces coefficients estimating the relationship
between each independent variable and the probability of the dependent outcome. To aid
in the interpretation of results, the effect of a change in a given independent
variable, *X*, is transformed into an odds ratio and the percentage likelihood
of the dependent outcome. The formula for calculating an odds ratio is

$$\exp(\beta_j c)$$

where *exp* denotes exponentiation with base *e* (a constant equal to 2.71828182845904,
the base of the natural logarithm), *beta* equals the logistic regression
coefficient (represented in the equation as an exponent), and *c* equals
the number of units of change in *X* (e.g., 1, 2, 30). For categorical variables,
the value of *c* is set to 1 and the odds ratio equals the exponential of the
logistic regression coefficient. Both the ELS:2002 and BPS:04/09 logistic regression
analyses were conducted with the SAS-callable SUDAAN procedure "PROC RLOGIST" in
SAS, version 9.2. For the ELS:2002 analyses, the multiple imputation option was
included in the SUDAAN procedure to incorporate five imputed datasets, which are discussed
in the next section ("Imputation procedures for ELS:2002 data"). The ELS:2002 analysis
was weighted by the full sample weight (F2BYWT), and standard errors were calculated
using balanced repeated replication (BRR) procedures with the replicate weights
(F2BYP1–F2BYP200). The BPS:04/09 analysis was weighted by the full sample weight
(WTB000), and standard errors were calculated using BRR procedures with the replicate
weights (WTB001–WTB200).
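As a rough illustration of how replicate weights yield a standard error, the sketch below applies the standard BRR variance formula: the estimate is recomputed once per replicate weight, and the variance is the mean squared deviation of those replicate estimates from the full-sample estimate. The function name and the numbers are hypothetical, and the sketch assumes plain BRR with no Fay adjustment; it is not the SUDAAN implementation.

```python
import math

def brr_standard_error(full_estimate, replicate_estimates):
    """Standard BRR: mean squared deviation of the replicate estimates
    from the full-sample estimate (no Fay adjustment assumed)."""
    r = len(replicate_estimates)
    variance = sum((est - full_estimate) ** 2 for est in replicate_estimates) / r
    return math.sqrt(variance)

# Hypothetical numbers for illustration only (not from the report):
# a coefficient estimated once with the full-sample weight and then
# re-estimated with each of four replicate weights.
beta_full = 0.50
beta_replicates = [0.48, 0.53, 0.51, 0.46]
se = brr_standard_error(beta_full, beta_replicates)
print(round(se, 4))
```

In the analyses described above there would be 200 replicate estimates per coefficient rather than four.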

In the ELS:2002 logistic regression model, the binary dependent variable was an indicator of whether an on-time 2004 high school graduate enrolled immediately in a postsecondary institution (i.e., by December 2004). Multiple categorical and continuous independent variables were entered simultaneously for the regression analysis to allow for the interpretation of relationships between each independent variable and immediate postsecondary enrollment, after controlling for other independent variables included in the model. The independent variables included in the ELS:2002 regression model include student's sex, race/ethnicity, socioeconomic status, family composition (i.e., number of parents/guardians in the household), standardized 10th-grade mathematics test score, 9th-grade GPA, previous grade retention status, sports and extracurricular activities participation status, number of absences from school, number of times skipped classes, parent engagement in discussing coursework with student, number of hours worked per week, and number of close friends who dropped out of high school. Only students who graduated from high school by August 2004 were used in the logistic model for immediate postsecondary enrollment.

In the BPS:04/09 logistic regression model, the binary dependent variable was an
indicator of whether a recent high school graduate who began postsecondary enrollment
in academic year 2003–04 attained an associate's or bachelor's degree by June 2009
(i.e., within 6 years of entering postsecondary education). Multiple categorical
and continuous independent variables were entered simultaneously for the regression
analysis to allow for the interpretation of relationships between each independent
variable and associate's or bachelor's degree attainment, after controlling for
other independent variables included in the model. The independent variables included
in the BPS:04/09 regression model include student's sex, race/ethnicity, parents'
educational attainment, income quartile in 2004, highest level of high school mathematics,
indicators for college-level credits earned in high school, SAT/ACT test taking,
control and level of first postsecondary institution, whether the student declared
a major during the first year, whether remedial classes were taken in the first
year, whether the student met with an advisor in the first year, school club and sports
participation status in the first year, number of hours worked per week, attendance
intensity pattern through 2009 (e.g., always enrolled full time), and number of
"stopouts"^{12} and transfers through 2009. Only students who graduated from high
school in the year prior to entering postsecondary education were included in the
logistic model for degree attainment.

Associations between student characteristics and the two outcome variables were examined for the full sample as well as separately for males and females; separately for Whites, Blacks, and Hispanics; and separately for males and females within each of these racial/ethnic groups. Multivariate analyses were not conducted for Asians, Native Hawaiians/Pacific Islanders, or American Indians/Alaska Natives due to small sample sizes. Also, for the Black male and female and Hispanic male and female subgroup models, some of the results that appear to be substantive in magnitude are not statistically significant due to small subgroup sample sizes.

The global fit of the full-sample and subgroup logistic models was assessed using
several diagnostic measures, including chi-squared statistics, pseudo *R*-squared
values, and measures of the increase in the percentage accuracy in classifying
cases on the dichotomous outcome variable based on comparisons between an intercept-only
model and a fully specified model. The Likelihood Ratio (LR) Chi-Square test results
indicate that the fully specified model is a better fit than the intercept-only
model for the full sample and subgroup logistic models. The diagnostic results also
indicate that the percentage accuracy in predicting the outcome variable increased
across all of the logistic regression models when the selected independent variables
were included. For example, the overall ELS:2002 logistic regression model percentage
accuracy increased from 69.65 percent for the intercept-only model to 77.12 percent
for the fully specified model. For the BPS:04/09 model, the percentage accuracy
increased from 53.17 percent for the intercept-only model to 75.47 percent for the
fully specified model. The diagnostic results indicated adequate global fit of both
regression models.
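The diagnostics named above can be sketched in a few lines. The example below compares an intercept-only model with a fuller model using a likelihood ratio chi-square, a pseudo R-squared, and classification accuracy. The data are hypothetical, the pseudo R-squared shown is McFadden's (the text does not say which variant was used), a 0.5 classification cutoff is assumed, and survey weights are ignored.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def accuracy(y, p, cutoff=0.5):
    """Share of cases whose predicted class (p >= cutoff) matches y."""
    hits = sum(1 for yi, pi in zip(y, p) if (pi >= cutoff) == (yi == 1))
    return hits / len(y)

# Hypothetical outcomes and fitted probabilities (not from the report).
y = [1, 1, 1, 0, 0, 1, 0, 1]
p_full = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7, 0.6, 0.6]

# The intercept-only model predicts the sample proportion for every case.
p_bar = sum(y) / len(y)
p_null = [p_bar] * len(y)

ll_full = log_likelihood(y, p_full)
ll_null = log_likelihood(y, p_null)

lr_chi_square = 2.0 * (ll_full - ll_null)   # larger values favor the full model
mcfadden_r2 = 1.0 - ll_full / ll_null       # one common pseudo R-squared
acc_null = accuracy(y, p_null)              # baseline accuracy
acc_full = accuracy(y, p_full)              # accuracy with predictors

print(acc_null, acc_full)
```

The gain from `acc_null` to `acc_full` corresponds to the kind of comparison reported above, such as the increase from 69.65 to 77.12 percent for the ELS:2002 model.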

Odds ratios are calculated for each of the categorical independent variables used in the regression models and represent the odds of students in one category of an independent variable (referred to as the identity group) completing an event relative to the odds for a reference group. If the event is equally likely to occur for both groups, then the odds ratio equals one. If a category has an odds ratio that is less than one, then students in the identity group have lower odds of the outcome (e.g., immediate postsecondary enrollment) than students in the reference group. For example, the odds ratio of 0.65 for males (table ELS-2) is the ratio of the odds of males immediately enrolling in postsecondary education after high school to the odds of females immediately enrolling, after accounting for the effect of all of the other predictor variables in the model. The odds ratio of 0.65 indicates that the odds of a male immediately enrolling in postsecondary education after high school graduation are 35 percent lower [computed as (odds ratio – 1) × 100 = (0.65 – 1) × 100 = –35] than the odds for a female (i.e., males are less likely than females to immediately enroll in postsecondary education). In this example, females are the reference group for the predictor variable. If a category has an odds ratio greater than one, then students in the identity group have higher odds of the outcome than students in the reference group. For example, the odds ratio of 1.63 for students who first enrolled in a 4-year postsecondary institution (table BPS-2) indicates that such a student has 63 percent higher odds of attaining a degree within 6 years than a student who first enrolled in a less-than-4-year institution. For continuous predictor variables, such as standardized test scores or number of postsecondary institution transfers, results are also interpreted in the form of odds ratios based on one unit of change in the independent variable.
For example, in table ELS-2, the odds ratio of 1.88 for 9th-grade GPA indicates that a one-point increase in a student's 9th-grade GPA value (e.g., from a 2.0 to a 3.0) is associated with an 88 percent increase in the odds of the student immediately enrolling in postsecondary education. Asterisks (*) are used in the chapter tables to denote findings that are statistically significant at the .05 level.
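The arithmetic behind these interpretations can be reproduced with a short sketch. The function names are illustrative; the odds ratios 0.65, 1.63, and 1.88 are the ones quoted in the text, and the two-point GPA case is an added illustration of the *c*-unit formula given earlier, not a figure from the report.

```python
import math

def odds_ratio(beta, c=1.0):
    """Odds ratio for a c-unit change in X: exp(beta * c)."""
    return math.exp(beta * c)

def percent_change_in_odds(ratio):
    """Percentage difference in odds relative to the reference group."""
    return (ratio - 1.0) * 100.0

# Odds ratios quoted in the text above.
male_vs_female = percent_change_in_odds(0.65)    # about -35
four_year_start = percent_change_in_odds(1.63)   # about 63
gpa_one_point = percent_change_in_odds(1.88)     # about 88

# A coefficient is recovered from an odds ratio with the natural log, so a
# two-point GPA increase corresponds to exp(2 * ln 1.88) = 1.88 ** 2.
beta_gpa = math.log(1.88)
gpa_two_points = odds_ratio(beta_gpa, c=2)       # about 3.53

print(round(male_vs_female), round(four_year_start), round(gpa_one_point))
```

Note that odds ratios compound multiplicatively: a two-point change is the square of the one-point odds ratio, not twice the percentage change.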

Prior to conducting logistic regression analyses with the ELS:2002 data, sequential regression multiple imputation (SRMI) was used to impute missing values for the subset of variables that were planned for inclusion in the analysis. This method was implemented in IVEware: Imputation and Variance Estimation Software®. Research Triangle Institute conducted the imputation procedures and prepared the technical documentation for the analysis. This section provides justification for using the SRMI imputation method and details about the steps taken to conduct imputation procedures for the purposes of this report. More information about the SRMI procedure can be found in Raghunathan et al. (2001).

The SRMI methodology provides two main advantages. The first is that it can be used
to impute missing values for many types of variables—that is, categorical (binary
and nominal), continuous, count, and mixed^{13}—so
that imputations are tailored to the specific type of variable that is being imputed.
Categorical variables are imputed using logistic regression for binary variables
and polychotomous regression for nominal variables. Continuous variables are imputed
using linear regression. Count variables are imputed using Poisson regression. Mixed
variables are imputed using a two-stage process: the first stage imputes a binary
value, and the second stage imputes a continuous value only for cases whose
first-stage imputed value was one. For each of these types of models, one
can also include restrictions on observations that will receive an imputed value
and bounds on the range of imputed values. The second advantage of the SRMI methodology
is that it can use all of the available information in a dataset to impute each
variable. That is, it takes advantage of all the variables in a dataset to produce
the most informed and realistic imputed values. It can iterate through the variables
in the dataset several times to reinforce the relationships among variables and
improve the imputed values.
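The two-stage treatment of a mixed variable (for example, hours worked, which is either zero or a positive amount) can be sketched as follows. This is a structural illustration only: both stages use covariate-free stand-ins (an observed proportion and a draw from observed positive values) in place of the logistic and linear regressions that SRMI would fit, and the function name and data are hypothetical.

```python
import random

def impute_mixed(values, rng):
    """Two-stage imputation for a mixed variable (zero or a positive amount).

    Stage 1 imputes a binary indicator (is the amount positive?).
    Stage 2 imputes a positive amount only where stage 1 imputed a one.
    Both stages are simplified stand-ins for the regression models.
    """
    observed = [v for v in values if v is not None]
    positives = [v for v in observed if v > 0]
    p_positive = len(positives) / len(observed)

    completed = []
    for v in values:
        if v is not None:
            completed.append(v)                      # keep reported values
        elif rng.random() < p_positive:              # stage 1: indicator = 1
            completed.append(rng.choice(positives))  # stage 2: positive amount
        else:                                        # stage 1: indicator = 0
            completed.append(0)
    return completed

# Hypothetical weekly hours worked; None marks a missing response.
hours = [0, 12, None, 20, 0, None, 35, 8, None, 0]
completed = impute_mixed(hours, random.Random(0))
print(completed)
```

Drawing stage 2 values from the observed positives also keeps imputations inside the observed range, which loosely mirrors the bounds that can be placed on imputed values.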

As a preliminary step, about 80 ELS:2002 variables were selected for imputation procedures. The variables selected included potential dependent and independent variables planned for the logistic regression model, as well as covariates that were not part of the model but were thought to be related to the variables with missing values. The 80 variables were assigned an appropriate "missing" code to be used in the imputation software so the software would recognize the data as missing and require imputation. Next, the variable type (i.e., categorical, continuous, count, or mixed) was identified for each variable. Lastly, bounds were set on the imputed values to identify the range of the valid responses for each variable. After these steps were completed, the data were ready for imputation. Fifty-five variables that were originally planned for use in the ELS:2002 logistic regression analysis required imputation. The percentage of missing values for these variables ranged from 0.02 to 33.64 percent (see exhibit A).

SRMI was conducted independently for each of the five imputed datasets that were created for this project. Below is a brief description of the methodology used for creating each dataset.

Let $\mathbf{X}$ be the matrix of variables that have no missing values. Also,
let there be $m$ variables with missing values, ordered from the variable
with the lowest percentage of missing values to the variable with the highest percentage
of missing values. These variables are denoted by the vectors $\mathbf{y}_1$,
$\mathbf{y}_2$, ..., $\mathbf{y}_m$. There were five iterations of imputations
within each of the five imputed datasets. In the first iteration, $\mathbf{y}_1$
was regressed on $\mathbf{X}$ for the observations that had a valid value
for $\mathbf{y}_1$. The information produced from this regression was used
to impute the missing values of $\mathbf{y}_1$ to create $\mathbf{y}_1^*$
($\mathbf{y}_1^*$ indicates that the $\mathbf{y}_1$ vector included the
imputed values). Next, $\mathbf{y}_2$ was regressed on $\mathbf{X}$ and
$\mathbf{y}_1^*$, and the information from this regression was used to impute
the missing values of $\mathbf{y}_2$ (thus creating $\mathbf{y}_2^*$).
This process continued until $\mathbf{y}_m$ was regressed on $\mathbf{X}$,
$\mathbf{y}_1^*$, ..., $\mathbf{y}_{m-1}^*$, and the missing values for
$\mathbf{y}_m$ were imputed, creating $\mathbf{y}_m^*$. This completed the
first iteration.

For the second through fifth iterations of imputation, the same general process was
followed except that every variable (including the imputed values for the imputed
variables) other than the variable being imputed was used in the regression. For
the variable requiring imputation, the original variable (including the missing
values) was modeled. For example, to impute $\mathbf{y}_i$ (the original variable
with missing values) in the second iteration, we regressed $\mathbf{y}_i$ on
$\mathbf{X}$, $\mathbf{y}_1^*$, ..., $\mathbf{y}_{i-1}^*$, $\mathbf{y}_{i+1}^*$,
..., $\mathbf{y}_m^*$, using the imputed values from
the first iteration. For the third iteration, we regressed $\mathbf{y}_i$ on
$\mathbf{X}$ and the imputed values from the second iteration. For
the fourth iteration, we regressed $\mathbf{y}_i$ on $\mathbf{X}$ and
the imputed values from the third iteration. Finally, for the fifth iteration,
we regressed $\mathbf{y}_i$ on $\mathbf{X}$ and the imputed values from
the fourth iteration. After the fifth iteration was completed, the imputed
values, $\mathbf{y}_1^*$, ..., $\mathbf{y}_m^*$, from the fifth iteration
were retained for each of the five imputed datasets.
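The iteration scheme described above can be sketched in plain Python. This is a simplified illustration and not IVEware: every variable is treated as continuous, each regression is ordinary least squares, and imputed values are the fitted value plus a normal residual draw, whereas SRMI proper selects the model by variable type and draws from a posterior predictive distribution. All function names and data are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def srmi(x, ys, iterations=5, seed=0):
    """x: rows of fully observed predictors; ys: columns with None = missing."""
    rng = random.Random(seed)
    n = len(x)
    # Order columns from the lowest to the highest number of missing values.
    order = sorted(range(len(ys)), key=lambda j: sum(v is None for v in ys[j]))
    filled = {}  # column index -> completed column (the starred vector)
    for _ in range(iterations):
        for j in order:
            # First iteration: X plus the columns completed so far.
            # Later iterations: X plus every other completed column.
            others = [k for k in order if k != j and k in filled]
            rows = [list(x[i]) + [filled[k][i] for k in others] for i in range(n)]
            obs = [i for i in range(n) if ys[j][i] is not None]
            coef = ols([rows[i] for i in obs], [ys[j][i] for i in obs])
            resid = [ys[j][i] - predict(coef, rows[i]) for i in obs]
            sd = (sum(e * e for e in resid) / max(len(obs) - len(coef), 1)) ** 0.5
            filled[j] = [ys[j][i] if ys[j][i] is not None
                         else predict(coef, rows[i]) + rng.gauss(0.0, sd)
                         for i in range(n)]
    return [filled[j] for j in range(len(ys))]

# Hypothetical data: one complete predictor and two variables with gaps.
gen = random.Random(1)
x = [[gen.uniform(0, 10)] for _ in range(30)]
y1 = [2 + 3 * r[0] + gen.gauss(0, 1) for r in x]
y2 = [1 - r[0] + 0.5 * a + gen.gauss(0, 1) for r, a in zip(x, y1)]
y1[3] = y1[17] = None
y2[5] = y2[9] = y2[21] = None

completed = srmi(x, [y1, y2])
# Five independently imputed datasets, as in the procedure described above.
datasets = [srmi(x, [y1, y2], seed=s) for s in range(5)]
```

Each regression is always fitted on the cases where the original variable was observed, so earlier imputations of that same variable never feed back into its own model.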

Applying this methodology to the ELS:2002 dataset, IVEware was used to produce five files that included the 55 variables selected for imputation and the other 25 variables that did not require imputation. Once the imputation procedures were completed, quality checks were performed to ensure that the imputed data had the same format as the original data. In addition, quality checks were developed specifically for both categorical and continuous variables. Distributions before and after imputation were visually reviewed to assess whether the imputed values were reasonable and to identify any significant deviations between the distributions. Furthermore, numeric checks were based on the percentages of each category for the categorical variables and on the quantiles represented by the minimum value, deciles, and maximum value for continuous variables. Large deviations in the relative proportions of imputed and unimputed values within categories or deviations for the imputed and unimputed densities for continuous variables would indicate a variable that should be investigated. The quality control checks did not detect any concerns with the imputation procedures.

One consideration in using SRMI is that it assumes that the dataset was generated from a simple random sample design. However, most complex survey designs involve stratification, clustering, and differential weighting. To account for this consideration, the survey design information—i.e., stratum and cluster (school)—and a weight were used in the imputation models.

Variable name | Variable label | Type | Count | Skip | Valid | Missing | Missing (%) | Response (%) |
---|---|---|---|---|---|---|---|---|
F2B01 | Ever applied to postsecondary school | Categorical | 15,689 | 1,650 | 14,036 | 3 | 0.02 | 99.98 |
F2A02 | Type of high school credential received (diploma/certificate/GED) | Categorical | 15,689 | 12,878 | 2,808 | 3 | 0.11 | 99.89 |
F2PSEND | Last period of postsecondary education (i.e., persistence) | Categorical | 15,689 | 5,155 | 10,513 | 21 | 0.2 | 99.8 |
F2PSSTRT | When started postsecondary education | Categorical | 15,689 | 5,155 | 10,513 | 21 | 0.2 | 99.8 |
F2PS1FTP | Enrollment intensity at first postsecondary institution | Categorical | 15,689 | 5,155 | 10,511 | 23 | 0.22 | 99.78 |
F2B22 | Major declared/undeclared | Categorical | 15,689 | 7,114 | 8,551 | 24 | 0.28 | 99.72 |
F2B18A | Talk with faculty about academic matters outside of class | Categorical | 15,689 | 5,155 | 10,500 | 34 | 0.32 | 99.68 |
F2B18B | Meet with advisor about academic plans | Categorical | 15,689 | 5,155 | 10,492 | 42 | 0.4 | 99.6 |
F1S15 | Diploma or certificate most likely to receive | Categorical | 15,689 | 1,506 | 14,119 | 64 | 0.45 | 99.55 |
F2B18G | Participate in other extracurricular activities | Categorical | 15,689 | 5,155 | 10,480 | 54 | 0.51 | 99.49 |
F2PS1REM | Took math/writing/reading remedial course at 1st postsec institution | Categorical | 15,689 | 1,542 | 14,072 | 75 | 0.53 | 99.47 |
F2B18E | Participate in intramural or nonvarsity sports | Categorical | 15,689 | 5,155 | 10,471 | 63 | 0.6 | 99.4 |
F2B18F | Participate in varsity or intercollegiate sports | Categorical | 15,689 | 5,155 | 10,470 | 64 | 0.61 | 99.39 |
F1S14 | Grade level (at first follow-up) | Categorical | 15,689 | 2,064 | 13,541 | 84 | 0.62 | 99.38 |
F1S21C | Took or plans to take SAT or ACT | Categorical | 15,689 | 2,064 | 13,447 | 178 | 1.31 | 98.69 |
F1S65A | How many friends dropped out of high school | Count | 15,689 | 826 | 14,634 | 229 | 1.54 | 98.46 |
BYS37 | Importance of good grades to student | Categorical | 15,689 | 884 | 14,545 | 260 | 1.76 | 98.24 |
BYXTRACU | Number of school-sponsored activities participated in 01–02 | Count | 15,689 | 884 | 14,526 | 279 | 1.88 | 98.12 |
F1WRKHRS | F1 hours worked per week during 03–04 school year | Mixed | 15,689 | 826 | 14,566 | 297 | 2 | 98 |
F1S65D | How many friends plan to attend 4-year college/university | Count | 15,689 | 826 | 14,557 | 306 | 2.06 | 97.94 |
F1S65B | How many friends plan to have full-time job after high school | Count | 15,689 | 826 | 14,548 | 315 | 2.12 | 97.88 |
F2B29A | No longer enrolled due to completion of degree/certificate | Categorical | 15,689 | 13,730 | 1,917 | 42 | 2.14 | 97.86 |
F2C31P | Hours worked weekly during 2005–2006 school year (categorical) | Continuous | 15,689 | 8,930 | 6,602 | 157 | 2.32 | 97.68 |
F1S65C | How many friends plan to attend 2-year community college or technical school | Count | 15,689 | 826 | 14,501 | 362 | 2.44 | 97.56 |
F1RGP9 | GPA for all 9th-grade courses | Continuous | 15,689 | 1,294 | 13,995 | 400 | 2.78 | 97.22 |
F2C26P | Hours worked weekly during 2004–2005 school year (categorical) | Continuous | 15,689 | 8,938 | 6,515 | 236 | 3.5 | 96.5 |
BYS28 | How much likes school | Categorical | 15,689 | 884 | 14,277 | 528 | 3.57 | 96.43 |
BYS57 | Plans to continue education after high school | Categorical | 15,689 | 1,843 | 13,226 | 620 | 4.48 | 95.52 |
BYS24B | How many times cut/skip classes | Count | 15,689 | 884 | 14,039 | 766 | 5.17 | 94.83 |
BYNSPORT | BY number of interscholastic sports participated in at V or JV level | Count | 15,689 | 884 | 13,945 | 860 | 5.81 | 94.19 |
BYS33H | Ever in dropout prevention program | Categorical | 15,689 | 884 | 13,935 | 870 | 5.88 | 94.12 |
BYS33L | Ever in program to help prepare for college | Categorical | 15,689 | 884 | 13,911 | 894 | 6.04 | 93.96 |
BYS33I | Ever in special education program | Categorical | 15,689 | 884 | 13,907 | 898 | 6.07 | 93.93 |
BYS33K | Ever in career academy | Categorical | 15,689 | 884 | 13,866 | 939 | 6.34 | 93.66 |
BYS26 | High school program-student self-report | Categorical | 15,689 | 884 | 13,857 | 948 | 6.4 | 93.6 |
BYS33G | Ever in English as a Second Language program | Categorical | 15,689 | 884 | 13,844 | 961 | 6.49 | 93.51 |
BYS33D | Ever in a remedial English class | Categorical | 15,689 | 884 | 13,720 | 1,085 | 7.33 | 92.67 |
BYS33E | Ever in a remedial math class | Categorical | 15,689 | 884 | 13,685 | 1,120 | 7.57 | 92.44 |
BYP46 | 10th-grader ever held back a grade | Categorical | 15,689 | 2,491 | 12,178 | 1,020 | 7.73 | 92.27 |
BYS58 | Type of school plans to attend | Categorical | 15,689 | 2,212 | 12,345 | 1,132 | 8.4 | 91.6 |
F1SARACE | Individual race variables | Categorical | 15,689 | 0 | 14,304 | 1,385 | 8.83 | 91.17 |
F2HSATTM | High school attainment indicator (academic risk) | Categorical | 15,689 | 0 | 14,270 | 1,419 | 9.04 | 90.96 |
BYS59A | Has gone to counselor for college entrance information | Categorical | 15,689 | 2,212 | 12,220 | 1,257 | 9.33 | 90.67 |
BYS59B | Has gone to teacher for college entrance information | Categorical | 15,689 | 2,212 | 12,220 | 1,257 | 9.33 | 90.67 |
BYS59C | Has gone to coach for college entrance information | Categorical | 15,689 | 2,212 | 12,220 | 1,257 | 9.33 | 90.67 |
BYP09 | Number of siblings who dropped out of high school | Count | 15,689 | 3,233 | 11,208 | 1,248 | 10.02 | 89.98 |
BYS56 | How far in school student thinks will get | Categorical | 15,689 | 884 | 13,096 | 1,709 | 11.54 | 88.46 |
F1RGPA | Transcript reported cumulative GPA | Continuous | 15,689 | 1,294 | 12,550 | 1,845 | 12.82 | 87.18 |
BYS86A | How often discussed school courses with parents | Categorical | 15,689 | 884 | 12,248 | 2,557 | 17.27 | 82.73 |
BYS86B | How often discussed school activities with parents | Categorical | 15,689 | 884 | 12,224 | 2,581 | 17.43 | 82.57 |
BYS86G | How often discussed going to college with parents | Categorical | 15,689 | 884 | 12,097 | 2,708 | 18.29 | 81.71 |
BYS75 | How many hours usually works a week | Continuous | 15,689 | 6,151 | 6,827 | 2,711 | 28.42 | 71.58 |
BYS90F | Important to friends to finish high school | Categorical | 15,689 | 884 | 10,334 | 4,471 | 30.2 | 69.8 |
BYS90H | Important to friends to continue education past high school | Categorical | 15,689 | 884 | 10,272 | 4,533 | 30.62 | 69.38 |
BYS91 | Number of close friends who dropped out | Categorical | 15,689 | 884 | 9,824 | 4,981 | 33.64 | 66.36 |
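
The table does not state how the percentage columns were derived. The values are consistent with dividing the number of missing cases by the number of cases eligible to answer (count minus legitimate skips). The sketch below checks that reading against three rows copied from the table; the formula is an inference from the published figures, not a documented definition.

```python
def missing_percent(count, skip, missing):
    """Missing values as a percentage of cases eligible to answer."""
    eligible = count - skip  # equals valid + missing for the rows checked below
    return round(100.0 * missing / eligible, 2)

# Rows copied from the table above: (count, skip, valid, missing).
rows = {
    "F2B01": (15689, 1650, 14036, 3),
    "BYS75": (15689, 6151, 6827, 2711),
    "BYS91": (15689, 884, 9824, 4981),
}

for name, (count, skip, valid, missing) in rows.items():
    assert count - skip == valid + missing
    pct = missing_percent(count, skip, missing)
    print(name, pct, round(100.0 - pct, 2))
```

The second printed figure is the complementary response percentage, which may differ from the table by 0.01 in some rows because the published percentages were rounded separately.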

**When started postsecondary education (F2PSSTRT).** First period of
attendance at the student's first attended postsecondary institution. For the logistic
regression analysis, students were grouped into two categories: "immediate postsecondary
enrollment" if they enrolled in their first "real" postsecondary institution by
December 2004 and "no postsecondary enrollment" if they either enrolled in their
first "real" postsecondary institution after December 2004 or they had no postsecondary
enrollment through 2006.

**First follow-up sex composite (F1SEX).** For base-year students, this
variable was constructed from the base-year student questionnaire or, where missing,
from (in order of preference) the school roster or logical imputation based on first
name.

**First follow-up student's race/ethnicity composite (restricted) (F1RACE_R).** This
race/ethnicity variable includes seven categories: (1) American Indian or Alaska
Native; (2) Asian or Pacific Islander, including Native Hawaiian; (3) Black or African
American; (4) Hispanic, no race specified; (5) Hispanic, race specified; (6) more
than one race; and (7) White. Categories 1, 2, 3, 6, and 7 exclude individuals of
Hispanic or Latino origin. For presentation in this report, categories 4 and 5 are
combined into "Hispanic or Latino." The ELS:2002 race variables reflect new federal
standards that require collecting race separately from ethnicity and allow students
to mark more than one choice for race. For base-year students, information on race/ethnicity
was obtained from the base-year student questionnaire when available or (in order
of preference) from the sampling roster, the parent questionnaire (if the parent
respondent was a biological parent), or logical imputation based on other questionnaire
items (e.g., surname, native language). For the logistic regression analysis, results
for "American Indian or Alaska Native," "Native Hawaiian/other Pacific Islander,"
and "Other" were collapsed into a single "Other race" category due to small sample
sizes.

**First follow-up socioeconomic status composite (F1SES2).** F1SES2
is a composite variable constructed from parent questionnaire data, when available,
and from imputation or student substitutions, when not. SES is based on five equally
weighted, standardized components: father's/guardian's education (F1FATHED), mother's/
guardian's education (F1MOTHED), family income (BYINCOME), father's/guardian's occupational
prestige score (from F1OCCUFATH), and mother's/guardian's occupational prestige
score (from F1OCCUMOTH). Father's and mother's education were based on parent reports
when available; otherwise, on student reports. If still missing, they were imputed.
Income was based on parent questionnaire information or imputed otherwise. The parent
questionnaire was the preferred source of data for mother's and father's occupation.
In the absence of parent questionnaire occupation data, student-supplied parent
occupation information from the base year (for base-year respondents) was coded
by project staff, if possible. Missing occupations were imputed.

**First follow-up family composition (F1FCOMP).** This variable indicates
the student's family composition and was constructed using the reports of parents
in 2002. It was coded into four categories: mother and father, mother or father
and guardian, single parent (mother or father), and other. For the logistic regression
analysis, students were grouped into two categories: "two-parent/guardian household"
and "single-parent/guardian household."

**Base-year mathematics standardized score (BYTXMSTD).** The standardized
T score provides a norm-referenced measurement of achievement: that is, an estimate
of achievement relative to the population (spring 2002 10th-graders) as a whole.
It provides information on status compared to peers (as distinguished from an IRT-estimated
number-right score, which represents status with respect to achievement on a particular
criterion set of test items). The transformation to a familiar metric with a mean
of 50 and standard deviation of 10 facilitates comparisons in standard deviation
units.
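
The transformation to a metric with a mean of 50 and a standard deviation of 10 can be sketched as below. The raw scores are hypothetical, and the sketch standardizes against the unweighted mean and standard deviation of the scores supplied; the actual ELS:2002 scores are standardized relative to the population of spring 2002 10th-graders, which this toy example does not reproduce.

```python
import statistics

def t_scores(raw_scores):
    """Rescale scores to mean 50 and standard deviation 10."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population standard deviation
    return [50.0 + 10.0 * (score - mean) / sd for score in raw_scores]

# Hypothetical raw test scores (not ELS:2002 data).
raw = [31, 42, 47, 55, 60]
scaled = t_scores(raw)
print([round(s, 1) for s in scaled])
```

On this scale, a score of 60 sits one standard deviation above the mean, which is what makes comparisons in standard deviation units straightforward.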

**GPA for all 9th-grade courses (F1RGP9).** Students' 9th-grade GPA
was taken from high school transcript data and represents the GPA for all 9th-grade
courses, based on a four-point scale (A = 4.0; F = 0.0).

**10th-grader ever held back a grade (BYP46).** This variable, taken
directly from the parent questionnaire, indicates parents' response to the question,
"Was your tenth-grader ever held back a grade in school?"

**Base-year number of interscholastic sports participated in at varsity or junior
varsity level (BYNSPORT).** This variable is constructed based on a set
of eight interscholastic sports and indicates the number of these sports that the
student participated in during the 2001–02 school year, regardless of the level
of participation (junior varsity or varsity). The eight sports used as inputs for
this variable are baseball, softball, basketball, football, soccer, "other interscholastic
team sport," "individual interscholastic team sport," and cheerleading/drill team.
For the logistic regression analysis, students were grouped into two categories:
"participated in sports" and "did not participate in sports."

**Number of school-sponsored activities participated in during 2001–02 (BYXTRACU).** This
variable is constructed based on a set of nine school-sponsored activities and indicates
the number of these activities that the student participated in during the 2001–02
school year. The nine school-sponsored activities used as inputs for this variable
are school band/chorus, a school play or musical, student government, academic honor
society, school yearbook or newspaper, school service clubs, school academic clubs,
school hobby clubs, and school vocational clubs. For the logistic regression analysis,
students were grouped into three categories: "no extracurricular activities," "one
extracurricular activity," and "two or more extracurricular activities."

**How many times absent from school (BYS24C).** This variable, taken
directly from the student questionnaire, indicates how many times the student was
absent from school in the first semester or term of the school year: "never," "1–2
times," "3–6 times," "7–9 times," or "10 or more times." For the logistic regression
analyses, the responses were collapsed into three categories: "absent 0–2 times,"
"absent 3–6 times," and "absent 7 or more times."

**How many times cut/skip classes (BYS24B).** This variable, taken directly
from the student questionnaire, indicates how many times the student cut or skipped
class in the first semester or term of the school year: "never," "1–2 times,"
"3–6 times," "7–9 times," or "10 or more times." For the logistic regression analysis,
the responses were collapsed into two categories: "never skipped class" and "skipped
class at least once."

**How often discussed school courses with parents (BYS86A).** This variable
indicates students' response to the survey question, "In the first semester or term
of this school year, how often have you discussed the following with either or both
of your parents or guardians? a. Selecting courses or programs at school." Response
options were "never," "sometimes," and "often." For the logistic regression analysis,
all three response options were included.

**How many hours usually works a week (BYS75).** This student questionnaire
variable is top-coded at 41 hours or more. All students who had ever worked for
pay were instructed to report the number of hours they usually work/worked each
week. The variable is based on BYS72 ("Have you ever worked for pay/are you currently
employed?") and BYS75 ("How many hours do/did you work each week on your current
or most recent job?"). For the logistic regression analysis, the data were collapsed
into "no hours," "1 to 20 hours per week," and "more than 20 hours per week."

**Number of close friends who dropped out (BYS91).** This variable
indicates students' response to the survey question, "Altogether, how many of your
close friends have dropped out of school before graduating? (Do not include those
who have transferred to another school.)" Response options include "none," "some,"
"most," or "all of them." For the logistic regression analysis, the categories were
collapsed into "no friends dropped out of high school" and "one or more friends
dropped out of high school."

**Attainment or level of last institution enrolled in through 2009 (PRLVL6Y).** Indicates
the highest degree attained or, if no degree was attained, the level of the institution
where the student was enrolled in the spring of 2009. Response options for this
variable include "attained bachelor's degree," "attained associate's degree," "attained
certificate," "no degree, enrolled at 4-year," "no degree, enrolled at less-than-4-year,"
and "no degree, not enrolled." For the logistic regression analysis, the categories
"attained bachelor's degree" and "attained associate's degree" were collapsed into
"attained a degree within 6 years of postsecondary enrollment" and the categories
"no degree, enrolled at 4-year," "no degree, enrolled at less-than-4-year," and
"no degree, not enrolled" were collapsed into "did not attain a degree within 6
years of postsecondary enrollment."
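
The collapse into a binary outcome can be sketched as follows. This is an illustrative recode, not the report's code: the string labels stand in for the variable's stored codes, and the function name is hypothetical. The text above does not say how "attained certificate" was handled, so the sketch leaves that category unmapped rather than guessing.

```python
# Hypothetical recode of PRLVL6Y labels into the binary degree-attainment outcome.
ATTAINED = {"attained bachelor's degree", "attained associate's degree"}
NOT_ATTAINED = {
    "no degree, enrolled at 4-year",
    "no degree, enrolled at less-than-4-year",
    "no degree, not enrolled",
}

def degree_within_6_years(response):
    """Return 1 (attained), 0 (did not attain), or None if unmapped.

    "attained certificate" returns None because its treatment is not
    described in the text.
    """
    if response in ATTAINED:
        return 1
    if response in NOT_ATTAINED:
        return 0
    return None
```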

**Gender (GENDER).** Indicates the student's sex.

**Race/ethnicity (RACE).** This race/ethnicity variable includes eight
categories: (1) White; (2) Black or African American; (3) Hispanic or Latino; (4)
Asian; (5) American Indian or Alaska Native; (6) Native Hawaiian/other Pacific
Islander; (7) Other; and (8) more than one race. For the logistic regression analysis,
the categories "American Indian or Alaska Native," "Native Hawaiian/other Pacific
Islander," "Other," and "more than one race" were collapsed into a single "Other
race" category due to small sample sizes.

**Parent's highest level of education (PAREDUC).** Indicates the highest
level of education of either parent of the student during the 2003–04 academic year.
Response options for this variable include "don't know," "did not complete high
school," "high school diploma or equivalent," "vocational or technical training,"
"less than 2 years of college," "associate's degree," "2 or more years of college
but no degree," "bachelor's degree," "master's degree or equivalent," "first-professional
degree," and "doctoral degree or equivalent." For the logistic regression, cases
with values of "don't know" were dropped from the model; the "did not complete high
school," "high school diploma or equivalent," and "vocational or technical training"
categories were collapsed into a "HS diploma or less and vocational/technical training"
category; the "less than 2 years of college," "associate's degree," and "2 or more
years of college but no degree" categories were collapsed into a "some college,
less than bachelor's degree" category; and the "bachelor's degree," "master's degree
or equivalent," "first-professional degree," and "doctoral degree or equivalent"
categories were collapsed into a "bachelor's or higher degree" category.
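
As a rough sketch of the recode described above (not the report's actual code), the collapse can be written as a lookup table, with "don't know" cases filtered out before mapping. The string labels stand in for the variable's stored codes, and `collapse_pareduc` is a hypothetical name.

```python
# Hypothetical lookup from PAREDUC response labels to the three analysis categories.
LOW = "HS diploma or less and vocational/technical training"
MID = "some college, less than bachelor's degree"
HIGH = "bachelor's or higher degree"

PAREDUC_COLLAPSE = {
    "did not complete high school": LOW,
    "high school diploma or equivalent": LOW,
    "vocational or technical training": LOW,
    "less than 2 years of college": MID,
    "associate's degree": MID,
    "2 or more years of college but no degree": MID,
    "bachelor's degree": HIGH,
    "master's degree or equivalent": HIGH,
    "first-professional degree": HIGH,
    "doctoral degree or equivalent": HIGH,
}

def collapse_pareduc(responses):
    """Drop "don't know" cases, then collapse the remaining responses."""
    return [PAREDUC_COLLAPSE[r] for r in responses if r != "don't know"]
```

Because "don't know" cases are removed rather than recoded, the returned list can be shorter than the input.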

**Income quartile in 2003–04 (INCGRP).** Indicates the income group of
the student, based on total income in 2002 for independent students or parents of
dependent students. Income groups were determined separately for dependent and independent
students based on percentile rankings and then combined into one variable.

**Highest level of high school mathematics (HCMATH).** Indicates the
highest level of mathematics that the student completed or planned to take, according
to self-reporting on the standardized test questionnaire and student interview.
Response options for this variable include "none of these," "algebra II," "trigonometry/algebra
II," "pre-calculus," and "calculus." For the logistic regression analysis, the "algebra
II" and "trigonometry/algebra II" categories were collapsed into an "algebra II/trigonometry"
category, and the "pre-calculus" and "calculus" categories were collapsed into a
"pre-calculus/calculus" category.

**Earned any college level credits in high school (CRDHS04).** Indicates
whether the student earned any college credits while he/she was in high school.

**SAT or ACT exams taken (TETOOK).** Indicates whether the student took
the SAT or ACT college entrance exam. A student is considered to have taken an exam
if the agency or institution reports a test score or the student reports in the
student interview having taken the test. Response options for this variable include
"did not take SAT or ACT," "took only the SAT," "took only the ACT," and "took both
the SAT and ACT." For the logistic regression analysis, the responses were collapsed
into two categories: "did not take SAT or ACT" and "took an SAT or ACT."

**First institution control 2003–04 (FCONTROL).** Indicates the control
of the first institution (public, private nonprofit, or private for-profit) that
the student attended during the 2003–04 academic year.

**First institution level 2003–04 (FLEVEL).** Indicates the level of
the first institution that the student attended during the 2003–04 academic year.
Response options for this variable include "4-year," "2-year," and "less-than-2-year."
For the logistic regression analysis, the "2-year" and "less-than-2-year"
categories were collapsed into a "less-than-4-year" category.

**Major during first year 2003–04 (MAJORS).** Student's major or field
of study during the 2003–04 academic year. For the logistic regression analysis,
responses were collapsed into two categories: "no major declared" and "major declared."

**Remedial course 2004: Any taken (REMETOOK).** Indicates whether the
student took any remedial or developmental courses during the 2003–04 academic year.

**Frequency 2004: Meet academic advisor (FREQ04C).** Indicates whether
or how often the student met with an advisor concerning academic plans during the
2003–04 academic year. For the logistic regression analysis, the "sometimes" and
"often" categories were collapsed into a single "yes" category that indicated participation.

**Frequency 2004: School clubs (FREQ04E).** Indicates whether or how
often the student participated in school clubs during the 2003–04 academic year.
For the logistic regression analysis, the "sometimes" and "often" categories were
collapsed into a single "yes" category that indicated participation.

**Frequency 2004: School sports (FREQ04F).** Indicates whether or how
often the student participated in varsity, intramural, or club sports during the
2003–04 academic year. For the logistic regression analysis, the "sometimes" and
"often" categories were collapsed into a single "yes" category that indicated participation.

**Job 2004: hours worked per week (including work study) (JOBHOUR2).** Indicates
the average hours the student worked per week. For the logistic regression analysis,
this continuous variable was categorized into three groups: "not working," "working
less than 20 hours a week," and "working 20 or more hours a week."
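
The binning of this continuous variable can be sketched as below. This is an illustration under stated assumptions, not the report's code: the function name is hypothetical, and treating zero hours as "not working" and `None` as a missing value are choices made for the example. The boundary follows the category labels, so exactly 20 hours falls in the top group.

```python
def categorize_job_hours(hours):
    """Bin average weekly hours worked into the three analysis groups.

    Assumptions: None marks a missing value; 0 hours means not working.
    """
    if hours is None:
        return None
    if hours <= 0:
        return "not working"
    if hours < 20:
        return "working less than 20 hours a week"
    return "working 20 or more hours a week"
```

Note that this boundary differs from the high school work-hours variable (BYS75) described earlier, whose groups are "1 to 20 hours per week" and "more than 20 hours per week."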

**Attendance intensity pattern through 2009 (ENINPT6Y).** Pattern of
enrollment intensity for all months enrolled through June 2009. Response options
for this variable include "always full-time," "always part-time," and "mixed." For
the logistic regression analysis, the "always part-time" and "mixed" categories were
collapsed into a "not full-time" category.

**Stopouts number anywhere through 2009 (STNUM6Y).** Number of stopouts
at all institutions attended, as of June 2009. A stopout is defined as a temporary
withdrawal of 5 or more consecutive months from enrollment at a postsecondary institution.

**Number of transfers as of June 2009 (TFNUM6Y).** Number of transfers
between institutions between entry to postsecondary education and June 2009.