Because the sample design for the HS&B cohorts involved stratification, disproportionate sampling of certain strata, and clustered probability sampling, the calculation of exact standard errors (an indication of sampling error) for survey estimates can be difficult and expensive.
Sampling error estimates for the first and second HS&B follow-ups were calculated by the method of Balanced Repeated Replication (BRR) using BRRVAR, a Department of Education statistical subroutine. (The BRR programs WesVar and SUREG are now available commercially.) For the base year and the third and fourth follow-ups, Taylor Series approximations were employed. More detailed discussions of the BRR and Taylor Series procedures can be found in the High School and Beyond Third Follow-Up Sample Design Report (Spencer et al. 1987). The Data Analysis System (DAS), included as part of the public-release file, automatically reports design-corrected Taylor Series standard errors for the tables it generates. Therefore, users of the DAS do not need to make adjustments to these estimates.
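To illustrate the general idea behind BRR (not the specific HS&B implementation), the sketch below estimates a variance as the average squared deviation of half-sample replicate estimates from the full-sample estimate. The function name, number of replicates, and all numeric values are hypothetical; actual applications use the balanced replicate weights supplied with survey software such as WesVar.

```python
import numpy as np

def brr_variance(full_estimate, replicate_estimates):
    """Illustrative BRR variance: average squared deviation of
    half-sample replicate estimates from the full-sample estimate."""
    reps = np.asarray(replicate_estimates, dtype=float)
    return np.mean((reps - full_estimate) ** 2)

# Hypothetical example: a full-sample proportion and 16 balanced
# half-sample replicate estimates (values are made up).
full = 0.62
replicates = [0.60, 0.64, 0.61, 0.63, 0.59, 0.65, 0.62, 0.61,
              0.63, 0.60, 0.64, 0.62, 0.61, 0.63, 0.60, 0.64]
se = np.sqrt(brr_variance(full, replicates))
print(f"BRR standard error: {se:.4f}")
```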
While design effects cannot be calculated for every estimate of interest to users, design effects tend to be similar from item to item within the same subgroup or population. Users can therefore calculate approximate standard error estimates for items by multiplying the standard error under the simple random sample assumption by the square root of the average design effect for the population being studied.
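As a minimal sketch of this adjustment, the example below multiplies a standard error computed under the simple random sample assumption by the square root of an assumed average design effect. The proportion, sample size, and design effect are illustrative values only, not figures from HS&B tables.

```python
import math

def deff_adjusted_se(srs_se, avg_design_effect):
    """Approximate design-corrected standard error: multiply the
    simple-random-sample standard error by sqrt(average DEFF)."""
    return srs_se * math.sqrt(avg_design_effect)

# Illustrative values only: a proportion of 0.45 estimated from
# n = 2,000 sampled cases and an assumed average design effect of 2.5.
p, n, avg_deff = 0.45, 2000, 2.5
srs_se = math.sqrt(p * (1 - p) / n)
print(f"SRS SE = {srs_se:.4f}, "
      f"design-corrected SE = {deff_adjusted_se(srs_se, avg_deff):.4f}")
```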
Coverage error. Bias caused by explicit exclusion of certain groups of schools and students (e.g., special types of schools or students with disabilities or language barriers) is not addressed in HS&B technical reports. Potential coverage error in HS&B may relate to the exclusion of schools that refused to cooperate in the base-year survey. Students who refused to participate in the base-year survey were not excluded from the follow-ups. Since students were randomly selected from the sampled schools, the HS&B sample design did not entail exclusion of specified groups. (See “Sample Design,” above, in section 4.)
Nonresponse error.
Unit nonresponse. HS&B base-year student-level estimates include two components of unit nonresponse bias: bias introduced by nonresponse at the school level, and bias introduced by nonresponse on the part of students attending cooperating schools. At the school level, some schools refused to participate in the base-year survey. Substitution was carried out for refusal schools within a stratum when there were two or more schools within the stratum. The bias introduced by base-year school-level refusals is of particular concern because it carried over into successive rounds of the survey. Students attending refusal schools were not sampled during the base year and had no chance of selection into subsequent rounds of observation. To the extent that these students differed from students from cooperating schools in later waves of the study, the bias introduced by base-year school nonresponse would persist. Student nonresponse did not carry over in this way, since student nonrespondents remained eligible for sampling in later waves of the study.
In general, the lack of survey data for nonrespondents prevents the estimation of unit nonresponse bias. However, during the first follow-up, School Questionnaire data were obtained from most of the base-year refusal schools, and student data were obtained from most of the base-year student nonrespondents selected for the first follow-up sample. These data provide a basis for assessing the magnitude of unit nonresponse bias in base-year estimates.
Overall, 1,120 schools were selected in the original sample, and 811 of those schools (72 percent) participated in the survey. An additional 204 schools were drawn in a replacement sample. Student refusals and absences resulted in a weighted student completion rate of 88 percent in the base-year survey. Participation was higher in most follow-up surveys. Completion rates in the first follow-up were as follows: 94 percent for seniors; 96 percent for sophomores eligible for on-campus survey administration; and 89 percent for sophomores who had left school between the base-year and first follow-up surveys (dropouts, transfer students, and early graduates). In the second follow-up, 91 percent of senior cohort members and 92 percent of sophomore cohort members completed the survey. In the third follow-up, completion rates were 88 percent for seniors and 91 percent for sophomores. Only the sophomore cohort was surveyed in the fourth follow-up; 86 percent of the sample members participated.
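As a minimal sketch of how a weighted completion rate such as the 88 percent base-year figure is formed, the example below divides the sum of design weights for responding students by the sum of design weights for all eligible sampled students. The weights and response indicators shown are hypothetical.

```python
import numpy as np

def weighted_completion_rate(weights, responded):
    """Weighted completion rate: sum of design weights for respondents
    divided by the sum of design weights for all eligible sample members."""
    w = np.asarray(weights, dtype=float)
    r = np.asarray(responded, dtype=bool)
    return w[r].sum() / w.sum()

# Hypothetical sample: five students with design weights and response flags.
weights = [120.5, 98.0, 150.2, 110.7, 130.0]
responded = [True, True, False, True, True]
print(f"Weighted completion rate: "
      f"{weighted_completion_rate(weights, responded):.1%}")
```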
As results from the fourth follow-up illustrate, student nonresponse varied by demographic and educational characteristics. Males had a slightly higher nonresponse rate than females (a difference of slightly more than 3 percentage points). Blacks and Hispanics showed similarly high rates of nonresponse (around 20 percent), whereas nonresponse among White students was about 10 percent. Nonresponse increased as socioeconomic status decreased. Students who were in general or vocational programs during the base year were more likely to be nonrespondents than students in academic programs. Dropouts had higher nonresponse rates than other students. Students with lower grades and lower test scores showed higher nonresponse than students with higher grades and test scores. Students who were frequently absent from school showed higher nonresponse than students absent infrequently. Students with no postsecondary education by the time of the second follow-up had higher nonresponse than students with some postsecondary education. By selected school characteristics, the highest nonresponse rates were among students from alternative public schools, schools with large enrollments, schools in urban areas, and schools in the Northeast and West.
The patterns were similar in earlier rounds of HS&B. Nonresponse analyses conducted by NORC support the following general conclusions:
The first and second conclusions together suggest that nonresponse bias is not a major contributor to error in base-year estimates. The first and third conclusions suggest that nonresponse bias is not a major contributor to error in the first, second, and third follow-up estimates either. The first and fourth conclusions suggest that the fourth follow-up nonresponse bias might be a little greater than for the previous follow-ups, but probably not by much. Each of these conclusions is subject to some qualification. The analysis of school-level nonresponse is based on data concerning the schools, not the students attending them. The analyses of student nonresponse are based on survey data and are themselves subject to nonresponse bias. Despite these limitations, the results consistently indicate that nonresponse had a small impact on base-year and follow-up estimates.
Item nonresponse. Among students who participated in the survey, some did not complete the questionnaire or gave invalid responses to certain questions. The amount of item nonresponse varied considerably by item. For example, in the second follow-up, a very low nonresponse rate (0.1 percent) was observed for a question asking whether the respondent had attended a postsecondary institution. A much higher nonresponse rate (12.2 percent) was obtained for a question asking if the respondent had used a micro- or minicomputer in high school. Typical item nonresponse rates ranged from 3 to 4 percent.
Imputation was not used to compensate for item nonresponse in HS&B. However, an attempt was made in the fourth follow-up to reduce item nonresponse. In previous rounds, data were collected primarily through self-administered questionnaires (SAQs). Unfortunately, respondents often skipped questions incorrectly or gave unrecognizable answers, so more data were missing than would have been the case with personal interviewing. In the fourth follow-up, interviewing was conducted using a computer-assisted telephone interviewing (CATI) program. Unlike SAQs, CATI interviewing virtually eliminated missing data attributable to improperly skipped questions.
To evaluate the effectiveness of CATI interviewing, 25 items from both the third and fourth follow-up data were selected for comparison. Refusal and “don’t know” responses were considered to be missing, but legitimate skips were not. For these 25 items, the overall percentage of missing items dropped from 4.36 percent in the third follow-up to 1.88 percent in the fourth follow-up.
CATI also eliminated multiple responses entirely, and uncodable verbatim responses occurred only for the two income variables. In addition, more was known about the missing data in the fourth follow-up. In the third follow-up, only 7.2 percent of the missing data were classified as refusals or “don’t know” responses. In the fourth follow-up, 50.9 percent of the missing data were classified as refusals or “don’t know” responses. The fact that most of the 25 comparisons showed a “very significant” decline in missing data supports the contention that missing data were reduced in the fourth follow-up.
Measurement error. An examination of consistency between responses to the third and fourth follow-ups provides an indication of the reliability of HS&B data.
Race/ethnicity. Race/ethnicity is one characteristic of the respondents that should not change between surveys. Overall, of the 12,310 respondents who reported their race/ethnicity on both questionnaires, 93.8 percent gave the same response in both years. However, certain race/ethnicity categories (e.g., Native American) had substantially less agreement. Only 53.4 percent of the respondents who classified themselves as Native Americans during the third follow-up classified themselves as Native Americans again during the fourth follow-up.
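A simple way to express this kind of consistency check is the percent agreement between paired responses from the two waves, computed overall and within a category of the earlier wave. The sketch below uses made-up responses; it is not drawn from the HS&B files.

```python
import numpy as np

def percent_agreement(wave_a, wave_b):
    """Share of respondents who gave identical answers in both waves."""
    a = np.asarray(wave_a)
    b = np.asarray(wave_b)
    return np.mean(a == b)

# Hypothetical paired race/ethnicity responses from two waves.
third = ["White", "Black", "Hispanic", "Native American", "White", "Asian"]
fourth = ["White", "Black", "Hispanic", "White", "White", "Asian"]
print(f"Overall agreement: {percent_agreement(third, fourth):.1%}")

# Agreement within one category of the earlier wave (e.g., Native American).
mask = np.asarray(third) == "Native American"
print(f"Native American agreement: "
      f"{percent_agreement(np.asarray(third)[mask], np.asarray(fourth)[mask]):.1%}")
```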
One explanation for these discrepancies may be the change in the method of survey administration. Unlike the third follow-up, which involved self-administered questionnaires, the fourth follow-up was conducted by telephone. The questionnaires mailed during the third follow-up listed the five race/ethnicity categories for the respondent to see. In the fourth follow-up, respondents were simply asked over the telephone, “What is your race/ethnicity?” and the interviewer coded the response. It is possible that some Native Americans, Hispanics, and Asian/Pacific Islanders classified themselves as Black or White (not knowing that a more specific category was available), which would have produced more respondents classified as Black or White in the fourth follow-up.
Marital status. In the third follow-up, respondents were asked about their marital status in the first week of February 1986. In the fourth follow-up, respondents were asked about their marital status during and since February 1986. Although both questions asked about marital status during February 1986, respondents who had a change in marital status during the last 3 weeks of February could have given a different answer in the fourth follow-up than in the third follow-up. Overall, of the 11,850 respondents who gave their marital status in both questionnaires, 95.4 percent had answers that agreed.
Unlike the race/ethnicity question, memory and timing play an important role in matching answers for marital status. In this case, the recall period for third follow-up respondents was several years shorter than the recall period for respondents in the fourth follow-up. Respondents in the third follow-up, which took place in spring 1986, were asked about a recent event; respondents in the fourth follow-up, which was conducted in spring 1992, were asked to recall their status back in February 1986. As with the race/ethnicity question, the method of administering the question differed between rounds: the question formatting had changed, and the fourth follow-up used preloaded data to verify marital status.
A goal of the National Longitudinal Studies Program is to allow comparative analysis of data generated in several waves of the same study as well as to enable cross-cohort comparisons with the other longitudinal studies. While the HS&B and NLS:72 studies are largely compatible, a number of variations in sample design, questionnaires, and data collection methods should be noted as a caution to data users.
Comparability within HS&B. While many data items were highly compatible across waves, the focus of the questionnaires necessarily shifted over the years in response to changes in the cohorts’ life cycle and the concerns of education policymakers. For seniors in the base-year survey and for sophomores in both the base-year and first follow-up surveys, the emphasis was on secondary schooling. In subsequent follow-ups, an increasing number of items dealt with postsecondary education and employment. Also, a major change in the data collection method occurred in the fourth follow-up, when CATI was introduced as the primary approach. Earlier waves used mailed questionnaires supplemented by telephone and personal interviews.
Comparability with NLS:72. The HS&B was designed to build on NLS:72 in three ways. First, the HS&B base-year survey included a 1980 cohort of high school seniors that was directly comparable to the NLS:72 cohort (1972 seniors). Replication of selected 1972 Student Questionnaire items and test items made it possible to analyze changes subsequent to 1972 and their relationship to federal education policies and programs in that period. Second, the introduction of the sophomore cohort in HS&B provided data on the many critical educational and vocational choices made between the sophomore and senior years in high school, thus permitting a fuller understanding of the secondary school experience and how it affects students. Third, HS&B expanded the NLS:72 focus by collecting data on a range of life cycle factors, such as family formation, labor force behavior, intellectual development, and social participation.
The sample design was largely similar for both HS&B and NLS:72, except that HS&B included a sophomore sample in addition to a senior sample. The questionnaires for the two studies contained a large number of identical (or similar) items dealing with secondary education and postsecondary work experience and education. The academic tests were also highly comparable. Of the 194 test items administered to the HS&B senior cohort in the base year, 86 percent were identical to items that had been given to NLS:72 base-year respondents. Item response theory (IRT) was used in both studies to put math, vocabulary, and reading test scores on the same scale for 1972, 1980, and 1982 seniors. With the exception of the use of CATI in the HS&B fourth follow-up, both NLS:72 and HS&B used group administration of questionnaires and tests in the earliest surveys and mailed questionnaires in the follow-ups. HS&B, however, involved more extensive efforts to supplement the mailings by telephone and personal interviews.
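The reports cited here do not reproduce the equating details, but as an illustration of the kind of model involved, the sketch below evaluates a three-parameter logistic (3PL) item response function, a standard IRT form. The item parameters and ability value are hypothetical and are not taken from the HS&B or NLS:72 tests.

```python
import math

def irt_3pl(theta, a, b, c):
    """Three-parameter logistic IRT model: probability that an examinee
    with ability theta answers correctly an item with discrimination a,
    difficulty b, and guessing parameter c (1.7 is a conventional
    scaling constant)."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical item parameters; placing 1972, 1980, and 1982 scores on a
# common scale amounts to estimating abilities against shared item parameters.
print(f"P(correct | theta = 0.5) = {irt_3pl(0.5, a=1.2, b=0.0, c=0.2):.3f}")
```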
Comparability with NELS:88. The sample design of HS&B was also similar to that of NELS:88. In each base year, students were selected through a two-stage stratified probability sample, with schools as the first-stage units and students within schools as the second-stage units. Because NELS:88 base-year sample members were eighth-graders in 1988, its follow-ups encompass students (both in the modal grade progression sequence and out of sequence) and dropouts. Despite similarities, however, the sample designs of the two studies differ in three major ways: (1) the NELS:88 first and second follow-ups had relatively variable, small, and unrepresentative within-school student samples, compared to the relatively uniform, large, and representative within-school student samples in the HS&B; (2) unlike the earlier study, NELS:88 did not provide a nationally representative school sample in its follow-ups; and (3) there were differences in school and subgroup sampling and oversampling strategies in the two studies. These sample differences imply differences in the respondent populations covered. (For details on NELS:88, please refer to the NELS:88 chapter.)
Comparability with ELS:2002. The ELS:2002 base-year and first follow-up surveys contain many data elements that are comparable to items from the HS&B. Differences in sampling rates, sample sizes, and design effects across the studies, however, affect the precision of estimation and comparability. Asian students, for example, were oversampled in ELS:2002, but not in HS&B, where their numbers were quite small. The base-year (1980) participating sample in HS&B numbered 30,030 sophomores; in contrast, 15,362 sophomores participated in the base year of ELS:2002. Cluster sizes within schools were much larger for HS&B (on average, 30 sophomores per school) than for ELS:2002 (just over 20 sophomores per school); larger cluster sizes are better for school effects research, but carry a penalty in greater sample inefficiency. Mean design effect (a measure of sample efficiency) is also quite variable across the studies. For example, for 10th grade, the design effect was 2.9 for HS&B, while a more favorable design effect of 2.4 was achieved for the ELS:2002 base year. (For details on ELS:2002, please refer to the ELS:2002 chapter.)
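One way to see what these design effects imply for precision is to convert the nominal sophomore sample sizes cited above into effective sample sizes by dividing by the mean design effect, as in the sketch below; the function name is illustrative.

```python
def effective_sample_size(n, deff):
    """Effective sample size under a complex design:
    nominal sample size divided by the design effect."""
    return n / deff

# Figures cited above: HS&B base-year sophomores (n = 30,030, DEFF = 2.9)
# and ELS:2002 base-year sophomores (n = 15,362, DEFF = 2.4).
print(f"HS&B effective n:     {effective_sample_size(30030, 2.9):,.0f}")
print(f"ELS:2002 effective n: {effective_sample_size(15362, 2.4):,.0f}")
```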