High School and Beyond (HS&B) Longitudinal Study



4. SURVEY DESIGN

TARGET POPULATION

High school students who were in the 10th or 12th grade in U.S. public and private schools in spring 1980. 

SAMPLE DESIGN

HS&B was designed to provide nationally representative data on 10th- and 12th-grade students in the United States.

Base-year Survey. In the base year, students were selected using a two-stage, stratified probability sample design, with secondary schools as the first-stage units and students within schools as the second-stage units. Sampling rates were set to select, in each stratum, the number of schools needed to satisfy the study's minimum sample-size criteria for certain types of schools. The following types of schools were oversampled to make the study more useful for policy analyses: public schools with a high percentage of Hispanic students; Catholic schools with a high percentage of Black, Hispanic, and other race/ethnicity students; alternative public schools; and private schools with high-achieving students. Thus, some schools had a high probability of inclusion in the sample (in some cases, equal to 1.0), while others had a low probability. The total number of schools in the sample was 1,120, selected from a frame of 24,730 schools with grade 10, grade 12, or both (a single base-year school sample served both cohorts). Within each stratum, schools were selected with probabilities proportional to the estimated enrollment in their 10th and 12th grades.

Within each school, 36 seniors and 36 sophomores were randomly selected. In schools with fewer than 36 seniors or 36 sophomores, all eligible students were drawn into the sample. Students in all but the special strata were selected with approximately equal probabilities; students in the special strata were selected with higher probabilities. Special efforts were made to identify sampled students who were twins or triplets so that their co-twins or co-triplets could be invited to participate in the study.
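
To make the two-stage selection concrete, here is a minimal Python sketch of the logic: a systematic PPS (probability proportional to size) draw of schools within a stratum by estimated 10th- and 12th-grade enrollment, followed by a simple random draw of up to 36 students per grade. The data structures and function names are illustrative assumptions, not taken from the HS&B documentation.

```python
import random

def select_schools_pps(schools, n_schools):
    """First stage: systematic PPS draw within a stratum. Schools with
    larger estimated 10th/12th-grade enrollment get proportionally
    higher selection probabilities; a school whose size exceeds the
    sampling interval is hit with certainty (probability 1.0).
    `schools` is a list of (school_id, est_enrollment) pairs."""
    total = sum(size for _, size in schools)
    step = total / n_schools            # sampling interval
    start = random.uniform(0, step)     # single random start
    points = [start + k * step for k in range(n_schools)]
    selected, cum, i = [], 0.0, 0
    for school_id, size in schools:
        cum += size
        while i < len(points) and points[i] < cum:
            selected.append(school_id)
            i += 1
    return selected

def select_students(roster, n=36):
    """Second stage: simple random sample of 36 students per grade, or
    every eligible student when fewer than 36 are enrolled."""
    return list(roster) if len(roster) <= n else random.sample(roster, n)
```

Under this scheme a school's selection probability is n_schools x size / total; these stage-wise probabilities are the quantities inverted later to form the base weights.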

Substitution was carried out for schools that refused to participate in the survey. There was no substitution for students who refused, for students whose parents refused, or for students who were absent on survey day and makeup days.

First Follow-up Survey. The first follow-up sophomore and senior cohort samples were based on the base-year samples, retaining the essential features of a stratified multistage design. (For details see High School and Beyond First Follow-Up (1982) Sample Design Report [Tourangeau et al. 1983].)

For the sophomore cohort, all schools selected for the base-year sample were included in the first follow-up (except 40 schools that had no 1980 sophomores, had closed, or had merged with other schools in the sample). The sample also included 17 schools that had received two or more transfer students from base-year schools; school-level data from these institutions were eventually added to students' records as contextual information. However, these schools were not added to the existing probability sample of schools.

Sophomores still enrolled in their original base-year schools were retained with certainty, since the base-year clustered design made it relatively inexpensive to resurvey and retest them. Sophomores no longer attending their original base-year schools (i.e., dropouts, early graduates, and students who transferred as individuals to a new school) were subsampled. Certain groups were retained with higher probabilities in order to support statistical research on such policy issues as educational excellence, access to postsecondary education, and the transition from school to the labor force.

Students who transferred as a class to a different school were considered still enrolled if their original school had been a junior high school, had closed, or had merged with another school. Students who had graduated early or had transferred as individuals to other schools were treated as school leavers for sampling purposes. School leavers from the 1980 sophomore cohort were selected with certainty or at predesignated rates designed to produce approximately the number of completed cases needed in each of several sample categories. School leavers who had not participated in the base-year survey were given a selection probability of 0.1.

For the 1980 senior cohort, students selected for the base-year sample had a known, nonzero chance of being selected for the first and all subsequent follow-up surveys. The first follow-up sample consisted of 11,995 selections from the base-year probability sample (including 11,500 of the 28,240 base-year participants and 495 of the 6,740 base-year nonparticipants). In addition, 204 nonsampled co-twins or co-triplets (who were not part of the probability sample) were included in the first follow-up sample, resulting in a total of 12,200 selections.

High School Transcript Study (1980 Sophomore Cohort). Subsequent to the first follow-up survey, high school transcripts were sought for a probability subsample of nearly 18,500 members of the 1980 sophomore cohort. The subsampling plan for the transcript study emphasized the retention of members of subgroups of special relevance for education policy analysis. Compared to the base-year and first follow-up surveys, the transcript study sample design further increased the overrepresentation of certain race/ethnicity groups, students who attended private high schools, school dropouts, transfers, early graduates, and students whose parents completed the base-year Parent Questionnaire on financing postsecondary education. Transcripts were collected and processed for nearly 16,000 members of the sophomore cohort.

Second and Third Follow-up Surveys. The sample for the second follow-up survey of the 1980 sophomore cohort was based upon the design of the High School Transcript Study. A total of 14,830 cases were selected from the nearly 18,500 sample members retained for the transcript study. The second follow-up sample included disproportionate numbers of sample members from policy-relevant subpopulations. The sample for the senior cohort in the second follow-up consisted of exactly those sample members selected into the first follow-up sample. The senior and sophomore cohort samples for the third follow-up survey were the same as those used for the second follow-up. The third follow-up was the last survey conducted for the senior cohort. Postsecondary school transcripts were collected for all members of the senior cohort who reported attending any form of postsecondary schooling in either of the follow-up surveys. Over 7,000 individuals reported more than 11,000 instances of postsecondary school attendance.

Fourth Follow-up Survey. The fourth follow-up was composed solely of members of the sophomore cohort and consisted of exactly those students selected into the second and third follow-up samples. For any student who had ever enrolled in postsecondary education, complete transcript information was requested from the institutions the student indicated.

DATA COLLECTION AND PROCESSING

HS&B compiled data from six primary sources: students, school administrators, teachers, parents of selected students, high school administrative records (transcripts), and postsecondary administrative records (transcripts and financial aid). Data collection began in fall 1979 (when information from school administrators and teachers was first gathered) and ended in 1993 (when postsecondary transcripts of sophomore cohort members were collected). The National Opinion Research Center (NORC) at the University of Chicago was the contractor for the HS&B project.

Reference dates. In the base-year survey, most questions referred to the students’ experience up to the time of the survey administration in spring 1980 (i.e., all 4 high school years for the senior cohort and the first 2 high school years for the sophomore cohort). In the follow-ups, most questions referred to experiences that occurred between the previous survey and the current survey. For example, the second follow-up largely covered the period between 1982 (when the first follow-up was conducted) and 1984 (when the second follow-up was conducted).

Data collection. In both the base-year and first follow-up surveys, it was necessary to secure a commitment to participate in the study from the administrator of each sampled school. For public schools, the process began by contacting the chief state school officer. Once approval was gained at the state level, contact was made with district superintendents and then with school principals. Wherever private schools were organized into an administrative hierarchy (e.g., Catholic school dioceses), approval was obtained at the higher administrative level before approaching the school principal or headmaster. The principal of each cooperating school designated a school coordinator to serve as a liaison between the NORC staff, the school administration, and the selected students. The school coordinator (most often a senior guidance counselor) handled all requests for data and materials, as well as all logistical arrangements for student-level data collection on the school premises.

In the 1980 base-year survey, a single data collection method, on-campus administration, was used for both the sophomore and senior cohorts. In the first follow-up, most members of the sophomore cohort (nearly all of whom were then in the 12th grade) were resurveyed using methods similar to those of the base-year survey. However, since some of the 1980 sophomores had left school by 1982, the first follow-up combined on-campus administration for in-school respondents with off-campus group administration for school leavers (transfers, dropouts, and early graduates); personal or telephone interviews were conducted with school leavers who did not attend the off-campus sessions. Members of the 1980 senior cohort were surveyed primarily by mail. Nonrespondents to the mail survey (approximately 25 percent) were interviewed either in person or by telephone.

By the time of the second follow-up, the sophomore cohort was out of school. Thus, in the second (1984) and third (1986) follow-ups, data for both the sophomore and senior cohorts were collected through mailed questionnaires. Telephone and personal interviews were conducted with sample members who did not respond to the mailed survey within 2 to 3 months. Only the sophomore cohort was surveyed in the fourth follow-up (1992). Computer-assisted telephone interviewing (CATI) was used to collect these data. The CATI program included two instruments; the first was used to locate and verify the identity of the respondent, while the second contained all of the survey questions. The average administration time for an interview was 30.6 minutes. Intensive telephone locating and field intervention procedures were used to locate respondents and conduct interviews.

Data processing. Although procedures varied across survey waves, all Student Questionnaires in all waves were checked for missing critical items. Approximately 40 items in each of the main survey instruments were designated as critical or "key" items. A case failed this edit if a codable response was missing for any of the key items. Such cases were flagged and routed to the data retrieval station, where staff called respondents to obtain the missing information or otherwise resolve the edit failure.
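
As a rough illustration of this edit, the Python fragment below flags a case when any designated key item lacks a codable response. The item names and missing-value conventions are invented for the example.

```python
# Hypothetical key items; the actual instruments designated about 40.
KEY_ITEMS = ("sex", "birth_year", "race_ethnicity", "hs_program")

def missing_key_items(case: dict) -> list:
    """Return the key items lacking a codable response. A nonempty
    result fails the critical item edit: the case is flagged and
    routed to data retrieval for a follow-up call."""
    return [item for item in KEY_ITEMS if case.get(item) in (None, "")]
```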

The base-year procedures for data control and preparation differed significantly from those in the follow-up surveys. Since the base-year student instruments were less complex than later instruments, the completed documents were sent directly from the schools to NORC’s optical scanning subcontractor for conversion to machine-readable form. The scanning computer was programmed to perform the critical item edit on Student Questionnaires and to generate listings of cases missing critical data, which were then sent to NORC for data retrieval. School and Parent Questionnaires were converted to machine-readable form by the conventional key-to-disk method at NORC.

All follow-up questionnaires were sent to NORC for receipt control and data preparation prior to being shipped to the scanning subcontractor. The second follow-up survey contained optically scannable grids for the answers to numeric questions; staff examined numeric responses for correct entry (e.g., right justification, omission of decimal points). In the third follow-up, a portion of the instrument was designed for computer-assisted data entry (CADE), while the rest was prepared for optical scanning. All major skip items and all critical items were entered by CADE. With this system, operators were able to combine data entry with the traditional editing procedures. The CADE system stepped question by question through critical and numeric items, skipping over questions that were slated for scanning and questions that were legitimately skipped because of a response to a filter question. Ranges were set for each question, preventing the accidental entry of illegitimate responses. CADE operators were also responsible for the critical item edit; those critical items that did not pass the edit were flagged for retrieval, both manually and by the CADE system. After the retrieved data were keyed, questionnaires were shipped to the scanning firm.

For the fourth follow-up, a CATI program captured the data at the time of the interview. The CATI program examined the responses to completed questions and used that information to route the interviewer to the next appropriate question. It also applied the customary edits, described below under “Editing.” At the conclusion of an interview, the completed case was deposited in the database ready for analysis. There was minimal post-data entry cleaning because the interviewing module itself conducted the majority of necessary edit checking and conversion functions. A CADE system was designed to enter and code transcript data.

The first through fourth follow-ups required coding of open-ended responses on occupation and industry; postsecondary schools; major field of study for each postsecondary school; licenses, certificates, and other diplomas received; and military specialized schools, specialty, and pay grade. Coding was compatible with the coding done in NLS:72, using the same sources from NCES and the U.S. Bureau of the Census (see the NLS:72 chapter). In the first follow-up, staff also coded open-ended questions in the Early Graduate and Transfer supplements and transcribed numeric responses into darkened ovals to facilitate optical scanning. In the third follow-up, all codes were loaded into a computer program for more efficient access: coders typed in a given response, and the program displayed the corresponding numeric code.
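
The code-lookup program can be imagined as a simple search over a codebook: the coder types part of a response and the program displays the matching numeric codes. The codebook entries below are hypothetical.

```python
def lookup_codes(codebook: dict, typed: str) -> list:
    """Return (code, label) pairs whose label contains the typed text,
    mimicking the computer-assisted lookup described above."""
    needle = typed.strip().lower()
    return [(code, label) for code, label in codebook.items()
            if needle in label.lower()]

# Hypothetical fragment of an occupation codebook:
occupations = {417: "Registered nurse", 523: "Machinist",
               611: "Secondary school teacher"}
print(lookup_codes(occupations, "nurse"))  # [(417, 'Registered nurse')]
```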

In the fourth follow-up, interviewers received additional coding capabilities by temporarily exiting the CATI program and executing separate programs that assisted them in coding the open-ended responses. Data from the coding programs were automatically sent to the CATI program for inclusion in the dataset. In addition to the online coding tasks, interviewers recorded verbatim descriptions of industry and occupation. The coding scheme for industry in the fourth follow-up was a simplified version of the scheme used in previous rounds of HS&B (verbatim responses are available for more detailed coding). The coding scheme for occupation was adapted from verbatim responses received in the third follow-up. Postsecondary institutions were coded with Federal Interagency Committee on Education (FICE) codes.

Editing. In addition to the critical item edit described above, a series of edits checked the data for out-of-range values and inconsistencies between related items. In the base-year survey, machine editing was limited to examining responses for out-of-range values; no inter-item consistency checks were performed, since the instrument contained only one skip pattern.

In the first and second follow-ups, several sections of the questionnaire required respondents to follow skip instructions. Computer edits were performed to resolve inconsistencies between filter and dependent questions, detect illegal codes, and generate reports on the incidence of correctly and incorrectly answered questions. After improperly answered questions were converted to blanks, the student data were passed to another program for conversion to appropriate missing-data codes (e.g., “legitimate skip,” “refused”). Detection of out-of-range codes was completed during scanning for all questions except those permitting an open-ended response. Hand-coded data for open-ended questions (occupation, industry, institution, field of study) were matched by computer against lists of valid codes.
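
A minimal sketch of the filter/dependent edit, assuming invented reserved codes and field names: answers found after a filter that directed the respondent past them are inconsistent and are converted to the "legitimate skip" code, while genuinely unanswered items receive a generic missing code.

```python
# Reserved missing-data codes (values are assumptions for illustration;
# codes such as "refused" would follow the same convention).
LEGITIMATE_SKIP, BLANK = -3, -1

def edit_skip_pattern(record, filter_item, skip_value, dependents):
    """Resolve inconsistencies between a filter question and its
    dependent items, then convert residual blanks to missing codes."""
    skipped = record.get(filter_item) == skip_value
    for item in dependents:
        if skipped:
            record[item] = LEGITIMATE_SKIP  # answer should not exist
        elif record.get(item) in (None, ""):
            record[item] = BLANK            # left unanswered
    return record
```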

In the third follow-up, CADE carried out many of the steps that normally occur during machine editing. The system enforced skip patterns, range checking, and appropriate use of reserved codes—allowing operators to deal with problems or inconsistencies while they had the document in hand. For scanned items, the same machine-editing steps as those used in prior follow-ups were implemented. Since most of the filter questions were CADE-designated items, there were few filter-dependent inconsistencies to be handled in machine editing.

In the fourth follow-up, machine editing was replaced by the interactive edit capabilities of the CATI program, which tested responses for valid ranges, data field size, data type (numeric or text), and consistency with other answers or data from previous rounds. If the system detected an inconsistency due to a keying error by the interviewer, or if the respondent simply realized that he or she had made a reporting error earlier in the interview, the interviewer could go back and change the earlier response. As the new response was entered, all of the edit checks applied to the original response were performed again. The system then worked its way forward through the questionnaire, using the new value in all skip instructions, consistency checks, and the like, until it reached the first unanswered question; control was then returned to the interviewer. When problems were encountered, the system could suggest prompts for the interviewer to use in eliciting a better or more complete answer.
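
The forward "replay" after a corrected answer can be sketched as follows. The Question interface (an edit check plus a routing rule) is an assumed simplification of the CATI program's internals, not its actual design.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Question:
    """Assumed minimal interface: an edit check and a skip rule."""
    validate: Callable[[object, dict], bool]  # range/type/consistency
    next_q: Callable[[dict], Optional[str]]   # routing given answers so far

def replay_after_correction(instrument, answers, changed, new_value):
    """Re-run the edit checks on a corrected response, then work forward
    through the instrument, applying skip instructions with the new
    value until reaching the first unanswered (or newly inconsistent)
    question, where control returns to the interviewer."""
    if not instrument[changed].validate(new_value, answers):
        return changed                 # fails the edits; re-enter it
    answers[changed] = new_value
    q = instrument[changed].next_q(answers)
    while q is not None and q in answers:
        if not instrument[q].validate(answers[q], answers):
            del answers[q]             # now inconsistent: ask it again
            break
        q = instrument[q].next_q(answers)
    return q                           # None means the interview is complete
```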

ESTIMATION METHODS

Weighting is used to adjust for unequal probabilities of selection and for unit nonresponse.

Weighting. The weights are based on the inverse of the selection probabilities at each stage of the sample selection process and on nonresponse adjustment factors computed within weighting cells. Each wave provided weights for statistical estimation; the fourth follow-up weights illustrate the approach. The fourth follow-up generated both survey data and postsecondary transcript data, and weights were computed to account for nonresponse in each of these data collections.

First, a raw weight, unadjusted for nonresponse in any of the surveys, was calculated and included in the data file. The raw weight provided the basis for analysts to construct additional weights adjusted for the presence of any combination of data elements. However, caution should be used if the combination of data elements results in a sample with a high proportion of missing cases. For the survey data, two weights were computed. The first weight was computed for all fourth follow-up respondents. The second weight was computed for all fourth follow-up respondents who also participated in the base-year survey and in the first, second, and third follow-up surveys.

Two additional weights were computed to facilitate the use of the postsecondary transcript data. The collection of transcripts was based upon sophomore cohort reports of postsecondary attendance during either the third or fourth follow-up; a student may have reported attendance at more than one school. The first transcript weight was computed for students for whom at least one transcript was obtained. It is therefore possible for a student who was not a respondent in the fourth follow-up (but who was a respondent in the third follow-up) to have a nonzero value for the first transcript weight. The second transcript weight is more restrictive: it was designed to assign weights only to cases deemed to have complete data. Only students who responded during the fourth follow-up (and hence students for whom a complete report of postsecondary attendance was available and for whom all requested transcripts were received) were assigned a nonzero value for the second transcript weight. For students who did not complete the fourth follow-up interview, transcripts obtained in the 1987 transcript study may in fact have been complete, but since their completeness could not be verified, these cases were given a weight of zero.
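
The weighting logic described above (inverse selection probabilities with a nonresponse adjustment inside weighting cells) reduces to a short computation. The sketch below assumes each case carries its stage-wise selection probabilities, a weighting-cell label, and a response indicator; all field names are invented.

```python
from collections import defaultdict

def base_weight(case):
    """Raw weight: inverse of the overall selection probability, i.e.,
    the product of the school- and student-stage probabilities."""
    return 1.0 / (case["p_school"] * case["p_student"])

def nonresponse_adjusted_weights(cases):
    """Within each weighting cell, inflate respondents' base weights by
    (cell total base weight) / (cell respondent base weight), so that
    respondents also carry the weight of nonrespondents in their cell."""
    total, resp = defaultdict(float), defaultdict(float)
    for c in cases:
        w = base_weight(c)
        total[c["cell"]] += w
        if c["responded"]:
            resp[c["cell"]] += w
    return {c["id"]: base_weight(c) * total[c["cell"]] / resp[c["cell"]]
            for c in cases if c["responded"]}
```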

Imputation. No imputation was performed in HS&B.
