
National Education Longitudinal Study of 1988 (NELS:88)



5. Data Quality and Comparability

A number of studies have been conducted to address data quality issues relating to the NELS:88 project. During the course of data collection and processing, systematic efforts were made to monitor, assess, and maximize data quality. Subsequently, studies were conducted to evaluate the data quality in NELS:88 in comparison with that in earlier longitudinal surveys.

Sampling Error

Because the NELS:88 sample design involved stratification, disproportionate sampling of certain strata, and clustered (i.e., multistage) probability sampling, the calculation of exact standard errors (an indication of sampling error) for survey estimates can be difficult and expensive. For NELS:88, the Taylor series procedure has typically been used to calculate the standard errors.

Standard errors and design effects for about 30 key variables in each NELS:88 wave from the base year through the fourth follow-up were calculated using SUDAAN software. These published design effects can be used to approximate standard errors when users do not have access to specialized variance estimation software.
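
For example, a design-based standard error can be approximated by inflating the simple random sample standard error by the square root of the relevant design effect (DEFF). The Python sketch below illustrates this approximation; the proportion and sample size are illustrative, and only the mean base-year design effect of 2.54 comes from the text below.

```python
import math

def approximate_design_se(p: float, n: int, deff: float) -> float:
    """Approximate the standard error of an estimated proportion under a
    complex sample design by inflating the simple-random-sample standard
    error by the square root of the design effect (DEFF)."""
    se_srs = math.sqrt(p * (1 - p) / n)   # standard error under simple random sampling
    return math.sqrt(deff) * se_srs       # design-corrected standard error

# Illustrative values: a proportion of 0.40 estimated from 10,000 sample
# members, using the mean NELS:88 base-year design effect of 2.54.
se = approximate_design_se(p=0.40, n=10_000, deff=2.54)
print(f"Approximate design-corrected standard error: {se:.4f}")
```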

Design effects. A comparative study of design effects across NELS:88 waves and between NELS:88 and HS&B was conducted. When NELS:88 base-year Student Questionnaire data were compared with HS&B results (the 30 variables from the NELS:88 Student Questionnaire were selected to overlap as much as possible with the variables examined in HS&B), the design effects indicated that the NELS:88 sample was slightly more efficient than the HS&B sample. The smaller design effects in the NELS:88 base year may reflect its smaller cluster size (24 students plus, on average, two oversampled Hispanic and Asian students from each NELS:88 school vs. the 36 sophomore and 36 senior selections from each HS&B school). The mean design effect for base-year students is 2.54.

In the comparative study of design effects across NELS:88 waves, the design effects in the subsequent follow‑up studies were somewhat higher than those in the base year, a result of the subsampling procedures used in the follow‑ups. The mean design effects for students and dropouts are 3.90 for the first follow‑up, 3.70 for the second follow‑up, 2.90 for the third follow‑up, and 3.90 for the fourth follow‑up. See the NELS:88 Base Year Through Second Follow‑up Final Methodology Report (Ingels et al. 1998) and the User’s Manual: NELS:88 Base‑Year to Fourth Follow‑up: Student Component Data File (Curtin et al. 2002).  

Nonsampling Error

Coverage error. The exclusion and undercoverage of certain groups of schools and students in NELS:88 generated coverage error. In the base-year survey, for example, students with linguistic, mental, or physical barriers to participation were excluded from the study. Consequently, the national populations for such student groups were not fully covered by the sample.

To correct this coverage bias, the Base-Year Ineligible (BYI) Study collected eligibility information for 93.9 percent of the sample members excluded in the base-year survey. For those who were reclassified as eligible in the BYI Study, Student or Dropout Questionnaires were administered in person or over the telephone during the first follow-up. Cognitive tests were also administered to a small percentage of these students. For students who remained ineligible, school enrollment status and other key characteristics were obtained. The BYI Study permitted an evaluation of coverage bias in NELS:88 and provided a means of reducing undercoverage by identifying newly eligible students who could then be added to the sample to ensure cross-sectional representativeness. This effort also provided a basis for corrected dropout estimates that take into account both 1988-eligible and 1988-ineligible 8th-graders 2 years later. For further detail on the BYI Study, see Sample Exclusion in NELS:88: Characteristics of Base Year Ineligible Students; Changes in Eligibility Status After Four Years (Ingels 1996).

Nonresponse error. Both unit nonresponse (nonparticipation in the survey by a sample member) and item nonresponse (missing value for a given questionnaire/test item) have been evaluated in NELS:88 data.

Unit nonresponse. In the NELS:88 base-year survey, the initial school response rate was 69 percent. This low rate prompted a follow-up survey to collect basic characteristics from a sample of the nonparticipating schools. These data were then compared with the same characteristics among the participating schools to assess the possible impact of response bias on the survey estimates. To the extent that schools could be characterized by size, control, organizational structure, student composition, and other factors, the school-level nonresponse bias was found to be small. Bias at the school level was not assessed for the follow-up surveys because (1) sampling for the first and second follow-ups was student-driven (i.e., the schools were identified by following student sample members) and the third and fourth follow-ups did not involve schools; and (2) school cooperation rates were very high (up to 99 percent). Even when a school refused to cooperate, individual students were pursued outside of school (although school context data were not collected). The student response rates are shown in table 5.

Student-level nonresponse analysis was conducted with a focus on panel nonresponse, since a priority of the NELS:88 project is to provide a basis for longitudinal analysis. Nonresponse was examined for the 8th-grade and 10th-grade cohorts. Any member of the 8th-grade cohort who failed to complete a survey in any of three rounds (base year, first follow-up, and second follow-up), and any member of the 10th-grade cohort who failed to complete a survey in either the second or third round (first and second follow-ups), was considered a panel nonrespondent for that cohort. Panel nonresponse to cognitive tests in the two cohorts was defined the same way. The nonresponse rate was defined as the proportion of the selected students (excluding deceased students) who were nonrespondents in any round in which data were collected.

Nonresponse rates for both cohorts were calculated by school‑ and student‑level variables that were assumed to be stable across survey waves (e.g., sex and race). These variables allowed comparisons between participants and nonparticipants even though the data for the latter were missing in some rounds. Estimates were made with both weighted and unweighted data. The weight used was the second follow‑up raw panel weight (not available in the public‑release dataset). About 18 percent of the 8th‑grade cohort and 10 percent of the 10th‑grade cohort were survey nonrespondents at one or more points in time. Approximately 43 percent of the 8th‑grade cohort and 35 percent of the 10th‑grade cohort did not complete one or more cognitive tests in their rounds of testing.

Nonresponse bias was calculated as the difference between estimates based on the respondents and estimates based on all selected students. On the whole, the analysis revealed only small discrepancies between the two cohorts. Bias estimates were higher, however, for the 8th-grade cohort than for the 10th-grade cohort because of the 8th-grade cohort's more stringent definition of participation. The discrepancies between cognitive test completers and noncompleters were larger than those between survey participants and nonparticipants; this pattern held for both cohorts. In brief, the magnitude of the bias was generally small: few percentage estimates were off by as much as 2 percent in the 8th-grade cohort or 1 percent in the 10th-grade cohort. These bias estimates reflect the raw weight; the nonresponse-adjusted weight should correct for differences by race and sex and produce correct population estimates for each subgroup.
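
As a concrete illustration of this calculation, the sketch below computes a weighted estimate for panel respondents and for all selected students and takes the difference. The data, weights, and response indicator are entirely hypothetical; only the definition of the bias follows the description above.

```python
import numpy as np

def weighted_mean(values: np.ndarray, weights: np.ndarray) -> float:
    """Weighted mean of a numeric variable (e.g., a 0/1 indicator)."""
    return float(np.sum(values * weights) / np.sum(weights))

# Hypothetical data: one record per selected student (deceased students excluded).
# `y` is the characteristic of interest (e.g., an indicator for attending a
# public school), `w` is a raw panel weight, and `respondent` flags panel
# respondents.
rng = np.random.default_rng(0)
n = 1_000
y = rng.integers(0, 2, size=n).astype(float)
w = rng.uniform(0.5, 3.0, size=n)
respondent = rng.random(n) > 0.18        # roughly 18 percent panel nonresponse

# Nonresponse bias: the respondent-based estimate minus the estimate based on
# all selected students.
bias = weighted_mean(y[respondent], w[respondent]) - weighted_mean(y, w)
print(f"Estimated nonresponse bias: {bias:.4f}")
```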

Further analysis was done using several other student and school variables. The results showed rather similar patterns of bias. When compared with estimates from HS&B, the student nonresponse bias estimates in NELS:88 were consistently lower. However, the two studies seem to share certain common patterns of nonresponse. For example, both studies generated comparatively higher nonresponse rates among students enrolled in schools in the West, Black students, students in vocational or technical programs, students in the lowest test quartile, and dropouts.

Item nonresponse. Item nonresponse was examined in base‑year through second follow‑up data obtained from surveys of students, parents, and teachers. Differences emerged among student subgroups in the level of nonresponse to a wide range of items—from language background, family composition, and parents’ education to perception of school safety. Nonresponse was often two to five times as great for one subgroup as for the other subgroups. High item nonresponse rates were associated with such attributes as not living with parents, having low SES, being male, having poor reading skills, and being enrolled in a public school. Compared with parent nonresponse to items about college choice and occupational expectations, student nonresponse rates were generally lower. For items about student’s language proficiency, classroom practices, and student’s high school track, students had consistently lower nonresponse rates than their teachers did. See the NELS:88 Survey Item Evaluation Report (McLaughlin, Cohen, and Lee 1997) for further detail.


Measurement error. NCES has conducted studies to evaluate measurement error in (1) student data (compared to parent and teacher data); and (2) student cognitive test data.

Parent‑student convergence and teacher‑student convergence. A study of measurement error in data from the base‑year through second follow‑up surveys focused on the convergence of responses by parents and students and by teachers and students. (See the NELS:88 Survey Item Evaluation Report [McLaughlin, Cohen, and Lee 1997].) Response convergence (or discrepancy) across respondent groups can be interpreted as an indication of measurement reliability, validity, and communality, although the data are often not sufficient to determine which response is more accurate.

The student and parent components of this study covered such variables as number of siblings, the student’s work experience, language background, parents’ education, parent‑student discussion of issues, perceptions about school, and college and occupation expectations. Parent‑student convergence varied from very high to very low, depending on the item. For example, convergence was high for number of siblings, regardless of student‑level characteristics such as SES, sex, reading scores, public versus private school enrollment, and whether or not living with parents. In contrast, parent‑student convergence was low for items related to the student’s work experience; there was also more variation across student subgroups for these items. In general, convergence tended to be high for objective items, for items worded similarly, and for nonsensitive items.

Teacher-student convergence was examined through variables about student's English proficiency, classroom practices, and student's high school track. Again, convergence was found to vary considerably across data items and student subgroups. Convergence was high for student's native language but low for student's English proficiency. Across student subgroups, there was a greater range in correlations for English proficiency than for native language. Teachers and students differed quite dramatically on items about classroom practices.
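
Convergence of this kind is commonly summarized with simple agreement rates or correlations between the paired responses. The sketch below shows one such summary for hypothetical paired student and parent reports; it illustrates the general approach only and is not the procedure used in the NELS:88 item evaluation report.

```python
import numpy as np

def convergence_summary(responses_a: np.ndarray, responses_b: np.ndarray) -> dict:
    """Summarize convergence between two respondent groups' answers to the
    same item: exact agreement rate and Pearson correlation."""
    agreement = float(np.mean(responses_a == responses_b))
    correlation = float(np.corrcoef(responses_a, responses_b)[0, 1])
    return {"agreement": agreement, "correlation": correlation}

# Hypothetical paired reports of the number of siblings from the same set of
# sample members: one report from the student, one from the parent.
student_reports = np.array([2, 1, 3, 0, 2, 4, 1, 2])
parent_reports = np.array([2, 1, 3, 1, 2, 4, 1, 3])
print(convergence_summary(student_reports, parent_reports))
```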

Cognitive test data. In‑depth studies of measurement error issues related to cognitive tests administered in the base‑year through second follow‑up surveys are also available. See the Psychometric Report for the NELS:88 Base Year Test Battery (Rock and Pollack 1991) and the Psychometric Report for the NELS:88 Base Year Through Second Follow‑up (Rock and Pollack 1995).

The first study (Rock and Pollack 1991) addressed issues related to test speededness (whether the limited testing time constrained performance), reliability, item statistics, performance by racial/ethnic and gender groups, and IRT parameters for the battery. The results indicate that the test battery either met or exceeded all of its psychometric objectives. Specifically, the study reported: (1) while the allotted testing time was only 1½ hours, quite acceptable reliability was obtained for the tests on reading comprehension, mathematics, history/citizenship/geography, and, to a somewhat lesser extent, science; (2) the internal consistency reliability was sufficiently high to justify the use of IRT scoring and, thus, provide the framework for constructing 10th- and 12th-grade forms that would be adaptive to the ability levels of the students; (3) there was no consistent evidence of differential item functioning (item bias) for gender or racial/ethnic groups; (4) factor analysis results supported the discriminant validity of the four tested content areas, and convergent validity was indicated by salient loadings of testlets composed of "marker items" on their hypothesized factors; and (5) in addition to the usual normative scores in all four tested areas, behaviorally anchored proficiency scores were provided in both the reading and math areas.

The second study (Rock and Pollack 1995) focused on issues relating to the measurement of gain scores. Special procedures were built into the test battery design and administration to minimize the floor and ceiling effects that typically distort gain scores. The battery used a two-stage multilevel procedure that attempted to tailor the difficulty of the test items to the performance level of a particular student. Thus, students who performed very well on their 8th-grade mathematics test received a relatively more difficult form in 10th grade than students who had not performed well on their 8th-grade test. There were three forms of varying difficulty in mathematics and two in reading in both grades 10 and 12. Since 10th- and 12th-graders were taking forms that were more appropriate for their level of ability and achievement, measurement accuracy was enhanced and floor and ceiling effects could be minimized. The remaining two content areas—science and history/citizenship/geography—were only designed to be grade-level adaptive (i.e., a different form for each grade but not multiple forms varying in difficulty within grade).
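
Conceptually, the routing step amounts to a mapping from first-stage performance to a second-stage form, as in the hypothetical Python sketch below; the cut points are illustrative and are not the actual NELS:88 routing thresholds.

```python
def assign_second_stage_form(first_stage_proportion_correct: float,
                             low_cut: float = 0.33,
                             high_cut: float = 0.67) -> str:
    """Route a student to a 10th-grade mathematics form based on 8th-grade
    performance. The cut points are hypothetical; NELS:88 used its own
    routing rules to place students into one of three math forms of
    varying difficulty."""
    if first_stage_proportion_correct < low_cut:
        return "low-difficulty form"
    if first_stage_proportion_correct < high_cut:
        return "middle-difficulty form"
    return "high-difficulty form"

# Example: a student who answered 80 percent of the 8th-grade math items
# correctly would be routed to the hardest 10th-grade form.
print(assign_second_stage_form(0.80))
```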

To maximize the gain from using an adaptive procedure, special vertical scaling procedures were used that allow for Bayesian priors on subpopulations for both item parameters and scale scores. In comparing more traditional non‑Bayesian approaches to scaling longitudinal measures with the Bayesian approach, it was found that the multilevel approach did increase the accuracy of the measurement. Furthermore, when used in combination with the Bayesian item parameter estimation, the multilevel approach reduced floor and ceiling effects when compared to the more traditional IRT approaches.
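
For reference, IRT scoring of this kind rests on an item response function such as the three-parameter logistic (3PL) model. The minimal sketch below shows only that function, with made-up parameters; it does not reproduce the Bayesian priors, multilevel forms, or vertical scaling procedures described above.

```python
import math

def three_pl_probability(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL IRT model.

    theta: examinee ability on the latent scale
    a: item discrimination; b: item difficulty; c: lower asymptote (guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Illustrative (not NELS:88) item parameters: a relatively able student
# (theta = 1.0) responding to a moderately difficult item.
print(f"{three_pl_probability(theta=1.0, a=1.2, b=0.5, c=0.2):.3f}")
```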


Data Comparability

NELS:88 is designed to facilitate both longitudinal and trend analyses. Longitudinal analysis calls for data compatibility across survey waves, whereas trend analysis requires data compatibility with other longitudinal surveys. Data compatibility issues may relate to survey instruments, sample design, and data collection methods.

Comparability within NELS:88 across Survey Waves. A large number of variables are common across survey waves. (See the NELS:88 Second Follow‑up Student Component Data File User’s Manual [Ingels et al. 1994] for a listing of common Student Questionnaire variables in the base year, first follow‑up, and second follow‑up.) However, compatibility of NELS:88 data across waves can still be an issue because of subtle differences in question wording, sample differences (e.g., with or without dropouts and freshening students, sample attrition, nonresponse), and data collection methods (e.g., on‑campus group session, off‑campus individual survey, telephone interview).

One NCES study compared 112 pairs of variables repeated from the base year to the first and second follow-up surveys. (See the NELS:88 Survey Item Evaluation Report [McLaughlin, Cohen, and Lee 1997].) These variables cover students' families, attitudes, education plans, and perceptions about school. The results suggest that the interpretation of NELS:88 items depends on the age level at which they were administered. Data convergence tended to be higher for pairs of first and second follow-up measures than for pairs of base-year and second follow-up measures. Some measures were more stable than others. Students responded nearly identically to the base-year and second follow-up questions about whether English was their native language. Their responses across survey waves were also fairly stable as to whether their curriculum was intended to prepare them for college, whether they planned to go to college, and their religiosity. It should be noted that cross-wave discrepancies may reflect a change in actual student behavior rather than inconsistent reporting of an unchanged situation.

Comparability within NELS:88 across Respondent Groups. While different questionnaires were used to collect data from different respondent groups (students, parents, teachers, school administrators), there are overlapping items among these instruments. One study examined the extent to which the identical or similar items in different questionnaires generated compatible information. It found considerable discrepancies between students and parents, and even greater discrepancies between students and teachers, in their responses to selected groups of overlapping variables. (See “Measurement error” above.)


Comparability with NLS:72, HS&B, and ELS:2002. NELS:88 surveys contain many items that are also covered in NLS:72, HS&B, and ELS:2002—a feature that enables trend analyses of various designs. (See the NELS:88 Second Follow‑up Student Component Data File User’s Manual [Ingels et al. 1994] for a cross‑walk of common variables and a discussion of trend analyses.) To examine data compatibility across the four studies, one should consider their sample designs and data contents, including questionnaires, cognitive tests, and transcript records.

Sample designs for the four studies are similar. In each base year, students were selected through a two-stage stratified probability sample, with schools as the first‑stage units and students within schools as the second‑stage units. In NLS:72, all baseline sample members were spring term 1972 high school seniors. In HS&B, all members of the student sample were spring term 1980 sophomores or seniors. In ELS:2002, the base‑year sample students were 10th‑graders. Because NELS:88 base‑year sample members were 8th‑graders in 1988, its follow‑ups encompass students (both in the modal grade progression sequence and out of sequence) and dropouts. Sample freshening was used in NELS:88 to provide cross‑sectional nationally representative samples. Despite similarities, however, the sample designs of the four studies differ in three major ways: (1) the NELS:88 first and second follow‑ups had relatively variable, small, and unrepresentative within‑school student samples, compared to the relatively uniform, large, and representative within-school student samples in NLS:72 and HS&B; (2) unlike the two earlier studies, NELS:88 did not provide a nationally representative school sample in its follow‑ups; and (3) there were differences in school and subgroup sampling and oversampling strategies in the four studies. These sample differences imply differences in the respondent populations covered by the four studies.
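
The sketch below illustrates the general two-stage logic (schools selected first, then students within selected schools) with a toy sampling frame; it omits the stratification, unequal selection probabilities, and subgroup oversampling that the four studies actually used.

```python
import random

def two_stage_sample(school_frame: dict, n_schools: int,
                     students_per_school: int, seed: int = 0) -> dict:
    """Hypothetical two-stage selection: sample schools first, then sample
    students within each selected school. This shows the general design
    only; it omits stratification and unequal selection probabilities."""
    rng = random.Random(seed)
    selected_schools = rng.sample(sorted(school_frame), n_schools)
    return {
        school: rng.sample(school_frame[school],
                           min(students_per_school, len(school_frame[school])))
        for school in selected_schools
    }

# Toy sampling frame: three schools and their enrolled students.
frame = {
    "School A": [f"A{i}" for i in range(1, 61)],
    "School B": [f"B{i}" for i in range(1, 41)],
    "School C": [f"C{i}" for i in range(1, 81)],
}
print(two_stage_sample(frame, n_schools=2, students_per_school=24))
```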

Questionnaire overlap is apparent among the four studies; nevertheless, caution is required when making trend comparisons. Some items were repeated in identical form across the studies; others appear to be essentially similar but have small differences in wording or response categories.

IRT scaling was used in the four studies to put math, vocabulary, and reading test scores on the same scale for 1972, 1980, 1982, and 1992 seniors. Additionally, common items in the HS&B and NELS:88 math tests provide a basis for equating the 1980–1990 and 1982–1992 math results, and common items in the NELS:88 and ELS:2002 reading and math tests provide the link for placing ELS:2002 student ability estimates on the NELS:88 ability scale. In general, however, the tests in the four studies differed in many ways. Although group differences in standard deviation units may profitably be examined, caution should be exercised in drawing time-lag comparisons for cognitive test data.
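
Common-item linking of this kind is often carried out with procedures such as mean/sigma linking of IRT difficulty parameters. The sketch below illustrates that general technique with hypothetical parameter estimates; it is not the equating procedure documented for HS&B, NELS:88, or ELS:2002.

```python
import numpy as np

def mean_sigma_link(b_reference: np.ndarray, b_target: np.ndarray) -> tuple:
    """Mean/sigma linking constants from common-item difficulty estimates.

    b_reference, b_target: IRT difficulty estimates for the same (common)
    items on the reference and target scales. Returns (A, B) such that
    theta_reference = A * theta_target + B.
    """
    A = float(np.std(b_reference, ddof=1) / np.std(b_target, ddof=1))
    B = float(np.mean(b_reference) - A * np.mean(b_target))
    return A, B

# Hypothetical common-item difficulties from two separately calibrated tests.
b_reference = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_target = np.array([-1.0, -0.3, 0.2, 0.9, 1.7])
A, B = mean_sigma_link(b_reference, b_target)
print(f"Linking constants: A = {A:.3f}, B = {B:.3f}")
```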

Transcript studies in NELS:88, HS&B, ELS:2002, and NAEP were designed to support cross-cohort comparisons. The ELS:2002, NAEP, and NELS:88 studies, however, provide summary data in Carnegie units, whereas HS&B provides course totals. Note too that course offerings were only collected from schools that were part of the High School Effectiveness Study in the NELS:88 second follow-up, whereas course offerings were collected from all schools in HS&B (see HS&B chapter), and course offerings were collected from all base-year schools and the last school attended by sample members who transferred out of their base-year school in ELS:2002 (see ELS chapter).

Other factors should also be considered in assessing data compatibility. Differences in mode and time of survey administration across the cohorts may affect compatibility. NELS:88 seniors were generally surveyed earlier in the school year than were NLS:72 seniors. NLS:72 survey forms were administered by school personnel while HS&B and NELS:88 survey forms were administered primarily by contractor staff. There were also differences in questionnaire formats; the later tests had improved mapping and different answer sheets.

Table 5. Unit-level and overall weighted response rates for selected NELS:88 student populations, by data collection wave

Unit-level weighted response rate

Population              Base-year      Base-year      1st          2nd          3rd          4th
                        school level   student level  follow-up    follow-up    follow-up    follow-up
Interviewed students    69.7¹          93.4           91.1         91.0         90.9         82.1
Tested students         69.7¹          96.5           94.1         76.6         †            †
Dropouts                69.7¹          †              91.0         88.0         †            †
Tested dropouts         69.7¹          †              48.6         41.7         †            †

Overall weighted response rate

Population              Base-year      Base-year      1st          2nd          3rd          4th
                        school level   student level  follow-up    follow-up    follow-up    follow-up
Interviewed students    69.7¹          65.1           63.5         63.4         63.4         57.2
Tested students         69.7¹          67.3           65.6         53.4         †            †
Dropouts                69.7¹          †              63.4         61.3         †            †
Tested dropouts         69.7¹          †              33.9         29.1         †            †

† Not applicable.
¹ Unweighted response rate.
SOURCE: Curtin, T.R., Ingels, S.J., Wu, S., and Heuer, R. (2002). User's Manual: NELS:88 Base-Year to Fourth Follow-up: Student Component Data File (NCES 2002-323). National Center for Education Statistics, U.S. Department of Education. Washington, DC. Spencer, B.D., Frankel, M.R., Ingels, S.J., Rasinski, K.A., and Tourangeau, R. (1990). NELS:88 Base-Year Sample Design Report (NCES 90-463). National Center for Education Statistics, U.S. Department of Education. Washington, DC.
