
National Household Education Surveys Program (NHES)

5. Data Quality and Comparability

In addition to the data quality activities inherent in the NHES design and survey procedures, activities designed specifically to assess data quality are undertaken for each collection. Reinterviews and analysis of coverage bias are two activities conducted during many survey administrations. Other data quality activities address specific concerns related to a topical survey. Issues of data quality and comparability are discussed below.

Sampling Error

In surveys with complex sample designs, such as NHES, direct estimates of sampling errors that assume a simple random sample will typically underestimate the variability in the estimates. Therefore, to estimate variances accurately, users must employ special calculations. The two major methods of producing approximate standard errors for complex samples are replication methods and Taylor series approximations. Special software is available for both methods, and the NHES data support either type of analysis. (Further information on the use of replication and Taylor series methods is provided in A Guide to Using Data from the National Household Education Survey [Collins and Chandler, 1997].)

Taylor series stratum variables and replicate weights have been included in all of the NHES data files to make this application relatively simple. Various software packages, such as the SAS, R, and Stata survey packages, WesVar, and SUDAAN, can properly apply these weights. For NHES:2016, the estimates and standard errors were produced using the jackknife 1 option as the replication procedure. See the NHES:2016 Data File User’s Manual (McPhee et al., 2018) for more specific information.
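As a rough illustration of the jackknife 1 (JK1) replication idea, the sketch below computes a weighted estimate from a full-sample weight and then re-computes it under each set of replicate weights; the squared deviations yield the variance. All data, weights, and names here are invented for illustration and do not correspond to actual NHES variables.

```python
# Sketch of JK1 variance estimation with replicate weights (illustrative only).

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jk1_variance(values, full_weights, replicate_weights):
    """JK1 variance: ((R - 1) / R) * sum over the R replicates of
    (replicate estimate - full-sample estimate)^2."""
    theta = weighted_mean(values, full_weights)
    R = len(replicate_weights)
    sq_devs = [(weighted_mean(values, rw) - theta) ** 2
               for rw in replicate_weights]
    return (R - 1) / R * sum(sq_devs)

# Toy example: 4 respondents, 2 replicates (hypothetical numbers).
y = [1.0, 0.0, 1.0, 1.0]                      # e.g., a yes/no survey item
w = [10.0, 12.0, 8.0, 10.0]                   # full-sample weights
reps = [[0.0, 16.0, 10.7, 13.3],              # replicate weight sets
        [13.3, 16.0, 10.7, 0.0]]
est = weighted_mean(y, w)
se = jk1_variance(y, w, reps) ** 0.5
```

In practice, analysts would let a survey package (e.g., the R `survey` package or Stata's `svy` with `vce(jackknife)`) apply the NHES replicate weights rather than coding this by hand.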


Nonsampling Error

Sample estimates also are subject to bias from nonsampling errors; however, it is more difficult to measure the magnitude of these errors. They can arise for a variety of reasons: nonresponse; undercoverage; differences in respondents’ interpretations of the meaning of questions; memory effects; misrecording of responses; incorrect editing, coding, and data entry; time effects; or errors in data processing.

Coverage error. Every household survey is subject to some undercoverage bias—the result of some members of the target population being either deliberately or inadvertently missed in the survey. Telephone surveys, such as the NHES administrations prior to 2012, are subject to an additional source of bias because not all households in the United States have telephones. Raking adjustments can reduce such coverage bias, though no adjustment has been found to adequately reduce the bias across all measures that might be affected by coverage issues. Moreover, as coverage bias increases, it becomes more difficult for raking to adjust for it adequately (see, e.g., Montaquila, Brick, & Brock, 1997).
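For readers unfamiliar with raking, the sketch below shows the basic iterative proportional fitting idea: base weights are repeatedly rescaled so that their category totals match known control totals on each raking dimension in turn. The categories and control totals are hypothetical, not actual NHES raking cells.

```python
# Minimal raking (iterative proportional fitting) sketch, illustrative only.

def rake(weights, dims, targets, iters=50):
    """dims: one list of category labels per respondent, per dimension.
       targets: one {category: control total} dict per dimension."""
    w = list(weights)
    for _ in range(iters):
        for labels, target in zip(dims, targets):
            # Current weighted total in each category of this dimension.
            totals = {}
            for wi, lab in zip(w, labels):
                totals[lab] = totals.get(lab, 0.0) + wi
            # Scale every respondent's weight toward the control total.
            w = [wi * target[lab] / totals[lab]
                 for wi, lab in zip(w, labels)]
    return w

# Toy example: 4 respondents, raked to two made-up dimensions.
dims = [["a", "a", "b", "b"],     # e.g., region
        ["x", "y", "x", "y"]]     # e.g., household tenure
targets = [{"a": 6.0, "b": 4.0}, {"x": 5.0, "y": 5.0}]
raked = rake([1.0, 1.0, 1.0, 1.0], dims, targets)
```

After raking, the weighted totals within each dimension match the control totals; what raking cannot do is correct for differences on characteristics not included in the raking dimensions, which is why severe coverage bias remains difficult to remove.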

After NHES:2007, decreasing response rates and concerns regarding noncoverage of households without a landline telephone required NCES to redesign NHES. This redesign involved changing the sampling frame from a list-assisted random digit dial (RDD) frame to an address-based sample (ABS) frame. The NHES sample contains all types of residential addresses in order to ensure the best possible coverage of households in the United States. Addresses include street and city-style addresses, high rises, rural routes, PO boxes, and addresses flagged as seasonal, vacant, drop points (a single postal delivery point for multiple housing units), PO box throwbacks (a street address where the mail is delivered to a customer’s PO box), and educational addresses (addresses identified as an educational facility, such as colleges, universities, dormitories, and apartment buildings occupied by students). The mode of data collection also changed from an interviewer-led telephone interview to a self-administered paper-and-pencil questionnaire mailed to respondents. Raking of the person-level weights was still required in order to align the person-level weights with the person-level control totals and to adjust for differential coverage rates at the person level.

Nonresponse error. Nonresponse in NHES surveys is handled in ways designed to minimize the impact on data quality—through weighting adjustments for unit nonresponse and through imputation for item nonresponse.

Unit nonresponse. Household members are identified for extended interviews in a two-stage process. First, screener interviews are conducted to enumerate and sample households for the extended interviews. The failure to complete the first-stage screener means that it is not possible to enumerate and interview members of the household. The completion rate for the first stage is the percentage of screeners completed by households. The completion rate for the second stage is the percentage of sampled and eligible persons with completed interviews. The survey response rate is the product of the first- and second-stage completion rates (screener completion rate × interview completion rate = survey response rate). All of the rates are weighted by the inverse of the units’ probability of selection (see table NHES-1).
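The overall rate can be reproduced directly from the stage-level rates in table NHES-1; for example, for ECPP-NHES:2016:

```python
# Overall response rate as the product of the two stage-level completion
# rates, using the ECPP-NHES:2016 figures from table NHES-1 (percentages).
screener_rate = 66.4    # first stage: screener completion
interview_rate = 73.4   # second stage: topical interview completion
overall = screener_rate * interview_rate / 100  # keep result in percent
# overall is about 48.7 percent, matching the table
```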

NHES:2016 sampling frame variables were used for the unit nonresponse bias analysis for the screener and topical surveys. Analysis of unit nonresponse bias showed evidence of bias based on the distributions of the sample characteristics for the survey respondents when compared to the full eligible sample. However, this bias was greatly reduced by the nonresponse weighting adjustments. See the “Bias study” section below for further discussion, as well as the NHES:2016 Data File User’s Manual (McPhee et al., 2018).

Item nonresponse. For most of the items collected in the NHES surveys, the item response rate is high. For example, for the ECPP and PFI surveys in NHES:2012, the median item response rates were 96.4 percent and 97.9 percent, respectively.

Measurement error. In order to assess item reliability and inform future NHES surveys, many administrations also included a subsample of respondents for a reinterview. Reinterviews were conducted for ECE-NHES:1991; both SR-NHES:1993 and SS&D-NHES:1993; AE-NHES:1995; both PFI-NHES:1996 and YCI-NHES:1996; and ASPA-NHES:2001, AEWR-NHES:2003, and AE-NHES:2005.

In a reinterview, the respondent is asked to respond to the same items on different occasions. In order to limit the response burden of the reinterview program, only selected items are included in the reinterview. The item selection criteria focus on the inclusion of key survey statistics (e.g., frequency of reading to children), items that are expected to have a potential for measurement error based on cognitive laboratory or field-test findings, and items required to control the question skip patterns for the reinterview. The results of the reinterviews are used to modify subsequent NHES surveys and to give some guidance to users about the reliability of responses for specific items in the data files (see, e.g., Use of Cognitive Laboratories and Recorded Interviews in the National Household Education Survey [Nolin, 1997]). However, the reinterview procedure does not account for all measurement errors in the interviewing process, such as systematic errors that would be made in both the original interview and the reinterview.

Bias study. NHES:2016 included a bias analysis to evaluate whether nonresponse at the unit and item levels affected the estimates. The term “bias” has a specific technical definition in this context: it is the expected difference between the estimate from the survey and the actual population value. For example, if all households were included in the survey (i.e., if a census was conducted rather than a sample survey), the difference between the estimate from the survey and the actual population value (which includes persons who did not respond to the survey) would be the bias due to unit nonresponse. Since NHES is based on a sample, the bias is defined as the expected or average value of this difference over all possible samples. Unit nonresponse bias, or the bias due to the failure of some persons or households in the sample to respond to the survey, can be substantial if either the difference between respondents and nonrespondents or the unit nonresponse rate is relatively large. The bias study comprised several sets of analyses, described below.
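Under a simple deterministic view of nonresponse, the unit nonresponse bias of a respondent-based mean is approximately the nonresponse rate times the difference between the respondent and nonrespondent means, which is why the paragraph above notes that either a large difference or a high nonresponse rate can make the bias substantial. The numbers below are hypothetical:

```python
# Worked illustration of the unit nonresponse bias formula:
#   bias ~= nonresponse_rate * (respondent mean - nonrespondent mean)
# All values are hypothetical, not NHES estimates.
resp_mean = 0.80       # e.g., share of respondent households with some trait
nonresp_mean = 0.60    # same share among nonrespondents (unobserved in practice)
nonresponse_rate = 0.35

bias = nonresponse_rate * (resp_mean - nonresp_mean)
# bias is 0.07: the respondent-based estimate of 0.80 overstates the
# full-population value of 0.73 by 7 percentage points
```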

At the screener phase, significant differences were observed between respondents and the eligible sample in the distributions of characteristics available in or linked to the sample frame. Similarly, for each topical survey, significant differences were observed between respondents and the eligible sample in the distributions of characteristics available in or linked to the sample frame or collected on the screener. However, this observed bias was greatly reduced by the nonresponse weighting adjustments.

In another set of analyses, base-weighted key survey estimates for each topical survey were compared between (1) early and late screener respondents to assess the potential for bias resulting from screener-level nonresponse and (2) early and late topical respondents to assess the potential for bias resulting from topical-level nonresponse. To the extent that late respondents resemble nonrespondents in the characteristics measured by the NHES survey instruments, differences between early and late respondents suggest a potential for unit nonresponse bias in the estimates.

In another set of bias analyses, key survey estimates using the base weights were compared with key estimates using the nonresponse-adjusted weights. Only a small number of measurable differences were observed, suggesting that few of these variables were powerful predictors of unit response and, therefore, that the unit nonresponse adjustment had little effect on any potential bias. However, it is possible that little bias needed to be removed; it also is possible that unit nonresponse bias may still be present in other variables that were not studied. For this reason, it is important to consider other methods of examining unit nonresponse bias. One such method is benchmarking, or comparing final NHES survey estimates to estimates from external sources. When estimates from the NHES:2016 surveys were compared with external estimates—from the CPS, the ACS, and previous administrations of NHES—some measurable differences were found. However, the majority of the differences were between estimates from NHES:2016 and the previous NHES administration, conducted 4 years earlier; these differences therefore likely reflect changes in the population over time.

Data Comparability

Due to declining response rates for all telephone surveys, and due to the increase in the number of households that use cellphones instead of landlines, the 2012 data collection method was changed to a mail survey. As a result, readers should use caution when comparing estimates to prior NHES administrations. However, the NHES data can be compared with estimates from several other large-scale data collections, as described in the “Comparisons of topical data” section below.

Comparisons of methodology. For analysts wanting to compare the NHES surveys with another household survey, the Survey of Income and Program Participation (SIPP)—a longitudinal household survey conducted by the U.S. Bureau of the Census—provides an appropriate comparison. The first wave of data collection in SIPP is always done by personal visit to the household. Subsequent data collection is conducted primarily by telephone but may also be done in person. The response rates for SIPP are much higher than those that could be expected using an RDD screening sample, as in the NHES program. With personal interviews, there are more opportunities to obtain participation (including activities such as speaking with neighbors), and it is easier to demonstrate the importance of the sampled person’s cooperation. It should be noted that, while the difference in response rates is largely the result of the different modes of sampling and data collection, the Census Bureau’s response rates are generally higher than those achieved by other collection organizations.

Comparisons of topical data. Specific data from NHES surveys can be compared with data from several other surveys, as described below. Please note that after the 2007 collection, NHES was redesigned to use an address-based sample and self-administered paper and pencil surveys delivered and returned through the mail. The mode change required revisions to item wording and may affect the comparability of estimates from NHES data from 1991-2007 to those from NHES:2012 onward.

Early childhood program participation. Over the years, several NHES surveys have collected similar information on young children’s care and education. Estimates from ECPP-NHES:2016 can be compared with previous NHES surveys: ECPP-NHES:2012, SR-NHES:2007, ECPP-NHES:2005, ECPP-NHES:2001, ECPP-NHES:1995, ECE-NHES:1991, and SR-NHES:1993. Please note that surveys prior to 2012 required revisions to item wording, which may affect comparability. Estimates from ECPP-NHES:2016 can also be compared with data from CPS, SIPP, and the ECLS program. The CPS October Education Supplement collects information on nursery school enrollment. (See Current Population Survey chapter.) CPS estimates of participation in early childhood programs and estimates of retention in early grades can be compared with NHES:2016 estimates. Additionally, SIPP (described above) periodically includes a supplement that collects information on the child care and early childhood program participation of children of mothers who are employed or enrolled in school or job training, which is comparable with NHES data. Finally, the ECLS program (see the Early Childhood Longitudinal Study chapters) provides data to study a wide range of family, school, community, and individual variables and their relationship to children’s development, early learning, and early performance in school.

Adult training and education. ATES-NHES:2016 is a new survey that collected information on educational attainment; the prevalence and characteristics of certifications and licenses and their holders; the prevalence and characteristics of educational certificates and certificate holders; and the completion and key characteristics of work experience programs, such as apprenticeships and internships. ATES-NHES:2016 data cannot be compared to prior NHES data because ATES-NHES:2016 consists of new questions that focus on “adult training,” whereas previous NHES surveys focused on “adult education.” Estimates from ATES-NHES:2016 may be comparable with NAAL (see National Assessment of Adult Literacy), which collects information on English literacy among American adults age 16 and older.

Parent and family involvement in education. Estimates from PFI-NHES:2016 can be compared to previous NHES surveys: PFI-NHES:2012, PFI-NHES:2007, PFI-NHES:2003, and PFI-NHES:1996. Please note that surveys prior to 2012 required revisions to item wording, which may affect comparability. Estimates from PFI-NHES:2016 may also be comparable with ELS:2002 and HSLS:09 (see the Education Longitudinal Study of 2002 and the High School Longitudinal Study of 2009 chapters). ELS:2002 obtains information not just from students and their school records, but also from students’ parents. HSLS:09 also obtains information on parent involvement of students from the beginning of high school into postsecondary education.

Table NHES-1. Weighted response rates (percent) for selected NHES surveys: 2012 and 2016
Questionnaire	Screener/1st stage	Interview/2nd stage	Overall
ECPP-NHES:2012 73.8 78.7 58.1
PFI-NHES:2012 73.8 78.4 57.8
ATES-NHES:2016 66.4 73.1 48.5
ECPP-NHES:2016 66.4 73.4 48.7
PFI-NHES:2016 66.4 74.3 49.3
SOURCE: NHES methodology reports; available at