
Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K)



5. DATA QUALITY AND COMPARABILITY

Sampling Errors and Weighting

The sample of children enrolled in kindergarten in the United States in 1998–99 selected for the ECLS–K is just one of many possible samples that could have been selected. Therefore, estimates produced from the ECLS–K sample may differ from estimates that would have been produced from other samples. This type of variability is called sampling error because it results from collecting data on a sample of children, rather than on all children enrolled in kindergarten in the United States in 1998–99. The standard error is a measure of the variability due to sampling when estimating a statistic; standard errors can be used as a measure of the precision expected from a particular sample.
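For example, under standard large-sample assumptions (a general construction, not specific to the ECLS–K), an approximate 95 percent confidence interval for an estimate uses the standard error directly:

$$\hat{\theta} \pm 1.96 \times SE(\hat{\theta})$$

so a smaller standard error yields a tighter interval and a more precise estimate.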

For a complex sample design such as the one employed in the ECLS–K, replication and Taylor series methods have been developed to estimate variances correctly. These methods take into account the clustered, multistage sampling design and the use of differential sampling rates to oversample targeted subpopulations, and either can be used to analyze the ECLS–K data accurately. The paired jackknife replication method uses replicate weights to compute approximately unbiased estimates of the standard errors of the estimates. When using the Taylor series method, a different set of stratum and first-stage unit (i.e., primary sampling unit, or PSU) identifiers should be used for each set of weights. Both the replicate weights and the Taylor series identifiers are provided as part of the ECLS–K data files.
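As an illustration, the paired jackknife (JK2) computation with replicate weights can be sketched in a few lines of Python. The column names below ("W0" for the full-sample weight, "W1" through "W90" for the replicates) are hypothetical placeholders; the actual ECLS–K weight variable names and number of replicates differ by round and weight type.

```python
import numpy as np
import pandas as pd

def jk2_variance(df: pd.DataFrame, y: str, main_w: str, rep_ws: list) -> tuple:
    """Paired-jackknife (JK2) standard error of a weighted mean.

    Each replicate weight re-estimates the statistic with one PSU of a
    pair dropped and its partner doubled; the JK2 variance is the sum of
    squared deviations of the replicate estimates from the full-sample
    estimate (no additional scaling factor).
    """
    wmean = lambda w: np.average(df[y], weights=df[w])
    theta = wmean(main_w)                                # full-sample estimate
    reps = np.array([wmean(w) for w in rep_ws])          # replicate estimates
    return theta, np.sqrt(np.sum((reps - theta) ** 2))   # (estimate, SE)

# Hypothetical usage -- file and column names are placeholders:
# df = pd.read_csv("eclsk_child_file.csv")
# est, se = jk2_variance(df, "reading_score", "W0",
#                        [f"W{r}" for r in range(1, 91)])
```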

Design Effects. An important analytic consideration is how the statistical efficiency of survey estimates from a complex sample survey such as the ECLS–K compares with estimates that would have been obtained had a simple random sample (SRS) of the same size been used. In a stratified clustered design, stratification generally leads to a gain in efficiency over simple random sampling, but clustering has the opposite effect because of the positive intracluster correlation of the units in the cluster. The basic measure of the relative efficiency of the sample is the design effect. A large number of data items were collected from students, parents, teachers, and schools. Each item has its own design effect that can be estimated from the survey data. The median child–level design effect is 4.7 for fall kindergarten and 4.1 for spring kindergarten. The median child–level design effect for spring third grade, spring fifth grade, and spring eighth grade is 3.3, 4.0, and 3.1, respectively.
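In symbols, the design effect compares the sampling variance of an estimate under the complex design with the variance that a simple random sample of the same size would have yielded:

$$\mathrm{deff}(\hat{\theta}) = \frac{\mathrm{Var}_{\mathrm{complex}}(\hat{\theta})}{\mathrm{Var}_{\mathrm{SRS}}(\hat{\theta})}$$

A median child–level design effect of about 4, for example, means that the variance of a typical child–level estimate is roughly four times what an SRS of the same size would produce.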

The size of the ECLS–K design effects is largely a function of the number of children sampled per school. With about 20 children sampled per school, an intraclass correlation of 0.2 might result in a design effect of about 5. The median design effect is 3.4 for the panel of students common to both the fall and spring of kindergarten; this lower value reflects the smaller cluster size in the panel. With the exception of the spring third–grade and spring eighth–grade collections, the ECLS–K design effects are slightly higher than the average of 3.8 that was anticipated during the design phase of the study, both for estimates of proportions and for score estimates.
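The figure of about 5 follows from the standard approximation for the design effect of a clustered sample with m units per cluster and intraclass correlation ρ, using the values above (m = 20, ρ = 0.2):

$$\mathrm{deff} \approx 1 + (m - 1)\rho = 1 + (20 - 1)(0.2) = 4.8 \approx 5$$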

The median teacher–level design effect is 2.5 for both the fall and spring of kindergarten. This design effect is lower than the child–level design effects because the number of responding teachers per school is relatively small. The design effect for teachers is largely a result of selecting a sample using the most effective design for child–level statistics, rather than a design that would be most effective for producing teacher–level statistics. The median school–level design effect for the base year is 1.6. Design effects were not computed for items from the teacher and school administrator questionnaires in the springs of first, third, fifth, and eighth grades because no teacher or school weights were computed for any of the ECLS–K years after kindergarten.

Nonsampling Error

Nonsampling error is the term used to describe variations in the estimates that may be caused by population coverage limitations, as well as by data collection, processing, and reporting procedures. Typical sources of nonsampling error are nonresponse, differences in respondents' interpretations of the meaning of the questions, response differences related to the particular time the survey was conducted, and mistakes in data preparation. The steps taken to reduce nonsampling error in the ECLS–K are described below.

To reduce nonsampling error associated with respondents misunderstanding what was being asked of them, the survey design phase included focus groups and cognitive laboratory interviews that assessed respondent knowledge of the topics covered in the instruments, comprehension of questions and terms, and item sensitivity. The design phase also included testing of the computer-assisted personal interviewing (CAPI) and computer-assisted telephone interviewing (CATI) instruments, as well as a field test that evaluated the implementation of the survey, reducing the potential for errors introduced during administration.

Another potential source of nonsampling error is respondent bias, which occurs when respondents systematically misreport (intentionally or unintentionally) information in a study. One potential source of respondent bias in the ECLS surveys is social desirability bias. If there are no systematic differences among specific groups under study in their tendency to give socially desirable responses, then comparisons of the different groups will accurately reflect differences among the groups. An associated error occurs when respondents give unduly positive assessments about those close to them. For example, parents may give more positive assessments of their children's experiences than might be obtained from institutional records or from the teachers.

Response bias may also be present in the responses teachers provide about each individual student. For example, each teacher filled out a survey for each of the sampled children he or she taught, answering questions about the child's socioemotional development. Because the base–year and first–grade data collections began in the fall, teachers may not have had adequate time to observe the children, and some of their responses may therefore have been influenced by expectations based on the children's outward characteristics (e.g., sex, race, English language learner [ELL] status, disability status). To minimize this bias, all items were subjected to multiple cognitive interviews and field tests, and practicing teachers were involved in the design of the cognitive assessment battery and the questionnaires. NCES also followed the criteria recommended in a working paper on the accuracy of teachers' judgments of students' academic performance (see Perry and Meisels 1996).

As in any survey, response bias may be present in the ECLS–K data, and it is not possible to state precisely how such bias may affect the results. NCES has tried to minimize some of these biases by conducting one–on–one, untimed assessments and by asking both teachers and parents some of the same questions about the sampled child.

Coverage error. Undercoverage occurs when the sampling frame from which a sample is selected does not fully reflect the target population of inference. The potential for coverage error in the ECLS–K was reduced by using a school–level frame derived from universe surveys of all schools in the United States and master lists of all kindergartners enrolled in sampled schools.

Designing the child assessments to be both individually administered and untimed reduced both coverage error and bias. Untimed, individually administered assessments allowed the study to include most children with special needs or who required some type of accommodation, such as children with a learning disability or children using hearing aids. The only children excluded from the direct child assessments were those who needed a Braille assessment, those who needed a sign language interpreter, those whose individualized education program (IEP) clearly stated that they were not to be tested, and children who lacked adequate English or Spanish language skills to participate meaningfully in the ECLS–K battery. Exclusion from the direct child assessment did not exclude children from other parts of the study (e.g., the teacher questionnaire, the parent interview).

Nonresponse error. Bias can exist in survey data if too few sampled units respond for the data to be representative of the target population, or if nonresponse is significantly higher among sample units with certain characteristics. The unit response rate is a round–specific rate, indicating the proportion of the eligible sample responding to a survey at a particular time point. For a longitudinal study such as the ECLS–K, it is also useful to calculate a longitudinal response rate, also called an overall unit response rate, which takes into account response across all rounds of collection.

A total of 940 of the 1,280 originally sampled schools participated in at least one round of data collection during the base year of the study. This translates into a weighted school response rate (weighted by the base weight) of 74 percent for the base year. The weighted child base–year response rate was 92 percent (i.e., 92 percent of the children were assessed at least once during kindergarten), and the weighted parent base–year unit response rate was 89 percent (i.e., a parent interview was completed at least once during kindergarten). Thus, the overall base–year response rate was 68 percent for children (74 percent of schools times 92 percent of sampled children) and 66 percent for the parent interview (74 percent of schools times 89 percent of parents of sampled children). See table ECLS–K–1 for details on weighted response rates.
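These overall rates are simply the products of the component rates:

$$0.74 \times 0.92 \approx 0.68 \;\;\text{(child assessment)} \qquad 0.74 \times 0.89 \approx 0.66 \;\;\text{(parent interview)}$$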

A nonresponse bias analysis was conducted to determine if substantial bias was introduced due to school nonresponse in the ECLS–K. Five different approaches were used to examine the possibility of bias in the ECLS–K sample.

First, weighted and unweighted response rates for schools, children, parents, teachers, and school administrators were examined to see whether there were large differences in response rates by characteristics of schools (e.g., urbanicity, region, school size, percentage of Black, Hispanic, and other race/ethnicity students, grade range) and of children (e.g., sex, age, race/ethnicity).

Second, estimates based on the ECLS–K respondents were compared to estimates based on the full sample. The distributions of schools by school type, urbanicity, and region, and the distributions of enrollment by kindergarten type (public vs. private), race/ethnicity, urbanicity, region, and eligibility for free and reduced–price lunch were compared for the responding schools and all the schools in the sampling frame.

Third, estimates from the ECLS–K were compared with estimates from other data sources (e.g., Current Population Survey, National Household Education Surveys Program, Survey of Income and Program Participation).

Fourth, estimates using the ECLS–K unadjusted weights were compared with estimates using the ECLS–K weights adjusted for nonresponse. Large differences between the estimates produced with these two sets of weights would indicate the potential for bias (a computational sketch of this comparison follows the fifth approach below).

Fifth, and last, simulations of nonresponse were conducted. The results of these analyses are summarized in the ECLS–K user's manuals. Findings from these analyses suggest that the ECLS–K estimates are not substantially biased by school nonresponse.
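As a sketch of the fourth approach, the comparison of estimates under unadjusted and nonresponse-adjusted weights can be computed as follows. The file and column names are hypothetical placeholders (not actual ECLS–K variable names), and the check shown (the weighted percentage of public schools) is only one of many estimates that would be compared.

```python
import numpy as np
import pandas as pd

def weighted_pct(df: pd.DataFrame, flag: str, w: str) -> float:
    """Weighted percentage of cases with flag == 1."""
    return 100.0 * np.average(df[flag], weights=df[w])

# Hypothetical usage -- "public_school" is a 0/1 indicator; the two
# weight columns are the unadjusted and nonresponse-adjusted weights:
# df = pd.read_csv("eclsk_school_file.csv")
# unadj = weighted_pct(df, "public_school", "base_weight")
# adj   = weighted_pct(df, "public_school", "nr_adjusted_weight")
# print(f"unadjusted {unadj:.1f}%  adjusted {adj:.1f}%  "
#       f"difference {adj - unadj:.1f} percentage points")
```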

Table ECLS–K–1. Weighted child–level response rates and overall response rates for children in the ECLS–K, by collection and selected components: Various years, 1998–2007

Selected components                                            Kindergarten,   Grade 1,     Grade 3,      Grade 5,      Grade 8,
                                                               1998–99         1999–2000    spring 2002   spring 2004   spring 2007

Response rates
  Child assessment                                             88.0            87.2         80.1          83.9          75.0
  Parent interview                                             83.9            83.5         76.9          88.3          70.9
  School administrator questionnaire                           85.9            75.9         65.5          76.4          72.5
  Teacher–level questionnaire, classroom information (Part A)  86.9            77.6         61.7          79.3          73.8
  Teacher–level questionnaire, teacher information (Part B)    89.7            77.0         61.6          —             —
  Child–level teacher questionnaire (Part C)                   85.9            77.4         62.0          78.7          72.5

Overall response rates
  Child assessment                                             65.1            56.8         45.5          38.2          28.6
  Parent interview                                             62.1            51.8         39.9          35.2          25.0
  School administrator questionnaire                           63.6            48.2         31.6          24.1          17.5
  Teacher–level questionnaire, classroom information (Part A)  64.3            49.9         30.8          25.0          18.4
  Teacher–level questionnaire, teacher information (Part B)    66.4            51.1         31.5          —             —
  Child–level teacher questionnaire (Part C)                   63.6            49.2         30.5          24.0          17.4

— Not available. In the grade 5 and grade 8 collections, there was only one part in the teacher–level questionnaire; its response rate is reported in the "Teacher–level questionnaire, classroom information (Part A)" row of the table.
NOTE: Overall response rates are the product of the school–level response rate from the base year (i.e., 74.0 percent) and the completion rates from each round of data collection after the base year.
SOURCE: National Center for Education Statistics. (2009). Eighth–Grade Methodology Report (NCES 2009–003), table 6–14. U.S. Department of Education, Institute of Education Sciences.

Measurement error. There was a concern in the ECLS–K that the individual mode of administration might introduce additional and unwanted variance into both the individual and between-school components of variance in the cognitive scores. Since it is more difficult to standardize test administrations when tests are individually administered, this source of variance could contribute to high design effects if the individual assessors differed systematically in their modes of administration. A multilevel analysis was carried out to estimate components of variance in the fall– and spring–kindergarten cognitive scores associated with (1) the student, (2) the school, (3) the data collection team leader, and (4) the individual test administrator. It was found that the component of variance associated with the individual test administration effect was negligible in all cognitive areas and thus had little or no impact on the design effects.
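In schematic form (a sketch of the kind of decomposition described, not the exact ECLS–K model specification), the score of student i in school j, assessed by test administrator l working under team leader k, can be written as

$$y_{ijkl} = \mu + s_j + t_k + a_l + e_{ijkl}, \qquad \mathrm{Var}(y_{ijkl}) = \sigma^2_s + \sigma^2_t + \sigma^2_a + \sigma^2_e$$

where $s_j$, $t_k$, and $a_l$ are the school, team-leader, and administrator effects and $e_{ijkl}$ is the student-level residual. The finding reported above corresponds to the administrator component $\sigma^2_a$ being negligible in all cognitive areas.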