Appendix B: Methodology and Technical Notes
The Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K), is being conducted by Westat for the U.S. Department of Education, National Center for Education Statistics (NCES). It is designed to provide detailed information on children's early school achievement and experiences. The study began in the fall of the 1998–99 school year. The children participating in the ECLS-K are being followed longitudinally from kindergarten through the fifth grade. Estimates in this report are based on data collected from and about children who entered kindergarten for the first time in the fall of 1998 and who were assessed in English in the fall and spring of kindergarten, the spring of 2000 when most of the children were in first grade, and the spring of 2002 when most were third-graders.
A nationally representative sample of 22,782 children enrolled in 1,277 kindergarten programs during the 1998–99 school year was selected to participate in the ECLS-K. The children attended both public and private kindergartens that offered full-day and part-day programs. The sample includes children from different racial/ethnic and socioeconomic backgrounds, and includes oversamples of Asian/Pacific Islander children, private kindergartens, and private kindergartners.
Sampling for the ECLS-K involved a dual-frame, multistage sampling design. The first stage of sampling involved the selection of 100 primary sampling units (PSUs) from a national sample of PSUs. The PSUs were counties and county groups. Public and private schools were then selected within the PSUs, and children were sampled from the selected schools. Public schools were selected from the NCES 1995–96 Common Core of Data (CCD) Universe File, which is a public school frame, and private schools were selected from a private school frame developed from the 1995–96 Private School Survey (PSS), another NCES survey.[37] Approximately 23 kindergartners were selected in each of the sampled schools. In the spring of first grade, the sample was freshened to obtain a nationally representative sample of first-graders by bringing into the study first-graders who were not enrolled in kindergarten during the 1998–99 school year and therefore did not have an opportunity for selection in the base year.
While all students still enrolled in their base-year schools were recontacted, a 50 percent subsample of base-year students who had transferred from their kindergarten school was followed. For information on freshening procedures and subsampling of transfer children (i.e., movers), refer to the ECLS-K First-grade Public-Use Data Files User's Manual (NCES 2002). Fall kindergarten data were obtained from September to December 1998, with 80 percent of the assessments conducted between early October and mid-November. The spring kindergarten data were obtained from March to June 1999, with 80 percent of the assessments conducted between mid-April and late May. Spring first-grade data were obtained from March to July 2000, and spring third-grade data were obtained from March to July 2002, with 80 percent of the assessments at each round conducted between early April and late May.
A total of 944 of the 1,277 originally sampled schools participated during the base year of the study, which translates into a weighted response rate of 74 percent. The school response rate during the spring of the base year (74.2 percent) was higher than during the fall (69.4 percent) because some schools that originally declined to participate agreed to take part in the spring. Nearly all (99.4 percent) of the schools that participated in the fall of the base year also participated in the spring.
The child base-year survey completion rate was 92 percent (i.e., 92 percent of the children were assessed at least once during kindergarten). The parent base-year completion rate was 89 percent (i.e., a parent interview was completed at least once during kindergarten). Thus, the overall base-year response rate for children was 68.1 percent (74 percent x 92 percent) and the base-year response rate for the parent interview was 65.9 percent (74 percent x 89 percent). About 95 percent of the children and 94 percent of the parents who participated in the fall of kindergarten also participated in the spring. About 88 percent of the children and 85 percent of the parents who were eligible for the spring first-grade collection participated. In the spring of 2002, about 80 percent of the children and 77 percent of the parents who were eligible participated.
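The overall response rates quoted above are the product of the school-level response rate and the within-school completion rate. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative and do not come from the ECLS-K documentation.

```python
def overall_response_rate(school_rate: float, completion_rate: float) -> float:
    """Overall rate = school response rate x within-school completion rate."""
    return school_rate * completion_rate


# Rates reported in the text, expressed as proportions.
school_rate = 0.74        # weighted base-year school response rate
child_completion = 0.92   # child base-year completion rate
parent_completion = 0.89  # parent base-year completion rate

child_overall = overall_response_rate(school_rate, child_completion)
parent_overall = overall_response_rate(school_rate, parent_completion)

print(f"Child overall response rate:  {child_overall:.1%}")   # 68.1%
print(f"Parent overall response rate: {parent_overall:.1%}")  # 65.9%
```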
A nonresponse bias analysis was conducted to determine if substantial bias was introduced as a result of the base-year school nonresponse. For information on the nonresponse bias analysis, refer to the ECLS-K Base Year Public-Use Data Files User's Manual (NCES 2001) and the ECLS-K First-grade Public-Use Data Files User's Manual (NCES 2002). Findings from these analyses suggest that there is not a bias due to nonresponse.
The item nonresponse rates for the variables used in this report are low, ranging from 0 to about 1.5 percent for the analysis sample. A few of the variables were fully imputed and have no missing data (e.g., income, mother's education). More information on item missing data can be found in the ECLS-K Grade Three Electronic Codebook (NCES 2004–002).
Estimates produced using data from the ECLS-K are subject to two types of error: sampling and nonsampling. Nonsampling errors are errors made in the collection and processing of data. Sampling errors occur because the data are collected from a sample rather than a census of the population. A detailed discussion of these types of errors can be found in America's Kindergartners (West, Denton, and Germino Hausken 2000).
Standard Errors and Weights
In order to produce national estimates from the ECLS-K data collected during the kindergarten and first-grade year, the sample data were weighted. Weighting the data adjusts for unequal selection probabilities at both the school and the child levels, and also adjusts for school, child, teacher, and parent nonresponse. The first stage of the weighting process assigns weights to the sampled primary sampling units (PSUs) that are equal to the inverse of the PSU probability of selection.[38] The second stage of the weighting process assigns weights to the schools sampled within PSUs. The base weight for each sampled school is the PSU weight multiplied by the inverse of the probability of selecting the school. The base weights for eligible schools are adjusted for nonresponse. These adjustments are made separately for public and private schools.
The base weight for each child in the sample is the school nonresponse adjusted weight for the school the child attends, multiplied by a post-stratified within-school student weight (total number of students in the school, divided by the number of students sampled in the school). The child panel weight (C1_5FC0), which is the weight used to produce the estimates found in this report, is the base-year child weight adjusted for nonresponse to the child assessments at each round of data collection. Only those cases with child assessment data in both fall and spring of kindergarten, spring of first grade, and spring of third grade are included in this weighting procedure.[39] Again, these adjustments are made separately for public and private schools. This weight sums to the population of all children who attended kindergarten in the fall of 1998.
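The chain of weights described above can be illustrated with a minimal Python sketch. The selection probabilities, the nonresponse adjustment factor, and the enrollment counts are made-up values chosen for easy arithmetic, and the single multiplicative nonresponse factor is a simplifying assumption; the ECLS-K user's manuals describe the actual adjustment procedures.

```python
def school_base_weight(psu_prob: float, school_prob_within_psu: float) -> float:
    """PSU weight (inverse PSU selection probability) times the inverse
    of the probability of selecting the school within the PSU."""
    psu_weight = 1.0 / psu_prob
    return psu_weight * (1.0 / school_prob_within_psu)


def child_base_weight(school_adjusted_weight: float,
                      students_in_school: int,
                      students_sampled: int) -> float:
    """School nonresponse-adjusted weight times the within-school student
    weight (students enrolled divided by students sampled)."""
    return school_adjusted_weight * (students_in_school / students_sampled)


# Hypothetical inputs, for illustration only.
base = school_base_weight(psu_prob=0.05, school_prob_within_psu=0.10)
nonresponse_factor = 1.25   # assumed adjustment, not an ECLS-K value
adjusted = base * nonresponse_factor
child = child_base_weight(adjusted, students_in_school=69, students_sampled=23)

print(base, adjusted, child)
```

The child panel weight would apply further nonresponse adjustments to this base weight at each round; those steps are omitted here because the text does not give their form.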
In addition to properly weighting the responses, special procedures for estimating the statistical significance of the estimates were employed, because the data were collected using a complex sample design. Complex sample designs, like that used in the ECLS-K, result in data that violate the assumptions that are normally required to assess the statistical significance of the results. Frequently, the standard errors of the estimates in complex samples are larger than would be expected if the sample were a simple random sample and the observations were independent and identically distributed random variables. Replication methods of variance estimation were used to reflect the multistage sample design used in the ECLS-K. Using WesVar PC statistical software,[40] the jackknife replication method (JK2) was used with 90 ECLS-K replicate weights to compute approximately unbiased estimates of the standard errors of the estimates in the report. The jackknife method was used to estimate the precision of the estimates of the reported national percentages, means, and regression coefficients.
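To make the replication idea concrete, the Python sketch below computes a jackknife standard error for a weighted mean. It assumes the commonly used form of the JK2 estimator, in which the variance is the sum of squared deviations of the replicate estimates from the full-sample estimate. It is not WesVar code, and the toy data use two replicate weights rather than the 90 supplied with the ECLS-K.

```python
def weighted_mean(values, weights):
    """Weighted mean of values under one set of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jk2_standard_error(values, full_weights, replicate_weights):
    """Return (full-sample estimate, jackknife standard error).

    Assumes variance = sum over replicates of (replicate estimate minus
    full-sample estimate) squared, i.e., a multiplier of 1.
    """
    theta = weighted_mean(values, full_weights)
    replicate_estimates = [weighted_mean(values, w) for w in replicate_weights]
    variance = sum((t - theta) ** 2 for t in replicate_estimates)
    return theta, variance ** 0.5


# Toy data: four children, one full-sample weight, two replicate weights.
values = [10.0, 20.0, 30.0, 40.0]
full_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 0.0, 1.0, 1.0],  # one unit dropped, its pair doubled
    [1.0, 1.0, 2.0, 0.0],
]

theta, se = jk2_standard_error(values, full_weights, replicate_weights)
print(theta, se)
```

The same pattern applies to percentages and regression coefficients: the statistic is recomputed once per replicate weight and the deviations are accumulated.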
Comparisons made in the text were tested for statistical significance to ensure that the differences are larger than might be expected due to sampling variation. When comparing estimates between categorical groups (e.g., sex, race/ethnicity), t statistics were calculated. The formula used to compute the t statistic is:

$$t = \frac{Est_1 - Est_2}{\sqrt{se_1^2 + se_2^2}}$$

where Est1 and Est2 are the estimates being compared and se1 and se2 are their corresponding standard errors. For example, information from Tables 4 and 4a is used to compare children's third-grade mathematics IRT-scale scores by children's race/ethnicity. The formula used to compute the t statistic for the comparison of White and Black third-graders' mathematics scores would be:

$$t = \frac{Est_{White} - Est_{Black}}{\sqrt{se_{White}^2 + se_{Black}^2}}$$

with the estimates and standard errors taken from Tables 4 and 4a.
Due to the large sample size, many differences (no matter how substantively minor) are statistically significant. In this report, we define "substantive differences" as percentage differences of 5 points or greater, or mean score differences of one-quarter of a standard deviation or more.
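The two checks described in this section, statistical significance and substantive size, can be sketched in Python as follows. The t statistic uses the standard form for two independent estimates, the difference divided by the square root of the sum of the squared standard errors. All numeric inputs are invented for illustration and are not taken from Tables 4 and 4a, and the standard deviation passed to the substantive check is assumed to be that of the outcome score.

```python
import math


def t_statistic(est1: float, est2: float, se1: float, se2: float) -> float:
    """t statistic for the difference between two independent estimates."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


def is_substantive_mean_difference(est1: float, est2: float, sd: float,
                                   threshold: float = 0.25) -> bool:
    """True if the mean difference is at least one-quarter of a standard deviation."""
    return abs(est1 - est2) / sd >= threshold


def is_substantive_percentage_difference(pct1: float, pct2: float,
                                         threshold: float = 5.0) -> bool:
    """True if two percentages differ by 5 points or more."""
    return abs(pct1 - pct2) >= threshold


# Hypothetical group means, standard errors, and score standard deviation.
t = t_statistic(est1=90.0, est2=84.0, se1=0.3, se2=0.4)
substantive = is_substantive_mean_difference(90.0, 84.0, sd=20.0)
print(t, substantive)
```

With these made-up inputs the difference is both statistically significant and substantive; a difference of 2 points with the same standard deviation would fail the substantive criterion even with a large t statistic.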
In addition to t-test comparisons, ordinary least squares (OLS) regression analyses were conducted in many sections in order to describe the relationship of selected child, family, and school variables to children's scores, after controlling for these other variables. Independent variables were entered simultaneously for each regression analysis. White children served as the reference racial/ethnic group, and children who attended public school for the first 4 years served as the reference school-type group. T-test comparisons for other groups (e.g., Hispanic vs. Black) were conducted using the unstandardized coefficients produced in the regression analyses. The same significance criteria used in the bivariate analyses (p-value of .05 or less and effect size of 0.25 standard deviation or higher or 5 percentage points difference) were used for the regression coefficients.
When describing children's gains in reading and mathematics over the first 4 years of school, the dependent variables in the regression analyses are gain scores that represent the differences between the IRT fall kindergarten and spring third-grade scale scores for the reading and mathematics assessments. Using gain scores as the dependent variable rather than spring scores as the dependent variable with fall scores as a covariate allows results to be presented in terms of progress made during the year regardless of where along the continuum that progress is made. There are longstanding concerns about the unreliability of gain scores in the measurement literature, although these concerns have more recently been shown to be unfounded and based on faulty assumptions (e.g., Gottman and Rushe 1993; Williams and Zimmerman 1996). Rogosa and Willett (1983) show that gain score reliabilities are strong when individual differences between pre-test and post-test are substantial, as is the case in most longitudinal assessment applications such as the ECLS-K assessments. Maris (1998) argues that regression toward the mean is not a legitimate argument against using gain scores nor is pretest measurement error a concern unless assignment into independent variable groups is determined from pre-test performance (which is not the case in the ECLS-K). Additionally, the use of IRT-scale scores and the adaptive testing approach used in the ECLS-K limit the concern that gain scores may be unreliable due to floor and ceiling effects (Rock and Pollack 2002).
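The two regression specifications contrasted in this paragraph can be written out as follows. The notation is illustrative rather than taken from the report: $Y_i^{K}$ and $Y_i^{G3}$ denote child $i$'s fall kindergarten and spring third-grade IRT scale scores, and $X_{ik}$ denotes the child, family, and school variables entered simultaneously.

```latex
\begin{aligned}
\text{Gain-score model (used in this report):}\quad
  & Y_i^{G3} - Y_i^{K} = \beta_0 + \sum_{k} \beta_k X_{ik} + \varepsilon_i \\
\text{Covariate-adjusted alternative:}\quad
  & Y_i^{G3} = \gamma_0 + \gamma_1 Y_i^{K} + \sum_{k} \delta_k X_{ik} + u_i
\end{aligned}
```

In the first form each $\beta_k$ describes a difference in points gained, whereas in the second each $\delta_k$ describes a difference in the spring score among children with the same fall score.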
Direct Assessment Administration Procedures
Reading, Mathematics, and Science Assessments
During the ECLS-K cognitive test development, an initial review of commercial assessments indicated that there were no "off-the-shelf" tests that met the domain requirements of the ECLS-K, were both individually administered and adaptive, or provided items that could be used to measure children's cognitive achievement longitudinally. The framework for the ECLS-K drew from the National Assessment of Educational Progress (NAEP) fourth-grade test specifications. The NAEP assessment goals are similar to those of the ECLS-K in that both projects assess cognitive skills typically emphasized in schools. For the grades in which the NAEP frameworks were inappropriate, the ECLS-K solicited advice from early elementary school educators and curriculum specialists to articulate more suitable test specifications. The expertise of item writers from Educational Testing Service (ETS), elementary school curriculum specialists, and elementary school teachers was also used to develop new assessment items and select existing items to borrow or adapt, with permission, from published tests, including the Peabody Individual Achievement Test-Revised (PIAT-R), Peabody Picture Vocabulary Test-Revised (PPVT-R), the Primary Test of Cognitive Skills (PTCS), the Test of Early Reading Ability (TERA-2), the Test of Early Mathematics Ability (TEMA-2), and the Woodcock-Johnson Tests of Achievement-Revised (WJ-R) (Rock and Pollack 2002).
In kindergarten and first grade, the same set of reading and mathematics items was used in all four rounds of data collection. The kindergarten/first-grade reading and mathematics assessments each had three second-stage forms, with most children routing into the easiest of the second-stage forms in fall kindergarten and most into the most difficult forms in the spring of first grade. For third grade, a new set of items was developed since children's academic skills could be expected to have advanced beyond the levels covered by the kindergarten/first-grade assessment. The third-grade reading, mathematics, and science assessments each had three second-stage forms. Some of the kindergarten/first-grade items were retained in the third-grade assessment to support the development of a longitudinal score scale (Pollack et al. forthcoming).
This report focuses on those children who were assessed in English in fall of kindergarten, spring of kindergarten, spring of first grade, and spring of third grade. In the ECLS-K, the reading and science assessments, specifically designed for the ECLS-K, were only administered in English. The mathematics assessment was administered in both English and Spanish in the kindergarten and first-grade years, and only in English in third grade. Prior to administering the English reading and mathematics assessment in kindergarten and first grade, children's English language proficiency was evaluated. Children whose home language was other than English (as determined by school records) were administered the Oral Language Development Scale (OLDS) (for more information, see the ECLS-K Base-Year User's Manual, National Center for Education Statistics 2001). If children demonstrated sufficient proficiency in English for the ECLS-K direct child assessment, they received the English reading and mathematics battery. Approximately 68 percent of Hispanic children and 78 percent of Asian/Pacific Islander children were assessed in English in the fall and spring of kindergarten and in the spring of first grade (Denton and West 2002).[41] In the fall of kindergarten, 1,567 children were not administered the English battery because of their performance on the OLDS. By spring of first grade, this number was down to 350.[42] In the third-grade year, the OLDS was not administered and all children were assessed in English.
Self-Description Questionnaire (SDQ) Assessment
Since children's reading abilities may vary greatly in third grade, the SDQ was administered by having assessors read each item to the child and then providing the child a few seconds after each statement to mark their response. Assessors were trained not to look at children's answers in an effort to avoid making children feel tempted to answer more positively than they would have otherwise. The SDQ was administered first to each child, followed by the cognitive assessments.
Children responded to each of 42 behavioral statements in relation to their perception of themselves on a scale from 1 to 4, including "not at all true," "a little bit true," "mostly true," and "very true." Children's responses on the 42 items were then used to calculate scores on six scales: (1) reading scale, (2) mathematics scale, (3) school scale, (4) peer scale, (5) externalizing problem behavior scale, and (6) internalizing problem behavior scale. Children's scale scores on each of the SDQ scales represent the mean rating of the items included in the scale. The first four scales were based on scales from Marsh's (1990) Self-Description Questionnaire I. The last two scales were added to collect data on children's perceptions of behaviors that may interfere with learning. The six scales were field tested in spring of 2000. Scale reliabilities in the spring 2002 data collection ranged from .77 to .90 (Pollack et al. forthcoming).
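Because each SDQ scale score is the mean rating of the items in that scale, the computation can be sketched briefly in Python. The mapping of response labels to the values 1 through 4 follows the order given above; the example responses are hypothetical, and the sketch does not address which items belong to which scale or how missing responses were handled, since the text does not say.

```python
from statistics import mean

# Response labels in the order listed in the text, scored 1 to 4.
RESPONSE_VALUES = {
    "not at all true": 1,
    "a little bit true": 2,
    "mostly true": 3,
    "very true": 4,
}


def scale_score(responses):
    """Mean rating across the items that make up one SDQ scale."""
    return mean(RESPONSE_VALUES[r] for r in responses)


# Hypothetical responses to the items of one scale.
example = ["very true", "mostly true", "mostly true", "a little bit true"]
score = scale_score(example)
print(score)  # (4 + 3 + 3 + 2) / 4 = 3
```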
A number of variables used in this report were derived by combining information from one or more questions in the ECLS-K study instruments. The derivation of key variables is described in this section. Unless otherwise noted, steps for deriving variables were identical for kindergarten, first-grade, and third-grade data. Variable names from the ECLS-K database are included in the descriptions using the kindergarten names and are indicated by all capital letters.
Children's race/ethnicity. The race/ethnicity composite (R5RACE) on the ECLS-K Third Grade restricted-use data file was constructed from two parent-reported variables: ethnicity and race. Following new Office of Management and Budget guidelines, a respondent could select more than one race (OMB 1997). Each respondent was asked to identify whether the child was Hispanic, and then to select one or more races. The following are the five composite race/ethnicity categories presented in this report: White non-Hispanic, Black non-Hispanic, Hispanic, Asian/Pacific Islander, and Other (which includes American Indians, Alaska Natives, and non-Hispanic multiracial children). When race/ethnicity differences are presented in this report, White refers to White, non-Hispanic and Black refers to Black, non-Hispanic.
Number of family risk factors. This variable is a composite based on family type (P2HFAMIL), federal poverty status (WKPOV_R), primary home language (WKLANGST), and mother's highest education level (WKMOMED), as collected in the kindergarten year. The variables used to construct this composite come from the ECLS-K Longitudinal Kindergarten-First Grade public-use data file. Children receive one point on the index for each of the following risk factors: single-parent household, below federal poverty level,[43] primary home language other than English, and mother's highest education level less than a high school diploma. Children's values for this composite variable were equal to the total number of family risk factors, ranging from 0 to 4. For children without a mother in the household in the kindergarten year (n = 137), their risk factor score was calculated based on the three variables with valid data (WKPOV_R, WKLANGST, and P2HFAMIL).
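The risk index described above is a simple count, which the following Python sketch illustrates. It assumes each source variable has already been recoded to a true/false indicator; the function name, argument names, and the use of None for children without a mother in the household are illustrative choices, not ECLS-K codings.

```python
from typing import Optional


def family_risk_index(single_parent: bool,
                      below_poverty: bool,
                      non_english_home_language: bool,
                      mother_less_than_high_school: Optional[bool]) -> int:
    """Count of family risk factors (0 to 4).

    Pass None for mother's education when no mother is in the household;
    the index is then based on the remaining three indicators.
    """
    factors = [single_parent, below_poverty, non_english_home_language]
    if mother_less_than_high_school is not None:
        factors.append(mother_less_than_high_school)
    return sum(factors)


# Hypothetical children.
print(family_risk_index(True, True, False, True))   # three risk factors
print(family_risk_index(False, True, True, None))   # no mother in household
```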
Fall kindergarten program type. This variable is a composite derived from A1CLASS (fall kindergarten classroom type based on teacher information) and F1CLASS (child kindergarten program type data from the field management system) variables on the ECLS-K Longitudinal Kindergarten-First Grade public-use data file. If children had valid data on A1CLASS and were designated as being in an AM or PM kindergarten, their kindergarten program type was set to "half-day kindergarten." If they had valid data on A1CLASS and were designated as being in an all-day kindergarten, their kindergarten program type was set to "full-day." If a child was missing data on A1CLASS, their values on F1CLASS were used to designate fall kindergarten program type.
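The fallback rule for this composite can be sketched in Python as shown below. The string labels for A1CLASS, the use of None for missing data, and the assumption that F1CLASS has already been recoded to "half-day" or "full-day" are all hypothetical; the actual value codes are documented in the ECLS-K codebooks.

```python
from typing import Optional


def fall_program_type(a1class: Optional[str], f1class: Optional[str]) -> Optional[str]:
    """Derive program type from teacher data, falling back to field data."""
    if a1class in ("AM", "PM"):
        return "half-day"
    if a1class == "all-day":
        return "full-day"
    # A1CLASS missing: use the field management system value instead.
    return f1class


print(fall_program_type("AM", "full-day"))   # teacher data takes precedence
print(fall_program_type(None, "full-day"))   # falls back to F1CLASS
```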
School type across all waves of the study. This variable is a composite based on S2KPUPRI and S4PUPRI from the ECLS-K Longitudinal Kindergarten-First Grade public-use data file and S5PUPRI from the ECLS-K Third Grade restricted-use data file. They are indicators of whether a child attended a public or private school in the spring of kindergarten, first grade, and third grade. Public schools included Bureau of Indian Affairs and tribal schools, schools of choice (e.g., charter schools), and public schools with magnet programs. Private schools included Catholic schools, other religious private schools, and non-religious private schools. Children were categorized into "public school all years," "private school all years," or "public and private school attendance between kindergarten and third grade" based on their values on each of the three variables.
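The three-way categorization can be expressed compactly, as in the Python sketch below. It assumes each wave's indicator has been recoded to the string "public" or "private"; because the text does not describe how missing values were treated, the sketch simply rejects any other value.

```python
def school_type_across_waves(kindergarten: str, first_grade: str, third_grade: str) -> str:
    """Categorize a child by school type in the three spring rounds."""
    waves = (kindergarten, first_grade, third_grade)
    if any(w not in ("public", "private") for w in waves):
        raise ValueError("each wave must be 'public' or 'private'")
    if set(waves) == {"public"}:
        return "public school all years"
    if set(waves) == {"private"}:
        return "private school all years"
    return "public and private school attendance between kindergarten and third grade"


print(school_type_across_waves("public", "public", "public"))
print(school_type_across_waves("private", "public", "public"))
```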
37 During the spring of 1998, Westat identified new schools that were not found on either frame. A sample of these schools was included in the ECLS-K school sample.
38 The approach used to develop weights for the ECLS-K is described in the ECLS-K user's manuals (NCES 2001, NCES 2002, NCES 2003a).
39 Children received a valid child assessment weight if they participated in any part of the child assessment, such as the height and weight measurements, at each time period. Thus, children who were unable to take the cognitive assessments because of their disability or language status could still have a valid child assessment weight.
40 WesVar PC statistical software is designed to calculate estimates and appropriate standard errors for multistage, stratified, and unequal probability sample designs.
41 In an earlier ECLS-K report (Denton and West 2002), analyses were conducted to explore how including children who initially could not take the battery in English but were screened in by spring of first grade would impact achievement estimates. Significant reading t-score differences overall and by specific racial/ethnic group were not detected between the analytic sample of children assessed in English at all time points and the total sample, including those who were screened into the English assessment over time.
42 This information is based on the ECLS-K Longitudinal Kindergarten-First Grade Public Use Electronic Code Book (NCES 2002–148) variable CPSOLDS (Round in which child passed English OLDS).
43 The federal poverty level status composite variable is derived from household income and the total number of household members. Federal poverty thresholds are used to define households below the poverty level. For instance, in 1998 if a household contained 4 members, and the household income was lower than $16,655, then the household was considered to be in poverty.