A two-stage sampling process was used to select teachers for the FRSS Methodology Survey on Teacher Performance Evaluations. At the first stage, a stratified sample of 525 schools was drawn from the 1990-91 list of public schools compiled by the National Center for Education Statistics. This complete file contains about 85,000 school listings, including over 59,000 schools with grades 1 through 6, and is part of the NCES Common Core of Data (CCD) School Universe. Regular schools providing instruction in any of the grades 1 through 6 in the 50 states and the District of Columbia were included in the sampling frame.3 Special education and alternative schools, ungraded schools, and schools in the outlying territories were excluded from the frame prior to sampling. With these exclusions, the final sampling frame consisted of approximately 59,000 eligible schools.
The sample was stratified by size of school, region (Northeast, Central, Southeast, and West), and urbanicity status (city, urban fringe, town, and rural). Within each of the major strata, schools were sorted by enrollment size, percentage of students eligible for free or reduced-price lunch, and percentage of minority students. The sample was allocated to the major strata in a manner expected to be reasonably efficient for national estimates, as well as for estimates for major subclasses. Schools within a stratum were sampled with probabilities proportionate to the estimated number of elementary teachers in the school.
It should be noted that the number of elementary teachers is not available in the CCD school file; estimates of this figure were obtained by applying an overall pupil-to-teacher ratio to the CCD enrollment counts, yielding a rough measure of size for each school in the frame.4 It should also be noted that the number of "eligible" schools included all schools that have any of the grades 1 through 6. Thus, a school coded as K-12 in CCD would be eligible for the first-stage selection; however, only teachers of kindergarten through grade 6 would be eligible for inclusion in the survey at the second stage of selection.5
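The selection approach described above can be sketched in code. The frame, the school names, and the `systematic_pps` helper below are illustrative assumptions, not the actual FRSS implementation; the sketch shows how an enrollment-based measure of size drives probability-proportionate-to-size (PPS) selection.

```python
import random

random.seed(42)

# Hypothetical frame of schools: (id, enrollment); values are illustrative only.
frame = [("school_%03d" % i, random.randint(100, 900)) for i in range(200)]

PUPIL_TEACHER_RATIO = 19.0  # approximate national average for 1990-91 (footnote 4)

# Measure of size: estimated number of elementary teachers in each school.
sizes = [enrollment / PUPIL_TEACHER_RATIO for _, enrollment in frame]

def systematic_pps(units, sizes, n):
    """Select n units with probability proportionate to size (systematic PPS):
    lay the sizes end to end on a line, then take n equally spaced selection
    points starting from a random point in the first interval."""
    total = sum(sizes)
    interval = total / n
    point = random.uniform(0, interval)
    sample, cumulative = [], 0.0
    for unit, size in zip(units, sizes):
        cumulative += size
        while point < cumulative:
            sample.append(unit)  # a very large unit can be hit more than once
            point += interval
    return sample

schools = [unit for unit, _ in frame]
sample = systematic_pps(schools, sizes, 25)
```

Under this scheme, a school's chance of selection is proportional to its estimated teacher count, so sampling roughly two teachers per selected school yields approximately equal overall selection probabilities for teachers.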
Each of the 525 schools in the sample was contacted during December 1992 and asked to provide a list of all elementary-grade teachers for sampling purposes. Eligible teachers included all full-time persons teaching a regular kindergarten through sixth grade class. Excluded from the list were part-time and itinerant teachers, substitute teachers, teachers' aides, special education teachers, special subject teachers (those teaching only physical education, music, etc.), prekindergarten teachers, and any other teachers who did not teach a kindergarten through sixth grade class. Only full-time, regular elementary teachers were included in this survey because it was thought that their experience with performance evaluation might differ from that of secondary school teachers and special subject teachers. The scope of a Fast Response survey does not permit a large enough sample to compare subpopulations. A list of 8,869 teachers was compiled from the schools. Schools were asked to indicate which teachers were in their first year of teaching in that school. Nine percent of the teachers on the list were in their first year of teaching at the school. Because these teachers may not have had the opportunity to be formally evaluated, they were declared ineligible for this survey. From this modified list, a final sample of 1,070 teachers of grades K-6 was drawn. On average, two regular, full-time teachers were sampled from each school, one from kindergarten through grade 3 and one from grades 4 through 6. The survey data were weighted to reflect these sampling rates (probability of selection) and were adjusted for nonresponse.
At the first stage of sampling of 525 schools, 5 schools were found to be out of the scope of the study (because they were closed or otherwise not eligible). Of the remaining 520 eligible schools, 493 provided complete lists of teachers. The school-level response was 95 percent (493 responding schools divided by the 520 eligible schools in the sample).
In March 1993, questionnaires were mailed to 1,070 teachers at their schools. A copy of the survey form is attached to this report. Teachers were asked to complete the questionnaire with reference to their most recent teacher performance evaluation or, if they had not been evaluated previously, they were asked to provide general information and to complete the two opinion questions. Thirteen teachers were found to be out of scope (no longer at the school or otherwise not eligible), leaving 1,057 eligible teachers in the sample. Telephone followup of nonrespondents was initiated in mid-March; data collection was completed by late May with 986 teachers completing the survey. Of these, 541 teachers (55 percent) completed the mailed questionnaire, and telephone interviews were conducted with the remaining 445 teachers (45 percent). The teacher-level response was 93 percent (986 teachers completed the questionnaire divided by 1,057 eligible teachers in the sample). The overall study response rate was 88 percent (94.8 percent rate of school response multiplied by the 93.3 percent response rate at the teacher level). The weighted overall response rate was 91 percent (95.3 percent weighted school response rate multiplied by the 95.2 percent weighted teacher response rate). Item nonresponse ranged from 0.0 percent to 3.3 percent. The majority of items with missing data had a lower than 1 percent nonresponse rate; therefore, missing data were excluded from the analysis.
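The response-rate arithmetic reported above can be checked directly from the published counts; this is a simple verification of the figures, not part of the survey system itself.

```python
school_rate = 493 / 520      # eligible schools that provided teacher lists
teacher_rate = 986 / 1057    # eligible teachers that completed the survey
overall_rate = school_rate * teacher_rate

print(round(school_rate * 100, 1))   # 94.8
print(round(teacher_rate * 100, 1))  # 93.3
print(round(overall_rate * 100, 1))  # 88.4
```

Rounded to whole percents, these reproduce the 95 percent school-level, 93 percent teacher-level, and 88 percent overall response rates cited in the text.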
The data were weighted to produce national estimates. The weights were designed to adjust for variable probabilities of selection and differential nonresponse. A final poststratification adjustment was made so that the weighted teacher counts equaled the corresponding estimated teacher counts from the CCD frame within cells defined by size of school, region, and urbanicity. The findings in this report are estimates based on the sample selected and, consequently, are subject to sampling variability.
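The poststratification step can be sketched as follows. The cell labels, base weights, and control totals below are made up for illustration; the idea is that within each adjustment cell, every weight is scaled by a common factor so the weighted count matches the frame's control total.

```python
def poststratify(weights, cells, control_totals):
    """Scale base weights so that, within each cell, the weighted count
    equals the control total taken from the sampling frame."""
    cell_sums = {}
    for w, c in zip(weights, cells):
        cell_sums[c] = cell_sums.get(c, 0.0) + w
    return [w * control_totals[c] / cell_sums[c] for w, c in zip(weights, cells)]

# Hypothetical example: two cells whose frame totals are known.
weights = [10.0, 10.0, 20.0, 20.0]
cells = ["A", "A", "B", "B"]
controls = {"A": 30.0, "B": 50.0}
adjusted = poststratify(weights, cells, controls)
# Weighted counts now match the controls: cell A sums to 30, cell B to 50.
```

In the survey itself, the cells were defined by size of school, region, and urbanicity, and the control totals were the estimated teacher counts from the CCD frame.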
The survey estimates are also subject to nonsampling errors that can arise from nonobservation (nonresponse or noncoverage), errors of reporting, and errors made in collection of the data. These errors can sometimes bias the data. Nonsampling errors may include such problems as differences in respondents' interpretation of the meaning of the questions; memory effects; misrecording of responses; errors in editing, coding, and data entry; differences related to the particular time the survey was conducted; or errors in data preparation. While general sampling theory can be used in part to determine how to estimate the sampling variability of a statistic, nonsampling errors are not easy to measure and, for measurement purposes, usually require that an experiment be conducted as part of the data collection procedures or that data external to the study be used.
To minimize the potential for nonsampling errors, the questionnaire was pretested with elementary teachers like those who completed the survey. During the design of the survey and the survey pretest, an effort was made to check for consistency of interpretation of questions and to eliminate ambiguous items. The questionnaire and instructions were extensively reviewed by the National Center for Education Statistics, the Office of Research, and the Center for Research on Educational Accountability and Teacher Evaluation (CREATE). Manual and machine editing of the questionnaire responses was conducted to check the data for accuracy and consistency. Cases with missing or inconsistent items were recontacted by telephone. Data were keyed with 100 percent verification.
The standard error is a measure of the variability of estimates due to sampling. It indicates the variability of a sample estimate that would be obtained from all possible samples of a given design and size. Standard errors are used as a measure of the precision expected from a particular sample. If all possible samples were surveyed under similar conditions, intervals of 1.96 standard errors below to 1.96 standard errors above a particular statistic would include the true population parameter being estimated in about 95 percent of the samples. This is a 95 percent confidence interval. For example, the estimated percentage of teachers reporting that their last teacher performance evaluation included a formally rated observation is 92 percent, and the estimated standard error is 1.0 percent. The 95 percent confidence interval for the statistic extends from [92 - (1.0 times 1.96)] to [92 + (1.0 times 1.96)], or from 90 to 94 percent.
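The confidence-interval calculation in the example works out as follows:

```python
estimate = 92.0  # percent reporting a formally rated observation
se = 1.0         # estimated standard error, in percentage points
z = 1.96         # normal critical value for a 95 percent confidence interval

lower = estimate - z * se  # 92 - 1.96 = 90.04
upper = estimate + z * se  # 92 + 1.96 = 93.96
# Rounded to whole percents, the interval runs from 90 to 94 percent.
```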
Estimates of standard errors were computed using a technique known as jackknife replication. As with any replication method, jackknife replication involves constructing a number of subsamples (replicates) from the full sample and computing the statistic of interest for each replicate. The mean square error of the replicate estimates around the full sample estimate provides an estimate of the variance of the statistic (see Wolter 1985, Chapter 4). To construct the replications, 30 stratified subsamples of the full sample were created and then dropped one at a time to define 30 jackknife replicates (see Wolter 1985, page 183). A proprietary computer program (WESVAR), available at Westat, Inc., was used to calculate the estimates of standard errors. The software runs under IBM/OS and VAX/VMS systems.
The survey was performed under contract with Westat, Inc., a research firm in Rockville, Maryland, using the Fast Response Survey System (FRSS). FRSS was established in 1975 by NCES and was designed to collect small amounts of policy-oriented data quickly and with minimum burden on respondents. Over 45 surveys have been conducted through FRSS. Recent FRSS reports (available through the Government Printing Office) include the following:
Westat's Project Director was Elizabeth Farris, and the Survey Manager for the FRSS Survey on Teacher Performance Evaluations was Mary Jo Nolin. Judi Carpenter was the NCES Project Officer. The data were requested by Sue Klein, Office of Educational Research and Improvement, NCES, in coordination with Daniel Stufflebeam and Arlen Gullickson, Center for Research on Educational Accountability and Teacher Evaluation, Western Michigan University. Dr. Stufflebeam provided an initial draft of some survey items and collaborated with Westat and NCES on their further development.
The report was reviewed by John Crawford, Director of Planning and Education, Millard Public Schools; Rita Fey, Education Program Specialist, Learning and Instruction Division, Office of Research, NCES; Sue Klein, Office of Educational Research and Improvement, NCES; Robert Nearine, Special Assistant, Evaluation, Research and Testing, Hartford Public Schools; and Darrell Root, Assistant Professor of Educational Administration, University of Dayton. Within NCES, report reviewers were Sharon Bobbitt, Elementary/Secondary Education Statistics Division; Patricia Dabbs, Education Assessment Division; Bernard Greene, Postsecondary Education Statistics Division; Mary Rollefson, Data Development Division; and Jeffrey Williams, Postsecondary Education Statistics Division.
For more information about the Fast Response Survey System or the Survey on Teacher Performance Evaluations, contact Judi Carpenter, Elementary/Secondary Education Statistics Division, Special Surveys and Analysis Branch, Office of Educational Research and Improvement, National Center for Education Statistics, 555 New Jersey Avenue, NW, Washington, DC 20208-5651, telephone (202) 219-1333.
Darling-Hammond, L., Wise, A.E., and Pease, S. (1983). "Teacher Evaluation in the Organizational Context: A Review of the Literature." Review of Educational Research, 53:285-328.
Darling-Hammond, L. (1990). "Teacher Evaluation in Transition: Emerging Roles and Evolving Methods." In The New Handbook of Teacher Evaluation: Assessing Elementary and Secondary School Teachers. Eds. J. Millman and L. Darling-Hammond. Newbury Park, CA: Sage Publications.
Dwyer, C. A., and Stufflebeam, D.L. (forthcoming). "Evaluation for Effective Teaching." In Handbook of Educational Psychology. Ed. D. Berliner.
Millman, J. (1981). "Introduction." In Handbook of Teacher Evaluation. Ed. J. Millman. Beverly Hills, CA: Sage Publications.
Millman, J., and Darling-Hammond, L. (1990). The New Handbook of Teacher Evaluation: Assessing Elementary and Secondary School Teachers. Newbury Park, CA: Sage Publications.
Bickers, P. M., comp. (1988). "Teacher Evaluation Practices and Procedures." The ERS Survey of Evaluation: Practices and Procedures. The Educational Research Service.
Stiggins, R.J., and Duke, D.L. (1988). The Case for Commitment to Teacher Growth: Research on Teacher Evaluation. New York: State University of New York Press.
Stufflebeam, D.L. (1991). "An Introduction to the Center for Research on Educational Accountability (CREATE)." Journal of Personnel Evaluation in Education, 5:85-92.
The WESVAR Procedures. (1989). Rockville, MD: Westat, Inc.
Wise, A.E., Darling-Hammond, L., McLaughlin, M.W., and Bernstein, H.T. (1984). Teacher Evaluation: A Study of Effective Practices. Santa Monica, CA: Rand Corporation.

Wolter, K. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.
Common Core of Data (CCD) Public School Universe - A data tape containing 85,000 records, one for each public elementary and secondary school in the 50 states, District of Columbia, and 5 outlying areas, as reported to the National Center for Education Statistics by the state education agencies for 1990-91. Records on this file contain the state and federal identification number, name, address, and telephone number of the school, county name and FIPS code, school type code, enrollment size, and other codes for selected characteristics of the school.
Teacher Performance Evaluation - The process of determining how well a person has fulfilled his or her teaching responsibilities.
Formal Evaluation - The totality of the systematic process of teacher performance evaluation within a given time period.
City - A central city of a Metropolitan Statistical Area (MSA).
Urban fringe - A place within an MSA of a large or mid-size central city and defined as urban by the U.S. Bureau of Census.
Town - A place not within an MSA, but with a population greater than or equal to 2,500, and defined as urban by the U.S. Bureau of Census.
Rural - A place with a population less than 2,500 and defined as rural by the U.S. Bureau of Census.
Northeast region - Connecticut, Delaware, District of Columbia, Maine, Maryland, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, and Vermont.
Central region - Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin.
Southeast region - Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Virginia, and West Virginia.
West region - Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oklahoma, Oregon, Texas, Utah, Washington, and Wyoming.
3Although kindergarten teachers in regular elementary schools were eligible for the survey, those in preprimary schools were not. Therefore, preprimary schools were not included in the sampling frame.
4Pupil-to-teacher ratios for elementary schools vary widely by state (see NCES E.D. Tabs, Public Elementary and Secondary Aggregate Data for School Year 1990-91 and Fiscal Year 1990, NCES 92-033). The national average for school year 1990-91 is about 19 pupils per teacher.
5The 59,589 schools in the sampling frame included 1,784 schools that provide instruction in the secondary grades 9 through 12 in addition to the elementary grades 1 through 6. These 1,784 schools account for about 3 percent of all elementary teachers.