
Chapter 2. Methods

SAMPLING

Our goal was to collect national probability samples of eighth-grade mathematics students in Germany, Japan, and the United States. The final sample consisted of 100 lessons in Germany, 81 in the United States, and 50 in Japan. In addition, five "public use" tapes were collected in each country to serve as examples to help us communicate the results of the study. Finally, a subsample of 30 lessons in each country was selected from the final sample for in-depth analysis by a group of mathematicians and mathematics educators. We review each of these samples in more detail below.

All analyses reported here were done on the full sample of 231 lessons except the following, which used the subsample of 30 lessons in each country selected for analysis by the Math Content Group: (1) Analyses of the Math Content Group; (2) Some analyses of the use of the chalkboard; (3) Analyses of second-pass coding of discourse; and (4) Analyses of explicit linking within and across lessons. We have explicitly noted in the text whenever anything less than the full sample is used in an analysis.

The Main Video Sample

The main video samples were designed to be random subsamples of the TIMSS main study sample, which was selected according to the TIMSS sampling plan in each country. Our plan was to videotape 100 eighth-grade classrooms in Germany and the United States, and 50 in Japan. In the end, these targets were attained in Germany and Japan but not in the United States, where only 81 classrooms agreed to participate. The sample size in Japan was reduced to 50 primarily because collaborators at the Japanese National Institute for Educational Research (NIER) determined that 100 classrooms would create too great a burden for their country. This reduction was further justified by the fact that certain characteristics of Japanese education (e.g., lack of tracking within or across schools, adherence to a national curriculum, and culturally more homogeneous population) led us to expect lower variability between classrooms in Japan.

The main TIMSS study focused on three separate age groups; the video sample was drawn from only one of these age groups, referred to as Population 2. Population 2 was defined as the pair of adjacent grades in each country which contained the largest percentage of 13-year-olds. In all three countries included in the video study, Population 2 was defined as grades seven and eight. NCES specifications for the study required that only eighth-grade classrooms be sampled for videotaping. According to the TIMSS international specifications, sampling in each country was accomplished by selecting schools, then classrooms within schools. Each country was required to sample a minimum of 150 schools and a minimum of one seventh- and one eighth-grade classroom within each school.

The selection of schools for the main TIMSS study followed a somewhat different procedure in each country. In the United States, schools were sampled from within primary sampling units (PSUs), geographically-defined units designed to reduce the costs of data collection. PSUs were stratified according to geographic region, metropolitan versus nonmetropolitan area, and various secondary strata defined by socioeconomic and demographic characteristics, then sampled with the probability of selection proportionate to the population of each PSU. Within each sampled PSU, schools were sampled with the probability of selection proportionate to the estimated number of students in the target grades. In Japan, schools were randomly selected from strata defined by size of community and size of school, with the probability of selection proportionate to the size of the population within each stratum. Germany followed a similar procedure but defined its strata by state and by type of school.
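Selection with probability proportionate to size can be illustrated with a short Python sketch. The report does not say which algorithm was used to achieve proportionate selection, so the systematic method shown here is an assumption, and the function name, unit labels, and population figures are hypothetical. Sizes are assumed to be whole numbers so that the arithmetic stays exact.

```python
import random

def pps_systematic_sample(units, sizes, n, rng=None):
    """Pick n units with probability proportionate to size (systematic PPS).

    Illustrative only: the systematic method is an assumption, not the
    procedure documented for TIMSS. Sizes must be positive integers.
    A unit larger than total/n can be picked more than once.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    # Selection points are (offset + k * total) / n for k = 0..n-1.
    # Comparisons are scaled by n so only integer arithmetic is needed.
    offset = rng.randrange(total)
    selected = []
    cumulative = 0
    k = 0  # index of the next selection point
    for unit, size in zip(units, sizes):
        cumulative += size
        while k < n and offset + k * total < cumulative * n:
            selected.append(unit)
            k += 1
    return selected

# Hypothetical PSUs and populations, for illustration only.
psus = ["PSU-A", "PSU-B", "PSU-C", "PSU-D"]
populations = [1, 1, 1, 97]
sample = pps_systematic_sample(psus, populations, n=2, rng=random.Random(0))
```

In this toy example the fourth unit holds 97 percent of the population, so it is certain to be selected, while the three small units share a small chance of filling the other slot.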

Further details regarding selection of the main TIMSS samples in each country can be obtained elsewhere (Foy, Rust, & Schleicher, 1996). Here, we describe how the subsamples were selected for the video study. Because specific details of sample selection and recruitment varied across the three countries, we describe each country's sample separately. (A discussion of weighted and unweighted response rates for each country can be found in appendix B.)

The U.S. Sample

The U.S. TIMSS sample for Population 2 consisted of a stratified random sample of 220 schools. Within each school, one seventh- and two eighth-grade classrooms were studied. One-half of these schools were randomly sampled to be part of the video study. Within each school, one eighth-grade classroom was randomly sampled to be videotaped.

Schools were selected for the video study as follows: First, Population 2 TIMSS schools were listed in the order in which they were originally sampled. Using this ordering, pairs of schools were generated. Within each pair one of the two schools was randomly sampled (with each school having an equal probability of being sampled). The unsampled school in the pair was reserved as a potential replacement for the sampled school. A total of 109 pairs were assigned, with one school unpaired, because one school of the original Population 2 sample of 220 schools had no eighth grade. The unpaired school was given a half chance of being selected. The final videotape sample size was 109. The unpaired school was not sampled.
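The pairing procedure described above can be sketched in Python. This is a minimal illustration, not the code used in the study: the function name and school labels are hypothetical, and placing the unpaired school last in the list is an assumption, since the report does not say where it fell in the ordering.

```python
import random

def select_video_schools(ordered_schools, rng=None):
    """Pair schools in list order and sample one school from each pair.

    Returns (sampled, replacement_for), where replacement_for maps each
    sampled school to the reserved partner from its pair.
    Assumption: with an odd count, the last school is the unpaired one.
    """
    rng = rng or random.Random()
    sampled = []
    replacement_for = {}
    for i in range(0, len(ordered_schools) - 1, 2):
        first, second = ordered_schools[i], ordered_schools[i + 1]
        # Each school in the pair has an equal chance of being sampled.
        if rng.random() < 0.5:
            chosen, reserve = first, second
        else:
            chosen, reserve = second, first
        sampled.append(chosen)
        replacement_for[chosen] = reserve
    if len(ordered_schools) % 2 == 1:
        # The unpaired school is given a one-half chance of selection.
        if rng.random() < 0.5:
            sampled.append(ordered_schools[-1])
    return sampled, replacement_for

# 219 eligible schools: the 220 sampled minus the one with no eighth grade.
schools = [f"S{i:03d}" for i in range(219)]
sampled, replacement_for = select_video_schools(schools, rng=random.Random(1))
```

With 219 schools the sketch always forms 109 pairs, and the sample has 109 or 110 schools depending on the draw for the unpaired school. In the study that school was not sampled, giving 109.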

Within each sampled school, one eighth-grade classroom was selected with equal probability from the two TIMSS eighth-grade classrooms in the school. There was no sorting or stratification of classrooms by level of mathematics taught. In the event that the sampled teacher refused to be videotaped, the classroom was never replaced by the other eighth-grade classroom in the same school. Instead, the entire school was replaced by its paired school.

Of the original 109 schools sampled, 100 were public and 9 were private. Forty schools, including one private school, refused to participate. The paired schools for 13 of these refusals were contacted, and 12 agreed to participate in the video study. Thus, the final video sample in the United States consisted of 73 public and 8 private schools. The high refusal rate among originally sampled U.S. classrooms should be kept in mind as a potential source of sampling bias.
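The counts in the preceding paragraph can be reconciled with a few lines of arithmetic, shown here as a Python sketch. The variable names are hypothetical. The split of replacement schools by sector is not stated in the report; it is only what the reported totals imply.

```python
originally_sampled = {"public": 100, "private": 9}
refused = {"public": 39, "private": 1}     # 40 refusals, one of them private
replacements_agreed = 12                   # of the 13 paired schools contacted
reported_final = {"public": 73, "private": 8}

# Originally sampled schools that agreed to take part, by sector.
agreed = {k: originally_sampled[k] - refused[k] for k in originally_sampled}

final_total = sum(agreed.values()) + replacements_agreed

# Replacement schools per sector implied by the reported final counts.
implied_replacements = {k: reported_final[k] - agreed[k] for k in agreed}
```

Here `final_total` comes to 81, matching the reported sample, and `implied_replacements` shows that all 12 replacement schools must have been public for the reported 73 public and 8 private schools to hold.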

Each teacher who participated in the study was awarded a $300 grant, its use to be determined "jointly by the teacher and the principal."

The German Sample

The German TIMSS sample for Population 2 consisted of a stratified random sample of 153 schools (of which 150 were eligible for participation) drawn from all states except Baden-Wuerttemberg. Sampling strata were defined by state, school type, distribution frequencies of each school type in each state, and classroom size. The random sampling of the schools was carried out by the Statistical Institutes of the German States. The four main participating school types were Gymnasium, Realschule, Hauptschule, and Integrierte Gesamtschule. Gymnasium is the highest academic track; it runs from 5th grade through 13th grade, and its graduates are eligible to attend university. Realschule is the middle-level track and extends through 10th grade. Hauptschule is the lowest track, running through 9th grade; its graduates are eligible to enter vocational schools. Integrierte Gesamtschule are relatively uncommon. In these schools, the three tracks are integrated into a single building, though the curricula and classes are still separate. A few schools in the former East Germany were not of these main types: Regelschule are combinations of Realschule and Hauptschule in a single building; Realschulklasse and Hauptschulklasse are special classes within schools that have modified curricula.

The schools for the video study were selected as follows: First, 100 schools were randomly sampled from the list of 153 schools originally sampled for the TIMSS study. Of these 100 schools, 15 refused to be videotaped. As schools declined, one of the main TIMSS replacement schools for the refusing school was selected to participate in its place. The breakdown of the final sample according to type of school is shown in figure 1. Within each school, the eighth-grade classroom that participated in the TIMSS assessments was selected for videotaping.

As in the United States, German teachers were paid a modest stipend for their participation.

 

Figure 1

German sample for the Videotape Classroom Study broken down by type of school

 

School type          Number of schools
Gymnasium            34
Realschule           24
Hauptschule          23
Gesamtschule          9
Regelschule           3
Realschulklasse       4
Hauptschulklasse      3
Total               100

 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Third International Mathematics and Science Study, Videotape Classroom Study, 1994-95.

 

The Japanese Sample

The Japanese TIMSS sample for Population 2 consisted of 158 schools. One hundred fifty of the schools were public and eight were private.

Public schools were selected by stratified random sampling. First, two factors--size of the school (small, medium, large) and size of the city (small, medium, large)--were used to classify Population 2 schools. Small schools were defined as having 8 to 40 students enrolled in eighth grade, medium schools 41 to 160 students, and large schools, over 160 students. Small cities were defined as having a population of fewer than 50,000, medium cities between 50,000 and one million, and large cities one million or more. Because no school fell into the large school/small city stratum, sampling was based on eight strata. Schools were randomly selected from each stratum in proportion to the total number of schools in the stratum. Private schools were randomly selected from among the population of private schools in Japan.
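The stratum definitions above translate directly into a small classification routine, sketched here in Python. The function and constant names are hypothetical. Two boundary choices are assumptions: a city of exactly one million is treated as large, and enrollments below 8 return no category because the report does not define one.

```python
from itertools import product

SIZES = ("small", "medium", "large")

def school_size(eighth_grade_enrollment):
    """Classify a school by the number of students enrolled in eighth grade."""
    if eighth_grade_enrollment < 8:
        return None  # below the smallest category defined in the report
    if eighth_grade_enrollment <= 40:
        return "small"
    if eighth_grade_enrollment <= 160:
        return "medium"
    return "large"

def city_size(population):
    """Classify a city by population (assumption: exactly 1,000,000 is large)."""
    if population < 50_000:
        return "small"
    if population < 1_000_000:
        return "medium"
    return "large"

# Strata are (school size, city size) pairs. No school fell into the
# large school/small city cell, so it is dropped.
EMPTY_CELLS = {("large", "small")}
STRATA = [cell for cell in product(SIZES, SIZES) if cell not in EMPTY_CELLS]

def stratum(eighth_grade_enrollment, city_population):
    return (school_size(eighth_grade_enrollment), city_size(city_population))
```

Crossing three school sizes with three city sizes gives nine cells, and removing the empty one leaves the eight strata used for sampling.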

One third of the schools in the TIMSS sample were then randomly selected within each stratum for the video study, yielding a sample size of 50. Of these 50 schools, two declined to participate. Each of these was replaced by randomly selecting another school within the same stratum.

One eighth-grade classroom was selected from each school. In the event the mathematics teacher assigned to this classroom declined to participate in the video study, the particular class/teacher was never replaced by another eighth-grade classroom in the same school. Instead, another school within the same stratum replaced the entire school.

Of the schools that participated in the videotape study, all but one school participated in the main TIMSS study as well. However, during the selection of classrooms to be videotaped, a deviation from the original plan to test TIMSS classrooms arose. Because NIER did not want to overburden the teachers, videotaping was usually done in a different class from the one in which testing for the main study was conducted, unless there was only one eighth-grade classroom in the school. When there was a choice, the principal of each school chose the classroom in which the videotape study occurred. Although it is unlikely that there are significant student achievement differences between the main TIMSS classroom and the classroom chosen for the videotape study, it is possible that there are differences in teacher characteristics. It should be kept in mind that Japanese principals exercised discretion in the choice of classrooms to be videotaped.

Participating schools and teachers were offered a small token of appreciation for their participation by the U.S. government. Each teacher also received a videotape of his or her teaching.

Sampling Time in the School Year

Our goal was to spread the videotaping out evenly over the school year. In Germany and the United States we accomplished this goal by employing a single videographer in each country to tape over an 8-month period, from October 1994 through May 1995. We were not able to implement the same plan in Japan. Because the school year begins in April in Japan, following a schedule analogous to that of the other two countries would have meant starting in June and taping through December. This schedule was not possible because the videotaping had to be coordinated with the test administration. As a result, videotaping in Japan was compressed primarily into a 4-month period, from November 1994 through February 1995, with a few lessons taped in March. The distribution of videotaping over time in each country is presented in figure 2.

 

Figure 2

Distribution of videotaping over time in each country

 

[Figure 2 image not available]

 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Third International Mathematics and Science Study, Videotape Classroom Study, 1994-95.

 

The consequence of taping only during the second half of the Japanese school year is made more problematic by the near-universal adherence in Japan to a national curriculum. In most eighth-grade classrooms in Japan, the first half of the school year is devoted to algebra, the second half to geometry. Thus, our sample in Japan is skewed toward geometry. Although this is a limitation of the study, we did try to diminish the problem by sampling five additional algebra lessons during the next Japanese school year. These lessons were included in the subsample for the Math Content Group, described in the next section, though they were not included in the main analyses.

Subsample for the Math Content Group

A subsample of 30 videotaped lessons was selected from each country for in-depth content analysis by a group of mathematicians and mathematics educators. (This group is described in more detail in appendix C.) This total of 90 tapes was selected as follows: first, lessons in the video study were categorized as being primarily geometry or primarily algebra (broadly defined to include advanced topics in arithmetic). Then, 15 algebra and 15 geometry tapes were chosen randomly from each country to constitute the subsample.
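The two-step selection just described (categorize by topic, then draw 15 lessons at random from each category) can be sketched in Python for a single country. The function name, lesson identifiers, and topic labels are hypothetical, and this is an illustration rather than the procedure's actual implementation.

```python
import random

def select_content_subsample(lessons, per_topic=15, rng=None):
    """Draw per_topic algebra and per_topic geometry lessons at random.

    lessons: list of (lesson_id, topic) pairs for one country, where topic
    is "algebra" or "geometry". Raises ValueError if a topic has fewer
    than per_topic lessons available.
    """
    rng = rng or random.Random()
    subsample = []
    for topic in ("algebra", "geometry"):
        pool = [lesson_id for lesson_id, t in lessons if t == topic]
        subsample.extend(rng.sample(pool, per_topic))  # without replacement
    return subsample

# Hypothetical lesson list for one country, for illustration only.
lessons = [(f"A{i}", "algebra") for i in range(20)] + \
          [(f"G{i}", "geometry") for i in range(25)]
chosen = select_content_subsample(lessons, rng=random.Random(7))
```

Because `random.sample` draws without replacement, no lesson can appear twice, and the call fails loudly when a country has too few lessons of one type.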

The 15 algebra tapes from Japan included 5 that were sampled later in an attempt to remedy the overrepresentation of geometry in the sample (see section above on "Sampling Time in the School Year"). These tapes were obtained by proportionally randomly choosing 5 schools of the 50 that participated in the video study, and then choosing a different teacher (i.e., one not already videotaped teaching geometry) to be videotaped teaching algebra.

Additional Tapes for Public Use

Because participants in the video study were guaranteed confidentiality, videotapes collected in the study cannot be shown publicly. However, because we believe that video examples will be extremely useful for communicating the results of the study, we decided to collect five tapes in each country that could be used for this purpose. For these tapes, we obtained written releases from the teachers and from the parents of students appearing in the tapes.

It is not easy to find teachers who will agree to being videotaped for public viewing; one cannot simply select such teachers at random. In the United States, we relied on networks of friends and contacts to identify teachers for public taping, hoping that the teachers who agreed to participate would be representative of those included in the large study sample. German and U.S. public use tapes were not included in the analyses presented. In Japan, in accordance with the preference of our collaborators, public use tapes were selected from among the main study tapes, and permission to use the tapes publicly was secured after the fact.

 


OVERVIEW OF PROCEDURES

We primarily collected two kinds of data in the video study: videotapes and questionnaires. We also collected supplementary materials (e.g., copies of textbook pages or worksheets) deemed helpful for understanding the lesson. Each classroom was videotaped once on a date convenient for the teacher. One complete lesson--as defined by the teacher--was videotaped in each classroom.

Teachers were initially contacted by a project coordinator in each country who explained the goals of the study and scheduled the date and time for videotaping. Because teachers knew when the taping was to take place, we knew they would attempt to prepare in some way for the event. In order to cut down somewhat on the variability in preparation methods across teachers, we gave teachers in each country a common set of instructions. Teachers were told the following:

Our goal is to see what typically happens in [U.S. or German or Japanese] mathematics classrooms, so we really want to see exactly what you would have done had we not been videotaping. Although you will be contacted ahead of time, and you will know the exact date and time that your classroom will be videotaped, we ask that you not make any special preparations for this class. So please, do not make special materials, or plan special lessons, that would not typify what normally occurs in your classroom. Also, please do not prepare your students in any special way for this class. Do not, for example, practice the lesson ahead of time with your students.

On the appointed day the videographer arrived at the school and videotaped the lesson. After the taping each teacher was given a questionnaire and an envelope in which to return it. The purpose of the questionnaire was to assess how typical the lesson was according to the teacher and to gather contextual information important for understanding the contents of the videotape. Both taping procedures and questionnaire contents are described in more detail below.

Field Test

All procedures were tested in a field test, which was conducted in spring 1994. For the field test we collected nine videotapes from each country, together with all of the supplementary data. In addition to testing procedures of data collection, field test tapes were used to help in development of coding and analysis methods, as described more fully here. A full report of the field test may be found in Stigler & Fernandez (1995).

 


VIDEOTAPING IN CLASSROOMS

The success of any video survey will hinge on the quality, informativeness, and comparability of the tapes collected. What we see on a videotape results not only from what transpires in the classroom but also from the way the camera is used. If our aim is to compare certain aspects of instruction, then we must make sure that these aspects are clearly captured on all the tapes. In addition, we want to make sure that we are comparing classroom instruction and not camera habits. There are many decisions that must be made by the camera operator; if these are not made in a standardized manner, then the resulting tapes will not be comparable across classrooms or countries.

We developed procedures for camera use in collaboration with Scott Rankin, an experienced videographer who had worked with us in previous projects and who, therefore, was familiar with the challenges of documenting classroom instruction. Our goal was to develop a set of general principles and rules of thumb that would be relatively simple for our videographers to learn, yet comprehensive enough to apply in any classroom situation.

We should note at the outset that we decided to use one camera per taping instead of two, which made it impossible to see all of the students in a class. This constraint was based on budget considerations, although it also simplified the process of coding and analysis. Consequently this study did not collect detailed information on student behavior.

In the following sections we describe the procedures used for videotaping classroom instruction; our method for training videographers to use the procedures consistently; and an evaluation of the success of our training by comparing camera use across our three videographers.

Basic Principles for Documenting Classroom Lessons

Because we wanted to see each lesson in its entirety, all videotaping was done in real time: The camera was turned on at the beginning of the class, and not turned off until the lesson was over. This means that we can study the duration of classroom activities by measuring their length on the videotape. Obviously, this would not be possible if there were any gaps in the recording.

Classrooms are complex environments where much is going on at any given time; it is impossible to document everything, particularly when only one camera is used. We decided on two principles to guide videographers in their choices of where to point the camera. These principles yield a comprehensive view of the lesson being taped.

Principle #1: Document the perspective of an ideal student. Assume the perspective of an ideal student in the class, then point the camera toward that which should be the focus of the ideal student at any given time. An ideal student is one who is always attentive to the lesson at hand and always occupied with the learning tasks assigned by the teacher. An ideal student will attend to individual work when assigned to work alone, will attend to the teacher when he or she addresses the class, and will attend to peers when they ask questions or present their work or ideas to the whole class. In other words we chose to point the camera so as to capture the experience of a student who is paying attention to the lesson as it unfolds. In cases where different students in the same class are engaged in different activities, the ideal student is assumed to be doing whatever the majority of students are doing.

Principle #2: Document the teacher. Regardless of what the ideal student is doing, be certain to capture everything that the teacher is doing to instruct the class. Usually the two principles are in agreement: Whenever the ideal student is attending to the teacher, both principles would have the camera pointed at the teacher. However, there are times when the two principles are in conflict. Take, for example, a case where the majority of students are doing seatwork while the teacher is working privately with two students at the board. The ideal student would be focused on his or her work, not on the teacher. In situations like this one, the videographer must go beyond these two basic principles in order to determine where to point the camera.

The Exceptions: Three Difficult Situations

We have identified three common situations where the principles alone cannot guide choices about what to capture on the videotape. These situations are: (1) when the ideal student would be focused on something other than the teacher, (2) when two speakers who are having a conversation will not fit in a single shot, and (3) when a speaker and an object being discussed will not fit in a single shot. We developed a set of guidelines so that videographers will choose similar (i.e., comparable) shots when faced with each of these situations and so that these shots will contain a maximum amount of useful information. In the rest of this section we present a more detailed discussion of these three situations and how we chose to film them.

Situation #1: When the ideal student is not watching the teacher. As already mentioned, there are times when the ideal student should be attending to something other than the teacher. This most often occurs when students are given a task to work on individually or in small groups. Teachers can use this time in different ways. Sometimes they will walk around the class and monitor students' work. This is ideal from the videographers' point of view because by following the teacher with the camera one can also get a sense of what students are doing. In some instances, however, a problem arises because the teacher does not circulate through the class but rather stays at the board or her desk. In such cases the camera would need to be pointed in two different directions (toward the teacher and toward the students) in order to capture both the teacher and the focus of the ideal student.

Videographers were instructed to handle such situations by alternating between these two points of view. They were told to sweep the classroom slowly, panning away from the teacher and then back to the teacher, so as to document what the students were doing. After this sweep they were told to keep the camera directed at the teacher unless the nature of the students' activity changed in any significant way (e.g., new materials were introduced or the students broke into groups). If the students' activity changed, videographers were instructed to carry out another sweep of the students, then return to the teacher.

Situation #2: When two speakers will not fit in a single shot. A second difficult situation occurs when the teacher is conversing with a student (or a student is conversing with another student) and the two speakers are far enough apart that they do not fit in a single camera shot. This often occurs when a teacher calls on a student seated in the back of the room, then proceeds to talk back and forth to the student.

In this case videographers were instructed to move the shot from speaker to speaker as they took turns talking. An exception to this rule occurs when one speaker's turns are so brief that there is no time to shift the camera before the turn is over. In this case the camera should be kept on the person doing the most talking.

Situation #3: When the speaker and the object being discussed will not fit in a single shot. Another difficult situation occurs when a speaker and an object he or she is discussing will not both fit into a single camera shot. This happens frequently, for example, when someone is talking about things written on the chalkboard or about concrete representations of a mathematical situation or concept.

In this kind of situation videographers were told to document the object for long enough to provide the visual information needed to make sense of the talk, then to keep the shot on the speaker. For example, if the teacher is talking about a problem on the blackboard, the videographer should first tape the problem, then move to the teacher.

There is one important exception to this rule. Sometimes it is not sufficient to briefly see the object and then move to the speaker because the talk will make no sense unless one is seeing the object as it is being talked about. For example, if the speaker is pointing to specific features of the object as he or she talks, and if the pointing must be seen in order to understand the talk, then the rule is that the camera must stay on the object so that the talk can be understood.

How Close to Frame the Shot

Aside from making sure that videographers point their cameras at comparable things, we also wanted to make sure that their shots are framed in comparable ways. An extreme close up of the teacher talking would provide a very different sense of the action taking place than a wide shot where the teacher is seen in the context of the classroom.

We decided that in general we wanted the widest shot possible, a shot professional videographers call the "Master of Scene" (MOS) or, more simply, the "master shot." From an aesthetic point of view closer shots often look better. However, the MOS provides more contextual information and thus was judged more appropriate for our purposes. The master shot also is less prone to bias because it does not artificially focus the viewer in on whatever aspect of the lesson the videographer judged to be most interesting.

Sometimes, however, there is crucial information that cannot be captured in a master shot. Common examples include objects being discussed during the lesson or things written on the blackboard. In such instances the camera should zoom in close enough to capture this information. In other words, although our preferred view of the classroom is the MOS, a closer shot must be used when it is needed to understand what is going on. Videographers were told to hold close shots long enough to enable a viewer to read or form a mental image of the information.

Moving from Shot to Shot

Finally, having devised guidelines for what to include in the shot, we also needed some rules for how to move from shot to shot. This, too, must be done in a standardized way if the tapes are to be fully comparable.

The guidelines we gave to the videographers were based on principles of good camera work. We taught them how to compose shots and execute camera movements in ways that follow basic cinematographic conventions and fundamentals of good composition. Aside from wanting them to follow the same conventions, we wanted them to carry out good camera work. Bad camera work calls attention to itself and distracts the viewer from the contents of the tape.

Training Videographers

In order to make sure that the rules were applied correctly and reliably we had to work intensively with the videographers. Each videographer participated in two training sessions, both of which were conducted by our professional videographer, Mr. Rankin. The first training session lasted 9 days in the spring of 1994, after which each videographer was sent out to collect 10 practice tapes for a field test. The second training session lasted 5 days and was held in the early fall of 1994. Following this second training session videographers were given a test and, once they passed, were sent off to collect the data.

We designed the training sessions with two goals in mind: First, we wanted to teach the videographers our camera use rules to the point that they could follow them by second nature. In an actual taping situation videographers would have to make rapid decisions about where to point the camera without time for reflection. Second, we wanted the videographers to learn and practice the fundamental skills of camera use. These skills include, for example, changing from one camera angle to another quickly without losing a focused image, tracking moving objects without having the object leave the shot, and moving rapidly back and forth from close-ups to master shots while ending up centered on the shot that needs to be captured.

The first training session was devoted to five activities: learning to use the equipment, practicing basic principles of good camera work, presentation and discussion of the standardized rules for taping classrooms, practice taping in mock classrooms, and practice taping in real classrooms. Activities in the second training session included reviewing and discussing the rules, critiquing practice tapes, and more practice taping in mock classrooms. A monitor hooked to the camera during the training sessions allowed videographers to rotate between practicing with the camera and watching/critiquing their peers in collaboration with the instructor.

We would like to pause here and insert a helpful hint for others contemplating this kind of work. One has two alternatives in deciding who to hire and train as a video survey videographer: One can hire scientists (i.e., educational researchers) and train them to take good pictures, or one can hire artists (i.e., photographers) and teach them the importance of following standardized rules for camera use. The latter proved far easier, and the pictures are much more aesthetically pleasing.

Evaluating the Comparability of Camera Use

At the end of the second training session we gave the videographers a test to measure and document how well they had internalized all they had been taught. A 7-minute mock lesson was created that covered many of the situations videographers needed to know how to handle. The lesson was taught three times, each one identical to the others, and was taped each time by one of the three videographers. The resulting tapes were analyzed and evaluated to make sure that our videographers would shoot lessons in a standardized manner.

To evaluate the videographers' performance on the test we first produced a description of how the test lesson should have been videotaped. We listed the 22 events that took place in the lesson, then determined how each event should be taped given the procedures we had developed.

Once we had a description of how the test-lesson should have been taped, we evaluated each videographer's performance against this ideal. We used a three-point scale to score how well they taped each of the 22 lesson events. They were given a score of zero if they broke any of the rules that they needed to take into account; for example, if they did not zoom in to capture information that they were supposed to capture, or if they pointed the camera at the wrong thing, they would be given a score of zero. They were given a score of one if they showed an understanding of the rule they needed to carry out but did not apply it in a timely fashion. For example, if they needed to zoom in and capture what the teacher was pointing to but reacted too slowly and missed this information, or if they let the teacher walk around the class for a while before they decided to follow her, they would receive a score of one. They were given a score of two if they applied the rules exactly as we had predicted they should.

The scores obtained were all in a similar range and also were relatively high. The German videographer received a score of 35 out of a possible total of 44. The Japanese videographer received a score of 36 and the U.S. videographer a score of 43. In addition, of the 66 events scored for the three videographers only four were rated a zero (which means that a rule was actually broken only four times). Two of these zeroes were obtained by the German and two by the Japanese videographer. This means that no videographer ever showed more than two rule breaches for the entire test.
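The reported totals can be cross-checked with a little arithmetic. The sketch below is not part of the study's procedures. It assumes only what the text states (22 events, each scored 0, 1, or 2, plus the reported number of zeros) and solves for how many ones and twos each videographer must have received. The function name `score_breakdown` is hypothetical.

```python
N_EVENTS = 22        # events in the test lesson (from the text)
MAX_PER_EVENT = 2    # top of the three-point scale (0, 1, 2)


def score_breakdown(total, zeros, n_events=N_EVENTS):
    """Return (twos, ones, zeros) implied by a total on the 0/1/2 scale.

    Solves twos + ones = n_events - zeros and 2*twos + ones = total.
    """
    nonzero = n_events - zeros
    twos = total - nonzero
    ones = nonzero - twos
    if twos < 0 or ones < 0:
        raise ValueError("total is not achievable with that many zeros")
    return twos, ones, zeros


# Totals and zero counts as reported in the text.
reported = {"Germany": (35, 2), "Japan": (36, 2), "United States": (43, 0)}

for country, (total, zeros) in reported.items():
    twos, ones, _ = score_breakdown(total, zeros)
    print(f"{country}: {twos} twos, {ones} ones, {zeros} zeros "
          f"(total {total} of {N_EVENTS * MAX_PER_EVENT})")
```

These breakdowns are derived from the published totals, not reported in the source. They suggest that most of the points lost came from rules applied late (scores of one) rather than from rules broken (scores of zero).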

The test lesson tapes were also used to evaluate the quality of each videographer's camera work. First we generated a list of possible flaws that a videographer might produce. Our list included the following flaws:

  • Cropping shots too tightly (e.g., cutting off part of someone's head)

  • Cropping shots too wide (e.g., too much head room)

  • Zooming in/out and then having to reframe the shot

  • Zooming in/out and then having to refocus the shot

  • Panning while zoomed in tightly

  • Jerky or awkward camera movement during zooms or pans

  • Losing from the frame the object that is being tracked

  • Unnecessary camera movement

  • Bad coordination between zooms and pans

  • Very unbalanced composition

We used this list to score each videographer's performance on a four-point scale for each of the 22 events in the test lesson. Videographers were given a score of three on an event if we could find no flaw in their camera work. They received a score of two if one flaw could be found, a score of one if two flaws could be found, and a score of zero if at least three flaws could be found.

    All videographers obtained scores that were within a similar range and judged to be satisfactorily high. The Japanese videographer received a score of 51 out of a possible total of 66. The German videographer received a score of 52, and the U.S. videographer a score of 60.
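The four-point camera-quality scale described above amounts to subtracting the number of flaws from three, with a floor of zero. A minimal sketch of that rule follows; the function names and the example flaw counts are hypothetical and are not the study's data.

```python
def flaw_score(flaw_count):
    """Score one event: 3 for no flaws, 2 for one, 1 for two, 0 for three or more."""
    if flaw_count < 0:
        raise ValueError("flaw_count cannot be negative")
    return max(0, 3 - flaw_count)


def camera_quality_total(flaw_counts):
    """Sum the per-event scores for one videographer's test tape."""
    return sum(flaw_score(n) for n in flaw_counts)


# Hypothetical flaw counts for the 22 events of the test lesson.
example_counts = [0] * 18 + [1, 1, 2, 3]

print(camera_quality_total([0] * 22))      # a flawless tape reaches the maximum
print(camera_quality_total(example_counts))
```

With 22 events and a top score of three per event, the maximum is 66, which matches the possible total given in the text.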

    Both evaluations of the test confirmed our informal impression that camera standardization had been reached by the end of the training.

    Videographers were in the field for a prolonged period of time. We worried, therefore, that they might slowly forget what they were taught or develop bad habits. In order to make sure that they continued using the camera correctly, every 10th tape that came in from the field was evaluated using a scoring system similar to the one described. Videographers were given feedback about how they were doing. In particular, they were immediately informed if they had, in any way, drifted away from the standards we knew they were able to follow. In actuality, this almost never happened.
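The every-10th-tape check is a form of systematic sampling. The sketch below shows one way such a selection could be made. It assumes tapes are counted in arrival order starting from the first tape received; the source does not say whether the count was kept overall or per videographer, and the tape identifiers are invented for illustration.

```python
def tapes_to_review(tape_ids, interval=10):
    """Return every `interval`-th tape in arrival order (the 10th, 20th, ...)."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return tape_ids[interval - 1::interval]


# Hypothetical arrival log of 35 tapes.
arrivals = [f"tape-{i:03d}" for i in range(1, 36)]
print(tapes_to_review(arrivals))
```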

    Some Notes on Equipment

    The quality of the data depends to a great extent on the quality of the equipment used in collecting the data. Thus, we wanted high quality cameras that would produce excellent images, and high quality microphones that would enable us to hear most of what goes on in the classroom. At the same time, we needed equipment that could be operated by a single videographer.

    The camera we selected was a Sony EVW-300 three-chip professional Hi-8 camcorder. Each camera was mounted on a Bogen fluid-head tripod. (Tripods that are not fluid head will produce jerky camera movements.) A small LCD monitor was mounted on the camera to help operators view what they were taping. Sound was collected using two microphones, one a radio microphone worn by the teacher, the second a shotgun zoom microphone mounted on the camera. This equipment was used both for data collection and for training videographers.

     


    TEACHER QUESTIONNAIRE

    A complete copy of the English version of the teacher questionnaire can be found in appendix D. The purpose of the teacher questionnaire was to elicit information that would aid us in analysis and interpretation of the videotapes. Items for the questionnaire were generated by project personnel in consultation with persons working on the main TIMSS questionnaire, questionnaire design specialists from Westat,1 mathematics educators, and classroom teachers. Questions were edited and selected to yield a questionnaire that could be completed by teachers in approximately 20-30 minutes.

    The questionnaire was translated and back-translated into German and Japanese and then pilot-tested on teachers participating in the field test. German, Japanese, and U.S. collaborators discussed the responses from the field test, and based on these discussions the questionnaire was revised.

    The final translation of the questionnaire was painstakingly reviewed, question by question, by a group of German, Japanese, and U.S. researchers, each of whom was bilingual in two of the three languages. Questions that were judged too difficult to translate accurately were dropped from the questionnaire.

    The resulting questionnaire consisted of three parts with a total of 28 questions. In Part A we asked questions about the lesson that we videotaped, and about how the class was constituted and who the students were. In Part B we asked the teachers to compare what happened in the videotaped lesson with what would typically transpire in their classroom. In Part C we asked teachers to describe what they know about current ideas on mathematics teaching and learning, and asked them to evaluate their own teaching in the videotape in light of these current ideas.

    The information collected in the questionnaire served three purposes. First, it helped us to assess the quality and comparability of our samples across the three countries. Although teachers were instructed not to prepare in any special way for the videotaping, we were still aware that what we saw on the videotape might not be typical of what normally happens in a given classroom. Teachers thus were asked to rate the typicality of the videotaped lesson, and these ratings were compared across countries. Similarly, we were able to assess the comparability of the samples across the three countries along several other dimensions as well. For example, whether a lesson dealt with new material or review material might be expected to influence the kind of teaching technique used. Knowing the percentage of lessons in each country that were new versus review helped us to judge the comparability of the samples.

    A second purpose for the questionnaire was to provide coders with information that would help them interpret what they saw on the videotapes. For example, it is often necessary to know the teacher's goal for a lesson in order to make sense of the activities that constitute the lesson, and so we asked the teacher to state his or her goal for the lesson. Similarly, to interpret the meaning a specific question has for students it is often helpful to know whether the question probes new material or reviews previously learned information. Again, teachers were asked to categorize the content of the lesson in this way on the questionnaire.

    Third, the questionnaire responses did, in some cases, enter directly into the analyses--statistical and qualitative--of the videotapes. This occurred in several ways. First, questionnaire responses were entered into correlational analyses within each country to help us relate contextual factors to variations in classroom instruction. Second, by asking teachers to comment on the lesson that was videotaped we were able to learn more about how teachers interpret the language of reform in mathematics education. For example, if a teacher told us that her lesson was focused on problem solving, we could look at the video to see what she meant by the term "problem solving."

    Response rates on the questionnaire were high: 91 percent of the teachers whom we videotaped returned their questionnaires in Germany, 94 percent in Japan, and 97.5 percent in the United States.

     


    CONSTRUCTING THE MULTIMEDIA DATABASE

    Once the tapes were collected, they were sent to project headquarters at University of California, Los Angeles (UCLA) for transcription, coding, and analysis. The first step in this process was to digitize the video and store it in a multimedia database, together with scanned images of supplementary materials. The videotapes were then transcribed, and the transcript was linked by time codes to the video. This multimedia database was then accessed, coded, and analyzed using the multimedia database software system that we developed for this project.

    Digitizing, Compression, and Storage on CD-ROM

    To facilitate the processing of such large quantities of video data, we decided to digitize all of the video and supplementary materials, which allowed them to be stored, accessed, and analyzed by computer. Each videotape was digitized, compressed, and stored on CD-ROM disks, one lesson per disk. We then designed and built a multimedia database software application that would enable us to organize, transcribe, code, and analyze the digital video.

    Digital video offers several advantages over videotape for use in video surveys. First, the resulting files are far more durable and long lasting than videotape. CD-ROM disks are assumed to last for at least 100 years, as opposed to a much shorter lifespan for videotape. Digital video files also can be copied without any loss in quality, which again is not true for videotapes. And, digital files will not wear out or degrade with repeated playing and re-playing of parts of the video. Digital video also enables random, instantaneous access to any location on the video, a feature that makes possible far more sophisticated analyses than are possible with videotape. For example, when coding a category of behavior it is possible to quickly review the actual video segments that have been marked for that category. This rapid retrieval and viewing of coded segments makes it much easier to notice inconsistencies in coding or to discover new patterns of behavior, neither of which would be practical without such rapid access.

    As videotapes arrived in Los Angeles they were digitized and compressed into MPEG-1 format on a large hard disk. Text pages, worksheets, and other supplementary materials collected by the videographers were digitized on a flatbed scanner and stored in PICT format on the same hard disk drive as the accompanying videotape. All files for each lesson were then burned onto a single CD-ROM disk.

    Transcription/Translation of Lessons

    Transcription of videotapes is essential for coding and analysis. Without a transcript, coders have difficulty hearing, much less interpreting, the complex flow of events that stream past in a classroom lesson. It also is possible to code some aspects of instruction directly from the transcript, without viewing the video at all. We thus transcribed, as accurately as possible, the words spoken by both the teacher and the students in each lesson and, for German and Japanese lessons, translated the transcriptions into English.

    We had several reasons for translating the German and Japanese tapes into English. First, translations were used for training coders from different cultures to apply codes in a comparable way, and for establishing independent inter-rater reliability of codes across coders from different cultures. Even though a translation is never perfect, agreement between coders working with a translation can give us a rough estimate of how reliable a code is. A second purpose for the translation was to aid us in multilanguage searches of the database. If we want to locate, for example, times when a teacher discussed the concept of area we can search using the English word "area." Finally, having the lessons translated allows members of the research team not fluent in German or Japanese to view and understand lessons taught in those languages.

    Procedures were developed to ensure that all transcriptions/translations were carried out in a standardized manner. For example, transcribers were given rules about how to indicate speakers, how to break speech into turns, how to use punctuation in a standardized manner, and how to translate technical terms in a consistent way. Using the multimedia database software developed for this project, coders had instant access to the video as they worked with the linked transcripts, and so could easily retrieve the context needed for interpreting the transcript. It was therefore not necessary to transcribe the contextual information generally needed for understanding written transcripts. By the same token, translations of the German and Japanese lessons did not have to be perfect, as all coding was done by native speakers of the language being coded. The translations served as a guide, but not as the actual foundation for coding. Coders did not rely on translations to make subtle judgments about the contents of the video.

    Videotapes were transcribed and translated by teams of transcribers fluent in the language they were transcribing. Some members of the German and Japanese teams were native speakers of those languages, while others were native speakers of English but fluent in German or Japanese. Each tape was transcribed/translated in two passes. One person worked on the first pass transcription/translation of a tape and then a different person was assigned to review the work. A hard copy of the first pass transcription/translation was printed out, and the reviewer marked any points of disagreement on this copy. The two individuals then met, discussed all the proposed revisions, and came to an agreement about what the final version should be. In the extremely rare cases in which disagreements could not be resolved, a third party was consulted.

    The last step in the transcription/translation process was to time code the tapes (i.e., to mark the exact point at which each utterance begins).

     


    DEVELOPING CODES

    Deciding What to Code

    In deciding what to code we had to keep two goals in mind: first, we wanted to code aspects of instruction that relate to our developing construct of instructional quality; second, we wanted the codes we used to provide a valid picture of instruction in three different cultures.

    For the first goal, we sought ideas of what to code from the research literature on the teaching and learning of mathematics, and from reform documents--such as the NCTM Professional Standards for Teaching Mathematics--that make clear recommendations about how mathematics ought to be taught. We wanted to code both the structural aspects of instruction (i.e., those things that the teacher most likely planned ahead of time) and the online aspects of instruction (i.e., the processes that unfold as the lesson progresses).

    The dimensions of instruction we judged most important included the following:

    • The nature of the work environment. How many students are in the class? Do they work in groups or individually? How are the desks arranged? Do they have access to books and other materials? Is the class interrupted frequently? Do the lessons stay on course, or do they meander into irrelevant talk?
    • The nature of the work that students are engaged in. How much time is devoted to skills, problem solving, and deepening of conceptual understanding? How advanced is the curriculum? How coherent is the content across the lesson? What is the level of mathematics in which students are engaged?
    • The methods teachers use for engaging students in work. How do teachers structure lessons? How do teachers set up for seatwork, and how do they evaluate the products of seatwork? What is the teacher's role during seatwork? What kinds of discourse do teachers engage in during classwork? What kinds of performance expectations do teachers convey to students about the nature of mathematics?

     

    Our second goal was to accurately portray instruction in Germany, Japan, and the United States. Toward this end, we were concerned that our descriptions of classrooms in other countries make sense from within those cultures, and not just from the U.S. point of view. One of the major opportunities of this study, after all, is that we may discover approaches to mathematics teaching in other cultures that we would not discover looking in our culture alone. We wanted to be sure that if different cultural scripts underlie instruction in each country, we would have a way to discover these scripts.

    For this reason, we also sought coding ideas from the tapes themselves. In a field test, in May 1994, we collected nine tapes from each country. We convened a team of six code developers--two from Germany, two from Japan, and two from the United States--to spend the summer watching and discussing the contents of the tapes in order to develop a deep understanding of how teachers construct and implement lessons in each country.

    The process was a straightforward one: We would watch a tape, discuss it, and then watch another. As we worked our way through the 27 tapes we began to generate hypotheses about what the key cross-cultural differences might be. These hypotheses formed the basis of codes (i.e., objective procedures that could be used to describe the videotapes quantitatively). We also developed some hypotheses about general scripts that describe the overall process of a lesson and devised ways to validate these scripts against the video data.

    Developing Coding Procedures

    Once we had developed a list of what to code we began developing the specific coding procedures. Coding procedures were developed by a group of four code developers, all of whom had participated in the initial viewing and discussion of the 27 field test tapes. One of the developers was from Germany (Knoll), one from Japan (Kawanaka), and one from the United States (Serrano). Each of these three was a doctoral student in either psychology or education, and all had classroom teaching experience. The fourth member of the team was a doctoral student in applied linguistics (Gonzales), also from the United States, who helped us work through the technical issues involved in coding classroom discourse.

    The coding development group first viewed field test tapes and proposed a definition of the category to be coded. Each member of the group then attempted to apply the definition to field test tapes from his or her own country. Difficulties were brought back to the group and definitions were revised and refined. This process was repeated until all members of the group were satisfied with the definitions and procedures, and in agreement with the coding of each instance.

    Once codes were developed, coders were trained to implement the codes. Coders, like the code developers, came from Germany, Japan, and the United States. In order to reduce the likelihood that subtle contextual cues would be missed or misinterpreted, coders coded only tapes from their own country, except for purposes of training or the assessment of inter-rater reliability. Training comprised several activities. Code developers used group meetings to present definitions and discuss procedures for coding. Coders then practiced coding on the field test tapes. Results of practice coding were brought back to the group for discussion and any disagreements were resolved. This process was repeated until coders from each country were applying the codes in a consistent way.

    Before coding of the main study tapes began, we conducted a formal reliability assessment to ensure independent agreement across coders at a level of at least 80 percent for each judgment. Reliability was assessed by comparing each coder's results with a standard produced by the coding development team. If reliability could not be established at the 80 percent level, the code was either dropped or sent back for revision. For the reliability assessment, coders worked with tapes from all countries, relying on English translations when necessary. We reasoned that reliability established across coders from different cultural backgrounds would be a low estimate of the actual reliability achieved among coders coding only tapes from their native countries. This also enabled us to make sure that coders from different countries were applying the codes in a comparable way.
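The 80 percent criterion can be illustrated with a simple percent-agreement calculation. The text does not give the exact formula used, so the sketch below assumes the plainest reading: the share of judgments on which a coder matches the standard produced by the coding development team. The function names and all judgments shown are hypothetical.

```python
def percent_agreement(coder_judgments, standard_judgments):
    """Percentage of judgments on which the coder matches the standard."""
    if len(coder_judgments) != len(standard_judgments):
        raise ValueError("coder and standard must cover the same judgments")
    if not standard_judgments:
        raise ValueError("no judgments to compare")
    matches = sum(c == s for c, s in zip(coder_judgments, standard_judgments))
    return 100.0 * matches / len(standard_judgments)


def meets_criterion(coder_judgments, standard_judgments, threshold=80.0):
    """True if agreement with the standard reaches the threshold."""
    return percent_agreement(coder_judgments, standard_judgments) >= threshold


# Hypothetical judgments for ten segments (CW = classwork, SW = seatwork).
standard = ["CW", "SW", "CW", "SW", "CW", "CW", "SW", "CW", "SW", "CW"]
coder_a = ["CW", "SW", "CW", "SW", "CW", "CW", "SW", "CW", "SW", "SW"]
coder_b = ["SW", "SW", "CW", "CW", "CW", "CW", "SW", "CW", "SW", "SW"]

print(percent_agreement(coder_a, standard), meets_criterion(coder_a, standard))
print(percent_agreement(coder_b, standard), meets_criterion(coder_b, standard))
```

Raw percent agreement does not correct for agreement expected by chance; whether the study applied any such correction is not stated in this section.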

    Throughout this process we endeavored to be strategic. For example, just having collected 100 hours of video does not mean that all 100 hours must be analyzed. Depending on the frequency of what is being coded, it may be possible to time sample or event sample. It is also important to divide coding tasks into passes through the data in order to lessen the load on coders. This increases reliability and speeds up coding.

    Implementation of Codes Using the Software

    The code module of our software enables coders to view synchronized video and transcript on their computer screen. On-screen controls allow them to move instantly to the point in the video that corresponds to the highlighted transcript record, or to the point in the transcript that is closest in time to the current frame of video. Depending on the code, codes can be marked either as time codes in the video or as highlighting in the transcript.
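The two navigation directions described here (transcript record to video position, and video position to nearest transcript record) follow naturally once each utterance carries a time code. The sketch below is not the project's software; it is a minimal illustration that assumes utterances are stored sorted by start time. The `Utterance` class, the function name, the speaker labels, and the time values are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    start_seconds: float  # time code marking where the utterance begins
    speaker: str
    text: str


def closest_utterance_index(utterances, current_seconds):
    """Index of the utterance whose start time is nearest the video position.

    `utterances` must be sorted by start_seconds. Ties go to the earlier one.
    """
    if not utterances:
        raise ValueError("transcript is empty")
    starts = [u.start_seconds for u in utterances]
    i = bisect_left(starts, current_seconds)
    if i == 0:
        return 0
    if i == len(starts):
        return len(starts) - 1
    before, after = starts[i - 1], starts[i]
    return i - 1 if current_seconds - before <= after - current_seconds else i


# Hypothetical transcript records (times and speaker labels are illustrative).
transcript = [
    Utterance(27.0, "T", "The triangles between two parallel lines have the same areas."),
    Utterance(86.0, "T", "There is Eda's land. There is Azusa's land."),
    Utterance(214.0, "T", "Try thinking about the methods of changing this shape."),
]

# Video to transcript: which record should be highlighted at 1:00 of video?
print(transcript[closest_utterance_index(transcript, 60.0)].text)

# Transcript to video: seek to the start time stored on the highlighted record.
print(transcript[2].start_seconds)
```

A binary search keeps the lookup fast even for long transcripts, which matters when the highlighted record must follow the video as it plays.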

     


    FIRST-PASS CODING: THE LESSON TABLES

    We have found that it is useful to have an intermediate representation of each lesson that can serve to guide coders as they try to comprehend a lesson, and that can be coded itself. For this purpose, our first step in coding the lessons was to construct a table that maps out the lesson along several dimensions. Each of these is defined in more detail as we present the results of the study, but a general idea of what they are is useful at this point:

    • Organization of class. Each videotape is divided into three segments: pre-lesson activities (Pre-LA), lesson, and post-lesson activities (Post-LA). The lesson needs to be defined in this way because lesson is the basic unit of analysis in the study.
    • Outside interruption. Interruptions from outside the class that take up time during the lesson (e.g., announcements over the public address system) are marked on the tables as well.
    • Organization of interaction. The lesson is divided into periods of classwork (CW), periods of seatwork (SW), and periods of mixed organization. Seatwork segments are characterized as being Individual (students working on their own, individually), Group, or Mixed.
    • Activity segments. Each classwork and seatwork segment is further divided, exhaustively, into activity segments according to changes in pedagogical function. We defined four major categories of activities: Setting Up, Working On, Sharing, and Teacher Talk/Demonstration. (Each of these was divided into subcategories, which are defined more completely in Chapter 4.)
    • Mathematical content of the lesson. The mathematical content of the lesson is described in detail. Content is marked, for analytical purposes, into units which are noted on the table: Tasks, Situations, Principles/Properties/Definitions, Teacher Alternative Solution Methods [TASM] and Student Generated Solution Methods [SGSM]. (A more detailed description of each can be found below.) In addition, frames from the video are digitized and included in the table to help illustrate the flow and content of the lesson.

     

    An example of what the resulting tables look like is shown in figure 3, which represents one of the Japanese lessons in our sample (JP-012).2 The table contains five columns. The first column indicates the time code at which each segment begins as well as the corresponding page number from the printed lesson transcript. The second column shows the segmentation of the lesson by organization of interaction, the third by activity. The fourth and fifth columns show the symbolic description of the content and the concrete description of the content, respectively. Rows with lines between them show segment boundaries. Seatwork segments are shaded gray. The acronyms used refer to the coding categories described.

     

    Figure 3

    Example of first-pass coding table for Japanese lesson (JP-012)

    ID: JP012
    Topic: 1.4.1 Transformations
    Materials: Chalkboard; computer

     

    Page # (Time) | Organization | Activity | Content: Symbol | Content: Description of Content
    1 (00:01) | Pre-LA | | |
    1 (00:27) | CW | Working On: T/S/PPD | PPD1 | (Computer) The triangles between two parallel lines have the same areas. [Image: ppd1.gif]
    1 (01:26) | | Setting Up: M | S1 | (Chalkboard) There is Eda's land. There is Azusa's land. And these two people's border line is bent but we want to make it straight. [Image: edaazusa.gif]
    3 (03:34) | | | T1 [A] | Try thinking about the methods of changing this shape without changing the area.
    3 (04:05) | SW: I | Working On: T/S/PPD | |
    4 (07:04) | SW: G | | | "People who have come up with an idea for now work with Mr. Azuma, and people who want to discuss it with your friends, you can do so. And for now I have placed some hint cards up here so people who want to refer to those, please go ahead."
    16 (19:20) | CW | Sharing: T/S | SGSM1 | [Image: sgsm1.gif] First you make a triangle. Then you draw a line parallel to the base of the triangle. Since the areas of triangles between two parallel lines are the same we can draw a line here. [See the diagram]
    16 (19:20) | CW | Sharing: T/S | SGSM2 | [Image: sgsm2.gif] We make a triangle and draw a line parallel to the base of the triangle by fitting it with the apex. Since the length of the base doesn't change and the height in between the parallel lines doesn't change. So wherever you draw it the area doesn't change with the triangle that we got first.
    19 (22:57) | | Setting Up: Phys/Dir | S2 | (Chalkboard) [Image: s2chalkboard.gif]
    19 (23:25) | | | T2 [A] | Without changing the area please try making it into a triangle.
    19 (23:39) | SW: I | Working On: T/S/PPD | |
    21 (26:47) | SW: SG | | | "Then people who are done please go to Mr. Azuma again. And people who want hints I will leave hint cards here so please look at them and try doing it. It's also fine to do it with your friends." (Hint cards unidentified.)
    37 (46:11) | CW | Sharing: T/S | | "We will make them ABCD."
     | | | SGSM1 | [Draw a diagonal line AC and make a triangle by drawing a parallel line going through D]
     | | | SGSM2 | [Image: sgsm2.gif]
    37 (46:11) | CW | Sharing: T/S | SGSM3 | [Draw a diagonal line AC and make a triangle by drawing a parallel line going through B]
     | | | SGSM4 | [Image: sgsm4.gif]
     | | | SGSM5 | [Draw a diagonal line BD and make a triangle by drawing a parallel line going through A]
     | | | SGSM6 | [Image: sgsm6.gif]
     | | | SGSM7 | [Draw a diagonal line BD and make a triangle by drawing a parallel line going through C]
     | | | SGSM8 | [Image: sgsm8.gif]
    39 (48:58); 40 (49:47) | | Working On: HW | HS1; HT1 | Pentagon ABCDE. "Let's try making the pentagon into a triangle ... I'll make that then a homework."
    40 (50:25-50:45) | Post-LA | | |

     

    NOTE: Abbreviations used in the table: Pre-LA=Pre-Lesson Activities; CW=Classwork; T/S/PPD=Task/Situation/Principle Property Definition; PPD1=first Principle/Property/Definition in the lesson; Setting Up: M=Setting Up/Mathematical; S1=first situation in the lesson; T1=first task in the lesson; Ti[A]=initial, key, or target work of Task i; SW:I=Individual Seatwork; SW:G=Seatwork in Groups; SW:SG=Seatwork in Small Groups; Sharing: T/S=Sharing Task/Situation; SGSM1=first Student Generated Alternative Solution Method in the lesson; Setting Up: Phys/Dir=Setting Up: Physical/Directional; HS1=first Homework Situation in the lesson; HT1=first Homework Task in the lesson; HW=Homework; Post-LA=Post-lesson activities.

    SOURCE: U.S. Department of Education, National Center for Education Statistics, Third International Mathematics and Science Study, Videotape Classroom Study, 1994-95.

     

    We used these first-pass tables for two purposes. First, they were used by subsequent coders to get oriented to the contents of the videotapes. Often it takes a great deal of time for coders to figure out what is happening in a lesson. The tables ease the way, providing an overview of the structure and content of each lesson.

    A second use for the tables is as objects of coding themselves. Some aspects of the lesson can be coded from the tables without even going back to the videotapes. Examples of such codes include TIMSS content category, nature of tasks and situations, and changes in mathematical complexity over the course of the lesson.
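As one illustration of working from the table alone, the sketch below totals the time spent in classwork and seatwork for lesson JP-012, using the segment start times shown in figure 3. This is not one of the codes named above and is not the authors' procedure. It assumes that each organization segment runs until the next one begins, that all seatwork subtypes (SW: I, SW: G, SW: SG) count as seatwork, and that the lesson ends where Post-LA begins (50:25). The function names are hypothetical.

```python
def to_seconds(mmss):
    """Convert a 'MM:SS' time code to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (start time, organization) for each organization segment in figure 3.
SEGMENTS = [
    ("00:27", "CW"),
    ("04:05", "SW"),  # SW: I
    ("07:04", "SW"),  # SW: G
    ("19:20", "CW"),
    ("23:39", "SW"),  # SW: I
    ("26:47", "SW"),  # SW: SG
    ("46:11", "CW"),
]
LESSON_END = "50:25"  # assumption: the lesson ends where Post-LA begins


def seconds_by_organization(segments, lesson_end):
    """Total seconds per organization type, each segment ending at the next start."""
    starts = [to_seconds(t) for t, _ in segments] + [to_seconds(lesson_end)]
    totals = {}
    for (_, organization), begin, end in zip(segments, starts, starts[1:]):
        totals[organization] = totals.get(organization, 0) + (end - begin)
    return totals


totals = seconds_by_organization(SEGMENTS, LESSON_END)
lesson_seconds = sum(totals.values())
for organization, seconds in totals.items():
    print(f"{organization}: {seconds} s ({100 * seconds / lesson_seconds:.0f}% of lesson)")
```

Under these assumptions, roughly three quarters of this lesson is seatwork. The figure depends on the reading of the table described above and should be treated as illustrative only.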

     


    METHODS FOR DESCRIBING MATHEMATICAL CONTENT

    There are many possible ways of describing mathematical content. One can describe content at a general level in terms of topics (e.g., ratio and proportion, or linear functions); in terms of categories such as "concepts" and "applications"; in terms of the specific tasks and situations assigned to students; or in terms of performance expectations. We attempted in this study to use all of these techniques.

    The bases of our content descriptions are found in the first-pass coding tables we made for each lesson. We presented an example of these tables in the First-Pass Coding section. Here, we will give a more detailed description of how we constructed the content descriptions for these tables.

    As coders watched the video, they first produced a written description of the content in concrete terms. The following example (figure 4), excerpted from the lesson table presented in the First-Pass Coding section (JP-012), illustrates the different kinds of information recorded in the content description. The example shows one task and one situation that the teacher presents to the students, a hint provided by the teacher to the students during Seatwork, and a student's solution method to the problem.

     

    Figure 4

    Excerpt from the content description column of the lesson table for JP-012

     

    (Chalkboard)

    There is Eda's land. There is Azusa's land. And these two people's border line is bent but we want to make it straight.

    [Image: edaazusa.gif]
    Try thinking about the methods of changing this shape without changing the area.

    "People who have come up with an idea for now work with Mr. Azuma, and people who want to discuss it with your friends, you can do so. And for now I have placed some hint cards up here so people who want to refer to those, please go ahead."

    [Image: figure4.gif]
    First you make a triangle. Then you draw a line parallel to the base of the triangle. Since the areas of triangles between two parallel lines are the same we can draw a line here. [See the diagram]

     

    SOURCE: U.S. Department of Education, National Center for Education Statistics, Third International Mathematics and Science Study, Videotape Classroom Study, 1994-95.

     

    Once the concrete description was recorded, coders produced symbolic category descriptions of the content. Categorizing the content serves two functions: First, it helps guide the coders as they struggle to determine the proper level of detail to include in the description; second, it is useful for the analysis of content. We used five mutually exclusive categories for describing content:

    • Situation (S)--The mathematical environment in which tasks are accomplished. (For example, real-world scenarios, word problems, and equations could all be the situations within which tasks are performed.)
    • Task (T)--The mathematical goal or operation to be performed on a situation. (For example, "Try thinking about the methods of changing this shape without changing the area" is a task performed within the situation defined by the particular shape that is presented.)
    • Teacher Alternative Solution Method (TASM)--An alternative method for solving a problem. A first method must be presented within the same lesson in order for there to be an alternative method coded.
    • Student Generated Solution Method (SGSM)--A solution method generated and then presented by a student.
    • Principles, Properties, and/or Definitions (PPD)--Mathematical information that is not contained in tasks and situations.

     

    Each of these codes is numbered in order to keep track of the content as the lesson unfolds. A change in number signifies a shift to a new event. Numbers of tasks and situations are linked. For example, the notation T1-S1 represents a specific task and situation combination. If the same task is assigned for multiple situations (as might happen, for example, with a worksheet containing a number of similar exercises), the notation might be as follows: T1, S1-1, S1-2, S1-3. The first number after the "S" refers to the task (T1) that is being performed on each situation. The second number (1, 2, 3) distinguishes the different situations. A situation related to more than one task would be expressed similarly: S1, T1-1, T1-2, T1-3.

    Relatedness of tasks/situations to previous events (e.g., Teacher Alternative Solution Methods) is expressed in symbolic form with parentheses. For example, T3-S3 (TASM2) means that task and situation "3" is directly related to the previous Teacher Alternative Solution Method 2.
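
    The numbering scheme just described can be made concrete with a small parser. The following Python sketch is illustrative only: the function names `parse_code` and `parse_entry` and the dictionary layout they return are assumptions introduced here, not part of the study's tooling.

```python
import re

# A single code: category label, event number, optional "-k" sub-index.
# Longer labels come first so "TASM" is not read as "T".
CODE = re.compile(r"(TASM|SGSM|PPD|T|S)(\d+)(?:-(\d+))?")

def parse_code(text):
    """Parse one code such as 'T1', 'S1-2', or 'TASM2' into a dict."""
    match = CODE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized code: {text!r}")
    kind, number, sub = match.groups()
    return {"kind": kind, "number": int(number), "sub": int(sub) if sub else None}

def parse_entry(entry):
    """Parse an entry such as 'T3-S3 (TASM2)' into its codes and related event."""
    related = None
    paren = re.search(r"\(([^)]*)\)\s*$", entry)
    if paren:
        related = parse_code(paren.group(1))
        entry = entry[:paren.start()]
    # Split only on a hyphen that joins two labelled codes (as in 'T3-S3'),
    # not on the hyphen inside a sub-indexed code (as in 'S1-2').
    codes = [parse_code(part) for part in re.split(r"-(?=[A-Z])", entry.strip())]
    return {"codes": codes, "related": related}
```

    Under these assumptions, `parse_entry("T3-S3 (TASM2)")` yields a task code and a situation code, both numbered 3, plus a related Teacher Alternative Solution Method numbered 2, while `parse_entry("S1-2")` yields the second situation attached to task 1.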

     


    THE MATH CONTENT GROUP

    As mentioned in the introduction, one advantage of collecting video data is that they can be analyzed from multiple perspectives. In this report, we include some analyses by an independent group of researchers who are experts in mathematics and mathematics teaching. There were four members of this group (see names and brief biographical descriptions in appendix C). One had taught mathematics primarily at the high school level, one primarily at the college level but with extensive experience also teaching high school students, one primarily at the college level, and one at both college and graduate levels.

    The group was assigned the task of analyzing the mathematical content of each lesson based only on information contained in the lesson tables. The group worked with a subsample of 90 lessons, 30 from each country. In order to reduce the possibility of bias, tables were disguised so that the group would not know the country of origin. This typically required only minor changes in such details as the names of persons and currencies. Lesson tables were identified only by an arbitrarily assigned ID code; relation of this code to country was not revealed until all coding was complete. Thus, although the group was denied the additional information that would have been provided by looking at the videotapes, the goal was to make possible a blind analysis of content. The analytic tools for describing the lesson content were developed by this group and will be described when we present the results of their analyses.

    Coding of Discourse

    Language is one of the key tools teachers and students use for instruction and learning. Consequently, focusing on how language is used in the classroom has the potential to enrich our analysis of instructional processes (see, for example, Bellack, 1966; Cazden, 1988; Mehan, 1979). It is also true that reformers of mathematics education have focused a great deal of attention on changing the kind of discourse that goes on during mathematics lessons. Mathematics teachers and students, it is suggested, should use language in much the same way that mathematicians do: to explain, justify, conjecture, and elaborate on mathematical understandings (see, for example, Hiebert & Wearne, 1993; Lampert, 1991).

    Coding of discourse is very labor intensive, and very difficult when working across three languages. We decided to code discourse in several passes, and to employ a sampling scheme to save resources. We based our coding system on previous work (e.g., see references in previous paragraph), and on analysis of the field-test tapes.

    Public and Private Talk

    Our first step in coding discourse was to make a distinction between public and private talk. Public talk was defined as talk intended for everyone to hear; private talk was talk intended only for the teacher or an individual student. When the teacher stopped at an individual student's desk to make comments on the student's work this was generally coded as private talk, regardless of whether others could hear what the teacher was saying. The important thing is that the talk was primarily intended for this individual student alone. On the other hand, if a teacher stopped in the middle of a classwork period to criticize the behavior of a student sitting in the back of the room, this was coded as public talk. In this case, everyone had to stop and listen, even if they were not the one being disciplined.

    All further coding of discourse was done on public talk only. Because public talk was accessible to everyone, we assumed that it would provide the most valid representation of the discourse environment experienced by students in the classroom.

    First-Pass Coding and the Sampling Study

    Next, we divided all transcripts into utterances, the smallest unit of analysis used for describing discourse. An utterance was defined as a sentence or phrase that serves a single goal or function. Generally, utterances are small and most often correspond to a single turn in a classroom conversation.

    Utterances were then coded into 12 mutually exclusive categories. Six of the categories were used to code teacher utterances: Elicitation, Direction, Information, Uptake, Teacher Response, and Provide Answer. Five categories were applied to student utterances: Response, Student Elicitation, Student Information, Student Direction, and Student Uptake. One category, Other, could be applied to both teacher and student utterances. Elicitations were further subdivided into five mutually exclusive categories: Content, Metacognitive, Interactional, Evaluation, and Other. And Content Elicitations were subcategorized as well. Definitions of each of these categories will be presented later, together with the results.

    Although all lessons were coded with the first-pass categories in the lesson transcripts, we decided to enter only a sample of the codes into the computer for preliminary analysis.

    Thirty codes were sampled from each lesson according to the following procedure. First, three time points were randomly selected from each lesson. Starting with the last time point sampled, we found the first code in the transcript to occur after the sampled time. From this point, we took the first 10 consecutive codes, excluding Other, that occurred during public talk. If private talk was encountered before 10 codes were found, we continued to sample after the period of private talk. If the end of the lesson was encountered before 10 codes were found, we sampled upward from the time point until 10 codes were found. The same procedure was repeated for the second and first of the three time points. In those cases, if working down in the lesson led us to overlap with codes sampled from a later time point, we reversed and sampled upward from the selected time point.
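
    The sampling procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure as it was actually carried out: the function name `sample_codes`, the `(time, category, is_public)` tuple layout, and the choice to keep codes already found before filling the remainder upward are all introduced here, since the text does not specify those details.

```python
import bisect

def sample_codes(codes, points, per_point=10):
    """Sample `per_point` codes at each time point.

    codes  -- list of (time, category, is_public) tuples, sorted by time
    points -- the randomly selected time points (three in the study)
    """
    times = [time for time, _, _ in codes]
    # Only public-talk codes other than "Other" may be sampled.
    eligible = [is_public and category != "Other" for _, category, is_public in codes]
    taken = set()
    for point in sorted(points, reverse=True):      # last time point first
        start = bisect.bisect_right(times, point)   # first code after the point
        picked = []
        i = start
        # Work down the transcript, skipping private talk and "Other" codes,
        # until enough codes are found, the lesson ends, or we reach codes
        # already sampled from a later time point.
        while i < len(codes) and len(picked) < per_point and i not in taken:
            if eligible[i]:
                picked.append(i)
            i += 1
        # ASSUMPTION: if still short, fill the remainder by working upward.
        i = start - 1
        while i >= 0 and len(picked) < per_point:
            if eligible[i] and i not in taken:
                picked.append(i)
            i -= 1
        taken.update(picked)
    return [codes[i] for i in sorted(taken)]
```

    In use, the three time points would be drawn at random for each lesson, for example with `random.uniform(0, lesson_length)`; passing them in explicitly keeps the sketch deterministic and easy to check.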

    Two kinds of summary variables were used for the sampling study: (1) Average number of codes (out of 30) in each lesson that were of each category, and (2) Percentage of lessons that contained any codes of each category (within the 30 codes sampled).
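
    The two summary variables can be illustrated with a short Python sketch. The function name `summarize` and the input layout are assumptions made for this example, and the computation is unweighted for simplicity, whereas the analyses in this report used survey weights.

```python
from collections import Counter

def summarize(samples, categories):
    """Compute the two sampling-study summary variables.

    samples    -- dict mapping a lesson ID to its list of sampled category labels
    categories -- the category labels to summarize
    Returns (mean_count, pct_lessons), each a dict keyed by category.
    """
    n_lessons = len(samples)
    counts = [Counter(labels) for labels in samples.values()]
    # (1) Average number of sampled codes per lesson in each category.
    mean_count = {c: sum(cnt[c] for cnt in counts) / n_lessons for c in categories}
    # (2) Percentage of lessons with at least one sampled code in each category.
    pct_lessons = {
        c: 100.0 * sum(1 for cnt in counts if cnt[c] > 0) / n_lessons
        for c in categories
    }
    return mean_count, pct_lessons
```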

    Second-Pass Coding of Discourse

    For second-pass coding of discourse we decided to work with a subsample of lessons. We chose to study the 30 lessons in each country that had been selected for analysis by the Math Content Group, in part because they were balanced in their representation of algebra and geometry. Before proceeding, however, we wanted to know how well the subsample of 30 lessons in each country represented the larger sample, specifically with regard to discourse. To answer this question we compared the subsample of 30 lessons in the Math Content Group sample with the rest of the lessons in each country on each of the discourse variables produced in the first-pass sampling study (presented earlier).

    Each variable was analyzed using a two-way analysis of variance (ANOVA), with country and sample group as factors. On only one analysis did we find a significant effect of sample group. However, neither for this variable nor for any of the others did we find a significant Country x Sample interaction.

    Several new codes were added for second-pass coding. Content Elicitations, Information statements, and Directions were further subdivided. In addition, we started the process of grouping utterances into higher-level categories we call Elicitation-Response sequences (ER sequences). Elicitation-Response sequences appear to be the next-level building block for classroom conversations. A more detailed definition of all of these categories will be presented later with the results, but for now it is useful to define an ER sequence as a sequence of turns exchanged between the teacher and student(s) that begins with an initial elicitation and ends with a final uptake.

     


    STATISTICAL ANALYSES

    Most of the analyses presented in this preliminary report are simple comparisons of either means or distributions across the three countries. In all cases, the lesson was the unit of analysis. All analyses were done in two stages: First, means or distributions were compared across the three countries using either one-way ANOVA or Pearson Chi-Square procedures. Variables coded dichotomously were usually analyzed using ANOVA, using asymptotic approximations.

    Next, if overall analyses were significant, pairwise contrasts were computed and significance determined by the Bonferroni adjustment. In all cases, the Bonferroni adjustment was made assuming three simultaneous tests (i.e., Germany vs. Japan, Germany vs. United States, and Japan vs. United States). In the case of dichotomous variables (for which the sample estimate is a proportion) and continuous variables, we computed Student's t on each pairwise contrast. Student's t was computed as the difference between the two sample means divided by the standard error of the difference. Determination that a pairwise contrast was statistically significant with p < .05 was made by consulting the Bonferroni t tables published by Bailey (1977).
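
    As a worked illustration of the pairwise contrasts, the following Python sketch applies the computation to the mean lesson-sequence lengths reported later in this chapter (13.6, 14.4, and 9.1 lessons for Germany, Japan, and the United States, with standard errors of 0.99, 1.26, and 0.61). Two assumptions are made here that are not taken from the analyses themselves: the standard error of a difference is computed as the square root of the sum of the squared standard errors, which relies on the samples being independent across countries, and the critical value comes from a normal approximation rather than from the Bonferroni t tables.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# (mean, standard error) of the reported lesson-sequence length by country.
estimates = {
    "Germany": (13.6, 0.99),
    "Japan": (14.4, 1.26),
    "United States": (9.1, 0.61),
}

def pairwise_t(a, b):
    """Difference between two means divided by the standard error of the difference."""
    (mean_a, se_a), (mean_b, se_b) = a, b
    # ASSUMPTION: independent samples, so the variances of the estimates add.
    return (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)

alpha = 0.05
n_tests = 3  # Germany vs. Japan, Germany vs. United States, Japan vs. United States
# ASSUMPTION: two-tailed normal critical value at alpha / 3, standing in for
# the published Bonferroni t tables.
critical = NormalDist().inv_cdf(1 - (alpha / n_tests) / 2)

results = {}
for (name_a, est_a), (name_b, est_b) in combinations(estimates.items(), 2):
    t = pairwise_t(est_a, est_b)
    results[(name_a, name_b)] = (t, abs(t) > critical)
    print(f"{name_a} vs. {name_b}: t = {t:.2f}, significant = {abs(t) > critical}")
```

    Under this approximation, both contrasts involving the United States exceed the critical value and the Germany vs. Japan contrast does not, which agrees with the result reported in the text.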

    For categorical variables, we followed the procedure suggested by Wickens (1989) and used the Bonferroni Chi-Square tables printed in that book. Throughout, a significance level criterion of .05 was used. All differences discussed met at least this level of significance, unless otherwise stated. Anytime we use terms such as "less," "more," "greater," "higher," or "lower," for example, the reader can be assured that the comparisons are statistically significant.

    All tests were two-tailed. Statistical tests were conducted using unrounded estimates and standard errors, which were also computed for each estimate. Standard errors for estimates shown in figures in the report are provided in the table in appendix E. Standard errors for estimates indicated in the text but not shown in figures are reported in footnotes to the relevant text.

    Weighting

    All of the analyses reported here were done using data weighted with survey weights, which were calculated for the classrooms in the videotape study itself, separate from any weights calculated for the main TIMSS assessment. The weights were developed for each country, so that estimates are unbiased estimates of national means and distributions. The weight for each classroom reflects the overall probability of selection for the classroom, with appropriate adjustments for nonresponse.
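
    The general idea of a selection-probability weight with a nonresponse adjustment can be sketched as follows. This Python example is hypothetical throughout: the function names, the field names `p_select` and `responded`, and the single adjustment cell are assumptions, and the adjustment shown (scaling respondents' weights so they sum to the total for all sampled classrooms) is one common approach, not necessarily the one used to build the weights for this study.

```python
def adjusted_weights(classrooms):
    """Return one weight per classroom.

    classrooms -- list of dicts with 'p_select' (overall selection probability)
                  and 'responded' (True if the classroom took part)
    """
    # Base weight: the inverse of the overall probability of selection.
    base = [1.0 / c["p_select"] for c in classrooms]
    total = sum(base)
    responding = sum(w for w, c in zip(base, classrooms) if c["responded"])
    # ASSUMPTION: a single adjustment cell; respondents carry the weight
    # of the nonrespondents.
    factor = total / responding
    return [w * factor if c["responded"] else 0.0 for w, c in zip(base, classrooms)]

def weighted_mean(values, weights):
    """Weighted estimate of a mean across classrooms."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```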

    The analyses also used procedures that accounted for the complex nature of the sample design within each country (with the samples being independent across countries). The jackknife procedure was used, via the WesVarPC software, to account for the fact that a stratified random sample of schools was selected, with one classroom chosen from each selected school. F-tests for the comparison of means across three countries were carried out using linear regression, with dummy variables indicating country as the independent variables. Pairwise t-tests were computed using the 'FUNCTION' capability of the 'TABLE' statement. Chi-square tests were also computed using the 'TABLE' statement, with first-order Rao-Scott corrections to account for the complex sample design.
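
    To give a sense of how a jackknife standard error is obtained, the following Python sketch implements a simple delete-one jackknife for a weighted mean. It is a deliberate simplification: it ignores stratification, it does not reproduce the replicate weights or settings used in WesVarPC, and the function names are introduced here for illustration.

```python
from math import sqrt

def weighted_mean(values, weights):
    """Weighted estimate of a mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jackknife_se(values, weights):
    """Delete-one jackknife standard error of a weighted mean.

    values, weights -- lists with one entry per sampled classroom
    """
    n = len(values)
    full = weighted_mean(values, weights)
    replicates = []
    for drop in range(n):
        # Re-estimate the mean with one classroom left out.
        kept_values = values[:drop] + values[drop + 1:]
        kept_weights = weights[:drop] + weights[drop + 1:]
        replicates.append(weighted_mean(kept_values, kept_weights))
    # ASSUMPTION: unstratified delete-one variance formula.
    variance = (n - 1) / n * sum((r - full) ** 2 for r in replicates)
    return sqrt(variance)
```

    With equal weights this reduces to the familiar standard error of a mean (the sample standard deviation divided by the square root of the sample size), which provides a convenient check on the sketch.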

     


    COMPARISON OF VIDEO SUBSAMPLES WITH MAIN TIMSS SAMPLES

    Despite the exhaustive attempts to select the video subsample randomly from the TIMSS main study sample, it may still be asked: Are the classrooms selected for the video study representative of the larger TIMSS sample?

    Some information relevant to this question can be found by comparing the mathematics achievement scores (i.e., performance on the TIMSS student assessments) of classrooms in the main TIMSS samples with the subsample of classrooms selected for the video study. We did not have test data for all of the classrooms included in the video study, and data from Japan were somewhat problematic in one respect: test data were not collected on classrooms included in the video study but on other eighth-grade classrooms in the same schools as the video classrooms. Nevertheless, we did have enough data to warrant a meaningful comparison of the two samples, and the lack of any tracking in Japan gives us some confidence that the school-level estimate of performance in Japan would be a reasonable indicator of classroom-level performance.

    Distributions of mean achievement scores for classrooms in the Main TIMSS samples and in the video subsamples for each country are presented in figure 5. It is apparent in the figure that the distributions of mathematics achievement scores among the video subsamples are representative of distributions in the Main TIMSS samples.3

     

    Figure 5

    Distributions of unweighted average mathematics achievement test scores for classrooms in the Main TIMSS samples and video subsamples from each country

     

    [Figure 5 images not reproduced: paired panels, TIMSS Main Study Sample beside Video Study Subsample (fig5a.ai and fig5d.ai, fig5b.ai and fig5e.ai, fig5c.ai and fig5f.ai)]

     

    NOTE: SD = standard deviation.

    SOURCE: U.S. Department of Education, National Center for Education Statistics, Third International Mathematics and Science Study, Videotape Classroom Study, 1994-95.

     


    VALIDITY OF THE VIDEO OBSERVATIONS

    As mentioned earlier, one of our concerns was that the presence of the video camera might in some way alter the nature of classroom instruction and thus threaten the validity of the study. One step we took to lessen the chances of this happening was to give all teachers a standard set of instructions in which we informed them of the goals of the study. We wanted to be certain that teachers understood that we wanted to film a typical day in their classroom, not one that was prepared especially for us.

    We also attempted to assess how successful we were in sampling what typically happens in these classrooms by asking teachers, after the videotaping, to evaluate the typicality of what we would find on the videotape. We will report these results here.

    Our first concern was that we might get a special lesson, one that the teacher holds in reserve for demonstration purposes. To ascertain whether or not this happened, we asked several questions on our questionnaire about how this particular lesson was chosen, and about how it related to the previous and next lessons that the teacher had taught or would teach to this same class of students. We asked, for example, whether the lesson we videotaped was a stand-alone lesson or part of a sequence of lessons. If the lesson was part of a sequence, we asked them to describe the goals and activities of the adjoining lessons so that we could judge the relationship they had to the lesson on videotape. Our reasoning was that special lessons would show up as stand-alone lessons that were unrelated to the adjoining lessons.

    As it turned out, almost all of the teachers in our sample (97.8 percent in Germany, 95.7 percent in Japan, and 93.4 percent in the United States) reported that the videotaped lesson was part of a sequence of lessons designed to teach a particular topic in the mathematics curriculum. Further, they were able to give clear and reasonable descriptions of how this lesson related to the previous and next lessons in the sequence. This outcome confirmed our sense that teachers did not make drastic accommodations to prepare for our videographer.

    We also asked teachers how many lessons were in the whole sequence of lessons, and where the lesson we videotaped fell in the sequence. The average sequence of lessons reported by U.S. teachers was 9.1 lessons, significantly shorter than the 13.6 and 14.4 lessons reported by German and Japanese teachers respectively.4 However, the position of the videotaped lesson in the sequence did not differ across countries.5

    A more subtle picture of how the presence of the camera might have affected instruction emerges when we look at some of the other judgments teachers made about the lesson. First, it is interesting to see how nervous or tense the teachers felt about being videotaped. Teachers were asked to check one of four choices: Very nervous, somewhat nervous, not very nervous, and not at all nervous. Japanese teachers reported being more nervous than both German and U.S. teachers about the presence of our videographer.6 Figure 6 shows that about three-fifths of U.S. teachers (62.1 percent), almost one-half of German teachers (48.9 percent), and about one-fifth of Japanese teachers (21.6 percent) reported being "not at all nervous" or "not very nervous."

     

    Figure 6

    Teachers' reports of how nervous or tense they felt about being videotaped

     

    fig6.ai.GIF

     

    SOURCE: U.S. Department of Education, National Center for Education Statistics, Third International Mathematics and Science Study, Videotape Classroom Study, 1994-95.

     

    A number of questions were designed to get teachers' evaluations of how typical the lesson on the videotape was of the lessons they normally teach to this class of students. In figure 7 we present teachers' judgments of the quality of the video lesson compared with their usual lesson. Again, we see that Japanese teachers appear to differ from their German colleagues. Twenty-seven percent of Japanese teachers and 18.6 percent of German teachers reported feeling that the lesson on tape was worse than usual. At the other end of the scale, 11.7 percent of Japanese teachers and 2.3 percent of German teachers reported feeling that the videotaped lesson was better than usual.

     

    Figure 7

    Teachers' ratings of the quality of the videotaped lesson compared to lessons they usually teach

    fig7.ai.GIF

     

    SOURCE: U.S. Department of Education, National Center for Education Statistics, Third International Mathematics and Science Study, Videotape Classroom Study, 1994-95.

     

    Four other questions probed teachers' judgments of how typical the videotaped lesson was of lessons they normally teach. On these questions, teachers rated the typicality of the teaching methods, the behavior of the students, the tools and materials, and the lesson as a whole. Teachers used a four-point scale, where 1 means very typical/similar to what usually happens, and 4 means completely atypical/very different from what usually happens. Responses to the four questions are summarized in figure 8. Again, the Japanese teachers, on each of the four questions, rated their lessons as less typical than did teachers in the other two countries.

    Figure 8

    Teachers' average ratings of the typicality of various aspects of the videotaped lesson

     

    fig8.ai.GIF

     

    NOTE: 1=very typical, 4=completely atypical.

    SOURCE: U.S. Department of Education, National Center for Education Statistics, Third International Mathematics and Science Study, Videotape Classroom Study, 1994-95.

     

    Although Japanese teachers rated the lessons as significantly less typical than did German or U.S. teachers, the overall ratings were not particularly troubling to us. In fact, when asked to rate the typicality of the lesson as a whole, most teachers in all three countries chose either "very typical" or "mostly typical": 95.6 percent of German teachers, 85.1 percent of Japanese teachers, and 97.4 percent of U.S. teachers responded in this way. In conclusion, the videotape study probably captured a fairly representative picture of what typically happens in eighth-grade mathematics classrooms in these three countries.

     


    1 Westat, of Rockville, Maryland, was the general contractor that conducted the U.S. TIMSS and the Videotape Classroom Study.

    2 Throughout this report, individual lessons from the sample are referred to by ID numbers, which include country of origin (GR, JP, US) followed by the lesson ID. In addition, all names used in excerpts from lesson tables and transcripts have been changed to pseudonyms.

    3 These distributions are based on unweighted average achievement scores for each classroom. Our purpose is simply to compare distributions across pairs of samples within countries, not to make any inferences about true population distributions.

    4 Standard errors for Germany, Japan, and the United States are 0.99, 1.26, and 0.61, respectively.

    5 Estimates for Germany, Japan, and the United States of the position in the sequence (as a ratio of the position in the sequence to the total number in the sequence) are 0.50, 0.46, and 0.53, respectively, with standard errors of 0.035, 0.036, and 0.042.

    6 Estimates for Germany, Japan, and the United States of the level of nervousness (on a four-point scale, with 1 indicating "very nervous" and 4, "not at all nervous") are 2.5, 2.1, and 2.8, respectively, with standard errors of 0.07, 0.09, and 0.12.

     

     
