Chapter 1. Introduction

The Third International Mathematics and Science Study (TIMSS) is the third in a series of international studies, conducted under the auspices of the International Association for the Evaluation of Educational Achievement (IEA), which has assessed the mathematics achievement of students in different countries. The first two of these studies (Husen, 1967; McKnight, Crosswhite, Dossey, Kifer, Swafford, Travers, and Cooney, 1987) established that there were large cross-national differences in achievement and provided some information on contextual factors, such as curriculum, that could be related to the achievement differences.

In these prior studies, students from the United States scored low in comparison to other countries. Not enough was learned, however, about the contextual factors that might help to explain their relatively low performance. Finding out more about the instructional and cultural processes that are associated with achievement thus became a high priority in planning for the TIMSS.

In accordance with this priority, the National Center for Education Statistics (NCES) funded two studies to complement the main TIMSS study. Both of these studies focus on three countries: Germany, Japan, and the United States. The first involves comparative case studies of various aspects of the education systems of each country. The second is the Videotape Classroom Study.

The primary goal of the Videotape Classroom Study is to provide a rich source of information regarding what goes on inside eighth-grade mathematics classes in Germany, Japan, and the United States. We directed our attention to both teachers and students, seeking to describe the classes from both the perspective of teaching practices and that of the opportunities and experiences provided for students.

Beyond this general goal, the study had three additional objectives:

To develop objective observational measures of classroom instruction to serve as quantitative indicators of teaching practices in the three countries;
To compare actual mathematics teaching methods in the United States and the other countries with those recommended in current reform documents and with teachers' perceptions of those recommendations;
To assess the feasibility of applying videotape methodology in future wider-scale national and international surveys of classroom instructional practices.

In this report we will provide a detailed account of the methods used in the study, as well as a preliminary look at the findings up to this point. We have only started to tap the vast wealth of information available in the videos we collected. But we have made great headway in solving the considerable logistical and methodological challenges presented by the study. This report relates what we have learned thus far.

In this introductory section we discuss what can be learned from classroom observation and the advantages offered by the use of video to collect such information. We also discuss the issues and problems that arise in the course of designing and carrying out a large-scale video survey, and we describe some of the approaches we have taken to meeting these challenges. In the Methods section we provide a detailed account of our methods. In subsequent sections we present results, first regarding the content of classroom instruction, then the organization and processes.

STUDYING PROCESSES OF CLASSROOM INSTRUCTION

This is the first large-scale study to collect videotaped records of classroom instruction in the mathematics classrooms of different countries. It also is the first study--for any grade level or subject matter--to attempt direct observation of instructional practices in a nationally representative sample of students within the United States. Thus this study constitutes an important new database and a new approach to data collection for NCES.

Chief among the factors associated with student achievement must surely be the processes of teaching and learning that transpire inside classrooms. Yet, until now there have been no observational data on instructional processes from a national sample of classrooms. In a series of papers commissioned by NCES in 1985, papers designed to set the agency's priorities for the next 10 years, the need for classroom process indicators was raised numerous times (Hall, Jaeger, Kearney, and Wiley, 1985). Cronin (1985), for example, expressed concern with the paucity of data that could document curricular breadth or the actual implementation of curricular reform in the classroom. Peterson (1985) cited a near complete lack of data on the quality of educational activities in the Nation's classrooms, or even on the time teachers devote to various instructional activities. Including such indicators in the future was a recommendation of the 1985 report.

Studies of classroom process can serve two broad purposes. First, they can yield indicators of classroom instruction that can then be used to develop and validate models of instructional quality. That is, we must understand the processes that relate instruction to learning if we are to improve instruction. A second purpose of such studies is to monitor the implementation of instructional policies in classrooms. One example of such policies is contained in the National Council of Teachers of Mathematics (NCTM) Professional Standards for Teaching Mathematics (1991). The Standards represent one point of view on what instruction should look like in the classroom. Operationalizing this point of view in a system of classroom-based indicators would allow us to assess the degree to which the Standards are being implemented and, by coupling these indicators with performance measures, the effectiveness of the Standards as educational policy.

Despite the obvious value of studying classroom instruction, describing and measuring classroom processes, especially on a large scale, is difficult. To date, measures have been largely based on questionnaires in which teachers report on what happens in their own classrooms. Using questionnaires to measure classroom processes has both advantages and disadvantages, as we discuss here. Observations have different advantages and disadvantages. Although observation is a natural way to study classroom processes, it has generally been considered too difficult and labor intensive for large-scale studies. The methods described here, however, present an approach to overcoming this problem.

Advantages and Disadvantages of Questionnaires for Studying Classroom Processes

Most attempts to measure classroom processes on a large scale have used teacher questionnaires. Teachers have been asked, for example, to report on the percentage of time they spend in lecture or discussion, the degree to which problem solving is a focus in their mathematics classrooms, and so on. Questionnaires have numerous advantages: They are simple to administer to large numbers of respondents and usually can be easily transformed into data files that are ready for statistical analysis.

On the other hand, there are at least three major limitations in using questionnaires to study classroom instruction. First, the words researchers use to describe the complexities of classroom instruction may not be understood in the same way by teachers or in a consistent way across different teachers. The phrase "problem solving" is a good example. Many reformers of mathematics education call for problem solving to become the focus of the lesson. But different teachers interpret this phrase in different ways. One teacher may believe that working on word problems is synonymous with problem solving, even if the problems are so simple that students can solve one in 15 seconds. Another teacher may believe that a problem that can be solved in less than a full class period is not a real problem but only an exercise. Such inconsistency in the use of terms is common in the United States, where teachers have few opportunities to observe or be observed by other teachers in the classroom. It may be that because teacher training in the United States generally does not engage teachers in discussions of classroom instruction, and because teachers are often isolated from one another by the conditions under which they work, teachers do not develop shared referents for the words used to describe instruction. Thus, when teachers fill in questionnaires about their teaching practices, interpreting their responses is problematic.

A second problem with relying on questionnaire-based indicators of instruction concerns teachers' accuracy in reporting processes that may, at least in part, lie outside their awareness. Teachers may be accurate reporters of what they planned for a lesson (e.g., what kind of demonstration they used to introduce the lesson) but inaccurate when asked to report on aspects of teaching that happen too quickly to be under their conscious control.

A third limitation of questionnaires is their static nature. Teachers can only answer the questions we as researchers thought to ask. An observer might notice something important just by being in the classroom. This problem is more serious in international research, where unfamiliarity with other nations' instructional approaches makes effective questionnaire design difficult.

Advantages and Disadvantages of Live Observations for Studying Classroom Processes

Having discussed some of the advantages and disadvantages of using questionnaires to study classroom processes, we now turn to those of direct observational techniques. Direct observation overcomes some of the limitations identified for questionnaires: Observations allow behavioral categories to be defined objectively by the researcher, not independently by each respondent. They also enable researchers to study the moment-to-moment ("online") implementation of instruction as well as its planned, structural aspects. Teachers themselves may be unaware of their behavior in the classroom, yet this same behavior could be easily accessible to the outside observer.

On the other hand, there are clear disadvantages of live observation as well. Just like questionnaires, observational coding schemes can act as blinders and may make it difficult to discover unanticipated aspects of instruction. The use of live observations also introduces significant training problems when used across large samples or, especially, across cultures. A great deal of effort is required to assure that different observers are recording behavior in comparable ways. In fact, when working in different cultures, it may be impossible to achieve high levels of comparability.

THE USE OF VIDEO FOR STUDYING CLASSROOM INSTRUCTION

Bearing in mind the limitations of questionnaires and of live observational coding schemes, especially in the context of cross-cultural research, we decided to use video for the present study. Most researchers, on hearing the word "video," imagine a small-scale qualitative study. This study is anything but small: Large quantities of video were collected on national samples of teachers. In fact, one goal of this study was to explore the feasibility of using video to produce quantitative indicators based on large samples and to combine these quantitative indicators with qualitative information. In this section we will discuss the advantages and disadvantages of video relative to live observation in the study of classroom processes.

Enables Study of Complex Processes

Classrooms are complex environments, and instruction is a complex process. Live observers are necessarily limited in what they can observe, and this places constraints on the kinds of assessments they can do. Video provides a way to overcome this problem: Observers can code video in multiple passes, coding different dimensions of classroom process on each pass. On the first pass, for example, we coded the organization of the lesson; on the second, the use of instructional materials; and on the third, the patterns of discourse that characterize the classrooms of each country. It would have been impossible for a live observer to code all of these simultaneously.

Not only can coding be done in passes but it also can be done in slow motion. With video, for example, it is possible to watch the same sample of behavior multiple times, enabling coders to describe the behavior in great detail. This makes it possible to conduct far more sophisticated analyses than would be possible with live observers.

Increases Inter-Rater Reliability, Decreases Training Problems

Video also resolves problems of inter-rater reliability that are difficult to resolve in the context of live observations. The standard way to establish the reliability of observational measures is to send two observers to observe the same behavior, then compare the results of their coding. This is often inconvenient and is even infeasible for studies that are performed cross-culturally or in geographically distant locations. Using video to establish reliability means that the behavior can be brought to the observers instead of vice versa. Thus, in the context of a cross-cultural study, observers from different cultural and linguistic backgrounds can work collaboratively, in a controlled laboratory setting, to develop codes and establish their reliability using a common set of video data.

Using video also makes it far easier to train observers. With video, inter-rater reliability can be assessed not only between pairs of observers but also between each observer and an expert "standard" observer. Disagreements can be resolved by re-viewing the video, which turns them into a valuable training opportunity. In addition, the same segments of video can be used for training all observers, increasing the chances that coders will use categories in comparable ways.
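The comparison of each observer with an expert "standard" observer can be illustrated with a minimal sketch. The statistics used below (simple percent agreement and Cohen's kappa) are our assumption for illustration; this chapter does not say which reliability measure the study used. The category names and the codes themselves are hypothetical.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Share of segments on which two coders assigned the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("code lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: probability both coders pick the same category
    # if each coded independently at their own marginal rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes for the same six video segments (not study data).
expert = ["seatwork", "classwork", "classwork", "seatwork", "classwork", "seatwork"]
coders = {
    "coder_1": ["seatwork", "classwork", "classwork", "seatwork", "classwork", "seatwork"],
    "coder_2": ["seatwork", "seatwork", "classwork", "seatwork", "classwork", "classwork"],
}

for name, codes in coders.items():
    print(name, percent_agreement(expert, codes), round(cohens_kappa(expert, codes), 2))
```

Because every coder is scored against the same segments, a low value points to specific segments that can be re-viewed together during training.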

Amenable to Post-Hoc Coding, Secondary Analysis

Most survey data sets lose their interest over time. Researchers decide what questions to ask and how to categorize responses based on theories that are prevalent at a given time. Video data, because they are "pre-quantitative," can be re-coded and analyzed as theories change over time, giving them a longer shelf life than other kinds of data. Researchers in the future may code videotapes of today for purposes completely different from those for which the tapes were originally collected.

Amenable to Coding from Multiple Perspectives

For similar reasons, video data are especially suited for coding from multiple disciplinary perspectives. Tapes of mathematics classes in different countries, for example, might be independently coded by psychologists, anthropologists, mathematicians, and educators. Not only is this cost effective, but it also facilitates valuable communication across disciplines. The most fruitful interdisciplinary discussions result when researchers from diverse backgrounds compare analyses based on a common, concrete referent.

Facilitates Integration of Qualitative and Quantitative Information

Video makes it possible to merge qualitative and quantitative analyses in a way not possible with other kinds of data. With live-observer coding schemes the qualitative and quantitative analyses are done sequentially: First, initial qualitative analyses lead to the construction of the coding scheme; then, implementation of the coding scheme leads to a re-evaluation of the qualitative analysis.

When video is available it is possible to move much more quickly between the two modes of analysis. Once a code is applied and a quantitative indicator produced, the researcher can go back and look again more closely at the video segments that have been categorized together. This kind of focused qualitative observation makes it possible to refine and improve the code, and may even provide the basis for a new code.
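One way to picture this movement between the two modes is a small index of coded video segments that supports both a quantitative summary and retrieval of the underlying clips. The sketch below is only an illustration: the record layout, the lesson identifiers, and the category names are hypothetical and do not describe the software developed for the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodedSegment:
    lesson_id: str   # hypothetical lesson identifier
    country: str
    start_sec: int   # offset into the tape where the segment begins
    end_sec: int     # offset where the segment ends
    code: str        # category assigned during a coding pass

def segments_with_code(segments, code):
    """Qualitative step: list every segment given `code`, ordered for re-viewing."""
    hits = [s for s in segments if s.code == code]
    return sorted(hits, key=lambda s: (s.lesson_id, s.start_sec))

def total_seconds_by_country(segments, code):
    """Quantitative step: seconds coded as `code` in each country."""
    totals = {}
    for s in segments:
        if s.code == code:
            totals[s.country] = totals.get(s.country, 0) + (s.end_sec - s.start_sec)
    return totals

# Hypothetical coded segments (not study data).
segments = [
    CodedSegment("JP-001", "Japan", 0, 300, "classwork"),
    CodedSegment("JP-001", "Japan", 300, 900, "seatwork"),
    CodedSegment("US-001", "United States", 0, 120, "seatwork"),
    CodedSegment("US-001", "United States", 120, 600, "classwork"),
]

print(total_seconds_by_country(segments, "seatwork"))
for seg in segments_with_code(segments, "seatwork"):
    print(seg.lesson_id, seg.start_sec, seg.end_sec)
```

The same records serve both purposes: the totals give the indicator, and the sorted list tells the researcher which stretches of tape to watch again when refining the code.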

Provides Referents for Teachers' Descriptions

We noted earlier that teachers lack a set of shared referents for the words they use to describe classroom instruction. Video can, over time, provide teachers, as potential consumers of the research, with such a set of referents. Definitions of instructional quality and the indicators developed to assess it could be linked to a library of video examples that teachers can use in the course of their professional development. In the long run, a shared set of referents can lead to the development of more efficient and valid questionnaire-based indicators of instructional quality.

Facilitates Communication of the Results of Research

It is also possible, with video, to use concrete video examples in reporting research results. This gives consumers of the information a richer qualitative sense of what each category in the coding system means and a concrete basis for interpreting the quantitative research findings.

Provides a Source of New Ideas for How to Teach

Another advantage of video over other kinds of data is that it becomes a source of new ideas on how to teach. Because these new ideas are concrete and grounded in practice, they have immediate practical potential for teachers. Questionnaires and coding schemes can help us spot trends and relationships, but they can't demonstrate a new way of teaching the Pythagorean theorem.

Disadvantages

Despite all its advantages, video also has some disadvantages. At the very least, video raises a number of problematic issues that must be addressed if it is to yield accurate and valid information about classroom processes. In the next section we will discuss some of these issues and challenges.

ISSUES IN VIDEO RESEARCH

This section briefly discusses a number of issues that must be resolved in order to conduct meaningful video research.

Standardization of Camera Procedures

Left to their own devices, different videographers will photograph the same classroom lesson in different ways. One may focus on individual students; another may shoot wide shots in order to give the broadest possible picture of what is happening in the classroom. Yet another might focus on the teacher or on the blackboard. Because we want to study classroom instruction, not the videographers' camera habits, it is important to develop standardized procedures for using the camera and then to carefully train videographers to follow these procedures. This study has done so, and the procedures are described in the Methods section of this document.

The Problem of Observer Effects

What effect does the camera have on what happens in the classroom? Will students and teachers behave as usual with the camera present, or will we get a view that is biased in some way? Might a teacher, knowing that she is to be videotaped, even prepare a special lesson just for the occasion that is unrepresentative of her normal practices?

This problem is not unique to video studies. Questionnaires have the same potential for bias: Teachers' questionnaire responses, as well as their behavior, may be biased toward cultural norms. On the other hand, it may actually be easier to gauge the degree of bias in video studies than in questionnaire studies. Teachers who try to alter their behavior for the videotaping will likely show some evidence that this is the case. Students, for example, may look puzzled or may not be able to follow routines that are clearly new for them.

It also should be noted that changing the way a teacher teaches is notoriously difficult to do, as much of the literature on teacher development suggests. It is highly unlikely that teaching could be improved significantly simply by placing a camera in the room. On the other hand, teachers will obviously try to do an especially good job, and may do some extra preparation, for a lesson that is to be videotaped. We may, therefore, see a somewhat idealized version of what the teacher normally does in the classroom.

Minimizing Bias Due to Observer Effects

This study used three techniques for minimizing observer bias. First, instructions were standardized across teachers. The goal of the research was clearly communicated to the teacher in carefully written, standard instructions. Teachers were told that the goal was to videotape a typical lesson, with "typical" defined as whatever they would have been doing had the videographer not shown up. Teachers were also explicitly asked to prepare for the target lesson just as they would for a typical lesson. (A copy of the information given to teachers prior to the study is included as appendix A.)

Second, this study attempted to assess the degree to which bias occurred. After the videotaping, teachers were asked to fill out a questionnaire in which they rated, for example, the typicality of what we would see on the videotape and described in writing any aspect of the lesson they felt was not typical. We also asked teachers whether the videotaped lesson was a stand-alone lesson or part of a sequence of lessons, and to describe what they had done in the previous day's lesson and what they planned to do in the next day's lesson. Lessons described as stand-alone and as having little relation to the lessons on adjoining days would be suspected of being special lessons constructed for the videotaping. In this study, however, lessons were rarely described in this way.

Finally, one must use common sense in deciding which kinds of indicators may be susceptible to bias and in taking this into account when interpreting the results of a study. It seems likely, for example, that students will try to be on their best behavior with a videographer present, and so we may not get a valid measure from video of the frequency with which teachers must discipline students. On the other hand, it is probably less likely that teachers use a different style of questioning while being videotaped than they would when the camera is not present. Some behaviors, such as the routines of classroom discourse, are so highly socialized as to be automatic and thus difficult to change.

Sampling and Validity

Observer effects are not the only threat to validity of video survey data. Sampling--of schools, teachers, class periods, lesson topics, and parts of the school year--is a major concern.

One key issue is the number of times any given teacher in the sample should be videotaped. This obviously will depend on the level of analysis to be used. If we need a valid and reliable picture of individual teachers, then we must tape the teacher multiple times, as teachers vary from day to day in the kind of lesson they teach, as well as in the success with which they implement the lesson. If we want a school-level picture, or a national-level picture, then we obviously can tape each teacher fewer times, provided we resist the temptation to view the resulting data as indicating anything reliable about the individual teacher.

On the other hand, taping each teacher once limits the kinds of generalizations we can make about instruction. Teaching involves more than constructing and implementing lessons. It also involves weaving together multiple lessons into units that stretch out over days and weeks. If each teacher is taped once, it is not possible to study the dynamics of teaching over the course of a unit. Inferences about these dynamics cannot necessarily be made, even at the aggregate level, based on one-time observations.

Another sampling issue concerns representativeness of the sample across the school year. This is especially important in cross-national surveys where centralized curricula can lead to high correlations of particular topics with particular months of the year. In Japan, for example, the eighth-grade mathematics curriculum devotes the first half of the school year to algebra, the second half to geometry. Clearly, the curriculum would not be fairly represented by taping in only one of these two parts of the year.

Finally, although at first blush it may seem desirable to sample particular topics in the curriculum in order to make comparisons more valid, in practice this is virtually impossible. Especially across cultures, teachers may define topics so differently that the resulting samples become less rather than more comparable. Randomization appears to be the most practical approach to ensuring the comparability of samples.
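As an illustration of what randomization across the school year might look like, the sketch below assigns each sampled classroom a taping week drawn uniformly at random. This is a simplified sketch under our own assumptions, not the sampling procedure used in the study: the function name, the classroom identifiers, and the 36-week school year are all hypothetical.

```python
import random

def assign_taping_weeks(classroom_ids, weeks_in_year, seed=None):
    """Give each classroom a taping week chosen uniformly from 1..weeks_in_year."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    return {cid: rng.randint(1, weeks_in_year) for cid in classroom_ids}

# Hypothetical classrooms and school-year length (assumptions for illustration).
classrooms = [f"class_{i:03d}" for i in range(1, 11)]
schedule = assign_taping_weeks(classrooms, weeks_in_year=36, seed=1)

for classroom, week in schedule.items():
    print(classroom, "week", week)
```

Because the week is drawn independently of the teacher and the topic, no part of the year (and hence no part of a centralized curriculum) is systematically over- or under-represented in expectation.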

Confidentiality

The fact that images of teachers and students appear on the tapes makes it more difficult than usual to protect the confidentiality of study participants when the data set is used for secondary analyses. An important issue, therefore, concerns how procedures can be established to allow continued access to video data by researchers interested in secondary analysis.

One option is to disguise the participants by blurring their faces on the video. This can be accomplished with modern-day digital video editing tools, but it is expensive at present to do this for an entire data set. A more practical approach is to define special access procedures that will enable us to protect the confidentiality of participants while still making the videos available as part of a restricted-use data set.

Logistics

In contrast to traditional surveys, which require intensive and thorough preparation up front, the most daunting part of a video survey is the data management and analysis phase. Information entered on questionnaires is more easily transformed into computer-readable format than video images are. Thus, it is necessary to find a means to index the contents of the hundreds of hours of tape that can be collected in a video survey. Otherwise, the labor involved in analyzing the tapes grows enormously.

Once data are indexed, there is still the problem of coding. Coding of videotapes is renowned as highly labor intensive. But there are strategies available for bringing the task under control. The present study has developed specialized computer software to help in this task. Emerging multimedia computing technologies will, over the next several years, revolutionize the conduct of video surveys, making them far more feasible than they have ever been in the past.

HARNESSING THE POWER OF THE ANECDOTE

Anecdotes and images are vivid and powerful tools for representing and communicating information. One picture, it is said, is worth a thousand words. On the other hand, anecdotes can be misleading and even completely unrepresentative of reality. Furthermore, research in cognitive psychology has shown that the human information processing system is easily misled by anecdotes, even in the face of contradictory and far more valid information (e.g., Nisbett and Ross, 1980). Methods of research design and inferential statistics were developed, in fact, specifically to protect us from being misled by anecdotes and experiences (Fisher, 1951).

A video survey, like the one being described here, provides one possible way to resolve this tension between anecdotes and statistics. Recognizing the power of video images, one can harness this power in two ways. First, discoveries made through qualitative analysis of the videos can be validated by statistical analysis of the whole set of videos. For example, while watching a video we might notice some interesting technique used by a Japanese teacher. If we only had one video, it would be hard to know what to make of this observation: Do Japanese teachers really use the technique more than U.S. teachers, or did we just happen to notice one powerful example in the Japanese data? Because we have a large sample of videos, we can turn our observation into a hypothesis that can be validated against the database.
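To make the step from observation to hypothesis test concrete, the sketch below compares how often a technique appears in two samples of coded lessons, using a simple permutation test. The choice of test is our assumption for illustration, and the data are invented; neither reflects the analyses or results of the study.

```python
import random

def proportion(flags):
    """Share of lessons in which the technique was coded as present (1)."""
    return sum(flags) / len(flags)

def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in proportions."""
    rng = random.Random(seed)
    observed = abs(proportion(group_a) - proportion(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the country labels and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(proportion(pooled[:n_a]) - proportion(pooled[n_a:]))
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

# Invented data: 1 = technique observed in the lesson, 0 = not observed.
japan_lessons = [1] * 30 + [0] * 20
us_lessons = [1] * 10 + [0] * 40

print("Japan:", proportion(japan_lessons))
print("United States:", proportion(us_lessons))
print("p-value:", permutation_p_value(japan_lessons, us_lessons))
```

A small p-value would suggest that the difference noticed while watching is unlikely to be an accident of which videos happened to catch our attention; a large one would suggest that the vivid example was just that.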

In a complementary process, we might, after coding and quantitative analysis of the video data, discover a statistical relationship in the data. By returning to the actual video, we can find concrete images to attach to our discovery, giving us a means of further analysis and exploration, as well as a set of powerful images that can be used to communicate the statistical discovery we have made. Through this process we can uncover what the statistic means in practice.