Chapter 1. Introduction
The Third International
Mathematics and Science Study (TIMSS) is the third in a series of international
studies, conducted under the auspices of the International Association
for the Evaluation of Educational Achievement (IEA), which has assessed
the mathematics achievement of students in different countries. The first
two of these studies (Husen, 1967; McKnight, Crosswhite, Dossey, Kifer,
Swafford, Travers, and Cooney, 1987) established that there were large
cross-national differences in achievement and provided some information
on contextual factors, such as curriculum, that could be related to the
achievement differences.
In these prior studies,
students from the United States scored low in comparison to students from other countries.
Not enough was learned, however, about the contextual factors that might
help to explain their relatively low performance. Finding out more about
the instructional and cultural processes that are associated with achievement
thus became a high priority in planning for the TIMSS.
In accordance with this
priority, the National Center for Education Statistics (NCES) funded two
studies to complement the main TIMSS study. Both of these studies focus
on three countries: Germany, Japan, and the United States. The first involves
comparative case studies of various aspects of the education systems of
each country. The second is the Videotape Classroom Study.
The primary goal of the
Videotape Classroom Study is to provide a rich source of information regarding
what goes on inside eighth-grade mathematics classes in Germany, Japan,
and the United States. We directed our attention to both teachers and
students, seeking to describe the classes from both the perspective of
teaching practices and that of the opportunities and experiences provided
for students.
Aside from these general
goals, the study had three additional objectives:
- To develop objective
observational measures of classroom instruction to serve as quantitative
indicators of teaching practices in the three countries;
- To compare actual
mathematics teaching methods in the United States and the other countries
with those recommended in current reform documents and with teachers'
perceptions of those recommendations;
- To assess the feasibility
of applying videotape methodology in future wider-scale national and
international surveys of classroom instructional practices.
In this report we will
provide a detailed account of the methods used in the study, as well as
a preliminary look at the findings up to this point. We have only started
to tap the vast wealth of information available in the videos we collected.
But we have made great headway in solving the considerable logistical
and methodological challenges presented by the study. This report relates
what we have learned thus far.
In this introductory section
we discuss what can be learned from classroom observation and the advantages
offered by the use of video to collect such information. We also discuss
the issues and problems that arise in the course of designing and carrying
out a large-scale video survey, and we describe some of the approaches
we have taken to meeting these challenges. In the Methods section we provide
a detailed account of our methods. In subsequent sections we present results,
first regarding the content of classroom instruction, then the organization
and processes.
STUDYING PROCESSES OF CLASSROOM INSTRUCTION
This is the first large-scale
study to collect videotaped records of classroom instruction in the mathematics
classrooms of different countries. It also is the first study--for any
grade level or subject matter--to attempt direct observation of instructional
practices in a nationally representative sample of students within the
United States. Thus this study constitutes an important new database and
a new approach to data collection for NCES.
Chief among the factors
associated with student achievement must surely be the processes of teaching
and learning that transpire inside classrooms. Yet, until now there have
been no observational data on instructional processes from a national
sample of classrooms. In a series of papers commissioned by NCES in 1985,
papers designed to set the agency's priorities for the next 10 years,
the need for classroom process indicators was raised numerous times (Hall,
Jaeger, Kearney, and Wiley, 1985). Cronin (1985), for example, expressed
concern with the paucity of data that could document curricular breadth
or the actual implementation of curricular reform in the classroom. Peterson
(1985) cited a near complete lack of data on the quality of educational
activities in the Nation's classrooms, or even on the time teachers devote
to various instructional activities. Including such indicators in the
future was a recommendation of the 1985 report.
Studies of classroom process
can serve two broad purposes: First, they can result in indicators of
classroom instruction that can then be used to develop and validate models
of instructional quality. That is, we must understand the processes that
relate instruction to learning if we are to be able to improve it. A second
purpose of such studies is to monitor the implementation of instructional
policies in classrooms. One example of such policies is contained in the
National Council of Teachers of Mathematics (NCTM) Professional Standards
for Teaching Mathematics (1991). The Standards represents one point of view
on what instruction should look like in the classroom. Operationalizing
this point of view in a system of classroom-based indicators would allow
us to assess the degree to which the Standards are being implemented and,
by coupling these indicators with performance measures, the effectiveness
of the Standards as educational policy.
Despite the obvious value
of studying classroom instruction, describing and measuring classroom
processes, especially on a large scale, is difficult. To date, measures
have been largely based on questionnaires in which teachers report on
what happens in their own classrooms. Using questionnaires to measure
classroom processes has both advantages and disadvantages, as we discuss
here. Observations have different advantages and disadvantages. Although
observation is a natural way to study classroom processes, it has generally
been considered too difficult and labor intensive for large-scale studies.
The methods described here, however, present an approach to overcoming
this problem.
Advantages and Disadvantages of Questionnaires for Studying Classroom Processes
Most attempts to measure
classroom processes on a large scale have used teacher questionnaires.
Teachers have been asked, for example, to report on the percentage of
time they spend in lecture or discussion, the degree to which problem
solving is a focus in their mathematics classrooms, and so on. Questionnaires
have numerous advantages: They are simple to administer to large numbers
of respondents and usually can be easily transformed into data files that
are ready for statistical analysis.
On the other hand, there
are at least three major limitations in using questionnaires to study
classroom instruction. First, the words researchers use to describe the
complexities of classroom instruction may not be understood in the same
way by teachers or in a consistent way across different teachers. The
phrase "problem solving" is a good example. Many reformers of mathematics
education call for problem solving to become the focus of the lesson.
But different teachers interpret this phrase in different ways. One teacher
may believe that working on word problems is synonymous with problem solving,
even if the problems are so simple that students can solve one in 15 seconds.
Another teacher may believe that a problem that can be solved in less
than a full class period is not a real problem but only an exercise. Such
inconsistency in the use of terms is common in the United States, where
teachers have few opportunities to observe or be observed by other teachers
in the classroom. It may be that because teacher training in the United
States generally does not engage teachers in discussions of classroom
instruction, and because teachers are often isolated from one another
by the conditions under which they work, teachers do not develop shared
referents for the words used to describe instruction. Thus, when teachers
fill in questionnaires about their teaching practices, interpreting their
responses is problematic.
A second problem with
relying on questionnaire-based indicators of instruction concerns teachers'
accuracy in reporting processes that may, at least in part, lie outside
their awareness. Teachers may be accurate reporters of what they planned
for a lesson (e.g., what kind of demonstration they used to introduce
the lesson) but inaccurate when asked to report on the aspects of teaching
that can happen too quickly to be under the teacher's conscious control.
A third limitation of
questionnaires is their static nature. Teachers can only answer the questions
we as researchers thought to ask. An observer might notice something important
just by being in the classroom. This problem is more serious in international
research, where unfamiliarity with other nations' instructional approaches
makes effective questionnaire design difficult.
Advantages and Disadvantages of Live Observations for Studying Classroom Processes
Having discussed some
of the advantages and disadvantages involved in using questionnaires to
study classroom processes, let us now discuss the advantages and disadvantages
of using direct observational techniques. Direct observation overcomes
some of the limitations identified for questionnaires: Observations allow
behavioral categories to be defined objectively by the researcher, not
independently by each respondent. They also enable researchers to study
online implementation of instruction as well as the planned, structural
aspects. Teachers themselves may be unaware of their behavior in the classroom,
yet this same behavior could be easily accessible to the outside observer.
On the other hand, there
are clear disadvantages of live observation as well. Just like questionnaires,
observational coding schemes can act as blinders and may make it difficult
to discover unanticipated aspects of instruction. The use of live observations
also introduces significant training problems when used across large samples
or, especially, across cultures. A great deal of effort is required to
ensure that different observers are recording behavior in comparable ways.
In fact, when working in different cultures, it may be impossible to achieve
high levels of comparability.
THE USE OF VIDEO FOR STUDYING CLASSROOM INSTRUCTION
Bearing in mind the limitations
of questionnaires and of live observational coding schemes, especially
in the context of cross-cultural research, it was decided to use video
for the present study. Most researchers, on hearing the word "video,"
imagine a small-scale qualitative study. This study is anything but small:
Large quantities of video were collected on national samples of teachers.
In fact, one goal of this study was to explore video's feasibility for
use in producing quantitative indicators based on large samples and on
the combination of these quantitative indicators with qualitative information.
In this section we will discuss the advantages and disadvantages of video
over live observation in the study of classroom processes.
Enables Study of Complex Processes
Classrooms are complex
environments, and instruction is a complex process. Live observers are
necessarily limited in what they can observe, and this places constraints
on the kinds of assessments they can do. Video provides a way to overcome
this problem: Observers can code video in multiple passes, coding different
dimensions of classroom process on each pass. On the first pass, for example,
we coded the organization of the lesson; on the second, the use of instructional
materials; and on the third, the patterns of discourse that characterize
the classrooms of each country. It would have been impossible for a live
observer to code all of these simultaneously.
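The multi-pass approach can be pictured as a set of time-stamped codes, one stream per pass, all attached to the same lesson. The following is a minimal sketch under that assumption, not the data format used in the study: the class `CodedSegment`, the function `time_share_by_code`, the pass names, and the code labels are all hypothetical, and the segment times are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedSegment:
    """One coded stretch of video (hypothetical record layout)."""
    lesson_id: str
    pass_name: str   # which coding pass produced it, e.g. "organization"
    start_s: float   # segment start, in seconds from the start of the lesson
    end_s: float     # segment end, in seconds
    code: str        # category assigned by the coder


def time_share_by_code(segments, lesson_id, pass_name):
    """Return the fraction of coded time given to each code in one pass."""
    totals = defaultdict(float)
    for seg in segments:
        if seg.lesson_id == lesson_id and seg.pass_name == pass_name:
            totals[seg.code] += seg.end_s - seg.start_s
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {code: duration / grand_total for code, duration in totals.items()}


# Invented example: two passes over the same lesson.
segments = [
    CodedSegment("lesson-01", "organization", 0, 600, "classwork"),
    CodedSegment("lesson-01", "organization", 600, 900, "seatwork"),
    CodedSegment("lesson-01", "organization", 900, 1200, "classwork"),
    CodedSegment("lesson-01", "materials", 0, 1200, "chalkboard"),
]

organization_shares = time_share_by_code(segments, "lesson-01", "organization")
print(organization_shares)
```

Because each pass is stored separately, a later pass can be added, or an earlier one re-coded, without disturbing the others.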
Not only can coding be
done in passes but it also can be done in slow motion. With video, for
example, it is possible to watch the same sample of behavior multiple
times, enabling coders to describe the behavior in great detail. This
makes it possible to conduct far more sophisticated analyses than would
be possible with live observers.
Increases Inter-Rater Reliability, Decreases Training Problems
Video also resolves problems
of inter-rater reliability that are difficult to resolve in the context
of live observations. The standard way to establish the reliability of
observational measures is to send two observers to observe the same behavior,
then compare the results of their coding. This is often inconvenient and
is even infeasible for studies that are performed cross-culturally or
in geographically distant locations. Using video to establish reliability
means that the behavior can be brought to the observers instead of vice
versa. Thus, in the context of a cross-cultural study, observers from
different cultural and linguistic backgrounds can work collaboratively,
in a controlled laboratory setting, to develop codes and establish their
reliability using a common set of video data.
Using video also makes
it far easier to train observers. With video, inter-rater reliability
can be assessed not only between pairs of observers but between all observers
and an expert "standard" observer. Disagreements can be resolved based
on re-viewing the video, making such disagreements into a valuable training
opportunity. And the same segments of video can be used for training
all observers, increasing the chances that coders will use categories
in comparable ways.
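Comparing each observer against an expert "standard" observer can be sketched as follows. This section does not say which reliability statistic was used, so the choice of simple percent agreement and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), is an assumption made for illustration. The function names, the code labels, and the coder data are hypothetical.

```python
from collections import Counter


def percent_agreement(coder, standard):
    """Share of segments on which the coder matches the standard observer."""
    if len(coder) != len(standard) or not coder:
        raise ValueError("both code lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(coder, standard))
    return matches / len(coder)


def cohens_kappa(coder, standard):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder)
    observed = percent_agreement(coder, standard)
    coder_counts = Counter(coder)
    standard_counts = Counter(standard)
    # Chance agreement: product of the two observers' marginal proportions.
    expected = sum(
        coder_counts[c] * standard_counts[c] for c in coder_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used a single identical category
    return (observed - expected) / (1 - expected)


# Invented codes for six video segments (labels are hypothetical).
standard = ["seatwork", "classwork", "classwork", "seatwork", "classwork", "seatwork"]
coder_a = ["seatwork", "classwork", "classwork", "seatwork", "classwork", "seatwork"]
coder_b = ["seatwork", "classwork", "classwork", "seatwork", "classwork", "classwork"]

for name, codes in [("coder_a", coder_a), ("coder_b", coder_b)]:
    print(name, percent_agreement(codes, standard), cohens_kappa(codes, standard))
```

Since every coder is scored against the same segments, a low value points to the specific segments worth re-viewing in training.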
Amenable to Post-Hoc Coding, Secondary Analysis
Most survey data sets
lose their interest over time. Researchers decide what questions to ask
and how to categorize responses based on theories that are prevalent at
a given time. Video data, because they are "pre-quantitative," can be
re-coded and analyzed as theories change over time, giving them a longer
shelf life than other kinds of data. Researchers in the future may code
videotapes of today for purposes completely different from those for which
the tapes were originally collected.
Amenable to Coding from Multiple Perspectives
For similar reasons, video
data are especially suited for coding from multiple disciplinary perspectives.
Tapes of mathematics classes in different countries, for example, might
be independently coded by psychologists, anthropologists, mathematicians,
and educators. Not only is this cost effective, but it also facilitates
valuable communication across disciplines. The most fruitful interdisciplinary
discussions result when researchers from diverse backgrounds compare analyses
based on a common, concrete referent.
Facilitates Integration of Qualitative and Quantitative Information
Video makes it possible
to merge qualitative and quantitative analyses in a way not possible with
other kinds of data. With live-observer coding schemes the qualitative
and quantitative analyses are done sequentially: First, initial qualitative
analyses lead to the construction of the coding scheme; then, implementation
of the coding scheme leads to a re-evaluation of the qualitative analysis.
When video is available
it is possible to move much more quickly between the two modes of analysis.
Once a code is applied and a quantitative indicator produced, the researcher
can go back and look again more closely at the video segments that have
been categorized together. This kind of focused qualitative observation
makes it possible to refine and improve the code, and may even provide
the basis for a new code.
Provides Referents for Teachers' Descriptions
We noted earlier
that teachers lack a set of shared referents for the words
they use to describe classroom instruction. Video can, in the long run,
provide teachers, as potential consumers of the research, with a set of
such referents. Definitions of instructional quality and the indicators
developed to assess instructional quality could be linked to a library
of video examples that teachers can use in the course of their professional
development. In the long run, a shared set of referents can lead to the
development of more efficient and valid questionnaire-based indicators
of instructional quality.
Facilitates Communication of the Results of Research
It is also possible, with
video, to use concrete video examples in reporting research results. This
gives consumers of the information a richer qualitative sense of what
each category in the coding system means and a concrete basis for interpreting
the quantitative research findings.
Provides a Source of New Ideas for How to Teach
Another advantage of video
over other kinds of data is that it becomes a source of new ideas on how
to teach. Because these new ideas are concrete and grounded in practice,
they have immediate practical potential for teachers. Questionnaires and
coding schemes can help us spot trends and relationships, but they can't
demonstrate a new way of teaching the Pythagorean theorem.
Disadvantages
Despite all its advantages,
video also has some disadvantages. At the very least, video raises a number
of problematic issues that must be addressed if it is to yield accurate
and valid information about classroom processes. In the next section we
will discuss some of these issues and challenges.
ISSUES IN VIDEO RESEARCH
This section briefly discusses
a number of issues that must be resolved in order to conduct meaningful
video research.
Standardization of Camera Procedures
Left to their own devices,
different videographers will photograph the same classroom lesson in different
ways. One may focus on individual students, another may shoot wide shots
in order to give the broadest possible picture of what is happening in
the classroom. Yet another might focus on the teacher or on the blackboard.
Because we want to study classroom instruction, not the videographers'
camera habits, it is important to develop standardized procedures for
using the camera and then to carefully train videographers to follow these
procedures. This study has done so, and the procedures are described in
the Methods section of this document.
The Problem of Observer Effects
What effect does the camera
have on what happens in the classroom? Will students and teachers behave
as usual with the camera present, or will we get a view that is biased
in some way? Might a teacher, knowing that she is to be videotaped, even
prepare a special lesson just for the occasion that is unrepresentative
of her normal practices?
This problem is not unique
to video studies. Questionnaires have the same potential for bias: Teachers'
questionnaire responses, as well as their behavior, may be biased toward
cultural norms. On the other hand, it may actually be easier to gauge
the degree of bias in video studies than in questionnaire studies. Teachers
who try to alter their behavior for the videotaping will likely show some
evidence that this is the case. Students, for example, may look puzzled
or may not be able to follow routines that are clearly new for them.
It also should be noted
that changing the way a teacher teaches is notoriously difficult to do,
as much of the literature on teacher development suggests. It is highly
unlikely that teaching could be improved significantly simply by placing
a camera in the room. On the other hand, teachers will obviously try to
do an especially good job, and may do some extra preparation, for a lesson
that is to be videotaped. We may, therefore, see a somewhat idealized
version of what the teacher normally does in the classroom.
Minimizing Bias Due to Observer Effects
This study used three
techniques for minimizing observer bias. First, instructions were standardized
across teachers. The goal of the research was clearly communicated to
the teacher in carefully written, standard instructions. Teachers were
told that the goal was to videotape a typical lesson, with "typical" defined
as whatever they would have been doing had the videographer not shown
up. Teachers were also explicitly asked to prepare for the target lesson
just as they would for a typical lesson. (A copy of information given
to teachers prior to the study is included as appendix A.)
Second, this study attempted
to assess the degree to which bias occurred. After the videotaping, teachers
were asked to fill out a questionnaire in which they rated, for example,
the typicality of what we would see on the videotape, and describe in
writing any aspect of the lesson they felt was not typical. We also asked
teachers whether the lesson in the videotape was a stand-alone lesson
or part of a sequence of lessons, and to describe what they had done in the
previous day's lesson and what they planned to do in the next day's lesson.
Lessons described as stand-alone
and as having little relation to the lessons on adjoining days would be
suspect for being special lessons constructed for the purpose of the videotaping.
In this study, however, lessons were rarely described in this way.
Finally, one must use
common sense in deciding the kinds of indicators that may be susceptible
to bias and taking this into account in interpreting the results of a
study. It seems likely, for example, that students will try to be on their
best behavior with a videographer present, and so we may not get a valid
measure from video of the frequency with which teachers must discipline
students. On the other hand, it is probably less likely that teachers
use a different style of questioning while being videotaped than they
would when the camera is not present. Some behaviors, such as the routines
of classroom discourse, are so highly socialized as to be automatic and
thus difficult to change.
Sampling and Validity
Observer effects are not
the only threat to validity of video survey data. Sampling--of schools,
teachers, class periods, lesson topics, and parts of the school year--is
a major concern.
One key issue is the number
of times any given teacher in the sample should be videotaped. This obviously
will depend on the level of analysis to be used. If we need a valid and
reliable picture of individual teachers, then we must tape the teacher
multiple times, as teachers vary from day to day in the kind of lesson
they teach, as well as in the success with which they implement the lesson.
If we want a school-level picture, or a national-level picture, then we
obviously can tape each teacher fewer times, provided we resist the temptation
to view the resulting data as indicating anything reliable about the individual
teacher.
On the other hand, taping
each teacher once limits the kinds of generalizations we can make about
instruction. Teaching involves more than constructing and implementing
lessons. It also involves weaving together multiple lessons into units
that stretch out over days and weeks. If each teacher is taped once, it
is not possible to study the dynamics of teaching over the course of a
unit. Inferences about these dynamics cannot necessarily be made, even
at the aggregate level, based on one-time observations.
Another sampling issue
concerns representativeness of the sample across the school year. This
is especially important in cross-national surveys where centralized curricula
can lead to high correlations of particular topics with particular months
of the year. In Japan, for example, the eighth-grade mathematics curriculum
devotes the first half of the school year to algebra, the second half
to geometry. Clearly, the curriculum would not be fairly represented by
taping in only one of these two parts of the year.
Finally, although at first
blush it may seem desirable to sample particular topics in the curriculum
in order to make comparisons more valid, in practice this is virtually
impossible. Especially across cultures, teachers may define topics so
differently that the resulting samples become less rather than more comparable.
Randomization appears to be the most practical approach to ensuring the
comparability of samples.
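One simple way to randomize when each sampled teacher is taped is to shuffle the teacher list and deal taping periods out in turn, so that every part of the school year receives a near-equal share of lessons. The sketch below illustrates that general idea only. It is not the sampling procedure used in this study, and the function name `assign_taping_periods`, the teacher identifiers, and the number of periods are hypothetical.

```python
import random
from collections import Counter


def assign_taping_periods(teacher_ids, n_periods, seed=0):
    """Randomly assign each teacher one taping period, numbered 1..n_periods.

    Teachers are shuffled and then dealt out round-robin, so period counts
    differ by at most one.
    """
    if n_periods < 1:
        raise ValueError("n_periods must be at least 1")
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    shuffled = list(teacher_ids)
    rng.shuffle(shuffled)
    return {teacher: (i % n_periods) + 1 for i, teacher in enumerate(shuffled)}


# Invented example: ten teachers spread over four periods of the school year.
teachers = [f"T{i:02d}" for i in range(10)]
schedule = assign_taping_periods(teachers, n_periods=4, seed=1)
period_counts = Counter(schedule.values())
print(schedule)
print(period_counts)
```

Under a scheme like this, no topic tied to a particular part of the year is systematically over- or under-represented, which is the concern raised above for centralized curricula.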
Confidentiality
The fact that images of
teachers and students appear on the tapes makes it more difficult than
usual to protect the confidentiality of study participants when the data
set is used for secondary analyses. An important issue, therefore, concerns
how procedures can be established to allow continued access to video data
by researchers interested in secondary analysis.
One option is to disguise
the participants by blurring their faces on the video. This can be accomplished
with modern-day digital video editing tools, but it is expensive at present
to do this for an entire data set. A more practical approach is to define
special access procedures that will enable us to protect the confidentiality
of participants while still making the videos available as part of a restricted-use
data set.
Logistics
In contrast to traditional
surveys, which require intensive and thorough preparation up front, the
most daunting part of video surveys is in the data management and analysis
phase. Information entered on questionnaires is more easily transformed
into computer readable format than is the case for video images. Thus,
it is necessary to find a means to index the contents of the hundreds
of hours of tape that can be collected in a video survey. Otherwise, the
labor involved in analyzing the tapes grows enormously.
Once data are indexed,
there is still the problem of coding. Coding of videotapes is notoriously
labor intensive. But there are strategies available for bringing
the task under control. The present study has developed specialized computer
software to help in this task. Emerging multimedia computing technologies
will, over the next several years, revolutionize the conduct of video
surveys, making them far more feasible than they have ever been in the
past.
HARNESSING THE POWER OF THE ANECDOTE
Anecdotes and images are
vivid and powerful tools for representing and communicating information.
One picture, it is said, is worth a thousand words. On the other hand,
anecdotes can be misleading and even completely unrepresentative of reality.
Furthermore, research in cognitive psychology has shown that the human
information processing system is easily misled by anecdotes, even in the
face of contradictory and far more valid information (e.g., Nisbett and
Ross, 1980). Methods of research design and inferential statistics were
developed, in fact, specifically to protect us from being misled by anecdotes
and experiences (Fisher, 1951).
A video survey, like the
one being described here, provides one possible way to resolve this tension
between anecdotes and statistics. Recognizing the power of video images,
one can harness this power in two ways. First, discoveries made through
qualitative analysis of the videos can be validated by statistical analysis
of the whole set of videos. For example, while watching a video we might
notice some interesting technique used by a Japanese teacher. If we only
had one video, it would be hard to know what to make of this observation:
Do Japanese teachers really use the technique more than U.S. teachers,
or did we just happen to notice one powerful example in the Japanese data?
Because we have a large sample of videos, we can turn our observation
into a hypothesis that can be validated against the database.
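Turning such an observation into a testable hypothesis might look like the sketch below, which compares the share of lessons in which a technique was coded as present in two countries. The two-proportion z-test is our choice for illustration and is not named in this report; the counts are invented; and the sketch ignores the sampling weights and design effects that a real analysis of a national sample would need.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two independent proportions.

    Returns (z, p_value). Uses the pooled proportion for the standard error.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are all 0 or all 1")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: lessons in which the technique was coded as present.
z, p_value = two_proportion_z_test(hits_a=30, n_a=50, hits_b=15, n_b=50)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that the difference seen in one striking video reflects a pattern across the sample rather than a single memorable case.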
In a complementary process,
we might, after coding and quantitative analysis of the video data, discover
a statistical relationship in the data. By returning to the actual video,
we can find concrete images to attach to our discovery, giving us a means
of further analysis and exploration, as well as a set of powerful images
that can be used to communicate the statistical discovery we have made.
Through this process we can uncover what the statistic means in practice.