Research Report: 1994 NAEP U.S. History Group Assessment

September 1998

Authors: Madeline Goodman, Stephen Lazer, John Mazzeo, Nancy Mead, and Amy Pearlmutter

This report documents the National Assessment of Educational Progress (NAEP) special pilot study of group assessment. In 1994, NAEP administered U.S. history projects to a limited number of students. The purpose of this study was to investigate the feasibility of group assessment and to gain practical experience in the design, development, administration, and scoring of such instruments. The report first describes the development and conduct of the study. It then discusses practical lessons learned and makes recommendations regarding the future assessment of groups. Appendices include the testing instruments, scoring guides, and examples of student work.


Increasingly, teachers in our nation's schools are using group learning techniques. The use of cooperative learning and other heuristic models has led many educators and curriculum developers to believe that supplementing traditional instruction with activities that require students to help each other learn is more effective than relying on individual-based practices alone. In addition, a number of studies have suggested that cooperative and group learning experiences are more strongly associated with higher academic achievement than are individualistic or competitive instructional settings.[1] Many believe that, in addition to being an optimal means of helping students gain both knowledge and critical-thinking competencies, group learning provides students with the teamwork and leadership skills necessary for success in our changing economy.[2]

Because group instruction occupies an increasingly central place in American education, several analysts have argued that educational surveys should track students' ability to work in groups.[3] However, assessing group work presents special challenges. The measurement of interpersonal processes has played little part in traditional large-scale assessments, and the assessment community has a limited understanding of the issues and challenges involved in measuring groups rather than individual students. For example, assessment developers have little experience in crafting exercises that allow genuine group dynamics to emerge while remaining constrained enough to permit standardized administration and scoring. Nevertheless, as the National Assessment of Educational Progress (NAEP) evolves, the assessment of groups is likely to become an increasingly pressing concern. An initial study of the procedures and issues involved in group assessment under the auspices of NAEP therefore seemed appropriate.

When reviewing this report, the reader should keep two general points about this study in mind. First, the study focuses on the assessment of groups of students, and not on the importance, efficacy, or form of group and cooperative learning. The educational and workplace implications of group skills are clearly of great import; however, they are beyond the scope of this study.

Second, the NAEP U.S. history group assessment research study was initiated to obtain experience in the design, administration, scoring, and analysis of group assessment tasks. The principal intention of this study was not to obtain reliable results concerning group performance but to investigate feasibility and operational issues surrounding group-based assessments. The structure of this report reflects this emphasis on operational concerns rather than results: it deals primarily with the feasibility of administering and scoring group assessments, and it offers some concrete suggestions for future efforts to incorporate group tasks into large-scale assessment projects such as NAEP.

The organization of this report is therefore as follows. Chapter One describes the group assessment tasks, the characteristics of the participants in the study, and the scoring of group processes. It also presents some results on the level of performance observed on the tasks, the reliability of ratings, and other characteristics associated with the tasks. Chapter Two discusses the practical lessons learned about the development, administration, scoring, and analysis of group history tasks. Chapter Three briefly summarizes the project. Appendices A and B provide copies of the administration scripts, the materials used for the tasks, the tasks themselves, and some samples of actual group responses to the written tasks.

Two group assessment projects were developed for students in grade 8. Each project consisted of a set of structured tasks to be carried out by groups of four or six students. Groups produced a series of concrete, written products (e.g., charts, lists, descriptions, or explanations). Each group was also videotaped as it performed the tasks. Tapes were later analyzed and scored by raters.

Thirty-six of the grade 8 schools that participated in the 1994 NAEP assessment, representing a range of sizes and types of communities, were recruited to participate in this special study. In each school, two groups of students were identified, one for each group project. Students were selected at random from those who participated in the 1994 NAEP U.S. history assessment, so that the results of the special study could be linked back to the main assessment findings.

The participants in the NAEP U.S. history group assessment study were not a statistically representative sample of students. Schools were selected to represent a variety of settings and types, but they were chosen from a group of schools that volunteered to participate in the project. Within schools, students were selected at random from those who participated in the 1994 NAEP U.S. history assessment. However, selected students could participate only if they returned a form signed by a parent giving permission to take part in a videotaped assessment activity. The rate of return of permission slips was disappointingly low, and the characteristics of the participating students suggest that they were not a representative group.

Since the group products offered particular challenges not normally associated with the scoring of individually produced constructed responses, part of the purpose of the study was to identify new scoring procedures for group assessment work. Two distinct types of ratings were assigned to each group. The first type was intended to measure the quality of the written products generated by each group. These ratings generally addressed the content-specific aspects of the group projects, that is, the extent and quality of the historical knowledge that groups of students were able to demonstrate when confronted with the historical tasks. Each group product was evaluated independently by two trained scorers according to criteria set forth in standardized scoring rubrics. Criteria used for rating purposes included quality of historical thinking, historical correctness and accuracy, and completeness of responses.

The second type of rating was intended to measure the communicative behavior exhibited by each group in carrying out the project tasks. Observational protocols were used to record and evaluate the communication that occurred within the groups. Group communication was rated in terms of the degree of group participation, the quality of the discussion related to the content of the task, and the extent to which the group worked in an organized fashion. Two observers independently evaluated the group communication as it occurred. Later, two raters independently rated the group communication recorded on videotapes made at the time of the administration. Finally, experienced raters reviewed the videotapes to obtain additional descriptive information, including evidence of a dominant personality influencing group processes or products, and a comparison of what was said with what was written on the task sheets.
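Because both the written products and the group communication were rated by two independent raters, a natural check is how often the two raters agree. This section does not say which agreement statistic the study used. The following is only a minimal sketch, assuming each rater assigns one integer rubric score per group. It computes exact agreement and Cohen's kappa, a common chance-corrected index that is not necessarily the one applied in this study. The scores in the usage lines are hypothetical and are not study data.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Proportion of groups to which both raters assigned the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired by group")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = exact_agreement(rater_a, rater_b)  # observed agreement
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Expected agreement if each rater scored independently at their own base rates.
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_e == 1.0:
        # Both raters used the same single score for every group; kappa is
        # undefined here, so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical rubric scores (1-4) for six groups from two raters.
    rater_1 = [3, 2, 4, 1, 3, 2]
    rater_2 = [3, 2, 3, 1, 3, 1]
    print(round(exact_agreement(rater_1, rater_2), 3))  # 0.667
    print(round(cohens_kappa(rater_1, rater_2), 3))     # 0.538
```

Kappa is lower than raw agreement because some matches would be expected by chance alone, given how often each rater used each score.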

