Jack Buckley
Commissioner, National Center for Education Statistics

National Assessment of Educational Progress
NAEP 2009 Science Assessment

June 19, 2012

Introduction

In 2009, students at grades 4, 8, and 12 took the NAEP science assessment. During this exam, they responded not only to paper-and-pencil questions but also to different kinds of science tasks: hands-on tasks and interactive computer tasks. Through these tasks, students performed both real and simulated scientific experiments. Today, we will focus on this new generation of testing by exploring the types of tasks presented to students, how the students performed, and some common themes across the tasks.

A New Generation of Testing

For the hands-on tasks, approximately 2,000 students were assessed at each of grades 4, 8, and 12. Each student who participated in the assessment was presented with two 40-minute tasks. In these tasks, students were asked to perform real-world experiments and respond to the data they gathered and the observations they made.

For the interactive computer tasks, separate samples of approximately 2,000 students were assessed at each of grades 4, 8, and 12. At each grade, each assessed student was administered either two 20-minute tasks or one 40-minute task. Students were presented with simulations of experiments that allowed them to demonstrate a broad range of skills involved in doing science.

A computer-based setting also allowed us to bypass many of the logistical constraints associated with scientific experiments (e.g., the passing of long periods of time or safety concerns).

Samples and Results

The samples of students assessed were separate from those in the paper-and-pencil 2009 NAEP science assessment.

Results are available at the national level only. Student performance is reported in two ways:

• The percentage correct on individual tasks
• The percentage correct for all hands-on tasks or all interactive computer tasks

Because NAEP results are based on samples, there is a margin of error associated with each score or percentage. Therefore, we only report those differences in scores or percentages that meet our standards for statistical significance.
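As a rough illustration of what such a significance check involves, the sketch below compares two percentages using their standard errors. The function name, the 1.96 critical value (a two-sided test at the .05 level), the independence assumption, and the standard errors in the usage lines are all assumptions made for illustration. They are not taken from the NAEP report, and NAEP's actual procedures may differ.

```python
import math

def is_significant(pct_a, se_a, pct_b, se_b, critical=1.96):
    """Compare two percentages, assuming independent, roughly normal estimates.

    Returns (z, significant), where z is the difference divided by the
    standard error of the difference.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (pct_a - pct_b) / se_diff
    return z, abs(z) > critical

# Hypothetical standard errors (not from the report), in percentage points.
z_small_gap, small_gap_significant = is_significant(64, 2.0, 63, 2.0)
z_large_gap, large_gap_significant = is_significant(68, 2.0, 39, 4.0)
print(small_gap_significant, large_gap_significant)
```

Under these assumed standard errors, a 1-point gap would not be reported as a difference, while a 29-point gap would.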

What Students Are Doing in the Classroom, Grade 4

In many classrooms, students routinely perform the kinds of tasks that now can be assessed through NAEP interactive computer tasks (ICTs) and hands-on tasks (HOTs).

For example, 92 percent of fourth-graders had a teacher who reported doing hands-on activities with students at least monthly. And 39 percent had a teacher who reported at least a moderate emphasis on developing scientific writing skills. ICT and HOT tasks routinely require students to write explanations of their conclusions, based on evidence developed in the performance of the task.

What Students Are Doing in the Classroom, Grade 8

At grade 8, 98 percent of students had a teacher who reported doing hands-on activities with students at least monthly, while 57 percent had a teacher who reported at least a moderate emphasis on developing scientific writing skills.

What Students Are Doing in the Classroom, Grade 12

At grade 12, 51 percent of students reported designing a science experiment at least once every few weeks, while 28 percent said they wrote a report on a science project at least once a week.

Now I'll describe an example of a hands-on task and an interactive computer task. Both the hands-on tasks and interactive computer tasks include several different parts. For example, a single task can ask students to perform and respond to several different experiments. These experiments may contain several steps, and each of these steps may represent a student action toward making predictions, observations, or explanations. The task entitled "Maintaining Water Systems" shows how these parts all fit together.

Sample Hands-On Task: Maintaining Water Systems

Maintaining Water Systems is a grade 12 hands-on task. Students were asked to investigate the best site for building a new town based on the quality of the water supply.

Students were given laboratory equipment and materials for performing experiments. They were instructed to test water samples and observe the levels of specific pollutants.

They were then evaluated on their ability to present a reasonable, logical argument to support their choice of where to situate a new town. This task required students to predict, observe, and explain, and then to carry out two extended steps, described below.

Maintaining Water Systems: Predict

First, students were asked to predict—to make a preliminary recommendation for the site of a new town, based on information about the quality of water sources.

Sixty-four percent of students were able to explain their preliminary recommendations based on the materials provided in their kits.

When we look at breakdowns by gender, race/ethnicity, and eligibility for the National School Lunch Program (an indicator of family income):

• Sixty-three percent of male students and 64 percent of female students were able to explain their recommendations. These percentages are not significantly different.
• Sixty-eight percent of White students and 76 percent of Asian/Pacific Islander students successfully explained their recommendations, higher than the 39 percent of Black students and 58 percent of Hispanic students who were able to do so.
• And 65 percent of students not eligible for free and reduced-price lunch successfully explained their recommendation, higher than the 56 percent of eligible students who wrote successful explanations.

There were similar though not identical patterns on the individual performance items to follow. Because our samples here are smaller than on most NAEP assessments, particularly for the racial/ethnic groups, differences that appear large may not in fact be statistically significant.
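To see why a smaller sample can keep an apparently large gap from reaching significance, here is a minimal sketch that assumes simple random sampling. NAEP uses a more complex sample design, so this formula is a simplification. The function names and the group sizes are hypothetical and are not taken from the report.

```python
import math

def srs_standard_error(pct, n):
    """Standard error, in percentage points, of a percentage from a simple random sample of size n."""
    return math.sqrt(pct * (100.0 - pct) / n)

def gap_is_significant(pct_a, n_a, pct_b, n_b, critical=1.96):
    """Two-sided test at roughly the .05 level for two independent groups."""
    se_diff = math.sqrt(
        srs_standard_error(pct_a, n_a) ** 2 + srs_standard_error(pct_b, n_b) ** 2
    )
    return abs(pct_a - pct_b) / se_diff > critical

# The same hypothetical 10-point gap, with large and then small groups.
with_large_groups = gap_is_significant(60, 1000, 50, 1000)
with_small_groups = gap_is_significant(60, 50, 50, 50)
print(with_large_groups, with_small_groups)
```

With 1,000 students per group the 10-point gap clears the threshold, while with 50 per group it does not, even though the percentages are identical.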

Maintaining Water Systems: Observe

After the prediction stage, students performed water tests and evaluated data in comparison to national drinking water standards.

Seventy-five percent of students could make such observations and perform a straightforward investigation to test the water samples and accurately tabulate data.

• Seventy-four percent of male students and 76 percent of female students were able to perform water tests and accurately tabulate data.
• Eighty-four percent of White students successfully performed water tests and accurately tabulated data compared to 50 percent of Black students, 59 percent of Hispanic students, and 66 percent of Asian/Pacific Islander students. White students had a higher percentage than the other three racial/ethnic groups.
• And 79 percent of students not eligible for free and reduced-price lunch successfully completed this step compared to 62 percent of eligible students.

Maintaining Water Systems: Explain

Following their observations, students were asked to make a final recommendation and explain their recommendations with data gathered in the experiment.

Eleven percent of students were able to do so.

• Nine percent of male students were able to provide a valid final recommendation and explain their recommendations, and 13 percent of females were able to do so.
• Thirteen percent of White students and 17 percent of Asian/Pacific Islander students successfully provided and explained their final recommendation compared to 3 percent of Black students and 6 percent of Hispanic students.
• And 12 percent of students not eligible for free and reduced-price lunch successfully explained their recommendation compared to 5 percent of eligible students.

Variations in percent correct among these demographic groups ranged from 3 to 17 percent.

Students extended their inquiries by matching pollutants to specific water treatment processes.

Fourteen percent of students were able to correctly evaluate the treatment steps and select those needed to reduce pollutant levels to meet national drinking water standards.

• Seventeen percent of male students were able to do so compared to 11 percent of females.
• Seventeen percent of White students and 13 percent of Asian/Pacific Islander students were able to do so compared to 1 percent of Black students and 6 percent of Hispanic students.
• And 15 percent of students not eligible for free and reduced-price lunch successfully completed this step compared to 7 percent of eligible students.

In this part of the task, the percent correct ranged from 1 to 17 percent among the four racial/ethnic groups. White and Asian/Pacific Islander students had higher percentages than Hispanic students. For statistical reasons, we could not compare the performance of Black students with that of other student groups on this step of the task.

Maintaining Water Systems: Further Questions

In addition to the steps noted above in this task, students were asked to describe the processes used in water treatment by applying their knowledge of physical, chemical, and biological processes.

Twenty-eight percent of students successfully completed this step of the task.

• Thirty percent of male students and 27 percent of females were able to do so.
• Thirty-one percent of White students and 39 percent of Asian/Pacific Islander students were able to do so compared to 13 percent of Black students and 19 percent of Hispanic students.
• And 29 percent of students not eligible for free and reduced-price lunch successfully completed this step compared to 21 percent of eligible students.

Sample Interactive Computer Task: Mystery Plants

Today I'll describe one of the nine interactive computer tasks given in this assessment. Mystery Plants is a grade 4 task, in which students were asked to design and conduct three different experiments, with the difficulty increasing as they proceeded through the three experiments. They were given a series of simulations and asked to determine the following:

1. What are the best sunlight conditions for growth for Plant A (a sun-loving plant)?
2. What are the best sunlight conditions for growth for Plant B (a shade-tolerant plant)?
3. What are the best fertilizer amounts for growth for Plant A?

For this presentation, I will focus on Experiment 1, deciding on the optimal amount of sunlight for a sun-loving plant.

Mystery Plants Experiment 1: Predict

In the first stage of this experiment, students were asked questions that required them to use their prior knowledge to make predictions about plant growth under varying conditions.

Fourth-graders were asked how much sunlight plants need to grow well. They were presented with four options, the correct one being that "Different kinds of plants need different amounts of sunlight to grow well." Students who chose this answer were given credit for showing complex prior knowledge. They were then shown a brief tutorial about how to perform the experiment, showing them a simulated greenhouse with various sunlight levels: full sun, partial sun, and little sunlight.

Fifty-nine percent of students displayed complex prior knowledge, understanding that different plants have different sunlight needs.

• Fifty-nine percent of male students and 58 percent of females were able to display complex prior knowledge.
• Sixty-five percent of White students successfully displayed complex prior knowledge compared to 49 percent of Black students, 47 percent of Hispanic students, and 67 percent of Asian/Pacific Islander students.
• And 63 percent of students not eligible for free and reduced-price lunch successfully displayed complex prior knowledge compared to 52 percent of eligible students.

Percent correct for the different student groups ranged from 47 to 67 percent. White and Asian/Pacific Islander students had a higher percentage than the other two racial/ethnic groups.

Mystery Plants Experiment 1: Observe

Next, students were tasked with performing the experiment, and making observations about the outcomes.

Students were presented with the greenhouse with the three different lighting conditions and Plant A. Students placed Plant A in the trays with differing amounts of sunlight and observed how many flowers and leaves grew in each. Then they observed how Plant A's growth was affected by various amounts of sunlight, which the computer simulated. They were also able to see the growth results in a data table presented at the top of their computer screens. Students noted their observations about how the different amounts of sunlight affected the growth of Plant A.

Eighty percent of students correctly performed this step of Experiment 1.

• Eighty percent of male students and 79 percent of females were able to record their observations.
• Eighty-one percent of White students successfully recorded their observations, compared to 79 percent of Black students, 74 percent of Hispanic students, and 86 percent of Asian/Pacific Islander students.
• Eighty-one percent of students not eligible for free and reduced-price lunch and 78 percent of eligible students successfully recorded their observations.

The range across groups was 74 to 86 percent.

Mystery Plants Experiment 1: Explain

After making their investigations, students were required to select the correct conclusion for each investigation and provide an explanation for each.

To provide a correct answer to this step, based on their observations, students had to choose that Plant A was a sun-loving plant, and support their decision. They used the information that they observed throughout the experiment as to how different amounts of sunlight affected Plant A's growth.

Ninety-three percent of students correctly concluded that Plant A was a sun-loving plant. Thirty-six percent were able to support their conclusion with evidence from the experiment.

• Thirty-four percent of male students and 37 percent of female students were able to explain their conclusions.
• Forty-three percent of White students successfully explained their conclusion compared to 19 percent of Black students, 27 percent of Hispanic students, and 47 percent of Asian/Pacific Islander students.
• Forty-three percent of students not eligible for free and reduced-price lunch successfully explained their conclusion compared to 28 percent of eligible students.

The percentages of students able to support their conclusions ranged from 19 to 47 percent. White and Asian/Pacific Islander students had a higher percentage than the other two groups, and Hispanic students had a higher percentage than Black students.

Together, these steps made up Experiment 1 of the Mystery Plants task.

Student Performance Across Steps: Complex Knowledge

When we look at how students performed across the three steps in Experiment 1 (predict, observe, explain), we see some interesting patterns. A chart in the report and on the website shows the percentages of students who performed each stage of each task correctly or incorrectly. For example, in the Mystery Plants task, 59 percent of fourth-grade students showed complex prior knowledge in step 1. Looking just at those students, we see that 49 percent of all fourth-graders both showed complex knowledge and correctly completed step 2 (observe), and that 23 percent received complete credit on all three steps (predict, observe, explain).

However, 10 percent of students displayed complex knowledge but failed to complete the observation step correctly. That 10 percent breaks down into 8 percent of students who also failed to provide a complete answer for step 3 and 2 percent who did provide a complete answer for step 3.
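The percentages quoted above fit together as a simple breakdown of the students who showed complex prior knowledge. The sketch below restates those figures and checks that they are consistent. The variable names are hypothetical, and the two derived values are computed here from the rounded published percentages rather than stated in the text.

```python
# Percentages of all fourth-graders, as quoted for Mystery Plants Experiment 1.
complex_knowledge = 59             # showed complex prior knowledge (step 1)
complex_and_observed = 49          # ...and correctly completed step 2
full_credit_all_steps = 23         # ...and received complete credit on step 3
complex_not_observed = 10          # complex knowledge, step 2 not completed correctly
not_observed_incomplete_step3 = 8  # ...and no complete answer for step 3
not_observed_complete_step3 = 2    # ...but a complete answer for step 3

# Each group splits exactly into its two subgroups.
assert complex_and_observed + complex_not_observed == complex_knowledge
assert not_observed_incomplete_step3 + not_observed_complete_step3 == complex_not_observed

# Derived, not stated in the text: completed steps 1 and 2 without full credit on step 3.
observed_without_full_credit = complex_and_observed - full_credit_all_steps

# Derived: share of students with complex prior knowledge who earned full credit.
share_full_credit = full_credit_all_steps / complex_knowledge
print(observed_without_full_credit, round(share_full_credit * 100))
```

Read this way, roughly 39 percent of the students who started with complex prior knowledge went on to receive complete credit on all three steps.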

Student Performance Across Steps: No Prior Knowledge

In the Mystery Plants task, we found that 10 percent of students showed no prior knowledge and were unable to predict the relationship between sunlight and plant growth. Despite a lack of prior knowledge, 8 percent of students performed the experiment correctly and 2 percent of students could then go on to choose Plant A as a sun-loving plant and explain their conclusion.

What Did We Learn Across All The Tasks?

When we looked at results for all nine Interactive Computer Tasks and all six Hands-on Tasks, we identified a few broad patterns, or key discoveries.

Discovery 1

First of all, we saw that students were likely to be successful on parts of investigations that involved limited sets of data and making straightforward observations from those data. For example, 84 percent of grade 8 students could use a simulated laboratory to test two soil samples in an ICT called "Playground Soil." Seventy-five percent of twelfth-grade students could perform a similar investigation in the Maintaining Water Systems task.

Discovery 2

Students were challenged by parts of investigations that contained several variables to manipulate or involved strategic decision making to collect appropriate data. For example, in Experiment 3 of the Mystery Plants ICT, which required students to identify appropriate levels of fertilizer, 35 percent of students could select from nine fertilizer levels to test and determine correctly the level most conducive to the growth of a sun-loving plant. Similarly, 25 percent of grade 12 students could perform a challenging investigation on "Energy Transfer," an ICT that I did not describe earlier, which asked students to investigate thermal energy transfer between substances to determine the better metal for a cooking pan.

Discovery 3

In addition, students were able to arrive at correct conclusions from an investigation, but had difficulty when asked to explain their results. For example, in the "Mystery Plants" experiment that I just described, while 93 percent of grade 4 students could identify that Plant A was a sun-loving plant, only 36 percent could use evidence from their investigation to support that conclusion. Similarly, in a grade 8 ICT task, "Bottling Honey," 88 percent of students selected the correct conclusion regarding an experiment involving the flow of liquids, while 54 percent could support their conclusion.

How Did Students Perform on the Tasks?

For ICT tasks, the percentages of students who gave correct answers on the steps they attempted in the tasks were 42 percent for grade 4, 41 percent for grade 8, and 27 percent for grade 12.

We found a number of differences, and some non-differences, when comparing the performance of various student groups.

For example, there was no score gap between White and Asian/Pacific Islander students on the ICT or HOT tasks. However, on the main science assessment (i.e., the paper-and-pencil assessment reported earlier), White students in grades 4 and 8 scored higher than Asian/Pacific Islander students.

When we compared male and female students, we found that female students outscored male students on the HOT tasks, even though male students had a higher average score on the main science assessment. On the ICT tasks, male and female students scored about the same.

Students And Coordinators

During the assessment, we heard many enthusiastic comments about the ICT and HOT tasks from students and from the NAEP staff who conducted the assessments in the schools. Some students found the tasks more exciting than a regular assessment, and the assessment administrators felt this was generally true of the students they observed.