
Sampling in the United States

In the United States and most other education systems, the target population of students corresponded to students in grade 4. The U.S. used a two-stage stratified cluster sampling design to sample the target population. The U.S. sampling frame was explicitly stratified by three categorical stratification variables: poverty status (high or low),6 type of school (public or private), and region of the country (Northeast, Midwest, South, West).7 The U.S. sample was implicitly stratified (that is, sorted for sampling) by two categorical stratification variables: locality (four levels),8 and minority status (above or below 15 percent of the student population).
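The distinction between explicit and implicit stratification can be illustrated with a short Python sketch. It is an illustration only, not the procedure used to build the U.S. frame: the field names (`poverty`, `school_type`, `region`, `locale`, `minority_above_15`) are hypothetical, and a plain ascending sort is assumed for the implicit variables because the sort order is not described here.

```python
from collections import defaultdict

def explicit_stratum(school):
    # Explicit strata: schools are sampled separately within each combination.
    return (school["poverty"], school["school_type"], school["region"])

def implicit_sort_key(school):
    # Implicit strata: schools are only sorted by these variables before sampling.
    return (school["locale"], school["minority_above_15"])

def build_stratified_frame(schools):
    """Group schools by explicit stratum, then sort each group for sampling."""
    strata = defaultdict(list)
    for school in schools:
        strata[explicit_stratum(school)].append(school)
    return {key: sorted(group, key=implicit_sort_key)
            for key, group in strata.items()}
```

Sorting within each explicit stratum means that a sample drawn at regular intervals down the list is spread across locales and minority-status groups without those variables defining separate strata.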

The first stage selected schools for the original sample using probability proportional to size (PPS). The school sampling frame was constructed using the 2011-2012 Common Core of Data (CCD) and 2009-2010 Private School Universe Survey (PSS). Schools were selected with a probability proportionate to the school's estimated enrollment of grade 4 students. In addition, for each original school selected, the two neighboring schools in the sampling frame were designated as substitute schools. For each school, a random selection was used to determine whether the preceding or following substitute school was used as the first substitute. If an original school refused to participate, the first substitute was contacted. If that school also refused to participate, the second substitute was contacted. There were several constraints on the assignment of substitutes. One sampled school was not allowed to substitute for another, and a given school could not be assigned to substitute for more than one sampled school. Furthermore, substitutes were required to be in the same implicit stratum as the sampled school.
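The first-stage selection and the substitute rules can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual sampling software: it assumes a systematic PPS draw with a random start down the sorted frame (the text says only that selection was PPS), it assumes no school is larger than the sampling interval, and the function names `pps_systematic_sample` and `assign_substitutes` are hypothetical.

```python
import random

def pps_systematic_sample(sizes, n, rng):
    """Return frame indices of n schools drawn with probability proportional to size.

    sizes: estimated grade 4 enrollment per school, in sorted frame order.
    Assumes no single size exceeds the sampling interval.
    """
    interval = sum(sizes) / n
    start = rng.random() * interval              # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]
    selected, cumulative, t = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # A school is selected when a target falls inside its size range.
        while t < n and targets[t] < cumulative:
            selected.append(idx)
            t += 1
    return selected

def assign_substitutes(implicit_strata, selected, rng):
    """Map each sampled index to its ordered substitutes (first, then second).

    Candidates are the preceding and following schools in the frame. A candidate
    is dropped if it is itself sampled, already serves as a substitute, or lies
    in a different implicit stratum than the sampled school.
    """
    sampled, used, result = set(selected), set(), {}
    for i in selected:
        candidates = [
            j for j in (i - 1, i + 1)
            if 0 <= j < len(implicit_strata)
            and j not in sampled
            and j not in used
            and implicit_strata[j] == implicit_strata[i]
        ]
        rng.shuffle(candidates)                  # random choice of first substitute
        used.update(candidates)
        result[i] = candidates
    return result
```

Under these constraints a sampled school can end up with fewer than two substitutes, for example when a neighbor falls in a different implicit stratum.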

The second stage consisted of selecting intact 4th-grade classes within each participating school. Schools provided lists of 4th-grade classrooms. Within schools, classrooms with fewer than 15 students were collapsed into pseudo-classrooms so that each classroom in the school's classroom sampling frame had at least 20 students.9 An equal probability sample of one or two classrooms (including pseudo-classrooms) was identified from the classroom frame for the school. In schools where there was only one classroom, this classroom was selected with certainty. For PIRLS 2016, 10 pseudo-classrooms were created prior to classroom sampling, with 6 of these being selected in the final classroom sample. All students in sampled classrooms and pseudo-classrooms were selected for assessment.
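The second-stage steps can be sketched in Python as follows. The thresholds of 15 and 20 students come from the text; everything else is an assumption made for illustration. In particular, the greedy rule for combining small classes, the handling of a leftover small class, and the function names are hypothetical, and classes with 15 to 19 students are simply left as their own units because their treatment is not described.

```python
import random

SMALL_THRESHOLD = 15   # classes below this size are collapsed
MIN_UNIT_SIZE = 20     # target minimum size of a pseudo-classroom

def build_classroom_frame(class_sizes):
    """Return sampling units, each a list of class names.

    class_sizes maps a class name to its enrollment.
    """
    def unit_size(unit):
        return sum(class_sizes[c] for c in unit)

    units = [[c] for c, n in class_sizes.items() if n >= SMALL_THRESHOLD]
    small = sorted((c for c, n in class_sizes.items() if n < SMALL_THRESHOLD),
                   key=class_sizes.get)
    current = []
    for name in small:
        current.append(name)
        if unit_size(current) >= MIN_UNIT_SIZE:
            units.append(current)        # pseudo-classroom is large enough
            current = []
    if current:
        if units:
            # Assumed rule: attach leftover small classes to the smallest unit.
            min(units, key=unit_size).extend(current)
        else:
            units.append(current)
    return units

def select_classrooms(units, k, rng):
    """Equal probability sample of up to k units; a lone unit is taken with certainty."""
    return rng.sample(units, min(k, len(units)))
```

Every student in a selected unit is assessed, so a selected pseudo-classroom brings in all of its constituent small classes.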

The PIRLS 2016 sample included 176 public and private schools. For each of the 176 schools, two replacement schools were also selected. The replacements were only contacted about the study if the original school refused to participate.


6 High poverty schools are defined as having 76% or more of the students eligible for participation in the National School Lunch Program (NSLP), and low poverty schools have less than 76% of students eligible for NSLP. Private schools are all classified as low poverty because no NSLP information is available.
7 The Northeast region consists of Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont, New Jersey, New York, and Pennsylvania. The Midwest region consists of Indiana, Illinois, Michigan, Ohio, Wisconsin, Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota. The West region consists of Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming. The South region consists of Delaware, the District of Columbia, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, Alabama, Kentucky, Mississippi, Tennessee, West Virginia, Arkansas, Louisiana, Oklahoma, and Texas.
8 Schools were classified into four locales: City, Suburb, Town, and Rural.
9 Since classrooms are sampled with equal probability within schools, small classrooms would have the same probability of selection as large classrooms. Selecting classrooms under these conditions would likely reduce the student sample size and introduce some instability in the sampling weights. To avoid these problems, pseudo-classrooms are created for the purposes of classroom sampling, in which small classrooms are joined to reach a larger student count. These pseudo-classrooms are treated as single classes in the class sampling process. Following class sampling, the pseudo-classroom combinations are dissolved and the small classes involved retain their own identity. In this way, data on students, teachers, and classroom practices are linked in small classes in the same way as with larger classes.