NCES Blog

National Center for Education Statistics

Celebrate LGBTQ+ Pride Month With NCES

Sexual minorities are people whose sexual orientation is something other than straight or heterosexual.

Gender minorities are people whose sex as recorded at birth is different from their gender.

June is LGBTQ+ Pride Month, and NCES is proud to share some of the work we have undertaken to collect data on the characteristics and well-being of sexual and gender minority (SGM) people. Inclusion of questions about sexual orientation and gender identity on federal surveys allows for a better understanding of SGM people relative to the general population. These questions generate data to inform the development of resources and interventions to better serve the SGM community. Giving respondents the opportunity to describe themselves and bring their “whole self” to a questionnaire also helps them to be more fully seen and heard by researchers and policymakers.

Sometimes, we get asked why questions like these appear on education surveys. They can be sensitive questions for some people, after all. We ask them so we can better understand educational equity and outcomes for SGM people, just as we do for other demographic groups, such as those defined by race, ethnicity, household income, and region of the country. And just as with those groups, SGM students and educators may have experiences that are unique compared with those of their peers.

Over the past 10 years, NCES has researched how best to ask respondents about their sexual orientation and gender identity, how respondents react to these questions, and the quality of the data NCES has collected in questionnaires and datasets that include sexual orientation and gender identity information.

Several NCES studies include background questions for adults about their sexual orientation and gender identity, including the High School Longitudinal Study of 2009 (HSLS:09) Second Follow-up in 2016, the Baccalaureate and Beyond Longitudinal Study (B&B) 08/18 and 16/21 collections, the National Postsecondary Student Aid Study (NPSAS) in 2020, the Beginning Postsecondary Students Longitudinal Study (BPS) 20/22 and 20/25 collections, and the 2023–24 National Teacher and Principal Survey. In addition, the School Crime Supplement (SCS) to the National Crime Victimization Survey (NCVS), conducted by the Bureau of Justice Statistics and sponsored by NCES, asks students several questions pertinent to SGM experiences. For example, the SCS asks students whether they were bullied due to their gender or sexual orientation and whether they experienced hate speech related to their gender or sexual orientation. As participants in the NCVS, students ages 16 and older who respond to the SCS also report their gender identity and sexual orientation. Collectively, these data allow NCES to describe the experiences of students who identify as sexual and gender minorities.

  • As of 2021, ninth-graders in 2009 who identified as bisexual and those who identified as questioning had left postsecondary education without degrees or credentials at higher rates than students in other groups from that cohort, and they had earned bachelor’s or higher degrees at lower rates than other students.1
     
  • In 2020, some 9 percent of students who identified as genderqueer, gender nonconforming, or a different identity had difficulty finding safe and stable housing, three times the rate of students who identified as male or female (3 percent each).2
     
  • In 2018, about 10 years after completing a 2007–08 bachelor’s degree, graduates who were gender minorities3 described their financial situations. Graduates who were gender minorities were less likely to own a home (31 percent) or hold a retirement account (74 percent) than graduates who were not gender minorities (63 percent and 87 percent, respectively).4
     
  • Among 2008 bachelor’s degree graduates with a full-time job in 2018, those who were straight reported higher average salaries than those who were either lesbian/gay or bisexual.
     
  • In the 2017–18 school year, 18 percent of public schools had a recognized student group that promoted the acceptance of students’ sexual orientation and gender identity, such as a Gay-Straight Alliance (GSA). This was an increase from the 2015–16 school year, when 12 percent of schools reported having such a group.5
     
  • Among all students ages 12–18 in grades 6–12 who reported being bullied (19 percent), the percentage who reported being bullied due to their sexual orientation more than doubled from 2017 (4 percent) to 2022 (9 percent).6 That change was driven primarily by female students, for whom the percentage more than tripled from 2017 to 2022 (from 4 to 13 percent), while the percentage of bullied male students who reported being bullied for their sexual orientation was not statistically significantly different across the period (3 percent in 2017 and 4 percent in 2022).

Figure 1. Among students ages 12–18 enrolled in grades 6–12 who reported being bullied, percentage who reported that they thought the bullying was related to their sexual orientation: 2017, 2019, and 2022

! Standard error for this estimate is 30 to 50 percent of the estimate’s value.

* Statistically significantly different (p < .05) from 2022. 


NCES is committed to collecting data about equity in education and describing the experiences of all students and educators, including SGM people.

To learn more about the research conducted at NCES and across the federal statistical system on the measurement of sexual orientation and gender identity, visit nces.ed.gov/FCSM/SOGI.asp.

Plus, be sure to follow NCES on X, Facebook, LinkedIn, and YouTube and subscribe to the NCES News Flash to stay informed when resources with SGM data are released.

 

By Elise Christopher, Maura Spiegelman, and Michael McGarrah, NCES


[1] SOURCE: Christopher, E. M. (2024). Disparities in postsecondary outcomes for LGBTQ+ individuals: New evidence from the High School Longitudinal Study of 2009. Presented at the American Educational Research Association Annual Meeting, Philadelphia, PA.

[2] SOURCE: U.S. Department of Education, National Center for Education Statistics, 2019–20 National Postsecondary Student Aid Study (NPSAS:20, preliminary data).

[3] On the NCES surveys mentioned above, gender identity categories include male; female; transgender, male-to-female; transgender, female-to-male; genderqueer or gender nonconforming; a different gender identity; and more than one gender identity.

[4] SOURCE: U.S. Department of Education, National Center for Education Statistics, 2008/18 Baccalaureate and Beyond Longitudinal Study (B&B:08/18).

[5] SOURCE: U.S. Department of Education, National Center for Education Statistics, 2015–16 and 2017–18 School Survey on Crime and Safety (SSOCS).

[6] SOURCE: U.S. Department of Education, National Center for Education Statistics, 2017, 2019, and 2022 School Crime Supplement (SCS) to the National Crime Victimization Survey (NCVS).

 

Celebrating the ECLS-K:2024: Participating Children Are the Key to Increasing Our Knowledge of Children’s Education and Development Today

As we highlighted in our March blog post, NCES is excited to be in the field for the base-year data collection of our newest national early childhood study, the Early Childhood Longitudinal Study, Kindergarten Class of 2023–24 (ECLS-K:2024). Although the study collects much needed data from a variety of adult respondents (i.e., parents/guardians, teachers, and school administrators), the heart of the ECLS-K:2024 is our data collection with the participating children.

With the permission of their parent/guardian, the children take part in the ECLS-K:2024 child session activities, answering engaging, age-appropriate reading and math questions during one-on-one sessions with trained ECLS-K:2024 team members (watch an example of children participating in the child activities). These ECLS-K:2024 child sessions are planned for every currently expected round of data collection, starting with the fall and spring of the current school year (2023–24) when the children are in kindergarten.

Although the child sessions look pretty similar every year, there are some changes to the activities as the children age. For example, in kindergarten and first grade, we include memory-related items in the sessions; we then swap out these items for child surveys in the later rounds, when children are in higher grades. In prior ECLS kindergarten cohort studies, child surveys included items on topics such as children’s sense of school belonging; worry or stress about school; media usage; and peer relationships. Explore the items we asked in the child surveys in the ECLS-K:2024’s sister studies, the ECLS-K and ECLS-K:2011. Many of these items will likely be asked again of the children participating in ECLS-K:2024. Also, in past studies, children had their height and weight measured to provide information about early physical development; this study component returns to the ECLS-K:2024’s spring data collection in some schools.

Child-provided data from the earlier ECLS program studies have been used extensively. A recent analysis conducted by the ECLS program team found that more than 1,000 studies and reports published between 2000 and 2021 have analyzed the ECLS academic skills and school performance data, with more than 80 percent of those utilizing the child assessment data. For example, NCES published a report on reading achievement over children’s early elementary school years using the ECLS-K reading assessment data. Use NCES’s Bibliography Search Tool to explore these reports (select “ECLS” from the Data Source drop-down menu).

If you’re instead interested in exploring trend data, research has been conducted on the differences between children who were in kindergarten during the 1998–99 and 2010–11 school years (use the Bibliography Search Tool to find reports on this topic). Additional research comparing kindergartners in the 1998–99 and 2010–11 school years with kindergartners in the 2023–24 school year is expected after this year’s ECLS-K:2024 data collection. Once the ECLS-K:2024 collection concludes, NCES will produce data files—made available to the public with deidentified data—that will allow for direct comparisons between these three groups of children. Our understanding of how the abilities of today’s kindergartners vary from those of kindergartners in the late 1990s and early 2010s relies on the participation of children in the ECLS-K:2024.  

Of course, it’s not just the children’s reading and mathematics data that will provide answers to key questions about education and child development. All the data the ECLS-K:2024 children are providing now in the study’s base year—as well as the data they will provide as they advance through their elementary years—will help inform our understanding of what today’s children know and can do.

On behalf of the thousands of researchers, policymakers, educators, and parents who rely on the important data provided by the ECLS-K:2024’s youngest contributors—thank you to our ECLS-K:2024 children!

Want to learn more?


Next up in this blog series celebrating the ECLS-K:2024, we’ll highlight the study’s parents and families. Keep an eye out this summer!

 

By Jill McCarroll and Korrie Johnson, NCES

Data on the High School Coursetaking of American Indian and Alaska Native Students

Understanding the racial/ethnic equity of educational experiences is a vital objective. The National Assessment of Educational Progress (NAEP) High School Transcript Study (HSTS) collects and analyzes transcripts from a nationally representative sample of America’s public and private high school graduates, including information about the coursetaking of students by race/ethnicity.

In 2019, NCES collected and coded high school transcript data from graduates who participated in the grade 12 NAEP assessments. The participants included American Indian and Alaska Native (AI/AN) students as well as students from other racial/ethnic groups. The main HSTS 2019 results do not include AI/AN findings because the sample sizes for AI/AN students in earlier collection periods were too small to report NAEP performance linked to coursetaking measures. Therefore, this blog post serves to highlight available AI/AN data. Find more information about NAEP's race/ethnicity categories and trends.
 

About HSTS 2019

The 2019 collection is the eighth wave of the study, which was first conducted in 1987; the previous wave was conducted in 2009. Data from 1990, 2000, 2009, and 2019—representing approximately decade-long spans—are discussed here. Data from HSTS cover prepandemic school years.
 

How many credits did AI/AN graduates earn?

As was true for all racial/ethnic groups, the average number of Carnegie credits AI/AN graduates earned in 2019 was higher than in 2009 and earlier decades (figure 1). AI/AN graduates earned 27.4 credits on average in 2019, an increase from 23.0 credits in 1990. However, AI/AN graduates earned fewer overall credits in 2019 than did Asian/Pacific Islander, Black, and White graduates, a pattern consistent with prior decades.


Figure 1. Average total Carnegie credits earned by high school graduates, by student race/ethnicity: Selected years, 1990 through 2019 


Horizontal bar chart showing average total Carnegie credits earned by high school graduates by student race/ethnicity in selected years from 1990 through 2019.

* Significantly different (p < .05) from American Indian/Alaska Native group in the given year.                                                              
+ Significantly different (p < .05) from 2019 within racial/ethnic group.                                                   
NOTE: Race categories exclude Hispanic origin. Black includes African American, Hispanic includes Latino, and Pacific Islander includes Native Hawaiian.                                                               
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP) High School Transcript Study, various years, 1990 to 2019.


In 2019, the smaller number of total credits earned by AI/AN graduates—compared with graduates in other racial/ethnic groups—was driven by the smaller number of academic credits earned. On average, AI/AN graduates earned about 1 to 3 fewer academic credits (19.3 credits) than graduates in other racial/ethnic groups (e.g., 22.2 for Asian/Pacific Islander graduates and 20.6 for Hispanic graduates) (figure 2). In contrast, AI/AN graduates earned more credits, or a similar number, in career and technical education (CTE) (3.6 credits) and other courses (4.5 credits) compared with graduates in other racial/ethnic groups.


Figure 2. Average Carnegie credits earned by high school graduates in academic, career and technical education (CTE), and other courses, by student race/ethnicity: 2019


Horizontal bar chart showing average Carnegie credits earned by high school graduates in academic, career and technical education (CTE), and other courses by student race/ethnicity in 2019

* Significantly different (p < .05) from American Indian/Alaska Native group.                                                                            
NOTE: Race categories exclude Hispanic origin. Black includes African American, Hispanic includes Latino, and Pacific Islander includes Native Hawaiian.                                                                                                                                                            
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP) High School Transcript Study, 2019.         
  



What was the grade point average (GPA) of AI/AN graduates?

As with credits earned, GPA has generally trended upward since 1990. AI/AN graduates had an average GPA of 2.54 in 1990 and an average GPA of 3.02 in 2019 (figure 3). Unlike with credits earned, however, the average GPA for AI/AN graduates in 2019 fell in the middle of the range for other racial/ethnic groups: it was lower than the GPAs for Asian/Pacific Islander and White graduates and higher than the GPAs for Black and Hispanic graduates.


Figure 3. Average overall grade point average (GPA) earned by high school graduates, by student race/ethnicity: Selected years, 1990 through 2019


Horizontal bar chart showing average overall grade point average (GPA) earned by high school graduates by student race/ethnicity in selected years from 1990 through 2019.

* Significantly different (p < .05) from American Indian/Alaska Native group in the given year.                                            
+ Significantly different (p < .05) from 2019 within racial/ethnic group.                                                                                       
NOTE: Race categories exclude Hispanic origin. Black includes African American, Hispanic includes Latino, and Pacific Islander includes Native Hawaiian.                                                                                                                                                            
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP) High School Transcript Study, various years, 1990 to 2019.



What curriculum level did AI/AN graduates reach?

HSTS uses curriculum levels to measure the rigor of high school graduates’ coursework as a potential indicator of college preparedness. There are three curriculum levels: standard, midlevel, and rigorous. Students who did not meet the requirements for a standard curriculum are considered to have a “below standard” curriculum.

Reflecting the smaller numbers of academic credits earned by AI/AN graduates, as described above, a lower percentage of AI/AN graduates reached the rigorous level (the highest level): only 5 percent of AI/AN graduates had completed a rigorous curriculum in 2019, compared with 10 percent of Hispanic, 13 percent of White, and 28 percent of Asian/Pacific Islander graduates (table 1). Similarly, a lower percentage of AI/AN graduates completed a midlevel curriculum than did White, Black, or Hispanic graduates. At the standard and below-standard levels, therefore, AI/AN graduates were overrepresented relative to most other groups.


Table 1. Percentage distribution of high school graduates across earned curriculum levels, by student race/ethnicity: 2019

Table showing the percentage distribution of high school graduates across earned curriculum levels (below standard, standard, midlevel, and rigorous) by student race/ethnicity in 2019.

* Significantly different (p < .05) from American Indian/Alaska Native group.
NOTE: Details may not sum to total due to rounding. A graduate who achieves the standard curriculum earned at least four Carnegie credits of English and three Carnegie credits each of social studies, mathematics, and science. A graduate who achieves a midlevel curriculum earned at least four Carnegie credits in English, three Carnegie credits in mathematics (including credits in algebra and geometry), three Carnegie credits in science (including credits in two of the three subjects of biology, chemistry, and physics), three Carnegie credits in social studies, and one Carnegie credit in world languages. A graduate who achieves a rigorous curriculum earned at least four Carnegie credits in English, four Carnegie credits in mathematics (including credits in precalculus or calculus), three Carnegie credits in science (including credits in all three subjects of biology, chemistry, and physics), three Carnegie credits in social studies, and three Carnegie credits in world languages. Graduates whose curriculum does not meet the requirements for the standard level are classified as “Below standard.” Race categories exclude Hispanic origin. Black includes African American, Hispanic includes Latino, and Pacific Islander includes Native Hawaiian.
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP) High School Transcript Study, 2019.
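
As a rough illustration of how the credit requirements in the table note map onto the four curriculum levels, here is a minimal sketch. The inputs (summary credit counts and a few flags) and parameter names are simplified assumptions for illustration; the actual HSTS classification is derived from coded transcript records.

```python
def curriculum_level(english, math, science, social_studies, world_lang,
                     has_alg_and_geom=False, has_precalc_or_calc=False,
                     core_sciences=0):
    """Classify a graduate's curriculum level from Carnegie credits earned,
    following the definitions in the table note. `core_sciences` counts how many
    of biology, chemistry, and physics the graduate earned credits in."""
    rigorous = (english >= 4 and math >= 4 and has_precalc_or_calc
                and science >= 3 and core_sciences == 3
                and social_studies >= 3 and world_lang >= 3)
    midlevel = (english >= 4 and math >= 3 and has_alg_and_geom
                and science >= 3 and core_sciences >= 2
                and social_studies >= 3 and world_lang >= 1)
    standard = (english >= 4 and math >= 3 and science >= 3 and social_studies >= 3)
    if rigorous:
        return "rigorous"
    if midlevel:
        return "midlevel"
    if standard:
        return "standard"
    return "below standard"

# Example: 4 English, 3 math (with algebra and geometry), 3 science credits covering
# biology and chemistry, 3 social studies, and 1 world language credit -> "midlevel"
print(curriculum_level(4, 3, 3, 3, 1, has_alg_and_geom=True, core_sciences=2))
```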


Explore the HSTS 2019 website to learn more about the study, including how courses are classified, how grade point average is calculated, and how race/ethnicity categories have changed over time. Be sure to follow NCES on X, Facebook, LinkedIn, and YouTube and subscribe to the NCES News Flash to stay informed about future HSTS data and resources.

 

By Ben Dalton, RTI International, and Robert Perkins, Westat

Leveraging Economic Data to Understand the Education Workforce

The Digest of Education Statistics recently debuted 13 new tables on K–12 employment and wages from a data source that is new to the Digest—the Occupational Employment and Wage Statistics (OEWS) program of the Bureau of Labor Statistics (BLS). NCES’s Annual Reports and Information Staff conducted an extensive review of existing and emerging data sources and found that BLS’s OEWS program provides high-quality, detailed, and timely data that are suitable to inform policymaking in education and workforce development.1 In this blog post, we share why we added this new data source, how we evaluated and prepared these data, and our future plans to expand on these efforts.

 

Need for Education Workforce Data

NCES recognized that education stakeholders need more granular and timely data on the condition of the education workforce to inform decisionmaking. In the wake of the coronavirus pandemic, school districts are looking to address critical staffing needs. According to NCES’s School Pulse Panel, entering the 2023–24 school year, just under half of U.S. public schools reported feeling understaffed and needing special education teachers, transportation staff, and mental health professionals.

Since staffing needs and labor markets vary from district to district and state to state, it is important that we create national- and state-level tabulations for specific occupations, including those of special interest since the pandemic, like bus drivers, social workers, and special education teachers. Similarly, we want to be able to provide annual data updates so stakeholders can make the most up-to-date decisions possible.

Annual Digest table updates, coupled with detailed occupational and state-level data, will provide relevant and timely information on employment and wage trends that will be valuable in current and future efforts to address teacher and staff retention and recruitment. See below for a list of the new Digest tables.

  • National-level employment and annual wages
      ◦ Selected teaching occupations (211.70)
      ◦ Selected noninstructional occupations (213.70)
  • State-level employment and annual wages
      ◦ Preschool teachers (211.70a)
      ◦ Kindergarten teachers (211.70b)
      ◦ Elementary school teachers (211.70c)
      ◦ Middle school teachers (211.70d)
      ◦ Secondary school teachers (211.70e)
      ◦ Kindergarten and elementary special education teachers (211.70f)
      ◦ Middle school special education teachers (211.70g)
      ◦ Secondary school special education teachers (211.70h)
      ◦ Substitute teachers (211.70i)
      ◦ Teaching assistants (211.70j)
      ◦ All occupations in the Elementary and Secondary Education industry (213.75)

 

Strengths of OEWS

OEWS and the Digest tables are aligned with the Federal Committee on Statistical Methodology’s Data Quality Framework, specifically the principles of objectivity (standardization), utility (granularity and timeliness), and integrity (data quality).


Standardization

OEWS produces employment and wage estimates using standardized industry and occupational classifications. Using the North American Industry Classification System, establishments are grouped into categories—called industries—based on their primary business activities. Like industries, occupations are organized into groups or categories based on common job duties (using the Standard Occupational Classification). Occupations that are common to K–12 schools can also be found in other industries, and OEWS provides both cross-industry estimates and industry-specific estimates for just the Elementary and Secondary Education industry. To provide the most relevant and comparable data for education stakeholders, NCES chose to focus on distinct occupational estimates for the Elementary and Secondary Education industry, since all establishments in that industry (e.g., school boards, school districts) provide the same services: instruction or coursework for basic preparatory education (typically K–12).2

Another advantage of the OEWS data is the ability to examine specific detailed occupations, like elementary school teachers, secondary school teachers, and education administrators. Digest tables include estimates for specific instructional and noninstructional occupations, which allows users to make comparisons among teachers and staff with similar job responsibilities, providing opportunities for more targeted decisionmaking.


Granularity

In addition to detailed occupations, OEWS provides data at the national and state levels, allowing for comparisons across geographies. National-level Digest tables include estimates for public and private education employers.3 Publicly funded charter schools run by private establishments are included in private ownership estimates, as they can be managed by parents, community groups, or private organizations. Public ownership is limited to establishments that are run by federal, state, or local governments. State-level Digest tables provide more localized information covering labor markets for the 50 states, the District of Columbia, Puerto Rico, Guam, and the U.S. Virgin Islands.
   

Timeliness and Data Quality

OEWS data are updated annually from a sample of about 1.1 million establishments whose data are collected over a 3-year period. The OEWS sample is drawn from an administrative list of public and private companies and organizations that is estimated to cover about 95 percent of jobs.4 When employers respond to OEWS, they report from payroll data that are maintained as a part of regular business operations and typically do not require any additional collections or calculations. Payroll data reflect wages paid by employers for a job, which has a commonly accepted definition across employers and industries. This allows for more accurate comparisons of annual wages for a particular job. In contrast, when wages are self-reported by a respondent in person-level or household surveys, the reported data may be difficult to accurately code to a specific industry or detailed occupation, and there is a greater chance of recall error by the respondent. Additionally, OEWS provides specialized respondent instructions for elementary and secondary schools and postsecondary institutions that accommodate the uniqueness of what educators do and how they are paid. These instructions enable precise coding of the occupations commonly found in these industries and more precise and consistent reporting of wages for workers with a variety of schedules (e.g., school year vs. annual, part time vs. full time).

OEWS uses strict quality control and confidentiality measures and strong sampling and estimation methodologies.5 BLS also partners with state workforce agencies to facilitate the collection, coding, and quality review of OEWS data. States’ highly trained staff contribute local knowledge, establish strong respondent relationships, and provide detailed coding expertise to further ensure the quality of the data. 

After assessing the strengths of the OEWS data, the Digest team focused on the comparability of the data over time to ensure that the data would be best suited for stakeholder needs and have the most utility. First, we checked for changes to the industrial and occupational classifications. Although there were no industrial changes, the occupational classifications of some staff occupations—like librarians, school bus drivers, and school psychologists—did change. In those cases, we only included comparable estimates in the tables.

Second, all new Digest tables include nonoverlapping data years to account for the 3-year collection period. While users cannot compare wages in 2020 with those in 2021 or 2022, they can explore data from 2016, 2019, and 2022. Third, for earlier data years, the Digest tables present estimates produced with the same estimation method used for later years.6 Finally, we did not identify any geographical, scope, reference period, or wage estimation methodology changes that would impact the information presented in the tables. These checks ensured we presented the most reliable and accurate data comparisons.

 

Next Steps  

The use of OEWS data in the Digest is a first step in harnessing the strength of BLS data to provide more relevant and timely data, leading to a more comprehensive understanding of the education workforce. NCES is investigating ways we can partner with BLS to further expand these granular and timely economic data, meeting a National Academies of Sciences, Engineering, and Medicine recommendation to collaborate with other federal agencies and incorporate data from new sources to provide policy-relevant information. We plan to explore the relationship between BLS data and NCES data, such as the Common Core of Data, and increase opportunities for more detailed workforce analyses.

NCES is committed to exploring new data sources that can fill important knowledge gaps and expand the breadth of quality information available to education stakeholders. As we integrate new data sources and develop new tabulations, we will be transparent about our evaluation processes and the advantages and limitations of sources. We will provide specific examples of how information can be used to support evidence-based policymaking. Additionally, NCES will continue to investigate new data sources that inform economic issues related to education. For example, we plan to explore Post-Secondary Employment Outcomes to better understand education-to-employment pathways. We are investigating sources for building and land use data to assess the condition and utilization of school facilities. We are also looking for opportunities to integrate diverse data sources to expand to new areas of the education landscape and to support timelier and more locally informed decisionmaking.
 

How will you use the new Digest tables? Do you have suggestions for new data sources? Let us know at ARIS.NCES@ed.gov.

 

By Josue DeLaRosa, Kristi Donaldson, and Marie Marcum, NCES


[1] See these frequently asked questions for a description of current uses, including economic development planning and projecting future labor market needs.

[2] Although most of the K–12 instructional occupations are in the Elementary and Secondary Education industry, both instructional and noninstructional occupations can be found in other industries (e.g., Colleges, Universities, and Professional Schools; Child Care Services). See Educational Instruction and Library Occupations for more details. For example, preschool teachers differ from most of the other occupations presented in the Digest tables in that most of their employment is in the Child Care Services industry. Preschool teachers included in the Digest tables reflect the employment and average annual wage of those who are employed in the Elementary and Secondary Education industry, not all preschool teachers.

[3] Note that estimates do not consider differences that might exist between public and private employers, such as age and experience of workers, work schedules, or cost of living.

[4] This includes a database of businesses reporting to state unemployment insurance (UI) programs. For more information, see Quarterly Census of Employment and Wages.

[5] See Occupational Employment and Wage Statistics for more details on specific methods.

[6] Research estimates are used for years prior to 2021, and Digest tables will not present estimates prior to 2015, the first year of revised research estimates. See OEWS Research Estimates by State and Industry for more information.

Making Meaning Out of Statistics

By Dr. Peggy G. Carr, NCES Commissioner

The United States does not have a centralized statistical system like Canada or Sweden, but the federal statistical system we do have now speaks largely with one voice thanks to the Office of Management and Budget’s U.S. Chief Statistician, the Evidence Act of 2018, and proposed regulations to clearly integrate extensive and detailed OMB statistical policy directives into applications of the Act. The Evidence Act guides the work of the federal statistical system to help ensure that official federal statistics, like those we report here at NCES, are collected, analyzed, and reported in a way that the public can trust. The statistics we put out, such as the number and types of schools in the United States, are the building blocks upon which policymakers make policy, educators plan the future of schooling, researchers develop hypotheses about how education works, and parents and the public track the progress of the education system. They all need to know they can trust these statistics—that they are accurate and unbiased, and uninfluenced by political interests or the whims of the statistical methodologist producing the numbers. Through the Evidence Act and our work with colleagues in the federal statistical system, we’ve established guidelines and standards for what we can say, what we won’t say, and what we can’t say. And they help ensure that we do not drift into territory that is beyond our mission.

Given how much thought NCES and the federal statistical system more broadly have put into the way we talk about our statistics, a recent IES blog post, “Statistically Significant Doesn't Mean Meaningful,” naturally piqued my interest. I thought back to a question on this very topic from my Ph.D. qualifying comprehensive exam in statistics. I still remember nailing the answer to that question all these years later. But it’s a tough one—the difference between “statistically significant” and “meaningful” findings—and it’s one that cuts to the heart of the role of statistical agencies in producing numbers that people can trust.

I want to talk about the blog post—the important issue it raises and the potential solution it proposes—as a way to illustrate key differences between how we, as a federal agency producing statistics for the public, approach statistics and how researchers sometimes approach statistics. Both are properly seeking information, but often for very different purposes requiring different techniques. And I want to say I was particularly empathetic to the issues raised in the blog post given my decades of background managing the National Assessment of Educational Progress (NAEP) and U.S. participation in major international assessments like the Program for International Student Assessment (PISA). In recent years, given NAEP’s large sample size, it is not unheard of for two estimates (e.g., average scores) to round to the same whole number and yet be statistically different. Or, in the case of U.S. PISA results, for scores to be 13 points apart yet not be statistically different. So, the problem that the blog post raises is both longstanding and quite familiar to me.


The Problem   

Here’s the knotty problem the blog post raises: Sometimes, when NCES says there’s no statistically significant difference between two numbers, some people think we are saying there’s no difference between those two numbers at all. For example, on the 2022 NAEP, we estimated an average score of 212 for the Denver Public School District in grade 4 reading. That score for Denver in 2019 was 217. When we reported the 2022 results, we said that there was no statistically significant difference between Denver’s grade 4 reading scores in 2019 and 2022, even though the estimated scores in the two years were 5 points apart. This is because the Denver scores in 2019 and 2022 were estimates based on samples of students, and we could not rule out that, had we assessed every single Denver fourth-grader in both years, we would have found, say, that the scores were 212 in both years. NAEP assessments are like polls: there is uncertainty (a margin of error) around the results. Saying that there was no statistically significant difference between two estimates is not the same as saying that there definitely was no difference. We’re simply saying we don’t have enough evidence to say for sure (or nearly sure) there was a difference.
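
For readers who want to see the mechanics behind statements like the one about Denver, here is a minimal sketch of the kind of comparison involved. The standard errors used (2.5 scale-score points for each estimate) are hypothetical assumptions for illustration; actual NAEP standard errors vary by jurisdiction, grade, subject, and year.

```python
import math

def two_sided_p_value(est_a, se_a, est_b, se_b):
    """Two-sided p-value for the difference between two independent estimates,
    using a normal approximation (a standard z test)."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a**2 + se_b**2)  # standard error of the difference
    z = diff / se_diff
    # standard normal CDF via the error function
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return diff, se_diff, z, p

# Hypothetical standard errors of 2.5 points for each year's estimate
diff, se_diff, z, p = two_sided_p_value(217, 2.5, 212, 2.5)
print(f"difference = {diff}, SE of difference = {se_diff:.1f}, z = {z:.2f}, p = {p:.2f}")
# With these assumed SEs, a 5-point gap gives p of roughly 0.16, above the
# conventional .05 threshold, so we cannot rule out that there was no change.
```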

Making these kinds of uncertain results clear to the public can be very difficult, and I applaud IES for raising the issue and proposing a solution. Unfortunately, the proposed solution—a “Bayesian” approach that “borrows” data from one state to estimate scores for another and that relies more than we are comfortable with, as a government statistical agency, on the judgment of the statistician running the analysis—can hurt more than help.


Two Big Concerns With a Bayesian Approach for Releasing NAEP Results


Big Concern #1: It “borrows” information across jurisdictions, grades, and subjects.

Big Concern #2: The statistical agency decides the threshold for what’s “meaningful.”

Let me say more about the two big concerns I have about the Bayesian approach proposed in the IES blog post for releasing NAEP results. And, before going into these concerns, I want to emphasize that these are concerns specifically with using this approach to release NAEP results. The statistical theory on which Bayesian methods are based is central to our estimation procedures for NAEP. And you’ll see later that we believe there are times when the Bayesian approach is the right statistical approach for releasing results.


Big Concern #1: The Proposed Approach Borrows Information Across Jurisdictions, Grades, and Subjects

The Bayesian approach proposed in the IES blog post uses data on student achievement in one state to estimate performance in another, performance at grade 8 to estimate performance at grade 4, and performance in mathematics to estimate performance in reading. The approach uses the fact that changes in scores across states often correlate highly with each other. Certainly, when COVID disrupted schooling across the nation, we saw declines in student achievement across the states. In other words, we saw apparent correlations. The Bayesian approach starts from an assumption that states’ changes in achievement correlate with each other and uses that to predict the likelihood that the average score for an individual state or district has increased or decreased. It can do the same thing with correlations in changes in achievement across subjects and across grade levels—which also often correlate highly. This is a very clever approach for research purposes.
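
To make the “borrowing” idea concrete, here is a minimal sketch of the general flavor of such an approach, not the specific model in the IES post: a simple normal-normal shrinkage estimator in which a jurisdiction’s own estimated score change is averaged with a prior built from changes observed in other jurisdictions, grades, or subjects, weighted by precision. All numbers and parameter names are hypothetical.

```python
import math

def shrunken_change(own_change, own_se, prior_mean, prior_sd):
    """Precision-weighted (normal-normal) combination of a jurisdiction's own
    estimated score change with a prior derived from changes seen elsewhere."""
    w_own = 1 / own_se**2      # precision of the direct estimate
    w_prior = 1 / prior_sd**2  # precision of the borrowed prior
    post_mean = (w_own * own_change + w_prior * prior_mean) / (w_own + w_prior)
    post_sd = math.sqrt(1 / (w_own + w_prior))
    return post_mean, post_sd

# Hypothetical inputs: the district's own data suggest a 5-point decline (SE = 3.5),
# while changes elsewhere suggest a typical decline of 8 points (SD = 4).
print(shrunken_change(-5.0, 3.5, -8.0, 4.0))  # posterior pulled toward -8: about (-6.3, 2.6)
```

The noisier the jurisdiction’s own estimate, the more the reported value is pulled toward what happened elsewhere, which is precisely the property at issue in the paragraphs that follow.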

However, it is not an approach that official statistics, especially NAEP results, should be built upon. In a country where curricular decisions are made at the local level and reforms are targeted at specific grade levels and in specific subjects, letting grade 8 mathematics achievement in, say, Houston influence what we report for grade 4 reading in, say, Denver, would be very suspect. Also, if we used Houston results to estimate Denver results, or math results to estimate reading results, or grade 8 results to estimate grade 4 results, we might miss real differences in the results.


Big Concern #2: The Bayesian Approach Puts the Statistical Agency in the Position of Deciding What’s “Meaningful”

A second big concern is the extent to which the proposed Bayesian approach would require the statisticians at NCES to set a threshold for what would be considered a “meaningful” difference. In this method, the statistician sets that threshold and then the statistical model reports out the probability that a reported difference is bigger or smaller than that threshold. As an example, the blog post suggests 3 NAEP scale score points as a “meaningful” change and presents this value as grounded in hard data. But in reality, the definition of a “meaningful” difference is a judgment call. And making the judgment is messy. The IES blog post concedes that this is a major flaw, even as it endorses broad application of these methods: “Here's a challenge: We all know how the p<.05 threshold leads to ‘p-hacking’; how can we spot and avoid Bayesian bouts of ‘threshold hacking,’ where different stakeholders argue for different thresholds that suit their interests?”
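
To illustrate, with hypothetical numbers only, how the analyst-chosen threshold enters the calculation, the sketch below takes a normal posterior for a score change (like the one in the earlier shrinkage sketch) and reports the probability that the true change exceeds the threshold. Moving the threshold moves the headline probability, which is the “threshold hacking” worry.

```python
import math

def prob_change_exceeds(post_mean, post_sd, threshold):
    """Probability that the true change is larger than `threshold` in magnitude,
    assuming a normal posterior for the change."""
    def phi(x):  # standard normal CDF
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))
    p_below = phi((-threshold - post_mean) / post_sd)     # change < -threshold
    p_above = 1 - phi((threshold - post_mean) / post_sd)  # change > +threshold
    return p_below + p_above

# Hypothetical posterior: mean change of -6.3 points, SD of 2.6
print(prob_change_exceeds(-6.3, 2.6, threshold=3))  # about 0.90 with a 3-point threshold
print(prob_change_exceeds(-6.3, 2.6, threshold=5))  # about 0.69 with a 5-point threshold
```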

That’s exactly the pitfall to avoid! We certainly do our best to tell our audiences, from lay people to fellow statisticians, what the results “mean.” But we do not tell our stakeholders whether changes or differences in scores are large enough to be deemed "meaningful," as this depends on the context and the particular usage of the results.

This is not to say that we statisticians don’t use judgment in our work. In fact, the “p<.05” threshold for statistical significance that is the main issue the IES blog post has with the reporting of NAEP results is a judgment. But it’s a judgment that has been widely established across the statistics and research worlds for decades and is built into the statistical standards of NCES and many other federal statistical agencies. And it’s a judgment specific to statistics: It’s meant to help account for margins of error when investigating whether there is a difference at all—not a judgment about whether the difference exceeds a threshold to count as “meaningful.” By using this widely established standard, readers don’t have to wonder, “is NAEP setting its own standards?” or, perhaps more important, “is NAEP telling us, the public, what is meaningful?” Should the “p<.05” standard be revisited? Maybe. As I note below, this is a question that is often asked in the statistical community. Should NCES and NAEP go their own way and tell our readers what is a meaningful result? No. That’s for our readers to decide.


What Does the Statistical Community Have to Say?

The largest community of statistical experts in the United States—the American Statistical Association (ASA)—has a lot to say on this topic. In recent years, they grappled with the p-value dilemma and put out a statement in 2016 that described misuses of tests of statistical significance. An editorial that later appeared in the American Statistician (an ASA journal) even recommended eliminating the use of statistical significance and the so-called “p-values” on which they are based. As you might imagine, there was considerable debate in the statistical and research community as a result. So in 2019, the president of the ASA convened a task force, which clarified that the editorial was not an official ASA policy. The task force concluded: “P-values are valid statistical measures that provide convenient conventions for communicating the uncertainty inherent in quantitative results. . . . Much of the controversy surrounding statistical significance can be dispelled through a better appreciation of uncertainty, variability, multiplicity, and replicability.”

In other words: Don't throw the baby out with the bathwater!


So, When Should NCES Use a Bayesian Approach?

Although I have been arguing against the use of a Bayesian approach for the release of official NAEP results, there’s much to say for Bayesian approaches when you need them. As the IES blog post notes, the Census Bureau uses a Bayesian method in estimating statistics for small geographic areas where they do not have enough data to make a more direct estimation. NCES has also used similar Bayesian methods for many years, where appropriate. For example, we have used Bayesian approaches to estimate adult literacy rates for small geographic areas for 20 years, dating back to the National Assessment of Adult Literacy (NAAL) of 2003. We use them today in our “small area estimates” of workplace skill levels in U.S. states and counties from the Program for the International Assessment of Adult Competencies (PIAAC). And when we do, we make it abundantly clear that these are indirect, heavily model-dependent estimates.

In other words, the Bayesian approach is a valuable tool in the toolbox of a statistical agency. However, is it the right tool for producing official statistics, where samples, by design, meet the reporting standards for producing direct estimates? The short answer is “no.”


Conclusion

Clearly and accurately reporting official statistics can be a challenge, and we are always looking for new approaches that can help our stakeholders better understand all the data we collect. I began this blog post noting the role of the federal statistical system and our adherence to high standards of objectivity and transparency, as well as our efforts to express our sometimes-complicated statistical findings as accurately and clearly as we can. IES has recently published another blog post describing some great use cases for Bayesian approaches, as well as methodological advances funded by our sister center, the National Center for Education Research. But the key point I took away from that blog post was that the Bayesian approach is great for research purposes, where we expect the researcher to make lots of assumptions (and other researchers to challenge them). That’s research, not official statistics, where we must stress clarity, accuracy, objectivity, and transparency.

I will end with a modest proposal. Let NCES stick to reporting statistics, including NAEP results, and leave questions about what is meaningful to readers . . . to the readers!