Forum Guide to Data Quality

TABLE OF CONTENTS

National Cooperative Education Statistics System

Foreword

Working Group Members

Glossary

Part One: Data Quality

Part Two: Case Studies from State and Local Education Agencies (SEAs and LEAs)

Part Three: Data Quality Tip Sheets

Reference List

Related Resources

Part One: Data Quality—Chapter 1: Introduction

The collection and use of education data have grown exponentially in recent decades. As local and state education agencies (LEAs and SEAs) use increasing amounts of data to answer questions, identify concerns, and make decisions, quality data are critical. Quality data help educators at all levels to make informed decisions and improve teaching and learning. Some examples include teachers using formative data in classrooms to guide instruction, administrators using data to justify funding, LEA staff using data to evaluate programs, and SEA staff using data for federal reporting. Quality data result from the efforts of stakeholders throughout the education system who understand why particular data are collected and used; who adhere to agency policies and best practices for collecting, maintaining, and using data; and who promote a culture of quality data, in which everyone who has a role in student outcomes understands their responsibilities in the data process.

Data quality is increased when it is emphasized from the point of entry into the collection system through use for reporting. LEAs and SEAs have different methods of reporting; many LEAs report data through datasets to the SEAs, while others use statewide collection systems. SEAs report through datasets for federal and state reporting. The focus of this publication is best practices to ensure quality data reporting.

What Are Quality Data?

Quality data are complete, valid, and accurate. Each qualification is important in its own right, but data must first be both complete and valid to be accurate. Complete data include all expected information. Valid data have been evaluated against defined rules to ensure correctness in both structure and content. While these three qualities are not the only components of data quality, they provide a key foundation upon which other aspects are built.

Completeness

Data that are considered complete meet two qualifications. The first qualification is that all expected records or respondents are entered into the collection system and included in the datasets used for reporting. For example, an LEA might check that all schools expected to respond to a data collection submitted data, and an SEA might check that required data have been submitted by their LEAs. The second qualification is that each record or response includes all expected information. For example, a record for a student discipline incident in which the student was suspended that did not include the number of days of the suspension would be lacking critical information. Student enrollment data, which indicate whether a student’s name was, is, or will officially be registered at a school or schools, show the importance of completeness to quality data.1 At the SEA level, completeness means that for each submission, the SEA has data from each LEA (and within each LEA, data from each expected school) and each individual data element within each required file is submitted.

For data to be considered complete, zeros, missing data, or data anomalies must follow the guidelines established by the file requirements or specifications. A file specification is a technical document that establishes rules for how each piece of information should be submitted and provides guidelines for handling specific situations, such as missing data. Some data collections may require, for example, that any anomalies be explained with a data note or reason code.

Data notes and reason codes are especially useful for large datasets, in which a single data anomaly can lead to multiple reported errors. Some collections require that an agency submit an action plan for fixing missing or anomalous data issues before data are considered complete.
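As a rough sketch, the two completeness qualifications described above might be checked in code. The school names, field names, and discipline records below are hypothetical examples, not part of any actual file specification:

```python
# Completeness checks: (1) every expected respondent submitted data,
# (2) every record contains all required fields.
# EXPECTED_SCHOOLS and REQUIRED_FIELDS are illustrative placeholders.

EXPECTED_SCHOOLS = {"Lincoln Elementary", "Roosevelt Middle", "Washington High"}
REQUIRED_FIELDS = {"student_id", "incident_date", "action", "days_suspended"}

def check_completeness(records):
    """Return (schools that never submitted, records missing required fields)."""
    submitted = {r["school"] for r in records}
    missing_schools = EXPECTED_SCHOOLS - submitted
    incomplete = [
        r for r in records
        if any(r.get(f) in (None, "") for f in REQUIRED_FIELDS)
    ]
    return missing_schools, incomplete

records = [
    {"school": "Lincoln Elementary", "student_id": "S1",
     "incident_date": "09/01/2023", "action": "suspension", "days_suspended": 3},
    {"school": "Roosevelt Middle", "student_id": "S2",
     "incident_date": "09/02/2023", "action": "suspension",
     "days_suspended": None},  # suspension length missing, so record is incomplete
]

missing, incomplete = check_completeness(records)
```

In this sketch, the first check flags Washington High for never submitting, and the second flags the Roosevelt Middle record because the number of suspension days is missing.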

Validity

The Oregon Department of Education runs front-end validations when data are loaded. These typically are restricted to ensuring the submission conforms to the file format and business rules for the collection (that is, the data are valid and internally consistent). Once all data are submitted, data stewards review the data for additional quality checks, including reasonableness and consistency. Individual rows of data are flagged, with a reason for the flag entered, and those rows are uploaded to an audit system for school districts to review. Districts can then confirm that the data are correct or fix any errors that are present. This audit process has significantly improved data quality for the state.

Valid data conform to expectations for reasonable values and accepted norms.2 For example, when collecting staff and student phone numbers, valid data typically do not include letters, only numbers. Validity also encompasses formatting; for example, data standards often specify the format for entering the month, day, and year in date fields. Consistency in whether items such as months are stored as text or as values is another formatting issue that can affect data validity. Additionally, valid data do not contain duplicates or redundancies, which can affect calculations.
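The validity expectations above (digits-only phone numbers, a specified date format, and no duplicates) can be sketched as simple checks. The 10-digit phone rule and MM/DD/YYYY date format here are illustrative assumptions; an actual data standard would specify its own formats:

```python
import re
from datetime import datetime

def valid_phone(value):
    """Valid phone numbers contain only digits (10 digits assumed here)."""
    return bool(re.fullmatch(r"\d{10}", value))

def valid_date(value, fmt="%m/%d/%Y"):
    """Dates must match the format the data standard specifies (MM/DD/YYYY assumed)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def find_duplicates(ids):
    """Duplicate identifiers would distort counts and calculations."""
    seen, dups = set(), set()
    for i in ids:
        (dups if i in seen else seen).add(i)
    return dups
```

For example, `valid_phone("414-555-1234")` fails because the value contains characters other than digits, and `find_duplicates(["S1", "S2", "S1"])` surfaces the repeated identifier.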

Validity and completeness should be checked against business rules as data are submitted. Business rules guide the creation and management of quality data. A business rule is a framework or set of constraints under which an organization operates and the expression of those constraints as a mathematical or logical assertion governing how data can be entered or used within a data system. For example, an agency that only serves students between the ages of 5 and 21 may have a business rule that states that values for the data element Student Age must fall within the range of 5 to 21 (that is, 5 ≤ Student Age ≤ 21).
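The age rule above is a logical assertion over a record, and a rule set can be applied mechanically as data are submitted. In this sketch, the field name and rule list are illustrative, not from the guide:

```python
# A business rule expressed as a logical assertion over a record.
# The rule "5 <= Student Age <= 21" mirrors the example in the text;
# the field name "student_age" is a hypothetical choice.
BUSINESS_RULES = [
    ("student_age_in_range", lambda r: 5 <= r["student_age"] <= 21),
]

def apply_rules(record):
    """Return the names of any business rules the record violates."""
    return [name for name, rule in BUSINESS_RULES if not rule(record)]

apply_rules({"student_age": 23})  # -> ["student_age_in_range"]
apply_rules({"student_age": 12})  # -> []
```

Expressing each rule as a named predicate makes it easy to report exactly which constraint a submission failed, which supports the data notes and reason codes discussed earlier.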

Accuracy

Accurate data pass edit checks, follow data quality rules, and contain no known errors. In many cases, the data are certified by the appropriate party.3 Accuracy follows validity because data cannot be accurate if they are not valid. Data accuracy is specific to each data collection and refers to whether data adhere to pre-set standards and expectations. Examples of expectations include the following:

  • Reasonableness—Values do not fall outside of an expected range and are comparable to similar collections. For example, a state might compare data for schools with similar demographics to establish parameters for what is considered reasonable.
  • Consistency—There are no large, unexpected changes in data from what was submitted in the prior collection period. For example, communities with stable industries would expect stable student enrollment. Unexpected and unexplainable changes may indicate a data accuracy issue.
  • Summation to totals—The sum of elements is equal to subtotals and totals, and combined subtotals are equal to totals. For example, the sum of individual educational attainment categories should equal total membership.
  • Internal consistency within a file—Values entered for individual elements are consistent with related elements. For example, age is consistent with birthdate.
  • Consistency across files—Data submitted across multiple files are aligned. For example, daily attendance codes for individual students should align with the aggregated totals for the school, which are then reported to the LEA.

The collection of student guardian data in Milwaukee Public Schools (WI) attests to the importance of accuracy. To avoid the risk of sharing secure data with people who should not have access, the LEA ensures that legal documentation identifies a grandparent or other adult as guardian before their information is entered into the student information system (SIS).
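Two of the accuracy expectations above (summation to totals and internal consistency within a file) can be sketched as simple checks. The category names, counts, and dates below are illustrative:

```python
from datetime import date

def check_summation(category_counts, reported_total):
    """Summation to totals: category subtotals must equal the reported total."""
    return sum(category_counts.values()) == reported_total

def age_matches_birthdate(age, birthdate, as_of):
    """Internal consistency: a reported age must agree with the birthdate."""
    computed = as_of.year - birthdate.year - (
        (as_of.month, as_of.day) < (birthdate.month, birthdate.day))
    return computed == age

check_summation({"grade_9": 120, "grade_10": 115}, 235)           # True
age_matches_birthdate(16, date(2007, 3, 15), date(2023, 10, 1))   # True
```

A mismatch in either check does not prove the data are wrong, but it flags a row for the kind of review and explanation described in the reasonableness discussion above.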

Accuracy also must be viewed in the context of a particular situation. Large shifts could occur across time periods and may not be errors. For example, Milwaukee Public Schools (WI) created a virtual program in one of the district’s schools, and the enrollment skyrocketed after the first year. This shift was striking, but the data were both valid and accurate.

Ensuring Accuracy

Since home addresses can be written in different formats and with varying abbreviations, it is possible to have multiple versions of the same address appear in the student information system (SIS). To avoid the possible confusion this may cause with residency, some districts use an address database to ensure that each address only appears in one form in the SIS. Rather than typing an address, staff members select an address from a prompt.
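The pick-from-a-prompt approach above amounts to reducing each typed address to a key and looking it up in a canonical table. This is a minimal sketch; the table entries and normalization rules are hypothetical:

```python
# A canonical address table: every address appears in exactly one form.
# The entries and the normalization rules below are illustrative only.
CANONICAL_ADDRESSES = {
    "123 n main st": "123 North Main Street",
    "45 oak ave": "45 Oak Avenue",
}

def normalize_key(raw):
    """Reduce a typed address to a lookup key (lowercase, no periods, collapsed spaces)."""
    return " ".join(raw.lower().replace(".", "").split())

def lookup_address(raw):
    """Return the single canonical form, or None so staff can be prompted to choose."""
    return CANONICAL_ADDRESSES.get(normalize_key(raw))

lookup_address("123  N. Main St")  # -> "123 North Main Street"
```

Because every variant resolves to one stored form, residency checks compare one address per household rather than several near-duplicates.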

Data collections should be reviewed and certified for accuracy. Certification is the process of revalidating data after all edits have been reviewed and resolved and ensuring that the data can pass all business rules. Data that do not pass all business rules may be certified, provided that an accepted reason is given. For example, LEA data provided to SEAs may be incomplete when data collections are affected by natural disasters. An action plan that accompanies the data can explain why the agency cannot provide the data and any necessary steps planned to ensure that the data are available for the next collection. For more information and best practices that education agencies can adopt before, during, and after a crisis, see the Forum Guide to Planning for, Collecting, and Managing Data About Students Displaced by a Crisis (https://nces.ed.gov/forum/pub_2019163.asp).

A Framework for Data Quality

One model that provides a more detailed structure—and that builds upon the foundation of completeness, validity, and accuracy—is the Data Quality Framework established by the Federal Committee on Statistical Methodology (FCSM).4 In response to the rapidly changing world of data sources and analysis methods, the FCSM established a Data Quality Analysis Working Group in 2020 to provide practical information on identifying and reporting data quality for federal agencies. The working group established a quality framework that provides an inventory of the elements (that is, domains and dimensions) of data quality with a review of identifiable threats to each dimension of data quality. The FCSM Data Quality Framework provides a common foundation upon which federal agencies can make decisions about the management of data products throughout their lifecycle: identifying and mitigating key data quality threats, evaluating trade-offs among different quality dimensions where necessary, applying accepted methods at an appropriate level of rigor, and accounting for and reporting on the quality of data products and outputs.

While the FCSM Framework was designed with federal agencies in mind, it provides a structure that is useful at many levels. It divides the concept of data quality into three key domains: utility, objectivity, and integrity, and each of these domains is composed of multiple dimensions. Agencies or stakeholders working to establish and maintain data quality—especially when working with staff newer to the concept—may be able to use the components and definitions provided by the framework to expand understanding of the necessary concerns involved in a data quality process and culture.

Figure 1: FCSM Data Quality Framework

Why are Quality Data Important?

Education agencies rely on high-quality data for all aspects of their operations, from supporting students in the classroom to planning, funding, and providing efficient programs and services to students. Quality data are also essential for communicating information about education to stakeholders such as parents, community groups, boards of education, policymakers, and others. Although data collected by schools and LEAs are used to inform local decisions and meet local needs, many also are required for federal and state reporting. Federal staff rely on relevant, accessible, timely data collected at the right level of detail, or granularity, to make decisions. The quality of these data is essential to ensure that schools receive the appropriate resources, comply with laws and regulations, and make effective internal decisions about how services are provided to students.

Quality data allow agencies to track and use key information, meet reporting requirements, and shape and support their data strategy. Quality data help

  • teachers make the right decisions about their students’ instructional needs;
  • district leaders assess the effectiveness of systems and services in aiding student development;
  • principals track student and teacher progress to make sure goals are being met and problems that impede progress are identified;
  • district personnel apportion staff or other resources equitably;
  • state departments of education plan and manage effective programs;
  • state lawmakers make informed decisions about public education programs and funding;
  • researchers evaluate the impact of education programs; and
  • parents, policymakers, and other stakeholders understand how resources are making a difference in education.

Data Requests from Stakeholders

Clear, specific data requests provide a stronger basis for communication between requestors and data stewards. When requests for information in West Virginia’s SEA go beyond what is regularly available, an initial data request process with a brief form allows state data staff to pull the data for the requestor, if possible and in compliance with standard privacy protections. The form clarifies the request and allows the data team to keep track of requests to inform planning for additional public reporting. Beyond this level of information is a tier for researchers who want access to restricted-use or suppressed data. In these cases, researchers must submit a more official and detailed proposal application.

More information about data requests can be found in the Forum Guide to Strategies for Education Data Collection and Reporting (SEDCAR) (https://nces.ed.gov/pubs2021/NFES2021013.pdf).

Because practices related to data collection and use change over time, education agencies must review their policies and procedures regularly to account for new requirements, changing technologies, and shifting data needs. This guide offers considerations and best practices for agencies to ensure that their policies and procedures help improve data and encourage a culture of data quality.



1 U.S. Department of Education, National Center for Education Statistics, Common Education Data Standards. CEDS Elements v. 10. Retrieved August 28, 2023, from https://ceds.ed.gov/elements.aspx?v=10.
2 National Forum on Education Statistics. (2020). Forum Guide to Exit Codes. U.S. Department of Education. Washington, DC: National Center for Education Statistics.
3 EDFacts. Maximizing EDFacts Data Quality: A Comprehensive Approach. Retrieved August 28, 2023, from https://www2.ed.gov/about/inits/ed/edfacts/edfacts-data-quality-process-overview.pdf.
4 Federal Committee on Statistical Methodology. (2020). A Framework for Data Quality. FCSM 20-04. Federal Committee on Statistical Methodology. September 2020.