The Forum Guide to Data Ethics
NCES 2010-801
March 2010

7. Provide all relevant data, definitions, and documentation to promote comprehensive understanding and accurate analysis when releasing information

The data looked good. Scores on the ninth grade state mathematics assessment had gone up substantially. But rather than celebrate, the administrative staff stared at each other. The testing coordinator said what everyone was thinking: "You know, we could get away with reporting that our scores have improved: the assessment has the same title and the scores are higher. But we know that the test was redesigned and it would be wrong not to mention that in our reports, right?" Fortunately, the superintendent went even further and asserted, "Not only will we state that the assessment has changed, we will emphasize the point given the high likelihood that people will be confused unless it is addressed explicitly."

Data handlers are ethically obligated to provide sufficient information for data users to reasonably interpret the meaning of data shared in reports and other publications. In some cases, this can mean including additional information (e.g., relevant differences between the old and new assessments) or pointers to other data sets. In others, it might mean sharing caveats about appropriate data use and quality limitations that might not otherwise be understood or assumed by a data user. In almost all cases, data users need access to data definitions, formulas, methodologies, documentation, and other contextual information that influence use and interpretation. Deciding how much to include is a delicate balance; too much detail, or too technical an explanation, can lead readers to throw up their hands (and the report!) in frustration.

Education indicators are too often interpreted out of context. For guidance on appropriate indicator use and presentation, see the Forum Guide to Education Indicators. Also, see the Forum Guide to Metadata for information related to "data about data."

Defining the terms in reports goes a long way toward ensuring they are understood. Best practice definitions for many data elements can be found in the NCES Handbooks Online.

No isolated piece of data is meaningful without related data that explain further, provide a comparative or complementary perspective, or otherwise serve as context for guiding interpretation. In fact, most data are value neutral unless interpreted in light of their context. For example, "test score" is a piece of information used frequently (and with high stakes) in schools. But what does a value of "68" mean? Is it an individual's score, a class average, or a national median? What is the passing score? Is the exam in a core subject matter area or in an elective class? Is passing the test required for graduation? Value judgments—whether a "68" on an exam is a good score—depend greatly on the related data that provide context in which meaning is assessed.

Just as canon 2 asserts that data can never completely represent an individual, no single piece of data can supply all the information needed to answer a policy question confidently. Assessing any aspect of a complex education enterprise usually requires a sizable body of data so that decisionmakers can inspect the issue within a well-integrated, multidimensional context. For example, consider two schools: the first with a 90 percent graduation rate and the second with a 40 percent graduation rate. Two years later, these rates have improved 3 points at the first school, to 93 percent, and 20 points at the second school, to 60 percent. Without the original graduation rates, one might assume that the school with the 20-point increase is retaining students more effectively than the school with the 3-point increase. Looking at data over time provides a context in which data users would see that the difference in improvement is probably affected to some degree by the starting points: it becomes harder to improve as any rate approaches the 100 percent ceiling. Other information could advance the interpretation even further. Say, for example, that the school with the 40 percent graduation rate was the alternative school for students at risk of dropping out. In that case, progressing from a 40 to a 60 percent graduation rate would be a significant achievement.
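
The arithmetic behind the graduation rate example can be sketched in a few lines of Python. This is only an illustration, not a method prescribed by this guide: the function names are hypothetical, and "share of the remaining gap closed" is one assumed way, among several, to account for the 100 percent ceiling.

```python
def point_change(start, end):
    """Change between two rates, in percentage points (rates on a 0-100 scale)."""
    return end - start


def share_of_gap_closed(start, end, ceiling=100.0):
    """Fraction of the distance between the starting rate and the ceiling that was closed.

    Assumed illustrative measure; it is not defined in the guide.
    """
    gap = ceiling - start
    if gap <= 0:
        raise ValueError("starting rate is already at or above the ceiling")
    return (end - start) / gap


# Starting and ending graduation rates from the example in the text.
schools = {"first school": (90.0, 93.0), "second school": (40.0, 60.0)}

for name, (start, end) in schools.items():
    print(
        f"{name}: {point_change(start, end):+.0f} points, "
        f"{share_of_gap_closed(start, end):.0%} of the remaining gap closed"
    )
# first school: +3 points, 30% of the remaining gap closed
# second school: +20 points, 33% of the remaining gap closed
```

Under this assumed measure, the two schools look far more alike than the raw 3-point and 20-point changes suggest, which is the kind of context a report would need to state explicitly.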

No single piece of data can provide meaningful information in the absence of related data that provide context.

Recommended Practices and Training

  1. Establish statistical standards and guidelines for data presentation in all public reports.
    1. Include explanations of any preparatory or statistical procedures used while developing the report.
    2. Prepare documentation to summarize research methodology and issues involved in collecting, analyzing, and publishing the data.
    3. Include file structure, record layout, codebook, metadata, and data dictionary guidance or databases as appropriate.
    4. Include definitions for all technical, data, and educational terms and jargon, as well as any surveys and formulas referenced in the report.
    5. Include related information that encourages a broader, more comprehensive, and accurate interpretation of the data in the report.
    6. Add an explanation when readers may misinterpret data. For example, if dropout rates are being reported, state whether the data represent a cohort rate or annual rate, and define the rate, rather than assuming it will be intuitively understood.
    7. Provide technical contact information on all reports so that readers with questions about methodology, definitions, or documentation know where to turn for additional information and clarification.
  2. Subject draft reports to multiple reviews and share reviewers' comments with the authors.
  3. Train all data handlers to provide additional data and metadata in reports and correspondence as appropriate to improve data use and interpretation.
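
The dropout rate example in the practices above shows why a rate needs a stated definition. The Python sketch below uses deliberately simplified, assumed definitions (an annual rate as one year's dropouts divided by that year's starting enrollment, and a cohort rate as all dropouts from an entering class divided by its original size) and entirely hypothetical counts. It ignores transfers and other adjustments that official definitions may require.

```python
def annual_dropout_rate(dropouts, enrolled):
    """Simplified annual rate: dropouts in one year / students enrolled at the start of that year."""
    return dropouts / enrolled


def cohort_dropout_rate(dropouts_by_year, cohort_size):
    """Simplified cohort rate: all dropouts from one entering class / original class size."""
    return sum(dropouts_by_year) / cohort_size


# Hypothetical entering class of 200 students followed through grades 9-12.
cohort_size = 200
dropouts_by_year = [10, 10, 10, 10]

enrolled = cohort_size
for grade, dropouts in zip(range(9, 13), dropouts_by_year):
    rate = annual_dropout_rate(dropouts, enrolled)
    print(f"grade {grade}: annual rate {rate:.1%}")
    enrolled -= dropouts  # students remaining at the start of the next year

print(f"cohort rate: {cohort_dropout_rate(dropouts_by_year, cohort_size):.1%}")
# grade 9: annual rate 5.0%
# grade 10: annual rate 5.3%
# grade 11: annual rate 5.6%
# grade 12: annual rate 5.9%
# cohort rate: 20.0%
```

The same hypothetical students yield a "dropout rate" of roughly 5 percent or 20 percent depending on the definition, so a report that gives the number without naming and defining the rate invites misinterpretation.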