The state education agency’s data collection and management practices had come about over time, driven mainly
by compliance and funding. Various program areas were created to focus on specific federal surveys, and staff
collected the data needed to do their jobs. Program area staff administered the surveys, followed their own
quality assurance processes, and maintained and secured the data in their own silo systems. And, of course,
data were reported as required by the federal government. Individual managers took their own approach to
directing staff and organizing work, and coordination across program areas was limited.
Over time, different departments began to collect some of the same data elements. Inconsistencies were commonplace.
For instance, while the student information system listed an “Aileen Hutchinson,” who was not in special education,
the Special Education Department’s system included a girl named “Allie N Hutchinsen.” Despite these discrepancies,
the staff knew the two records referred to the same student based on other directory information. However,
structural differences between the systems—different definitions, formats, and option sets—further complicated
matters. For example, Allie’s race, White, was coded as “1” in one database and “2” in the other because the
systems used different option sets. Furthermore, because program areas defined their own data elements and
used different software to manage them, the ability of the agency’s many data systems to “talk” to one another
varied from limited to nonexistent, burdening staff with redundant data entry work and introducing errors into
the system. With no clear requirement for documentation of data processes, methodologies often changed, as
happened when Joe left the agency and no one else knew how to produce the dropout rate. The new guy, Steve,
calculated it the way he had at his previous agency. No one saw a problem with this, especially since Steve’s
numbers were lower than Joe’s. But when the time came to compare the new rates to the previous year’s, some staff
realized that they were comparing apples to oranges.
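To make the reconciliation burden concrete, the sketch below illustrates, in rough terms, what it takes to line up one student’s records when two systems use different names, formats, and option sets. The field names, code values, crosswalk, and matching rule are hypothetical, invented only for this illustration; they are not the agency’s actual systems or definitions.

```python
# Hypothetical sketch: reconciling one student's record across two silo systems.
# All field names, code sets, and the matching rule below are invented for illustration.

# Record from the student information system (race option set: "1" = White).
sis_record = {"name": "Aileen Hutchinson", "dob": "2001-04-17",
              "school_id": "0042", "race": "1"}

# Record from the Special Education Department's system (race option set: "2" = White).
sped_record = {"name": "Allie N Hutchinsen", "dob": "2001-04-17",
               "school_id": "0042", "race": "2"}

# Crosswalk translating the special education race codes into the student
# information system's codes (assumed values, for illustration only).
SPED_TO_SIS_RACE = {"2": "1", "3": "2"}

def same_student(a, b):
    """Match on other directory information (birth date and school),
    since the names alone do not agree."""
    return a["dob"] == b["dob"] and a["school_id"] == b["school_id"]

if same_student(sis_record, sped_record):
    translated = SPED_TO_SIS_RACE.get(sped_record["race"])
    print("Race codes agree after crosswalk:", translated == sis_record["race"])
```

Multiply this kind of manual matching and code translation across thousands of students and dozens of shared elements, and the redundant data entry and errors described above become easy to picture.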
Year in and year out, the work was done, but specific tasks weren’t assigned to anyone in a consistent manner,
and sometimes reports were late or incomplete. And since there was no “official” source for each data element,
data for federal reports might come from one source one year, another the following year. Similarly, without
clear guidelines, staff fielded data requests as best they could. When inquiries came in, the recipient would
decide to which program area the request should be sent. In one instance, Talia was asked for the school
addresses in a district. Although McKenna and Vita both managed school directory data in separate systems,
Vita was given the task simply because her desk was near Talia’s.
When postsecondary education leaders asked about sharing data, agency officials cried “FERPA!” invoking the
federal privacy law they mistakenly thought prohibited such exchange (see chapter 7). Security and access
protocols were not well understood, and staff often took a lax approach to protecting sensitive information
(see “Identity theft in the printer room” in chapter 8). Data quality problems were handled as they arose,
but no long-term changes were made to ensure the same issues wouldn’t crop up again. And since no responsible
group or process was in place to identify the sources of the agency’s problems, the root causes went undiscovered
and data quality issues were just blamed on IT. This went on for some time and, other than the techies, most staff didn’t
see much of a problem. It was simply business as usual.
Over time, the country became more interested in education data. Education stakeholders wanted more information for
accountability purposes and to better understand what programs and instructional strategies worked. They wanted data
to inform decisionmaking at all levels and to improve administration, instruction, and student performance. The bottom
line was, they wanted data from across the agency and they wanted them fast. This changing environment posed many problems
for the agency. Requested analyses required linking data across silos or integrating them in a central data store. Before this
could happen, duplicate, inconsistent data had to be reconciled. However, once the integration work began, more inconsistencies
were discovered than anyone had imagined. Data quality had to be a higher priority, security had to be improved, and the data
elements collected had to serve business and stakeholder needs, not just meet federal requirements.
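One reason the integration work surfaced more inconsistencies than anyone had imagined is that pulling silo extracts into one place makes every disagreement visible at once. The sketch below is a minimal, hypothetical consistency check of that kind; the file layouts, the shared elements, and the assumption of a common student ID are all invented for the example.

```python
# Hypothetical sketch: flagging disagreements on shared data elements when two
# silo extracts are brought together. The CSV layouts and the common student ID
# used as a matching key are assumptions made for this illustration.
import csv
from collections import defaultdict

SHARED_ELEMENTS = ["last_name", "birth_date", "race"]

def load(path):
    """Read an extract into a dict keyed by the (assumed) common student ID."""
    with open(path, newline="") as f:
        return {row["student_id"]: row for row in csv.DictReader(f)}

def find_inconsistencies(sis_path, sped_path):
    sis, sped = load(sis_path), load(sped_path)
    conflicts = defaultdict(list)
    for student_id in sis.keys() & sped.keys():
        for element in SHARED_ELEMENTS:
            if sis[student_id][element] != sped[student_id][element]:
                conflicts[student_id].append(element)
    return conflicts

# Example usage, with hypothetical extract files:
# for sid, elements in find_inconsistencies("sis.csv", "sped.csv").items():
#     print(sid, "disagrees on", ", ".join(elements))
```

A report like this is only a starting point; deciding which source is authoritative for each element is exactly the kind of question the agency had no process to answer.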
Better methods of sharing data had to be devised if the agency was to meet the growing demand for a “P–20” system. And
better, more consistent protocols were needed to make data sharing more efficient and prevent improper dissemination.
The chief information officer decided something had to be done. Having seen a presentation at a national conference,
he was convinced a process called “data governance” could help address the agency’s problems (see figure 5).