National Forum on Education Statistics

Chapter 2: Education Data Model Overview


What is a Data Model?

A data model is a structured description of how data are represented, organized, and accessed in an information system. A data model defines the data objects contained in an information system, the relationships between these data objects, and how the information is to be used. Data models are often represented as entity relationship diagrams. They support the development of information or data management systems, enable the exchange of data, enhance application maintainability, and can be imported in parts or whole to reduce the time and cost of developing new systems or redesigning existing ones.

Data modeling consists of a family of techniques used to describe the types of information important to an enterprise. It is a critical tool in providing the information that enterprises need to thrive and to comply with government regulations and industry standards. Data modeling is no longer limited to databases nor is it still viewed as just a tool for information technology (IT) staff. Rather, it is an increasingly expected tool for designing, visualizing, and communicating about education data, data systems, and analyses.

The enterprise of education requires continuous and wide-ranging information-based conversations to facilitate everything from teaching and learning processes to making federal policy decisions. Data modeling makes it possible to organize that information so that it can be used effectively and efficiently.

The American National Standards Institute describes three kinds of data models [1]:

  1. A conceptual data model describes a content domain, such as education information, that consists of significant concepts and the relationships among the concepts in the domain. The meaning of a concept is communicated not only by text descriptions, but also by the concept's context in the web of relationships with other concepts. The relationships between pairs of concepts can be expressed in a formal logic language that is independent of any data manipulation technology or implementation environment.
  2. A logical data model describes a subset of a content domain, such as student information or special education program information, in terms of a particular data manipulation technology. For example, if the data manipulation technology is a relational database, then the content is expressed in terms of tables and columns. If the data manipulation technology is XML [2], then the content is expressed in terms of XML objects and XML elements. Logical models do not contain details of the implementation environment.
  3. A physical data model describes a logical model in terms of the implementation environment. The implementation environment includes the specific product being used to manage the data and the physical means by which data are stored, such as servers and disk drives, lookup tables, management tables, etc.
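
As a rough illustration of the three levels, the sketch below expresses the same small piece of content conceptually, logically (as relational tables and columns), and physically (as statements for one specific product, here SQLite). All table and column names are hypothetical and are not taken from the Education Data Model.

```python
import sqlite3

# Conceptual level: concepts and their relationships, with no technology implied.
CONCEPTUAL = {
    "entities": ["Teacher", "Student"],
    "relationships": [("Teacher", "provides services to", "Student")],
}

# Logical level: the same content expressed as relational tables and columns.
# Table and column names are hypothetical.
LOGICAL = {
    "teacher": ["teacher_id", "hire_date"],
    "student": ["student_id", "state_id", "name"],
    "teacher_student": ["teacher_id", "student_id"],
}


def physical_ddl(logical):
    """Physical level: render the logical tables as DDL for one product (SQLite)."""
    return [
        f"CREATE TABLE {table} ({', '.join(col + ' TEXT' for col in cols)})"
        for table, cols in logical.items()
    ]


def build_database(logical):
    """Create an in-memory SQLite database from the logical model."""
    conn = sqlite3.connect(":memory:")
    for statement in physical_ddl(logical):
        conn.execute(statement)
    return conn


def table_names(conn):
    """List the tables that physically exist in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Only the last step depends on the chosen product. The conceptual and logical descriptions would stay the same if a different database were used.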

Database engineers and designers use physical and logical models to design and implement their systems. Conceptual models are convenient tools to help end users and other stakeholders discuss, understand, and participate in the design of systems, such as education data systems or education delivery systems. The Education Data Model is a conceptual data model.

The Education Data Model: Version 1 (PK-12) depicts a large portion of the information that should be contained in education software, information systems, and data warehouses. As a high-level conceptual model, the Education Data Model outlines the meaning of concepts and the relationships among concepts without imposing any particular logical or physical implementation approaches. (See Chapter 4, especially the Concept Map section, for further information about the model's content.)

The information in this conceptual model takes into account the processes associated with teaching, learning, and the business operations of education organizations. The Data Model focuses on granular information at the school and LEA levels, rather than upon aggregate statistics or indicators for accountability; for example, it includes "student" as an entity but not "total number of students." However, the Data Model includes information that is necessary to produce aggregate and other types of statistics. In brief, the Education Data Model is a catalogue of the data used in PK-12 education and a description of the relationships among those data.
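
The point about granular versus aggregate data can be made concrete with a minimal sketch: the model stores individual "student" records, and a figure such as "total number of students" is derived from them when needed. The `Student` fields and the `enrollment_by_school` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """A granular record; the field names are hypothetical."""
    state_id: str
    school: str


def enrollment_by_school(students):
    """Derive an aggregate (student count per school) from granular records."""
    counts = {}
    for student in students:
        counts[student.school] = counts.get(student.school, 0) + 1
    return counts
```

Because the aggregate is computed rather than stored, the same records can also yield other statistics without any change to the model.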

Figure 1 depicts the concepts, relationships, and attributes of the Education Data Model. In the Data Model and the figure below, concepts are called entities (constructs that need to be tracked, measured, and described by software systems in order to support education processes). These entities have relationships to one another, reflecting the associations that exist between them in the real world. In addition, entities have attributes (information associated with an entity that can be measured, classified, or described) (see the Core Data Model Concepts section of Chapter 4). These entities, relationships, and attributes provide the basis for answering important education questions and designing complete information systems.

Figure 1. Concepts, Relationships and Attributes of the Education Data Model

For instance, in the Education Data Model, teachers and students are entities. These entities are related because teachers provide services to students, and conversely, because students receive services from teachers. Teachers have attributes such as full-time equivalency, certification, hire date, role, and content knowledge. Students have attributes including state ID, name, free and reduced-price lunch eligibility, allergies, and course completion records. Later in this guide, the structure of the data model will be discussed in greater detail, showing how the entities are organized into a broad taxonomy of classes and subclasses (see Chapter 4 for further information).
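
One way to picture entities, attributes, and relationships is the sketch below. It is an illustration under stated assumptions, not the Data Model's actual structure: the class names, the field names and types, and the `ServiceRelationship` link are hypothetical, with the attributes chosen from the examples in the text.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Teacher:
    """Entity with a few of the attributes named in the text."""
    name: str
    full_time_equivalency: float
    hire_date: date
    certification: str


@dataclass
class Student:
    """Entity with a few of the attributes named in the text."""
    state_id: str
    name: str
    lunch_eligible: bool
    allergies: list = field(default_factory=list)


@dataclass
class ServiceRelationship:
    """Relationship: a teacher provides services to a student."""
    teacher: Teacher
    student: Student


def students_served_by(teacher, relationships):
    """Follow the relationship from teacher to students."""
    return [r.student for r in relationships if r.teacher is teacher]


def teachers_serving(student, relationships):
    """Follow the same relationship in the opposite direction."""
    return [r.teacher for r in relationships if r.student is student]
```

The two helper functions read the same relationship from either side, mirroring the statement that teachers provide services to students and students receive services from teachers.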

The Benefits of Conceptual Education Data Modeling

Comprehensive data models can be invaluable in helping education respond to changing business conditions. When used appropriately, conceptual data models can lead to more accurate data, which in turn support more effective use of data contained in business-intelligence and business-analytic tools. Education stakeholders at the LEA and SEA levels who understand basic data-modeling concepts and who work with effective data models are likely to be especially productive. Conversely, information workers who have to struggle with ineffective models or who are not literate in data models face several unhappy choices. They can rely solely on IT counterparts or vendor-specific solutions to assist them with data analyses, and may not get exactly the information they need; they can skip the analyses altogether; or, potentially the most dangerous choice, they can base analyses on inaccurate or incomplete data. Some possible benefits of a well-developed data model are described in the following subsections. [3]

Process Design: The very process of collaboratively designing a data model has substantial benefits. Designing a local data model can bring together all stakeholders and give them a better understanding of their own data needs and those of their coworkers. The process can also help stakeholders better understand the needs and values of reporting authorities, and it allows data decisions and uses to be driven by stakeholders' diverse education expertise rather than left entirely to IT staff.

Communication with Stakeholders: Education shares with other endeavors the tendency to operate in silos, with each silo's owner believing that it has unique needs and functions. Data models, when used strategically, can stimulate communication among the educators who own these silos and with the external stakeholders who request program information: policy makers, the media, local and state agencies outside of education, and parents. When stakeholders share a common and comprehensive picture of all the data needed in the education enterprise, individuals in the various stakeholder roles can more readily and accurately communicate their needs.

Employee Training: Comprehensive data models can provide significant professional development opportunities for data users. Staff who are vested owners of a developed data model become better informed in all aspects of the education enterprise and the various dependencies on information. In local systems development, each employee adds expertise to the model to ensure completeness and clearly articulate the data needs to be met. An additional benefit is that an understanding of the need for quality data becomes embedded in the culture of the entire organization.

IT Benefits: Accurate data models lead to more productive and usable data systems at all levels of the organization. Input from the eventual consumers of data systems enables IT professionals, including vendors, to design and implement systems that accurately depict information needed to support daily operations, teaching and learning, and various data reporting demands: information systems that work.

Reporting and Operational "Alignment": Well-developed data models can have dramatic effects outside the daily operations in education. Data models can be used to align data collected and maintained in local systems with the data needed for state and federal reporting, such as the accountability reporting for No Child Left Behind. When the data models in local systems vary greatly from the data model of the mandating entity, the reporting process becomes inefficient and the burden on the reporting entities increases, leading to the complaint that schools are asked to report what is basically the same information over and over again. This extra burden ultimately affects data quality and can result in misinformation.
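
Alignment of this kind can be thought of as a mapping between the element names in a local system and the names a reporting authority expects. The sketch below is a simplified illustration, and every field name in it is hypothetical. It renames the local fields that have a counterpart and lists the ones that do not, which is where reporting effort tends to accumulate.

```python
# Hypothetical mapping from local field names to reporting field names;
# neither set of names comes from the Education Data Model.
LOCAL_TO_REPORTING = {
    "stu_id": "StateStudentId",
    "lunch_status": "LunchEligibility",
}


def align_record(local_record, field_map):
    """Return (aligned, unmapped): renamed fields, plus local fields with no mapping."""
    aligned = {
        field_map[key]: value
        for key, value in local_record.items()
        if key in field_map
    }
    unmapped = sorted(key for key in local_record if key not in field_map)
    return aligned, unmapped
```

When the local and reporting models are close, the mapping is short and the unmapped list stays empty. When they diverge, each unmapped element has to be resolved before reporting.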

Risks Associated With Conceptual Modeling

One of the most common stumbling blocks in developing and using data models is viewing them first and foremost as IT projects. Successful data model projects first focus on the business and use case needs of eventual data consumers. A strength of the conceptual Education Data Model is that it is technology-independent and was developed by individuals in various stakeholder roles. Before technology becomes the focus of a data modeling project, the considerations described in the following subsections should be taken into account. [4]

Product Scope: If the scope of a data model project is not carefully thought out before other work begins, the project may end up with the wrong product, or no product at all. During the development of the Education Data Model: Version 1 (PK-12), the task force carefully defined the scope of the two-year project. As a first step, the task force determined that the Data Model would focus only on PK-12 data, and more specifically on the data interactions and definitions supporting the student, teacher, and course triad. Although it is expected that the Data Model will continue to develop over time, defining the initial scope allowed the task force to complete Version 1.

Incompleteness: As with data systems, a data model is never really complete. Instead, it should be viewed as dynamic information that provides ongoing opportunities for input and for addressing evolving information needs. In this project, the task force made every effort to identify the most basic and common data needs at the school and district levels, and left room to add future directions.

Generalization: In effective data model development, it is important that specific bits of data link via specific relationships to other bits of data. In building conceptual data models, it is easier to make general linkages between categories of entities than to look for the specific relationships that provide the most detailed information for users. Attaining this kind of specificity is difficult, but the standardization that results from using this kind of granular information is beneficial. The Education Data Model is built on specificity rather than on general linkages.

Rationale for Developing the Education Data Model: Version 1 (PK-12)

The Education Data Model is a comprehensive, localized, conceptual model that provides a generic blueprint for schools and districts. This blueprint enables schools to evaluate and improve instructional tools, communicate their needs to their umbrella agency or directly to vendors, enhance the movement of student information from one LEA to another, and in the end, have better tools to inform instruction. Using a standard Education Data Model as a starting point contributes to a comprehensive understanding of the need for data, how data are used, and the questions that can be answered with the data. For instance, the Data Model helps to answer questions such as the following:

  • What data do schools, LEAs, and states need to collect and manage at the local level to meet the information needs of students, staff, and other stakeholders?
  • What data do they need to effectively manage education organizations in order to increase success in teaching, learning, and school leadership?
  • What data do they need to efficiently manage and run an education organization from a fiscal and administrative perspective?

Schools and LEAs that have data models often rely on proprietary models developed by vendors and implemented in vendors' software applications. With roughly seven out of ten districts in the United States (69.6 percent) [5] enrolling fewer than 2,500 students, many LEAs either cannot afford proprietary data solutions, or cannot afford to tailor purchased models to the needs of their education stakeholders. States may lack the financial or technical means to develop comprehensive data models for their LEAs.

Until now, there has not been a comprehensive, nonproprietary, generic education data model for use by schools, LEAs, and states to design or guide the selection of systems for instructional delivery, decision support, operations, reporting, and data warehousing. The Data Model serves as a tool to convene relevant stakeholders around the evaluation of data management processes at the school and LEA levels. The model also helps agencies through the process of identifying requirements, and can then be used to communicate these requirements to vendors (or to internal staff in the case of a build solution). Finally, the model provides a template against which agencies can assess the adequacy of proprietary systems to meet their specific needs.

Uses and Users for the Education Data Model

The Data Model is intended to serve multiple purposes for multiple audiences. Some potential users who should find its benefits relevant to their roles and responsibilities include:

  • Local educators, administrators, and technology directors: To help spur data management discussions and define needed data elements when designing, choosing, or augmenting education software or data systems. For example, the Data Model can help
    • schools evaluate their own local data input and management from a multiple stakeholder perspective (clerical, administrative, educator, etc.);
    • policy makers become more aware of the range of available data and form appropriate policy questions; and
    • writers of requests for proposals determine more accurately and efficiently the range of information needed to meet the requirements of a desired system.
  • Software developers: To ensure that software products contain standard information needed by all PK-12 organizations. For example, the Data Model can help
    • vendors develop systems that more accurately support management, analysis, and reporting; and
    • responders to requests for proposals understand the appropriate range of information for the systems and services to be provided.
  • Researchers: To identify what information is currently available in schools and what might be available in the near future. For example, the Data Model can help
    • researchers explore the composition of data sets for analysis and communicate their data needs.
  • All constituents: To communicate about the education-related information that should be collected now and in the future, in order to foster research and practice. For example, the Data Model can help LEA and SEA administrators better identify analytical needs and the necessary data for continuous improvement planning.


1 American National Standards Institute. 1975. "ANSI/X3/SPARC Study Group on Data Base Management Systems; Interim Report". FDT (Bulletin of ACM SIGMOD) 7:2.
2 Extensible markup language (XML) is a flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere.
3 Maguire, Joe. 2008. Data Modeling: A Necessary and Rewarding Aspect of Data Management. Midvale, UT: Burton Group.
4 Maguire, Joe. 2008. Data Modeling: A Necessary and Rewarding Aspect of Data Management. Midvale, UT: Burton Group.
5 Table 84: Snyder, T.D., Dillow, S.A., and Hoffman, C.M. (2008). Digest of Education Statistics 2007 (NCES 2008-022). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Washington, DC.