Forum Guide to Metadata
NFES 2009-805
July 2009

Chapter 3. Using Metadata - Data Profiling

A good place to end a discussion on quality metadata is with the concept of a data profile. A "profile" is commonly defined as "an analysis representing the extent to which something exhibits various characteristics." As an extension of this idea, a "data profile" is a formal summary of distinctive features or characteristics of a data set, including the data quality items described throughout this section.

Data profiling generally starts with an examination of what an organization expects to find in its data (or database), and then determines whether the data reflect those expectations. For example, what percentage of records contain a value in a given field? If the field is mandatory, the answer should be 100 percent, but profiling may uncover a somewhat different reality. Similarly, if a field stores a coded value, which codes are found in that field, and how many of each? For example, in some organizations, sex might be represented within a single database by "F, M," "female, male," or "x, y." More advanced data profiling techniques can determine whether a particular information system tends to over- or undercount some feature in the data set (e.g., the number of students) relative to expected results. As such, profiling often is used to evaluate data quality, assess whether a collection system supports quality, and determine whether documentation and other available guidance are being used correctly.
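The two basic checks described above (how often a field is filled in, and which codes it contains) can be illustrated with a minimal sketch. It is not taken from this guide: the function name profile_field, the sample records, and the rule that a blank string counts as missing are all assumptions made for illustration.

```python
from collections import Counter


def profile_field(records, field):
    """Report the fill rate of a field and the codes found in it.

    Hypothetical helper: `records` is a list of dicts, one per record.
    """
    values = [rec.get(field) for rec in records]
    # Assumption: None and blank strings both count as "no data".
    present = [v for v in values if v is not None and str(v).strip() != ""]
    fill_rate = 100.0 * len(present) / len(values) if values else 0.0
    return {"fill_rate": fill_rate, "codes": Counter(present)}


# Made-up sample records that mix coding schemes within one field.
records = [
    {"student_id": "1001", "sex": "F"},
    {"student_id": "1002", "sex": "male"},
    {"student_id": "1003", "sex": ""},
    {"student_id": "1004", "sex": "M"},
]

sex_profile = profile_field(records, "sex")
print(sex_profile["fill_rate"])     # percentage of records with a value
print(dict(sex_profile["codes"]))   # each code found and how often
```

In this sample, a mandatory "sex" field would fall short of 100 percent, and the code counts would show two coding schemes ("F, M" and "female, male") in use side by side, which is the kind of mismatch between expectation and reality that profiling is meant to surface.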