Federal Committee on Statistical Methodology (FCSM) Equitable Data Toolkit

Race and Ethnicity Data Tools

Recommended Citation:

Irving, S., Ahmed, S., Burns, S., Jensen, E., Koo, G., Pratt, B., and Sivinski, R. (2023). Race and Ethnicity Data Tools. FCSM Equitable Data Toolkit: A Toolkit for Strengthening Federal Data to Analyze Historically Underserved Populations. Washington, DC: Federal Committee on Statistical Methodology.


The value of detailed race and ethnicity data
Detailed race and ethnicity data are needed to identify small and vulnerable racial and ethnic groups and to understand differences in outcomes of interest (e.g., income and poverty levels, employment and disability status, and birth and mortality rates) across racial and ethnic groups. These data are also relied upon to responsibly enforce civil rights laws. In particular, race and ethnicity data collected by federal agencies through censuses, surveys, administrative forms, etc. are used to monitor equal access in housing, education, employment, and other areas. They are essential for understanding populations that have historically experienced discrimination and differential treatment.

To meet growing demand from data users, federal agencies, both statistical agencies and program agencies,1 can work to enhance the detail of the race and ethnicity information they collect and to produce data that comply with all applicable federal guidelines (e.g., protecting against the disclosure of personally identifiable information (PII)). This must be done in the context of maintaining confidentiality standards and data quality, while minimizing data collection costs and respondent burden. Producing accurate and reliable detailed race and ethnicity data can present challenges for federal agencies, including locating and motivating respondents, ensuring sufficient numbers of cases for meaningful analysis, and protecting the confidentiality and privacy of respondents.


Goals of Race and Ethnicity Tools

The purpose of this section of the FCSM Equitable Data Toolkit is to provide federal agencies with useful tools for supporting equity analyses of racial and ethnic groups, particularly detailed groups beyond the minimum Federal racial and ethnic reporting categories. These tools can help agencies collect more detailed and accurate data; access, analyze and use existing data; and release more granular data while protecting privacy.

While some approaches require significant resources and expertise, there are incremental approaches that every agency can take to improve the representation of underserved populations in their data and analyses.

This report provides tools for improving:

  • Privacy and confidentiality (e.g., maintaining confidentiality for data subjects who belong to small or vulnerable racial and ethnic groups),
  • Data collection (e.g., locating and oversampling specific groups),
  • Data analysis and evaluation (e.g., maximizing the accuracy and reliability of statistics given small numbers of cases), and
  • Data access and dissemination (e.g., developing strategies for minimizing disclosure risk when releasing information).

Some of the steps above, such as locating and oversampling specific groups, may be especially applicable to a federal agency as it constructs or revises a survey design. Program agencies that conduct surveys or collect data from applicants, clients, or participants in the course of administering a federal program also need to consider the information they solicit on race and ethnicity in light of increased requirements to evaluate program effectiveness in terms of historically underserved populations (e.g., certain racial and ethnic populations).

1 "Program agency," for purposes of this Toolkit, refers to an agency or unit, typically within the organization structure of a Federal department, that administers, or helps to administer, a Federal program within which a determination about the rights, benefits, or privileges of individuals, businesses, or institutions is made, including those agencies with regulatory or law enforcement responsibilities.




Federal guidelines on race and ethnicity encourage the production of:

… as much detailed information on race and ethnicity as possible. However, Federal agencies shall not present data on detailed categories if doing so would compromise data quality or confidentiality standards.2

The following tools can help balance between the need to collect and release detailed, useful data and the legal and ethical obligations to protect privacy and ensure confidentiality.

FCSM’s Data Protection Toolkit

Agencies should start with meaningful assessments of the three dimensions of the triple constraint: disclosure risk, access to data, and data accuracy. Agency leadership and staff with statistical expertise should discuss these dimensions, the interrelationships and tradeoffs between them, and develop agency standards that are appropriate based on law and the agency’s risk assessment.

By leveraging best practice policies, methods, and technologies, agencies can sufficiently mitigate the overall risks while promoting broader access to sufficiently accurate data. This may entail the design and application of statistical disclosure limitation methods best suited to the intended data users’ needs, or it could involve agencies providing data users with controlled access to the confidential data through one or more tiered access models1.

protection techniques on the reliability of the data (e.g., see Section 4. Evaluating the 2020 Census Data).

2 U.S. Office of Management and Budget. (1997). Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity. Federal Register, 62(210): 58782-58790. Retrieved June 17, 2022 from, https://www.govinfo.gov/content/pkg/FR-1997-10-30/pdf/97-28653.pdf.

3 Legal Information Institute. (n.d.). Right to Privacy Definition, Wex Law Dictionary. Retrieved June 17, 2022 from, https://www.law.cornell.edu/wex/right_to_privacy#:~:text=1)%20The%20right%20not%20to,fundamental%20personal%20issues%20and%20decisions.

4 44 U.S.C. § 3563.




Federal agencies serve the needs of diverse racial and ethnic communities. Many racial and ethnic communities, however, are not adequately represented in federal data.5 The approaches below may help to address some frequently encountered barriers to participation.

As racial and ethnic populations are heterogeneous, the same approach will not work for every audience. A mixture of approaches is preferred to improve engagement by diverse populations. Often, barriers to participation need to be probed before applying a specific approach. For example, many community members do not provide personal data due to a lack of understanding, motivation, or trust.6

Data can be collected through surveys or administrative forms, such as applications for permits or benefits. The methods and context of data collection will determine the specific barriers that are present. Agencies should begin by identifying these barriers and then applying the appropriate strategies below.

Ways to Improve Participation Across Racial and Ethnic Groups

Identify the population of interest. Start by being specific about the intended use of the data. Consider possible internal uses of the data, and use public comment and other methods of stakeholder engagement to help identify particular racial and/or ethnic groups that you want to draw inferences about.

Help respondents see themselves in response options. Use culturally relevant labels that people relate to, within statutory requirements (as described above). Respect that populations are heterogeneous and may require customized approaches. Provide examples that can help respondents match their self-identity to available options and, if possible, allow respondents to select more than one option. Ask people to self-identify their racial and ethnic identity but, to avoid adding to fears of surveillance, do not collect more detailed demographic data than is necessary. Offer free-response "fill-in" options if you have the capability to capture and process these responses.

Collaborate with other agencies, whenever possible. Share experiences on outreach and engagement through communities of practice. Collaborate with other agencies in data collection to reduce duplication of data and to adopt data collection approaches that allow for data interoperability. Share networks of trusted partners, organizations, and community leaders to help engage diverse audiences. Your agency’s Paperwork Reduction Act staff can help you identify other offices within your agency, and, through the Council of Agency Paperwork Reduction Act (CAPRA), they can connect you with efforts in other Departments.

Engage communities of interest, including community organizations, advocates, media, respected leaders, and influencers. Work with local leaders who have the trust of their communities. When resources allow, cultivate relationships with people who can validate and amplify your message across different segments of the community – faith and civic leaders, media that reaches the communities of interest,7 small business owners, and social media influencers. People who have a high profile with local communities may not be visible to federal agencies. Take the time to look into the background of grassroots contacts when possible. Use trusted community leaders to distribute outreach materials and post them on a publicly accessible website.

Engage communities early to identify barriers to participation. Listen for what might motivate community members to participate or to avoid participation. Ask, for example, how they conceive of demographic concepts such as "race" and "ethnicity" and how they prefer being asked these questions. For example, prior to the 2020 Census, the U.S. Census Bureau conducted a study into barriers, motivators, and attitudes related to participation in the census. While this was an extensive and well-funded effort, agencies can invest in smaller-scale efforts to examine similar issues.

Front load outreach investments. It takes time to develop local relationships. It also takes time to raise awareness, understanding, and engagement with data collection efforts. However, an early investment in awareness-building in the form of outreach with community members and leaders who can encourage their communities to participate may be more cost-effective than making individual visits or phone calls to make up for low participation rates. Establish indicators to monitor data collection so you know if you are on track to collect enough data and adjust outreach investments accordingly, rather than waiting until the end of data collection to evaluate response from various communities of interest.

Communicate the purpose and value of data collection. Communicate the benefits of participation with those populations so that they better understand the value of engaging. Name specific, positive outcomes that could accrue to the community if individuals provide needed data – answer the question, "What’s in it for me/us?" For example, the 2020 Decennial Census offered materials explaining "Why we conduct the decennial census" and "Why we ask each question" during their outreach campaign. Additionally, trusted organizations who provide benefits (e.g., state Supplemental Nutrition Assistance Program (SNAP) benefit offices) may be effective and credible community messengers of the importance of participation in surveys.

Co-create materials and outreach plans with key communities if possible. Build in the time necessary to test an approach with the communities you are targeting to confirm that messages are being communicated effectively. For example, the Centers for Disease Control and Prevention (CDC) established the Health Message Testing System to pretest materials with audiences and revise based on feedback received.

When testing key messages and outreach materials, check that materials convey key information, motivate action, allay concerns, and do not inadvertently trigger negative responses. Use the Paperwork Reduction Act's (PRA) generic clearance process, if appropriate, to gather feedback early and often. When that is not possible, agencies can always test materials with nine or fewer participants without PRA approval. Online tests can provide fast, low-cost initial validations of your approach with populations that are harder to reach in person or through other means. Try to validate results obtained through testing with trusted community leaders whenever possible for further confirmation on the best approach.

Use the language of the target community when creating press releases, social media, outreach materials, and public service announcements. Use different media and channels (print, digital, radio, video) to reach different segments of the community. Employ bilingual interviewers or neighborhood interpreters if budget allows. [See "Transcreate into non-English languages" below.]

For sample-based collections, oversample areas with higher concentrations of detailed groups of particular interest to your collection purposes. This will increase the number of respondents of interest included in the sample and will aid in making resulting estimates more reliable. The Census Bureau’s Planning Database can be used to identify census tracts and block groups with a high concentration of targeted groups where oversampling, special outreach, and promotion efforts could be considered. It should be noted that there are statistical trade-offs associated with oversampling one particular group. Improved estimates for the oversampled group may come at the cost of less precise estimates for other groups. In addition, results will reflect the characteristics of those group members who reside in the concentrated area and may not fully reflect the intra-group diversity that exists across wider areas. For more information about trade-offs and how to make them efficiently, reach out to your agency Statistical Official.
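
The precision trade-off described above can be illustrated with a small calculation. This is a minimal sketch, assuming simple random sampling within each group and a fixed total sample size; the sample sizes, the 5 percent population share, and the function name se_proportion are hypothetical and chosen only for illustration.

```python
import math

def se_proportion(p: float, n: int) -> float:
    """Standard error of an estimated proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

TOTAL_SAMPLE = 2000   # hypothetical fixed sample budget
P = 0.5               # proportion that gives the largest standard error

# Hypothetical allocations: (cases from the group of interest, cases from all others)
allocations = {
    "proportional": (100, 1900),   # group is 5 percent of the population
    "oversampled": (500, 1500),    # group sampled at five times its population share
}

for label, (n_group, n_rest) in allocations.items():
    assert n_group + n_rest == TOTAL_SAMPLE
    print(
        f"{label}: SE for group = {se_proportion(P, n_group):.4f}, "
        f"SE for all others = {se_proportion(P, n_rest):.4f}"
    )
```

In this sketch the standard error for the oversampled group shrinks while the standard error for everyone else grows, which is the trade-off an agency Statistical Official can help weigh.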

Offer clear and simple instructions on how to participate and how information will be used. Explain exactly how the data collection will work and provide clear, step-by-step instructions. Assure potential participants that their responses will be kept private or confidential, if applicable, to the extent permitted by law, and will not result in negative consequences such as criminal prosecution or loss of benefits. Offer specific statutory and/or regulatory protections in outreach materials that may be in place to protect information provided, as was done, for example, with the 2020 Census, the National Health Interview Survey (NHIS), and the Survey of Income and Program Participation (SIPP). If the data will be collected on an administrative form, be clear about whether and how the information will be used for program administration. Talk to your agency counsel and PRA staff to understand what protections and assurances you’re able to offer. Whenever possible, pre-test messages that would best communicate these concerns without triggering additional fears and to ensure that the types of respondents needed understand the confidentiality assurances being shared with them.

Use multiple modes. Offering multiple modes of response – phone, in-person interviews, paper and digital forms (e.g., emails, text messages, or other platforms) – can help to reduce barriers and meet people where they are most comfortable.8

Develop a strategy to follow up with non-respondents. Send reminder letters to respondents who do not participate early in the collection period.9 For interviewer-administered surveys, send FAQ postcards that convey the importance of participating in advance of data collection to prepare respondents for the upcoming survey. These will also help to communicate the legitimacy of the interviewer who will come knocking on their door. When contacting respondents, cite trusted organizations and leaders who are supporting this effort, when possible.


Transcreate into non-English languages

Transcreation describes the process by which content is translated from one language to another, while maintaining its intent, style, tone, and context. When adapting outreach messages, survey instruments, and other materials, work with a translator so that the target audience receives not only the intended information, but also style, tone, and context.10 Use plain language and avoid word-for-word translations that do not resonate with the audience. Ask the translator, "How would you say this?" Be aware of differences in word choices for differing language dialects. Avoid bureaucratic and legalistic language, but also avoid being too familiar.

Validate transcreated materials by testing them with community partners or with focus groups within the intended community. Community partners can help to provide additional context around your materials. In some cases, they may even be able to transcreate materials when there is a lack of resources to do so.


Alternatives to Data Collection

It may not be feasible to directly collect race and ethnicity data on forms such as benefit applications. The content of forms may be restricted by statute or regulation, voluntary data may have low response rates, and respondents could fear that their responses to demographic questions will be used to adjudicate applications. In these cases, agencies should consider alternatives to collecting race and ethnicity data directly:

  • Data linkage - The self-reported race and ethnicity of program participants may exist on other datasets that can be linked to program data. For example, the Census Bureau's Data Linkage Infrastructure collaborates with federal agencies to support high-quality research and evaluation.
  • Modeling or imputation - Existing information can be used to estimate or impute the race and ethnicity of individuals. For example, information found on administrative forms such as first name, last name, and address can be used to estimate race and ethnicity.11 These experimental methods are still being tested and developed by several agencies, and must be used with caution to avoid inadvertently introducing bias.
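
The modeling approach in the second bullet can be illustrated with Bayes' rule. This is a minimal sketch, assuming that surname and place of residence are independent given a person's group; the probability tables, the group labels, and the function name combine_evidence are hypothetical and are not drawn from any agency's method. Estimates of this kind are probabilities, not observed identities, and carry the bias risks noted above.

```python
def combine_evidence(p_group_given_surname: dict, p_area_given_group: dict) -> dict:
    """Combine two sources of evidence into posterior probabilities by group.

    posterior(group) is proportional to
        P(group | surname) * P(area | group),
    normalized so that the probabilities sum to 1.
    """
    unnormalized = {
        group: p_group_given_surname[group] * p_area_given_group[group]
        for group in p_group_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No group is compatible with both sources of evidence.")
    return {group: value / total for group, value in unnormalized.items()}

# Hypothetical inputs for one record
surname_probs = {"Group A": 0.6, "Group B": 0.3, "Group C": 0.1}
area_probs = {"Group A": 0.001, "Group B": 0.004, "Group C": 0.001}

posterior = combine_evidence(surname_probs, area_probs)
for group, prob in posterior.items():
    print(f"{group}: {prob:.3f}")
```

Keeping the full set of probabilities for each record, rather than assigning each person to the single most likely group, is one way to carry the uncertainty forward into later analysis.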

Additional Data Collection Resources

5 Office of Management and Budget. (2021). "Study to Identify Methods to Assess Equity: Report to the President." Washington, DC: Office of Management and Budget. Retrieved July 20, 2021 from, https://www.whitehouse.gov/wp-content/uploads/2021/08/OMB-Report-on-E013985-Implementation_508-Compliant-Secure-v1.1.pdf

6 Evans, S., Levy, J., Miller-Gonzalez, J., Vines, M., Sandoval Girón, A., Walejko, G., Bates, N., and García Trejo, Y. (2020). 2020 Census Barriers, Attitudes, and Motivators Study (CBAMS) Focus Group Final Report. Washington, DC: Census Bureau. Retrieved January 24, 2019 from, https://www.census.gov/programs-surveys/decennial-census/decade/2020/planning-management/plan/final-analysis/2020-report-cbams-focus-group.html

7 These may require intentional research into, for example, non-English, digital, and/or social media.

8 Groves, R.M., Fowler Jr., F.J., Couper, M.P., Lepkowski, J.M., Singer, E., and Tourangeau, R. (2009). Methods of Data Collection. Survey Methodology. Hoboken, NJ: Wiley.

9 Clark, S. (2016). How Effective is a Prenotice Letter in Increasing Self-Response? Washington, DC: Census Bureau. Retrieved June 17, 2022 from, https://www.census.gov/newsroom/blogs/research-matters/2016/05/how-effective-is-a-prenotice-letter-in-increasing-self-response.html

10 Kim, J., Kopp, J., and Hotchkiss, M. (2021). Developing Public-Facing Language Products: Guidance From the 2020 Census Language Program. Washington, DC: Census Bureau. Retrieved June 17, 2022 from, https://www2.census.gov/library/publications/decennial/2020/operations/language-product-handbook.pdf

11 Haas, A., Elliott, M.N., Dembosky, J.W., Adams, J.L. (2019). Imputation of Race/Ethnicity to Enable Measurement of HEDIS Performance by Race/Ethnicity. Health Services Research, 54(1): 13-23. https://pubmed.ncbi.nlm.nih.gov/30506674/




This section presents tools to help agencies produce accurate and reliable race and ethnicity statistics. Fortunately, there are proven statistical methods for getting the most value out of limited information and small sample sizes. For help applying these methods, reach out to your agency Statistical Official.


Data Analyses for Small Race and Ethnicity Categories and/or Detailed Groups

Use existing guidance on the presentation of race and ethnicity data. For example, OMB offers guidance on the aggregation and allocation of data on race and flexibilities and best practices for implementing their race and ethnicity data standards.

Pool multiple months or years of data together. This method will help ensure adequate sample size for analysis. The number of pooled cycles or years can be based on target minimum cell sizes or on target maximum uncertainty, or variance. It may be necessary to adjust weights when pooling data.12 For example, the American Community Survey and the Puerto Rico Community Survey provide multi-year files to data users.
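
The pooling step and the weight adjustment can be sketched as follows. This is a minimal illustration, assuming each year's weights sum to that year's population, so that dividing by the number of pooled years yields totals for an average year. The records, group labels, and function names are hypothetical, and this simple rescaling is not the American Community Survey's multiyear weighting procedure.

```python
from collections import defaultdict

# Hypothetical records: (survey_year, detailed_group, survey_weight)
records = [
    (2019, "Group A", 120.0), (2019, "Group B", 300.0),
    (2020, "Group A", 110.0), (2020, "Group A", 95.0),
    (2021, "Group A", 130.0), (2021, "Group B", 280.0),
]

def pool_years(rows, years):
    """Keep rows from the given years and divide each weight by the number of years."""
    k = len(years)
    return [(year, group, weight / k) for (year, group, weight) in rows if year in years]

def summarize(rows):
    """Return {group: (unweighted count, weighted total)} for the supplied rows."""
    totals = defaultdict(lambda: [0, 0.0])
    for _, group, weight in rows:
        totals[group][0] += 1
        totals[group][1] += weight
    return {group: (n, total) for group, (n, total) in totals.items()}

pooled = pool_years(records, {2019, 2020, 2021})
summary = summarize(pooled)
for group, (n, total) in summary.items():
    print(f"{group}: unweighted n = {n}, average-year weighted total = {total:.1f}")
```

The unweighted count is what grows with pooling and is what should be compared against a minimum cell size; the rescaled weights keep the published totals from being inflated by the number of years combined.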

Aggregate detailed groups where needed. If data are collected at very granular levels that would require several years of pooled data to reach publishable sample sizes, work with the data to find the most granular level that can be published at any given time.

For example, if data for detailed Black or African American groups are needed but the sample size is too small, then combine Black or African American detailed groups to the high-level identities and/or geographic regions that comprise this category as defined in SPD 15 (e.g., aggregate Jamaican, Haitian, and Bahamian into an Afro Caribbean grouping; or aggregate Nigerian, Liberian, and Ghanaian into a West African grouping). The Census Bureau’s Hispanic Origin and Race Code List (Appendix F beginning on page 193) is an available resource when deciding how to aggregate groups.
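
One way to apply this rule consistently is to roll detailed groups up to their parent grouping whenever any of them falls below a minimum cell size. The following is a minimal sketch under that assumption; the counts, the threshold of 30, and the function name publishable_counts are hypothetical, and the groupings simply mirror the example in the preceding paragraph.

```python
MIN_CELL_SIZE = 30  # hypothetical threshold; agencies set their own standards

# Hypothetical unweighted counts for detailed groups
detailed_counts = {
    "Jamaican": 18, "Haitian": 22, "Bahamian": 5,
    "Nigerian": 41, "Liberian": 33, "Ghanaian": 36,
}

# Parent grouping for each detailed group
parent_of = {
    "Jamaican": "Afro Caribbean", "Haitian": "Afro Caribbean", "Bahamian": "Afro Caribbean",
    "Nigerian": "West African", "Liberian": "West African", "Ghanaian": "West African",
}

def publishable_counts(counts, parents, min_size):
    """Publish detailed groups only when every group under the same parent meets min_size.

    Otherwise publish the parent total if it meets min_size, and withhold the
    parent entirely if even the combined count is too small.
    """
    members_by_parent = {}
    for group, n in counts.items():
        members_by_parent.setdefault(parents[group], {})[group] = n

    published = {}
    for parent, members in members_by_parent.items():
        if all(n >= min_size for n in members.values()):
            published.update(members)
        elif sum(members.values()) >= min_size:
            published[parent] = sum(members.values())
    return published

result = publishable_counts(detailed_counts, parent_of, MIN_CELL_SIZE)
print(result)
```

Rolling up the whole parent grouping, rather than publishing some detailed groups alongside a residual, is a design choice made here so that a withheld count cannot be recovered by subtraction.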

Supplement with other data sources. Increasingly, federal agencies are linking administrative records and survey data to create an enhanced data file for analysis. This approach can provide a more detailed picture of the economic and social well-being for racial and ethnic groups. This also allows different federal agencies to help each other to better understand how programs are working and where they could be improved.

Presenting Results for Race and Ethnicity Categories and/or Detailed Groups

The same attention given to collecting and analyzing race and ethnicity information should be given to presenting results so that they are safe and useful.

Present results from detailed race and ethnicity groups with equitable, balanced, and relevant terminology. To do so, rely on (1) terminology used by specific racial and ethnic communities and/or (2) terminology of how respondents describe themselves. Use the Census Bureau's Hispanic Origin and Race Code List (Appendix F, which begins on page 193) as a guide.

Do not use terms like "majority," "minority," "other," and "non-White," and avoid combining specific races and ethnicities into "majority," "minority," "other," and "non-White" groupings. These terms have several conceptual and practical challenges and have become more complex and contested in recent decades. For more information, see Measuring Racial and Ethnic Diversity for the 2020 Census (beginning with "Measuring Diversity Then and Now"). In cases where the minimum reporting categories cannot be used, agencies may use "all other races" when such a collective description is appropriate.13


Measures of Uncertainty

Especially when dealing with small populations it’s important to consider the uncertainty associated with results. The acceptable amount of uncertainty will depend on the particular use and the importance of having accurate and precise estimates.

A margin of error (MOE) describes the precision of an estimate at a given level of confidence. The confidence level associated with the MOE indicates the likelihood that the sample estimate is within a certain range (the MOE) of the population value. MOEs are often provided at a 90 or 95 percent confidence level.

Confidence intervals are easily calculated using MOEs and are often displayed as an upper and lower bound at a given confidence level for the estimate.14 As with MOEs, it is common to produce confidence intervals with a confidence level of 90 or 95 percent. Confidence intervals or MOEs are excellent tools for communicating uncertainty to a non-technical audience. The larger the MOE or confidence interval for a particular estimate, the more caution is required when using the estimate. The size of a CI may be used as a criterion for determining whether to report an estimate to the public, a practice that has been adopted by the National Center for Health Statistics.15
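
The relationship between a standard error, an MOE, and a confidence interval can be sketched as follows. This is a minimal illustration, assuming the estimate's sampling distribution is approximately normal, which may not hold for very small samples or for proportions near 0 or 1. The estimate and standard error are hypothetical, as are the function names.

```python
from statistics import NormalDist

def margin_of_error(standard_error: float, confidence: float = 0.90) -> float:
    """MOE = z * SE, where z is the two-sided normal critical value."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * standard_error

def confidence_interval(estimate: float, standard_error: float, confidence: float = 0.90):
    """Return (lower bound, upper bound) as estimate minus and plus the MOE."""
    moe = margin_of_error(standard_error, confidence)
    return estimate - moe, estimate + moe

# Hypothetical estimate for a detailed group
estimate, standard_error = 12_500, 1_000

for level in (0.90, 0.95):
    low, high = confidence_interval(estimate, standard_error, level)
    print(f"{int(level * 100)} percent CI: {low:,.0f} to {high:,.0f}")
```

When a published table reports only the MOE, the same arithmetic can be reversed: dividing the MOE by the critical value for its stated confidence level recovers the standard error.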

Measures of statistical uncertainty can also be used to produce coefficients of variation (CV), which are measures of how close the observed data points are to their mean.16 The CV is also called the relative standard error. It is calculated by dividing the standard error of an estimate by the estimate itself and is usually expressed as a percentage. According to the U.S. Census Bureau’s Statistical Quality Standards, serious data quality issues related to sampling error occur when the estimated CVs for the majority of the key estimates are larger than 30 percent.17
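
A CV check against the 30 percent level can be sketched as follows. This is a minimal illustration, assuming simple random sampling for the standard error of a proportion; the sample size, the proportions, and the function names are hypothetical. It also shows why a small proportion and its complement can yield very different CVs even though they share the same standard error.

```python
import math

CV_THRESHOLD = 30.0  # percent, the level cited from the Statistical Quality Standards

def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """Relative standard error, expressed as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("The CV is undefined when the estimate is zero.")
    return 100 * standard_error / abs(estimate)

def se_proportion(p: float, n: int) -> float:
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

n = 100                    # hypothetical number of sample cases
p = 0.05                   # hypothetical share with a characteristic
se = se_proportion(p, n)   # identical for p and for 1 - p

for label, value in (("with characteristic", p), ("without characteristic", 1 - p)):
    cv = coefficient_of_variation(value, se)
    flag = "exceeds" if cv > CV_THRESHOLD else "is within"
    print(f"{label}: CV = {cv:.1f} percent, which {flag} the threshold")
```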

Sensitivity Analysis

A sensitivity analysis is a method used to evaluate how the results change when the inputs or assumptions used in the analysis are varied. It can be a useful tool when modeling with either survey data or administrative records. Sensitivity analyses can be particularly useful for administrative records, where it’s difficult to estimate the uncertainty in the results because the respondents aren’t randomly sampled.

For example, birth records do not list the race and ethnicity of the child but do list this information for the child's mother and father. An analysis of births by race and ethnicity will need to assign these characteristics to the child using the mother's and/or father's characteristics. A possible sensitivity analysis could be to change how race and ethnicity are assigned to the child and then evaluate how this impacts the results for an outcome variable. Ideally, a known outcome variable would be used to calibrate the methods to best approach the correct value.
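
The birth records example can be sketched as a comparison of results under alternative assignment rules. This is a minimal illustration; the records, group labels, outcome variable, and the two rules shown are hypothetical, and an actual analysis would likely consider more rules, including ones for multiple-race assignments.

```python
# Hypothetical birth records with parents' reported group and an outcome flag
births = [
    {"mother": "Group A", "father": "Group A", "outcome": True},
    {"mother": "Group A", "father": "Group B", "outcome": False},
    {"mother": "Group B", "father": "Group A", "outcome": True},
    {"mother": "Group B", "father": "Group B", "outcome": False},
    {"mother": "Group A", "father": "Group A", "outcome": False},
    {"mother": "Group B", "father": "Group A", "outcome": False},
]

# Alternative rules for assigning a group to the child
rules = {
    "mother": lambda record: record["mother"],
    "father": lambda record: record["father"],
}

def rate_by_group(records, assign):
    """Share of births with the outcome, by the group assigned to the child."""
    counts = {}
    for record in records:
        group = assign(record)
        n, k = counts.get(group, (0, 0))
        counts[group] = (n + 1, k + int(record["outcome"]))
    return {group: k / n for group, (n, k) in counts.items()}

def spread_by_group(results_by_rule):
    """Largest minus smallest rate for each group across all rules."""
    groups = set().union(*(rates.keys() for rates in results_by_rule.values()))
    spread = {}
    for group in groups:
        values = [rates[group] for rates in results_by_rule.values() if group in rates]
        spread[group] = max(values) - min(values)
    return spread

results = {name: rate_by_group(births, rule) for name, rule in rules.items()}
spread = spread_by_group(results)
for group in sorted(spread):
    print(f"{group}: rates differ by {spread[group]:.3f} across assignment rules")
```

A large spread for a group signals that conclusions about that group depend heavily on the assignment rule and should be reported with that caveat.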


Suitability of Detailed Race and Ethnicity Group Data for Publication

When preparing materials for publication, it is important to use disclosure avoidance procedures to protect the confidentiality of respondents, especially for small and vulnerable groups. These procedures may range from relatively simple methods such as suppression (i.e., redacting cells with small, unweighted counts) and rounding of cell counts (e.g., to the nearest thousand) to more complex methods such as differential privacy. The Data Protection Toolkit provides a good source of information and resources on methods to protect data.
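
The two simpler methods, suppression and rounding, can be sketched as follows. This is a minimal illustration; the minimum count of 10, the rounding base of 1,000, the table values, and the function name protect_cell are hypothetical. Actual thresholds are set by each agency's disclosure review process, and additional steps such as complementary suppression may be needed so that a withheld cell cannot be derived from published totals.

```python
from typing import Optional

def protect_cell(unweighted_n: int, weighted_estimate: float,
                 min_n: int = 10, round_to: int = 1000) -> Optional[int]:
    """Suppress cells with too few sample cases, otherwise round the estimate.

    Returns None for a suppressed cell, or the weighted estimate rounded to
    the nearest multiple of round_to.
    """
    if unweighted_n < min_n:
        return None
    return int(round(weighted_estimate / round_to)) * round_to

# Hypothetical table: group -> (unweighted count, weighted estimate)
table = {
    "Group A": (4, 3_200.0),
    "Group B": (25, 18_640.0),
    "Group C": (140, 97_310.0),
}

protected = {group: protect_cell(n, est) for group, (n, est) in table.items()}
for group, value in protected.items():
    print(group, "suppressed" if value is None else f"{value:,}")
```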

Measures of statistical uncertainty (described above) can be used to determine whether estimates are suitable for release. When releasing estimates, provide the accompanying measures of statistical uncertainty. Measures of uncertainty, accompanied by explanation and interpretation provided by subject matter experts, help data users understand the reliability and limitations of the estimate.18

Additional Analysis and Evaluation Resources

12 You can read about how the American Community Survey combines multiple years of data here: American Community Survey Multiyear Accuracy of the Data (5-year 2017-2021) (census.gov)

13 U.S. Office of Management and Budget. (1997). Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity. Federal Register, 62(210): 58782-58790. Retrieved October 1997 from, https://www.govinfo.gov/content/pkg/FR-1997-10-30/pdf/97-28653.pdf.

14 Technically, a 90 percent confidence interval means that 90 percent of samples drawn the same way as the working sample would produce an estimated confidence interval that includes the true population value. For a more in-depth definition of confidence intervals, see the Census Bureau’s Basic Explanation of Confidence Intervals.

15 Parker, J.D., Talih, M., Malec, D.J. (2017). National Center for Health Statistics data presentation standards for proportions. National Center for Health Statistics. Vital and Health Statistics, 2(175); Parker, J.D., Talih, M., Irimata, K.E., Zhang, G., Branum, A.M., Davis, D., et al. (2023). National Center for Health Statistics data presentation standards for rates and counts. National Center for Health Statistics. Vital and Health Statistics, 2(200).

16 U.S. Census Bureau. (2013). U.S. Census Bureau Statistical Quality Standards. Washington, DC: U.S. Census Bureau. Retrieved July 2013 from, https://www.census.gov/content/dam/Census/about/about-the-bureau/policies_and_notices/quality/statistical-quality-standards/Quality_Standards.pdf

17 U.S. Census Bureau. (2013). U.S. Census Bureau Statistical Quality Standards. Washington, DC: U.S. Census Bureau. Retrieved July 2013 from, https://www.census.gov/content/dam/Census/about/about-the-bureau/policies_and_notices/quality/statistical-quality-standards/Quality_Standards.pdf. The 30 percent CV means that the ratio of the standard deviation to the mean is 0.3, which is the extent of the variability relative to the mean. Data users should use discretion when computing the CV (or relative standard error) for proportions. The CV for an estimated proportion of 0.05 is much larger than the CV for its complement (0.95), which could be used to convey the same information. For example, if the proportion of the population having characteristic X is 0.05, then the proportion of the population without characteristic X is 0.95. The CV for the proportion having characteristic X will be larger than the CV for the proportion without characteristic X.

17(b) Parker, J.D. (2018). National Center for Health Statistics Data Presentation Standards for Proportions. Vital and Health Statistics, 2(175). Washington DC: National Center for Health Statistics. Retrieved June 17, 2022 from, https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf.

18 See, for example, the NCES Condition of Education report which informs readers when the coefficient of variation for an estimate is so large that it threatens its validity.


This section provides guidance on the release and dissemination of race and ethnicity data. Federal agencies have an obligation to release data files and a range of data products to meet data users’ needs in a timely manner. Additionally, they should provide information necessary to ensure that users interpret data accurately. However, it is critical that agencies adhere to disclosure laws and guidelines to keep respondent information confidential.

Agencies should establish a dissemination plan and communicate the plan to the general public to promote transparency and build public trust. The dissemination plan should provide timely access to all users and information to the public about the agencies’ dissemination policies and procedures, including those related to any planned or unanticipated data revisions. For example:

  1. Develop a schedule and determine the mode for the release (e.g., printed hardcopy, PDF file, or HTML format) of information products;
  2. Inform the general public, as well as targeted audiences; and
  3. Ensure timely access to data and data products to all users.


Tiered Data Access

Tiered data access is a strategy for disseminating data that includes multiple versions of the same data in order to help the agency monitor and control the risk of disclosure. Less sensitive or lower risk versions might be released publicly (public use data, or PU), while more detailed and sensitive versions may require a signed data use agreement, or oversight from a Disclosure Review Board.

Public use (PU) data are released by agencies in multiple formats. These can include PU files that have undergone additional disclosure mitigation procedures to allow public release (e.g., collapsing detailed race and ethnicity groups and suppressing data), online data tools (e.g., National Center for Education Statistics DataLab) that include procedures to protect the data they generate, or tabular data released either in reports or separately. Agencies must evaluate their audiences and those audiences' data needs to determine which PU data products are most suitable.
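The two mitigation procedures named above, collapsing detailed groups and suppressing data, can be sketched as follows. This is a minimal illustration under stated assumptions, not any agency's actual rule: the threshold `MIN_CELL_SIZE`, the function names, the label "Other Asian", and all counts are hypothetical.

```python
from typing import Dict, Optional

MIN_CELL_SIZE = 10  # HYPOTHETICAL threshold; real thresholds are set by each agency


def collapse_small_groups(
    counts: Dict[str, int],
    collapsed_label: str,
    min_cell_size: int = MIN_CELL_SIZE,
) -> Dict[str, int]:
    """Fold every detailed group below the threshold into one combined category."""
    collapsed: Dict[str, int] = {}
    residual = 0
    for group, count in counts.items():
        if count >= min_cell_size:
            collapsed[group] = count
        else:
            residual += count
    if residual > 0:
        # Add to any existing cell with the same label rather than overwrite it.
        collapsed[collapsed_label] = collapsed.get(collapsed_label, 0) + residual
    return collapsed


def suppress_small_cells(
    counts: Dict[str, int],
    min_cell_size: int = MIN_CELL_SIZE,
) -> Dict[str, Optional[int]]:
    """Replace any cell still below the threshold with None (suppressed)."""
    return {g: (c if c >= min_cell_size else None) for g, c in counts.items()}


# Made-up counts for illustration only.
detailed_counts = {"Chinese": 240, "Asian Indian": 180, "Cambodian": 4, "Thai": 3}

collapsed = collapse_small_groups(detailed_counts, "Other Asian")
public_table = suppress_small_cells(collapsed)
print(public_table)
```

In this made-up table the collapsed cell still holds only 7 cases, so it is suppressed as well. A real release would also need to consider complementary suppression, since a single suppressed cell can be recovered by subtraction when the category total is published.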

Some data users require access to more sensitive data that cannot be made available through PU data releases. This might include information on small racial or ethnic categories that are collapsed or suppressed on PU data files. Restricted use (RU) access is a process that allows federal agencies to make restricted data available to approved users in a secure data environment that minimizes the risk of disclosure. Restricted access to the least modified federal statistical data has been provided through restricted-use data centers (e.g., the Federal Statistical Research Data Centers (FSRDCs) managed by the Census Bureau20) or through licensing researchers as CIPSEA agents (e.g., at the National Center for Education Statistics and the National Center for Health Statistics).21 Movement toward remote RU access is underway (e.g., the Coleridge Initiative at the National Center for Education Statistics). Contact your agency's statistical official for updates on the standard application process (SAP) and remote access options.

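Because the data must be protected by law, no single product can serve every user. Providing a wide range of data products, each with an appropriate level of detail and protection, allows access by the widest possible audience.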

Additional Access and Dissemination Resources

20 Restricted data are available to qualified researchers with approved projects at secure Federal Statistical Research Data Centers (RDCs). There are currently 31 open Federal Statistical Research Data Center (RDC) locations. The RDCs partner with over 50 research organizations including universities, non-profit research institutions, and government agencies.

21 Any results the researcher proposes to release must still go through the disclosure avoidance protocols.


The U.S. Office of Management and Budget’s (OMB) Statistical Policy Directive No. 15: Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity (SPD 15) establishes a uniform framework that all federal agencies must use when collecting race and ethnicity information.22 Although SPD 15 is sometimes cited as a barrier to collecting more detailed racial and ethnic categories, it actually encourages agencies to collect them, so long as they roll up to the following minimum categories.

The two required minimum categories for ethnicity,23 and their corresponding definitions, are:

  1. Hispanic or Latino: A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race.
  2. Not Hispanic or Latino

Note, people of Hispanic or Latino origin may be of any race, per SPD 15.

The five required minimum categories for data on race,24 and their corresponding definitions, are:

  1. American Indian or Alaska Native: A person having origins in any of the original peoples of North and South America (including Central America), and who maintains tribal affiliation or community attachment.  
  2. Asian: A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.   
  3. Black or African American: A person having origins in any of the black racial groups of Africa.  
  4. Native Hawaiian or Other Pacific Islander: A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
  5. White: A person having origins in any of the original peoples of Europe, the Middle East, or North Africa. 

Note, people may report multiple races.

SPD 15 permits the collection of more detailed racial and ethnic groups provided that any additional information can be aggregated into the minimum categories. Some uses of data require race and ethnicity information that is disaggregated beyond (that is, more detailed than) the minimum categories provided in SPD 15. For example, there is a wide variety of detailed groups that fall under the Asian category (e.g., Chinese, Asian Indian) and the Hispanic or Latino category (e.g., Mexican, Cuban). In addition to the 2020 Census, many surveys collect more detailed information about race and ethnicity that can be aggregated to the minimum racial and ethnic categories in SPD 15 (e.g., the American Community Survey (ACS); the National Health Interview Survey (NHIS); the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011); and the Consumer Expenditures Survey (CE)).
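The roll-up requirement can be sketched as a simple lookup from detailed groups to the SPD 15 minimum categories. The example below is a minimal illustration, not an official code list: the mapping tables cover only a handful of groups, and the names `DETAILED_TO_MINIMUM_RACE`, `DETAILED_TO_MINIMUM_ETHNICITY`, `roll_up_races`, and `roll_up_ethnicity` are hypothetical. Race responses are kept as a set because people may report multiple races, while ethnicity takes a single response.

```python
from typing import Dict, Iterable, Set

MINIMUM_RACES = {
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
}

# ILLUSTRATIVE, incomplete mappings; a production system would use the
# agency's full code list.
DETAILED_TO_MINIMUM_RACE: Dict[str, str] = {
    "Chinese": "Asian",
    "Asian Indian": "Asian",
    "Samoan": "Native Hawaiian or Other Pacific Islander",
}

DETAILED_TO_MINIMUM_ETHNICITY: Dict[str, str] = {
    "Mexican": "Hispanic or Latino",
    "Cuban": "Hispanic or Latino",
    "Hispanic or Latino": "Hispanic or Latino",
    "Not Hispanic or Latino": "Not Hispanic or Latino",
}


def roll_up_races(detailed: Iterable[str]) -> Set[str]:
    """Aggregate one person's reported race(s) to the minimum categories."""
    minimum: Set[str] = set()
    for response in detailed:
        if response in MINIMUM_RACES:
            minimum.add(response)
        elif response in DETAILED_TO_MINIMUM_RACE:
            minimum.add(DETAILED_TO_MINIMUM_RACE[response])
        else:
            # Fail loudly: an unmapped group cannot be aggregated.
            raise ValueError(f"No minimum category defined for {response!r}")
    return minimum


def roll_up_ethnicity(detailed: str) -> str:
    """Aggregate a single ethnicity response to one of the two minimum categories."""
    try:
        return DETAILED_TO_MINIMUM_ETHNICITY[detailed]
    except KeyError:
        raise ValueError(f"No minimum category defined for {detailed!r}") from None


print(roll_up_races(["Chinese", "Asian Indian"]))  # one minimum category
print(roll_up_races(["Chinese", "White"]))         # two minimum categories
print(roll_up_ethnicity("Cuban"))
```

Raising an error for unmapped responses, rather than assigning them to a catch-all, is a deliberate choice in this sketch: it surfaces any detailed category that would break the aggregation requirement.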

SPD 15 is currently under review by OMB and may be revised in the future. You can learn more about the review process here: SPD15revision.gov.

22 The reporting guidelines can be found in OMB Bulletin No. 00-02 - Guidance on Aggregation and Allocation of Data on Race for Use in Civil Rights Monitoring and Enforcement and in the Provisional Guidance on the Implementation of the 1997 Standards for Federal Data on Race and Ethnicity.

23 Multiple responses to the ethnicity question are not permitted.

24 SPD 15 does not permit an "Other" race category. The only exception is the decennial census, including the American Community Survey, which is required by law to include a "Some Other Race" response category.