Selected Papers in School Finance 1995

Proposed "Good Practices" for Creating Data Bases from the F-33 and CCD for School Finance Analyses

Michael O'Leary and Jay Moskowitz

Washington, DC

Introduction

Background

With recent developments in school reform and state and Federal financing of public schools, the desire for accurate school finance (and related enrollment and programmatic) data has grown. The widespread use of personal computers which can run sophisticated statistical software packages and analyze very large data sets has also made it much easier for researchers to scrutinize public data sets. Computer runs that used to take hours can now be done in minutes or seconds, allowing researchers to subject data bases to new levels of scrutiny. The increased scrutiny certainly improves the quality of the data, but it also opens the door to wide variations in the procedures used by individual researchers (even those in the same fields) to create data bases for analysis (from multiple, larger data sets).

At least three studies \1\ sponsored by the U.S. Department of Education utilized the same school finance and enrollment data sets as their sources: the Survey of Local Government Finances for School Systems (F-33) and Common Core of Data (CCD). Realizing the need for common data bases upon which to conduct their analyses, researchers working on these studies at the American Institutes for Research (AIR), Consortium for Policy Research in Education's Finance Center (CPRE), and Pelavin Associates \2\ cooperated to develop a set of common practices regarding data base creation. While the researchers' ultimate goals and methodologies were different, they were able to create nearly identical data bases by collaborating and adopting each other's techniques for selecting appropriate district records from their primary data sources: the F-33 and CCD surveys (revenue, expenditure, and enrollment data) and the District-Mapped Decennial Census (demographic and economic data). Revenue, expenditure, and enrollment data from the F-33 and CCD surveys are the sine qua non for national school studies and demographic and economic data.

Objectives

This paper provides future users of these data sets (and subsequent releases thereof) with an introduction to the major issues regarding creation of a data base for analysis. Based on the procedures followed by the AIR, CPRE, and Pelavin Associates researchers, this paper summarizes a methodology for selecting the records (i.e., school districts) appropriate for most school finance analyses. \3\ This methodology is offered as a potential model for future data base creation and modification.

By presenting the issues and caveats regarding initial modification of the data, and by proposing a methodology for data base creation, the authors hope the paper will encourage greater standardization among data users. Individual research methods will (and should) always vary, but with common data base creation procedures, researchers' conclusions can be debated on their merits rather than dismissed due to variations in the underlying data sets.

Organization

Subsequent sections of this paper include descriptions of the F-33 and CCD data sets; the use of these data sets in past studies; issues identified during creation of the data bases for the AIR, CPRE, and Pelavin Associates studies; and a proposed process for modifying the data sets. The proposed process includes the steps taken to address the issues identified by AIR, CPRE, and Pelavin Associates and notes the additional issues (e.g., adjustment for special-needs pupils) that could not be addressed fully or were not common to all three studies.

Major National Data Sets Used For School Finance Research

The Census Bureau's Survey of Local Government Finances for School Systems (F-33) and the National Center for Education Statistics' (NCES') Common Core of Data (CCD) are the two national data sets most relevant to school finance research.

The F-33 is conducted annually by the Census Bureau under contract with NCES. Although conducted annually, it is not based on a universe of school districts each year. Typically, the survey includes all districts in all states in years ending in 2 or 7 (1982, 1987, and 1992). To obtain financial data which corresponded more closely to the decennial censuses, NCES also sponsored special F-33 surveys in 1980 and 1990.

Consequently, F-33 data are available for all districts for the 1979-80, 1981-82, 1986-87, 1989-90, and 1991-92 school years. For the intervening years, the F-33 includes data on all districts for 33 to 35 states, but only a sample of districts in the remaining 15 to 17 states.

The CCD contains data at three different levels from five different surveys. It has three separate files for state, agency (district), and school-level data. These files are based on data from the F-33 and the Schools and Staffing, Public Elementary-Secondary Agency, Public Elementary-Secondary School, State Non-Fiscal Elementary and Secondary Education, and National Public Education Financial Surveys.

The District-Mapped Census data are available as a result of the NCES-sponsored "School District Mapping Project." For the project, the Census Bureau used household level data by census tract from its 1980 and 1990 decennial censuses. The Bureau took geographic coordinates for each state's census tracts and school districts and aggregated (or mapped) the tract-level data up to the district level. As a result, the District-Mapped Census data include a wide variety of demographic and socioeconomic information for each of the nation's school districts.

Recent Uses Of Major School Finance Data Sets

The F-33 and CCD data sets have been used extensively for studies conducted by researchers at AIR, CPRE, and Pelavin Associates during the last two years. The AIR and Pelavin Associates researchers also used some of the district-mapped data. The studies' foci, or emphases, are presented in Table 1 and the data elements used by the Pelavin Associates study are presented in Tables 1a and 1b.

Other major national school finance studies utilizing the F-33 data include those by Wyckoff (1992), Riddle (1990), and Schwartz and Moskowitz (1988). Wyckoff used enrollment and current and instructional expenditure data from the F-33 to examine changes in intrastate equity between 1980 and 1987. Riddle also examined intrastate equity using F-33 data, but for 1987 only. Schwartz and Moskowitz used F-33 enrollment, expenditure, and revenue data, as well as 1980 district-mapped income, poverty, and property data to examine changes in horizontal equity and equal opportunity between 1977 and 1985.

Unlike the AIR, CPRE, and Pelavin Associates studies, these three earlier studies did not use a common approach to data processing and provided only summary information about the approach that they did take. For example, Schwartz and Moskowitz discussed the problems associated with using sample data and listed the number of districts included in their analysis (by state), but they did not indicate what criteria they used to include those districts and exclude others.

Riddle included a lengthy discussion of the limitations of school finance data, but provided very little information about which (or how many) districts he included in his database. Other than noting that he screened out districts with low enrollments, \4\ he provided no information about the creation of his database and no information about whether he screened for special or non-operating districts.

Wyckoff provided more detail about his data base than either Riddle or Schwartz and Moskowitz. Like Riddle, Wyckoff noted some of the basic limitations of the data (e.g., state by state variations in classifying expenditures), but Wyckoff also noted the total number of districts in the 1980 and 1987 data sets, the number eliminated because they were not unified, and the number eliminated because they were community college, vocational, non-operating, or service districts. However, Wyckoff did not provide the number of districts included by state or any other details about his data base and the variables contained therein.

Given this lack of information and apparent inconsistency regarding data processing, researchers working on the AIR, CPRE, and Pelavin Associates studies collaborated and, to the extent possible, standardized their data base creation process. The issues identified during the course of this collaboration are discussed in the following section.

Data Base Creation Issues And Process

While the F-33 and CCD data sets are tremendously valuable resources, they have their limitations, and there are a number of data base creation issues which must be addressed clearly by any users of the data. These issues deal primarily with observation or record selection, i.e., the process by which "appropriate" districts are selected for analysis and inappropriate districts are winnowed out.

Three key issues emerged as the researchers planned their studies and created their data bases:

1. F-33 Sampling,

2. District Types, and

3. District Levels.

Each of these issues and the standardized process used to address them is discussed below.

F-33 Sampling

For most years, the F-33 survey is based on a sample of districts in 15 to 17 states. Realizing this, and adjusting or limiting analyses accordingly, are crucial first steps for any study using the data. One researcher devoted considerable effort to examining "the absence of revenue data for many districts," without realizing that he was using F-33 data from a sample year (Toenjes 1994).

In the past, researchers have dealt with this limitation in a variety of ways. For example, Schwartz and Moskowitz weighted data for 17 states \5\ for their 1984-85 equity analysis. They note, "It is difficult to assess the impact of these sampling procedures on... estimates of fiscal equity [because] probability distributions of equity statistics are either unknown or are difficult to obtain, and....the sampling procedures are quite complex and variance estimates are not straight-forward."

In part to parallel the district-mapped data, the AIR study used data for 1990 only, which was also an F-33 universe year. Given the F-33 sampling issues, the Pelavin Associates and CPRE studies also used F-33 data for universe years only. Researchers interested in the intervening years must (1) limit their study to the 33 to 35 states for which data from a universe of districts are available, (2) tolerate the uncertain level of error resulting from weighted data for 15 to 17 states, or (3) attempt to estimate the sampling error for the weighted data.
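The universe-year restriction amounts to a simple filter on the survey year. The sketch below is illustrative only: the record layout and the `fiscal_year` key are assumptions rather than F-33 field names, and the set of universe years is taken from the survey description earlier in this paper (school years ending in 1980, 1982, 1987, 1990, and 1992).

```python
# Ending years of the school years in which the F-33 covered all districts,
# as listed in this paper's description of the survey.
F33_UNIVERSE_YEARS = {1980, 1982, 1987, 1990, 1992}


def is_universe_year(fiscal_year):
    """Return True if the F-33 for this ending year covered all districts."""
    return fiscal_year in F33_UNIVERSE_YEARS


def keep_universe_records(records):
    """Keep only records from universe years.

    `records` is a list of dicts with a hypothetical 'fiscal_year' key.
    """
    return [r for r in records if is_universe_year(r["fiscal_year"])]
```

A study of an intervening year such as 1984-85 would be filtered out entirely by this rule, which is the trade-off described in options (1) through (3).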

District Types

The F-33 and CCD contain various types of districts which are "special" because of their instructional programs or students. These special districts include community college, agricultural, vocational-technical, correctional/custodial, special education (e.g., BOCES districts in New York State), and "non-operating" districts. Non-operating districts have students but no buildings and include regional support centers, co-ops, media centers, and other supervisory or service districts.

The AIR, CPRE, and Pelavin Associates studies went to considerable lengths to exclude such districts because: (1) their programs have considerably different resource needs; (2) they often receive state funds through categorical aid formulas that are different from those used for "regular, operating" districts; and (3) their borders often overlap or encompass those of regular, operating districts.

For studies that include comparisons of current expenditures, non-operating districts can present additional problems. States like Massachusetts, Nebraska, and Vermont report administrative and support expenditures in special administrative, supervisory union, or service districts. As a result, function-level expenditure data for districts in these states would underestimate the true costs of school administration unless the non-operating districts' expenditures were allocated back to the operating districts that they served. Currently, the codes in the F-33 data base do not indicate the relationship between operating and non-operating districts. To accurately include all expenditures by function, operating and non-operating districts would have to be matched by hand and then a formula would have to be created to allocate the administrative and support service expenditures reported by the non-operating districts back to the operating districts they served.
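No allocation formula is specified in this paper. As one possible illustration, the sketch below splits a non-operating district's expenditures among the operating districts it serves in proportion to their enrollments. The enrollment-proportional rule, the function name, and the district identifiers are all assumptions, and the hand-matching of districts is taken as already done.

```python
def allocate_service_expenditures(service_total, member_enrollments):
    """Split a non-operating district's expenditures across the operating
    districts it serves, in proportion to enrollment (an assumed rule).

    `member_enrollments` maps a hand-matched operating district ID to its
    enrollment.
    """
    total_enrollment = sum(member_enrollments.values())
    if total_enrollment <= 0:
        raise ValueError("member districts must have positive total enrollment")
    return {
        district_id: service_total * enrollment / total_enrollment
        for district_id, enrollment in member_enrollments.items()
    }
```

Other allocation bases (for example, staff counts or instructional expenditures) would be equally defensible; the choice would need to be documented by the researcher.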

However, the F-33 does include special codes for regular, operating, and "special" districts. These codes are shown in Table 2. The AIR, CPRE, and Pelavin Associates studies all used these codes as an initial screen of their data and eliminated more than 1,200 observations.

Second, they screened for districts with zero or missing enrollment and expenditures or revenues. Districts with no enrollment may have been non-operating or may simply have been errors in the creation of the original data base. In either case, enrollment and the F-33 revenue/expenditure data were the sine qua non for these studies.

Third, the researchers used text-searches to flag observations with "VOC," "TECH," "SPEC," or "AGRIC" in the district name field. These districts were then reviewed by hand to ensure that, for example, "spec" was not found in the middle of a regular district's name.
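A minimal sketch of such a text search follows. The four name fragments come from the paragraph above; the function name and the sample district names are hypothetical. A match only flags a district for manual review and does not exclude it.

```python
import re

# Name fragments the researchers searched for in the district name field.
NAME_FLAGS = re.compile(r"VOC|TECH|SPEC|AGRIC", re.IGNORECASE)


def flag_for_review(district_name):
    """Return True if the name contains any flagged fragment.

    The search is deliberately broad, so it also catches regular districts
    whose names merely contain a fragment (e.g., "Prospect").
    """
    return NAME_FLAGS.search(district_name) is not None
```

For example, a hypothetical "Prospect Heights Unified" would be flagged because "Prospect" contains "spec", which is exactly the kind of false positive the manual review is meant to catch.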

Fourth, they used individualized education plan (I.E.P.) counts aggregated from the CCD school file to the district level and screened out any districts with more than 50 percent I.E.P. or "special education" students.

Fifth, they used the CCD district type codes (Table 3) for a final screen. These district type codes identify supervisory unions, regional service districts, state-operated districts (e.g., districts for handicapped students or students who are in the state's juvenile correctional system), and federally-operated districts (e.g., Department of Defense schools). Combined, these five stages of screening eliminated or "winnowed out" nearly 3,000 observations that were not regular, operating districts. \6\
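The five screens can be sketched as a single function that reports the first screen a district fails. This is an illustration under stated assumptions, not the studies' actual code: the dictionary keys are hypothetical, the sets of "special" codes must be supplied by the caller from Tables 2 and 3, the name-search result is assumed to have been confirmed by hand already, and the second screen is interpreted as "enrollment is zero or missing, or both revenue and expenditure are zero or missing."

```python
def _blank(value):
    """True for a zero or missing (None) value."""
    return value is None or value == 0


def first_failed_screen(d, f33_special_codes, ccd_special_types):
    """Return the name of the first screen district `d` fails, or None.

    `d` is a dict with hypothetical keys; the two code sets are supplied by
    the caller (the actual codes are listed in Tables 2 and 3).
    """
    # Screen 1: F-33 codes for special or non-operating districts.
    if d["f33_code"] in f33_special_codes:
        return "f33_code"
    # Screen 2: zero or missing enrollment or finance data (assumed reading).
    if _blank(d["enrollment"]) or (_blank(d["revenue"]) and _blank(d["expenditure"])):
        return "zero_or_missing"
    # Screen 3: name search hit that manual review confirmed as special.
    if d["name_confirmed_special"]:
        return "name_search"
    # Screen 4: more than 50 percent I.E.P. students.
    if (d["iep_count"] or 0) / d["enrollment"] > 0.5:
        return "iep_share"
    # Screen 5: CCD district type codes for special districts.
    if d["ccd_type"] in ccd_special_types:
        return "ccd_type"
    return None
```

Recording which screen removed each district, rather than only a keep/drop flag, makes it straightforward to report counts by screen.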

District Levels

There are four basic "levels" of districts: (1) elementary-only, (2) secondary-only, (3) unified K-12, and (4) college-graded. \7\ For most school finance studies, the college-graded districts can be winnowed out immediately. They are typically community colleges which are not directly relevant to states' support of elementary and secondary education.

Over the years as states have encouraged mergers and consolidations of districts, the distribution of districts among the first three levels has shifted toward unified districts. However, there are still some states with a significant number of non-unified districts, and there are also two states (Montana and Vermont) with no unified districts. Unlike the earlier study in which Wyckoff eliminated all non-unified districts (approximately 25 percent of his data base for 1987), the AIR, CPRE, and Pelavin Associates studies included all three levels of districts. However, they handled the analysis thereof somewhat differently.

As Wyckoff, Riddle, and others have noted, past research has demonstrated that costs vary by district levels. Based partially on arguments made by Odden and Picus in School Finance: A Policy Perspective, the CPRE study standardized non-unified districts' revenues to those of unified districts by increasing elementary-only districts' revenues by 10 percent and decreasing secondary-only districts' revenues by 25 percent.
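The CPRE adjustment described above reduces to a multiplier per district level. In the sketch below, the factors (plus 10 percent for elementary-only, minus 25 percent for secondary-only) come from the text; the level labels and function name are assumptions made for illustration.

```python
# Multipliers that restate non-unified districts' revenues on a
# unified-district basis, per the CPRE approach described above.
LEVEL_FACTORS = {
    "elementary": 1.10,  # increase elementary-only revenues by 10 percent
    "secondary": 0.75,   # decrease secondary-only revenues by 25 percent
    "unified": 1.00,     # unified districts are the reference level
}


def standardize_revenue(revenue, level):
    """Return revenue adjusted to the unified-district standard."""
    return revenue * LEVEL_FACTORS[level]
```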

The AIR and Pelavin Associates studies combined all district levels and relied on pupil-weightings to mitigate the effects of non-unified districts (which tend to have very low enrollments). Representing a third approach to the district level issue, Riddle analyzed financial indicators separately for each level.
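As a rough illustration of pupil weighting, the sketch below computes a mean per-pupil value in which each district counts in proportion to its enrollment, so that very small districts have little influence on the result. It assumes this is the sense of "pupil-weightings" intended here; the function name and inputs are hypothetical.

```python
def pupil_weighted_mean(per_pupil_values, enrollments):
    """Mean of district per-pupil values, weighted by district enrollment."""
    total_enrollment = sum(enrollments)
    weighted_sum = sum(v * n for v, n in zip(per_pupil_values, enrollments))
    return weighted_sum / total_enrollment
```

With two hypothetical districts, one spending 10,000 per pupil for 100 pupils and one spending 5,000 per pupil for 900 pupils, the unweighted mean is 7,500 while the pupil-weighted mean is 5,500.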

Whether weighting, separate presentation, or elimination is chosen as the appropriate way to adjust for district levels, researchers can utilize the SCHLVLCOD field in the F-33 (see Table 2) and the GRADESPAN field in the CCD for years since 1985-86. For districts with missing values in the SCHLVLCOD field, AIR, CPRE, and Pelavin Associates used values from the GRADESPAN field. Then, the researchers used text searches of the districts' names and manual checks for districts which still had missing level codes. \8\
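The fallback order described above (SCHLVLCOD first, then GRADESPAN, then manual review) can be sketched as follows. The field names come from the text; the function name, the treatment of empty strings as missing, and the example values in the usage note are assumptions. Translating a GRADESPAN value into a level is left out because the coding scheme is not given here.

```python
def resolve_level(schlvlcod, gradespan):
    """Return (value, source) using the first non-missing field.

    Falls back from the F-33 SCHLVLCOD field to the CCD GRADESPAN field,
    and marks the district for manual review if both are missing.
    """
    if schlvlcod not in (None, ""):
        return schlvlcod, "SCHLVLCOD"
    if gradespan not in (None, ""):
        return gradespan, "GRADESPAN"
    return None, "manual_review"
```

For instance, `resolve_level(None, "KG12")` returns `("KG12", "GRADESPAN")`, where "KG12" is a made-up grade-span value. Keeping the source alongside the value makes it easy to count how many level codes came from each field.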

Unstandardized or Unresolved Data Base Creation Issues

While the AIR, CPRE, and Pelavin Associates researchers developed a standard process for choosing the districts for inclusion in their data bases, they took slightly different approaches to or were not able to fully address issues in the following areas:

  • Enrollment

  • Special-needs pupils

  • Property, poverty, and income data

  • State and district finances.

Researchers using the F-33, CCD, and district-mapped data should be aware of these issues as they develop their own data bases and study designs.

Enrollment

The F-33 and CCD both include enrollment variables. The AIR and Pelavin Associates studies used the F-33 enrollments and relied on the CCD to fill in missing values where possible. The CPRE study used the CCD enrollments.

For 1990, CCD enrollments exceeded F-33 enrollments by 2.5 percent or more for 10 percent of districts. At the other end of the scale, F-33 enrollments were at least 2.5 percent greater than CCD enrollments for 25 percent of all districts. Based on manual checks of districts with large discrepancies between CCD and F-33 enrollments, neither survey appears to be "correct." Common causes of the discrepancies appear to be (1) separate reporting of elementary and secondary schools on one survey and consolidated reporting on the other, and (2) miscoding of districts with the same names in the same state. The effects of these variations on per-pupil school finance indicators are uncertain and warrant additional study.
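A comparison of this kind can be sketched as a small classifier. The 2.5 percent threshold comes from the text; the function name, the category labels, and the assumption that both enrollments are present and positive are illustrative choices.

```python
def enrollment_gap(f33, ccd, threshold=0.025):
    """Classify the discrepancy between F-33 and CCD enrollment counts.

    Mirrors the comparisons in the text: CCD exceeding F-33 by the threshold
    or more, or F-33 exceeding CCD by the threshold or more.
    """
    if ccd >= f33 * (1 + threshold):
        return "ccd_higher"
    if f33 >= ccd * (1 + threshold):
        return "f33_higher"
    return "within_threshold"
```

Districts classified outside the threshold would be candidates for the kind of manual checks described above.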

Special-Needs Pupils

Given the existing F-33 and CCD data, it is impossible to isolate all special districts, schools, and students and their respective financial data. Some special-needs (e.g., special education) districts were eliminated based on the I.E.P. counts and district type codes from the CCD. However, it is impossible to separate funds targeted for special-needs pupils and special-circumstances districts (e.g., geographically isolated ones) from those allocated through basic aid programs.

The AIR and Pelavin Associates studies used state and local revenues in order to eliminate Federal categorical funds from their calculations. This provides a more accurate picture of the fiscal disparities inherent to a state's funding system but a less comprehensive picture of funding available to serve special-need pupils where federal funds play an important role. For analyses using district expenditures, categorical funds (from any source) cannot be excluded. The Pelavin Associates study examines the statistical impact on basic school finance indicators using revenues or expenditures.

Until complete and nationally comparable data identifying the number and types of special-needs students in each district and the amount of funds allocated through basic and categorical funds are available, school finance researchers will struggle to disentangle the effects of variations in student needs and district circumstances and the subsequent variations in the allocation of funds to districts. In the meantime, the CCD data and District-Mapped Census provide some proxies.

The CCD school file includes data on urbanicity, race/ethnicity, special education (I.E.P.) status, and free and reduced price lunches. The District-Mapped Census also includes a variety of demographic and socioeconomic data (e.g., race/ethnicity, limited English proficiency, and poverty). Adjustments based on these data, in conjunction with the unadjusted data, will shed some light on the relationship between pupil needs and school finances, but much more research needs to be done regarding the actual range in (and adequacy of) funding for special needs students and districts.

Property, Poverty, and Income

In order to examine the relationship between the wealth or affluence of a school district and its revenues and expenditures, the Pelavin Associates study utilized District-Mapped Census data on home-owner estimates of residential property value (in lieu of the more commonly used assessed or equalized property valuation per pupil which is not available on a nationally-comparable basis), child poverty counts, and median household incomes. Real property (upon which taxes are levied) includes commercial, industrial, and agricultural property as well as residential property. Because residential property includes only a small portion of the real tax base in many districts and is distinctly different from assessed property valuations, the applicability of these data to school finance studies without other measures of affluence is questionable. Property data were also unavailable for a large number of districts, especially in California. Data for these districts had to be estimated using growth rates for property values in surrounding districts.

More comprehensive and appropriate property data are not currently available for all states. Any study using the District-Mapped Census data on residential property values must acknowledge all of these shortcomings.

Given the limitations of the property data, child poverty counts and median household income provide more accurate estimates of districts' fiscal capacity. However, they too have their limitations. For example, a district's fiscal capacity could be high in terms of real property values even though it had high child poverty rates or low median incomes. The District-Mapped Census data represent a major step towards better analysis of the relationship between fiscal capacity and school finance, but conclusions based on them must still be made with great caution.

State and District Finances

Researchers using the F-33 data must deal with a variety of issues regarding state and district finances. Because they had different goals and used different financial variables, the AIR, CPRE, and Pelavin Associates studies did not use the same approach to the following issues:

  • On-Behalf-of-LEA Funds- In most states, the total on-behalf-of-LEA revenues and expenditures of districts in the F-33 data file are not equal to the amount of Direct State Support provided by the states. The AIR study developed an imputation procedure to allocate all funds to the appropriate district accounts.

  • "Outliers"- All three studies scrutinized districts with per-pupil expenditures and revenues in the top and bottom 1 or 0.5 percent of the distribution. Determinations about whether to include or exclude these outlier districts were made based on manual checks and contacts with Census, state, and district officials. These determinations were not necessarily standard across the three studies.

  • Fund Imbalances- Considerable differences between total revenues and current expenditures exist. In some cases, this is due to districts' financing capital expenditures from their general revenue funds; in others, it is due to inter-governmental transfers (see below). The studies were not able to fully address these cases.

  • Intergovernmental Transfers- The number of pupils covered by intergovernmental transfers is not apparent from the F-33 or CCD data. As a result, per-pupil expenditures and per-pupil revenues may be quite different. Adding intergovernmental transfers to operating expenditures is problematic because the transfers may contain more than the instructional costs for children attending schools in other districts. While this issue could not be "resolved," the studies followed NCES practice and did not include transfers in their expenditures.
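The outlier review in the list above starts from a mechanical flag, which can be sketched as below. The 1 and 0.5 percent tails come from the text; the rank-based cutoff, the rounding down of the tail size, and the function name are assumptions. Flagged districts are only candidates for the manual checks and contacts described above, not automatic exclusions.

```python
def flag_tails(values, tail_pct=1):
    """Return the indices of values in the bottom or top `tail_pct` percent.

    Tails are defined by rank; the number of districts in each tail is
    rounded down.
    """
    n = len(values)
    k = int(n * tail_pct // 100)  # districts in each tail
    order = sorted(range(n), key=lambda i: values[i])
    return set(order[:k]) | set(order[n - k:])
```

With 200 districts, `tail_pct=1` flags the two lowest and two highest values, and `tail_pct=0.5` flags one at each end.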

Summary

Despite the fact that the AIR, CPRE, and Pelavin Associates researchers were not able to resolve all the issues regarding data base creation, their collaboration and the common process they used greatly increased the comparability among the data bases they ultimately used for their analyses. Consequently, these studies can be compared and evaluated on the basis of their results and analytical methods alone to a much greater extent than previous studies which varied widely in their data base creation procedures.

The common process used is summarized in Table 4 as a model or suggested "best practice" for other researchers using the F-33 and CCD data. Once the CCD and F-33 are merged and missing enrollments are filled, the order of the steps in this winnowing process is not crucial.

Notes

1/ Parrish, T.B., et al. 1995; Hertert, L. 1994; and O'Leary, M., et al. 1995.

2/ After the initiation of these studies, Pelavin Associates, Inc. became an affiliate of the American Institutes for Research and was renamed Pelavin Research Institute.

3/ Readers interested in the specific procedures used in the AIR, CPRE, and Pelavin Associates studies are encouraged to consult with the authors of those studies directly.

4/ "To avoid marginal cases where expenditures per pupil are very high largely because the [districts'] enrollment is quite small," Riddle eliminated unified districts with fewer than 500 students and non-unified districts with fewer than 250 students.

5/ Alabama, Arizona, Arkansas, Colorado, Georgia, Indiana, Kentucky, Mississippi, Montana, New Jersey, New Mexico, Oklahoma, Oregon, South Carolina, South Dakota, Utah, and Vermont.

6/ Note that these CCD codes are actually in the TYPECODE field, while the F-33 codes are in the SCHLVLCOD field. Note also that the SCHLVLCOD field encompasses both district levels (i.e., elementary-only, secondary-only, and unified) and district types (i.e., operating, non-operating, vocational, etc.).

7/ Note that the terms "level" and "type" are used inconsistently in the literature. For this paper and the Pelavin Associates study, "level" refers to the grade levels included in a district and "type" refers to the variations of regular and special, operating and non-operating districts.

8/ To obtain district types for all districts prior to 1986-87, Pelavin Associates researchers merged the 1980 district ID codes with the 1987 codes. For districts which did not match, the researchers used text searches of the district names and manual checks. As a result, the 1980 district level codes are "best guesses." It is possible that changes in district levels occurred between 1980 and 1987 which would have been obscured by the Pelavin Associates ID code merge. This is just one example of problems associated with use of the 1980 data. Additional difficulties encountered by the Pelavin Associates researchers included missing or incomplete poverty and property data for several states (e.g., Alabama and California), and incomplete documentation of the source data sets' record layouts.

References

Hertert, L., Busch, C., and Odden, Allan. Winter, 1994. "School Financing Inequities Among the States: The Problem from a National Perspective." Journal of Education Finance 19(3): 231-255.

O'Leary, M., Moskowitz, J., and O'Malley, Amy. 1995. Trends in School Finance Equity Indicators, 1980-1992. Washington, DC: Pelavin Research Institute. Unpublished manuscript.

Parrish, T.B., Matsumoto, C.S., and Fowler, W.J., Jr. 1995. Disparities in Public School District Spending 1989-90: A Multi-variate, Student-weighted Analysis, Adjusted for Differences in Geographic Cost of Living and Student Need. Washington, DC: National Center for Education Statistics, U.S. Department of Education.

Riddle, Wayne Clifton. 1990. "Expenditures in Public School Districts: Why Do They Differ?" Washington, DC: Congressional Research Service.

Schwartz, Myron, and Jay Moskowitz. 1988. "Fiscal Equity in the United States, 1984-85." Washington, DC: Decision Resources Corporation.

Toenjes, Laurence O. March, 1994. "Interstate Revenue Disparities and Equalization Costs." Presented at the Annual Conference of the American Education Finance Association in Nashville, Tennessee.

Wyckoff, James H. 1992. "Intrastate Equality of Public Primary and Secondary Education Resources in the U.S., 1980-87." Economics of Education Review 11(1): 19-30.
