NCES 2013-008
January 2013

Introduction to Projection Methodology

Content of appendix A

Since its inception in 1964, the Projections of Education Statistics series has been providing projections of key education statistics to policy makers, educators, researchers, the press, and the general public. This edition of Projections of Education Statistics is the fortieth in the series.

Appendix A contains this introduction, which provides a general overview of the projection methodology, as well as six additional sections, which discuss the specific methodology for the different statistics projected.

This introduction

• outlines the two major techniques used to make the projections;
• summarizes key demographic and economic assumptions underlying the projections;
• examines the accuracy of the projections; and
• introduces the subsequent sections of appendix A.

Projection techniques

Two main projection techniques were used to develop the projections presented in this publication:

• Exponential smoothing was the technique used in the projections of elementary and secondary enrollments and high school graduates. This technique also played a role in the projections of teachers at the elementary and secondary level, as well as enrollments and degrees conferred at the postsecondary level.
• Multiple linear regression was the primary technique used in the projections of teachers and expenditures at the elementary and secondary level, as well as enrollments and degrees conferred at the postsecondary level.

Exponential smoothing

Two different types of exponential smoothing, single exponential smoothing and double exponential smoothing, were used in producing the projections presented in this publication.

Single exponential smoothing was used when the historical data had a basically horizontal pattern. Single exponential smoothing produces a single forecast for all years in the forecast period. In developing projections of elementary and secondary enrollments, for example, the rate at which students progress from one particular grade to the next (e.g., from grade 2 to grade 3) was projected using single exponential smoothing. Thus, this percentage was assumed to be constant over the forecast period.

In general, exponential smoothing places more weight on recent observations than on earlier ones. The weights for observations decrease exponentially as one moves further into the past. As a result, the older data have less influence on the projections. The rate at which the weights of older observations decrease is determined by the smoothing constant.

When using single exponential smoothing for a time series, $P_t$, a smoothed series, $\hat{P}_t$, is computed recursively by evaluating

$$\hat{P}_t = \alpha P_t + (1 - \alpha)\hat{P}_{t-1}$$

where $0 < \alpha \le 1$ is the smoothing constant.

By repeated substitution, we can rewrite the equation as

$$\hat{P}_t = \alpha \sum_{s=0}^{t-1} (1 - \alpha)^s P_{t-s}$$

where time, s, goes from the first period in the time series, 0, to time period t-1.

The forecasts are constant for all years in the forecast period. The constant equals

$$\hat{P}_{T+k} = \hat{P}_T$$

where T is the last year in the estimation sample and $k > 0$.

These equations illustrate that the projection is a weighted average based on exponentially decreasing weights. For higher smoothing constants, weights for earlier observations decrease more rapidly than for lower smoothing constants.
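The recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the report. The function names, the sample rates, and the choice to start the smoothed series at the first observation are all assumptions; the text does not state how the smoothed series was initialized.

```python
def single_exponential_smoothing(series, alpha):
    """Return the smoothed series for a list of observations.

    Each smoothed value is alpha * current observation
    plus (1 - alpha) * previous smoothed value.
    Assumption: the smoothed series starts at the first observation.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def flat_forecast(series, alpha, horizon):
    """Every year in the forecast period receives the last smoothed value."""
    last = single_exponential_smoothing(series, alpha)[-1]
    return [last] * horizon


# Hypothetical grade 2 to grade 3 progression rates (illustrative, not NCES data).
rates = [1.002, 0.998, 1.004, 1.001, 0.999]
forecast = flat_forecast(rates, alpha=0.4, horizon=3)
```

Because the forecast is the same for every year, this approach suits series with a basically horizontal pattern, such as the grade progression rates mentioned earlier.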

For each of the approximately 1,200 single exponential smoothing equations in this edition of Projections of Education Statistics, a smoothing constant was individually chosen to minimize the sum of squared forecast errors for that equation. The smoothing constants used to produce the projections in this report ranged from 0.001 to 0.999.
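One way to picture the selection of a smoothing constant is a simple grid search. The sketch below is an assumption about how such a search could look, not a description of the procedure actually used. It measures one-step-ahead forecast errors, starts the smoothed series at the first observation, and scans the range 0.001 to 0.999 reported above in steps of 0.001. The function names and sample data are hypothetical.

```python
def sse_one_step(series, alpha):
    """Sum of squared one-step-ahead forecast errors for a given alpha."""
    level = series[0]  # assumed initialization
    sse = 0.0
    for value in series[1:]:
        error = value - level  # the forecast for this period is the prior level
        sse += error * error
        level = alpha * value + (1.0 - alpha) * level
    return sse


def best_alpha(series):
    """Scan alpha = 0.001, 0.002, ..., 0.999 and keep the smallest SSE."""
    candidates = [i / 1000.0 for i in range(1, 1000)]
    return min(candidates, key=lambda a: sse_one_step(series, a))


# Hypothetical series with no trend (illustrative, not NCES data).
series = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1]
alpha = best_alpha(series)
```

A steadily rising series pushes the search toward the top of the range, because a high constant lets the smoothed value track the most recent observation closely.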

Double exponential smoothing is an extension of single exponential smoothing that allows the forecasting of data with trends. It produces different forecasts for different years in the forecast period. Double exponential smoothing with two smoothing constants was used to forecast the number of doctor's degrees awarded to men and women.

The smoothing forecast using double exponential smoothing is found using the three equations:

$$\hat{P}_{t+k} = a_t + b_t k$$

$$a_t = \alpha P_t + (1 - \alpha)(a_{t-1} + b_{t-1})$$

$$b_t = \beta (a_t - a_{t-1}) + (1 - \beta) b_{t-1}$$

where $a_t$ denotes an estimate of the level of the series at time t, $b_t$ denotes an estimate of the slope of the series at time t, and $0 < \alpha, \beta < 1$ are the smoothing constants.

Forecasts from double smoothing are computed as

$$\hat{P}_{T+k} = a_T + b_T k$$

where T is the last year in the estimation sample and $k > 0$. The last expression shows that forecasts from double smoothing lie on a linear trend with intercept $a_T$ and slope $b_T$. Single exponential smoothing can be viewed as a special case of double exponential smoothing where the impact that time has on the forecasts has been eliminated (i.e., requiring the slope term to equal 0.0).

The smoothing constants for each of the two double exponential smoothing equations used for this report were selected using a search algorithm that finds the pair of smoothing constants that together minimizes the sum of squared forecast errors for their equation.
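The following Python sketch illustrates double exponential smoothing with two smoothing constants (a level constant, alpha, and a slope constant, beta) together with a coarse search over pairs of constants. It is an illustration under stated assumptions rather than the algorithm used for the report: the level starts at the first observation, the slope starts at the first difference, errors are one-step-ahead and squared, and the search is a plain grid with a step of 0.05. All names and data are hypothetical.

```python
def holt_fit(series, alpha, beta):
    """Run the level and slope recursions; return (level, slope, sse)."""
    level, slope = series[0], series[1] - series[0]  # assumed initialization
    sse = 0.0
    for value in series[1:]:
        error = value - (level + slope)  # one-step-ahead forecast error
        sse += error * error
        prev_level = level
        level = alpha * value + (1.0 - alpha) * (prev_level + slope)
        slope = beta * (level - prev_level) + (1.0 - beta) * slope
    return level, slope, sse


def holt_forecast(series, alpha, beta, horizon):
    """Forecasts lie on a line: last level plus last slope times k."""
    level, slope, _ = holt_fit(series, alpha, beta)
    return [level + slope * k for k in range(1, horizon + 1)]


def best_pair(series, step=0.05):
    """Grid search for the (alpha, beta) pair with the smallest SSE."""
    grid = [round(step * i, 10) for i in range(1, int(round(1 / step)))]
    return min(((a, b) for a in grid for b in grid),
               key=lambda ab: holt_fit(series, ab[0], ab[1])[2])


# Hypothetical trending series (illustrative, not NCES data).
degrees = [3.0, 3.4, 4.1, 4.3, 5.2, 5.5, 6.1]
alpha, beta = best_pair(degrees)
projection = holt_forecast(degrees, alpha, beta, horizon=3)
```

Unlike the single smoothing case, each forecast year gets a different value, because the slope term is multiplied by the number of years ahead.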

Beginning with the Projections of Education Statistics to 2020, each smoothing constant was chosen separately. In earlier editions, all the smoothing constants had been set to 0.4. Also beginning with that edition, two smoothing constants, rather than one, were used for double exponential smoothing.

Multiple linear regression

Multiple linear regression was used in cases where a strong relationship exists between the variable being projected (the dependent variable) and independent variables. This technique can be used only when accurate data and reliable projections of the independent variables are available. Key independent variables for this publication include demographic and economic factors. For example, current expenditures for public elementary and secondary education are related to economic factors such as disposable income and education revenues from state sources. The sources of the demographic and economic projections used for this publication are discussed below, under “Assumptions.”

The equations in this appendix should be viewed as forecasting rather than structural equations. That is, the equations are intended only to project values for the dependent variables, not to reflect all elements of underlying social, political, and economic structures. Lack of available data precluded the building of large-scale structural models. The particular equations shown were selected on the basis of their statistical properties, such as coefficients of determination (R²s), the t-statistics of the coefficients, the Durbin-Watson statistic, the Breusch-Godfrey Serial Correlation LM test statistic, and residual plots.

The functional form primarily used is the multiplicative model. When used with two independent variables, this model takes the form:

$$Y = a X_1^{b_1} X_2^{b_2}$$

This equation can easily be transformed into the linear form by taking the natural log (ln) of both sides of the equation:

$$\ln(Y) = \ln(a) + b_1 \ln(X_1) + b_2 \ln(X_2)$$

One property of this model is that the coefficient of an independent variable shows how responsive in percentage terms the dependent variable is to a one percent change in that independent variable (also called the elasticity). For example, a 1 percent change in $X_1$ in the above equation would lead to a $b_1$ percent change in Y.
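The elasticity property can be checked numerically. The sketch below assumes a multiplicative model with a constant term a and exponents b1 and b2 on two independent variables; the coefficient values and inputs are hypothetical and chosen only for illustration. It confirms that the log of the model equals the linear form, and that a 1 percent increase in the first variable changes the dependent variable by approximately b1 percent.

```python
import math


def multiplicative_model(a, b1, b2, x1, x2):
    """Y = a * X1**b1 * X2**b2"""
    return a * x1 ** b1 * x2 ** b2


def log_linear_form(a, b1, b2, x1, x2):
    """ln(Y) = ln(a) + b1 * ln(X1) + b2 * ln(X2)"""
    return math.log(a) + b1 * math.log(x1) + b2 * math.log(x2)


# Hypothetical coefficients and inputs (not estimates from the report).
a, b1, b2 = 2.0, 0.8, 0.3
x1, x2 = 100.0, 50.0

y = multiplicative_model(a, b1, b2, x1, x2)
y_after = multiplicative_model(a, b1, b2, x1 * 1.01, x2)  # X1 raised by 1 percent

# Percentage change in Y; close to b1 = 0.8 for a small change in X1.
pct_change = 100.0 * (y_after / y - 1.0)
```

The match is approximate rather than exact because the elasticity interpretation holds for small changes; for a 1 percent change the difference is negligible.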

Assumptions

All projections are based on underlying assumptions, and these assumptions determine projection results to a large extent. It is important that users of projections understand the assumptions to determine the acceptability of projected time series for their purposes. All the projections in this publication are to some extent dependent on demographic and/or economic assumptions.

Demographic assumptions

Many of the projections in this publication are demographically based on the U.S. Census Bureau's 2008 National Population Projections (August 2008) and the Interim State Population Projections (April 2005).

The two sets of Census Bureau population projections are produced using cohort-component models. In order for the national-level population projections by age, sex, and race/ethnicity to be consistent with the most recent historical estimates released by the Census Bureau, the projections were ratio-adjusted: for each age, sex, and race/ethnicity combination, the projections were multiplied by the ratio of the last historical estimate to the projection for that same year. This allows for a consistent set of historical estimates and projections. For more information on the methodology used for Census Bureau population projections, see appendix C, Data Sources.
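A ratio adjustment of this kind can be illustrated as follows. The sketch reflects one reading of the description above: for a single age, sex, and race/ethnicity combination, every projected value is scaled by the ratio of the last historical estimate to the projection for that same year. The function name, the dictionary layout, and the population figures are hypothetical.

```python
def ratio_adjust(projections, base_year, last_estimate):
    """Scale a projected series so it matches the last historical estimate.

    projections: dict mapping year -> projected population for one
                 age/sex/race-ethnicity combination
    base_year:   the year of the last historical estimate
    """
    ratio = last_estimate / projections[base_year]
    return {year: value * ratio for year, value in projections.items()}


# Hypothetical figures in thousands (not Census Bureau data).
projected = {2011: 4000.0, 2012: 4040.0, 2013: 4100.0}
adjusted = ratio_adjust(projected, base_year=2011, last_estimate=4020.0)
```

After the adjustment, the projection for the base year equals the historical estimate, and later years keep the same year-to-year growth rates as the unadjusted projections.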

The enrollment projections in this publication depend on Census Bureau population projections for the various age groups that attend school. The future fertility rate assumption (along with corresponding projections of female populations) determines projections of the number of births, a key factor for population projections. The fertility rate assumption plays a major role in determining population projections for the age groups enrolled in nursery school, kindergarten, and elementary grades. The effects of the fertility rate assumption are more pronounced toward the end of the forecast period, while immigration assumptions affect all years. For enrollments in secondary grades and college, the fertility rate assumption is of no consequence, since all the population cohorts for these enrollment ranges have already been born.

Economic assumptions

Various economic variables are used in the forecasting models for numbers of elementary and secondary teachers, public elementary and secondary school expenditures, and postsecondary enrollment.

The source of these variables is the trend scenario of the "U.S. Monthly Model January 2012: Short-Term Projections" developed by the economic consulting firm IHS Global Insight. The trend scenario depicts a mean of possible paths that the economy could take over the forecast period, barring major shocks. The economy, in this scenario, evolves smoothly, without major fluctuations.

For details about the primary assumptions used in this edition of Projections of Education Statistics, see table A-1 on page 84.

Accuracy of the projections

Projections of time series usually differ from the final reported data due to errors from many sources. This is because of the inherent nature of the statistical universe from which the basic data are obtained and the properties of projection methodologies, which depend on the validity of many assumptions.

The mean absolute percentage error (MAPE) is one way to express the forecast accuracy of past projections. This measure expresses the average absolute value of errors over past projections in percentage terms. For example, an analysis of projection errors over the past 28 editions of Projections of Education Statistics indicates that the MAPEs for public school enrollment in grades PK-12 for lead times of 1, 2, 5, and 10 years were 0.3, 0.6, 1.3, and 2.6 percent, respectively. For the 1-year-out projection, this means that one would expect the projection to be within 0.3 percent of the actual value, on average.

For a list of MAPEs for selected national statistics in this publication, see table A-2 on page 85. Sections A.1 through A.5 each contain at least one text table (tables A through F) that presents the MAPEs for the key national statistics of that section. Each text table appears directly after the discussion of accuracy of that section's national projections. For a list of MAPEs by state and region for public elementary and secondary enrollment, see tables A-7 through A-9 on pages 94-99, and for a list of MAPEs by state and region for the number of high school graduates in public schools, see table A-10 on pages 104-105.

Tables A-3 and A-4 present an example of how the MAPEs were constructed using actual values for national public elementary and secondary enrollment projections for school years 2007 through 2010 and enrollment projections from the last four editions of Projections of Education Statistics. The top two panels of table A-3 show the actual values for school years 2007 through 2010 and enrollment projections for each year from Projections of Education Statistics to 2017, with the number of projections generally decreasing by one for each subsequent edition. The bottom panel of table A-3 shows the percentage differences between the actual values and the projected values. For example, the projected value for 2007 presented in Projections of Education Statistics to 2017 was 0.7 percent lower than the actual value for that year.

The top panel of table A-4 shows the absolute value of the percent differences from table A-3 arranged by lead time rather than year. For example, in the Projections of Education Statistics to 2018, the last year of actual data reported was 2006-07 and thus the lead time for the projection of 2007-08 data was 1 year. Thus, the 0.4 appearing in the 2007-08 column of table A-3 for Projections of Education Statistics to 2018 appears in the column for lead times of 1 year in table A-4, indicating that the one-year-out forecast from Projections of Education Statistics to 2018 differed by 0.4 percent in absolute terms from its actual value.

The MAPEs for each lead time shown in the bottom panel of table A-4 were calculated by computing the average of the absolute values of the percentage differences for that lead time. For example, the absolute values of the percentage differences for lead time 2 for the four editions of the Projections of Education Statistics appearing in the top panel of table A-4 are 0.7, 0.7, 0.1, and 0.4. The MAPE for a lead time of 2 years was then calculated by taking the average of these numbers, or 0.5. This matches the MAPE that appears in the bottom panel for a lead time of 2 years. (Calculations for table A-3 are based on unrounded numbers.) These MAPEs are different from the MAPEs for public elementary and secondary enrollment projections elsewhere in this report because the MAPEs in the example were calculated using only the last 4 editions of Projections of Education Statistics.
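The arithmetic of the lead time 2 example can be reproduced with a short Python sketch. The four absolute percentage differences come from the text above; the function names and the enrollment figures passed to the percent difference helper are hypothetical. Because the report works from unrounded numbers, a calculation from the rounded values shown here is only an approximation of the published figure.

```python
def percent_difference(projected, actual):
    """Percentage difference of a projection from the actual value."""
    return 100.0 * (projected - actual) / actual


def mape(abs_pct_differences):
    """Mean of the absolute percentage differences for one lead time."""
    return sum(abs_pct_differences) / len(abs_pct_differences)


# Absolute percentage differences for lead time 2, as quoted in the text.
lead_time_2 = [0.7, 0.7, 0.1, 0.4]
mape_2 = mape(lead_time_2)          # about 0.475
mape_2_rounded = round(mape_2, 1)   # reported to one decimal place

# Hypothetical projection that falls 0.7 percent below the actual value.
example = abs(percent_difference(projected=99.3, actual=100.0))
```

The average of the four rounded values is 0.475, which rounds to the 0.5 reported for a lead time of 2 years.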

The number of years used in the analysis of the projection error differs by statistic, both because projections of additional education statistics have been added to the report over time and because, for some statistics, there has been such a substantial change in the methodology used to produce the projections that the projections produced using the earlier methodology were not included in the analysis of the projection error. MAPEs are presented for a statistic only after it has been produced using substantially the same methodology in five previous editions of Projections of Education Statistics.