
STATISTICAL APPENDIX

Note on standard errors

The information presented in this report was obtained from many sources, including federal, national, international, and state agencies, private research organizations, and professional associations. The data were collected using many research methods, including surveys of a universe (such as all colleges) or of a sample, compilations of administrative records, and statistical imputations. Readers should take particular care when comparing data from different sources. Differences in procedures, timing, phrasing of questions, and interviewer training mean that the results from the different sources may not be strictly comparable. In the Sources of Data section, descriptions of the information sources and data collection methods are presented, grouped by sponsoring organization. More extensive documentation of a particular survey's procedures does not imply more problems with the data, only that more information is available.

Many of the data in this report are drawn from universe surveys. Higher education enrollment and finance figures from the Integrated Postsecondary Education Data System, for example, come from surveys that cover virtually all collegiate institutions in the United States.

Three of the most important sources of data for this report, however, provide estimates based on large samples. Figures from the 1990 U.S. Census of Population and Housing are derived from the 5 percent sample of U.S. households that filled in the long form of the decennial Census. Figures from the International Assessment of Educational Progress and the National Assessment of Educational Progress are derived from one of two samples: students (13-year-old students in participating countries or 8th-grade public school students in the United States) or school administrators at the participating schools.

Unless otherwise noted, all statements based on sample surveys cited in the text were tested for statistical significance and are statistically significant at the .05 level. Several test procedures were used, depending upon the type of data being interpreted and the nature of the statement being tested. The most commonly used procedure was multiple t-tests with a Bonferroni adjustment to the significance level. When multiple comparisons between more than two groups were made, even if only one comparison is cited in the text, a Bonferroni adjustment to the significance level was made to ensure that the significance level for the tests as a group was at the .05 level. This situation commonly arises when making comparisons between the United States and other countries or between U.S. states.
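
To illustrate the Bonferroni procedure, the following minimal sketch (in Python; it is not part of the report, and the group names, estimates, and standard errors are invented) tests one reference group against several comparison groups while holding the significance level for the tests as a group at .05:

    import math
    from statistics import NormalDist

    # Hypothetical estimates (mean, standard error) for a reference group and
    # several comparison groups; these figures are invented for illustration.
    reference = (262.0, 1.2)
    comparisons = {"Group A": (269.0, 1.0),
                   "Group B": (258.0, 1.5),
                   "Group C": (265.0, 2.0)}

    alpha = 0.05                 # significance level for the tests as a group
    n = len(comparisons)         # number of comparisons being made
    adjusted_alpha = alpha / n   # Bonferroni adjustment

    # Two-tailed critical value at the adjusted significance level
    critical_t = NormalDist().inv_cdf(1 - adjusted_alpha / 2)

    ref_est, ref_se = reference
    for name, (est, se) in comparisons.items():
        se_diff = math.sqrt(ref_se ** 2 + se ** 2)  # SE of the difference
        t = abs(ref_est - est) / se_diff            # t-statistic
        verdict = "significant" if t > critical_t else "not significant"
        print(f"{name}: t = {t:.2f} vs. critical value {critical_t:.2f} -> {verdict}")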

Accuracy of data

The accuracy of any statistic is determined by the joint effects of sampling and nonsampling errors. Estimates based on a sample will differ somewhat from the figures that would have been obtained if a complete census had been taken using the same survey instruments, instructions, and procedures. In addition to such sampling errors, all surveys, both universe and sample, are subject to design, reporting, and processing errors and errors due to nonresponse. To the extent possible, these nonsampling errors are kept to a minimum by methods built into the survey procedures. In general, however, the effects of nonsampling errors are more difficult to gauge than those produced by sampling variability.

Sampling errors

The samples used in surveys are selected from a large number of possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The difference between a sample estimate and the average of all possible samples is called the sampling deviation. The standard or sampling error of a survey estimate is a measure of the variation among the estimates from all possible samples and, thus, is a measure of the precision with which an estimate from a particular sample approximates the average result of all possible samples.
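
The following minimal simulation (in Python; the population and sample sizes are invented, and the report's surveys are far more complex) makes this definition concrete: the spread of the estimates across many possible samples is what the standard error measures.

    import random
    import statistics

    random.seed(0)

    # An invented population of 100,000 values
    population = [random.gauss(250, 40) for _ in range(100_000)]

    # Draw many possible samples of the same size and record each sample's estimate
    sample_size = 400
    estimates = [statistics.mean(random.sample(population, sample_size))
                 for _ in range(1_000)]

    # The variation among the estimates from all the samples is what the
    # standard (sampling) error of a single sample's estimate measures
    print(f"Spread of estimates across samples: {statistics.stdev(estimates):.2f}")
    print(f"Theoretical SE, sigma/sqrt(n):      {40 / sample_size ** 0.5:.2f}")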

The sample estimate and an estimate of its standard error permit us to construct interval estimates with prescribed confidences that the interval includes the average result of all possible samples. If all possible samples were selected under essentially the same conditions and an estimate and its estimated standard error were calculated from each sample, then: 1) approximately 2/3 of the intervals from one standard error below the estimate to one standard error above the estimate would include the average value of all possible samples; and 2) approximately 19/20 of the intervals from two standard errors below the estimate to two standard errors above the estimate would include the average value of all possible samples. We call an interval from two standard errors below the estimate to two standard errors above the estimate a 95 percent confidence interval.
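
As a concrete illustration (a minimal sketch in Python; the estimate and standard error are invented), the intervals described above can be formed directly from a sample estimate and its standard error:

    # Invented sample estimate and standard error
    estimate = 262.0
    standard_error = 1.2

    # Approximately 2/3 of such intervals include the average value of all
    # possible samples
    one_se_interval = (estimate - standard_error, estimate + standard_error)

    # Approximately 19/20 of such intervals include it: the 95 percent
    # confidence interval
    ci_95 = (estimate - 2 * standard_error, estimate + 2 * standard_error)

    print(f"1-SE interval:           {one_se_interval[0]:.1f} to {one_se_interval[1]:.1f}")
    print(f"95% confidence interval: {ci_95[0]:.1f} to {ci_95[1]:.1f}")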

The estimated standard errors for two sample statistics can be used to estimate the precision of the difference between the two statistics and to avoid concluding that there is an actual difference when the difference in sample estimates may only be due to sampling error. The need to be aware of the precision of differences arises, for example, when comparing mean proficiency scores between states in the National Assessment of Educational Progress. The standard error, s(A-B), of the difference between sample estimate A and sample estimate B (when A and B do not overlap) is:

                      s(A-B) = \sqrt{s^2(A) + s^2(B)}

where s(A) and s(B) are the standard errors of sample estimates A and B, respectively. When the ratio (called a t-statistic) of the difference between the two sample statistics to the standard error of the difference, as calculated above, is less than 2, one cannot be sure the difference is not due only to sampling error, and caution should be taken in drawing any conclusions. In this report, for example, we would not conclude there is a difference. Some analysts, however, use the less restrictive criterion of 1.64, which corresponds to a 10 percent significance level, and would conclude there is a difference.

To illustrate this further, consider the data on mathematics proficiency of 13-year-olds in Table 25a and the associated standard error Table 25ax. The estimated average mathematics proficiency score for the sample of 13-year-olds in the United States was 262. For the sample in Ireland, the estimated average was 269. Is there enough evidence to safely conclude that this difference is not due only to sampling error and that the actual average mathematics proficiency of 13-year-olds in the United States is lower than for their counterparts in Ireland? The standard errors for these two estimates are 1.2 and 1.0, respectively. Using the above formula, the standard error of the difference is calculated as 1.6. The ratio of the estimated difference of 7 to the standard error of the difference of 1.6 is 4.38. Using the table below, it can be seen that there is less than a 1 percent chance that the 7 point difference is due only to sampling error, and one may safely conclude that the proficiency scores of 13-year-olds in the United States are lower than those of their counterparts in Ireland.

Percent chance/1 that a difference is due only to sampling error:

----------------------------------------------------------------
t-statistic      1.00     1.64     1.96     2.00     2.57
Percent chance    32       10        5       4.5      1
-----------------------------------------------------------------
1/ Based on a 2-tailed test.
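
The worked example above can be reproduced in a few lines. The sketch below (in Python; not part of the report) recomputes the standard error of the difference, the t-statistic, and the two-tailed percent chance, using the normal approximation that underlies the table:

    import math

    # Figures from the worked example (Tables 25a and 25ax): average mathematics
    # proficiency of 13-year-olds and the associated standard errors
    us, us_se = 262.0, 1.2
    ireland, ireland_se = 269.0, 1.0

    se_diff = math.sqrt(us_se ** 2 + ireland_se ** 2)  # SE of the difference
    t = abs(ireland - us) / se_diff                    # t-statistic

    # Two-tailed percent chance the difference is due only to sampling error:
    # 100 * 2 * P(Z > t) = 100 * erfc(t / sqrt(2))
    percent_chance = 100 * math.erfc(t / math.sqrt(2))

    print(f"SE of difference: {se_diff:.2f}")         # about 1.6
    print(f"t-statistic:      {t:.2f}")               # 4.38 when the SE is rounded to 1.6
    print(f"Percent chance:   {percent_chance:.4f}")  # well under 1 percent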

When examining a table, most readers draw conclusions after making multiple comparisons within the table. In these circumstances, the chance that one of the many differences examined is only a result of sampling error increases (accumulates) as the number of comparisons increases. One procedure to ensure that the likelihood of any of the comparisons being only a result of sampling error stays less than 5 percent is to reduce this risk for each of the comparisons being made. If N comparisons are being made, divide 5 percent by N and ensure that the risk of a difference being due only to sampling error is less than 5/N percent for each comparison. The table below provides critical values for the t-statistic for each comparison when it is part of N comparisons.

--------------------------------------------------------------------------------------------
Number of comparisons       1        2        3       4        5       10      20      40
Critical value/1         1.96     2.24     2.39     2.50     2.58    2.81     3.02    3.23
--------------------------------------------------------------------------------------------
1/ Based on a 2-tailed test.
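
These critical values can be reproduced from the inverse of the normal distribution function: for N comparisons, the two-tailed critical value corresponds to a significance level of .05/N. A minimal sketch (in Python; not part of the report):

    from statistics import NormalDist

    alpha = 0.05  # significance level for the comparisons as a group, 2-tailed

    # Reproduce the critical values in the table above: divide alpha by the
    # number of comparisons N and take the two-tailed normal critical value
    for n in (1, 2, 3, 4, 5, 10, 20, 40):
        critical = NormalDist().inv_cdf(1 - alpha / (2 * n))
        print(f"N = {n:2d}: critical value = {critical:.2f}")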

For example, a reader might examine Table 25a not to compare the United States with Ireland but to compare the United States with, say, 10 of the countries in the table that are its economic competitors. After making the 10 comparisons, the reader may want to draw the conclusion: With the exception of Spain, 13-year-olds in the United States had lower mathematics proficiency scores than their counterparts in the 9 other economic competitors for which data are available. If the reader uses the critical value of 1.96 to make each of the 10 comparisons, the chance that some component of the statement is due only to sampling error is greater than 5 percent. To compensate, the reader should use the critical value of 2.81. In this case, each of the 9 t-statistics is greater than 2.81, and the conclusion is safe to make.

It should be noted that most of the standard error estimates presented in subsequent sections and in the original documents are approximations. That is, to derive estimates of standard errors that would be applicable to a wide variety of items and could be prepared at a moderate cost, a number of approximations were required. As a result, the standard error estimates provide a general order of magnitude rather than the exact standard error for any specific item.

Nonsampling errors

Universe and sample surveys are subject to nonsampling errors. Nonsampling errors may arise when respondents or interviewers interpret questions differently; when respondents must estimate values; when coders, keyers, and other processors handle answers differently; when persons who should be included in the universe are not; or when persons fail to respond (completely or partially). Nonsampling errors usually, but not always, result in an understatement of total survey error and thus an overstatement of the precision of survey estimates. Since estimating the magnitude of nonsampling errors often would require special experiments or access to independent data, estimates of their magnitude are seldom available.

Note on standard errors of estimates from the International Assessment of Educational Progress and the National Assessment of Educational Progress (Indicators 16, 17, 18, 19, 20, 25)

Standard errors used here for these two data sources are, in most cases, taken directly from their own publications. In some cases, however, two or more response categories from a multiple-response question have been combined. (Such cases are footnoted on the standard error tables where relevant.) To approximate the standard error for these combined figures, the design effect was obtained for each percentage included in the summation, and the design effect for the combined percentage (represented in the tables) was approximated as the average of these component design effects. The standard errors presented represent the standard error that would result from a simple random sample, inflated by the square root of the average design effect of the component percentages.
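
The approximation described above might be sketched as follows (in Python; the component percentages, standard errors, and sample size are invented for illustration):

    import math

    # Hypothetical component percentages and their published standard errors,
    # all drawn from the same multiple-response question
    components = [(12.0, 0.8), (9.0, 0.7)]  # (percentage, standard error)
    n = 2000                                # hypothetical sample size

    # Design effect of each component: its actual variance divided by the
    # variance a simple random sample of size n would give
    deffs = [se ** 2 / (p * (100 - p) / n) for p, se in components]
    avg_deff = sum(deffs) / len(deffs)

    # Combined percentage, with its simple-random-sample standard error
    # inflated by the square root of the average design effect
    p_combined = sum(p for p, _ in components)
    srs_se = math.sqrt(p_combined * (100 - p_combined) / n)
    approx_se = srs_se * math.sqrt(avg_deff)

    print(f"Combined percentage: {p_combined:.1f}, approximate SE: {approx_se:.2f}")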

Tables

Note on standard errors of U.S. Census estimates

Instructions for calculating standard errors for indicators based on U.S. Census estimates (Indicators 2, 3, 8, 9, 10, 11, 12, 13, 21, 22, 23, 24, 26, 27, 28, 29)

Data for these indicators were prepared from a 5 percent public use microdata sample based on the 1990 Census long form sample. Differences between these figures and the actual numbers may stem from several sources, categorized as either sampling errors or nonsampling errors. Sampling error refers to differences between the actual figures and those that are estimated from a sample. All estimates based on samples are subject to sampling error, and methods are readily available to estimate the likelihood of this type of error. This note explains how to estimate the precision of these U.S. Census estimates. Other errors, known as nonsampling errors, are more difficult to quantify and stem from errors in reporting, data collection, processing, and estimation. The method discussed here does not address nonsampling errors.

The "standard error" is a measure of the sampling error of an estimate from a sample. In general, we can be about 95 percent certain that the actual value falls within an interval defined by the estimate plus or minus two times the standard error. This interval is called the 95 percent confidence interval.

Below, we describe how standard errors and confidence intervals can be calculated. Because the microdata sample is not based on a simple random sample, the sample design slightly complicates calculation of the standard error. Here, we provide a simple method for approximating the standard errors for the U.S. Census estimates, taking into account that they are not based upon a simple random sample. The approximation entails two steps: first, calculate the standard error as though it resulted from a simple random sample; second, apply an adjustment factor (called the design factor) that reflects the differences between the actual sample design and a simple random sample design.

Step 1: Use the formula given below to calculate the unadjusted standard error assuming a simple random sample. (Base population numbers are provided in Table S23 and the estimated percentage is found in the tables of the corresponding indicator.)

                      SE(p) = \sqrt{19 \cdot p \cdot (100 - p) / B}

where:

B = base population of the estimated percentage (weighted total), obtained from Table S23, and
p = estimated percentage, obtained from the indicator table.

Step 2: First, identify the appropriate characteristic and the corresponding design factor from the list in Table S22. (For example, if calculating the standard error for a percentage in Indicator 3, Labor force participation, the appropriate population characteristic would be Employment status, and the corresponding design factor would be 1.2.)

Then, multiply the unadjusted standard error from step 1 by the design factor identified above. The resulting value approximates the (adjusted) standard error.

NOTE: Standard errors of percentages derived in this manner are approximate. Calculations can be expressed to several decimal places, but to do so would indicate more precision in the data than is justifiable. Final results should contain no more than two decimal places.
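
Both steps might be carried out as in the following sketch (in Python; the percentage, base population, and design factor are placeholders for values read from the indicator table, Table S23, and Table S22):

    import math

    # Step 1 inputs (placeholders: read p from the indicator table and B from
    # Table S23 for the population group and state of interest)
    p = 65.0     # estimated percentage
    B = 150_000  # base population (weighted total)

    # Step 1: unadjusted standard error assuming a simple random sample.
    # The factor 19 reflects the 5 percent sampling rate: (100 - 5) / 5 = 19.
    unadjusted_se = math.sqrt(19 * p * (100 - p) / B)

    # Step 2: multiply by the design factor for the relevant characteristic
    # from Table S22 (e.g., 1.2 for Employment status)
    design_factor = 1.2
    adjusted_se = unadjusted_se * design_factor

    # Per the note above, report no more than two decimal places
    print(f"Approximate standard error: {round(adjusted_se, 2)}")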

Table S22: Standard error design factors

---------------------------------------
Characteristic            Design factor
---------------------------------------
Age                            1.2
Sex                            1.2
Educational attainment         1.3
School enrollment              1.8
Employment status              1.2
Household income in 1989       1.2
---------------------------------------

Table S23: Base populations for indicators using 1990 U.S. Census estimates, by population group (and indicator) and state: 1990

