Frequently Asked Questions
- What is small area estimation and why is it necessary?
- What states were oversampled?
- How does the oversampling of certain states affect the indirect estimates?
- What is the target population for the indirect estimates and is it the same as for the NAAL direct estimates?
- How many mental disability and language barrier cases are there?
- What do the four literacy levels mean?
- What does "lacking" Basic prose literacy measure?
- Why do you compute the indirect literacy estimate only for prose?
- What variables were considered as predictors for the small area model?
- What is the statistical model behind the small area estimation process?
- Can you give examples of other indirect estimates?
- How do you know whether the model works?
- How accurate are the indirect estimates?
- Why does the website limit my choice to one pairwise comparison at a time?
- What is the difference between a credible interval and a confidence interval?
- What are the national estimates of low literacy?
1. What is small area estimation and why is it necessary?
The NAAL and NALS sample sizes are large enough to provide reasonably precise standard survey "direct" estimates of literacy levels for the nation's adults and for major population groups of interest such as gender and age. In addition, reasonably precise direct estimates of literacy levels can be produced for those states that participated in the SAAL and SALS surveys and for their major subdomains. However, the sample sizes in other states and in jurisdictions within states, such as counties, are not large enough to produce direct estimates of adequate precision (some larger states may have sufficient sample sizes but the survey design does not support state-level estimation). Indeed, some states and most counties in the nation have no sample in the surveys. Nevertheless, policymakers, business leaders, and educators/researchers often need literacy information for states and counties.
In response to this need, NCES has used a statistical modeling approach to produce model-dependent estimates of the percentages of adults in the lowest literacy level on the prose scale for all states and counties in the nation. These estimates are called "indirect" estimates to distinguish them from standard survey or "direct" estimates that are derived directly from responses of individuals who live in an area included in the assessment. The indirect estimates are produced using small area estimation techniques that rely both on literacy estimates from other geographic areas included in the assessment and on other variables such as educational attainment that are available for all counties from "auxiliary" data produced by other sources (such as the decennial Census). This approach uses sample information from all counties to "borrow strength" in producing the indirect estimates. By creating a model that predicts literacy levels for counties in the sample from the auxiliary data, the model can then be used to make predictions for all counties and states. Rao (2003) and Jiang and Lahiri (2006) provide comprehensive overviews and comparisons of models and methods for small area estimation.
2. What states were oversampled?
The states oversampled in 2003 (SAAL states) were Kentucky, Maryland, Massachusetts, Missouri, New York, and Oklahoma. The states oversampled in 1992 (SALS states) were California, Illinois, Indiana, Iowa, Louisiana, New Jersey, New York, Ohio, Pennsylvania, Texas, and Washington.
3. How does the oversampling of certain states affect the indirect estimates?
The main purpose of the SAAL and SALS samples was to enable states to produce reliable direct state estimates of literacy levels for all scales, at all levels, and for their major subgroups. The larger sample sizes in these states were also beneficial in producing generally more precise state and county indirect estimates of the percentages of adults lacking Basic Prose Literacy Skills (BPLS).
4. What is the target population for the indirect estimates and is it the same as for the NAAL direct estimates?
The NAAL and the NALS household samples were designed to be nationally representative samples of the population of persons who were 16 years of age or older, excluding persons not living in households or college dormitories, at the time of the interview. This population is the starting point for both the direct and indirect estimates. Adults who could not be tested because of a mental disability that precluded conducting the interview do not contribute to either the direct or the indirect estimates. The direct estimates also exclude adults who were unable to take the assessment because of a language barrier. However, these adults are included in the indirect estimates and are classified as lacking Basic Prose Literacy Skills (BPLS) on the grounds that they can be considered to be at the lowest level of English literacy. As a result, the indirect estimates of the percentages of adults lacking BPLS are not comparable to the percentages of adults Below Basic in prose literacy in other NAAL or NALS published results.
In addition to the household samples, both NAAL and NALS included samples of adults from federal and state prisons. The inmate samples did not contribute to the indirect county and state estimates presented in this report.
5. How many mental disability and language barrier cases are there?
Of the adults sampled for NAAL, 1 percent were classified as mental disability cases and 2 percent were classified as language barrier cases. The NALS had the same percentages of mental disability and language barrier cases as the NAAL.
6. What do the four literacy levels mean?
The NAAL used a set of four categories: Below Basic, Basic, Intermediate, and Proficient to describe the literacy levels of the adult population in prose, document, and quantitative literacy. For definitions of the four levels, see NAAL's webpage on Performance Levels. The indirect estimates were computed for prose only. See question 8 for further explanation.
7. What does "lacking" Basic prose literacy measure?
Adults in the Below Basic group and those not able to take the assessment because of a language barrier are classified as lacking Basic Prose Literacy Skills (BPLS). The percentage of those who lack BPLS reflects the magnitude of the adult household population at the lowest level of English literacy. The literacy of adults who lack BPLS ranges from being unable to read and understand any written information to being able only to locate easily identifiable information in short, commonplace prose text in English, but nothing more advanced. For the indirect estimates, adults who were not able to take the assessment because of a language barrier are included.
8. Why do you compute the indirect literacy estimate only for prose?
Three components of literacy were measured in the 2003 NAAL and the 1992 NALS: prose, document, and quantitative. Reviews of the NAAL literacy (direct) estimates showed that prose performed better in measuring literacy skills at the lower end of the literacy scales than did the other components.
9. What variables were considered as predictors for the small area model?
More than 100 county-level variables across 20 major types of variables (e.g., poverty, income, education, occupation) were examined as potential predictors for the percentage of adults lacking Basic Prose Literacy Skills in the small area modeling used to produce the 2003 NAAL indirect estimates. The primary source was county-level data from the 2000 Census of Population. Summary File 3 (SF3) was used to extract county-level auxiliary variables. The SF3 contains the Census "short form" items (asked of all households) including information about age, gender, race, Hispanic or Latino origin, household relationship, and owner/renter status. The SF3 also contains the Census "long form" data coming from questions asked of about one-sixth of America's households. The questions ask about income, education, language spoken, housing structure, housing costs, commuting, and many other topics. In addition to the Census of Population, various other sources were used for obtaining county-level and state-level auxiliary variables, for example, the Bureau of Economic Analysis (BEA) per capita personal income estimates for local areas, the Census Bureau's Small Area Income and Poverty Estimates (SAIPE) program, and the U.S. Department of Agriculture (USDA) Economic Research Service Rural-Urban Continuum Codes program. For the 1992 NALS model, in general the variables used in the final Hierarchical Bayes (HB) model for 2003 were considered (using the 1990 Census variable definitions) along with language spoken from the 1990 Census long form. A list of the final predictor variables is given in the Estimation Approach.
10. What is the statistical model behind the small area estimation process?
The statistical model used to produce the indirect county estimates of the percentages of adults lacking Basic Prose Literacy Skills (BPLS) was developed using the 2003 NAAL data; the same modeling approach was then applied to the 1992 NALS data. A Hierarchical Bayes (HB) model was adopted using a Markov Chain Monte Carlo (MCMC) method, and was implemented using the WinBUGS software (Lunn et al. 2000). The key component of the approach was to develop a logit model (linear logistic regression model) to predict the direct county percentages of adults lacking BPLS for counties with sample respondents from a set of auxiliary variables that were available and measured consistently for all counties. Non-informative prior distributions were used for the model parameters.
The posterior distributions for the model parameters were used to produce the indirect estimates for all U.S. counties based on their values for the predictor variables and incorporating information about the direct estimates in counties with sample data. The state estimates were created by aggregating the county estimates, again using an HB approach. See Small Area Estimation Method for state and county estimate for more information.
See the NAAL Small Area Estimation Technical Report (U.S. Department of Education, National Center for Education Statistics, 2007) for further details of the model.
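The core idea of a logit prediction followed by aggregation to the state level can be illustrated with a minimal Python sketch. It is not the NAAL model: it leaves out the county random effects, the prior distributions, the MCMC sampling, and the HB aggregation step described above, and it replaces the aggregation with a simple population-weighted average. The predictor names, coefficient values, and county populations are all hypothetical and chosen only for illustration.

```python
import math


def inverse_logit(x):
    """Map a value on the logit scale back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_percent_lacking(predictors, coefficients, intercept):
    """Predict a county percentage from a linear model on the logit scale."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in predictors.items()
    )
    return 100.0 * inverse_logit(linear)


def aggregate_to_state(county_percents, county_adult_pops):
    """Population-weighted average of county percentages (a simplification)."""
    total = sum(county_adult_pops)
    weighted = sum(p * n for p, n in zip(county_percents, county_adult_pops))
    return weighted / total


# Hypothetical coefficients and auxiliary data, NOT taken from the NAAL model.
coefficients = {"pct_no_high_school": 0.04, "pct_below_poverty": 0.03}
intercept = -3.0
counties = [
    ({"pct_no_high_school": 10.0, "pct_below_poverty": 8.0}, 50_000),
    ({"pct_no_high_school": 25.0, "pct_below_poverty": 20.0}, 20_000),
]

county_percents = [
    predict_percent_lacking(x, coefficients, intercept) for x, _ in counties
]
state_percent = aggregate_to_state(county_percents, [n for _, n in counties])

for pct in county_percents:
    print(f"county estimate: {pct:.1f} percent")
print(f"state estimate: {state_percent:.1f} percent")
```

Working on the logit scale keeps every predicted percentage between 0 and 100, which a linear model fitted directly to percentages would not guarantee.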
11. Can you give examples of other indirect estimates?
The Census Bureau's Small Area Income and Poverty Estimates (SAIPE) is another example of indirect estimates. SAIPE provides annual estimates of income and poverty for states, counties, and school districts. Indirect estimates are also produced for the National Survey on Drug Use and Health. Other examples can be found in the Federal Committee on Statistical Methodology, Statistical Policy Working Paper 21.
12. How do you know whether the model works?
A number of methods were used to evaluate the fit of the Hierarchical Bayes (HB) models to the county direct estimates. None of the methods indicated appreciable problems with the final models. For the 2003 NAAL, alternative models were fit to the data to determine whether the model results were sensitive either to the prior distributions used for modeling or to the set of auxiliary variables used in the model. This analysis supported the choice of the final model and indicated that the indirect estimates were not sensitive to the variants of the model that were investigated. The final model also proved satisfactory with regard to several diagnostic tests of fit. In addition, comparisons of direct estimates for a variety of domains defined along different dimensions with aggregations of the indirect county estimates for those domains showed a close correspondence in each case. A comparison between the NAAL and NALS results showed that the models were generally comparable in their ability to fit the data at the county level.
13. How accurate are the indirect estimates?
Overall, the levels of precision of the 2003 and 1992 model estimates for sample counties are fairly comparable. The county estimates have median coefficients of variation (CVs) of 33.0 percent for the 2003 NAAL and 34.7 percent for the 1992 NALS. The state estimates are more precise, with median CVs of 14.0 percent and 15.3 percent for the 2003 NAAL and the 1992 NALS, respectively. Overall, the analysis of the 2003 and 1992 results indicated that gains in precision were achieved in the indirect estimates for SAAL and SALS states as a result of their increased sample size.
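For readers unfamiliar with the measure, a CV expresses the standard error of an estimate as a percentage of the estimate itself, so smaller values indicate greater relative precision. The short Python sketch below shows the calculation, assuming the usual definition of a CV. The function name and the example figures are hypothetical and do not come from the NAAL or NALS results.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the CV: the standard error as a percentage of the estimate."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    return 100.0 * standard_error / estimate


# Hypothetical area: 14.0 percent lacking BPLS with a standard error of
# 2.1 percentage points (illustrative numbers only).
cv = coefficient_of_variation(14.0, 2.1)
print(f"CV = {cv:.1f} percent")
```

Because the CV is relative to the estimate, the same standard error gives a larger CV in an area with a low percentage lacking BPLS than in an area with a high percentage.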
14. Why does the website limit my choice to one pairwise comparison at a time?
Even when the credible interval for a difference does not include 0, there remains a statistical risk that no true difference exists. As the number of comparisons conducted increases, so does the risk that a false conclusion of a significant difference is made for one or more of the differences being compared. To focus users on specific comparisons, the pairwise comparison tool is constructed to allow only one comparison at a time.
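How quickly this risk grows can be seen in a simplified calculation. The Python sketch below assumes that the comparisons are independent and that each one has the same chance (5 percent here) of falsely indicating a difference. Both assumptions are made for illustration only, since comparisons among estimates from the same model are not independent, and the function name is hypothetical.

```python
def familywise_error_rate(num_comparisons, alpha=0.05):
    """Chance of at least one false finding across independent comparisons."""
    if num_comparisons < 0:
        raise ValueError("num_comparisons must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_comparisons


# Risk of at least one false "significant" difference as comparisons pile up.
for k in (1, 5, 10, 20):
    print(f"{k:2d} comparisons: {familywise_error_rate(k):.3f}")
```

Under these assumptions the risk rises from 5 percent for a single comparison to roughly 40 percent for ten comparisons.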
15. What is the difference between a credible interval and a confidence interval?
A credible interval for the percentage of the adult population in a county or state lacking Basic Prose Literacy Skills (BPLS) is an interval that contains the true value of that percentage with a specified probability (often chosen to be 95 percent), given initial assumptions about what the value of this percentage may be and the information provided by the data. A confidence interval describes the range of values within which the true percentage of the population lacking BPLS could lie, given the estimate calculated from the available data. In the context of hypothesis testing, a 95 percent confidence interval for an estimate of the percentage of the population lacking BPLS indicates the range of values for which the hypothesis that the true percentage equals that value would be accepted with 95 percent confidence. Conversely, using traditional hypothesis testing, we would reject, at the level of confidence associated with the confidence interval, the hypothesis that a value outside the interval could have produced the observed percentage.
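The two kinds of interval can be compared on a toy example. The Python sketch below assumes a simple random sample with a yes/no outcome, a normal-approximation (Wald) confidence interval, and a Beta posterior under a uniform prior for the credible interval. None of this reproduces the NAAL methodology, which uses a complex sample design and an HB model; the function names and the sample counts are hypothetical.

```python
import math
import random


def wald_confidence_interval(successes, n, z=1.96):
    """Normal-approximation 95 percent confidence interval for a proportion."""
    p = successes / n
    se = math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


def beta_credible_interval(successes, n, level=0.95, prior_a=1.0,
                           prior_b=1.0, draws=20000, seed=0):
    """Equal-tailed credible interval from draws of a Beta posterior."""
    rng = random.Random(seed)
    # Beta prior plus binomial data gives a Beta posterior.
    a = prior_a + successes
    b = prior_b + n - successes
    sample = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = sample[int(tail * draws)]
    upper = sample[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical sample: 29 of 200 adults classified as lacking BPLS.
ci = wald_confidence_interval(29, 200)
cr = beta_credible_interval(29, 200)
print(f"confidence interval: {100 * ci[0]:.1f} to {100 * ci[1]:.1f} percent")
print(f"credible interval:   {100 * cr[0]:.1f} to {100 * cr[1]:.1f} percent")
```

With a flat prior and a moderate sample size the two intervals are numerically close, but they answer different questions: the credible interval is a probability statement about the percentage itself, while the confidence interval is a statement about the procedure that generated the interval.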
16. What are the national estimates of low literacy?
The national direct estimates of the percentages of adults lacking BPLS are 14.5 percent for the 2003 NAAL and 14.7 percent for the 1992 NALS. In comparison, the national direct estimates of the percentages Below Basic in prose literacy are 13.6 percent for the NAAL and 13.8 percent for the NALS.