State and County Literacy Estimates - Frequently Asked Questions

Frequently Asked Questions

1. What is small area estimation and why is it necessary?
2. What states were oversampled?
3. How does the oversampling of certain states affect the indirect estimates?
4. What is the target population for the indirect estimates and is it the same as for the NAAL direct estimates?
5. How many mental disability and language barrier cases are there?
6. What do the four literacy levels mean?
7. What does "lacking" Basic prose literacy measure?
8. Why do you compute the indirect literacy estimate only for prose?
9. What variables were considered as predictors for the small area model?
10. What is the statistical model behind the small area estimation process?
11. Can you give examples of other indirect estimates?
12. How do you know whether the model works?
13. How accurate are the indirect estimates?
14. Why does the website limit my choice to one pairwise comparison at a time?
15. What is the difference between a credible interval and a confidence interval?
16. What are the national estimates of low literacy?

1. What is small area estimation and why is it necessary?

The sample sizes of the National Assessment of Adult Literacy (NAAL) and the National Adult Literacy Survey (NALS) are large enough to provide reasonably precise standard survey "direct" estimates of literacy levels for the nation's adults and for major population groups of interest, such as those defined by gender and age. Reasonably precise direct estimates can also be produced for the states that participated in the State Assessment of Adult Literacy (SAAL) and the State Adult Literacy Survey (SALS), and for the major subdomains of those states. However, the sample sizes in other states, and in jurisdictions within states such as counties, are not large enough to produce direct estimates of adequate precision. (Some larger states may have sufficient sample sizes, but the survey design does not support state-level estimation.) Indeed, some states and most counties in the nation have no sample in the surveys at all. Nevertheless, policymakers, business leaders, educators, and researchers often need literacy information for states and counties.

In response to this need, NCES has used a statistical modeling approach to produce model-dependent estimates of the percentages of adults in the lowest literacy level on the prose scale for all states and counties in the nation. These estimates are called "indirect" estimates to distinguish them from standard survey or "direct" estimates, which are derived directly from the responses of individuals who live in an area included in the assessment. The indirect estimates are produced using small area estimation techniques that rely both on literacy estimates from other geographic areas included in the assessment and on other variables, such as educational attainment, that are available for all counties from "auxiliary" data produced by other sources (such as the decennial Census). This approach "borrows strength" by using the sample information from every county in the assessment when producing each indirect estimate. A model is first built to predict literacy levels for the counties in the sample from the auxiliary data; that model can then be used to make predictions for all counties and states. Rao (2003) and Jiang and Lahiri (2006) provide comprehensive overviews and comparisons of models and methods for small area estimation.
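The idea of fitting a model on sampled counties and then predicting for every county can be illustrated with a deliberately simplified sketch. This is not the NCES model: it assumes a single auxiliary variable (a hypothetical "percent of adults without a high school diploma"), an ordinary least-squares fit on the logit scale, and made-up county names and percentages chosen only for illustration.

```python
import math

def logit(p):
    """Map a proportion in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a real number back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical auxiliary data, available for ALL counties
# (percent of adults without a high school diploma).
aux_all = {"A": 8.0, "B": 12.0, "C": 15.0, "D": 20.0, "E": 25.0, "F": 30.0}

# Hypothetical direct estimates, available only for SAMPLED counties
# (percent of adults in the lowest prose literacy level).
direct_pct = {"A": 6.0, "C": 11.0, "D": 15.0, "F": 24.0}

# Step 1: fit the model using only the sampled counties.
x = [aux_all[c] for c in direct_pct]
y = [logit(direct_pct[c] / 100.0) for c in direct_pct]
intercept, slope = fit_line(x, y)

# Step 2: use the fitted model to predict for every county,
# including counties B and E, which have no sample at all.
indirect_pct = {
    c: 100.0 * inv_logit(intercept + slope * value)
    for c, value in aux_all.items()
}

for county in sorted(indirect_pct):
    print(f"{county}: {indirect_pct[county]:.1f}%")
```

Counties B and E receive an estimate even though no one in them was assessed, because the relationship learned from counties A, C, D, and F is applied to their auxiliary data. A production small area model would also account for sampling error and county-level random effects, which this sketch omits.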
