NAAL scoring is designed to measure adults’ abilities to perform literacy tasks in everyday life. Since adults are likely to make mistakes as they interact with printed and written material, NAAL scorers make allowances for partial responses and writing errors.
While most responses are either correct or incorrect, a response can be partially correct if the information provided is still useful in accomplishing the task. For example, a respondent who writes the wrong product price on a catalog order form could receive partial credit: in real life, such a minor error would not necessarily result in an incorrect order, because the form carries other identifying information, such as the product name. However, if a respondent miswrites a social security number on a government application form, the error would not receive partial credit.
Similarly, responses containing writing errors—grammatical and spelling errors, use of synonyms, incomplete sentences, or circling instead of writing the correct answer—are scored as correct as long as the overall meaning is correct and the information provided accomplishes the task. However, if a respondent is filling out a form and writes the answer on the wrong line, or if, for a quantitative task, the calculation is right but the respondent writes the wrong answer in the blank, then the response is scored as incorrect.
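The scoring logic described above can be summarized as a small decision procedure. The following Python sketch is only an illustration: the actual NAAL rubrics are task-specific and are not reproduced here, and the function name `score_response` and all of its flags are hypothetical names introduced for this example.

```python
from enum import Enum


class Score(Enum):
    CORRECT = "correct"
    PARTIAL = "partially correct"
    INCORRECT = "incorrect"


def score_response(*, meaning_correct, accomplishes_task,
                   in_correct_place=True, still_useful=False,
                   partial_allowed=True):
    """Hypothetical summary of the scoring rules described in the text.

    All flags are assumptions made for illustration; a real rubric
    defines these judgments separately for each task.
    """
    # An answer written on the wrong line of a form is scored incorrect.
    if not in_correct_place:
        return Score.INCORRECT
    # Spelling or grammar errors do not matter if the meaning is right
    # and the response accomplishes the task.
    if meaning_correct and accomplishes_task:
        return Score.CORRECT
    # Partial credit applies only where the task allows it and the
    # response is still useful for accomplishing the task.
    if still_useful and partial_allowed:
        return Score.PARTIAL
    return Score.INCORRECT


# Misspelled answer whose meaning is still correct.
misspelled = score_response(meaning_correct=True, accomplishes_task=True)

# Wrong price on a catalog order form: still useful, so partial credit.
wrong_price = score_response(meaning_correct=False, accomplishes_task=False,
                             still_useful=True)

# Miswritten social security number: no partial credit for this task.
wrong_ssn = score_response(meaning_correct=False, accomplishes_task=False,
                           still_useful=True, partial_allowed=False)

# Right answer written on the wrong line of the form.
wrong_line = score_response(meaning_correct=True, accomplishes_task=True,
                            in_correct_place=False)
```

Under these assumptions, the four examples are scored correct, partially correct, incorrect, and incorrect, respectively, matching the cases given in the text.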
During the task development stage, scoring experts develop scoring rubrics that detail the rules for scoring each assessment question. To ensure that all assessment questions are scored accurately, NAAL scoring rubrics undergo several stages of verification both before and after the assessment is administered.
Before the main NAAL study begins, a field test of about 1,400 adults is conducted to help identify and screen out problems with the scoring rubrics, such as alternative correct responses that the rubrics do not account for and scoring rubrics that are difficult to implement consistently (thus leading to low rates of interrater reliability).
After the main study ends, a sample of responses from the household and prison interviews is scored using the scoring rubrics. As the test developers score the sample responses, they make adjustments to the scoring rubrics to reflect the kinds of responses adults gave during the assessment. Together, these sample responses and the revised scoring rubrics are used in training the scorers who score the entire assessment.
In a group setting, scorers are trained to recognize each task and its corresponding scoring rubric, as well as sample responses that are representative of correct, partially correct, and incorrect answers. After group training, the scorers (also called readers) score numerous practice questions before they begin to score actual booklets.
To ensure that readers are scoring accurately, 50 percent of the assessment questions are subject to an interrater reliability check, in which second readers score the booklet and their scores are compared with those of the first readers. Interrater reliability is the percentage of times two readers agree exactly in their scores. (In 1992, the average percentage of agreement was 97.) Any batch of questions that exceeds a low threshold of scoring mistakes is sent back to the scorers for correction, and the scoring supervisor discusses the discrepancies with the scorers involved. Quality-control procedures like these ensure the reliability of the scoring.
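Because interrater reliability is defined here as the percentage of exact agreement between two readers, it can be computed directly from paired scores. The short Python sketch below shows that calculation. The function name `percent_exact_agreement` and the sample scores are hypothetical and are not taken from NAAL data.

```python
def percent_exact_agreement(first_scores, second_scores):
    """Return the percentage of items on which two readers agree exactly."""
    if len(first_scores) != len(second_scores):
        raise ValueError("both readers must score the same set of items")
    if not first_scores:
        raise ValueError("at least one double-scored item is required")
    # Count the items where the first and second readers gave the same score.
    matches = sum(a == b for a, b in zip(first_scores, second_scores))
    return 100.0 * matches / len(first_scores)


# Hypothetical scores from two readers on the same four questions.
first_reader = ["correct", "partial", "incorrect", "correct"]
second_reader = ["correct", "partial", "correct", "correct"]

agreement = percent_exact_agreement(first_reader, second_reader)
print(agreement)  # 75.0 (the readers agree on 3 of 4 questions)
```

Exact agreement is the simplest reliability measure and is the one the text describes; it does not adjust for agreement that would occur by chance.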