Scoring

Scoring Procedure

In the 2018 NAEP ORF study, students’ passage reading was scored by the automatic speech analysis and scoring system, with the exception of passage reading expression, which was scored by trained human scorers. Word list reading and pseudoword list reading were first transcribed by trained human scorers; the rate, accuracy, and words correct per minute (WCPM) variables were then calculated by the automatic speech analysis and scoring system.

Passage Scoring

The 2018 NAEP ORF study used a new automatic speech analysis and scoring system that transcribed students’ passage reading recordings and then aligned the resulting orthographic transcripts with the passage text in order to calculate rate, accuracy, and words correct per minute (WCPM) variables. The system recognizes accepted pronunciations of each word, taking into account dialect and second-language variations as long as the speaking pattern remains consistent throughout the reading.

The automatic speech analysis and scoring system began by identifying the “text span” in the passage (that is, the string of text that the student read aloud from the passage, starting with the first word that the student read and ending with the last word). Then the system transcribed and calculated the number of words within the text span that the student attempted to read (i.e., the “span length”). It also calculated the duration of oral reading time that the student spent reading the text span (i.e., the “span duration”). Lastly, the system counted the number of words that the student correctly read in the correct order within the span length (i.e., the number of correctly read words in the text span). These three pieces of information were used to calculate passage reading rate, accuracy, and WCPM variables.
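The way these three quantities might combine can be sketched in Python. The text above does not give the exact formulas, so the definitions below are assumptions for illustration: rate as attempted words per minute, accuracy as the proportion of attempted words read correctly, and WCPM as correctly read words per minute. The names `PassageSpan` and `passage_variables` are hypothetical and are not part of the NAEP system.

```python
from dataclasses import dataclass


@dataclass
class PassageSpan:
    span_length: int         # words in the text span the student attempted
    span_duration_s: float   # seconds spent reading the text span
    words_correct: int       # words read correctly, in order, within the span


def passage_variables(span: PassageSpan) -> dict:
    """Return rate, accuracy, and WCPM under the assumed definitions."""
    if span.span_length <= 0 or span.span_duration_s <= 0:
        raise ValueError("span length and duration must be positive")
    minutes = span.span_duration_s / 60.0
    return {
        "rate_wpm": span.span_length / minutes,              # assumed: attempted words per minute
        "accuracy": span.words_correct / span.span_length,   # assumed: proportion correct
        "wcpm": span.words_correct / minutes,                # correctly read words per minute
    }


# Hypothetical student: 150 words attempted in 90 seconds, 141 read correctly.
example = passage_variables(
    PassageSpan(span_length=150, span_duration_s=90.0, words_correct=141)
)
```

For this hypothetical input, the sketch yields a rate of 100 words per minute, an accuracy of 0.94, and 94 WCPM.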

Word List and Pseudoword List Scoring

For recordings of word reading and pseudoword reading, trained human scorers transcribed students’ oral responses. Human transcription was used instead of machine transcription because students did not always follow the same order (e.g., from top to bottom and then left to right) when they read the word and pseudoword lists. After the lists were transcribed, the automatic speech analysis and scoring system produced a time alignment of each transcript with the corresponding student recording, calculated the length of time that the student spent reading the word list or pseudoword list, and counted the number of words or pseudowords from the presented list that the student read correctly. These counts and the corresponding reading durations were combined to calculate the word reading and pseudoword reading WCPM (words correctly read per minute) variables. For example, if a student read 20 words correctly from the word list in 40 seconds, the word reading score would be 30 WCPM (20 words ÷ 40 seconds × 60 seconds per minute).
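The list-scoring arithmetic can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes that a list item counts as correct when it appears anywhere in the transcript (since students read the lists in varying orders) and that each presented item is credited at most once. The function names and the sample words are hypothetical.

```python
from collections import Counter


def count_correct_items(transcript, presented):
    """Count presented items found in the transcript, in any order.

    Assumption: each presented item is credited at most once, so a
    repeated reading of the same item does not add to the count.
    """
    remaining = Counter(word.lower() for word in presented)
    correct = 0
    for token in transcript:
        key = token.lower()
        if remaining[key] > 0:
            remaining[key] -= 1
            correct += 1
    return correct


def wcpm(correct_count, duration_s):
    """Words correctly read per minute: count / seconds * 60."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return correct_count / duration_s * 60


# Worked example from the text: 20 correct words in 40 seconds.
score = wcpm(20, 40)

# Hypothetical transcript read out of order, with one repeat and one non-list word.
n_correct = count_correct_items(
    ["lamp", "cat", "cat", "dog", "tree"],
    ["cat", "ship", "lamp", "tree"],
)
```

Here `score` reproduces the 30 WCPM from the worked example, and `n_correct` is 3 because the repeated "cat" and the non-list word "dog" are not credited.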

Passage Reading Expression Scoring

Passage reading expression is a rating of the student's ability to clearly express the meaning and structure of the text through appropriate intonation, rhythm, emphasis, and pausing that groups words into phrasal and larger units in ways that will enhance understanding and enjoyment in a listener. This variable was scored by trained human scorers using a 6-point scoring rubric developed for the study, as shown below. The scorers received intensive training on the use of the rubric and then successfully completed a qualification evaluation that demonstrated their understanding of and ability to accurately use the rubric to rate students’ oral reading.

NAEP Oral Reading Fluency Passage Reading Expression Scoring Rubric

Score | Level | Description
0 | Insufficient Sample
  • Insufficient sample for rating (fewer than 12 words read aloud correctly).
1 | Word by Word
  • Less than ¼ of the words read aloud with appropriate expression.
  • Reading focuses on individual words (not phrases, sentences, or the passage).
  • Reading is all or mostly monotone.
2 | Local Grouping
  • More than ¼ and less than ½ of the words read aloud with appropriate expression.
  • Reading focuses on local word groups (with little to no focus on phrases, sentences, or the passage).
  • Reading may be mostly arrhythmic or monotone.
3 | Phrase & Clause
  • More than ½ of the words read aloud with appropriate expression.
  • Reading expresses the structure or meaning of words, phrases, clauses, and a few sentences (with little or no focus on the passage).
  • Intonation may sometimes reinforce rhythmic grouping, or reading may be monotone.
4 | Sentence Prosody
  • More than ¾ of the words read aloud with appropriate expression.
  • Reading correctly expresses text and sentence structure and meaning (which may include non-local text connections).
  • Reading can be occasionally inconsistent, but not monotone.
  • Reading rate is at least 55 words per minute (at least 80 text-words-read to merit this level or above).
5 | Passage Expression
  • Passage read as if for a listener – of the passage portion read aloud, all or nearly all (at least 90 percent) is read with appropriate expression. The reading consistently expresses the structure and meaning of sentences, paragraphs, and the passage as a whole (which may include non-local text connections).
  • Reading may include a few word stumbles or misreading, but it is expressive throughout.
  • Reading rate is at least 80 words per minute (at least 120 text-words-read to merit this level).
8 | Silent Reader
  • Recording has audio signal, but no near-field speech from the student.
  • Audible background sounds, breathing, or microphone touching may suggest the reader did not speak throughout the recording period.
9 | Anomaly
  • Not 0, 1, 2, 3, 4, 5, or 8. Not a silent reader, nor any near-field reading aloud.
  • Possibly with off-task or irrelevant speech, evidence of confusion, or anything else unexpected, including electronic crackle or dead flat-line signal.
NOTE: Passage expression ratings of 8 and 9 were treated as missing as these students’ expression level could not be determined because of the quality/content of the audio file.
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP) 2018 Oral Reading Fluency study.
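Expression was rated by human scorers, not computed, but the rubric's numeric conditions and the treatment of codes 8 and 9 can be illustrated with a short sketch. The function below returns only the highest level that the rate and word-count conditions would permit; the scorer's judgment of expression determines the actual rating at or below that ceiling. The function names, the parameter names, and the reading of "text-words-read" as a simple word count are assumptions made for illustration.

```python
from typing import Optional


def max_expression_level(words_correct: int, text_words_read: int, rate_wpm: float) -> int:
    """Highest rubric level permitted by the numeric conditions alone."""
    if words_correct < 12:
        return 0  # insufficient sample for rating
    if rate_wpm >= 80 and text_words_read >= 120:
        return 5  # numeric conditions for Passage Expression are met
    if rate_wpm >= 55 and text_words_read >= 80:
        return 4  # numeric conditions for Sentence Prosody are met
    return 3      # levels 1 to 3 carry no rate or word-count condition


def recode_expression(score: int) -> Optional[int]:
    """Treat Silent Reader (8) and Anomaly (9) ratings as missing."""
    return None if score in (8, 9) else score
```

For example, a hypothetical reading at 90 words per minute with only 110 text words read would be capped at level 4, because level 5 requires at least 120 text words read.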

Reliability of Scoring

Evaluating the reliability of scoring is an essential step toward ensuring the validity and accuracy of the analyses performed for the study.

Reliability of Passage Automatic Scoring

To evaluate the reliability of the automatic speech analysis and scoring system, a sample of the passage recordings (about 280 recordings for each passage, see column 3 of the table below) was transcribed by both the automatic speech analysis and scoring system and a trained human scorer. Each of the two transcriptions was aligned with reference to the passage text such that the alignment minimized insertions, deletions, and substitutions. Then, within the span of passage text that the student attempted to read, the system counted the number of words that were correctly read in the correct order. The correlation between the counts of words correctly read using the machine transcriptions and the human transcriptions of the same recordings for each passage is shown in the table below. On average, the interrater reliability between the machine and human transcriptions of the same recordings was 0.96.
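The alignment step described above can be sketched as a standard word-level edit-distance alignment. This is an illustration under stated assumptions, not the study's algorithm: it uses unit costs for insertions, deletions, and substitutions, breaks ties among minimum-cost alignments in favor of more matched words, and aligns against the full reference text without first restricting it to the span the student attempted. The function name and the sample sentence are hypothetical.

```python
def count_correct_in_order(reference, transcript):
    """Count words matched in a minimum-edit-cost alignment.

    best[i][j] holds (edit cost, -matches) for reference[:i] versus
    transcript[:j]; taking the tuple minimum minimizes cost first and
    then maximizes the number of matched words.
    """
    n, m = len(reference), len(transcript)
    best = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0)  # i deletions
    for j in range(1, m + 1):
        best[0][j] = (j, 0)  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, neg = best[i - 1][j - 1]
            if reference[i - 1] == transcript[j - 1]:
                diagonal = (cost, neg - 1)   # match: no cost, one more correct word
            else:
                diagonal = (cost + 1, neg)   # substitution
            cost, neg = best[i - 1][j]
            deletion = (cost + 1, neg)       # reference word not read
            cost, neg = best[i][j - 1]
            insertion = (cost + 1, neg)      # extra word in the transcript
            best[i][j] = min(diagonal, deletion, insertion)
    return -best[n][m][1]


# Hypothetical passage text and transcript.
passage_words = "the ice covered lake was very cold".split()
student_words = "the ice lake was was cold".split()
correct = count_correct_in_order(passage_words, student_words)
```

In this hypothetical case the minimum-cost alignment has one deletion ("covered") and one substitution ("very" read as "was"), leaving 5 words counted as correct. Applying the same function to the machine transcript and the human transcript of each recording would give the paired counts that are then correlated.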

Interrater Reliability for Passage Scoring

Passage | Maximum Number of Words | Number of Second-Scored Audio Recordings | Interrater Reliability
Passage 1 | 162 | 279 | .99
Passage 2 | 153 | 275 | .98
Passage 3 | 162 | 283 | .93
Passage 4 | 152 | 283 | .94
NOTE: Hyphenated forms (e.g., ice-covered) were counted as two words. Passage interrater reliability is the correlation between the counts of words correctly read based on the machine transcriptions and those based on the human transcriptions of the same audio recordings. The final passage interrater reliability (i.e., .96) is the average correlation across the four passages.
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP) 2018 Oral Reading Fluency study.
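The reported overall value can be checked directly from the four passage-level correlations in the table. The sketch below also includes a plain Pearson correlation function to show how one passage-level value could be obtained from paired counts; the use of the Pearson coefficient is an assumption, since the text says only "correlation", and the function name is hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of counts.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - my) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Passage-level correlations reported in the table above.
passage_reliability = [0.99, 0.98, 0.93, 0.94]
overall = round(mean(passage_reliability), 2)
```

The mean of the four values is 0.96, which matches the overall interrater reliability reported for passage scoring.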

Reliability of Word List and Pseudoword List Scoring

To examine the interrater reliability of word and pseudoword reading scoring, approximately 20 percent of students’ oral response recordings were transcribed by two different scorers. The correlation between the two human transcriptions was 0.99 for word reading and 0.97 for pseudoword reading.

Reliability of Passage Reading Expression Human Scoring

To examine the reliability of the human scoring for passage reading expression, approximately 40 percent of students’ passage reading responses across the four passages were scored for expression by two scorers independently. Between the two sets of human scores, the exact agreement rate (i.e., the percentage of scores that were exactly the same) was 58 percent, and the adjacent agreement rate (i.e., the percentage of scores that differed by only one level) was an additional 39 percent.

According to the standards for NAEP Writing assessment scoring, which also uses six scoring categories, exact agreement lower than 61 percent is flagged to indicate mild concern, and exact agreement lower than 57 percent is flagged to indicate greater concern. Thus, the exact agreement achieved by the human scoring of passage reading expression (58 percent) fell within the mild-concern range but remained above the 57 percent threshold, meeting the minimum standard. Learn more by reading about within-year interrater agreement.
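Exact and adjacent agreement, along with the flagging thresholds cited above, can be sketched as follows. The function names and the sample ratings are hypothetical, and the sketch assumes that ratings of 8 and 9 (treated as missing) have already been removed from both score lists.

```python
def agreement_rates(first_scores, second_scores):
    """Return (exact, adjacent) agreement as percentages of double-scored responses."""
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(first_scores)
    pairs = list(zip(first_scores, second_scores))
    exact = sum(a == b for a, b in pairs)              # identical ratings
    adjacent = sum(abs(a - b) == 1 for a, b in pairs)  # ratings one level apart
    return 100 * exact / n, 100 * adjacent / n


def exact_agreement_flag(exact_pct):
    """Apply the thresholds cited for six-category NAEP Writing scoring."""
    if exact_pct < 57:
        return "greater concern"
    if exact_pct < 61:
        return "mild concern"
    return "no flag"


# Hypothetical ratings from two scorers for five responses.
exact_pct, adjacent_pct = agreement_rates([3, 4, 2, 5, 1], [3, 3, 2, 4, 3])
study_flag = exact_agreement_flag(58)
```

For the hypothetical ratings, both exact and adjacent agreement are 40 percent. Applied to the study's 58 percent exact agreement, the thresholds return "mild concern", since the value is below 61 percent but not below 57 percent.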


Last updated 19 January 2023 (DS)