In all of the reports, estimates of composite and scale score means and percentages of students at or above achievement levels are reported for the population of students in each grade. These estimates are also reported for student groups of interest as defined by responses to background questions (as measured in the student, teacher, and school questionnaires) and from school records. Specific interest is placed on key variables such as
However, for some regions of the country, and sometimes for the nation as a whole, school and/or student sample sizes are too small to permit accurate reporting for one or more categories of these variables.
One consideration in deciding whether to report an estimated quantity is whether the sample size is sufficient to detect a specific, minimum effect size. Below that sample size, the population representation of the sample becomes questionable. A second consideration is whether the standard error estimate that accompanies a statistic is itself sufficiently accurate to inform potential readers about the reliability of the statistic. The precision of a sample estimate (be it a sample mean or a standard error estimate) for a population group from a three-stage sample design (the one used to select samples for the national main assessments) is a function of the sample size of the student group and of the distribution of that sample across first-stage sampling units (primary sampling units, or PSUs, in the case of the national main assessments).
NAEP reports student group results only if the student sample size is 62 or more. This number was obtained by determining the sample size necessary to detect an effect size of 0.5 with a probability of 0.8 or greater. Here the effect size is the "true" difference in mean scale score between the group in question and the total population, divided by the standard deviation of scale scores in the total population. The calculation assumed a design effect of two, that is, a design-based sampling variance twice that of simple random sampling. This assumption is consistent with previous NAEP experience (Johnson and Rust 1992). In carrying out the statistical power calculations for comparing any student group to the total group, it was assumed that the total population sample size is large enough to contribute negligibly to standard errors.
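The power calculation described above can be approximated with a standard normal-theory formula. The sketch below is illustrative only: the two-sided significance level of 0.05 is an assumption (the text does not state the level used), the function name min_sample_size is hypothetical, and the total-population mean is treated as known, in line with the stated assumption that the total sample contributes negligibly to the standard error.

```python
from statistics import NormalDist


def min_sample_size(effect_size, power, design_effect, alpha=0.05):
    """Approximate sample size needed to detect a standardized difference.

    Uses the one-sample normal approximation
        n_srs = ((z_{1 - alpha/2} + z_{power}) / effect_size) ** 2
    and inflates it by the design effect.
    ASSUMPTION: alpha = 0.05, two-sided; the source does not give this value.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    n_srs = ((z_alpha + z_power) / effect_size) ** 2  # simple random sampling
    return design_effect * n_srs        # adjust for the complex sample design


# Values taken from the text: effect size 0.5, power 0.8, design effect 2.
n = min_sample_size(effect_size=0.5, power=0.8, design_effect=2.0)
print(round(n, 1))
```

Under these assumed settings the formula gives a value just under 63, which is close to the published threshold of 62. Because the significance level and rounding convention used in the original derivation are not stated here, the small difference should not be over-interpreted.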
Furthermore, NAEP only reports student group results if the students within a student group are adequately distributed across PSUs to allow for reasonably accurate estimation of standard errors. NAEP only publishes those statistics that have standard error estimates based on five or more PSUs.
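Taken together, the two reporting rules amount to a simple check on each student group. The following is a minimal sketch, not NAEP's production logic: the function name is_reportable and the input layout (one PSU identifier per sampled student) are hypothetical, and counting distinct PSU identifiers is an assumed reading of "based on five or more PSUs".

```python
MIN_STUDENTS = 62  # minimum student sample size stated in the text
MIN_PSUS = 5       # minimum number of PSUs stated in the text


def is_reportable(psu_ids):
    """Return True if a student group meets both reporting rules.

    psu_ids: sequence holding one PSU identifier per sampled student in
    the group (hypothetical layout chosen for this illustration).
    """
    n_students = len(psu_ids)
    n_psus = len(set(psu_ids))  # distinct PSUs represented in the group
    return n_students >= MIN_STUDENTS and n_psus >= MIN_PSUS


# 70 students spread over 6 PSUs: both rules are met.
print(is_reportable([i % 6 for i in range(70)]))
# 70 students concentrated in 3 PSUs: too few PSUs.
print(is_reportable([i % 3 for i in range(70)]))
```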
In earlier reports (before 2005), a flag (“!”) was placed next to estimates with a large coefficient of variation. Specifically, the coefficient of variation of the denominator was used to provide an indication of possible unreliability of the standard error estimate (Hansen and Tepping 1985). Empirical research with 2003 NAEP samples (Oranje 2006) showed that this flag was an unreliable indicator, and its use was discontinued.