Confidence Intervals: NDE Statistical Specification

Statistical testing is based on confidence intervals. In the calculation of confidence intervals, a separate procedure is required for each of four types of weighted statistics: means, student group distribution proportions, achievement level proportions, and percentiles.

Means
For weighted means, denoted $\bar{X}$, the confidence interval takes the form:
$$\left(\bar{X} - t_{df}\,SE_X,\ \ \bar{X} + t_{df}\,SE_X\right)$$

where $t_{df}$ is the 97.5th quantile of the t-distribution, with degrees of freedom estimated by the usual formula

$$df = \frac{\left(\sum_r \left(\bar{X}_{1,r} - \bar{X}_1\right)^2\right)^2}{\sum_r \left(\bar{X}_{1,r} - \bar{X}_1\right)^4}$$

where $\bar{X}_{1,r}$ denotes the mean based on replicate weight $r$, and the index 1 denotes that the first plausible value is used for this computation. The Johnson-Rust adjustment must then be applied to the resulting df.

Furthermore,

$$SE_X^2 = \sum_r \left(\bar{X}_{1,r} - \bar{X}_1\right)^2 + \frac{1 + M^{-1}}{M - 1} \sum_m \left(\bar{X}_m - \bar{X}\right)^2$$

where $m$ indexes the $M$ plausible values and $\bar{X}$ is the average over plausible values.
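The formulas for means can be sketched in code as follows. This is a minimal illustration, not the production implementation: the function and argument names are hypothetical, the Johnson-Rust adjustment is not applied, and the t quantile is passed in by the caller (for example from a statistics library) rather than computed here.

```python
def satterthwaite_df(replicate_means, x1):
    """Unadjusted df: (sum of squared deviations)^2 / (sum of 4th-power deviations).

    replicate_means: means of the first plausible value, one per replicate weight
    x1: full-sample mean of the first plausible value
    """
    squared_devs = [(xr - x1) ** 2 for xr in replicate_means]
    # Undefined (division by zero) when every replicate mean equals x1.
    return sum(squared_devs) ** 2 / sum(d ** 2 for d in squared_devs)


def mean_confidence_interval(replicate_means, pv_means, t_crit):
    """Confidence interval for a weighted mean.

    pv_means: full-sample means, one per plausible value (pv_means[0] is X-bar_1)
    t_crit: 97.5th t quantile for the (adjusted) df, supplied by the caller
    """
    m = len(pv_means)
    x1 = pv_means[0]
    x_bar = sum(pv_means) / m  # average over plausible values
    # Sampling variance from the replicate weights, first plausible value only.
    sampling_var = sum((xr - x1) ** 2 for xr in replicate_means)
    # Measurement variance across plausible values.
    measurement_var = sum((xm - x_bar) ** 2 for xm in pv_means) / (m - 1)
    se = (sampling_var + (1 + 1 / m) * measurement_var) ** 0.5
    return x_bar - t_crit * se, x_bar + t_crit * se
```

Keeping the t quantile as an input leaves the choice of df adjustment to the caller.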
 

Student group proportions

The method of choice for proportions is a method derived by Wilson. Wilson’s approach takes on the following (asymmetric) form:   

$$\left(\frac{\hat{p}\,\tilde{n} + \tfrac{1}{2}t_{df}^2}{\tilde{n} + t_{df}^2} - t_{df}\,\frac{\sqrt{\tilde{n}}}{\tilde{n} + t_{df}^2}\sqrt{\hat{p}(1-\hat{p}) + \frac{t_{df}^2}{4\tilde{n}}},\ \ \frac{\hat{p}\,\tilde{n} + \tfrac{1}{2}t_{df}^2}{\tilde{n} + t_{df}^2} + t_{df}\,\frac{\sqrt{\tilde{n}}}{\tilde{n} + t_{df}^2}\sqrt{\hat{p}(1-\hat{p}) + \frac{t_{df}^2}{4\tilde{n}}}\right)$$

There are three variables in this equation:

$\hat{p}$ is the estimated proportion,

$\tilde{n}$ is the effective sample size, which is computed as
$$\tilde{n} = \frac{n}{\mathit{deff}} = \frac{\hat{p}(1-\hat{p})}{SE^2}$$
where $n$ is the weighted sample size of the sample (NOT the population estimate). When the proportion observed in the sample is 0, this expression has to be evaluated as a limit. Because the denominator is a squared term, it reaches 0 faster than the numerator, so the effective sample size becomes infinite. Hence, an additional restriction is imposed, namely $\tilde{n} = \min(\tilde{n}, n)$, which means that the design effect is never less than 1 and that in very small samples it equals 1. The rationale is that in very small samples students are approximately randomly distributed. Empirically, it can be verified that in relatively small samples the design effect is close to 1 unless a specific clustering exists.

$t_{df}$ is the 97.5th quantile of the t-distribution with df degrees of freedom.

NOTE that the degrees of freedom are undefined when the proportion is zero: inspecting the limits shows that the denominator goes to zero faster than the numerator. In that case a t-distribution with one degree of freedom may be chosen instead, i.e.,
$$df = \max(1, df)$$
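The Wilson interval and the effective sample size restriction can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the t quantile is supplied by the caller, and returning n when the standard error is exactly zero is an assumption that mirrors the limiting argument given above.

```python
import math


def effective_sample_size(p_hat, se, n):
    """n-tilde = p(1 - p) / SE^2, capped at n so the design effect is at least 1."""
    if se == 0:
        # Assumption: treat the infinite limit as "capped at n" (e.g. p_hat == 0).
        return n
    return min(p_hat * (1 - p_hat) / se ** 2, n)


def wilson_interval(p_hat, n_eff, t_crit):
    """Asymmetric Wilson interval using a t quantile and effective sample size."""
    t2 = t_crit ** 2
    center = (p_hat * n_eff + 0.5 * t2) / (n_eff + t2)
    half_width = (
        t_crit
        * math.sqrt(n_eff) / (n_eff + t2)
        * math.sqrt(p_hat * (1 - p_hat) + t2 / (4 * n_eff))
    )
    return center - half_width, center + half_width
```

With an observed proportion of 0, the lower limit evaluates to 0 and the upper limit stays positive, which is the asymmetric behavior that motivates the Wilson form.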
 

Achievement level proportions

For achievement level proportions, the same procedure as above is followed, except that the standard error also has to take into account the variance due to measurement. This component can easily be added to the design effect, which decreases the effective sample size and increases the variation accordingly. Specifically, this component is
$$v(\hat{p}) = (M - 1)^{-1} \sum_m \left(\hat{p}_m - \hat{p}\right)^2$$
where $\hat{p}_m$ is the proportion estimate based on the $m$th plausible value, and the average of these estimates is the estimated proportion $\hat{p}$.

This component is expected to be very small, since the proportion is a summary statistic, and summary statistics are generally quite stable across plausible values unless a particularly small group is queried. The effective n is:

$$\tilde{n} = \frac{\hat{p}(1-\hat{p})}{SE^2} = \frac{\hat{p}(1-\hat{p})}{\sum_r \left(\hat{p}_{r,1} - \hat{p}_1\right)^2 + \left(1 + M^{-1}\right) v(\hat{p})}$$
where r is the index for the replicate weight and 1 denotes that the first plausible value is used. The formula of the interval is similar to that for student group proportions, including the adjustments for minimum design effects and degrees of freedom.
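As a rough sketch of the effective sample size for achievement level proportions, the following combines the replicate-weight variance with the measurement component. The function and argument names are hypothetical, and the handling of a zero variance (falling back to n) is an assumption consistent with the minimum design effect restriction.

```python
def achievement_level_effective_n(replicate_props, pv_props, n):
    """Return (p_hat, n_tilde) for an achievement level proportion.

    replicate_props: proportions for the first plausible value, one per replicate weight
    pv_props: full-sample proportions, one per plausible value (pv_props[0] is p-hat_1)
    n: sample size used as the cap on the effective sample size
    """
    m = len(pv_props)
    p1 = pv_props[0]
    p_hat = sum(pv_props) / m  # average over plausible values
    # Sampling variance from replicate weights, first plausible value only.
    sampling_var = sum((pr - p1) ** 2 for pr in replicate_props)
    # Measurement component v(p_hat) across plausible values.
    v = sum((pm - p_hat) ** 2 for pm in pv_props) / (m - 1)
    se2 = sampling_var + (1 + 1 / m) * v
    if se2 == 0:
        # Assumption: infinite effective n is capped at n.
        return p_hat, n
    return p_hat, min(p_hat * (1 - p_hat) / se2, n)
```

The returned pair can then be passed to whatever Wilson interval routine is in use, together with the t quantile for the adjusted degrees of freedom.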
 

Percentiles  

Percentiles can be computed using many of the same techniques as above. This approach is somewhat different from the current approach.

1. Both standard error components can be found by identifying the student who is exactly at the pth percentile (or by using the usual extrapolation if no such student exists) and finding this student's proportion (if ranked) across replicate weights and plausible values.

2. Then, a lower and an upper bound accounting for measurement can first be found.

3. Subsequently, the Wilson formula can be applied in the same way as for achievement level proportions.

4. After finding the final lower and upper bound for the proportion, the average plausible value can be used to translate these bounds into bounds in the percentile scale.

Again, the same adjustments for small proportions are used, although these will usually not be an issue because the exact percentile is known (i.e., manipulated). Note that the weight of the student at the pth percentile may be zero for a particular replicate weight, in which case the proportion is equal to that of the student below him or her with a non-zero weight.
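Step 4 above, translating proportion bounds back into the score scale, might look like the sketch below. It is only an illustration under stated assumptions: the names are hypothetical, `scores` stands for each student's average plausible value, no interpolation between adjacent students is performed, and students with zero weight are skipped.

```python
def weighted_quantile(scores, weights, q):
    """Smallest score whose cumulative weighted proportion reaches q (0 < q <= 1)."""
    # Assumption: zero-weight students are dropped before ranking.
    pairs = sorted((s, w) for s, w in zip(scores, weights) if w > 0)
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for score, w in pairs:
        cumulative += w
        if cumulative / total >= q:
            return score
    return pairs[-1][0]  # guard against floating-point shortfall at q == 1


def percentile_bounds(scores, weights, p_lower, p_upper):
    """Map lower and upper proportion bounds to bounds on the score scale."""
    return (
        weighted_quantile(scores, weights, p_lower),
        weighted_quantile(scores, weights, p_upper),
    )
```

For example, with four equally weighted students scoring 10, 20, 30, and 40, proportion bounds of 0.4 and 0.6 map to scores 20 and 30.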