
5. VARIANCE OF THE LINKING FUNCTION

If the means and standard deviations used to construct the linking
function in Equation (1) were known without error, the transformed
value
However, the means and standard deviations used to construct Equation
(1) are based on sample data and hence are subject to various
sources of variability. This implies that the linking function
also is subject to variability and that the variance of
Each of these components will be considered in turn. Prior to
that, however, a general equation needs to be developed for the
variance of Equation (1) expressed the linked value Since
where the partial derivatives are evaluated x,
where
Since
one has
Estimates of Let X =
and
some algebra produces
Since
where
Equation (4) and the equivalent Equation (5) form the basis of
the variance estimate of As observed, the variance of
Table 2. Values of Var(x) and x used for comparing variances of the linked estimate
5.1 Component of

Because both NAEP and TIMSS are samples, the estimates of the
statistics

Traditional analysis procedures often assume that the observed data come from a simple random sample. That is, it is assumed that the observed values from different respondents are independent of each other and that these values are identically distributed. Such assumptions do not hold for data from complex sampling designs such as those used by NAEP and TIMSS. In fact, the complex sample designs of NAEP and TIMSS lead to variance estimates that are larger than the simple random sampling values.

Both assessments use the jackknife procedure (see, e.g., Johnson and Rust 1992) to estimate the variance due to sampling. The aim of the jackknife is to simulate the repeated drawing of samples of individuals according to the specified sample design. Once the various replicate samples are available, it is straightforward to compute the statistic of interest, t, on each sample and from these, obtain a variance estimate.

Pairs of first-stage sampling units (FSSUs) are defined to model the sample design as one in which two first-stage units are drawn within each of a number of strata. The sampling variability of any statistic t is estimated as the sum of the components of variability that may be attributed to each of the FSSU pairs. The variance attributed to a particular pair of FSSUs is measured by recomputing the statistic of interest, t, on an altered sample. The ith altered sample is created by randomly designating the two members of the ith FSSU pair as the first and second respectively, eliminating the data from the first FSSU, and replacing the lost information with that from the second FSSU of the pair. The statistic of interest is then recomputed, producing the pseudoreplicate estimate t_i. The component of sampling variability attributable to the ith pair of FSSUs is (t_i - t)^2. The estimated sample variance of the statistic t is the sum of these components across the M FSSU pairs²:

$$\mathrm{Var}(t) = \sum_{i=1}^{M} (t_i - t)^2 \qquad (6)$$
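The paired jackknife described above can be sketched in a few lines. This is a minimal illustration, not the NAEP or TIMSS production procedure: the function and variable names are hypothetical, the statistic is taken to be a weighted mean, and "replacing the lost information" is implemented by doubling the weights of the retained FSSU, which is an assumption about how the replacement is carried out.

```python
import random


def weighted_mean(values, weights):
    """Statistic of interest t; a weighted mean is an illustrative choice."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def paired_jackknife_variance(values, weights, pair_ids, unit_ids,
                              statistic=weighted_mean, seed=None):
    """Return (t, variance estimate) using one pseudoreplicate per FSSU pair.

    pair_ids[k] -- the FSSU pair that observation k belongs to
    unit_ids[k] -- 0 or 1, the member of that pair it belongs to
    """
    rng = random.Random(seed)
    t_full = statistic(values, weights)
    variance = 0.0
    for pair in sorted(set(pair_ids)):
        dropped = rng.randint(0, 1)  # randomly designate the "first" FSSU
        replicate_weights = []
        for w, p, u in zip(weights, pair_ids, unit_ids):
            if p != pair:
                replicate_weights.append(w)        # other pairs are untouched
            elif u == dropped:
                replicate_weights.append(0.0)      # eliminate the first FSSU
            else:
                # Assumption: the second FSSU stands in for both members.
                replicate_weights.append(2.0 * w)
        t_i = statistic(values, replicate_weights)
        variance += (t_i - t_full) ** 2            # component for this pair
    return t_full, variance


# Made-up data: two FSSU pairs, one observation per FSSU, equal weights.
values = [1.0, 3.0, 5.0, 7.0]
weights = [1.0, 1.0, 1.0, 1.0]
pair_ids = [0, 0, 1, 1]
unit_ids = [0, 1, 0, 1]
t, var_t = paired_jackknife_variance(values, weights, pair_ids, unit_ids, seed=0)
print(t, var_t)
```

Because each component is a squared deviation, the result for this toy data set does not depend on which member of a pair is randomly designated as the first.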
To estimate the sampling variance of the linking function, the
jackknife procedure is applied to estimate the sampling variance
for each of
3 Following accepted practice, the jackknife variance estimates
were based only on the first plausible value (see Mislevy, Johnson,
and Muraki 1992).
Table 3. Components of
Table 4 provides a comparison between the naive estimate of the variance
of
Table 4. Comparison of the naive estimate of
These results show that including the sampling variability as a component of the variance of the linked estimate can substantially increase that variance estimate. The increases shown here are in accord with similar findings presented by Johnson, Mislevy, and Zwick (1990), who report a study in which the traditional estimate of the standard error of a linked estimate of the mean underestimated, by a factor of 1.6, a standard error that properly took the sampling variance into account.
5.2 Component of

Both NAEP and TIMSS use IRT scaling models to summarize their
data (see, e.g., Mislevy, Johnson, and Muraki 1992). IRT was developed
in the context of measuring individual examinees' abilities. In
that setting, each individual is administered enough items to
permit a reasonably precise estimation of his or her ability,
The essential idea of plausible value methodology is to represent what the true proficiency of an individual might have been, had it been observed, with a small number of random draws from an empirically derived distribution of proficiency values that is conditional on the observed values of the assessment items and on background variables for each sampled student. These background variables are called conditioning variables. The random draws from the distribution can be considered to be representative values from the distribution of potential proficiencies for all students in the population with similar characteristics and identical patterns of item responses. The several draws from the distribution are different from each other in a way that quantifies the degree of precision in the underlying distribution of possible proficiencies that could have generated the observed performances on the items.

Both NAEP and TIMSS provide five sets of plausible values. Following Rubin (1987), the plausible values are regarded as five completed data sets, where the mth data set consists of all information about each student along with the mth plausible value for that student. Calculating a statistic, t, based on the mth plausible value across all students provides an estimate, t(m), of t. A better estimate of t is tM, the mean of the t(m).

The variance of tM consists of two components. The first component is the variance due to sampling subjects. There are five potential estimates of this variance, one for each plausible value, the mth estimated as the jackknife variance of t(m) according to Equation (6). While the best estimate of the sampling variance of tM is the average of the five jackknife estimates, due to the heavy computational requirement of computing five jackknife variances, the typical practice used by NAEP and TIMSS is to simply use the jackknife variance for the first plausible value. That practice will be followed in this report.
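As a rough sketch of how the five plausible-value estimates might be combined, the code below averages the t(m) to obtain tM and adds a between-plausible-value component to the sampling variance. The between-value term uses the standard multiple-imputation combining rule from Rubin (1987), (1 + 1/M) times the sample variance of the t(m); that this is exactly the form used in this report is an assumption. The function name and all numbers are hypothetical.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, jackknife_var_first_pv):
    """Combine per-plausible-value estimates t(m) into tM and a total variance.

    estimates              -- list of t(m), one per plausible value
    jackknife_var_first_pv -- jackknife sampling variance of t(1), used as
                              the sampling component, as the text describes
    """
    m = len(estimates)
    t_bar = mean(estimates)                  # tM, the mean of the t(m)
    between = variance(estimates)            # sample variance, divisor m - 1
    # Assumption: Rubin's (1987) rule for the component due to imputation.
    imputation_var = (1.0 + 1.0 / m) * between
    total_var = jackknife_var_first_pv + imputation_var
    return t_bar, imputation_var, total_var


# Made-up values for five plausible-value estimates of a mean.
pv_estimates = [270.0, 271.0, 272.0, 273.0, 274.0]
t_bar, imputation_var, total_var = combine_plausible_values(pv_estimates, 1.0)
print(t_bar, imputation_var, total_var)
```

Using only the first plausible value for the sampling component mirrors the computational shortcut described above; averaging five jackknife variances would replace the `jackknife_var_first_pv` argument with their mean.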
The second component of the variance of tM is that which is due to not observing
Table 5 gives the components of
Table 5. Components of
Table 6 provides a comparison between the estimate of the variance of
Table 6. Comparison of the estimate of
Although measurement error noticeably increases the naive variance estimate, the bulk of the overall variance is determined by the sampling error component.
5.3 Component of

As discussed earlier, statistical moderation can produce markedly different links if carried out with different samples of students. To be useful, the link between NAEP and TIMSS should be the same for various subpopulations. That is, the function linking TIMSS to NAEP should be the same for boys as it is for girls, for members of various ethnic categories, and for students in public and private schools. To the extent that the link is consistent across the subpopulations, there is increased confidence in the goodness of the link. Tables 7A and 7B provide estimates of
Table 7A. Parameters and linked estimates derived within subpopulation: grade 8 mathematics
Table 7B. Parameters and linked estimates derived within subpopulation: grade 8 science
On examining Tables 7A and 7B, some variability exists in the
parameter estimates across subgroups, particularly for the intercepts,
In essence, variability of the linking function across subpopulations is an indication of model misspecification. That is, the linking function needs to include terms related to specific subpopulations. This was the approach adopted by Williams et al. (1995) in their linking of NAEP to the North Carolina End of Grade (NC-EOG) mathematics test. In their study, they noted different relationships between the NC-EOG and NAEP by gender and race. These differences were accounted for through the use of a prediction equation that included intercepts and slopes for those groups. A similar approach was adopted by Bloxom et al. (1995) in a linkage of scaled scores on the Armed Services Vocational Aptitude Battery (ASVAB) with NAEP.

However, both the NC-EOG and the ASVAB situations involved the construction of a linking function that would then be applied to individuals who are plausible members of the same population. That is, the NC-EOG to NAEP link was derived on a sample of North Carolina students for application in North Carolina; the ASVAB to NAEP link was based on a sample of the population to which the ASVAB is normally administered. This is less clearly the case for the linking of NAEP to TIMSS, where the linking is performed on the combined U.S. population, but the results are to be applied to separate states.

Instead, it is reasonable to view the instability of the linking function across subgroups as a potential component of variance of the linking function. Suppose one has N subpopulations, which collectively constitute a partitioning of the population. For specificity, the 12 subpopulations formed by crossing gender by race/ethnicity (black, Hispanic, white+Asian+other) by school type (public, private) will be used. These specific subpopulations were selected because they are key subgroups, and because the linking function could potentially differ across them. For subpopulation s, suppose the linking function is
where
Notice that
where E denotes expectation and S stands for subpopulation. By standard probability theory, the
following representation for the unconditional variance of
where E_S and Var_S denote the expectation and variance taken across subpopulations. The first term of Equation (9) is
where, for example,
is the weighted average of the subpopulation values of Approximating Equation (11) by
Thus, the variance of
where A_s and B_s are the population values of the intercept and slope for subpopulation
s and
Note that even if
with estimate

The design effect measures the impact of complex sample data collection designs, such as those used by NAEP and TIMSS, on the variance of a statistic. Specifically, the design effect is the ratio of the actual variance of the statistic, taking the data collection design into account, to the equivalent variance estimate obtained by ignoring the complex nature of the data caused by the sample design and by measurement error. Typically, the design effect is larger than 1. Additionally, it is possible that the design effects for subpopulations are smaller than those for the total population, implying that the ratio, d/D, could be smaller than 1. Experience based on NAEP, TIMSS, and other complex data sets suggests that the ratio could be as small as 0.5, implying that the multiplier for the expected value of the estimate of variance due to model misspecification could be as small as 5. Table 8 gives the values of
Table 8. Comparison of the component of variance due to model
misspecification estimated by
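The decomposition that this subsection relies on is the law of total variance: the unconditional variance equals the expectation across subpopulations of the conditional variance plus the variance across subpopulations of the conditional expectation. The sketch below illustrates that identity numerically for a partition into subpopulations. It is not the report's estimator; the function name, the subpopulation weights, and all values are hypothetical.

```python
def total_variance_decomposition(weights, cond_means, cond_variances):
    """Split a variance into within- and between-subpopulation parts.

    weights        -- relative sizes of the subpopulations (need not sum to 1)
    cond_means     -- expected value of the quantity within each subpopulation
    cond_variances -- variance of the quantity within each subpopulation
    """
    total_weight = sum(weights)
    shares = [w / total_weight for w in weights]
    # E_S[Var(. | S)]: weighted average of the conditional variances.
    within = sum(p * v for p, v in zip(shares, cond_variances))
    # Var_S[E(. | S)]: weighted variance of the conditional means.
    overall_mean = sum(p * m for p, m in zip(shares, cond_means))
    between = sum(p * (m - overall_mean) ** 2
                  for p, m in zip(shares, cond_means))
    return within, between, within + between


# Made-up example with two equally sized subpopulations.
within, between, total = total_variance_decomposition(
    weights=[1.0, 1.0], cond_means=[0.0, 2.0], cond_variances=[1.0, 3.0])
print(within, between, total)
```

In this framing, instability of the link across subgroups shows up in the between term, which is zero only when every subpopulation yields the same expected linked value.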
5.4 Component of

One disadvantage of using the actual TIMSS and NAEP data to construct a link is that TIMSS and NAEP were administered in different years. Any procedure that attempts to link 1996 NAEP scores to 1995 TIMSS scores, based only on the 1995 TIMSS and the 1996 NAEP samples, will suffer from an unavoidable confounding of secular change (the within-instrument change in achievement over time) with effects due to differences between the instruments.

Estimation of the temporal effect of linking 1996 data to 1995 data is problematic, since there is no direct measure of the change in either NAEP or TIMSS measures of achievement between the 2 years. It is possible, by using related data (the NAEP long-term trend data from 1994 and 1996), to estimate the potential change in achievement as measured by NAEP between 1995 and 1996. As in every other case, it is impossible to estimate what the change in achievement would be in the TIMSS countries in 1996. Adjustment for temporal trend would potentially adjust the estimated NAEP mean, $\hat{\mu}_N$, of the linking function by a prediction of the difference between the NAEP mean in 1996 and what the mean would have been in 1995. This difference is estimated by
where