As with any study, there are limitations to TIMSS and TIMSS Advanced data that researchers should take into consideration. Estimates produced using data from TIMSS and TIMSS Advanced are subject to two types of error: nonsampling errors and sampling errors. Nonsampling errors can arise from mistakes made in collecting and processing data. Sampling errors occur because the data were collected from a sample rather than from a complete census of the population.
Nonsampling error is a term used to describe variations in the estimates that may be caused by population coverage limitations, nonresponse bias, and measurement error, as well as data collection, processing, and reporting procedures. The sources of nonsampling errors are typically problems like unit and item nonresponse, differences in respondents' interpretations of the meaning of the survey questions, response differences related to the particular time the survey was conducted, and mistakes in data preparation.
Missing data for background questions, administrative data, and assessment items were identified by separate missing data codes: omitted, uninterpretable, not administered, and not applicable. The assessment items also include a missing code for not reached. An item was considered omitted if the respondent was expected to answer it but gave no response (e.g., no box was checked for the item that asked “Are you a girl or a boy?”). Items with invalid responses (e.g., multiple responses to a question calling for a single response) were coded as uninterpretable. The not administered code identified items that were not administered to the student, teacher, or principal (e.g., items excluded from a student's test booklet because the booklet design rotates assessment blocks across booklets). An item was coded as not applicable when it was not logical for the respondent to answer it (e.g., when the opportunity to respond depended on the answer to a filter question). Finally, items were coded as not reached when they formed a string of consecutive items without responses continuing through to the end of the assessment or questionnaire.
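The rule that separates omitted items from not-reached items can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the code labels, the representation of each response as a tuple of checked options, and the function name `code_missing` are all hypothetical rather than the coding used in the TIMSS data files, and the not applicable code (which depends on filter questions) is left out.

```python
from typing import List, Optional, Sequence

# Hypothetical labels; the TIMSS data files use their own missing codes.
OMITTED = "omitted"
UNINTERPRETABLE = "uninterpretable"
NOT_ADMINISTERED = "not administered"
NOT_REACHED = "not reached"


def code_missing(responses: Sequence[Optional[tuple]],
                 administered: Sequence[bool]) -> List[str]:
    """Assign a missing code (or the single selected answer) to each item.

    responses: one entry per item; None or () means no box was checked,
               otherwise a tuple of the options the respondent checked.
    administered: False for items not in this respondent's booklet.
    """
    # Locate the last administered item with any response at all
    # (an uninterpretable multiple response still counts as a response).
    last_answered = -1
    for i, (resp, admin) in enumerate(zip(responses, administered)):
        if admin and resp:
            last_answered = i

    codes = []
    for i, (resp, admin) in enumerate(zip(responses, administered)):
        if not admin:
            codes.append(NOT_ADMINISTERED)
        elif not resp:
            # Blanks after the last answered item form the trailing
            # run of not-reached items; earlier blanks are omitted.
            codes.append(NOT_REACHED if i > last_answered else OMITTED)
        elif len(resp) > 1:
            # Several options checked on a single-response item.
            codes.append(UNINTERPRETABLE)
        else:
            codes.append(resp[0])
    return codes
```

For example, `code_missing([("B",), None, ("A", "C"), ("D",), None, None], [True] * 6)` returns `['B', 'omitted', 'uninterpretable', 'D', 'not reached', 'not reached']`: the blank second item is followed by later responses, so it is omitted, while the two trailing blanks are not reached.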
The three key reporting variables in the TIMSS and TIMSS Advanced data for the United States are sex, race/ethnicity, and the percentage of students in the school eligible for free or reduced-price lunch (FRPL). All three have low rates of missing responses. The response rates for these variables exceed the NCES standard of 85 percent, so they can be reported without notation. Furthermore, missing FRPL responses for public schools were imputed by substituting values taken from the Common Core of Data (CCD) for the schools in question. FRPL data are available only for public schools.
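The two checks described above, a response rate compared against the 85 percent standard and a substitution of CCD values for missing public-school FRPL data, can be sketched as follows. This is a minimal illustration assuming each variable is a plain Python list with `None` marking a missing response, and a hypothetical dictionary standing in for the CCD lookup; the rate is unweighted, and the field names (`id`, `public`, `frpl`) are illustrative only.

```python
NCES_RESPONSE_STANDARD = 0.85  # 85 percent, as stated in the text


def response_rate(values):
    """Unweighted share of cases with a non-missing value (None = missing)."""
    return sum(v is not None for v in values) / len(values)


def meets_standard(values):
    """Treat a rate at or above 85 percent as meeting the standard (assumption)."""
    return response_rate(values) >= NCES_RESPONSE_STANDARD


def impute_frpl(schools, ccd_frpl):
    """Fill missing FRPL percentages for public schools from a CCD lookup.

    schools: list of dicts with hypothetical keys 'id', 'public', 'frpl'.
    ccd_frpl: hypothetical mapping from school id to its CCD FRPL percentage.
    Private schools are left unchanged because FRPL applies only to public schools.
    """
    filled = []
    for school in schools:
        school = dict(school)  # copy so the caller's records are not modified
        if school["public"] and school["frpl"] is None:
            # Stays None if the CCD has no value for this school either.
            school["frpl"] = ccd_frpl.get(school["id"])
        filled.append(school)
    return filled
```

With three hypothetical schools, `[{"id": "A", "public": True, "frpl": 40.0}, {"id": "B", "public": True, "frpl": None}, {"id": "C", "public": False, "frpl": None}]`, and a CCD lookup of `{"B": 62.5}`, school B receives 62.5 while private school C stays missing.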
Sampling errors arise when a sample of the population, rather than the whole population, is used to estimate a statistic. Different samples from the same population would likely produce somewhat different estimates of that statistic, so any statistic estimated from a sample carries a degree of uncertainty. This uncertainty is referred to as sampling variance and is usually expressed as the standard error of the estimate. Standard errors in TIMSS and TIMSS Advanced were calculated using jackknife repeated replication (JRR). A standard error serves as a measure of the precision expected from a particular sample. Standard errors for all of the reported estimates are included here.
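To illustrate the idea behind JRR, the sketch below computes a paired-jackknife (JK2) standard error for a weighted mean. It assumes schools are paired into sampling zones and that each student record carries a zone number and a 0/1 indicator for its school's half of the pair. For each zone, the weights of one half are doubled and those of the other half set to zero, and the squared deviations of these replicate estimates from the full-sample estimate are summed. This is the textbook form of the estimator, not necessarily TIMSS's exact replicate scheme, and it omits the imputation variance that TIMSS adds for achievement scores estimated from plausible values. The function names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jrr_standard_error(values, weights, zones, reps):
    """Paired jackknife (JK2) estimate and standard error of a weighted mean.

    zones: sampling zone of each student (each zone holds a pair of schools).
    reps:  0 or 1, indicating which school of the pair the student attends.
    Returns (full-sample estimate, standard error).
    """
    full = weighted_mean(values, weights)
    variance = 0.0
    for zone in sorted(set(zones)):
        # Replicate weights: inside this zone, double the rep-1 school's
        # weights and zero the rep-0 school's; other zones are unchanged.
        rep_weights = [
            w * (2 * r if z == zone else 1)
            for w, z, r in zip(weights, zones, reps)
        ]
        replicate = weighted_mean(values, rep_weights)
        variance += (replicate - full) ** 2
    return full, math.sqrt(variance)
```

With six equally weighted students in two zones (scores 500, 510, 490, 480, 520, 500; zones 1, 1, 1, 1, 2, 2; halves 0, 0, 1, 1, 0, 1), the full-sample mean is 500 and the replicate means are about 493.3 and 496.7, giving a standard error of the square root of 500/9, or about 7.45.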
Confidence intervals provide a way to make inferences about population statistics in a manner that reflects the sampling error associated with each statistic. Assuming a normal distribution, a 95 percent confidence interval (the estimate plus or minus 1.96 standard errors) constructed in this way would contain the population value in 95 out of 100 replications of the measurement on different samples drawn from the same population.
For example, if the average TIMSS mathematics score for U.S. grade 8 students was 500 with a standard error of 2.6, it can be stated with 95 percent confidence that the actual average for U.S. grade 8 students was between 495 and 505 (1.96 × 2.6 = 5.1; confidence interval = 500 ± 5.1).
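The arithmetic in this example can be reproduced with Python's standard `statistics.NormalDist`, as a minimal sketch of the normal-approximation interval; the function name `confidence_interval` is illustrative. Computing z from the normal distribution gives about 1.95996 rather than the rounded 1.96, which does not change the interval at the reported precision.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Normal-approximation interval: estimate plus or minus z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    margin = z * standard_error
    return estimate - margin, estimate + margin
```

Here `confidence_interval(500, 2.6)` returns roughly `(494.9, 505.1)`, which rounds to the 495 to 505 range given above.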