-A-
An accommodation is a change in how a test is presented, in how a test is administered,
or in how the test taker is allowed to respond. This term generally refers to
changes that do not substantially alter what the test measures. The proper use
of accommodations does not substantially change academic level or performance
criteria. Appropriate accommodations are made to provide equal opportunity to
demonstrate knowledge.
An African American or Black person has origins in any of the black racial groups
of Africa. Terms such as "Haitian" or "Negro" can be used
in addition to "Black or African American."
An American Indian or Alaska Native person has origins in any of the original
peoples of North and South America (including Central America) and maintains
tribal affiliation or community attachment.
An Asian person has origins in any of the original peoples of the Far East,
Southeast Asia, or the Indian subcontinent, including, for example, Cambodia,
China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand,
and Vietnam.
An assessment is any systematic procedure for obtaining information from tests
and other sources that can be used to draw inferences about characteristics
of people, objects, or programs.
An award incentive plan links all or some of the contract deliverables to performance
incentive payments beyond the fixed fee of the contract. There are minimum performance-based
requirements that must be specified in order for a contract to be considered
as an Award Incentive performance-based contract.
-B-
The base weight is the inverse of the probability of selection.
A bridge study continues an existing methodology concurrent with a new methodology
for the purpose of defining the relationship between the new and old estimates.
A Black or African American person has origins in any of the black racial groups
of Africa. Terms such as "Haitian" or "Negro" can be used
in addition to "Black or African American."
-C-
The capture/recapture technique uses two independent frames to estimate the
number of units missed on both frames. The first step is to match the frames to
provide counts of units on one frame but not the other, as well as a count
of units on both frames. With this information and several basic assumptions,
it is possible to estimate the number of units missed on both frames. In practice,
the two frames may not be completely independent, in which case a number of
additional assumptions will be necessary to proceed with this type of estimation.
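As a rough illustration of the estimation step, the sketch below uses the Lincoln-Petersen form of the estimator, which assumes the two frames are independent. The function name and the counts are hypothetical and are not taken from any survey.

```python
def estimate_missed_units(only_a, only_b, both):
    """Estimate the number of units missing from both frames.

    only_a: units on frame A but not on frame B
    only_b: units on frame B but not on frame A
    both:   units matched on both frames
    Assumes independent frames (Lincoln-Petersen estimator).
    """
    if both <= 0:
        raise ValueError("at least one unit must appear on both frames")
    n_a = only_a + both               # total units on frame A
    n_b = only_b + both               # total units on frame B
    total_hat = n_a * n_b / both      # estimated population size
    observed = only_a + only_b + both # units seen on at least one frame
    return total_hat - observed

# Hypothetical match counts.
missed = estimate_missed_units(only_a=150, only_b=100, both=750)
print(missed)
```

Under the independence assumption the result equals only_a * only_b / both, so small overlap between frames implies a large estimated number of missed units.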
Classical test theory postulates that a test score can be decomposed into two
parts, a true score and an error component; that the error component is random
with a mean of zero and is uncorrelated with true scores; and that observed
scores are linearly related to true scores and error components.
Clustered samples are those in which a naturally occurring group is first selected,
such as a school or a residential block, and then units are sampled within the
selected groups.
Coarsening disclosure limitation techniques preserve the individual respondent's
data by reducing the level of detail used to report some variables. Examples
of this technique include: recoding continuous variables into intervals; recoding
categorical data into broader intervals; and top or bottom coding the ends of
continuous distributions.
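A minimal sketch of two of these coarsening steps, top coding and recoding a continuous variable into intervals, might look like the following. The function names, the cap of 150,000, and the 10-year interval width are illustrative assumptions only.

```python
def top_code(value, cap):
    """Replace any value above the cap with the cap itself."""
    return min(value, cap)

def recode_to_interval(value, width):
    """Report an integer value only as the interval that contains it."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical data and thresholds.
incomes = [32000, 58000, 410000]
ages = [23, 37, 41]

top_coded_incomes = [top_code(x, 150000) for x in incomes]
age_intervals = [recode_to_interval(a, 10) for a in ages]

print(top_coded_incomes)
print(age_intervals)
```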
Confidentiality involves the protection of individually identifiable data from
unauthorized disclosures.
Confidentiality edits are defined as edits that are applied to microdata for
the purpose of protecting data that will be released in tabular form. Confidentiality
edits are implemented using perturbation techniques. These techniques are used
to alter the responses in the microdata file before tabulations are produced.
Thus, all tables are protected in a consistent way. Because the perturbation
techniques that are used are designed to preserve the level of detail in the
microdata file, confidentiality edits maximize the information that can be provided
in tables, without requiring cell suppression or controlled rounding.
A consistent data series maintains comparability over time by keeping an item
fixed, or by incorporating appropriate adjustment methods in the event an item
is changed.
To be recognized as a Consolidated Metropolitan Statistical Area (CMSA) an area
must meet the requirements for recognition as an MSA, have a total population
of one million or more, and have: (1) separate component areas that can be identified
within the entire area by meeting specified statistical criteria, and (2) local
opinion that indicates support for the component areas.
Coverage refers to the extent to which all elements on a frame list are members
of the population, and to which every element in a population appears on the
frame list once and only once.
Coverage error refers to the discrepancy
between statistics calculated on the frame population and the same statistics
calculated on the target population. Undercoverage
errors occur when target population units are missed during frame construction,
and overcoverage errors occur
when units are duplicated or enumerated in error.
A crosswalk study delineates how categories from one classification system are
related to categories in a second classification system.
A cross-sectional sample survey is based on a representative sample of respondents
drawn from a population at one point in time.
Cross-sectional imputations are based on data from a single time period.
Cross-wave imputations are imputations based on data from multiple time periods.
For example, a cross-sectional imputation for a time 2 salary could simply be
a donor's time 2 salary. Alternatively, a cross-wave imputation could be the
change in a donor's salary from time 1 to time 2 multiplied by the time 1 nonrespondent's
salary.
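The two salary examples can be sketched as follows. The sketch assumes that "change" means the donor's proportional change (time 2 salary divided by time 1 salary); the function names and dollar amounts are hypothetical.

```python
def cross_sectional_impute(donor_t2):
    """Use the donor's time 2 salary directly."""
    return donor_t2

def cross_wave_impute(donor_t1, donor_t2, recipient_t1):
    """Apply the donor's proportional change to the recipient's time 1 salary.

    Assumption: "change" is read as the ratio donor_t2 / donor_t1.
    """
    return recipient_t1 * donor_t2 / donor_t1

# Hypothetical donor and nonrespondent salaries.
cs_value = cross_sectional_impute(donor_t2=42000.0)
cw_value = cross_wave_impute(donor_t1=40000.0, donor_t2=42000.0,
                             recipient_t1=50000.0)
print(cs_value, cw_value)
```

The cross-wave value preserves the nonrespondent's own earlier salary level, whereas the cross-sectional value ignores it.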
A cut score is a specified point on a score scale such that scores at or above
that point are interpreted or acted upon differently from scores below that
point.
-D-
A Data Analysis System (DAS) is an analysis software system that generates tabular
estimates and correlation coefficients in a framework that allows external users
to analyze individually identifiable data without allowing the user direct access
to individual data records. Users are denied access to individual data records
because the data are not in a directly readable format. Additional safeguards
come through the use of population subsampling and differential weighting from
the sample design, as well as confidentiality edits. The degree of editing required
is a direct function of the capabilities of the DAS. As an example, a DAS that
provides weighted totals (i.e., a direct measure of population size) within
cells would require more confidentiality editing than one that does not provide
weighted cell totals, because there is a greater risk of disclosure in groups
with small population size.
Data swapping is a perturbation disclosure limitation technique that results
in a confidentiality edit. A simplistic example of data swapping would be to
assume a data file has two potential individual identifying variables, for example,
sex and age. If a sample case needs disclosure protection, it is paired with
another sampled case so that each element of the pair has the same age, but
different sexes. The data on these two records are then swapped. After the swapping,
anyone thinking they have identified either one of the paired cases gets the
data of the other case, so they have not made an accurate match and the data
have been protected.
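One reading of this simplistic example is sketched below: each record keeps its own age and sex, and the remaining data are exchanged between the pair. The function name and the records are hypothetical, and "salary" merely stands in for the rest of the data on each record.

```python
def swap_pair(case_a, case_b, identifiers=("age", "sex")):
    """Swap the non-identifying data between two paired cases.

    The pair must have the same age and different sexes. Each returned
    record keeps its own identifiers but carries the other record's
    remaining data.
    """
    if case_a["age"] != case_b["age"] or case_a["sex"] == case_b["sex"]:
        raise ValueError("pair must have the same age and different sexes")
    swapped_a = {k: (case_a[k] if k in identifiers else case_b[k])
                 for k in case_a}
    swapped_b = {k: (case_b[k] if k in identifiers else case_a[k])
                 for k in case_b}
    return swapped_a, swapped_b

# Hypothetical records.
at_risk = {"age": 47, "sex": "F", "salary": 181000}
partner = {"age": 47, "sex": "M", "salary": 52000}
protected_a, protected_b = swap_pair(at_risk, partner)
print(protected_a)
print(protected_b)
```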
DEFT is the square root of a design effect.
A derived score is a raw score converted by numerical transformation into a
new score providing a more meaningful and/or different measure (e.g., conversion
of raw scores to percentile ranks, standard scores, or grade equivalence).
The design effect (DEFF) is the ratio of the true variance of a statistic (taking
the complex sample design into account) to the variance of the statistic
for a simple random sample with the same number of cases. Design effects differ
for different subgroups and different statistics; no single design effect is
universally applicable to any given survey or analysis.
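The relationship between DEFF and DEFT can be illustrated with a short calculation. The variance values below are hypothetical, and in practice both would themselves be estimates.

```python
import math

def design_effect(var_complex, var_srs):
    """Ratio of the complex-design variance to the SRS variance."""
    return var_complex / var_srs

# Hypothetical variances of the same statistic under the two designs.
deff = design_effect(var_complex=9.0, var_srs=4.0)
deft = math.sqrt(deff)  # DEFT is the square root of DEFF
print(deff, deft)
```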
Differential Item Functioning (DIF) exists when examinees of equal ability differ
on an item solely because of their membership in a particular group.
Disability is a physical or mental impairment that substantially limits one
or more of the major life activities (42 U.S.C. 12102).
Disclosure risk analysis is used to determine which records require masking
to produce a public-use data file from a restricted-use data file.
Domain refers to a defined universe of knowledge, skills, abilities, attitudes,
interests, or other human characteristics.
Dual-frame estimation uses a dual-frame design to combine two frames in the
same survey to offer coverage rates that may exceed those of any single frame.
Sometimes the best available list is known to have poor coverage and there are
no known supplemental frames to provide sufficient coverage. For example, an
area frame could be used as the second frame.
-E-
Editing is a procedure that uses available information and some assumptions
to derive substitute values for inconsistent values in a data file.
Effect size refers to the standardized magnitude of the effect or the departure
from the null hypothesis. For example, the effect size may be the amount of
change over time, or the difference between two population means, divided by
the appropriate population standard deviation. Multiple measures of effect size
can be used (e.g., standardized differences between means, correlations, and
proportions).
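As one example of such a measure, the sketch below computes a standardized difference between two means. The function name and the numbers are hypothetical, and the standard deviation is simply assumed to be known.

```python
def standardized_mean_difference(mean_1, mean_2, sd):
    """Difference between two means divided by a standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_1 - mean_2) / sd

# Hypothetical population means and standard deviation.
effect_size = standardized_mean_difference(mean_1=255.0, mean_2=250.0, sd=25.0)
print(effect_size)
```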
The effective sample size, as used in the design phase, is the sample size under
a simple random sample design that is equivalent to the actual sample under
the complex sample design. In the case of complex sample designs, the actual
sample size is determined by multiplying the effective sample size by the anticipated
design effect.
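The multiplication described above can be sketched as follows. The function name and inputs are hypothetical, and rounding up to a whole case is an added assumption.

```python
import math

def actual_sample_size(effective_n, anticipated_deff):
    """Actual sample size needed under a complex design.

    effective_n:      sample size required under simple random sampling
    anticipated_deff: design effect expected for the complex design
    Rounds up to the next whole case (an assumption of this sketch).
    """
    return math.ceil(effective_n * anticipated_deff)

n_actual = actual_sample_size(effective_n=400, anticipated_deff=1.5)
print(n_actual)
```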
Equating of two tests is established when examinees of every ability level and
from every population group can be indifferent about which of two tests they
take. Not only should they have the same expected mean score on each test, but
they should also have the same errors of measurement.
Estimation is the process of using sample data to provide a single best value
for a parameter (such as a mean, proportion, correlation, or effect size), or
to provide a range of values in the form of a confidence interval.
-F-
Fairness of a test is attained when construct-irrelevant personal characteristics
such as race, ethnicity, sex, or disability have no appreciable effect on test
results or their interpretation.
In a field test, all or some of the survey procedures are tested on a small scale
that mirrors the planned full-scale implementation.
A frame is a mapping of the universe elements (i.e., sampling units) onto a
finite list (e.g., the population of schools on the day of the survey).
The frame population is the set of elements that can be enumerated prior to
the selection of a survey sample.
A freshened sample includes new cases added to a longitudinal sample plus the
retained cases from the longitudinal sample used to produce cross-sectional
estimates of the population at the time of a subsequent wave of a longitudinal
data collection.
-H-
The half-open interval technique is used to increase coverage. In this technique,
new in-scope units between a unit A on the previous frame up to, but not including,
unit B (the next unit on the previous frame) are associated with unit A. These
new units have the same selection probability as unit A's. This process is repeated
for every unit on the frame. The new units associated with the actual sample
cases are now included in the sample with their respective selection probabilities.
For example, in the case of freshening the sample, this technique may be applied
to a new list that includes cases that were covered in a previous frame, as
well as new in-scope units not included in the previous frame.
A Hispanic or Latino person is of Cuban, Mexican, Puerto Rican, South
or Central American, or other Spanish culture or origin, regardless of race.
The term "Spanish origin" can be used in addition to "Hispanic
or Latino."
Hypothesis testing draws a conclusion about the tenability of a stated value
for a parameter. For example, sample data may be used to test whether an estimated
value of a parameter (such as the difference between two population means) is
sufficiently different from zero that the null hypothesis, designated H0 (no
difference in the population means), can be rejected in favor of the alternative
hypothesis, H1 (a difference between the two population means).
-I-
Imputation is a procedure that uses available information and some assumptions
to derive substitute values for missing values in a data file.
An Individualized Education Plan (IEP) refers
to a written statement for each individual with a disability that is
developed, reviewed, and revised in accordance with Title 20 U.S.C.
Section 1414(d).
Individually identifiable data refers specifically to data from any list, record,
response form, completed survey, or aggregation about an individual(s) from
which information about particular individuals or their schools/education institutions may be revealed by either direct
or indirect means.
Instrument refers to an evaluative device that includes tests, scales, and inventories
to measure a domain using standardized procedures.
Item nonresponse occurs when a respondent fails to respond to one or more relevant
item(s) on a survey.
Item Response Theory (IRT) postulates that the probability of correct responses
to a set of test questions is a function of true proficiency and of one or more
parameters specific to each test question.
-K-
Key variables include survey-specific items for which aggregate estimates are
commonly published by NCES. They include, but are not restricted to, variables
most commonly used in table row stubs. Key variables also include important
analytic composites and other policy-relevant variables that are essential elements
of the data collection. They are first defined in the initial planning stage
of a survey, but may be added to as the survey and resulting analyses develop.
For example, the National Assessment of Educational Progress (NAEP) consistently
uses gender, race-ethnicity, urbanicity, region, and school type (public/private)
as key reporting variables.
-L-
A Latino or Hispanic person is of Cuban, Mexican, Puerto Rican, South
or Central American, or other Spanish culture or origin, regardless of race.
The term "Spanish origin" can be used in addition to "Hispanic
or Latino."
Linkage results from placing two or more tests on the same scale, so that scores
can be used interchangeably.
A longitudinal sample survey follows the experiences and outcomes over time
of a representative sample of respondents (i.e., a cohort) who are defined based
on a shared experience (e.g., shared birth year or grade in school).
-M-
Metadata contain information about the microdata.
Metropolitan Statistical Areas (MSAs) are those areas that: (1) include a city
of at least 50,000 population, or (2) include a Census Bureau-defined urbanized
area (of at least 50,000 population) with a total metropolitan population of
at least 100,000 (75,000 in New England). In addition to the county(ies) containing
the main city or urbanized area, an MSA may include additional counties that
have strong economic and social ties to the central county(ies) and meet specified
requirements of metropolitan character. The ties are determined chiefly by census
data on commuting to work. A metropolitan statistical area may contain more
than one city with a population of 50,000 and may cross state lines.
The minimum substantively significant effect (MSSE) is the smallest effect,
that is, the smallest departure from the null hypothesis, considered to be important
for the analysis of key variables. The minimum substantively significant effect
is determined during the design phase. For example, the planning document should
provide the minimum change in key variables or perhaps, the minimum correlation,
r, between two variables that the survey should be able to detect for a specified
population domain, or subdomain of analytic interest. The MSSE should be based
on a broad knowledge of the field, related theories, and supporting literature.
Multiplicity estimation is a technique used to adjust selection probabilities
when the unit of interest has multiple chances of being selected. For example,
in a random digit dialing household survey, households with multiple phone numbers
have a probability of being selected more than once. In this case by identifying
the number of distinct telephone numbers in a household, the sampling weights
can be adjusted to generate an unbiased household weight.
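For the telephone example, one common form of the adjustment divides the base weight by the number of distinct telephone numbers through which the household could have been selected. The sketch below assumes that form; the function name and values are hypothetical.

```python
def multiplicity_adjusted_weight(base_weight, n_phone_numbers):
    """Divide the base weight by the household's number of chances of selection."""
    if n_phone_numbers < 1:
        raise ValueError("a sampled household has at least one phone number")
    return base_weight / n_phone_numbers

# Hypothetical household reached through one of its two phone numbers.
household_weight = multiplicity_adjusted_weight(base_weight=1000.0,
                                                n_phone_numbers=2)
print(household_weight)
```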
-N-
A Native Hawaiian or Other Pacific Islander person has origins in any of the
original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
New England County Metropolitan Areas (NECMAs) are county-based alternatives
to the city- and town-based metropolitan areas that are used in the rest of
the country. The NECMA for an MSA or CMSA includes: (1) the county containing
the city named first in that MSA/CMSA title (this county may include the cities
named first for other MSAs/CMSAs), and (2) each additional county having at
least half its population in the MSA/CMSA(s) whose first-named cities
are in the county identified in step 1. NECMAs are not defined for individual
PMSAs.
Noncoverage involves eligible units of the target population that are missing
from the frame population; this includes the problems of incomplete frames and
missing units.
Nonresponse bias occurs when the observed value deviates from the population
parameter due to differences between respondents and nonrespondents. Nonresponse
bias is likely to occur as a result of not obtaining 100 percent response from
the selected cases.
Nonsampling error includes measurement errors due to nonresponse, coverage,
interviewers, respondents, instruments, processing, and mode.
-O-
An Other Pacific Islander or Native Hawaiian person has origins in any of the
original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
Overall unit nonresponse reflects a combination of unit nonresponse across two
or more levels of data collection, where participation at the second stage of
data collection is conditional upon participation in the first stage of data
collection.
Overcoverage errors occur when units are duplicated or enumerated in error.
-P-
Perturbation disclosure limitation techniques directly alter the individual
respondent's data for some variables, but preserve the level of detail in all
variables included in the microdata file. Blanking and imputing for randomly
selected records; blurring (e.g., combining multiple records through some averaging
process into a single record); adding random noise; and data swapping or switching
(e.g., switching the sex variable from a predetermined pair of individuals)
are all examples of perturbation techniques.
In a pilot test, a laboratory or a very small-scale test of a questionnaire or
procedure is conducted.
A planning document includes a justification for a study, a description of the
survey design and methodology, an analysis plan, a survey evaluation plan, and
a cost estimate.
The potential magnitude of nonresponse bias can be estimated by taking the product
of the nonresponse rate and the difference in values of a characteristic between
respondents and nonrespondents.
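The product described above can be computed directly. The function name and inputs are hypothetical; in practice the nonrespondent value is usually unknown and must be approximated from frame data or a follow-up study.

```python
def potential_nonresponse_bias(nonresponse_rate, respondent_value,
                               nonrespondent_value):
    """Nonresponse rate times the respondent-nonrespondent difference."""
    if not 0.0 <= nonresponse_rate <= 1.0:
        raise ValueError("nonresponse rate must be between 0 and 1")
    return nonresponse_rate * (respondent_value - nonrespondent_value)

# Hypothetical: 25 percent nonresponse, means of 52 and 48.
bias = potential_nonresponse_bias(0.25, 52.0, 48.0)
print(bias)
```

The estimate shrinks toward zero either when the nonresponse rate is low or when respondents and nonrespondents are similar on the characteristic.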
The power (1 - β) of a test is defined as the probability of rejecting the null
hypothesis when a specific alternative hypothesis is assumed. For example, with
β = 0.20 for a particular alternative hypothesis, the power is 0.80, which means
that 80 percent of the time the test statistic will fall in the rejection region
if the parameter has the value specified by the alternative hypothesis.
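To make the definition concrete, the sketch below computes power for a one-sided z test with a known standard error. The choice of test, the function name, and the inputs are assumptions made for illustration; `NormalDist` is from the Python standard library (3.8 or later).

```python
from statistics import NormalDist

def power_one_sided_z(effect, std_error, alpha=0.05):
    """Power of an upper-tailed z test against a specific alternative.

    effect:    value of the parameter under the alternative hypothesis
               (the null value is taken to be zero)
    std_error: standard error of the estimate, assumed known
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)  # boundary of the rejection region
    # Probability that the test statistic exceeds the critical value
    # when the alternative is true.
    return 1 - z.cdf(critical - effect / std_error)

# Hypothetical alternative: true effect of 2.5 standard errors.
power = power_one_sided_z(effect=2.5, std_error=1.0, alpha=0.05)
print(round(power, 3))
```

When the effect is zero the function returns alpha, which is the rejection probability under the null hypothesis.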
Precision of survey results refers to how closely the results from a sample
can reproduce the results that would be obtained from a complete count (i.e.,
census) conducted using the same techniques. The difference between a sample
result and the result from a complete census taken under the same conditions
is known as the precision of the sample result.
A survey pretest involves experimenting
with different components of the questionnaire or survey design or operationalization
prior to full-scale implementation. This may involve pilot
testing, that is, a laboratory or a very small-scale test of a questionnaire
or procedure, or a field test in
which all or some of the survey procedures are tested on a small scale that
mirrors the planned full-scale implementation.
A point estimate involves using the value of a particular sample statistic to
estimate the value for a parameter of interest.
Primary Metropolitan Statistical Areas (PMSAs) are the component areas
of a CMSA. If no PMSAs are recognized, the entire area is designated an MSA.
The probability of selection is the probability that an element will be drawn
in a sample. In a simple random selection, this probability is the number drawn
in the sample divided by the number of elements on the sampling frame.
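For a simple random selection, the probability of selection and the corresponding base weight (its inverse) can be sketched as follows. The function names and counts are hypothetical.

```python
def srs_selection_probability(n_sampled, n_on_frame):
    """Number drawn in the sample divided by the number on the frame."""
    if not 0 < n_sampled <= n_on_frame:
        raise ValueError("sample size must be between 1 and the frame size")
    return n_sampled / n_on_frame

def base_weight(selection_probability):
    """Inverse of the probability of selection."""
    return 1.0 / selection_probability

# Hypothetical: 200 units drawn from a frame of 5,000.
prob = srs_selection_probability(200, 5000)
weight = base_weight(prob)
print(prob, weight)
```

Each sampled unit in this example stands for about 25 units on the frame.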
A public-use data file includes a subset of data that have been coded, aggregated,
or otherwise altered to mask individually identifiable information, and thus,
is available to all external users. Unique identifiers, geographic detail, and
other variables that cannot be suitably altered are not included in public-use
data files.
Public-use edits are based on an assumption that external users have access
to both individual respondent records and secondary data sources that include
data which could be used to identify respondents. For this reason, the editing
process is relatively extensive. When determining an appropriate masking process,
the public-use edit takes into account and guards against matches on common
variables from all known files that could be matched to the public-use file.
-R-
Raking is a method of adjusting sample estimates to known marginal totals from
an independent source. For a two-dimensional case, the procedure uses the sample
weights to proportionally adjust the weights to one set of marginals. Next,
these adjusted weights are proportionally adjusted to the second set of marginals.
This two-step adjustment process is repeated a number of times until the adjusted
sample weights converge simultaneously to both sets of marginals.
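The two-dimensional procedure can be sketched as below. For brevity the sketch rakes a table of cell-level weight totals rather than individual case weights; the function name, tolerance, and all numbers are hypothetical, and the two sets of marginals are assumed to sum to the same total.

```python
def rake(cell_weights, row_targets, col_targets, tol=1e-10, max_iter=100):
    """Iteratively adjust a table of weights to two sets of marginal totals."""
    w = [row[:] for row in cell_weights]  # work on a copy
    for _ in range(max_iter):
        # Step 1: proportionally adjust each row to its marginal total.
        for i, target in enumerate(row_targets):
            total = sum(w[i])
            w[i] = [x * target / total for x in w[i]]
        # Step 2: proportionally adjust each column to its marginal total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in w)
            for row in w:
                row[j] *= target / total
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(w[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return w

# Hypothetical weighted cell totals and independent control totals.
start = [[20.0, 30.0], [25.0, 25.0]]
raked = rake(start, row_targets=[60.0, 40.0], col_targets=[45.0, 55.0])
for row in raked:
    print([round(x, 3) for x in row])
```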
A random-digit dial sample survey randomly selects respondents based on a sample
of phone numbers and information obtained using a screener questionnaire.
The reference year is the year about which the data were collected.
The rejection region is defined by the alternative hypothesis H1 and the α level.
If the test statistic is in this region, the null hypothesis is rejected.
Reliability is the degree to which test scores for a group of test takers are
consistent over repeated applications of a measurement procedure and hence are
inferred to be dependable and repeatable for an individual test taker.
Replication methods are approximate variance methods that estimate the variance
based on the variability of estimates formed from subsamples of the full sample.
The subsamples are generated to properly reflect the variability due to the
sample design.
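As a simplified illustration of the idea, the sketch below applies a delete-one jackknife to a sample mean. It assumes a simple random sample; replication for a complex design would instead form the subsamples by dropping sampled clusters within strata. The function name and data are hypothetical.

```python
def jackknife_variance_of_mean(values):
    """Delete-one jackknife estimate of the variance of the sample mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(values)
    full_estimate = total / n
    # Each replicate estimate leaves out one observation.
    replicates = [(total - v) / (n - 1) for v in values]
    return (n - 1) / n * sum((r - full_estimate) ** 2 for r in replicates)

# Hypothetical sample.
jk_var = jackknife_variance_of_mean([2.0, 4.0, 6.0, 8.0])
print(jk_var)
```

For the mean, this estimate coincides with the usual sample variance divided by the sample size, which is a convenient check on the sketch.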
Required response items include the minimum set of items required for a case
to be considered a respondent.
Response rates calculated using base weights measure the proportion of the sample
frame that is represented by the responding units in each study.
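A base-weighted response rate can be sketched as the sum of base weights for responding units divided by the sum of base weights for all sampled units. The sketch assumes every sampled unit is eligible; the function name and values are hypothetical.

```python
def weighted_response_rate(base_weights, responded):
    """Share of the base-weighted sample accounted for by respondents."""
    total = sum(base_weights)
    responding = sum(w for w, r in zip(base_weights, responded) if r)
    return responding / total

# Hypothetical base weights and response indicators for four sampled units.
rate = weighted_response_rate([10.0, 10.0, 20.0, 40.0],
                              [True, True, False, True])
print(rate)
```

Here three of four units respond (an unweighted rate of 0.75), and the weighted rate happens to be the same because the nonrespondent carries one quarter of the total weight.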
A restricted-use data file includes individually identifiable information that
is confidential and protected by law. Restricted-use data files are not required
to include variables that have undergone coarsening disclosure risk edits.
-S-
Sampling error is the error associated with nonobservation, that is, the error
that occurs because not all members of the frame population are measured. It
is the error associated with the variation in samples drawn from the same frame
population. The variance equals the square of the sampling error.
Scaling refers to the process of assigning a scale score based on the pattern
of responses.
Scoring/rating is the process of evaluating the quality of the examinee's responses
to individual cognitive questions.
Section 504 of the Rehabilitation Act of 1973, as amended (Title 29 U.S.C. 794
Section 504), prohibits discrimination on the basis of handicap in federally
assisted programs and activities.
Simple comparison is a test (such as a t test or a z test) of the difference
between two means or proportions.
Simple Random Sampling (SRS) uses equal probability sampling with no strata
or clusters. Most statistical analysis software assumes SRS and independently
distributed errors.
Stage of data collection includes any stage or step in the sample identification
and data collection process in which data are collected from the identified
sample unit. This includes information obtained that is required to proceed
to the next stage of sample selection or data collection (e.g., school district
permission for schools to participate or schools providing lists of teachers
for sample selection of teachers).
Statistical disclosure limitation techniques are used to prepare microdata files
for release; they include perturbation techniques and coarsening techniques.
A statistical inference is a decision about one or more unknown or unobserved
population parameter(s) based on estimation and/or hypothesis testing.
Strata are created by partitioning the frame and are generally defined to include
relatively homogeneous units within strata.
Substitutions are done using matched pairs, in which the alternate member of
the pair does not have an independent probability of selection.
A supplemental area frame can be created. This is often done by first generating
a frame of geographic units in which all geographic units are represented, providing
full geographic coverage. Next, a probability sample of the geographic units
is selected. An intensive search procedure is carried out in each selected area.
This generates a supplemental area frame for each selected area. Assuming no
error in the search process, the supplemental area frame has complete coverage
and the cases can be weighted to represent a national estimate. The data from
both the main list frame and the supplemental area frame are then combined so
that the weighted sample estimates provide complete coverage.
An individual survey is driven by one data collection form, such as the Private
School Survey or the Academic Library Survey.
A survey system is a set of individual surveys that are interrelated components
of a data collection, such as the Schools and Staffing Survey or the Integrated
Postsecondary Education Data System.
The survey year is the year in which the data were collected.
-T-
The tail of the sampling distribution of the test statistic contains the rejection
region for the hypothesis tested, H0.
The target population is the finite set of observable or measurable elements
(i.e., sampling units) that will be studied.
Taylor-series linearization is an approximate variance method in which an estimate
is linearized as a first step. The variance of the linearized estimate is then
computed using either an exact or approximate variance formula appropriate for
the sample design.
Total nonresponse reflects a combination of the overall unit nonresponse and
item nonresponse for a specific item.
Type I error is made when the tested hypothesis, H0, is falsely rejected when
in fact it is assumed true. The probability of making a Type I error is denoted
by alpha (α). For example, with an alpha level of 0.05, the analyst will conclude
that a difference is present in 5 percent of tests where the null hypothesis
is true.
Type II error is made when the null hypothesis, H0, is not rejected when in
fact a specific alternative hypothesis, H1, is assumed true. The probability
of making a Type II error is denoted by beta (β). For example, with a beta level
of 0.20, the analyst will conclude that no difference is present in 20 percent
of all cases in which the specific hypothesized alternative, H1, is true.
-U-
Undercoverage errors occur when target population units are missed during frame
construction.
Un-duplication involves the process of deleting units that are
erroneously in the frame more than once to correct for overcoverage.
Unit nonresponse occurs when a respondent fails to respond to all required response
items (i.e., fill out or return a data collection instrument).
A universe survey involves the collection of data covering all known units in
a population (i.e., a census).
-V-
Validity is the extent to which a test or set of operations measures what it
is supposed to measure. Validity refers to the appropriateness of inferences
from test scores or other forms of assessment.
Variance measures sampling error, the error associated with nonobservation, that
is, the error that occurs because not all members of the frame population are
measured. It reflects the variation in samples drawn from the same frame population.
The variance equals the square of the sampling error.
-W-
A wave is a round of data collection in a longitudinal survey (e.g., the base
year and each successive follow-up are each waves of data collection).
A White person has origins in any of the original peoples of Europe, the Middle
East, or North Africa.