**1990 Donnelly file. **A file that contains median household income by zip code for zip codes in the United States. The data were derived from the 1990 Census and were obtained from Donnelly Marketing Information Services.

**a sample. **In years when the NAEP assessment includes a field test or equating studies, samples are referred to as the A sample, the B sample, the C sample, etc. For these assessments, the A sample is the operational (national main and/or state) assessment.

**accommodation. **A change in how a test is presented, in how it is administered, or in how the test taker is allowed to respond. This term generally refers to changes that do not substantially alter what the test measures. The proper use of accommodations does not substantially change academic level or performance criteria. Appropriate accommodations are made to provide equal opportunity to demonstrate knowledge. The most frequently used accommodations in NAEP are large-print booklets, extended time in regular test sessions, reading questions aloud in regular sessions, small groups, one-on-one sessions, scribes or use of computers to record answers, bilingual booklets (mathematics assessment only), and bilingual dictionaries (not for the reading assessment). In NAEP, accommodations may be provided to certain students with disabilities (SD) and/or English language learners (ELL), as specified in the student's Individualized Education Program (IEP).

**achievement levels. **Performance standards set by the National Assessment Governing Board that provide a context for interpreting student performance on NAEP, based on recommendations from panels of educators and members of the public. The levels, *Basic*, *Proficient*, and *Advanced*, measure what students should know and be able to do at each grade assessed. See each NAEP subject for a detailed description of what students should know and be able to do at each level at grade 4, 8, or 12.

**achievement-level percentages. **The percentage of students within the total population, or in a particular student group, who meet or exceed expectations of what students should know and be able to do. Specifically, it is the weighted percentage of students with NAEP composite scores that are equal to, or exceed, the achievement-level cut scores specified by the National Assessment Governing Board.
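
As a rough illustration of the weighted percentage described above, the following sketch computes the share of students at or above a cut score. The function name, scores, weights, and cut score are hypothetical, and the sketch treats each student as having a single score, which simplifies how results are actually estimated.

```python
def percent_at_or_above(scores, weights, cut_score):
    """Weighted percentage of students whose score equals or exceeds cut_score."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Add up the weights of the students who reach the cut score.
    weight_at_or_above = sum(w for s, w in zip(scores, weights) if s >= cut_score)
    return 100.0 * weight_at_or_above / total_weight


# Hypothetical scores, sampling weights, and cut score (not real NAEP values).
scores = [210, 238, 251, 265, 290]
weights = [1.0, 2.0, 1.0, 3.0, 1.0]
pct = percent_at_or_above(scores, weights, cut_score=250)
print(pct)  # 62.5
```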

**administration form. **A form used by field staff to prepare booklets and to record attendance status and assessment participation status.

**administration schedule (AS). **A form prepared for each session to be held in a cooperating school. The administration schedule serves as the student roster for the session, listing the students who are to be included in the assessment in that session.

**Advanced. **One of the three NAEP achievement levels, denoting superior performance at each grade assessed. See each NAEP subject for a detailed description of what students should know and be able to do at grade 4, 8, or 12 at the *Advanced* level. The cut scores determining each level are available with these descriptions.

**advanced math. **Used for the High School Transcript Study, advanced mathematics includes courses, other than calculus, that are generally taken after algebra II (e.g., AP statistics and precalculus).

**advanced science. **Used for the High School Transcript Study, advanced science comprises science courses that contain advanced content (such as AP Biology, IB Chemistry, and AP Physics) or are considered second-year courses (such as Chemistry 2 and Advanced Biology). Students may take advanced science courses (like second-year chemistry) instead of physics.

**affiliation. **The type of organization with which a private school is associated. Typically, this is a religious affiliation (e.g., Catholic or Lutheran), but it can also be nonsectarian.

**Age-specific enrollment. **The number of students within a school who were born within a specific twelve-month period. Since 1970 these ages have been defined as follows:

Age 9: Students who were 9 years old on December 31 of the calendar year before the assessment was conducted. The assessment takes place in the period from January to March.

Age 13: Students who were 13 years old on December 31 of the calendar year in which the assessment is conducted. The assessment takes place in the period from October to December. Note that Age 13 assessments are conducted in the calendar year prior to those of Ages 9 and 17.

Age 17: Students who were 17 years old on September 30 of the year of the assessment. The assessment takes place in the period from March to May.

**almanac. **A comprehensive collection of tables of NAEP results.

**alpha reliability. **A formula for estimating the internal consistency reliability of a measurement instrument. In NAEP assessments, this formula is used as an indicator of the average consistency between item scores and the block scores for the blocks in which these items appear. However, observed score reliability only provides a preliminary look into the data and does not serve as a decision-making statistic for a group-level assessment such as NAEP. In addition, it is important to note that most assessments are content balanced at the block level, meaning that all subscales are represented in each block, further reducing the expected reliability of any block.

**Alpha sample. **The Alpha sample is the name given to the fourth and eighth grade public school and student samples for the operational studies in a given NAEP year.

**assessment administrator (AA). **A trained proctor who administers the assessments in a particular session. Also called an exercise administrator.

**assessment coordinator (AC). **A trained contractor who administers the assessment and oversees all NAEP activities in a school.

**assessment session. **A group of students reporting for the administration of an assessment. Most schools conduct only one session, but some large schools conduct as many as 10 or more. Also referred to as a "session."

**Authorized Return Service. **A service offered by United Parcel Service, which provides for the easy return of UPS-compatible packages. Preprinted return labels are provided to the shipper by UPS; the shipper includes the label with an outbound shipment or distributes it separately to its customers.

**background questionnaires. **The instruments used to collect information about student demographics and educational experiences.

**Basic. **One of the three NAEP achievement levels, denoting partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade assessed. NAEP also reports the proportion of students whose scores place them below the *Basic* achievement level. See each NAEP subject for a detailed description of what students should know and be able to do at grade 4, 8, or 12 at the *Basic* level. The cut scores determining each level are available with these descriptions.

**Beta sample. **The Beta sample is the name given to fourth and eighth grade public school and student samples for any studies other than the operational studies in a given NAEP year.

**bias. **In statistics, the difference between the expected value of an estimator and the population parameter being estimated. If the average value of the estimator across all possible samples (the estimator's expected value) equals the parameter being estimated, the estimator is said to be unbiased; otherwise, the estimator is biased.
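
The definition can be checked on a toy case by enumerating every possible sample. The sketch below assumes a hypothetical three-value population and ordered samples of size two drawn with replacement, all equally likely. Under those assumptions the sample mean is unbiased, while the variance computed with divisor n is biased downward.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
n = 2                   # sample size


def mean(xs):
    return Fraction(sum(xs), len(xs))


def variance_divisor_n(xs):
    """Variance using divisor len(xs), not len(xs) - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def expected_value(estimator):
    """Average of the estimator over all equally likely ordered samples."""
    samples = list(product(population, repeat=n))
    return sum(estimator(s) for s in samples) / len(samples)


# Bias = expected value of the estimator minus the population parameter.
bias_of_mean = expected_value(mean) - mean(population)
bias_of_variance = expected_value(variance_divisor_n) - variance_divisor_n(population)

print(bias_of_mean)      # 0
print(bias_of_variance)  # -1/3
```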

**BIB (Balanced Incomplete Block) booklet design. **A complex variant of matrix sampling in which items are administered so that each pair of items is dispensed to a nationally representative sample of respondents in a specific pattern.

**biserial correlation coefficient. **The correlation between a dichotomous variable and a continuous variable obtained by hypothesizing the existence of a continuous "latent" variable underlying the dichotomous variable.

**block. **A group of assessment items created by dividing the item pool for an age or grade into subsets. Blocks are used in the implementation of the BIB spiral sample design.

**booklet. **The assessment instrument created by combining blocks of assessment items.

**booklet distribution map, booklet map, bookmap, bundle map. **A plan for assembling booklets into bundles for distribution to students.

**bridge study. **A study in which two randomly equivalent samples of students are selected. One sample is assessed using the existing design and the other using the modified design. The primary purpose of a bridge study is to maintain a constant scale for trend reporting.

**bundle. **A package of booklets delivered to a testing site for administration to students.

**Bureau of Indian Affairs. **The Bureau of Indian Affairs provides education services to approximately 48,000 students of American Indian ethnicity.

**Bureau of Indian Education (BIE). **An office within the Bureau of Indian Affairs with responsibility for providing quality educational opportunities for American Indians. The Bureau operates elementary and secondary schools for American Indians, funded by the federal government.

**calibrate. **To estimate the parameters of a set of items using responses of a sample of examinees.

**Carnegie unit. **The number of credits a student received for a course taken every day, one period per day, for a full school year; a factor used to standardize all credits indicated on transcripts across the study.

**categorized grade. **A categorical variable based on the most frequent grade classification (or 'modal grade') of students of a particular age. Categorized grade takes the value 'upper' for students whose grade level is at or above the modal grade for their age, and 'lower' for those students whose grade level is lower than the modal grade for their age.

**causal relationship. **A relationship between two variables in which changes in the value of one variable cause changes in the value of the other variable.

**Census division. **A grouping of states within a census geographic region, established by the Census Bureau for the presentation of census data. The nine divisions are intended to represent relatively homogeneous areas that are subdivisions of the four census geographic regions, and are as follows:
1. New England - Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut.
2. Mid Atlantic States - New York, New Jersey, Pennsylvania.
3. East North Central - Ohio, Indiana, Illinois, Michigan, Wisconsin.
4. West North Central - Missouri, Iowa, Minnesota, North Dakota, South Dakota, Nebraska, Kansas.
5. South Atlantic - Delaware, Maryland, District of Columbia, Virginia, West Virginia, North Carolina, South Carolina, Georgia, Florida.
6. East South Central - Kentucky, Alabama, Tennessee, Mississippi.
7. West South Central - Louisiana, Arkansas, Texas, Oklahoma.
8. Mountain - New Mexico, Arizona, Colorado, Utah, Nevada, Montana, Idaho.
9. Pacific - California, Oregon, Washington, Alaska, Hawaii.

**certainty. **An entity included in a sample with certainty has a selection probability of one.

**certainty PSU. **A primary sampling unit (PSU) that is automatically included in the sample. Its selection probability is one.

**charter school. **A public charter school is a publicly funded school that, in accordance with an enabling state statute, has been granted a charter exempting it from selected state or local rules and regulations. A charter school may be newly created, or it may previously have been a public or private school; it is typically governed by a group or organization (e.g., a group of educators, a corporation, or a university) under a contract or charter with the state. In return for funding and autonomy, the charter school must meet accountability standards. A school's charter is reviewed (typically every 3 to 5 years) and can be revoked if guidelines on curriculum and management are not followed or the standards are not met.

**classical item analysis. **Analysis is performed on the test as a whole rather than on an item. Although item statistics can be generated, they apply only to that group of students on that collection of items.

**classical test statistics. **Counts, percentages, measures of item difficulty, and measures of item discrimination that are not based on Item Response Theory (IRT).

**classical test theory. **A set of measurement concepts that postulates that a test score can be decomposed into a true score and an error component; that the error component is random, has a mean of zero and is uncorrelated with true scores; and that observed scores are linearly related to true scores and error components.

**Classification of Secondary School Courses (CSSC). **A coding system employed for the purpose of standardizing High School Transcript Study (HSTS) transcripts. The CSSC, a modification of the Classification of Instructional Programs (CIP) used for classifying college courses, contains 2,268 course codes. (For more information, see http://nces.ed.gov/surveys/hst/courses.asp).

**cluster item. **A pseudo-item formed by combining the responses to two or more actual items presented in an assessment. For instance, a cluster item composed of several multiple-choice items might be the number of those items to which the student responded correctly.

**cluster sampling. **The selection of sets (clusters) of units rather than individual units. In cluster sampling, survey population members are divided into unique, nonoverlapping groups prior to sampling. Clusters are often naturally occurring groups such as schools, or geographic units such as city blocks. Once clusters are randomly selected in the sample, all sampling units in each cluster are included in the sample. Clustered sampling usually decreases the precision of the statistics as compared to stratified sampling. On the other hand, clustering usually results in reducing survey costs. Specifically, transportation and training costs are substantially lessened, as all sampling units are surveyed in one location.

**clustering. **The process of forming sampling units as groups of other units.

**codebook. **A formatted printout of NAEP data for a particular sample of respondents.

**coefficient of variation (CV). **The ratio of the standard deviation of an estimate to the value of the estimate.

**Cohen's Kappa. **A statistical measure of inter-rater reliability. It is generally thought to be a more robust measure than simple percent agreement calculation since it takes into account the agreement occurring by chance.
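
To make the chance correction concrete, here is a minimal sketch of the standard two-rater kappa formula, (observed agreement - chance agreement) / (1 - chance agreement). The function name and the two score lists are hypothetical illustration data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of responses")
    n = len(rater_a)
    # Observed proportion of responses on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical scores assigned by two raters to the same ten responses.
rater_a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))  # 0.583
```

In this example the raters agree on 80 percent of responses, but 52 percent agreement would be expected by chance, so kappa is noticeably lower than the simple percent agreement.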

**collapsed urban-centric locale. **City: Territory inside an urbanized area and inside a principal city.
Suburb: Territory outside a principal city and inside an urbanized area.
Town: Territory inside an urban cluster.
Rural: Census-defined rural territory outside an urban cluster.

**common block. **A group of background items included at the beginning of every assessment booklet.

**common calibration linking. **Linking Item Response Theory (IRT) scales by calibrating responses to items on the scales together using items that are common to the scales to provide a connection between the scales.

**common population linking. **Linking scales by matching the distributions of scores on two different scales for a single group or for randomly equivalent groups of examinees.

**complex sample design. **A sample design that incorporates stratification, multistage sampling, and/or varying probabilities of selection. NAEP utilizes all of these. This term is contrasted with the term simple random sample design.

**composite scale. **An overall subject-area scale based on the weighted average of the scales that are used to summarize performance on the primary dimensions of the curricular framework for the subject-area assessment. For example, the mathematics composite scale is a weighted average of five content-area scales: number sense, properties, and operations; measurement; geometry and spatial sense; data analysis, statistics, and probability; and algebra and functions. These five scales correspond to the five content-area dimensions of the NAEP mathematics framework.

**conditional correlation coefficient. **An estimate of the correlation between subscale scores taking into account the population model of NAEP. See also marginal correlation.

**conditional probability. **Probability of an event happening, given the occurrence of another event.

**conditioning. **The process of imputation used in NAEP that allows plausible values to be drawn at random from a conditional distribution of a NAEP respondent, given his or her response to cognitive exercises and to a specific subset of background variables (conditioning variables).

**conditioning variables. **Demographic and other background variables characterizing a respondent. These variables are used to construct plausible values.

**confidence intervals. **A confidence interval is an interval on a scale (e.g., score scale, percent scale) that indicates where the true value (e.g., average, percentage) most likely is located. Because NAEP is a survey with an educational measurement component, there is only an estimate of the true value available. The interval indicates how far the estimate could be from the true value, or, how reliable the estimate is. A smaller interval indicates a more reliable estimate.

**confirmatory factor analysis. **A method that provides an explanation of the relationships among variables in terms of a smaller number of unobserved variables called factors, in which assumptions about the relationships among the variables are tested.

**consistent estimator. **An estimator that when calculated with data from the whole population will equal the value of what is being estimated.

**construct. **An abstract image, idea, or theory, especially a complex one, formed from a number of simpler observable elements. Constructs represent ideas constructed by scientists to help summarize a group of related phenomena or objects.

**constructed-response item. **A non-multiple-choice item that requires some type of written or oral response.

**contrasts. **Variables that define specific groups; most often these variables equal 1 when the group is one to which the student belongs and 0 otherwise.

**core academic courses. **A course type defined for the High School Transcript Study, core courses are English, mathematics, science, and social studies.

**Core Based Statistical Area (CBSA). **The 2000 Census standards provide that each CBSA must contain at least one urban area of 10,000 or more population. Each metropolitan statistical area must have at least one urbanized area of 50,000 or more inhabitants. Each micropolitan statistical area must have at least one urban cluster of at least 10,000 but less than 50,000 population.

**correlation. **A measure of the relation between two or more variables. Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a perfect negative correlation while a value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation.

**course types. **The High School Transcript Study reports credits earned for three types of courses: core academic, other academic, and other courses.

**credits earned. **In the High School Transcript Study, course credits earned are converted to standardized Carnegie units, in which a single Carnegie unit is equal to 120 hours of classroom instruction over the course of a year. One Carnegie credit is often described as what a student earns for completing a one-year course that meets 40 minutes per school day (assuming 180 school days in a school year). However, it should be noted that some courses may meet for one semester for 0.5 credits, while others may meet for 60 minutes each day for a year for 1.5 credits.
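
The arithmetic behind these figures can be sketched as follows. The function and constant names are hypothetical, and the sketch assumes credit is simply proportional to instructional hours.

```python
HOURS_PER_CARNEGIE_UNIT = 120  # one Carnegie unit, as defined above


def carnegie_units(minutes_per_day, school_days=180):
    """Convert a meeting pattern into Carnegie units, assuming credit is proportional to hours."""
    hours = minutes_per_day * school_days / 60
    return hours / HOURS_PER_CARNEGIE_UNIT


print(carnegie_units(40))                  # 1.0 (40 minutes a day for a full year)
print(carnegie_units(60))                  # 1.5 (60 minutes a day for a full year)
print(carnegie_units(40, school_days=90))  # 0.5 (one semester)
```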

**CSV (comma-separated values). **A file format used as a portable representation of a database. Each line is one entry or record; the fields in the record are separated by commas. This format is often used to import data into spreadsheet software.

**Current Population Survey (CPS). **The Current Population Survey (CPS) has been conducted by the Bureau of the Census for more than 50 years. The CPS is the primary source of information on the labor force characteristics of the U.S. population.

**curriculum level. **For the purposes of the High School Transcript Study, three curriculum levels were defined: standard, midlevel, and rigorous. The curriculum levels are based on the number of credits and the types of courses graduates take.

This is a modified version of curriculum levels used by Laura Horn and Lawrence K. Kojaku (U.S. Department of Education. National Center for Education Statistics. High School Academic Curriculum and the Persistence Path Through College, NCES 2001-163. Project Officer: C. Dennis Carroll. Washington, DC: 2001). This modification was made to ensure that HSTS data for earlier years are consistent with 2005.

**cut score. **The minimum score required for performance at each NAEP achievement level. NAEP cut scores are determined through a standard-setting process that convenes a cross-section of educators and interested citizens from across the nation. The group determines what students should know and be able to do relative to a body of content reflected in the framework. The National Assessment Governing Board then adopts a set of cut scores on the scale that defines the lower boundaries of *Basic*, *Proficient*, and *Advanced*.

**degrees of freedom (df) [of a variance estimator]. **The number of independent pieces of information used to generate a variance estimate.

**delta. **A measure of item difficulty, it is a non-linear transformation of the proportion of correct responses arranged to have a mean of 13 and a standard deviation of 4. It should range in value from 1 to 25. Conversely, the facility of an item is measured by the p+ value.
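
The entry does not spell out the transformation. A minimal sketch, assuming the conventional form delta = 13 - 4 × z, where z is the standard normal deviate corresponding to the proportion correct (p+), looks like this. The function name is hypothetical, and the formula is an assumption rather than something stated in this glossary.

```python
from statistics import NormalDist


def delta_from_p_plus(p_plus):
    """Item delta from the proportion correct, assuming delta = 13 - 4 * z."""
    if not 0.0 < p_plus < 1.0:
        raise ValueError("p_plus must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(p_plus)  # standard normal deviate for p+
    return 13.0 - 4.0 * z


# Under this assumed formula, harder items (lower p+) get larger delta values.
for p in (0.2, 0.5, 0.8):
    print(p, round(delta_from_p_plus(p), 2))
```

Under this assumed formula an item answered correctly by half of the students has a delta of 13, and the range of 1 to 25 corresponds to three standard deviations on either side of that midpoint.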

**Delta sample. **Sample of students enrolled in private schools in grades 4, 8, and 12.

**Department of Defense Dependents Schools (DoDDS). **(*see also* DoDEA). One of two distinct educational systems operated by the Department of Defense Education Activity (DoDEA). DoDDS provides comprehensive educational programs on military installations overseas.

**Department of Defense Domestic Dependent Elementary and Secondary Schools (DDESS). **(*see also* DoDEA). One of two distinct educational systems operated by the Department of Defense Education Activity (DoDEA). DDESS provides comprehensive educational programs on military installations located in seven states and Puerto Rico.

**Department of Defense Education Activity (DoDEA). **A civilian agency of the U.S. Department of Defense. It is divided into two separate but parallel systems: the Department of Defense Dependents Schools (DoDDS) overseas, and the Department of Defense Domestic Dependent Elementary and Secondary Schools (DDESS) in the United States.

**derived variables. **Student group data that were obtained through interpretation, classification, or calculation procedures rather than from assessment responses.

**design effects. **The ratio of the variance for the sample design to the variance for a simple random sample of the same size.

**dichotomous item. **In NAEP, a common multiple-choice item or an item that requires a constructed response from the student, with the response being subsequently scored in one of two categories, being either correct or incorrect.

**differential item functioning (DIF). **An item exhibits differential item functioning if the probability of doing well on the item depends on group membership, even after controlling for overall performance.

**difficulty. **In the NAEP Questions Tool, item or question difficulty is a measure of student performance on a question. Multiple-choice or constructed-response questions scored either right or wrong are rated "easy" if answered correctly by 60 percent or more of students, "medium" if answered correctly by 40 to 59 percent, or "hard" if answered correctly by fewer than 40 percent. For a constructed-response question in which students could earn partial or complete credit, the percent correct is computed by adding the percent of students receiving full credit to a fraction of the percent of the students receiving partial credit. For example, some questions are scored correct, partial, or incorrect. If, for example, 16 percent of the students gave a fully correct answer on a question, and an additional 24 percent of the students gave a partial answer, the percent correct for this question would be computed as 16 + 1/2 (24) = 28. The partial results were weighted by 1/2 because there were two levels of credit for the question. Responses to a question with four levels of credit would receive weights of 1/4 (minimal), 1/2 (partial), and 3/4 (satisfactory).
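
The percent-correct rule above can be sketched in a few lines. The function names are hypothetical, and the input list is assumed to run from no credit (index 0) up to full credit (last index), so that the weights come out as 1/2 for two levels of credit and 1/4, 1/2, 3/4 for four.

```python
def percent_correct(percent_by_level):
    """Percent correct with partial credit; index 0 is no credit, last index is full credit."""
    top = len(percent_by_level) - 1
    if top < 1:
        raise ValueError("need at least a no-credit and a full-credit category")
    # Each credit level k contributes its percentage weighted by k / top.
    return sum(pct * level / top for level, pct in enumerate(percent_by_level))


def difficulty_label(pct_correct):
    """Easy / medium / hard rating using the thresholds given above."""
    if pct_correct >= 60:
        return "easy"
    if pct_correct >= 40:
        return "medium"
    return "hard"


# The example above: 60 percent incorrect, 24 percent partial, 16 percent fully correct.
pc = percent_correct([60, 24, 16])
print(pc, difficulty_label(pc))  # 28.0 hard
```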

**diocese. **A diocese is the territorial jurisdiction of a bishop. With respect to Catholic schools, a diocese functions as the central administrative office for schools within a defined jurisdictional unit of the Catholic Church (e.g., the Archdiocese of Washington).

**disadvantaged minority. **Used to define minority school status, historically disadvantaged minorities include Black students and Hispanic students.

**discriminant validity. **A type of construct validity where it is shown that assessment scores have a low correlation with other scores that should not be related to the assessment of interest; in contrast, convergent validity shows that assessment scores have a high correlation with other scores that should be related to the assessment of interest.

**distractor. **An incorrect response choice included in a multiple-choice item.

**district size. **A variable used in sampling schools which classifies schools into two groups: large districts and small districts. Large districts contain at least 20 percent of the jurisdiction's eligible grade enrollment. Small districts contain less than 20 percent of the jurisdiction's eligible grade enrollment.

**Educational Testing Service (ETS). **The item development, instruments, database, and data analysis contractor for NAEP.

**effect size. **A way to compare the scores for two groups or for the same group under two different conditions that takes into account how variable the scores are; the standardized mean difference is the most common example.
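
As an illustration of the standardized mean difference, the sketch below divides the difference in group means by a pooled standard deviation. Pooling the two group variances is an assumption made for this example (other standardizers exist), and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_1, group_2):
    """Difference in means divided by the pooled standard deviation (assumed standardizer)."""
    n1, n2 = len(group_1), len(group_2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_variance)


# Hypothetical scores for two groups.
d = standardized_mean_difference([2, 4, 6], [1, 3, 5])
print(d)  # 0.5
```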

**e-file. **An Internet-based method by which states, districts, and schools may provide data files identifying eligible students to NAEP.

**e-filing. **An Internet-based method by which states, districts, and schools may provide data files identifying eligible students to NAEP.

**English language learners (ELL). **A term used to describe students who are in the process of acquiring English language skills and knowledge. Some schools refer to these students using the term limited-English-proficient (LEP). "Limited English Proficient" is also the terminology used in NAEP technical documentation prior to the 2005 NAEP assessment.

**estimation. **Process by which sample data are used to indicate the value of an unknown quantity in a population.

**ETS. **See Educational Testing Service.

**excluded student questionnaire. **An instrument completed for every student who was selected to participate but ultimately excluded from the assessment.

**excluded students. **Sampled students determined by the school to be unable to participate because they are English language learners (ELL) or have a disability.

**exercise administrator (EA). **Westat field staff member hired and trained to administer the assessment and to assist his or her NAEP supervisor with other NAEP activities in the school. As of 2003, this term is no longer used in Westat administrative materials. Also called an assessment administrator.

**expected value. **The average of the sample estimates given by an estimator across all possible samples. If the estimator is unbiased, then its expected value will equal the population value being estimated.

**explicit stratification. **Explicit stratification consists of building separate sampling frames, according to the set of explicit stratification variables under consideration; used for *categorical* variables. Contrast this with implicit stratification.

**factor analysis. **A procedure that provides an explanation of the relationships among variables in terms of a smaller number of unobserved variables called factors.

**Falcon System. **A computer software package for data entry that allows the user to define the data entry screens and output record layout in one step.

**field director. **Westat home office staff member who coordinates and oversees all aspects of NAEP field procedures.

**field manager. **Westat field staff member hired to coordinate all NAEP field activities with the State Departments of Education and the Westat home office staff.

**field supervisor. **Westat field staff member hired to manage assessment teams, to select the samples of students to be assessed, and to send NAEP materials to the participating schools.

**field test. **Items in NAEP mathematics and reading assessments at grades 4 and 8 go through two phases of pretesting: pilot testing and field testing. A field test is the second phase of pretesting and is given one year prior to the operational NAEP assessment. After the field test, the development for the assessment instruments for the following year is finalized. The instruments are then administered to a nationally representative sample of students, and the analytical steps for estimating the distribution parameters of items by population and reporting groups are conducted. NOTE: Previously, the term "field test" referred to the first phase of item pretesting in all NAEP subject-area assessments. However, beginning with the 2003 assessments, the term applies only to reading and mathematics. The phase of pretesting formerly referred to as a field test, beginning in 2003 and for all future assessments, will be referred to as the "pilot test." All items in NAEP assessments are pilot tested, but only reading and mathematics are field tested.

**finite population correction (fpc). **An adjustment to account for the added precision gained by sampling a large percentage of a population, which has the effect of narrowing the margin of error.
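
A minimal sketch of the effect, assuming a simple random sample drawn without replacement and the textbook correction factor 1 - n/N applied to the variance of a sample mean. The function name and data are hypothetical, and this is not a description of NAEP's own variance estimation.

```python
from math import sqrt
from statistics import variance


def standard_error_of_mean(sample, population_size):
    """Standard error of the mean under simple random sampling without replacement."""
    n = len(sample)
    if not 2 <= n <= population_size:
        raise ValueError("need 2 <= sample size <= population size")
    fpc = 1 - n / population_size  # shrinks toward 0 as the sample nears a census
    return sqrt(fpc * variance(sample) / n)


sample = [1, 2, 3, 4]  # hypothetical sample values
print(standard_error_of_mean(sample, population_size=10**9))  # correction is negligible
print(standard_error_of_mean(sample, population_size=8))      # half the population sampled
print(standard_error_of_mean(sample, population_size=4))      # census, so 0.0
```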

**focal group. **The group of students of interest in an analysis of differential item functioning; often this is a group of students considered to be in the minority.

**focused BIB spiraling. **A variation of BIB spiraling in which items are administered so that each pair of items within a subject area is dispensed to a nationally representative sample of respondents.

**focused booklet design. **A booklet design in which each booklet contains items pertaining to a single subject area.

**foils. **The correct and incorrect response choices included in a multiple-choice item.

**gamma sample. **A nationally representative sample of students enrolled in twelfth grade in public schools.

**gender. **NAEP results are reported separately for males and females, based on students' self-reported gender.

**grade enrollment. **The number of students within an assessed grade.

**grade point average (GPA). **GPA is used in the High School Transcript Study. Points are assigned to each letter grade as follows: A=4 points; B=3 points; C=2 points; D=1 point; F=0 points. The points are weighted by the number of Carnegie credits earned, so that a course with 120 hours of instruction counts twice as much as one with 60 hours. The average of the points earned for all the courses taken is the grade point average. Courses in which a graduate did not receive a grade, such as pass/fail and audited courses, do not factor into the GPA calculation. No additional grade points are assigned for Advanced Placement, International Baccalaureate, and other honors classes. This process does not standardize for differences in grading practices among schools and teachers.
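
The calculation described above can be sketched as a credit-weighted average. The function name and transcript are hypothetical, and any grade that is not a letter from A to F (here "P" for pass) is assumed to be ungraded and is skipped.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def grade_point_average(courses):
    """Credit-weighted GPA from (letter_grade, carnegie_credits) pairs."""
    # Keep only courses with a letter grade; pass/fail and audited courses drop out.
    graded = [(GRADE_POINTS[g], c) for g, c in courses if g in GRADE_POINTS]
    total_credits = sum(c for _, c in graded)
    if total_credits == 0:
        return None  # no graded credits, so no GPA can be computed
    return sum(points * c for points, c in graded) / total_credits


# Hypothetical transcript: the half-credit B counts half as much as a full-credit course.
transcript = [("A", 1.0), ("B", 0.5), ("C", 1.0), ("P", 1.0)]
gpa = grade_point_average(transcript)
print(gpa)  # 3.0
```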

**group effect. **The difference between the mean for a specific group and the mean for the nation.

**hierarchical stratification. **A form of stratification that uses multiple levels based on varying unit characteristics (e.g., Census region followed by school type). The nesting of lower-level strata within the higher-level strata may vary between higher-level strata in hierarchical stratification. For example, one higher-level stratum may contain no lower-level strata, while another may contain numerous lower-level strata.

**high density schools. **Defined by the Office of Indian Education as those schools in which at least 25 percent of the students are American Indian or Alaska Native.

**High School and Beyond. **A longitudinal study following cohorts of 1980 high school students from which the 1982 High School Transcript Study (HSTS) sample was drawn. Samples for subsequent studies were drawn from the corresponding NAEP samples.

**High School Transcript Study (HSTS). **A periodic study developed by NCES to provide the Department of Education and other educational policymakers with information about current course offerings and students' course-taking patterns in the nation's secondary schools.

**hits. **The number of times a school was selected for NAEP. Large schools may be selected, or hit, more than once. For these schools, additional students are selected within the school. For example, if a school hit once had up to 30 students sampled, a school hit twice may have had up to 60 students sampled.

**HOT - Hands-On Tasks. **Hands-on tasks in which students manipulate selected physical objects and try to solve a scientific problem involving the objects. Hands-on tasks probe students' abilities to combine their science knowledge with the investigative skills reflective of the nature of science and inquiry.

**hotdeck. **An imputation method that replaces missing values in a data set with values from other records in the data set.

**hybrid MB1. **A subsample of the short-form mathematics sample to which the first and third blocks of the main market-basket form MB1 and the second block of the second main market-basket form MB2 were administered.

**hybrid MB2. **A subsample of the short-form mathematics sample to which the first and third blocks of the second main market-basket form MB2 and the second block of the main market-basket form MB1 were administered.

**ICT - Interactive Computer Tests. **Interactive computer tests that are delivered to students by computer. These tasks may include information search and analysis, empirical investigation, and simulation. The computer delivery affords measurement of science knowledge, processes, and skills not able to be assessed in other modes, such as performance of investigations that include observations of phenomena that would otherwise take a long time, modeling of phenomena on a very large scale or invisible to the naked eye, and research of extensive resource documents.

**image-based scoring system. **A scoring system in which constructed-response items are scanned, and all student responses to those items are captured and stored on a server until scorers have been trained. All responses on the server for each particular constructed-response item are sent to the scorers trained on that item. Once scoring of that item is completed, scorers are trained on the next constructed-response item and score it.

**implicit stratification. **A method of achieving the benefits of stratification often used in conjunction with systematic sampling. The sampling frame is sorted with respect to one or more stratification variables but is not explicitly separated into distinct strata. Contrast this with explicit stratification. See also sort variable.

**imputation. **Prediction of a missing value based on some procedure, using a mathematical model in combination with available information. See also plausible values.

**imputed race/ethnicity. **The race or ethnicity of an assessed student as derived from his or her responses to particular common background items. A major NAEP reporting group for assessments prior to 2001.

**imputed values. **Values generated through imputation. In NAEP, the imputed values are called plausible values. See also imputation.

**independent samples. **Two samples are independent if the realization of the first sample does not affect how the second sample is drawn.

**Individualized Education Program (IEP). **A written statement for each individual with a disability that is developed, reviewed, and revised in accordance with Title 42 U.S.C. Section 1414(d).

**in-field sampling. **A student sample selection method performed by NAEP field staff when a school chooses not to electronically submit (E-File) information about eligible students. Field staff obtain hard copy listings of eligible students at the school during the preassessment visit, and select students for NAEP using a random number generator.

**intraclass correlation. **A ratio of the variance of interest over the sum of the variance of interest plus error. In NAEP, it is used to describe the accuracy of raters scoring student constructed item responses.

**item. **The basic scorable part of an assessment; a test question.

**item bias. **An item is biased if the probability of the student doing well on the item depends *not only on*

- what the examinee knows and can do and
- the characteristics of the item as reflected in the item parameters,

*but also on*

- a characteristic of the item that is unrelated to the construct being measured.

Item bias is shown through a statistical technique that updates the entries in a multiway frequency table, so that the resulting quantities in the cells conform to a new set of marginal figures, while preserving the higher-order associations present in the original table. See also bias.

**item map. **Item maps illustrate the knowledge and skills demonstrated by students performing at different scale scores on a given assessment. The item map provides concrete examples of what students at various achievement levels likely know and can do in a subject.

**item response function (IRF). **An equation or the plot of an equation that indicates the probability of an item response for different levels of the overall performance.

**Item Response Theory (IRT). **Test analysis procedures that assume a mathematical model for the probability that an examinee will respond correctly to a specific test question, given the examinee's overall performance and characteristics of the questions on the test.

**iterative proportional fitting. **A statistical technique that updates the entries in a multiway frequency table, so that the resulting quantities in the cells conform to a new set of marginal figures, while preserving the higher-order associations present in the original table.
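
As a rough illustration of the technique, the sketch below fits a two-way table to new row and column totals by alternately rescaling rows and columns. It is limited to two dimensions, and the function name, stopping rule, and sample numbers are assumptions made for this example.

```python
def ipf_2d(table, row_targets, col_targets, max_iterations=100, tol=1e-10):
    """Rescale a two-way table until its margins match the target totals."""
    cells = [[float(value) for value in row] for row in table]
    for _ in range(max_iterations):
        # Step 1: scale each row to its target row total.
        for i, row in enumerate(cells):
            row_sum = sum(row)
            if row_sum > 0:
                factor = row_targets[i] / row_sum
                cells[i] = [value * factor for value in row]
        # Step 2: scale each column to its target column total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            if col_sum > 0:
                factor = target / col_sum
                for row in cells:
                    row[j] *= factor
        # Columns now match; stop once the rows match as well.
        gap = max(abs(sum(row) - t) for row, t in zip(cells, row_targets))
        if gap < tol:
            break
    return cells


# Hypothetical 2x2 table and new margins (both sets of targets sum to 100).
example_fit = ipf_2d([[40, 30], [35, 50]], [60, 40], [50, 50])
```

Because each step only multiplies whole rows or whole columns by a constant, the odds ratio of the original 2x2 table is unchanged, which is one concrete sense in which the original associations are preserved.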

**jackknife. **A replication method that estimates standard errors of percentages and other statistics. It is particularly suited to complex sample designs. In the jackknife, sample units are grouped into pairs (replicate groups). Portions of the sample (replicates) are formed by repeatedly omitting one half of the units in one of the replicate groups and calculating the desired statistic (replicate estimate). The number of replicate estimates is equal to the number of replicate groups. The variability among the replicate estimates is used to estimate the overall sampling variability.
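
The sketch below illustrates a paired jackknife for a weighted mean. Two details are assumptions not stated in the entry above: the weight of the retained unit in a pair is doubled when its partner is omitted, and the variance is taken as the sum of squared differences between each replicate estimate and the full-sample estimate. The function names and data are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def paired_jackknife(pairs):
    """Return (full-sample mean, jackknife standard error).

    pairs: list of replicate groups, each ((value_a, weight_a), (value_b, weight_b)).
    """
    values = [value for pair in pairs for value, _ in pair]
    weights = [weight for pair in pairs for _, weight in pair]
    full_estimate = weighted_mean(values, weights)

    sum_of_squares = 0.0
    for k in range(len(pairs)):
        replicate_weights = list(weights)
        replicate_weights[2 * k] = 0.0       # omit the first unit of pair k
        replicate_weights[2 * k + 1] *= 2.0  # assumed: double its partner's weight
        replicate_estimate = weighted_mean(values, replicate_weights)
        sum_of_squares += (replicate_estimate - full_estimate) ** 2

    return full_estimate, sum_of_squares ** 0.5


# Hypothetical data: two replicate groups with equal weights.
example_mean, example_se = paired_jackknife(
    [((1.0, 1.0), (3.0, 1.0)), ((2.0, 1.0), (2.0, 1.0))]
)
```

One replicate estimate is produced per pair, so the number of replicate estimates equals the number of replicate groups, as in the definition.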

**jurisdiction. **Any government-defined geographic area sampled in the NAEP assessment (e.g., a state, the District of Columbia, a United States territory, a Trial Urban District, the Department of Defense Domestic Dependent Elementary and Secondary Schools (DDESS), a subdivision within a state or county).

**jurisdiction-specific estimate. **A state's estimated change in inclusion of students with disabilities who are not English language learners based on the jurisdiction-specific model.

**key teacher. **The teacher or school staff member who is most knowledgeable about a student with a disability (SD) or English language learner (ELL).

**Keyfitz process. **A process used to minimize the overlap between two samples which share the same sampling frame (Sample 1 and Sample 2). Sample 1 is drawn first, and an adjustment is made to the measures of size of sampling units such that the conditional probability of selection is small or zero for selection for Sample 2 given that the unit was sampled in Sample 1. The unconditional probability of selection for each unit in Sample 2 (regardless of Sample 1) is as originally specified.

**large city. **Territory inside an urbanized area and inside a principal city with population of 250,000 or more. NAEP uses large city (formerly referred to as large central city) as a comparison group for the Trial Urban District Assessment (TUDA). In order to make comparisons between the TUDAs and large cities, the NAEP large city jurisdiction also includes those portions of the participating urban districts which fall outside of the city limits. Large city is not synonymous with the term inner city.

**limited English proficient (LEP). **A term used to describe students who are in the process of acquiring English language skills and knowledge. Some schools refer to these students using the term English language learners, or ELL. Beginning with the NAEP 2005 assessment, the terminology changed to "English language learners," or "ELL."

**linking form. **A group of items that are administered in order to put scores from different tests on the same scales. For example, a subsample of the short-form mathematics student sample was tested using a linking form created by combining one block from the main market-basket form with two main NAEP blocks for the purpose of putting the market-basket form on the same scale as the main NAEP mathematics assessment.

**locale. **NAEP results are reported for four mutually exclusive categories of school location: city, suburb, town, and rural. The categories are based on standard definitions established by the Federal Office of Management and Budget using population and geographic information from the U.S. Census Bureau. Schools are assigned to these categories in the NCES Common Core of Data based on their physical address. The classification system was revised for 2007 and 2009; therefore, trend comparisons to previous years are not available. The new locale codes are based on an address's proximity to an urbanized area (a densely settled core with densely settled surrounding areas). This is a change from the original system based on metropolitan statistical areas. To distinguish the two systems, the new system is referred to as "urban-centric locale codes." The urban-centric locale code system classifies territory into four major types: city, suburban, town, and rural. Each type has three subcategories. For city and suburb, these are gradations of size: large, midsize, and small. Towns and rural areas are further distinguished by their distance from an urbanized area. They can be characterized as fringe, distant, or remote. To see the full description of the urban-centric and the metro-centric locale codes, visit the CCD website.

**logistic regression model. **A regression model for binary (dichotomous) outcomes. The data are assumed to follow binomial distributions with probabilities that depend on the independent variables.

**longitudinal. **A sample survey that follows the experiences and outcomes over time of a representative sample of respondents (i.e., a cohort) who are defined based on a shared experience (e.g., shared birth year or grade in school).

**long-term trend. **NAEP assessments that are designed to give information on the changes in the basic achievement of America's youth. They are administered nationally and report student performance at ages 9, 13, and 17 in mathematics and reading. Measuring trends of student achievement or change over time requires the precise replication of past procedures. Therefore, the long-term trend instrument does not evolve based on changes in curricula or in educational practices.

**low density schools. **Defined by the Office of Indian Education as those schools in which fewer than 25 percent of the students are American Indian or Alaska Native.

**machine-readable catalog. **Computer-processing control information, Item Response Theory (IRT) parameters, foil codes, and labels in a computer-readable format.

**marginal correlation coefficient. **An estimate of the Pearson product-moment correlations between subscale scores. See also conditional correlation.

**marginal maximum likelihood methods. **Statistical estimation methods where the estimate of a parameter is selected to be the estimate that gives the greatest probability to the outcome that was actually observed. These are often iterative methods.

**market basket. **A collection of test questions representative of some larger content domain; an easily understood index to summarize performance on the items. NAEP conducted a market-basket special study during the 2000 assessment.

**matching criterion. **The score that is used for determining which students in a study of differential item functioning are similar to one another.

**matrix sampling. **Sampling plan in which different samples of respondents take different samples of items.

**maximum likelihood factor analysis. **Factor analysis in which the parameters of the factor model are estimated using maximum-likelihood methods; maximizing the likelihood function is a common statistical approach to estimating parameters.

**MB1. **The national main market-basket form. A subsample of the short-form mathematics sample where students were tested on new items from the 1999 mathematics field tests. The test questions were grouped into three blocks.

**MB2. **The second national main market-basket form. A subsample of the short-form mathematics sample where students were tested on secure items from the 1996 main NAEP mathematics assessment (also used again in 2000 main mathematics assessment). The test questions were grouped into three blocks.

**MCBS. **Mathematics computer-based study, a special study in Multi-Stage Testing, with the objective of investigating the use of principles of adaptive testing in the NAEP context. A sample of students is given an online assessment that adapts to their ability level. All of the items in the study are existing NAEP items.

**mean square error. **A quantity indicating the degree to which survey estimates differ from the population values. Mean square error is the variance plus the squared bias.
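
The decomposition stated above can be checked numerically. This is a small sketch with hypothetical estimates; it uses the population form of the variance (dividing by the number of estimates) so that the identity holds exactly.

```python
def mse_components(estimates, true_value):
    """Return (variance, bias, mean square error) for repeated estimates."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = mean_estimate - true_value
    # Population variance of the estimates around their own mean.
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / n
    # Mean squared distance of the estimates from the true value.
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return variance, bias, mse


# Hypothetical survey estimates of a population value equal to 10.
example_variance, example_bias, example_mse = mse_components([9, 11, 13], 10)
```

Here the directly computed mean square error equals the variance plus the squared bias.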

**measure of size. **An auxiliary variable, known for each higher level sampling unit, and believed to be highly correlated with the number of population elements contained in the unit. In NAEP the population elements are eligible students. It is used in probability proportional to size (PPS) sampling, and specifically for the selection of higher level units in multistage sampling.

**metropolitan statistical area (MSA). **A geographical MSA is an area with a large population nucleus and its adjacent communities that have a high degree of social and economic integration with that nucleus. MSAs are defined in terms of entire counties, except in six New England states where they are defined in terms of cities and towns. An MSA has a city of at least 50,000 population or an urbanized area of at least 50,000 with a total metropolitan population of at least 100,000 (or 75,000 in New England).

**midlevel curriculum. **One of the curriculum levels defined for the High School Transcript Study—at least four credits of English; three each of social studies, mathematics (which includes geometry and Algebra I or II), and science (which includes two subjects among biology, chemistry and physics); and one credit of foreign language.

**minority school status. **A measure of the level of historically disadvantaged minority student groups being served by schools participating in the High School Transcript Study. Low minority schools have less than 5 percent disadvantaged minority students. Medium minority schools have 5 to 50 percent disadvantaged minority students. High minority schools have over 50 percent disadvantaged minority students.

**multiple matrix sampling. **Sampling plan in which different samples of respondents take different samples of items.

**multiple-choice item. **An item that consists of one or more introductory sentences followed by a list of response options that include the correct answer and several incorrect alternatives.

**multi-stage sample design. **Indicates more than one stage of sampling. The following is an example of three-stage sampling: (1) sample of counties (primary sampling units or PSUs), (2) sample of schools within each sample county, and (3) sample of students within each sample school.

**NAEP. **The National Assessment of Educational Progress (NAEP), also known as "the Nation's Report Card," is the only nationally representative and continuing assessment of what America's students know and can do in various subject areas. Since 1969, assessments have been conducted periodically in mathematics, reading, science, writing, U.S. history, geography, civics, the arts, and other subjects.

**NAEP region. **See the entry for region for more information. For regions used for the National Indian Education Study (NIES), see the entry for region (NIES).

**NAEP scales. **The scales common across age or grade levels and assessment years used to report NAEP results.

**NAEP State Coordinator (NSC). **Staff member of participating state Department of Education who works with field staff to coordinate all NAEP activities in the state. Full-time federally funded coordinators are also responsible for coordinating NAEP activities in their state including promoting understanding of NAEP and coordinating assessment administrations.

**NAEP/TIMSS Linking Study. **ETS will provide.

**NAIS. **See National Association of Independent Schools.

**National Assessment Governing Board. **Independent organization whose members are appointed by the U.S. Secretary of Education. The Governing Board provides overall policy direction to the NAEP program. It is an independent, bipartisan group whose members include governors, state legislators, local and state school officials, educators, business representatives, and members of the general public.

**National Association of Independent Schools (NAIS). **A membership organization that represents nearly 1,200 U.S. independent schools, including day, boarding, and day/boarding schools; elementary and secondary schools; and boys', girls', and coeducational schools.

**National Indian Education Study (NIES). **The National Indian Education Study is a two-part study designed to describe the condition of education for American Indian/Alaska Native students in the United States.

**national linking sample. **Prior to 2002, separate state samples and national samples were drawn for the NAEP state and national assessments. NAEP state scales were linked to the national scales using a common population linking procedure in which the mean and standard deviation of the aggregate state sample was matched to the mean and standard deviation of a national linking sample. This national linking sample (NLS) contained the subset of national data that was representative of the students from jurisdictions participating in the state assessments. Beginning in 2002, a combined state and national sample has been drawn for NAEP state and national assessments, and the establishment of a national linking sample is therefore no longer necessary.

**National School Lunch Program (NSLP). **A federally assisted meal program that provides low-cost or free lunches to eligible students. It is sometimes referred to as the free/reduced-price lunch program. Free lunches are offered to those students whose family incomes are at or below 130 percent of the poverty level; reduced-price lunches are offered to those students whose family incomes are between 130 percent and 185 percent of the poverty level.
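
The income thresholds above can be expressed as a simple rule. This is only an illustrative sketch: the function name is hypothetical, and treating an income of exactly 185 percent of the poverty level as reduced-price eligible is an assumption, since the entry says only "between."

```python
def nslp_category(family_income: int, poverty_level: int) -> str:
    """Classify lunch eligibility from income relative to the poverty level.

    Integer arithmetic is used so the percentage boundaries compare exactly.
    """
    if family_income * 100 <= 130 * poverty_level:
        return "free"
    if family_income * 100 <= 185 * poverty_level:  # assumed inclusive boundary
        return "reduced-price"
    return "not eligible"
```

For example, with a hypothetical poverty level of 10,000, an income of 13,000 falls in the free category and an income of 15,000 falls in the reduced-price category.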

**nation-based estimate. **A state's estimated change in inclusion of students with disabilities who are not English language learners based on the nation-based model.

**NCES Private School Survey (PSS). **A survey of private school information collected by the National Center for Education Statistics (NCES). Enrollment grade span and other data for individual private schools were aggregated into data for use in sampling schools and in preliminary session allocation.

**new enrollees. **New enrollees are students who enrolled in a school after the original list of students was created and the original sample was subsequently drawn.

**new school. **A new school is a school selected from the new school sampling frame, created to update the NAEP school frame to account for newly constructed or newly eligible schools not on the original NAEP school frame.

**newly identified students. **Newly identified students are students who were not on their school's original list from which the original sample was subsequently drawn.

**noncertainty. **Selected with a probability less than one.

**noncertainty PSU. **A primary sampling unit (PSU) that is selected with a probability less than one.

**nonresponse. **The failure to obtain responses or measurements for all sample elements.

**nonresponse adjustment class. **A set of units (e.g., schools or students) that are grouped together for the purpose of calculating nonresponse adjustments. The units are homogeneous with respect to certain unit characteristics, such as school size, location, public/private, student's age, sex, and student disability status.

**nonresponse class level. **A particular value of a unit characteristic that defines a nonresponse adjustment class.

**nonsampling error. **A general term applying to all sources of error, with the exception of sampling error. Includes errors from defects in the sampling frame, response or measurement errors, and mistakes in processing the data.

**nonsectarian. **Having no religious orientation or affiliation.

**not-reached item. **An item to which the student did not respond because the time limit was up for the section of the assessment on which s/he was working. After the first "not reached" item, the student will have no responses to any further questions on that section of the assessment.

**objective. **A desirable education goal accepted by scholars in the field, educators, and concerned laypersons and established through a consensus approach.

**observed race/ethnicity. **Race or ethnicity of an assessed student as perceived by the exercise administrator (EA).

**off track students. **Students who attend year-round schools, but are not in school at the time of the assessment. (In year-round schools, a certain percentage of students are "off" at any given time, on vacation, etc.)

**off-task response. **A response that is unrelated to the question being posed; differs from an incorrect response to the question or an omitted response.

**omitted response. **A missing response prior to the last observed response; this is considered an intentional behavior. This term is contrasted with the term not reached.

**on the fly. **Automatically generated in real time.

**online bundle assignment and distribution system. **A software program that uses the information from the quality-control scanning of the bundle barcodes, the supervisor list, and the distance from the distribution point to assign specific bundles to specific sessions/schools and supervisors for bulk distribution.

**other academic courses. **A course type defined for the High School Transcript Study. These courses are fine arts, foreign languages, and computer-related studies.

**other courses. **A course type defined for the High School Transcript Study. Other courses include vocational education, general skills, personal health and physical education, religion, military science, and special education.

**oversampling. **Deliberately sampling a portion of the population at a higher rate than the remainder of the population.

**p+. **The proportion of examinees who received a correct score on the item.

**parental education. **A NAEP reporting group defined by the highest level of education of the mother and father of an assessed student as derived from the student's response to two background questionnaire items.

**partially balanced incomplete block (pBIB) booklet design. **A complex variant of matrix sampling in which items are administered so that each pair of items is dispensed to a nationally representative sample of respondents in a less stringent pattern than that required for a BIB booklet design.

**Pearson. **The materials and scoring contractor for NAEP.

**Pearson product-moment correlation. **A statistical index that quantifies the degree of relationship between the two scores.

**percent correct. **The percent of a target population that would answer a particular exercise correctly.

**percentage of exact agreement. **A quantitative index of the degree of decision consistency. It is used to describe the accuracy of raters scoring constructed-response items.
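
A minimal sketch of this index for two raters scoring the same responses is shown below; the function name and the sample scores are hypothetical.

```python
def percent_exact_agreement(first_scores, second_scores):
    """Percentage of responses given the identical score by two raters."""
    if len(first_scores) != len(second_scores):
        raise ValueError("both raters must score the same set of responses")
    if not first_scores:
        raise ValueError("at least one scored response is required")
    matches = sum(1 for a, b in zip(first_scores, second_scores) if a == b)
    return 100.0 * matches / len(first_scores)


# Hypothetical scores from two raters on four responses.
example_agreement = percent_exact_agreement([3, 2, 4, 1], [3, 2, 3, 1])
```

In this illustration the raters agree on three of four responses.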

**percentile. **A score location below which a specified percentage of the population falls. For example, in 1998, the tenth percentile of fourth-grade reading scores was 167. This means that in 1998, ten percent of fourth-graders had NAEP reading scores below 167, while 90 percent scored at or above 167.
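
The relationship between a score and its percentile can be sketched as follows. The scores are hypothetical, and the calculation is unweighted for simplicity, which is an assumption; it is not a description of how NAEP computes reported percentiles.

```python
def percent_below(scores, cut_score):
    """Percentage of scores strictly below cut_score (unweighted)."""
    below = sum(1 for score in scores if score < cut_score)
    return 100.0 * below / len(scores)


# Hypothetical scores for ten students; exactly one falls below 167.
example_scores = [160, 167, 170, 180, 190, 200, 210, 220, 230, 240]
example_percent = percent_below(example_scores, 167)
```

With these scores, 167 sits at the tenth percentile: ten percent of the scores fall below it and the rest are at or above it.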

**performance levels. **Reported as the percentages of students attaining specific levels of performance corresponding to five points on the NAEP long-term trend reading and mathematics scales (150, 200, 250, 300, and 350). The specific descriptions for each level reflect the types of questions students performing at that level were more likely to answer correctly than students at lower levels. The five performance levels in each subject are applicable for 9-, 13-, and 17-year-olds; however, the likelihood of attaining higher performance levels is related to a student’s age.

**Phase Review System (PRS). **A quality control method that requires the user to document the Business Plan, the Financial Plan, the Operations Plan, and the Risks of the program. The documents generated are reviewed by management at specific times in the program's lifecycle. It is program team focused.

**pilot test. **A pretest of items to obtain information regarding clarity, difficulty levels, timing, feasibility, and special administrative situations. The pilot test is performed before revising and selecting the items to be used in the assessment, or in the case of math and reading at grades 4 and 8, before selecting items to be used in the field test.

**plausible values. **Proficiency estimates for an individual NAEP respondent, drawn at random from a conditional distribution of potential scale scores for all students in the sample who have similar characteristics and identical patterns of item responses. NAEP usually assigns five plausible values to each respondent. The plausible values are not test scores for individuals in the usual sense; they are offered only as intermediary computations for calculating summary statistics for groups of students.

**point-biserial correlation coefficient. **The correlation between a dichotomous variable and a continuous variable. The point biserial correlation is equal to the biserial correlation multiplied by a factor that depends only on the item difficulty.
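
A point-biserial correlation can be computed as an ordinary Pearson product-moment correlation in which one variable is coded 0/1. The sketch below uses hypothetical item scores and total scores; the function name is illustrative.

```python
def pearson_correlation(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


# Hypothetical data: item scored 0 (incorrect) or 1 (correct), and a total score.
item_correct = [0, 0, 1, 1]
total_score = [1, 2, 3, 4]
example_point_biserial = pearson_correlation(item_correct, total_score)
```

A positive value here indicates that students who answered the item correctly tended to have higher total scores.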

**polyserial correlation coefficient. **The correlation between a categorical variable and a continuous variable obtained by hypothesizing the existence of a continuous "latent" variable underlying the categorized variable.

**polytomous item. **An item for which more than two possible responses, other than missing and off-task, exist.

**population. **In the case of NAEP, the population of interest is the entire collection of American students in public or private schools at grades 4, 8, or 12 (or in the case of the long-term trend assessments, at ages 9, 13, and 17 years). The small samples of students that NAEP selects for the assessment permit inferences about academic performance to be made for all school students at the three grade or age levels.

**population-structure model. **A model that relates the scale scores in NAEP to the groups to which students belong.

**posterior distribution. **A distribution based on the probability of an event, such as a certain student response pattern, given an actual occurrence and an expectation of that occurrence.

**poststratification. **Classification and weighting to correspond to external values of selected sampling units by a set of strata definitions after the sample has been selected.

**power. **The probability of a study yielding a significant result if the research hypothesis is true.

**primary sampling unit (PSU). **The basic geographic sampling unit for NAEP; can be either a single county or a set of contiguous counties.

**prior distribution. **A distribution based on the probability of an event, such as a certain student response pattern, given only an expectation of an occurrence and not on an actual occurrence.

**probability proportional to size (PPS) sampling. **A sampling method in which the probability of selecting a unit is directly proportional to the unit's measure of size. For example, in NAEP, schools are selected with probabilities proportionate to estimated grade enrollment.
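
One common way to carry out PPS selection is systematic sampling along the cumulative measure of size. The sketch below takes that approach as an assumption; it is not presented as NAEP's actual procedure, and the function name and enrollment figures are hypothetical. It also shows how a large unit can be selected more than once (see the entry for hits).

```python
import random


def pps_systematic_hits(sizes, n_selections, rng):
    """Return the number of times each unit is selected (its hits)."""
    total = sum(sizes)
    interval = total / n_selections        # sampling interval on the size scale
    start = rng.random() * interval        # random start within the first interval
    hits = [0] * len(sizes)
    unit = 0
    upper = sizes[0]                       # cumulative size through the current unit
    for k in range(n_selections):
        point = start + k * interval       # k-th selection point
        while point >= upper and unit < len(sizes) - 1:
            unit += 1
            upper += sizes[unit]
        hits[unit] += 1                    # the unit whose size range contains the point
    return hits


# Hypothetical grade enrollments for four schools; four selections are drawn.
enrollments = [100, 100, 600, 200]
example_hits = pps_systematic_hits(enrollments, 4, random.Random(0))
```

Because the third school's enrollment spans more than two sampling intervals in this illustration, it is hit at least twice regardless of the random start.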

**probability sample. **A sample in which every element of the population has a known, nonzero probability of being selected.

**probe. **An assessment for which estimates are reported but which has a lower sample size, either because of the experimental nature of the assessment or because of the low degree of teaching to the topic.

**Process Control System. **A computer software package that provides various services for the internal user. The software tracks the numbers of each booklet type by grade, lists the school and session materials that have been received, and records the status of the Alerts (problems) that were seen in the box receiving/opening process.

**Proficient. **One of the three NAEP achievement levels, representing solid academic performance for each grade assessed. Students reaching this level have demonstrated competency over challenging subject matter, including subject-matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter. See each NAEP subject for a detailed description of what students should know and be able to do at grade 4, 8, or 12 at the *Proficient* level. The cut scores determining each level are available with the descriptions.

**provisional scale. **A score scale that is used during analysis of NAEP data. Usually the mean is zero and the standard deviation is one for the data being analyzed. Provisional scales are transformed to the 0-300 or 0-500 NAEP reporting scales.

**pseudoreplicate. **The value of a statistic based on an altered sample. Used by the jackknife variance estimator.

**quality control monitor (QCM). **Prior to the advent of NAEP administration of all assessment components, the state component was administered by state personnel. Even so, certain assessment day activities were undertaken by NAEP field staff. Quality Control Monitors observed local assessment administrators as they conducted sessions.

**Quality Education Data (QED). **A survey of public school information from Quality Education Data, Inc. Enrollment grade span and other data for individual public schools were aggregated into data for use in sampling primary sampling units (PSUs) and schools, and in preliminary session allocation.

**R2. **The non-accommodated reporting sample. It consists of sampled students who have neither a disability (SD) nor limited English proficiency (LEP), plus SD/LEP students from sessions in which accommodations were not allowed.

**R2 reporting population. **The non-accommodated reporting population. The population represented by the R2 reporting sample.

**R3. **The accommodated reporting sample. It sampled students that have neither a student disability (SD) nor a limited English proficiency (LEP), plus SD/LEP students from sessions in which accommodations were allowed. The R3 sample is more inclusive and excludes a smaller proportion of sampled students.

**R3 reporting population. **The accommodated reporting population. The population represented by the R3 reporting sample.

**race, race/ethnicity. **In order to allow comparisons across years, assessment results presented are based on information for six mutually exclusive racial/ethnic categories: White, Black, Hispanic, Asian/Pacific Islander, American Indian (including Alaska Native), and Other. Students who identified with more than one of the first five categories or had a background other than the ones listed were categorized as Other. In all NAEP assessments, data about student race/ethnicity is collected from two sources: school records and student self-reports. Before 2002, NAEP used students' self-report of their race and ethnicity on a background questionnaire as the source of race/ethnicity data. In 2002, it was decided to change the student race/ethnicity variable highlighted in NAEP reports. Starting in 2002, NAEP reports of students' race and ethnicity are based on the school records, with students' self-report used only if school data are missing. Information based on student self-reported race/ethnicity will continue to be reported in the NAEP Data Explorer for assessments after 2001.

**random variable. **A variable that takes on any value of a specified set with a particular probability.

**rangefinding. **The process of looking at student responses (during the scoring of field tests or the first year of an operational assessment) to find the range of student responses and to use those responses to build training sets for scoring. For NAEP, this is led by the scoring trainer with input from other staff and the subject-area standing committee.

**rater. **A person hired by the NAEP scoring contractor to score constructed-response items in a NAEP subject area (e.g., mathematics, reading, science). Prospective raters are required to have educational background and experience in that subject area. They also must pass NAEP scoring qualification set tests that ensure consistency of rating across responses and across scorers.

**reference group. **The group of students with which the focal group is compared in a study of differential item functioning; often this is the largest group of students.

**region. **One of four geographic areas used in gathering and reporting data, and a NAEP student group. Prior to 2003, the four regions were Northeast, Central, Southeast, and West. Beginning with the 2003 assessment, the National Assessment Governing Board changed the definitions of the four geographic regions used in reporting NAEP results to match those used by the Census: Northeast, South, Midwest, and West. The states composing the pre-2003 regions (defined by the Office of Business Economics, U.S. Department of Commerce) can be found in contemporaneous reports.

**region--NIES. **The National Indian Education Study (NIES) can report results for American Indian/Alaska Native students for five NIES-defined regions of the country: Atlantic, North Central, South Central, Mountain, and Pacific. These regions, which differ from those used in other NAEP reports, are based on U.S. Census divisions and are configured to align with the overall distribution of the AI/AN student population. The regional results are based on samples from students enrolled in all types of schools (public, private, Bureau of Indian Education, and Department of Defense) and reflect the combined samples from all of the states within each region. For regions used for NAEP reporting, see the entry for region.

**relative age. **A categorized variable based on the 'modal' age for a particular grade. The variable takes the value 'younger' for those students born later than the period that defines 'modal', and it takes the value 'older' for those students born earlier than the period that defines 'modal.'

**released item. **Test question that has been made available to the public. After each assessment, NCES releases nearly one-third of the questions. Released questions often serve as models for teachers who wish to develop their own classroom assessments.

**reliability. **Consistency of a set of measurements or of the measuring instrument. Because NAEP findings have an impact on the public's understanding of student academic achievement, precautions are taken to ensure the reliability of these findings. In its current legislation, as in previous legislative mandates, Congress has called for an ongoing evaluation of the assessment as a whole. In response to these legislative mandates, the National Center for Education Statistics (NCES) has established various panels of technical experts to study NAEP, and panels are formed periodically by NCES or external organizations, such as the National Academy of Sciences, to conduct evaluations. The Buros Center for Testing, in collaboration with the University of Massachusetts/Center for Educational Assessment and the University of Georgia, recently conducted an external evaluation of NAEP.

**replicate weights. **The weights used to compute the survey estimates for each portion of the sample (replicate) being kept when replication methods are used for variance estimation. For example, in the jackknife replication method, some sampled units have their weights doubled to account for the sampled units that were dropped.

**replication methods. **A class of methods for estimating sampling error in complex surveys in which the desired statistic is calculated repeatedly (replicate estimates) using different portions of the sample (replicates), and the variability among the replicate estimates is used to estimate the overall sampling variability. Specific replication methods, such as jackknife and balanced repeated replication (BRR), differ in the way in which the replicates are formed and the overall sampling variability is calculated.
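
The replicate-weight idea described above can be illustrated with a minimal sketch of a paired jackknife for a weighted mean. It assumes the sampled units have already been grouped into pairs; in each replicate one unit of a pair is dropped (weight set to zero) and its partner's weight is doubled. The function names, the pairing, and the data are hypothetical and are not taken from NAEP's operational procedures.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def paired_jackknife_variance(values, weights, pairs):
    """Return (full-sample estimate, jackknife variance estimate).

    pairs is a list of (dropped_index, kept_index) tuples; each pair
    defines one replicate (assumed pairing, for illustration only).
    """
    full_estimate = weighted_mean(values, weights)
    variance = 0.0
    for dropped, kept in pairs:
        replicate_weights = list(weights)
        replicate_weights[dropped] = 0.0            # unit dropped from this replicate
        replicate_weights[kept] = 2 * weights[kept]  # partner's weight doubled
        replicate_estimate = weighted_mean(values, replicate_weights)
        variance += (replicate_estimate - full_estimate) ** 2
    return full_estimate, variance


scores = [1.0, 3.0, 2.0, 4.0]
weights = [1.0, 1.0, 1.0, 1.0]
estimate, variance = paired_jackknife_variance(scores, weights, [(0, 1), (2, 3)])
print(estimate, variance, variance ** 0.5)
```

The square root of the returned variance plays the role of the sampling standard error for the estimate.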

**reporting group. **Groups within the national population for which NAEP data are reported; for example, gender, race/ethnicity, grade, age, level of parental education, region, and type of location.

**respondent. **A person who is eligible for NAEP, is in the sample, and responds by completing one or more items in an assessment booklet.

**response options. **In a multiple-choice question, alternatives that can be selected by a respondent.

**response propensity. **Response propensity is a measure of the likelihood that a school or student will participate in NAEP.

**Results of Contact form. **A form used to document discussions with each administrator concerning the district's and school's willingness to participate and any special circumstances.

**retrofitting. **The process of reassigning unused substitutes to sampled schools that did not get substitutes assigned in the first or second pass of substitution selection.

**rigorous curriculum. **One of the curriculum levels defined for the High School Transcript Study—at least four credits of English and mathematics (which includes precalculus or higher); and three each of social studies, science (which includes all three subjects of biology, chemistry and physics), and foreign language.

**row percentage. **In a table presentation (such as in the NAEP Data Explorer), the number of students represented in a particular cell of the table, divided by the number of students in the row of the table, converted to a percentage.

**rule of 5. **In NAEP, this rule states that statistics are suppressed if they are based on fewer than five primary sampling units (PSUs). In national samples before 2002, the PSUs were geographic areas. In state samples and in the national combined samples after 2002, the PSUs are schools. The rule serves two purposes in NAEP: (1) to avoid reporting results for groups with highly unstable standard error estimates, and (2) to protect the privacy of respondents. Flagging (full suppression, as in the rule of 62) based on this rule is only sporadically encountered because other rules are generally more stringent. The first purpose is generally superseded by the coefficient of variation, while the second purpose is generally superseded by the rule of 62.

**rule of 62. **In NAEP, this rule states that statistics for a group are suppressed if they are based on fewer than 62 students. Statistics in this case are means, standard errors, standard deviations, a set of percentiles, and a set of achievement-level percentages. The rule serves to assure a minimum power requirement to detect moderate differences at the nominal significance level (0.05). The minimum power is 0.80 and the moderate effect size is 0.5 standard deviation units. A design effect of 2 is assumed to derive an appropriate complex sample standard deviation. The basic concept is to avoid reporting results for groups about which little of interest could be said due to lack of power.
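
The entry does not spell out the power calculation, so the following is only a sketch under stated assumptions: a two-sided, one-sample z-test, with the simple-random-sample size inflated by the design effect. Under those assumptions the stated inputs (significance level 0.05, power 0.80, effect size 0.5, design effect 2) give a minimum group size close to 62. The function name is hypothetical.

```python
from statistics import NormalDist


def minimum_group_size(effect_size=0.5, alpha=0.05, power=0.80, design_effect=2.0):
    """Approximate sample size needed under the assumed z-test setup."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    srs_size = ((z_alpha + z_power) / effect_size) ** 2
    return design_effect * srs_size                   # inflate for the complex sample


print(round(minimum_group_size(), 1))  # roughly 62.8
```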

**S2. **A sample in which accommodations were not allowed.

**S3. **A sample in which accommodations were allowed.

**sample. **A subset of a population whose characteristics are studied to gain information about the entire population. NAEP assesses a representative sample of students each year, rather than the entire population of students.

**sample type. **A designation given to a sample to indicate which administration rules were used: either those that did not allow accommodations (S2) or those that did allow accommodations (S3).

**sampling error. **The error in survey estimates that occurs because only a sample of the population is observed. Measured by sampling standard error.

**sampling frame. **The list of sampling units from which the sample is selected.

**sampling variability. **The variability in survey estimates that occurs because only a sample of the population is observed. Measured by standard error.

**sampling weight. **A multiplicative factor equal to the reciprocal of the probability of a respondent being selected for assessment, with adjustment for nonresponse and, perhaps, poststratification. Each weight provides an estimate of the number of persons in the population represented by that respondent; the sum of the weights estimates the size of the population.
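
As a rough illustration of the definition, the sketch below computes base weights as reciprocals of selection probabilities and then applies a single nonresponse adjustment so that respondents carry the weight of nonrespondents. Treating the whole sample as one adjustment cell is a simplifying assumption, poststratification is omitted, and the function name and data are hypothetical.

```python
def nonresponse_adjusted_weights(selection_probabilities, responded):
    """Base weight = 1 / selection probability, scaled up for nonresponse."""
    base_weights = [1.0 / p for p in selection_probabilities]
    total_sampled = sum(base_weights)
    total_responding = sum(w for w, r in zip(base_weights, responded) if r)
    adjustment = total_sampled / total_responding
    # Nonrespondents get weight zero; respondents absorb their share.
    return [w * adjustment if r else 0.0 for w, r in zip(base_weights, responded)]


weights = nonresponse_adjusted_weights([0.1, 0.1, 0.05, 0.05], [True, True, True, False])
print(weights, sum(weights))
```

After the adjustment, the weights still sum to the same estimated population total as the base weights of everyone sampled.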

**sampling without replacement. **A sampling method in which a unit, once selected, is removed from the frame and cannot be selected a second time.

**scale score. **A score, derived from student responses to assessment items, that summarizes the overall level of performance attained by that student. While NAEP does not produce scale scores for individual students, NAEP does produce summary statistics describing scale scores for groups of students. NAEP subject area scales typically range from 0 to 500 (reading, mathematics, U.S. history, and geography) or from 0 to 300 (science, writing, and civics).

**scaling. **The process of assigning numbers to reflect students' performance on an assessment. In NAEP, scaling is based on Item Response Theory (IRT) and results in a scale score for each subject area that can be used to summarize levels of performance attained by particular groups of students.

**school control form. **A form used to gather information about each school including the name of the person designated to be the school coordinator, the number of students in the designated grade, and tentative dates for the sampling visit and assessment.

**School Control System (SCS). **Also called State Coordinator System. The web-based field management system used by NAEP State Coordinators (NSC) and field staff to report NAEP information to Westat concerning assessment activities. The system is used to obtain and/or provide information about sampled schools and contains up-to-date information on school recruitment and assessment scheduling at the schools.

**school coordinator. **A school coordinator is appointed by each school to be the primary contact for NAEP staff at a school. This person assists in making plans for the assessment at a school by notifying students and teachers and securing space for the assessment.

**School Debriefing Form. **A form used to provide a written record of the conduct of assessment session(s) in each school, including any problems that occurred during each session and the attitude of the school staff and students toward the NAEP assessment. This form is completed by supervisors.

**school questionnaire. **A questionnaire completed for each school by the principal or other official. It is used to gather information concerning school administration, staffing patterns, curriculum, and student services.

**score scale. **A scale used to describe what students know and can do. NAEP subject area scales typically range from 0 to 500 (reading, mathematics, history, and geography) or from 0 to 300 (science, writing, and civics).

**scoring guide. **(*see also* scoring rubrics). A guide used to score a response to a constructed-response item.

**scoring rubrics. **(*see also* scoring guide). Guides used to score responses to constructed-response items.

**SCS. **See School Control System and State Coordinator System.

**SD/LEP. **See Students with disabilities/limited English proficient. More recently in NAEP, LEP is termed English language learner (ELL).

**SD/LEP questionnaire. **A questionnaire completed for each selected student identified as a student with a disability (SD) and/or limited English proficient (LEP) by the school staff member most knowledgeable about the student.

**secondary-use data files. **Restricted-use data files containing respondent-level cognitive, demographic, and background data. They are available for use by researchers who have obtained a license from NCES and wish to perform analyses of NAEP data.

**second-order factor model. **A factor analysis model in which the variances and correlations, rather than assessment scores, are modeled.

**Section 504. **Section 504 of the Rehabilitation Act of 1973, as amended (Title 29 U.S.C. 794 Section 504), prohibits discrimination on the basis of handicap in federally assisted programs and activities.

**selection probability. **The chance that a particular sampling unit has of being selected in the sample.

**self-weighting sample. **A sample for which every member of the population has an equal probability of being selected.

**serpentine sorting. **A method of sorting in which records are ordered in an alternating ascending and descending pattern, so that any two consecutive records in the sorted file are more similar with respect to their values of the sort variables than in traditional sorting. This technique reduces the estimates of variance when replication methods, such as the jackknife method, are used.
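
A minimal sketch of serpentine sorting on two sort variables follows. Records are ordered by the first variable, and the direction of the second variable alternates from one group to the next, so neighbouring records across a group boundary stay similar. The field names `stratum` and `size` and the restriction to two variables are assumptions made for illustration.

```python
from itertools import groupby


def serpentine_sort(records, primary, secondary):
    """Sort by primary ascending; alternate the direction of secondary per group."""
    ordered = sorted(records, key=lambda r: r[primary])
    result = []
    descending = False
    for _, group in groupby(ordered, key=lambda r: r[primary]):
        result.extend(sorted(group, key=lambda r: r[secondary], reverse=descending))
        descending = not descending  # flip direction for the next group
    return result


schools = [
    {"stratum": 2, "size": 20}, {"stratum": 1, "size": 30},
    {"stratum": 3, "size": 15}, {"stratum": 2, "size": 40},
    {"stratum": 1, "size": 10}, {"stratum": 3, "size": 5},
]
for school in serpentine_sort(schools, "stratum", "size"):
    print(school["stratum"], school["size"])
```

In the example, sizes run upward within stratum 1, downward within stratum 2, and upward again within stratum 3.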

**session. **A group of students reporting for the administration of an assessment. Most schools conduct only one session, but some large schools conduct as many as 10 or more. Also referred to as an assessment session.

**session assignment form (SAF). **A form generated for each cooperating school that identifies the subjects to be administered and the line numbers on the student listing form (SLF) that identify the sampled students to be included in the assessment of each subject.

**Session Debriefing Form. **A form used to provide a written record of the conduct of each individual session, including any problems that occurred during the session. This form is completed by the person who administered the assessment session.

**session type. **A designation that indicates which NAEP subject or subjects were assessed during the given session.

**short form. **In the 2000 assessment, a mathematics market-basket special study was conducted using short forms; thus, the study is often referred to as the market-basket/short-form study.

**short shipment. **A box of additional assessment materials (e.g., additional ancillary materials) sent to a school during or shortly before the administration of the assessment. Short shipments are sent upon request of the assessment administration.

**significantly different, statistically significant, statistically significant difference. **Statistical tests are conducted to determine whether the changes or differences between two result numbers are statistically significant. The term "significant" does not imply a judgment about the absolute magnitude or educational relevance of changes in student performance. Rather, it is used to indicate that the observed changes are not likely to be associated with sampling and measurement error, but are statistically dependable population differences. NAEP uses widely accepted statistical standards in analyzing data. For instance, this website discusses only findings that are statistically significant at the .05 level. However, some differences that are statistically significant appear small, particularly in recent assessment years, when the sample sizes have been larger.

NOTE: Differences between scale scores or percentages are calculated using unrounded values. In some instances, the result of the subtraction differs from what would be obtained by subtracting the rounded values shown in the accompanying figure or table.

**simple random sample. **The process for selecting *n* sampling units from a population of *N* sampling units so that each sampling unit has an equal chance of being in the sample and every combination of *n* sampling units has the same chance of being in the sample chosen.

**socioeconomic status (SES). **A combination of social and economic factors that are used as an indicator of household income and/or opportunity. NAEP uses eligibility for the Department of Agriculture's National School Lunch Program (NSLP) as a measure of socioeconomic status.

**sort variable. **(*see also* implicit stratification). A particular school-level characteristic used to sort the schools, usually before sampling begins; sorting is used for continuous variables. Pass/fail or correct/incorrect indicators are not effective sort variables, because they include only two categories, and sort variables should be continuous.

**sparse state option. **The sparse state option, when utilized in State NAEP, is designed to reduce the school sample size in states (jurisdictions) whose expected school sample size is substantially larger than that of most states (defined by a cutoff value: 120 for example in State NAEP 2002). These over-large school sample sizes occur at a particular grade level when a jurisdiction has many schools with small enrollments at that grade. The option is extended to jurisdictions for grades in which the expected sample size exceeds the designated cutoff value. If the jurisdiction chooses to exercise this option, the target sample size of students is reduced to bring the school sample size down to the cutoff level.

**special mathematics assessment. **In 2013, representative samples of students in Puerto Rico at grades 4 and 8 participated in a special version of the NAEP mathematics assessment. In both 2011 and 2013, the regular operational sections of the mathematics assessment were augmented with special sections of mathematics questions to better measure — both more precisely and reliably — the full range of mathematical abilities. These sections were administered in both Puerto Rico and the United States. The special sections allowed the results for Puerto Rico to be placed on the NAEP scale with relatively small margins of error, and permitted meaningful comparisons with achievement in the mainland United States, as well as across the two years.
Fourth- and eighth-grade students in Puerto Rico also participated in NAEP mathematics assessments in 2003, 2005, and 2007. Because of issues such as relatively large portions of omitted responses and incorrect responses, results from these earlier assessments cannot be compared to 2011 and 2013.

**spiral cycle. **One systematic ordering of all the booklets in the spiral.

**spiral length. **The minimum number of booklets required for each booklet in the spiral cycle to appear once in each bundle position. This is achieved by consecutively repeating the spiral cycle horizontally (in the old bundle procedure) or vertically (in the new bundle procedure).

**spiraling. **A method to interleave booklets systematically so that when the booklets are handed out in the specified order, any group of students will receive predetermined proportions of different types of booklets.

**standard curriculum. **One of the curriculum levels defined for the High School Transcript Study—at least four credits of English; three each of social studies, mathematics, and science.

**standard deviation. **A measure of the dispersion of a set of scores. Specifically, it is the square root of the average squared deviation of scores about their arithmetic mean.
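
Written as a formula, with $x_1, \dots, x_n$ denoting the scores and $\bar{x}$ their arithmetic mean (the symbols are introduced here for illustration), the definition reads:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\text{SD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$

For example, the scores 2, 4, 4, 4, 5, 5, 7, 9 have mean 5; the squared deviations sum to 32, their average is 4, and the standard deviation is therefore 2.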

**standard error. **In NAEP, a measure of sampling variability and measurement error for a NAEP scale score. However, for other statistics, it reflects the sampling variability. Because of NAEP's complex student sampling design, sampling standard errors are estimated by jackknifing the samples from first-stage sample estimates. Standard errors may also include a component due to the error of measurement of individual scores estimated using plausible values.

**standardized mean difference. **The difference between the mean values for two groups, divided by the standard deviation.

**starting point. **The status of a state's inclusion rate (actual inclusion rate minus benchmark inclusion rate) in the first year over which change is measured.

**state coordinator. **See NAEP State Coordinator.

**State Supervisor. **Prior to the advent of NAEP administration of all assessment components, the state component was administered by state personnel. Even so, certain preassessment activities were undertaken by NAEP field staff. State supervisors hired, trained, and supervised quality control monitors, trained local assessment administrators, selected student samples, and conducted telephone follow-up interviews with school staff after assessments.

**status measure. **The status measure is a measure of a state's inclusiveness, relative to other states, at a given point in time, accounting for differences in the characteristics of each state's SD population. The status measure is the difference between a state's actual inclusion rate and its nation-based benchmark inclusion rate.

**stratification. **The division of a population into parts, or strata, each of which is more homogeneous than the population as a whole. If sample sizes for these strata are set proportional to the stratum share of the population, then the resultant sample will be more efficient than a simple random sample of the population disregarding the strata, as the simple random sample will have resultant sample sizes for each stratum that are randomly smaller or larger than the stratum share (too much in one stratum, too little in others, by chance).

**stratified sample. **A sample selected from a population that has been stratified, with a sample selected independently in each stratum. The strata are defined for the purpose of reducing sampling error.

**stratum. **A collection of sampled units defined by a characteristic. All sampling units belong to a stratum and the strata are mutually exclusive.

**student group. **Groups of the student population identified in terms of specific demographic or background characteristics. Some of the major student groups used for reporting NAEP results are those defined by students' gender, race or ethnicity, highest level of parental education, and type of school (public or nonpublic). Information gathered from NAEP background questionnaires also makes it possible to report results based on variables such as course-taking, home discussions of school work, and television-viewing habits. The High School Transcript Study uses these student groups in presenting results.

**student ID number. **A unique identification number assigned to each respondent to preserve his or her anonymity. NAEP does not record the names of any respondents.

**student listing form (SLF). **A form that is used by the school to list the students who are eligible to participate in a particular assessment.

**student sample. **A portion of a population, or a subset from a set of units, that is selected by some probability mechanism for the purpose of investigating the properties of the population.

**students with disabilities (SD). **A student with a disability may need specially designed instruction to meet his or her learning goals. A student with a disability will usually have an Individualized Education Program (IEP), which guides his or her special education instruction. Students with disabilities are often referred to as special education students and may be classified by their school as learning disabled (LD) or emotionally disturbed (ED). The goal of NAEP is that students who are capable of participating meaningfully in the assessment are assessed, but some students with disabilities selected by NAEP may not be able to participate, even with the accommodations provided.

**subgroups. **The term "subgroup" has been replaced by the term "student group."

**subject area. **One of the areas assessed by NAEP: the arts, civics, economics, foreign language, geography, mathematics, reading, science, U.S. history, world history, or writing.

**substitute school. **A substitute school is a school that takes the place of a refusing original school. A substitute school is treated as if it were the original school that it replaced.

**systematic sample (systematic random sample). **A sample selected by a systematic method; for example, units selected from a list at equally spaced intervals after a random start.
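
The "equally spaced intervals after a random start" example can be sketched as follows. The function name, the fractional-interval approach, and the use of a seeded generator are illustrative assumptions rather than a description of NAEP's sampling software.

```python
import random


def systematic_sample(units, n, rng=None):
    """Select n units at equally spaced intervals after a random start."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    rng = rng or random.Random()
    interval = len(units) / n
    start = rng.random() * interval  # random start within the first interval
    picks = []
    for i in range(n):
        # Guard against floating-point rounding at the end of the list.
        index = min(int(start + i * interval), len(units) - 1)
        picks.append(units[index])
    return picks


frame = list(range(100))  # hypothetical sampling frame of 100 units
print(systematic_sample(frame, 5, random.Random(1)))
```

Sorting the frame before selection (see sort variable) spreads the sample across the values of the sort variables.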

**taxonomy. **The classification of items into larger categories. In the High School Transcript Study (HSTS), the items are specific secondary school courses that are classified into broader groupings to define course content and level (e.g., AP English, Remedial Mathematics, Regular Science, and IB Social Studies).

**teacher questionnaire. **A questionnaire completed by selected teachers of sampled students. It is used to gather information concerning years of teaching experience, frequency of assignments, use of teaching materials, and availability and use of computers.

**technology-based assessment (TBA). **A NAEP special study designed to explore the use of technology, especially the use of the computer, as a tool to enhance the quality and efficiency of educational assessments.

**TEL. **Technological and engineering literacy computer-based assessment measures students' ability to use, understand, and evaluate technology as well as to understand technological principles and strategies needed to develop solutions and achieve goals. The assessment is completely computer-based and includes interactive scenario-based tasks.

**theta metric. **A standard normal scale that measures the latent ability estimate theta.

**Title I. **A federally funded assistance program for economically and educationally disadvantaged students. Title I refers to a section of Public Law 107-110 (and its predecessor, P.L. 103-382), "Improving The Academic Achievement of The Disadvantaged." The Title I status of each participating student is indicated on the NAEP Assessment Administration form. In the Data Explorer (accessed by clicking on "Analyze Data" toward the top of any page of the NAEP website), NAEP began reporting Title I by aggregated student participation with the 2000 assessments. The data were collected before then (for Chapter 1 and its successor, Title I) but are reported in a non-comparable statistic due to changing criteria for qualification as a Title I school. Currently, students classified as Title I include those in schools offering targeted assistance to low-income children and also schools with high rates of low-income children that use Title I funds to support schoolwide programs.

**Title I Participation. **The variable Title I (TITLE1) is based on available school records. Students are classified as either currently participating in a Title I program or receiving Title I services, or as not receiving such services. The classification applies only to the school year when the assessment is administered and is not based on participation in previous years. If the school did not offer any Title I programs or services that year, all students in that school were classified as not participating.

**transcript. **A student's secondary school record containing courses taken, grades, graduation status, and attendance. In addition, it often includes assessments such as PSAT, SAT, ACT, and honors. Transcripts were used in the NAEP-related High School Transcript Study (HSTS).

**transformation. **An equation used to convert values on one score scale to values on another score scale.
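
One common form of such an equation is a linear transformation, sketched below for moving from a scale with mean 0 and standard deviation 1 to a target scale. The linear form and the target mean of 250 and standard deviation of 50 are illustrative assumptions, not NAEP's published constants.

```python
def linear_constants(target_mean, target_sd, source_mean=0.0, source_sd=1.0):
    """Slope and intercept that map the source scale onto the target scale."""
    slope = target_sd / source_sd
    intercept = target_mean - slope * source_mean
    return slope, intercept


def transform(score, slope, intercept):
    """Convert one score from the source scale to the target scale."""
    return slope * score + intercept


# Hypothetical target scale: mean 250, standard deviation 50.
slope, intercept = linear_constants(target_mean=250.0, target_sd=50.0)
print(transform(0.5, slope, intercept))  # half a standard deviation above the mean
```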

**trend samples. **Study of change over time in a group selected to represent a larger population.

**Trial State Assessment Program. **A NAEP program authorized by Congress in 1988 and established to provide for a program of voluntary state-by-state assessments on a trial basis.

**Trial Urban District Assessment (TUDA). **NAEP began the urban school district assessment on a trial basis in 2002, in a few large urban districts in participating states. The purpose of the TUDA is to allow reporting of NAEP results for large urban school districts and to allow the NAEP program to evaluate the usefulness of NAEP data to cities of different sizes and demographic compositions.

**trimming. **A process by which extreme weights are reduced (trimmed) to diminish the effect of extreme values on estimates and estimated variances.

**type I error. **An error made when the tested (null) hypothesis, H_{0}, is rejected when it is in fact true. The probability of making a type I error is denoted by alpha (α). For example, with an alpha level of 0.05, the analyst will conclude that a difference is present in 5 percent of tests where the null hypothesis is true.

**type II error. **An error made when the null hypothesis, H_{0}, is not rejected when in fact a specific alternative hypothesis, H_{1}, is true. The probability of making a type II error is denoted by beta (β). For example, with a beta level of 0.20, the analyst will conclude that no difference is present in 20 percent of all cases in which the specific hypothesized alternative, H_{1}, is true.

**type of location (TOL). **One of the NAEP student groups, dividing the communities in the nation into groups based on the proportion of the students living in each of three sizes and types of communities.

**United States territories. **The following territories, which are under the jurisdiction of the federal government of the United States, have been included in one or more NAEP administrations: American Samoa, Guam, Puerto Rico, U.S. Virgin Islands.

- City: Large: Territory inside an urbanized area and inside a principal city with population of 250,000 or more
- City: Midsize: Territory inside an urbanized area and inside a principal city with population less than 250,000 and greater than or equal to 100,000.
- City: Small: Territory inside an urbanized area and inside a principal city with population less than 100,000.
- Suburb: Large: Territory outside a principal city and inside an urbanized area with population of 250,000 or more.
- Suburb: Midsize: Territory outside a principal city and inside an urbanized area with population less than 250,000 and greater than or equal to 100,000.
- Suburb: Small: Territory outside a principal city and inside an urbanized area with population less than 100,000.
- Town: Fringe: Territory inside an urban cluster that is less than or equal to 10 miles from an urbanized area.
- Town: Distant: Territory inside an urban cluster that is more than 10 miles and less than or equal to 35 miles from an urbanized area.
- Town: Remote: Territory inside an urban cluster that is more than 35 miles from an urbanized area.
- Rural: Fringe: Census-defined rural territory that is less than or equal to 5 miles from an urbanized area, as well as rural territory that is less than or equal to 2.5 miles from an urban cluster.
- Rural: Distant: Census-defined rural territory that is more than 5 miles but less than or equal to 25 miles from an urbanized area, as well as rural territory that is more than 2.5 miles but less than or equal to 10 miles from an urban cluster.
- Rural: Remote: Census-defined rural territory that is more than 25 miles from an urbanized area and is also more than 10 miles from an urban cluster.
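
The thresholds in the list above can be expressed as a small lookup. This is a sketch only: the function names and parameters are hypothetical, the broad type (city, suburb, town, or rural) is assumed to be known already, and `population` stands for whichever population the corresponding rule refers to.

```python
def size_label(population):
    """Size class used by the City and Suburb categories."""
    if population >= 250_000:
        return "Large"
    if population >= 100_000:
        return "Midsize"
    return "Small"


def town_label(miles_to_urbanized_area):
    """Distance class for territory inside an urban cluster."""
    if miles_to_urbanized_area <= 10:
        return "Fringe"
    if miles_to_urbanized_area <= 35:
        return "Distant"
    return "Remote"


def rural_label(miles_to_urbanized_area, miles_to_urban_cluster):
    """Distance class for Census-defined rural territory."""
    if miles_to_urbanized_area <= 5 or miles_to_urban_cluster <= 2.5:
        return "Fringe"
    if miles_to_urbanized_area <= 25 or miles_to_urban_cluster <= 10:
        return "Distant"
    return "Remote"


def classify_locale(kind, *, population=None,
                    miles_to_urbanized_area=None,
                    miles_to_urban_cluster=None):
    """Return a label such as 'City: Large' for an already-known broad type."""
    if kind in ("City", "Suburb"):
        return f"{kind}: {size_label(population)}"
    if kind == "Town":
        return f"Town: {town_label(miles_to_urbanized_area)}"
    if kind == "Rural":
        label = rural_label(miles_to_urbanized_area, miles_to_urban_cluster)
        return f"Rural: {label}"
    raise ValueError(f"unknown locale type: {kind!r}")


print(classify_locale("Suburb", population=180_000))
print(classify_locale("Rural", miles_to_urbanized_area=30,
                      miles_to_urban_cluster=2.5))
```

The rural rules are checked from nearest to farthest, so territory that is close to an urban cluster is classified as fringe even when the nearest urbanized area is far away.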

**variance. **One of several indices of variability that statisticians use to characterize the dispersion or spread among a list of numbers; the square of the standard deviation.
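
As a small worked example of the relationship between the two indices, the sketch below uses Python's standard library on a made-up list of numbers. It computes the population variance, which divides by the number of values; a sample variance would divide by one less than that number.

```python
from statistics import pstdev, pvariance

# Hypothetical list of numbers with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(scores)  # mean of the squared deviations from the mean
std_dev = pstdev(scores)      # square root of the variance

print(variance, std_dev)
```

Squaring `std_dev` recovers `variance`, up to floating-point rounding.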

**vertical bundling. **A new bundling plan in which the spiral cycle is repeated vertically across all the bundles. The vertical scheme can be of any length. It is created based on a technique used in construction of a Youden Rectangle, and has the potential of balancing both booklet position and booklet pairings.

**WCBA. **Writing computer-based assessment, conducted for the first time in 2011, which measures students' ability to write using a computer. The assessment is designed to take advantage of many features of current digital technology, such as word processing software. The computer-based writing tasks are delivered in multimedia formats, such as short videos and audio.

**weighted percentage. **A percentage that has been calculated by differentially weighting observations to account for complex sampling procedures. It differs from a simple percentage in which all cases are equally weighted.

In NAEP, each sampled student is assigned a weight that makes proper allowances for the sampling design and reflects adjustments for school and student nonparticipation.

Weighted percentages are estimates of the percentages of the total population, or of a student group, that share a specified characteristic. For example, the weighted percentage of fourth-grade students in the NAEP sample who correctly answered a particular NAEP test item is an estimate of the percentage of fourth-grade students in the nation who can correctly answer that question.
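
The difference between a weighted and a simple percentage can be seen in a short sketch. The function name and the data are hypothetical; actual NAEP weights come from the sampling design and nonparticipation adjustments described above, not from values like these.

```python
def weighted_percentage(has_characteristic, weights):
    """Percentage of the total weight carried by cases with the characteristic."""
    if len(has_characteristic) != len(weights):
        raise ValueError("each case needs exactly one weight")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    matching_weight = sum(
        weight for flag, weight in zip(has_characteristic, weights) if flag
    )
    return 100.0 * matching_weight / total_weight


# Four hypothetical sampled students: whether each answered an item
# correctly, and how many students in the population each one represents.
correct = [True, False, True, True]
weights = [100.0, 300.0, 50.0, 50.0]

weighted = weighted_percentage(correct, weights)
simple = 100.0 * sum(correct) / len(correct)
print(weighted, simple)
```

Here three of the four sampled students answered correctly, but the one who did not represents most of the population, so the weighted percentage is well below the simple one.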

**Westat. **The supplier of customized sampling, data collection, and weighting procedures for NAEP.

**Work Flow Management System (WFM). **A computer software package that allows users to track NAEP materials as those materials are processed. Each session's materials are placed in a uniquely numbered batch that remains with the materials throughout processing and into the warehouse. During scanning, the individual student document is stamped so that each document can be located. Stations in WFM include Receipt Control, Data Preparation, Queue Control, Logging, Slitting, Scanning, Editing, Warehouse, and Scoring.

