National Postsecondary Student Aid Study (NPSAS)



4. SURVEY DESIGN

Target Population


The target population is defined as all eligible students enrolled at any time during the federal financial aid award year in postsecondary institutions in the United States or Puerto Rico3 that have a signed Title IV participation agreement with the U.S. Department of Education (thus making these institutions eligible for federal student aid programs). The population includes both students who receive aid and those who do not receive aid. It excludes students who are enrolled solely in a general equivalency diploma (GED) program or are concurrently enrolled in high school. 

Sample Design

The design for the NPSAS sample involves the selection of a nationally representative sample of postsecondary education institutions and students within these institutions. Prior to NPSAS:96, a geographic-area-clustered, three-stage sampling design was used to: (1) construct geographic areas from three-digit postal zip code areas; (2) sample institutions within the geographic sample areas; and (3) sample students within sample institutions. Beginning with NPSAS:96, the sample design eliminated the first stage of sampling (geographic area construction), thereby increasing the precision of the estimates. Institutional and student sample sizes vary somewhat from cycle to cycle depending on study design and budget considerations at the time. Approximately 2,000 institutions and 122,030 students were initially selected for participation in NPSAS:16.

Institution Sample. To be eligible for inclusion in the institution sample, an institution must satisfy the following conditions: (1) offer an education program designed for persons who have completed secondary education; (2) offer an academic, occupational, or vocational program of study lasting at least 3 months or 300 clock hours; (3) offer access to the general public; (4) offer more than just correspondence courses; (5) be located in the 50 states, the District of Columbia, or Puerto Rico4; and (6) be other than a U.S. Service Academy. Also, beginning with NPSAS:2000, eligible institutions must have a signed Title IV participation agreement with the U.S. Department of Education.

The institution-level sampling frame is constructed from the Integrated Postsecondary Education Data System (IPEDS) Institutional Characteristics (IC) and header files (see IPEDS chapter). Although the institutional sampling strata have varied across NPSAS administrations, in all years the strata are formed by classifying institutions according to control (public or private), level, and highest degree offering. The NPSAS:04 strata were also formed by Carnegie classification and state, and the NPSAS:08 strata were also formed by state. A stratified sample of institutions is then selected with probability proportional to size. School enrollment, as reported in IPEDS, defines the measure of size; enrollment is imputed if missing in the IPEDS file. Institutions with expected frequencies of selection greater than unity are selected with certainty. The remainder of the institution sample is selected from the other institutions within each stratum. Although prior NPSAS administrations aggregated private for-profit 2-year and 4-year institutions into a single sampling stratum, NPSAS:12 split the two into separate strata to reflect the growth in enrollment in the for-profit sector.
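
The selection logic described above can be sketched in Python. This is an illustration only, not NCES production code; the data structure, the iterative certainty rule, and the systematic PPS step are simplifying assumptions.

    import random

    def pps_select(stratum_frame, n, rng=None):
        """Select n institutions from one stratum with probability
        proportional to size (PPS). `stratum_frame` is a list of
        (institution_id, enrollment) pairs, with enrollment as the measure
        of size. Institutions whose expected frequency of selection is at
        least 1 are taken with certainty; the rest are sampled by
        systematic PPS."""
        rng = rng or random.Random(1)
        certainties, remainder = [], list(stratum_frame)
        while True:  # removing certainty units changes the remaining expectations
            k = n - len(certainties)
            total = sum(size for _, size in remainder)
            if k <= 0 or total == 0:
                return certainties[:n]
            new = [(i, s) for i, s in remainder if k * s / total >= 1.0]
            if not new:
                break
            certainties += [i for i, _ in new]
            remainder = [r for r in remainder if r not in new]
        # Systematic PPS over the (implicitly stratified, sorted) remainder.
        step, point = total / k, rng.uniform(0, total / k)
        cum, selected = 0.0, []
        for inst, size in remainder:
            cum += size
            while point <= cum and len(selected) < k:
                selected.append(inst)
                point += step
        return certainties + selected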

Although prior NPSAS administrations aggregated public 4-year non-doctorate-granting institutions into a single sampling stratum, NPSAS:16 split them into two separate strata: public 4-year institutions that were primarily subbaccalaureate and those that were primarily baccalaureate. The subbaccalaureate institutions were usually community colleges that predominantly awarded subbaccalaureate degrees while offering bachelor’s degrees in only a small number of select fields. Splitting the public 4-year, non-doctorate-granting institutions into two strata, rather than combining them, allowed for oversampling and controlling the sample size of the subbaccalaureate institutions and the students in them, including the baccalaureate recipients.

Additional implicit stratification is accomplished within each institutional stratum by sorting the stratum sampling frame in a serpentine manner. Implicit stratification allows the approximation of proportional representation of institutions on additional measures.
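
A minimal sketch of a serpentine sort, assuming the frame is a list of dictionaries and the implicit stratification variables are dictionary keys (an illustration of the idea, not the production sort):

    def serpentine_sort(frame, keys):
        """Sort a stratum frame in a serpentine manner: records are grouped
        by the first key, and the sort direction on each subsequent key
        reverses at every boundary of the key above it, so adjacent records
        stay as similar as possible."""
        def order(records, remaining, ascending):
            if not remaining:
                return records
            key, rest = remaining[0], remaining[1:]
            groups = {}
            for rec in records:
                groups.setdefault(rec[key], []).append(rec)
            out, direction = [], ascending
            for value in sorted(groups, reverse=not ascending):
                out.extend(order(groups[value], rest, direction))
                direction = not direction  # reverse at each group boundary
            return out
        return order(frame, keys, True)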

NPSAS:16 sampled 2,000 institutions and serves as the base-year survey for the B&B:16 cohort of baccalaureate recipients. NPSAS:16 categorized institutions into 11 strata based on institution level and control. Within each institution stratum for NPSAS:16, additional implicit stratification was accomplished by the following classifications: (1) Historically Black Colleges and Universities (HBCUs) status; (2) Hispanic Serving Institutions (HSIs) status; (3) INSTCAT (institution category derived using the level of offerings reported on the IPEDS Institutional Characteristics component and the number and level of awards that were reported on the IPEDS Completions component); (4) Carnegie classifications of degree-granting postsecondary institutions; (5) the Office of Business Economics Region from the IPEDS Header file (Bureau of Economic Analysis of the U.S. Department of Commerce Region); (6) state and system for states with large systems (e.g., the SUNY and CUNY systems in New York, the state and technical colleges in Georgia, and the California State University and University of California systems in California); and (7) the institution measure of size. This implicit stratification helped ensure that the sample was approximately proportional to the population for these measures.

NPSAS:12 sampled 1,690 of a universe of 7,050 institutions, and serves as the base-year survey for the BPS:12 cohort of first-time beginning college students. NPSAS:12 categorized institutions into 10 strata based on institution level, control, and highest level of offering. Within each institution stratum, NPSAS statisticians accomplished additional implicit stratification by sorting the sampling frame within stratum by the following classifications: (1) historically Black colleges and universities (HBCU) indicator; (2) Hispanic-Serving Institutions (HSI) indicator; (3) Carnegie classifications of degree-granting postsecondary institutions; (4) 2-digit Classification of Instructional Programs code of the largest program for less-than-2-year institutions; (5) the Office of Business Economics Region from the IPEDS header file (Bureau of Economic Analysis of the U.S. Department of Commerce Region); (6) state and system for states with large systems (e.g., the SUNY and CUNY systems in New York, the state and technical colleges in Georgia, and the California State University and University of California systems in California); and (7) the institution measure of size.

In NPSAS:08, the implicit strata were formed using (1) Historically Black Colleges and Universities (HBCU) indicator; (2) Hispanic-Serving Institutions (HSI) indicator; (3) Carnegie classifications of postsecondary institutions; (4) the Office of Business Economics (OBE) Region from the IPEDS header file (Bureau of Economic Analysis of the U.S. Department of Commerce Region); and (5) an institution measure of size. Further implicit stratification was done for the State University of New York (SUNY) and City University of New York (CUNY) systems in New York, the state and technical colleges in Georgia, and the state universities in California.

The NPSAS:08 institution sampling frame was constructed from the 2004-05 IPEDS IC, header, and Fall Enrollment files and, because NPSAS:08 also serves as the base-year survey for a longitudinal cohort of baccalaureate recipients (i.e., B&B), the 2004-05 IPEDS Completions file. A total of 1,960 of the 6,780 institutions in the survey universe were selected for the NPSAS:08 sample. The sampled institutions were stratified into 22 national strata and 24 state strata based on institutional control, institutional offering, and highest degree offering.

In NPSAS:04, the implicit strata were formed using (1) the HBCU indicator; (2) Carnegie classifications; (3) OBE Region; and (4) an institution measure of size. In NPSAS:2000, for less-than-2-year, 2-year, and private for-profit institutions, the implicit strata were formed using (1) institutional level of offering (where levels had been collapsed to form strata); (2) the OBE Region from the IPEDS header file; (3) the Federal Information Processing Standard (FIPS) state code; and (4) an institution measure of size. For public 4-year and private nonprofit 4-year institutions, the implicit strata were formed using (1) Carnegie classifications of institutions or groupings of Carnegie classifications; (2) the HBCU indicator; (3) the OBE Region from the IPEDS header file; and (4) an institution measure of size. In NPSAS:96, the implicit strata were formed using (1) institutional level of offering; (2) the IPEDS IC-listed U.S. Department of Commerce Region; and (3) an institution measure of size. Selected institutions are asked to verify their IPEDS classification (institutional control and highest level of offering) and the calendar system that they use (including dates that terms start).

The institutional sampling frame for NPSAS:04 was constructed from the 2000–01 IPEDS IC, header, and Fall Enrollment files; 1,670 of the 6,706 institutions in the survey universe were selected for NPSAS:04. The sampled institutions were stratified into 22 national strata and 36 state strata based on institutional control, institutional offering, highest degree offering, and Carnegie classification. The institutional sampling frame for NPSAS:2000 was constructed from the 1998–99 IPEDS IC file and, because NPSAS:2000 also served as the base-year survey for a B&B cohort, the 1996–97 IPEDS Completions file. Eligible institutions were partitioned into 22 institutional strata based on institutional control, highest level of offering, and percentage of baccalaureate degrees awarded in education. Approximately 1,100 institutions were initially selected for NPSAS:2000. As noted above, NPSAS:96 was the first administration of NPSAS to employ a single-stage institutional sampling design, no longer constructing geographic areas as the initial step.

Student Sample. Full- and part-time students enrolled in academic or vocational courses or programs at eligible institutions, and not concurrently enrolled in a high school completion program, are eligible for inclusion in NPSAS. NPSAS:87 sampled students enrolled in the fall of 1986. Beginning with NPSAS:90, students enrolled at any time during the year were eligible for the study. This design change provided the data necessary to estimate full-year financial aid awards.

Sampled institutions are asked to provide student enrollment lists with the following information for each student: full name, identification number, Social Security number, educational level, an indication of first-time beginning student (FTB) status or baccalaureate recipiency (depending on the longitudinal cohort being launched), major, and, beginning with NPSAS:04, a local address, a local telephone number, a campus e-mail, a permanent address, a permanent phone number, and a permanent e-mail. Additionally, beginning with NPSAS:08, date of birth and class level of undergraduates are requested. The student sample is drawn from the enrollment lists, which were provided by 1,750 of 1,990 eligible institutions for NPSAS:16; 1,480 of 1,690 eligible institutions for NPSAS:12; 1,730 of 1,940 eligible institutions in NPSAS:08; 1,360 of 1,630 eligible institutions in NPSAS:04; 1,000 of the nearly 1,100 eligible institutions in NPSAS:2000; and 840 of 900 eligible institutions in NPSAS:96.

Basic student sample. Students are sampled on a flow basis (using stratified systematic sampling) from the lists provided by institutions. Steps are taken to eliminate both within- and cross-institution duplication of students. NPSAS classifies students by educational level as undergraduate, master’s, doctoral, other graduate, or professional students. NPSAS:16 further classified students into 17 total strata based on program type, veteran status, and whether the student was a baccalaureate recipient. For the purpose of defining the third cohort of B&B, NPSAS:08 classified undergraduates into (1) business major potential baccalaureate recipients, (2) other potential baccalaureate recipients, and (3) other undergraduates. Potential baccalaureate recipients were further stratified by those who were science, technology, engineering, or mathematics (STEM) majors and all other majors and by SMART Grant recipients and non-recipients. Other undergraduates were further stratified by SMART Grant recipients, Academic Competitiveness Grant (ACG) recipients, and non-recipients. The categories for potential baccalaureate recipients and other undergraduates were then stratified by in-state and out-of-state status. NPSAS:04 stratified undergraduate students as (1) potential FTBs and (2) other undergraduates. These two categories were then stratified by in-state and out-of-state status. The FTBs in NPSAS:04 make up the third cohort of BPS. For the purpose of defining the second cohort of B&B, NPSAS:2000 also broke down undergraduate categories into: (1) business major baccalaureate recipients, (2) other baccalaureate recipients, and (3) other undergraduates. In NPSAS:96, FTBs, or students beginning their postsecondary education during one of the terms of the NPSAS:96 sample year, composed the second cohort of BPS, with the data collected serving as the base-year data for the subsequent longitudinal studies.
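
The duplicate-elimination step mentioned above can be illustrated as follows; keying on Social Security number and the list-of-lists structure are assumptions made for this sketch:

    def unduplicate(enrollment_lists):
        """Remove within- and cross-institution duplicates from sampled
        enrollment lists, keeping a student's first occurrence."""
        seen, kept = set(), []
        for institution_id, students in enrollment_lists:
            for student in students:
                if student["ssn"] not in seen:
                    seen.add(student["ssn"])
                    kept.append((institution_id, student))
        return kept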

The student sample is allocated to the combined institutional and student strata (e.g., graduate students in public 4-year doctorate institutions). Initial student sampling rates are calculated for each sample institution using refined overall rates to approximate equal probabilities of selection within the institution-by-student sampling strata. These rates are sometimes modified to ensure that the desired student sample sizes are achieved.

For NPSAS:16, initial student sampling rates were calculated for each sample institution using sampling rates designed to generate approximately equal probabilities of selection within the institution-by-student sampling strata. In certain instances, NPSAS statisticians modified sampling rates as follows: (1) student sampling rates were increased for each institution to yield at least 10 students (if possible) to ensure sufficient yield for variance estimation; (2) student sampling rates were decreased, with few exceptions, if an institution’s sample size was greater than 300 students; and (3) student sampling rates were adjusted higher or lower based on expected yield calculations for institutions where the sample had not yet been selected. These adjustments to the initial sampling rates resulted in some additional variability in the student sampling rates and increased survey design effects. For NPSAS:16, the expected sample of students was 126,320, with 122,030 achieved, of which 37,890 were potential baccalaureate recipients and 22,950 were graduate-level students.
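
A sketch of the NPSAS:16 rate adjustments described above; the minimum and maximum yields come from the text, while the function form and names are illustrative (and, as noted, production practice allowed exceptions):

    def adjust_rate(base_rate, expected_enrollment, min_yield=10, max_yield=300):
        """Adjust an institution's student sampling rate so the expected
        sample is at least `min_yield` and at most `max_yield` students."""
        expected = base_rate * expected_enrollment
        if expected < min_yield:
            return min(1.0, min_yield / expected_enrollment)
        if expected > max_yield:
            return max_yield / expected_enrollment
        return base_rate

    # A 2 percent base rate at an institution expecting 200 enrollees is
    # raised to 5 percent to meet the 10-student minimum.
    assert adjust_rate(0.02, 200) == 0.05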

In NPSAS:12, adjustments were also made to the initial sampling rates. For NPSAS:12, the targeted sample of students was 124,650, with 128,120 being achieved, of which 59,740 students were undergraduate FTBs and 17,330 students were graduate level.

Initial sampling rates were adjusted in NPSAS:08, NPSAS:04, NPSAS:2000, and NPSAS:96, as well. The overall sample yield in NPSAS:08 was close to expected (137,800 students vs. the target of 138,000). The student sample consisted of 29,470 potential baccalaureate recipients; 95,650 other undergraduates; 6,530 master’s students; 3,760 doctoral students; 470 other graduate students; and 1,920 first-professional students. The overall sample yield in NPSAS:04 was less than expected (109,210 students vs. the target of 121,680). The student sample consisted of 49,410 FTBs; 47,680 other undergraduates; 3,720 master’s students; 4,950 doctoral students; 1,660 other graduate students; and 1,790 first-professional students. (See “FTB sample” below for more detail on the sampling of FTBs.) In NPSAS:2000, the overall sample yield was very close to expected (70,230 students vs. the target of 70,270). The student sample consisted of 57,600 undergraduates; 5,960 master’s students; 3,950 doctoral students; 1,370 other graduate students; and 1,350 first-professional students. In NPSAS:96, the overall sample yield was greater than expected (63,620 students vs. the target of 59,510). The student sample consisted of 23,610 potential FTBs; 27,540 other undergraduates; 9,690 graduate students; and 2,780 first-professional students.

Student interview sample. NPSAS:04 was the first administration of NPSAS to offer the option of self-administration of the student interview via the Web, in addition to computer-assisted telephone interviewing (CATI). In NPSAS:16, there were about 77,030 completed full interviews (excluding 1,830 partial completes): 39,020 were completed via the web without telephone contact, 26,450 via the web with telephone contact, and the remaining 11,570 via CATI. In NPSAS:12, there were approximately 85,000 completed interviews: 36,770 were completed via the web without telephone contact, 31,710 via the web with telephone contact, and the remaining 14,820 by telephone. In NPSAS:08, these procedures resulted in 95,360 completed interviews, about two-thirds of which were completed by self-administration and one-third by CATI. In NPSAS:04, these procedures resulted in 62,220 completed interviews, 28,710 of which were completed by self-administration and 33,510 by CATI.

In NPSAS:2000, student interviews were conducted primarily by CATI. To help reduce the level of nonresponse to CATI, computer-assisted personal interviewing (CAPI) procedures, using field interviewers, were used for the first time. Of the 66,340 eligible students in the initial CATI sample, some 51,010 were located for CATI interviewing, while 11,960 were “unlocatable” in CATI and were eligible for field locating and/or CAPI; the rest were either ineligible or excluded.

Due to budget limitations, NPSAS:96 attempted CATI interviews for only a subsample of the basic student sample. A two-phase, nonrespondent follow-up subsampling design was used to maximize the yield of completed student interviews obtained from the CATI subsample while achieving acceptable response rates. These procedures resulted in 51,200 students being selected for Phase 1 of the CATI interviewing. A sample of nonrespondents to Phase 1 was selected for Phase 2 with specified rates based on the outcome of the Phase 1 efforts and the seven sampling strata; 25,770 students were selected for Phase 2.
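
The two-phase logic amounts to retaining each Phase 1 nonrespondent with a stratum-specific rate. A minimal sketch, assuming a `stratum` field and a dictionary of follow-up rates:

    import random

    def phase2_subsample(phase1_nonrespondents, rates_by_stratum, rng=None):
        """Retain each Phase 1 CATI nonrespondent with the follow-up rate
        assigned to its sampling stratum (illustrative only)."""
        rng = rng or random.Random(2)
        return [s for s in phase1_nonrespondents
                if rng.random() < rates_by_stratum[s["stratum"]]]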

Parent interview subsample. In NPSAS:96, a subsample of students selected for the student interview was also designated for parent interviews. In the Phase 1 CATI subsample of NPSAS:96, students were designated for parent interviews if they met one of the following criteria: they were dependent undergraduate students not receiving federal aid; they were dependent undergraduate students receiving federal aid whose parents’ adjusted gross income was not available; or they were independent undergraduate students who were 24 or 25 years old on December 31, 1995. All 8,800 students who fell into one of these groups were sampled for parent interviews. The parent interview was discontinued after NPSAS:96.

Longitudinal Study Samples. In NPSAS:90, a new longitudinal component collected baseline data for students who started their postsecondary education in the 1989–90 academic year. These students were followed over time in BPS, with the first follow-up in 1992. Beginning postsecondary students from NPSAS:96, NPSAS:04, and NPSAS:12 were also followed up and surveyed two and five years later. Similarly, NPSAS:93, NPSAS:2000, NPSAS:08, and NPSAS:16 provided baseline data for students who received baccalaureates in the 1992–93, 1999–2000, 2007–08, and 2015–16 academic years, respectively. These graduates have been followed over time as part of B&B.

BPS sample. To be eligible for BPS, students must have begun their postsecondary education for the first time, after completing high school, on or after July 1. NPSAS survey staff pay particular attention to accurately identifying FTBs in NPSAS to avoid the unacceptably high rates of misclassification, particularly false positives, observed in past BPS studies. High rates of misclassification can result, and have resulted, in (1) excessive cohort loss, (2) excessive cost to “replenish” the sample, and (3) an inefficient sample design (excessive oversampling of “potential” FTBs) to compensate for anticipated misclassification error.

The participating institutions and several administrative data sources provided data to aid in properly classifying FTBs. Key data provided by the institutions included an FTB indicator, high school graduation date, and date of birth. Administrative data sources, including the National Student Loan Data System (NSLDS), the Central Processing System (CPS), and the National Student Clearinghouse (NSC), provided data that were of particular use in identifying false positives. Of the 719,450 students whose records NPSAS staff sent to NSC for the NPSAS:12 data collection, about 7 percent were false positives.
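
A false positive here is a student sampled as a potential FTB whose administrative records show earlier postsecondary enrollment. A minimal sketch of such a check against NSC-style enrollment dates (the rule and data structure are assumptions, not the actual NPSAS classification logic):

    from datetime import date

    def is_false_positive(nsc_enrollment_starts, award_year_start=date(2011, 7, 1)):
        """Flag a sampled potential FTB as a false positive when a National
        Student Clearinghouse match shows enrollment that began before the
        NPSAS award year (defaulted here to the 2011-12 year for NPSAS:12)."""
        return any(start < award_year_start for start in nsc_enrollment_starts)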

B&B sample. The first B&B longitudinal cohort was identified in NPSAS:93 and consisted of students who received their bachelor’s degree in academic year 1992–93. NPSAS:93 provided the base-year data, and students were interviewed in an initial follow-up in 1994; this follow-up also included a collection of transcript data. The 1993 cohort was surveyed again in 1997 and 2003. The first transcript collection was conducted as part of B&B:93/94. The second B&B cohort was selected from NPSAS:2000, which became the base year for a single follow-up in spring 2001.

B&B:08 was the third cohort in the B&B series and the second to gather college transcript data on such a longitudinal sample. The B&B:08 sample consisted of students eligible to participate in the NPSAS:08 full-scale study who completed requirements for the bachelor’s degree in the 2007–08 academic year. The first follow-up study (B&B:08/09) involved two data collection components. First, postsecondary transcripts were collected from each of the NPSAS institutions where sample members completed their program requirements. It was followed by an interview focusing on plans after degree completion.

B&B:16 is the fourth cohort in the B&B series. The B&B:16 sample consisted of students eligible to participate in the NPSAS:16 full-scale study who completed requirements for a bachelor’s degree in the 2015–16 academic year. NCES contacted the B&B:16 cohort in 2017 for a follow-up survey and plans to do so again in 2020 and 2026.

B&B status is determined on the basis of multiple sources: student enrollment lists from institutions, student record collection, student interviewing, and transcripts (in B&B:93/94 and B&B:08/09).

Data Collection and Processing

Reference Dates. Data are collected for the financial aid award year, which spans from July 1 of one year through June 30 of the following year.
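
For example, any enrollment date can be mapped to its award year as follows (an illustrative helper, not part of NPSAS processing):

    from datetime import date

    def award_year(enrollment_date):
        """Return the financial aid award year containing a date; the award
        year runs July 1 through June 30."""
        start = enrollment_date.year if enrollment_date.month >= 7 else enrollment_date.year - 1
        return f"{start}-{(start + 1) % 100:02d}"

    assert award_year(date(2016, 3, 15)) == "2015-16"  # spring of the 2015-16 year
    assert award_year(date(2015, 7, 1)) == "2015-16"   # first day of that year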

Data Collection. NPSAS involves a multistage effort to collect information related to student aid. The first stage involves collecting applicant data from the U.S. Department of Education’s Central Processing System (CPS).

Another stage of data collection involves collecting information from the student’s records at the school from which he or she was sampled. Since NPSAS:93, these data have been collected through a computerized system, which facilitates both the collection and transfer of information to subsequent electronic systems. To reduce respondent burden, several data elements are preloaded into the records collection system prior to collection at the institution. These include student demographics, Student Aid Report (SAR) information on federal financial aid applicants, and nonfederal aid common to a particular institution. Institutional Coordinators are given the option of having their staff or contractor field data collectors perform the data collection. About 66 percent of the institutions in NPSAS:04, 74 percent in NPSAS:2000, and 57 percent in NPSAS:96 chose self-administration, using a computer-based program to provide student record data. In NPSAS:08, very few institutions (about 1 percent) chose the field interviewer option; approximately 63 percent chose self-administration, and 36 percent provided the student record data via electronic files (primarily large institutions or systems).

NPSAS:12 used four modes for student record abstraction: (1) Case Mode, in which institution staff entered data directly into the web-based system one student at a time, either by section or by student; (2) Grid Mode, in which institution staff entered data directly into the web-based system for multiple students at a time in a format resembling a grid; (3) Template Upload, in which institution staff downloaded an Excel template, entered data into it, then uploaded it back to the website; and (4) Data Files Upload, in which institution staff created data files following provided specifications. For NPSAS:12, 39 percent of the institutions keyed data into the web-based student record application via Case Mode or Grid Mode, 37 percent uploaded the Excel Template, and about 24 percent used the Data File Upload.

For NPSAS:16, student records could be completed in three modes via the Postsecondary Data Portal (PDP), a web-based student records interface: (1) Web mode, in which institution staff used drop-down boxes and text-entry fields to key data directly on the PDP website, one student at a time; (2) Excel mode, in which institutions downloaded a preformatted Excel spreadsheet template from the PDP, keyed or copied student data into a spreadsheet template offline, and then uploaded the completed template to the PDP website; and (3) Comma-separated values (CSV) mode, in which institutions downloaded customized file specifications from the PDP website, prepared data files offline according to the file specifications, and then uploaded completed files to the PDP website. In NPSAS:16, most institutions opted for the Excel mode (62 percent), 30 percent uploaded a CSV file, and the remaining 8 percent used the Web mode and entered data directly into the PDP student records interface.

In the student interview stage of data collection, information on family characteristics, demographic characteristics, and educational and work experiences and aspirations is obtained from students. Student and parent paper questionnaires were used to collect this information in NPSAS:87, but beginning with NPSAS:90, student and parent data were collected by computer-assisted telephone interviewing (CATI). Parent interviews, however, were not conducted after NPSAS:96. NPSAS:04 was the first administration of NPSAS to offer students the opportunity to participate by self-administered web surveys or by CATI, an approach that has continued in subsequent NPSAS administrations (i.e., NPSAS:08, NPSAS:12, and NPSAS:16).

The NPSAS:08 student interview contained seven sections, and was programmed for both self-administered web surveys and CATI. An abbreviated interview was developed that contained a subset of key items from the main interview. This version was used during refusal conversion toward the end of data collection. The abbreviated interview was also translated into Spanish for telephone administration to Spanish speakers with limited English proficiency.

The student interview included an online coding system used to obtain IPEDS information for postsecondary institutions (other than the NPSAS institution from which the student was sampled) that the student attended during the same year. After the respondent or interviewer provided the state and city in which the institution was located, the online coding system displayed the list of all postsecondary institutions in that location, and the respondent or interviewer could select the appropriate institution. Upon selection, the name of the institution, as well as selected IPEDS variables (institutional level, control), was inserted into the database.
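
The lookup step reduces to filtering the IPEDS universe on state and city; a minimal sketch, with field names assumed for illustration:

    def institutions_in(city, state, ipeds_universe):
        """Return the IPEDS institutions in a given city and state,
        mimicking the interview's assisted coding lookup."""
        return [inst for inst in ipeds_universe
                if inst["city"].lower() == city.lower() and inst["state"] == state]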

An assisted coding system was also developed to facilitate the coding of major/field of study into categories that can be mapped to values in NCES’s Classification of Instructional Programs (CIP).

The data collection design for student interviews has evolved over time. In NPSAS:2000, student interviews were conducted primarily by telephone, and occasionally in person, using CATI/CAPI technology. In NPSAS:04 and NPSAS:08, abbreviated interviews were developed to convert refusals toward the end of data collection, and an online coding system was used to obtain IPEDS information. NPSAS:96 differed from other cycles in that only a subsample of the initial student sample was selected for the interview stage (in order to reduce overall costs for the study).

The final stage of data collection involves retrieval of additional SAR data (for the academic year beyond the NPSAS year) from the Central Processing System (CPS); data on Pell Grant applications for the NPSAS year from the Pell Grant file; data on recipients of Academic Competitiveness Grants and SMART Grants; and loan histories of applicants for federal student loans from the National Student Loan Data System (NSLDS). All of these files are maintained by the U.S. Department of Education. Additional data for the NPSAS sample are obtained from other sources as well, including test score data from the ACT and College Board (SAT), enrollment data from the National Student Clearinghouse (NSC), and data from the Veterans Benefits Administration (VBA).

Editing. Initial editing takes place during data entry. The web-based data collection systems used for the student interview and student record collection have built-in quality control checks to notify users of invalid or out-of-range entries. For example, the student records collection system will notify the user of any student records that are incomplete (and the area of incompleteness) and any records that have not yet been accessed. A pop-up screen provides full and partial completion rates for institutional record collection. Data are subjected to edit checks for completeness of critical items.

Following the completion of data collection, all student record and interview data are edited to ensure adherence to range and consistency checks. Range checks are summarized in the variable descriptions contained in the data files. Inconsistencies, either between or within data sources, are resolved in the construction of derived variables. Items are checked for validity by comparing the student interview responses to information available in institutional records. Missing data codes characterize blank fields as don’t know/data not available; refused; legitimate skip; data source not available (not applicable to the student); or other.  
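
The flavor of these edits can be sketched as follows; the field names, valid ranges, and reserved code values are illustrative assumptions, with the code categories taken from the list above:

    MISSING_CODES = {-1: "don't know/data not available", -2: "refused",
                     -3: "legitimate skip", -4: "data source not available",
                     -9: "other"}

    def edit_record(record):
        """Apply simple range and consistency checks to one student record."""
        errors = []
        age = record.get("age")
        if age is not None and age not in MISSING_CODES and not 14 <= age <= 90:
            errors.append(("age", "out of range"))
        total, federal = record.get("total_aid"), record.get("federal_aid")
        if total is not None and federal is not None and total < federal:
            errors.append(("total_aid", "inconsistent: less than federal aid"))
        return errors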

Estimation Methods

Weighting is used to adjust NPSAS data to national population totals and to adjust for unit nonresponse. Imputation is used to compensate for item nonresponse and mitigate associated bias.

Weighting. For the purpose of obtaining nationally representative estimates, sample weights are created for both the institution and the student. Additional weighting adjustments, including nonresponse and poststratification adjustments, compensate for potential nonresponse bias and frame errors (differences between the survey population and the ideal target population). The weights are also adjusted for multiplicity at the institution and student levels and for unknown student eligibility.

In NPSAS:04 through NPSAS:16, the institution weight was computed first and then used as a component of the student weight. Student weights were calculated as the product of 9 weight components for NPSAS:16; NPSAS:12 used 12 weight components, NPSAS:08 used 10, and NPSAS:04 used 13. Each component represented either a probability of selection or a weight adjustment.
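
Whatever the number of components, the computation is the same: each student's weight is the product of inverse selection probabilities and adjustment factors. A one-line sketch (the example component values are invented):

    import math

    def analysis_weight(components):
        """A student's analysis weight: the product of inverse selection
        probabilities and adjustment factors."""
        return math.prod(components)

    # inverse institution probability x inverse within-institution student
    # probability x nonresponse adjustment x poststratification factor
    w = analysis_weight([1 / 0.25, 1 / 0.02, 1.08, 0.97])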

In NPSAS:2000, statistical analysis weights were computed for two sets of respondents: CATI respondents and other study respondents. These were calculated as the product of 13 weight components, again representing either a probability of selection or a weight adjustment.

In NPSAS:96, study weights were applied to students who responded to specified student record or CATI data items. Study and CATI weights were calculated as the product of 14 weight components. First-time beginning students (FTBs) whose first postsecondary institution was not the NPSAS sample institution were not included in BPS. To compensate for their exclusion, FTB weights were computed by making a final weighting class adjustment to the CATI weights by institution type. All adjustment factors were close to one, ranging from 1.00 to 1.02. The development of the student record weight components was similar to the development of the study and CATI weight components—except that the student record components applied to a different set of respondent data and did not include the CATI weight components.

Imputation. When the editing process (including logical imputations) is complete, the remaining missing values for all variables with missing data are statistically imputed in order to reduce the bias of survey estimates caused by missing data. Variables are imputed using a weighted sequential hot-deck procedure whereby missing data are replaced with valid data from donor records that match the recipients with respect to the matching criteria.
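
A simplified sketch of the weighted hot deck: within each imputation class, a recipient's missing value is replaced by a donor's value, with donors drawn proportionally to their analysis weights. The production procedure is sequential (pairing sorted donors and recipients so donor usage matches the weighted distribution); this random-draw version only illustrates the weighting idea, and the record structure is assumed:

    import random

    def weighted_hot_deck(records, item, class_vars, rng=None):
        """Impute missing values of `item` within imputation classes defined
        by `class_vars`; records are dictionaries carrying a weight."""
        rng = rng or random.Random(0)
        classes = {}
        for rec in records:
            classes.setdefault(tuple(rec[v] for v in class_vars), []).append(rec)
        for members in classes.values():
            donors = [r for r in members if r[item] is not None]
            if not donors:
                continue  # no donor here; a real system would collapse classes
            weights = [d["weight"] for d in donors]
            for rec in members:
                if rec[item] is None:
                    rec[item] = rng.choices(donors, weights=weights)[0][item]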

For NPSAS:16, missing data were imputed for all variables. The imputation procedures involved a four-step process. In the first step, missing values were logically imputed. In the second step, variables and groups of variables were prioritized for imputation based upon their level of missing data; those with low levels of missingness were imputed before those with greater levels of missingness. In the third step, an initial weighted sequential hot deck (WSHD) process was implemented. Finally, in the fourth step, a cyclic n-partition hot deck process was implemented to iteratively cycle through n-partition hot decks. For NPSAS:12, missing data were imputed for all variables included in the restricted-use derived file. After replacing missing data in those cases where values could be deduced with certainty based upon logical relationships among observed variables, the WSHD method was used to replace missing data by imputing plausible values from statistically selected donor cases.

In NPSAS:08 and NPSAS:04, variables requiring imputation were not all imputed simultaneously. However, some variables that were related substantively were grouped together into blocks, and the variables within a block were imputed simultaneously. Basic demographic variables were imputed first, using variables with full information to determine the matching criteria. The order in which variables were imputed was also determined to some extent by the substantive nature of the variables. For example, basic demographics (such as age) were imputed first; these were used to impute education variables (such as student level and enrollment intensity), which, in turn, were used to impute financial aid variables (such as aid receipt and loan amounts).

For variables with less than 5 percent missing data, the variables used for matching criteria were selected based on prior knowledge about the dataset and the known relationships between the variables. For variables with more than 5 percent missing data, a statistical process called Chi-Squared Automatic Interaction Detection (CHAID) was used to identify the matching criteria that were most closely related to the variables being imputed.

In NPSAS:2000, the remaining missing values for 23 analysis variables were imputed statistically; most of the variables were imputed using a weighted hot-deck procedure. To implement the weighted hot-deck procedure, imputation classes and sorting variables relevant to each item being imputed were defined. If more than one sorting variable was chosen, a serpentine sort was performed where the direction of the sort (ascending or descending) changed each time the value of a variable changed. The serpentine sort minimized the change in the student characteristics every time one of the variables changed its value.

The respondent data for five of the items being imputed were modeled using a CHAID analysis to determine the imputation classes. These items were parent income (imputed for dependent students only), student income (imputed for independent students only), student marital status, local residence, and a dependents indicator.

A CHAID analysis was performed on these variables because of their importance to the study and the large number of candidate variables available with which to form imputation classes. Also, for the income variables, trying to define the best possible imputation classes was important due to the large amount of missing data. The CHAID analysis divided the respondent data for each of these five items into segments that differed with respect to the item being imputed. The segmentation process first divided the data into groups based on categories of the most significant predictor of the item being imputed. It then split each of these groups into smaller subgroups based on other predictor variables. It also merged categories of a variable that were found insignificant. This splitting and merging process continued until no more statistically significant predictors were found (or until some other stopping rule was met). The imputation classes were then defined from the final CHAID segments.
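
A greatly simplified sketch of this segmentation, using a chi-squared test to pick the most significant predictor at each split. Real CHAID also merges insignificant categories and applies multiplicity adjustments, both omitted here, and all names are illustrative:

    import pandas as pd
    from scipy.stats import chi2_contingency

    def chaid_segments(df, target, predictors, alpha=0.05, min_size=50):
        """Recursively partition respondents into imputation classes;
        each terminal segment becomes one imputation class."""
        def best_predictor(data):
            best_p, best_var = alpha, None
            for var in predictors:
                table = pd.crosstab(data[var], data[target])
                if table.shape[0] < 2 or table.shape[1] < 2:
                    continue  # nothing to split on
                p_value = chi2_contingency(table)[1]
                if p_value < best_p:
                    best_p, best_var = p_value, var
            return best_var

        def grow(data, path):
            var = best_predictor(data) if len(data) >= min_size else None
            if var is None:
                return [path]  # no significant predictor: stop splitting
            segments = []
            for value, group in data.groupby(var):
                segments.extend(grow(group, path + [(var, value)]))
            return segments

        return grow(df, [])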

In NPSAS:96, some 22 analysis variables were statistically imputed. All variables, with the exception of the estimated family contribution, were imputed using a weighted hot-deck procedure. First, the respondent data for six key items were modeled using a CHAID analysis to determine the imputation classes. These items were race/ethnicity, parent income (for dependent students only), student income, student marital status, a dependents indicator, and number of dependents. Then, 21 items were imputed by the weighted hot-deck approach. The remaining 15 items were parent family size, parent marital status, student citizenship, student gender, student age, dependency status, local residence, type of high school degree, high school graduation year, fall enrollment indicator, attendance intensity in fall term, student level in last term, student level in first term, degree program in last term, and degree program in first term. Only four of these 15 items had more than 5 percent of their cases imputed: parent family size (18 percent), parent marital status (16 percent), high school degree (5 percent), and high school graduation year (5 percent).

Recent Changes

In prior NPSAS administrations, federal student loans older than 10 years as of the beginning of the study were excluded from cumulative borrowing and outstanding loan amount variables. In NPSAS:16, this was changed so that loans older than 10 years were included in these variables. As a result, cumulative borrowing estimates in NPSAS:16, especially for older student subpopulations, may differ from estimates for prior NPSAS administrations, with prior studies underestimating these amounts.

Prior to NPSAS:16, certain state grants that were administered by institutions (similar to how federal campus-based aid is administered) were classified as institutional grants. Because federal campus-based aid programs are classified by the source of funds and not by who administers the aid, this practice was changed in NPSAS:16, where campus-based state grants are now classified as state aid. This change in methodology mainly affects the aid of undergraduate students attending public institutions in California but led to larger population estimates of state grants and smaller estimates of institutional grants in NPSAS:16 compared with past NPSAS studies. To promote the analysis of trends in state and institutional aid over time, new state and institutional aid variables were added to the undergraduate files of NPSAS:96, NPSAS:2000, NPSAS:04, NPSAS:08, and NPSAS:12. These variables remove campus-based state grants from institutional grants and add them to state grant variables to be comparable with the NPSAS:16 methodology.
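
The comparability recode amounts to moving one component between two totals; a sketch with invented variable names:

    def reclassify_grants(record):
        """Move campus-based state grants from institutional grants into
        state grants, matching the NPSAS:16 classification."""
        cb = record["campus_based_state_grants"]
        record["state_grants_comparable"] = record["state_grants"] + cb
        record["institutional_grants_comparable"] = record["institutional_grants"] - cb
        return record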

In NPSAS:16, an administrative data match to VBA databases was conducted to obtain information on sampled students’ receipt of federal veterans’ education benefits and their military service. The VBA data were the sole source for federal veterans’ education benefit amounts, which include payments for tuition and fees, books and supplies, work-study, housing, and other education expenses. Estimates of federal veterans’ education benefits in prior NPSAS cycles were derived from self-reported amounts, amounts reported by the recipient’s NPSAS institution, and stochastic imputation, and were significantly lower on average than amounts in NPSAS:16. These earlier values may not include all the benefits included in the VBA data, particularly housing benefits, which were not explicitly requested from students or their institutions.

For NPSAS:12 and NPSAS:16, sample members were classified as study members if data were available for them on a set of key variables; these study members are the unit of analysis for those collections.

The NPSAS:12 student interview retained core data elements used in previous NPSAS student interviews and added new data elements developed in association with a redesign of the BPS longitudinal follow-up study. Additionally, 20 newly eligible institutions were included in the sample, using the newly available 2009–10 IPEDS IC, header, 12-Month and Fall Enrollment, and Completions components to create an updated sampling frame of current NPSAS-eligible institutions5.

NPSAS:04 included important new features in sample design and data collection. For the 2004 study, NPSAS and the National Study of Postsecondary Faculty (NSOPF) were conducted together under one contract: the 2004 National Study of Faculty and Students (NSoFaS:04). There has historically been a great deal of overlap in the institution samples for these two studies since the target populations for both involve postsecondary institutions. To minimize institutional burden, and to maximize efficiency in data collection procedures, the two studies were combined.

Another important change in NPSAS:04 was that it was designed to provide state-level representative estimates for undergraduate students within three institutional strata—public 2-year institutions, public 4-year institutions, and private nonprofit 4-year institutions—in 12 states that were categorized into three groups based on population size (four large, four medium, and four small): California, Connecticut, Delaware, Georgia, Illinois, Indiana, Minnesota, Nebraska, New York, Oregon, Tennessee, and Texas.

Also important was the inclusion in NPSAS:04 of an option for self-administration of the student interview via the Web. This option was provided in addition to CATI interviews, which were employed in past rounds of NPSAS. Regardless of completion mode, a single web-based instrument was employed.

NPSAS:08 was again conducted independently of the NSOPF study but carried along all of the technical innovations and design enhancements of prior rounds. It was also designed to provide state-level representative estimates for undergraduates within four institutional strata—public 2-year institutions, public 4-year institutions, private nonprofit 4-year institutions, and private for-profit degree-granting 2-year-or-more institutions. In NPSAS:08, state-level estimates were provided for California, Texas, New York, Illinois, Georgia, and Minnesota.

The most significant enhancement to NPSAS:2000 involved the development and implementation of a new web-based system for use in the student record abstraction process. This web-based software had an improved user interface compared to the NPSAS:96 system and addressed several of the student records collection issues raised during NPSAS:96 (e.g., insufficient computer memory, failures during diskette installation and virus scanning, and lack of information regarding institutions’ progress during data collection).

Other changes in NPSAS:2000 included adding a series of questions about financial aid, as a new way of obtaining information about financial assistance received from sources other than federal student aid; adding several new items intended to capture the increased use of technology among students; and adding a new eligibility requirement for postsecondary institutions: a signed Title IV participation agreement with the U.S. Department of Education during the NPSAS academic year.

NPSAS:96 introduced important new features in sample design and data collection. It was the first NPSAS to employ a single-stage institutional sampling design (no longer using an initial sample of geographic areas and institutions within geographic areas). This design change increased the precision of study estimates. NPSAS:96 was also the only NPSAS to select a subsample of students for telephone interviews and to take full advantage of administrative data files. Through file matching/downloading arrangements with the Department of Education’s Central Processing System, the study obtained financial data on federal aid applicants for both the NPSAS year and the following year. Through similar arrangements with the National Student Loan Data System, full loan histories were obtained. Cost efficiencies were introduced through a dynamic two-phase sampling of students for CATI, and the quality of collected institutional data was improved through an enhanced student records collection procedure. New procedures were also introduced to broaden the base of postsecondary student types for whom telephone interview data could be collected: the use of Telecommunications Device for the Deaf (TDD) technology to facilitate telephone communications with hearing-impaired students, and a separate Spanish translation of the interview for administration to students with limited English proficiency.

Future Plans

The next NPSAS data collection (NPSAS:18-AC) is scheduled for the 2017–18 academic year. Future NPSAS collections will continue to include a student interview every four years (NPSAS:16, NPSAS:20, NPSAS:24) to yield nationally representative data. In alternating cycles, an Administrative Collection (NPSAS:18-AC, NPSAS:22-AC, and NPSAS:26-AC) will be conducted in which only administrative data from the Department’s data systems and institutional student records will be compiled to yield state-representative data.

3 Puerto Rico was not included in the 1987 and 2012 administrations of NPSAS.
4 Puerto Rico was not included in the 2012 administration of NPSAS.
5 Puerto Rico was not included in the 2012 administration of NPSAS.
