
National Postsecondary Student Aid Study (NPSAS)



5. DATA QUALITY AND COMPARABILITY

Every major component of the study is evaluated on an ongoing basis so that necessary changes can be made and assessed prior to task completion. Separate training is provided for Student Records (previously referred to as CADE) and CATI data collectors, and interviewers are monitored during CATI operations for deviations from item wording and skipping of questions. The CATI system includes online coding of postsecondary education institution and major field of study, so that interviewers can request clarification or additional information at the time of the interview. Quality circle meetings of interviewers, monitors, and supervisors provide a forum to address work quality, identify problems, and share ideas for improving operations and study outcomes. Even with such efforts, however, NPSAS—like every survey—is subject to various types of errors, as described below.

Sampling Error

Because NPSAS samples are complex probability samples rather than simple random samples, variance estimation techniques that assume simple random sampling cannot be applied to these data. Two procedures for estimating variances, the Taylor Series linearization procedure and the Jackknife replicate procedure, are available for use with NPSAS:96 data. The Taylor Series linearization procedure and the balanced repeated replication (BRR) procedure are available on the NPSAS:2000 data files. The Taylor Series linearization procedure and the bootstrap replication procedure are available on the NPSAS:12, NPSAS:08, and NPSAS:04 data files.

Taylor Series. For NPSAS:96, analysis strata and replicates for three separate datasets were defined: all students, all undergraduate students, and all graduate/first-professional students. For NPSAS:2000, analysis strata and replicates for four separate datasets were defined: all students, all undergraduate students, all graduate/first-professional students, and all baccalaureate recipients. Beginning with NPSAS:04, analysis strata and replicates were defined for the combined set of all students.
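As an illustration of how these analysis strata and replicates (primary sampling units) enter a Taylor Series variance calculation, the sketch below estimates a weighted mean and its linearized standard error under a stratified, with-replacement design. It is a minimal sketch in Python; the column names are placeholders rather than actual NPSAS variable names, and production analyses would normally use survey software such as SUDAAN, Stata, or the R survey package.

```python
import numpy as np
import pandas as pd

def taylor_mean_and_se(df, y, w="weight", stratum="an_stratum", psu="an_psu"):
    """Taylor Series (linearization) variance of a weighted mean under a
    stratified, with-replacement PSU design. Column names are illustrative
    placeholders, not actual NPSAS variable names."""
    wsum = df[w].sum()
    rhat = (df[w] * df[y]).sum() / wsum              # weighted mean (a ratio)
    z = df[w] * (df[y] - rhat) / wsum                # linearized scores
    psu_totals = (df.assign(z=z)
                    .groupby([stratum, psu])["z"].sum()
                    .rename("zt")
                    .reset_index())
    variance = 0.0
    for _, grp in psu_totals.groupby(stratum):
        n_h = len(grp)                               # PSUs in this stratum
        if n_h > 1:
            dev = grp["zt"] - grp["zt"].mean()
            variance += n_h / (n_h - 1) * (dev ** 2).sum()
    return rhat, np.sqrt(variance)
```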

Jackknife. In NPSAS:96, the Jackknife analysis strata were defined to be the same as the analysis strata defined for the Taylor Series procedure. Based on the Jackknife strata and replicate definitions, seven replicate weight sets were created—one set for the CADE weights and three sets each for the study and CATI weights. The study and CATI sets included separate replicate weights for all students, undergraduates only, and graduates only.

Balanced Repeated Replication. The BRR procedure is an alternative variance estimation procedure that computes the variance from a balanced set of pseudo-replicates. To form the pseudo-replicates for BRR variance estimation, the Taylor Series analysis strata were collapsed. Because the numbers of Taylor Series analysis strata and primary sampling units differed for all students combined, graduates/first-professionals, and baccalaureate recipients, the collapsing was done independently for each group and therefore produced different results. Replicate weights were created for each of the two analysis weights (the study weights and the CATI weights), for a total of five replicate weight sets in NPSAS:2000. For the study weights, this included separate replicate weights for all students and for graduate/first-professional students only; for the CATI weights, this included separate replicate weights for all students, graduate/first-professional students only, and baccalaureates only.

Bootstrap. In NPSAS:08 and NPSAS:04, a vector of bootstrap sample weights was added to the analysis file to facilitate computation of standard errors for both linear and nonlinear statistics. These weights are zero for units not selected in a particular bootstrap sample; weights for other units are inflated for the bootstrap subsampling. The initial analytic weights for the complete sample are also included for the purpose of computing the desired estimates. The vector of replicate weights allows for computing additional estimates for the sole purpose of estimating a variance. The replicates in NPSAS:16, NPSAS:12, and NPSAS:08 were produced using methodology adapted from Kott (1998) and Flyer (1987) and those in NPSAS:04 were produced using a methodology and computer software developed by Kaufman (2004). NPSAS:16, NPSAS:12, and NPSAS:08 included 200 replicate weights.
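Whether produced by BRR or bootstrap methods, replicate weights are used in the same general way: the statistic of interest is recomputed once with each replicate weight, and the spread of the replicate estimates around the full-sample estimate yields the variance. Below is a minimal sketch assuming a file with a full-sample analysis weight column and 200 bootstrap replicate weight columns; the column names and the simple 1/R multiplier are illustrative assumptions, and the methodology report for a given administration should be consulted for the exact formula.

```python
import numpy as np
import pandas as pd

def replicate_se(df, y, full_weight="analysis_wt", rep_prefix="brep", n_reps=200):
    """Standard error of a weighted mean from a set of replicate weights
    (BRR or bootstrap). Column names and the 1/R multiplier are illustrative
    assumptions; see the NPSAS methodology reports for exact formulas."""
    def weighted_mean(wcol):
        return (df[wcol] * df[y]).sum() / df[wcol].sum()

    full_estimate = weighted_mean(full_weight)
    rep_estimates = np.array(
        [weighted_mean(f"{rep_prefix}{r}") for r in range(1, n_reps + 1)]
    )
    variance = ((rep_estimates - full_estimate) ** 2).sum() / n_reps
    return full_estimate, np.sqrt(variance)
```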


Nonsampling Error

Coverage Error. Because the institutional sampling frame is constructed from the IPEDS IC file, there is nearly complete coverage of the institutions in the target population. Student coverage, however, is dependent upon the enrollment lists provided by the institutions. In NPSAS:16, approximately 1,750 of the 1,990 eligible institutions provided enrollment lists. For NPSAS:12, approximately 1,480 of the 1,690 eligible institutions provided enrollment lists. In NPSAS:08, approximately 1,730 of the 1,940 eligible institutions provided student lists or databases that could be used for sample selection. A total of 1,360 of the 1,630 eligible institutions in NPSAS:04; 1,000 of the nearly 1,100 eligible institutions in NPSAS:2000; and 840 of the 900 eligible institutions in NPSAS:96 provided student lists or databases that could be used for sample selection.

Several checks for quality and completeness of student lists are made prior to actual student sampling. In NPSAS:96 and NPSAS:04, completeness checks failed if (1) FTBs (first-time beginning students) were not identified (unless the institution explicitly indicated that no such students existed) or (2) student level (undergraduate, graduate, or first professional) was not clearly identified. In NPSAS:2000 and NPSAS:08, completeness checks failed if (1) baccalaureate recipients/graduating seniors were not identified, (2) student level was not clearly identified, or (3) major fields of study or CIP codes were not clearly identified for baccalaureates.

Quality checks were performed by comparing the unduplicated counts (by student level) on institution lists with the nonimputed unduplicated counts in IPEDS IC files; these checks were conducted through the 2007–08 administration. Institutions failing the checks were contacted to rectify the problems before sampling began. In NPSAS:08, after any necessary revisions, all but seven of the lists submitted were usable for selecting the student sample; in NPSAS:04, all but two lists submitted were usable for selecting the student sample.
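As a rough illustration only, the sketch below flags institutions whose list counts diverge from IPEDS counts beyond a tolerance; the column names, merge key, and 50 percent threshold are hypothetical and are not the actual NPSAS acceptance criteria.

```python
import pandas as pd

def flag_count_discrepancies(list_counts, ipeds_counts,
                             level_cols=("n_undergrad", "n_graduate"),
                             tolerance=0.50):
    """Flag institutions whose unduplicated list counts differ from IPEDS
    counts by more than the tolerance. All names and the threshold are
    hypothetical, not actual NPSAS criteria."""
    merged = list_counts.merge(ipeds_counts, on="institution_id",
                               suffixes=("_list", "_ipeds"))
    failures = []
    for col in level_cols:
        ipeds_n = merged[f"{col}_ipeds"].clip(lower=1)   # avoid divide-by-zero
        rel_diff = (merged[f"{col}_list"] - merged[f"{col}_ipeds"]).abs() / ipeds_n
        failures.append(rel_diff > tolerance)
    merged["fails_check"] = pd.concat(failures, axis=1).any(axis=1)
    return merged[merged["fails_check"]]
```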

For NPSAS:12, institutions were contacted if quality and completeness checks failed for the requested list of data items, which included: student’s name; Social Security number; student ID number (if different than Social Security number); student level (undergraduate, masters, doctoral-research/scholarship/other, doctoral-professional practice, other graduate); FTB indicator; class level of undergraduates (first year, second year, etc.); date of birth; Classification of Instructional Program code or major; undergraduate degree program; high school graduation date (month and year); and contact information (local and permanent street address and telephone number and school and home e-mail address).

For NPSAS:16, once institutions submitted enrollment lists, NPSAS project staff performed several checks on the quality and completeness before selecting the student sample. These included verifying that institutions used a readable format and that key data needed for sampling and initial locating (e.g., baccalaureate indicator, SSN, contact information) were provided. If staff detected problems with lists during quality checks, they contacted institutions to resolve any issues.


Nonresponse Error. Unit nonresponse. For NPSAS:16, there were 1,750 respondent institutions from among the 1,990 eligible sample institutions (88 percent unweighted and 90 percent weighted). The weighted institution response rate was below 85 percent for 3 of the 11 institution types: public less-than-2-year; private for-profit, less-than-2-year; and private for-profit, 2-year institutions. The weighted response rates, by control and level of institution, ranged from 74 percent for private for-profit, less-than-2-year institutions to 95 percent for public 4-year, non-doctorate-granting, primarily subbaccalaureate institutions. Table NPSAS-1 provides a summary of response rates across NPSAS administrations.
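The unweighted and weighted rates cited throughout this section differ only in whether each eligible case counts equally or in proportion to its sampling weight. A minimal sketch of both computations, with hypothetical column names (a 0/1 or boolean response indicator and an institution weight):

```python
import pandas as pd

def institution_response_rates(sample, responded="responded", weight="inst_weight"):
    """Unweighted and weighted response rates over eligible sample institutions.
    One row per eligible institution; column names are hypothetical."""
    unweighted = sample[responded].mean()                      # equal weight per case
    weighted = (sample.loc[sample[responded] == 1, weight].sum()
                / sample[weight].sum())                        # weight-share responding
    return unweighted, weighted
```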

For NPSAS:12, there were 1,480 respondent institutions from among the 1,690 eligible sample institutions (88 percent unweighted and 87 percent weighted). The weighted institution response rate was less than 85 percent for five of the ten institution types: public less-than-2-year institutions; public 2-year institutions; private nonprofit less-than-4-year institutions; private for-profit less-than-2-year institutions; and private for-profit 2-year institutions. The weighted response rates, by type of institution, ranged from 78 percent for private nonprofit less-than-4-year institutions and private for-profit 2-year institutions to 92 percent for public 4-year non-doctorate-granting institutions. Because study members, not interview respondents, were the unit of analysis in NPSAS:12, only a study member weight was created; as a result, analysts could not compare nonresponse bias analyses after weight adjustments. For more information, see Wine, Bryan, and Siegel (2014).

For NPSAS:08, some 90 percent (weighted) of eligible sample institutions provided student enrollment lists. The total weighted student response rate was 96 percent. The institution participation rates were generally lowest among for-profit institutions and institutions whose highest offering is less than a 4-year program.

In NPSAS:08, institution completion rates for the student record abstraction phase of the study (referred to as CADE in studies prior to NPSAS:12) were 94 percent (weighted) for institutions choosing field-CADE, approximately 96 percent for institutions choosing self-CADE, and 98 percent for data-CADE (submitting data via electronic files). CADE completion rates varied by type of institution, ranging from 92 percent for private nonprofit less-than-2-year institutions to 100 percent for private nonprofit less-than-4-year institutions. Overall, the student-level CADE completion rate (the percentage of NPSAS-eligible sample members for whom a completed CADE record was obtained) was 96 percent (weighted). Weighted student-level completion rates ranged from 87 percent for private, nonprofit, less-than-4-year institutions to 99 percent for public, 4-year, non-doctorate-granting institutions. Weighted completion rates by student type were 96 percent for undergraduate and 97 percent for graduate and first-professional students.

Overall, 95,360 of approximately 132,800 eligible sample members (72 percent unweighted) completed either a full or partial NPSAS:08 student interview. The weighted response rate was 71 percent overall and ranged from 56 percent for private, for-profit, less-than-2-year institutions to 77 percent for public, 4-year, doctorate-granting institutions.

For NPSAS:12, the unweighted institution response rate was 88 percent, while the unweighted interview completion rate was 69 percent. Across institution level and control, student response rates ranged from 55 percent for private for-profit less-than-2-year institutions to 82 percent for private nonprofit 4-year doctorate-granting institutions. Potential FTBs were significantly less likely to respond than other undergraduates (60 percent compared with 73 percent), with graduate and professional students (83 percent) completing at a higher rate than undergraduate students (66 percent).

For NPSAS:16, the unweighted institution response rate was 93 percent, while the unweighted interview completion rate was 66 percent. Across institution level and control, student response rates ranged from 48 percent for private for-profit less-than-2-year institutions to 73 percent for private nonprofit 4-year doctorate-granting institutions. Potential B&B sample members were significantly more likely to respond than other undergraduates (67 percent compared with 63 percent), with graduate students (73 percent) completing at a higher rate than undergraduate students (64 percent).

Item nonresponse. Each NPSAS institution is unique in the type of data it maintains for its students. Because not all desired information is available at every institution, the CADE software allows entry of a “data not available” code. In NPSAS:08, item response rates for student record abstraction were very high overall. Two items had low response rates: marital status (46 percent) and additional phone numbers (17 percent); student records frequently lack these items. The other items had response rates ranging from 73 percent to just below 100 percent.

Missing data for items in the NPSAS:08 student interview were associated with several factors: (1) a true refusal to answer, (2) an unknown answer, (3) confusion over the question wording or response options, or (4) hesitation to provide a “best guess” response. Item nonresponse rates were based on the number of interview respondents to whom the item was applicable and of whom it was asked. Overall, item-level nonresponse rates were low, with only 23 items out of approximately 500 having more than 10 percent of data missing.
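That definition can be written as a short computation: the nonresponse rate for an item is the share of respondents to whom the item applied (and of whom it was asked) whose value is missing. A minimal sketch with hypothetical column names:

```python
import pandas as pd

def item_nonresponse_rate(respondents, item="item_x", applicable="item_x_applicable"):
    """Share of applicable, asked respondents with a missing value on the item.
    Column names are hypothetical, not actual NPSAS variable names."""
    asked = respondents[respondents[applicable]]   # restrict to applicable cases
    return asked[item].isna().mean()               # fraction missing among them
```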

For NPSAS:12, the item-level nonresponse analysis showed that of 364 interview items, 11 items had more than 10 percent missing data.

For NPSAS:16, the item-level nonresponse analysis showed that of the approximately 500 interview items, four items had more than 10 percent missing data.

Measurement error. Due to the complex design of NPSAS, there are several possible sources of measurement error, as described below.

Sources of response. Each source of information in NPSAS has both advantages and disadvantages. While students are more likely than institutions to have a comprehensive picture of education financing, they may not remember or have records of exact amounts and sources. This information may be more accurate in student financial aid records and government databases since it is recorded at the time of application for aid.

Institutional records. While financial aid offices maintain accurate records of certain types of financial aid provided at their own institution, these records are not necessarily inclusive of all support and assistance. They may not maintain records of financial aid provided at other institutions attended by the student, and they may not include employee educational benefits and institutional assistantships, which are often treated as employee salaries. These amounts are assumed to be underreported.

Government databases. Federal aid information can only be extracted from federal financial aid databases if the institution can provide a valid Social Security number for the student. It is likely that there is some undercoverage of federal aid data in NPSAS.

CATI question delivery and data entry. Any deviation from item wording that changes the intent of the question or obscures the question meaning can result in misinterpretation on the part of the interviewee and an inaccurate response. CATI entry error occurs when the response to a question is recorded incorrectly. Measures of question delivery and data entry are used for quality assurance monitoring. Due to ongoing monitoring of student telephone interviews, problems are usually detected early and the CATI interviewers are retrained, if necessary. Overall error rates in NPSAS:08 were low (typically below 2 percent) and within control limits.

Self-administered web survey. Self-administration introduces challenges not experienced with single-mode interviewer-administered surveys. For instance, in self-administration, interviewers are not able to clarify question intent and probe when responses are unclear. Surveys also require modifications to account for the mixed-mode presentation (i.e., self-administered and CATI) to maintain data quality and to make the interview process as efficient as possible for respondents. These considerations were addressed in the design of the survey, making the two modes as consistent as possible.   


Data Comparability

As noted above, important design changes have been implemented in NPSAS across administrations. While sufficient comparability in survey design and instrumentation was maintained to ensure that comparisons with past NPSAS studies could be made, institution eligibility conditions have changed since the inception of the NPSAS studies in three notable ways. First, beginning with NPSAS:2000, an institution had to be eligible to distribute federal Title IV aid to be included. Next, institutions that offered only correspondence courses (provided they were also eligible to distribute federal Title IV student aid) were first included in NPSAS:04. Finally, institutions in Puerto Rico were not included in the original 1987 administration, were added to the administrations between 1993 and 2008, were excluded from the 2012 administration, and were reinstated by NCES in NPSAS:16. Puerto Rican institutions enroll only about 1 percent each of undergraduate and graduate students nationally, but they have distinctive aid, enrollment, and demographic patterns that set them apart from institutions in the 50 states and the District of Columbia. Analysts wishing to compare NPSAS:12 with other NPSAS administrations may filter those data sets to exclude Puerto Rico.

Comparisons with IPEDS Data. Revised weights for NPSAS:08 were released simultaneously with the release of NPSAS:12 data. NCES has reweighted NPSAS:08 data to match weighting procedures used in NPSAS:12. At the time NPSAS:08 was originally released, only 2006–07 12-month enrollment counts were available from the Integrated Postsecondary Education Data System (IPEDS) for poststratification (weighting estimates to known population totals). The revised weights, which use 2007–08 12-month enrollment counts, provide better estimates in sectors where significant enrollment shifts occurred between 2006–07 and 2007–08. Prior NPSAS iterations did not use IPEDS 12-month enrollment counts for poststratification and, as such, are unaffected.
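Poststratification, as used here, rescales the analysis weights within each adjustment cell so that they sum to a known population total (in this case, IPEDS 12-month enrollment counts). Below is a minimal sketch of that adjustment; the cell variable, weight name, and control totals are illustrative placeholders, not the actual NPSAS weighting specification.

```python
import pandas as pd

def poststratify(df, control_totals, cell="sector", weight="base_weight"):
    """Rescale weights within each adjustment cell so they sum to a known
    control total (e.g., IPEDS 12-month enrollment). The cell variable,
    weight name, and control totals are illustrative placeholders."""
    cell_weight_sums = df.groupby(cell)[weight].transform("sum")
    adjustment = df[cell].map(control_totals) / cell_weight_sums
    return df.assign(poststrat_weight=df[weight] * adjustment)
```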

NCES recommends that readers not try to produce their own estimates (e.g., the percentage of all students receiving aid or the numbers of undergraduates enrolled in the fall who receive federal or state aid) by combining estimates from NPSAS publications with IPEDS enrollment data. The IPEDS enrollment data are for fall enrollment only and include some students not eligible for NPSAS (e.g., those enrolled in U.S. Service Academies and those taking college courses while enrolled in high school).


Table NPSAS-1. Weighted response rates for NPSAS administrations: Selected years, 1996 through 2016

Component                               Institution list      Student          Overall
                                        participation rate    response rate
NPSAS:96
  Student survey (analysis file¹)               91                  96             88
  Student survey (student interview)            91                  76             70
NPSAS:2000
  Student survey (analysis file¹)               91                  97             89
  Student survey (student interview)            91                  72             66
NPSAS:04
  Student survey (analysis file¹)               80                  91             72
  Student survey (student interview)            80                  71             56
NPSAS:08
  Student survey (analysis file¹)               90                  96             86
  Student survey (student interview)            90                  71             64
NPSAS:12²
  Student survey (analysis file¹)               87                  91             79
  Student survey (student interview)            87                  69             60
NPSAS:16²
  Student survey (analysis file¹)               90                  93             84
  Student survey (student interview)            90                  66             59

— Not available.
¹ The NPSAS analysis file contains analytic variables derived from all NPSAS data sources (including institutional records and extant data sources) as well as selected direct student interview variables.
² Study members, not interview respondents, are the unit of analysis in NPSAS:12 and NPSAS:16.
NOTE: The student interview response rates for NPSAS:96 and NPSAS:2000 are for CATI interviews only. The response rates for student interviews in NPSAS:04 include all interview modes.
SOURCE: Methodology reports for the National Postsecondary Student Aid Study. Reports are available at https://nces.ed.gov/pubsearch/getpubcats.asp?sid=013
