
Peggy G. Carr, Ph.D.
Commissioner of the National Center for Education Statistics

A Pragmatic Future for NAEP: Containing Costs and Updating Technologies
08/27/22

By the National Academies of Sciences, Engineering, and Medicine's Panel on Opportunities for the National Assessment of Educational Progress in an Age of AI and Pervasive Computation

The National Center for Education Statistics commissioned the National Academies of Sciences, Engineering, and Medicine to convene a panel of experts to provide advice on how "the use of digital technology and other major innovations could transform NAEP over the next ten years and beyond." The panel was also asked to consider the effects of proposed innovations on the cost of NAEP. The resulting report, A Pragmatic Future for NAEP: Containing Costs and Updating Technologies (March 2022), includes 20 recommendations spanning a wide range of topics, from operational aspects of NAEP such as item development, the assessment delivery platform, and test administration, to more structural facets such as program and cost management.

On balance, NCES welcomes the panel's recommendations. Many recommendations endorse existing efforts NCES has undertaken in recent years to modernize the program, and others point to promising directions for the program. Unfortunately, some of the recommendations rely on conclusions that are based on inaccurate assumptions. In particular, the panel misrepresents NAEP cost data in some instances. However, NCES welcomes the opportunity to consider the panel's report and recommendations in shaping the future of the NAEP program. Below, the panel's recommendations are grouped thematically and followed by initial responses from NCES.

Program Management, Cost, and Research and Development

RECOMMENDATION 2-1: "NCES and NAGB should develop clear, consistent, and complete descriptions of current spending on the major components of NAEP, including contract structure, contractual spending, and direct spending on government staff and other costs. These cost descriptions should be used to inform major decisions about the program to ensure that their long-term budgetary impact is supportable."

RECOMMENDATION 10-1: "NAGB and NCES should commission an independent audit of the program management and decision-making processes and costs in the NAEP program, with a charge and sufficient access to review the program's costs in detail. That audit should include proposed ways to streamline these processes."

RECOMMENDATION 10-2: "NCES should increase the visibility and coherence of the NAEP's research activities to help NAEP's stakeholders, as well as other assessment programs, understand the innovations the program is investigating and the lessons it is learning. The NAEP research program should have an identifiable budget and program of activities."

The panel reports that "despite cooperation from NCES, the panel could not obtain a clear picture of the overall budget for NAEP" (p. 11-1) and recognizes that the budget figures presented in the report are the "panel's best estimate" (p. 3-3). In fact, the panel's estimates of some costs are simply inaccurate. For example, the panel dramatically overestimates costs for management and planning and includes under management and planning costs that actually belong to analysis and reporting and other core activities.

NCES recognizes that NAEP's contracting and cost structures are complex and difficult to understand. Thus, the panel's recommendation for more clarity about NAEP's cost structures (Rec. 2-1) is well-taken. It is important that key stakeholders, such as the National Assessment Governing Board (NAGB) members who oversee policy for the program and the federal policymakers who, on behalf of the taxpayers, fund the program, can understand the foundations of the cost structures for the NAEP program. Accordingly, NCES will develop resources describing NAEP cost structures for stakeholders involved in the oversight of the program and convene a series of workshops for these stakeholders.

The panel also presents a confused picture of NAEP costs over time. Although, as shown in the report's Figure 2-1 (p. 2-8), the program's core budget, adjusted for inflation, has not increased since 2002, the panel suggests that NAEP costs are rising, threatening the viability of the program. In addition, the panel suggests there are inefficiencies caused by the joint oversight of the program between NAGB and NCES. The panel advocates for an independent audit of the program management and decision-making processes (Rec. 10-1) to improve efficiency and reduce costs (p. 11-11). NCES does not agree with this recommendation as stated. Overlapping responsibilities between NAGB and NCES for the form and content of the assessments and the release and dissemination of results were included by Congress in the legislation for NAEP to produce important safeguards for the independence of NAEP as a consistently fair and unbiased monitor of educational progress in the nation. However, we recognize that a thorough review of NAEP processes and costs within the framework required by law could identify efficiencies for NCES and NAGB to consider. Accordingly, NCES will commission an independent review of NAEP's cost structures, including recommendations for process improvements to reduce costs.

The panel also recommends establishing an identifiable research budget and program of activities for NAEP to support ongoing innovation (Rec. 10-2). We support this position. Currently, the program's research and operational budgets are not separated. This limits NCES' ability to fund essential research activities without compromising the operational activities. We also agree with the panel's call for the program to increase the visibility and coherence of its research activities (Rec. 10-2). NCES will produce a white paper describing the current and future plans to modernize the program. NCES will post the white paper on its website in a new research and development hub disseminating information on NAEP research and development efforts.

The Maintenance of Trend Measures and Potential for Integrating Subjects

RECOMMENDATION 3-1: "NCES should prepare a detailed plan and budget for the modernization of long-term trend NAEP, including the costs of creating post-hoc assessment frameworks, bridging between paper and digital assessment, maintaining trends, and ongoing costs after the bridge. Congress, NAGB, and NCES should then consider the value of a modernized and continued long-term trend NAEP in comparison with other program priorities. If continued, long-term trend NAEP should be renamed to better distinguish it from the trend data provided by main NAEP."

RECOMMENDATION 3-2: "NAGB and NCES should work both independently and collaboratively to implement smaller and more frequent framework updates. This work should include consideration of the possibility of broadening the remit of the standing subject-matter committees that already exist to include responsibility for gradual framework updates, participation in item model development, and working directly with both NAGB and NCES."

RECOMMENDATION 3-3: "NAGB should give high priority to consideration of integrating non-mandated subjects that are currently assessed separately (such as science and technology and engineering literacy), as well as the possibility of integrated pairs of subjects that include a mandated subject, such as reading and writing. This consideration should examine the possibility of preserving separate subject subscores in an integrated assessment that could maintain trends, along with potential benefits related to efficiency and cost, closer alignment with student learning, and synergy across subjects that has been found by research."

The panel makes a set of recommendations concerning how the program designs its assessments to be able to measure trends in achievement over time (Rec. 3-1 and 3-2), as well as potential new ways to integrate subject areas (Rec. 3-3). These are intriguing recommendations that NCES and NAGB, which is responsible for the content of assessments, will explore. NCES and the Governing Board staff will establish a working group to consider the short-term, mid-term, and long-term innovation strategies for the program, including these three recommendations concerning the content of the assessments.

Item Development, Scoring, Analysis and Reporting

RECOMMENDATION 4-1: "NCES should examine the costs and scope of work in the item development contract that are not directly related to item development and pilot administration and explore possibilities for changes that would reduce costs."

RECOMMENDATION 4-2: "NAGB and NCES should move towards using more structured processes for item development to both decrease costs and improve quality. This work should include drawing from the detailed achievement-level descriptions to specify intended inferences and claims, better integrating the work of framework development and item creation and carrying out critical aspects of review and quality control at the level of task models rather than at the level of individual items."

RECOMMENDATION 4-3: "NAGB should commission an analysis of the value and cost of different item types when multiple item types can measure the construct of interest. A full range of potential item types should be included in this analysis. The analysis should develop a framework for considering the tradeoff between value and cost. The value considered should include both the item's contribution to a score and its signal about the relevant components of the construct. The costs considered should include item development (both item creation and pilot administration), administration time, and scoring."

RECOMMENDATION 7-1: "NCES should continue its work to implement automated scoring on the reading and mathematics assessments for grades 4 and 8, with the item types that current scoring engines can score accurately and consistently. NCES should also consider the use of automated scoring on other assessments administered to state-level samples. In addition to benefiting from modest net reductions in costs, NCES should work to leverage the potential of automated scoring to improve the speed of reporting, increase the information provided about open-ended responses, and increase the consistency and fairness of scoring over time."

RECOMMENDATION 8-1: "NCES should devote a greater percentage of its budget for innovative analysis and reporting that will increase the use and understanding of NAEP's data, including finding ways to make the raw data available more quickly to researchers, improving the usability and sophistication of the NAEP Data Explorer, making process data more easily accessible, and expanding the availability and use of important contextual variables."

In its recommendations regarding item development, the panel calls for NCES to "examine the costs and scope of work in the item development contract" (Rec. 4-1), "move towards using more structured processes for item development to both decrease costs and improve quality," and draw from the achievement-level descriptions (Rec. 4-2). The panel also recommends an analysis of the value and cost of different item types (Rec. 4-3). Although the report misrepresents some item development costs, NCES agrees that studying item development processes to identify ways to improve efficiency would be beneficial. Similarly, although previous attempts at employing task models (Rec. 4-2) have proven costly, NCES is committed to reevaluating the use of task models with cost efficiency in mind. NAGB is also in the process of creating more detailed reporting achievement-level descriptions in reading and mathematics based on a study conducted earlier this year. NCES expects that these reporting achievement-level descriptions will better lend themselves to informing the item development process. As part of the independent review of NAEP's processes and costs (see Rec. 10-1), NCES will include a review of item development processes, as the panel suggests.

As mentioned above, many of the panel's recommendations support current NCES plans. Another example of this is Rec. 7-1, which endorses NCES' work to implement automated scoring in mandated assessments and recommends that NCES consider automated scoring in other assessments administered to state-level samples. NCES is currently planning to use automated scoring in Reading in 2024 and will continue to explore its use next in Mathematics and then in other subjects in coming years. NCES agrees with the panel that automated scoring will not only reduce scoring costs but also enhance and accelerate reporting, and possibly increase the consistency and fairness of scoring over time (p. 7-6). NCES will continue to pursue automated scoring.

Another recommendation that speaks to enhanced and faster reporting is Rec. 8-1, where the panel calls for more budget to be allocated to innovative analysis and reporting, including faster and easier dissemination of raw data and process data, improvements in the NAEP Data Explorer (NDE), and expansion in the use of contextual variables. NCES agrees that the program would benefit from devoting more of its budget to innovative analysis and reporting. Since its transition to digital reporting in 2013, NCES has leveraged technology, within current budget limits, to provide interactive and engaging data visualizations and dashboards in addition to the NDE. For example, state and district profile pages allow users to see how their jurisdictions' results compare with others across grades and subjects by clicking on maps and interactive tables. The achievement gaps dashboard allows users to see trends in gaps not only for traditional factors but also for cross-tabulated characteristics. Furthermore, the digital report cards showcase modern data visualizations to summarize results across jurisdictions, student groups, and contextual factors. NCES is always exploring ways to improve the dissemination of data. NCES will prioritize this recommendation as part of the efforts of the joint working group with the Board described in response to Rec. 3-3.

Modernizing Assessment Administration

RECOMMENDATION 5-1: "NCES should continue to develop its plan to administer NAEP using local school staff as proctors with online assessment delivery on local school computers, with development and bridge studies as needed to understand the feasibility and effects of this change in different contexts. This new model should be accompanied by adequate training and support of school staff, including tailored support for schools with more limited resources that may need NCES to provide proctors and equipment. NCES should also explore the use of flexible administration windows to allow schools to develop plans that accommodate local constraints on available equipment and consider appropriate ways to compensate local schools for their contributions to the administration, especially during the transition to this new model."

RECOMMENDATION 5-2: "Since a key component of moving to local administration will be the development of minimum requirements for equipment, operating systems, and connectivity, information about local devices, bandwidth, and administration conditions will have to be included in the data collection. Analysts should use statistical techniques that account for the effects of differences in devices and other local conditions to produce estimates that generalize across those differences. NCES should explore the use of random effects and other statistical techniques to produce estimates that reflect generalization across devices."

RECOMMENDATION 5-3: "NCES should review its estimates of the potential cost savings from local administration of the mandated assessments in reading and mathematics in grades 4 and 8. The estimated savings are unexpectedly small when local administration would largely eliminate the large current costs for traveling proctors and equipment, even after considering any offsetting additional costs for training and technological infrastructure. NCES should also consider the use of the local administration model for reducing costs of all other assessments, as well as the costs for the pilot administration of new items."

RECOMMENDATION 6-1: "NCES should continue to develop its plan to administer NAEP in longer sessions that allow for 90 minutes for the testing of cognitive items for each student. NCES should explore other models for using longer tests, in addition to its current plan. The decision to use longer tests should be based primarily on their potential to reduce testing burden by reducing the number of sampled students and to understand dependencies in proficiency across subjects, rather than being based on any long-term cost savings, which would be minimal with local test administration."

RECOMMENDATION 6-2: "NCES should commission an analysis of the tradeoff between NAEP's sample sizes and its statistical power in detecting differences in performance, including trends and gaps, and its ability to achieve minimum cell sizes for reporting on subpopulations. In particular, this analysis should consider the stated purposes of the National Assessment Governing Board to measure not only average scores, but also differences over time and between targeted subpopulations, and it should provide evidence about the level of precision required for these results to be meaningful to educators and policy makers. Evidence about meaningful levels of statistical power and minimum cell sizes for subpopulations should be directly related to the implications for NAEP's sample sizes and associated administration costs."

RECOMMENDATION 6-3: "NCES should not pursue adaptive testing for NAEP as a way of saving costs, but the agency should continue to investigate its use for its potential to improve the precision of statistical estimates and the test-taking experiences for low-performing students. NCES should also consider that no single approach to adaptive testing may fit all subjects and that some changes to assessment frameworks may be necessary to facilitate adaptive administration."

RECOMMENDATION 6-4: "Efforts to coordinate NAEP test administration with the international assessment programs sponsored by the National Center for Education Statistics should not be used as a strategy to reduce costs."

NAEP's digitally based assessments are currently administered by professionally trained NAEP staff using NAEP-owned tablets. The panel recognizes that NAEP's current model is intended to reduce the burden on schools while maintaining the level of standardization that is deemed essential for NAEP (p. 11-4).

NAEP's strategic plan for advancements in NAEP's digital platform will allow the program to adopt a more cost-efficient approach that leverages school-owned devices and local school staff as test administrators. The panel endorses NAEP's vision in this realm (Rec. 5-1) and recommends that information about local devices be included in the data collection to allow NCES to explore the use of statistical techniques to produce estimates that generalize across devices (Rec. 5-2).

NCES has a series of studies in place to investigate local administration with local devices. In addition, NCES will bring the suggested investigation of statistical techniques (Rec. 5-2) to its advisory groups, including the Design and Analysis Committee (DAC) and the NAEP Validity Studies (NVS) Panel.

The panel also recommends a review of potential cost savings from local administration of the mandated assessments (Rec. 5-3). The program could indeed benefit from a review of projected cost savings as we get closer to the next contract cycle (2024 through 2029), when most of these savings will be realized. NCES will continue to update these projected cost savings. The panel also recommends the local administration model for other assessments. NCES agrees with this and has included non-mandated assessments in its plans for local administration (Rec. 5-3).

Two other areas where the panel endorses NCES' current plans are the investigation of administering more than one NAEP subject to each student1 (Rec. 6-1) and the use of adaptive testing (Rec. 6-3). NCES fully agrees with the panel's view that the efforts to assess each student in multiple subjects "should be based primarily on its potential to reduce testing burden by reducing the number of sampled students and to understand dependencies in proficiency across subjects" (p. 11-7), and that adaptive testing should be investigated "for its potential to improve the precision of statistical estimates and the test-taking experiences for low-performing students" (p. 11-7). NCES is in agreement with Rec. 6-1 and 6-3 and will continue its plans to conduct studies in 2026 to investigate these two design innovations together. Depending on the results of the studies, the design changes will be implemented in 2028.

The panel also recommends that NCES commission an analysis of the tradeoff between NAEP's sample sizes and its statistical power (Rec. 6-2). In fact, NCES routinely conducts this type of analysis in advance of each assessment cycle. However, in response to this Rec. 6-2, NCES will incorporate these analyses in NAEP's Technical Documentation on the Web (TDW) starting with the 2022 assessments to make them more accessible.

Finally, the panel also recommends that efforts to coordinate NAEP with the international assessment programs should not be used as a strategy to reduce costs (Rec. 6-4). This recommendation is in line with our internal analysis. The value in coordination with the international assessments lies in potential linking studies based on shared samples or content, as the report also acknowledges (p. 6-7).

Technological Infrastructure

RECOMMENDATION 9-1: "NCES should regularly evaluate the software built by vendors or available in open-source libraries for its potential to meet the requirements of the different components of Next-Gen eNAEP. To support the viability of local administration of NAEP, the ease of installing, managing, and troubleshooting test delivery software should be a strong consideration in selecting the software to be used. Given the substantial ongoing expense associated with developing and maintaining a proprietary platform, Next-Gen eNAEP components should be custom built only if there are clearly large net benefits from doing so that have been identified by rigorous analysis. This decision should be made on a component basis, not as a single decision to build or buy all components. NCES should immediately carry out an evaluation with respect to any components of Next-Gen eNAEP that have not already been substantially developed, and then periodically thereafter. The platform development contract should provide the right incentives to make the best decision between building and buying each component."

RECOMMENDATION 9-2: "NCES should ensure that there is adequate internal and external expertise related to enterprise software development to support and oversee the development of Next-Gen eNAEP for both the NCES staff and the staff working for the platform development contractor. This software expertise is substantially different than expertise related to psychometrics and statistics."

RECOMMENDATION 9-3: "NCES should seek expert guidance from enterprise application developers and educational technologists who understand assessment technology platforms to evaluate the reasonability of the projected costs for the development of Next-Gen eNAEP."

During the 2010s, the transition of large-scale assessments such as NAEP, PISA, and TIMSS and state assessment systems from paper-and-pencil to digitally based administration necessitated the development of digital platforms to deliver the assessments. The NAEP program developed "eNAEP," a platform intended to deliver assessments and collect data with minimal burden on schools and across the range of digital capabilities of the nation's schools at that time. NCES is currently developing a new "Next-Generation" eNAEP integrated with other NAEP program processes to improve efficiency and reduce costs in areas ranging from item development to administration in the field.

The panel acknowledges NAEP's work on Next-Generation eNAEP (p. 11-9) but misinterprets some elements of the platform. NCES agrees with the panel's recommendation that Next-Gen eNAEP components should be custom-built only if rigorous analysis shows that there are clearly large net benefits to this approach (Rec. 9-1). However, NCES does not agree with the panel's position that the current eNAEP does not have a contemporary data architecture. NCES is already utilizing approaches that address these points in the new Next-Gen eNAEP development. Wherever possible, mature open-source and commercial components are integrated into the platform. In some cases, when no available technologies meet NAEP's technical or security requirements, or to integrate open-source and commercially licensed components, custom-built software solutions are developed following open standards and best practices for reusability and cost savings. Software built by vendors or available in open-source libraries is evaluated regularly. The Next-Gen eNAEP platform was developed based on informed decisions on what to build versus buy, considering development cost and efficiency as well as the total cost of ownership.

NAEP's transition to a digitally based administration reinforced the need to have the right expertise at the table when making decisions about technological innovations. NCES agrees with the panel's recommendations to ensure that there is adequate internal and external staff with software development expertise (Rec. 9-2) and to seek expert guidance from enterprise application developers and educational technologists who understand assessment technology platforms (Rec. 9-3). NCES is confident that the current vendor and internal staff have the requisite knowledge, skills, and experience.2 However, NCES needs more internal staff in this critical area. NCES has requested that priority be given to hiring additional staff with backgrounds in enterprise application development to support these complex technical activities (Rec. 9-2). Moreover, NCES will commission a system review of the Next-Gen eNAEP. In addition, NCES will establish an ongoing independent panel to set an agenda to advise, evaluate, and support continued improvement of eNAEP platforms and supporting systems (Rec. 9-3).

Summary of NCES' Plan of Action

Moving forward, NCES will continue to consider the recommendations in the final report with three major principles in mind: accountability, transparency, and modernization. NCES will be taking the following actions in the coming weeks to address the recommendations in the final report and to demonstrate its commitment to these principles:

  1. NCES will develop resources describing NAEP cost structures for stakeholders involved in the oversight of the program and convene a series of workshops for these stakeholders. (accountability and transparency)
  2. NCES will commission an independent review of NAEP's cost structures, including recommendations for process improvements to reduce costs. (accountability and transparency)
  3. NCES will produce a white paper describing the current and future plans to modernize the program. (transparency)
  4. NCES and the Governing Board staff will establish a working group to consider the short-term, mid-term, and long-term innovation strategies. (modernization)
  5. NCES will commission a system review of the Next-Gen eNAEP and will establish an ongoing independent panel to support continued improvement of eNAEP. (modernization)

NCES appreciates the NASEM panel's work and the resulting report. The panel's recommendations and the actions NCES plans to initiate or continue in response to these recommendations will ensure the program's continued leadership into the future.


1 Currently, in NAEP assessments each student receives two 30-minute cognitive blocks of items from a single subject (e.g., reading or mathematics).
2 The platform development contractor has extensive expertise related to enterprise, customer-responsive software development, cloud-based architecture, and agile development processes. This includes proven experience with direct development of other state and large-scale test delivery engines/platforms for clients.
