NAEP Technical Documentation

Data Validation and Resolution

Image Processed Documents

Questionnaire Editing

Each dataset produced by the scanning system contains data for a particular batch. These data have to be validated (or edited) for type and range of response. The data entry and resolution system can simultaneously process materials from all age groups and subject areas, including control documents and questionnaires, as the materials are submitted to the system from scannable media.

The data records in the scan file are organized in the same order in which the paper materials are processed by the scanner. A record for each batch header precedes all data records for that batch. The document code field on each record distinguishes the header record from the data records.
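The record ordering described above can be illustrated with a minimal sketch. It assumes each scan record has already been parsed into a dictionary with a `doc_code` key, and that a header is marked by the code `"00"`; both the key name and the code value are hypothetical and are not taken from the NAEP system.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

HEADER_CODE = "00"  # hypothetical document code that marks a batch header


def group_by_batch(records: Iterable[Dict]) -> Iterator[Tuple[Dict, List[Dict]]]:
    """Yield (header, data_records) pairs, preserving scan order."""
    header = None
    data: List[Dict] = []
    for rec in records:
        if rec["doc_code"] == HEADER_CODE:
            # A new header closes the previous batch, if there was one.
            if header is not None:
                yield header, data
            header, data = rec, []
        else:
            if header is None:
                raise ValueError("data record found before any batch header")
            data.append(rec)
    # Emit the final batch once the scan file is exhausted.
    if header is not None:
        yield header, data
```

Because the header always precedes its data records, a single pass over the file is enough and no sorting or lookahead is needed.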

When a batch header record is read, a pre-edit data record and an edit log entry are generated. As the program processes each record within a batch from the scan file, it writes the edited data records to the pre-edit file and records all errors on the edit log. The data fields on an edit log record identify each data problem by the batch sequence number, booklet serial number, section or block code, field name or item number, and data value. After each batch has been processed, the program generates a listing or online edit file of the data problems and resolution guidelines. For all non-image documents, an edit log listing is printed when the program terminates. For documents that were image scanned, image "clips" requiring edits are routed to online editing stations.
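As an illustration of the edit log record, the sketch below models one entry with the identifying fields listed above and formats a simple listing. The class name, attribute names, the extra `problem` description, and the listing layout are all assumptions made for the example, not the actual record format.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EditLogEntry:
    batch_sequence: int   # batch sequence number
    booklet_serial: str   # booklet serial number
    block_code: str       # section or block code
    field_name: str       # field name or item number
    value: str            # data value as scanned
    problem: str          # short description (assumed; layout is hypothetical)


def format_listing(entries: Iterable[EditLogEntry]) -> str:
    """Return one line per problem, ordered by batch and then by booklet."""
    ordered = sorted(entries, key=lambda e: (e.batch_sequence, e.booklet_serial))
    return "\n".join(
        f"{e.batch_sequence:04d} {e.booklet_serial} {e.block_code} "
        f"{e.field_name} {e.value!r}: {e.problem}"
        for e in ordered
    )
```

The same entries could feed either a printed listing or an online edit file; only the final formatting step would differ.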

As the program processes each data record, it first reads the booklet number and checks it against the session code for appropriate session type. Any mismatch is recorded on the error log and processing continues. The booklet number is then compared against the first three digits of the student identification number on the administration schedule. If they do not match, a message is written on the error log. All data values that are out of range are read "as is" but are flagged as suspect. All data fields that are read as asterisks (*) are recorded on the edit log or online edit file.
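The two booklet checks can be sketched as follows. The lookup table from session code to permitted booklet numbers, and the treatment of booklet numbers as three-character strings, are assumptions for illustration only. Both checks record a message and let processing continue, as described above.

```python
from typing import Dict, List, Set


def check_booklet(
    booklet_number: str,
    session_code: str,
    student_id: str,
    allowed_booklets: Dict[str, Set[str]],
) -> List[str]:
    """Return error-log messages for one record; an empty list means no mismatch."""
    errors: List[str] = []
    # Check 1: is this booklet appropriate for the session type?
    # allowed_booklets is a hypothetical table keyed by session code.
    if booklet_number not in allowed_booklets.get(session_code, set()):
        errors.append(
            f"booklet {booklet_number} not valid for session {session_code}"
        )
    # Check 2: compare with the first three digits of the student ID.
    if student_id[:3] != booklet_number:
        errors.append(
            f"booklet {booklet_number} does not match student ID prefix {student_id[:3]}"
        )
    return errors
```

Returning messages rather than raising an exception mirrors the behavior in the text: a mismatch is logged and the record is still processed.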

The blocks in a document are transcribed in the order that they appear in the document. Each block's fields are validated during this process. If a document contains suspect fields, the cover information is recorded on the edit log along with a description of the suspect data. The edited booklet cover is transferred to an output buffer area within the program. As the program processes each block of data from the dataset record, it appends the edited data fields to the data already in this buffer.

The program then cycles through the data area corresponding to the item blocks. The task of translating, validating, and reporting errors for each data field in each block is performed by a routine that requires only the block identification code and the string of input data. This routine has access to a block definition file that contains, for each block, the number of fields to be processed and, for each field, the field type (alphabetic or numeric), the field width in the data record, and the valid range of values. The routine then processes each field in sequence, performing the necessary translation, validation, and reporting tasks.
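A minimal sketch of a table-driven routine of this kind is shown below. The `FieldDef` structure, the in-memory `BLOCK_DEFS` table standing in for the block definition file, and the block code `"M1"` with its three fields are all invented for the example. The sketch covers only the step of cutting the input string into fields by width; the type and range stored with each field are carried along for validation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldDef:
    name: str   # field name or item number
    kind: str   # "A" for alphabetic, "N" for numeric
    width: int  # number of characters in the data record
    low: str    # lowest valid value (inclusive)
    high: str   # highest valid value (inclusive)


# Hypothetical stand-in for the block definition file.
BLOCK_DEFS: Dict[str, List[FieldDef]] = {
    "M1": [
        FieldDef("item01", "N", 1, "1", "4"),
        FieldDef("item02", "N", 2, "00", "12"),
        FieldDef("form", "A", 1, "A", "C"),
    ],
}


def split_block(block_code: str, data: str) -> List[Tuple[FieldDef, str]]:
    """Cut the input string into raw field values using the block's definition."""
    fields = BLOCK_DEFS[block_code]
    expected = sum(f.width for f in fields)
    if len(data) != expected:
        raise ValueError(f"block {block_code}: expected {expected} chars, got {len(data)}")
    result, pos = [], 0
    for f in fields:
        result.append((f, data[pos:pos + f.width]))
        pos += f.width
    return result
```

Keeping the layout in a table means the routine itself needs only the block code and the data string, which matches the description above.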

The first of these tasks checks for blanks or asterisks (*) in a critical field. These are recorded on the edit log or online edit file and processing continues with the next field. No action is taken on blank fields for multiple-choice items. Because the asterisk code indicates a double response, these items are written to the edit log for possible resolution by editing staff. Each field is then validated for range of response and any values outside of the specified range are recorded on the edit log or online edit file. The program uses the item-type code to distinguish further between constructed-response item scores and other numeric data fields.
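The order of these checks can be sketched for a single numeric field as follows. The `Status` names, the `critical` flag, and the decision to treat non-numeric characters as out of range are assumptions for illustration; the item-type distinction mentioned above is not modeled.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"                        # nothing to record
    BLANK = "blank critical field"   # recorded on the edit log
    DOUBLE = "double response"       # asterisk code, recorded on the edit log
    OUT_OF_RANGE = "out of range"    # recorded on the edit log


def check_field(raw: str, low: int, high: int, critical: bool = False) -> Status:
    """Classify one numeric field: asterisk and blank checks first, then range."""
    text = raw.strip()
    if "*" in text:
        return Status.DOUBLE
    if text == "":
        # Blanks matter only in critical fields; a blank multiple-choice
        # item needs no action.
        return Status.BLANK if critical else Status.OK
    if not (text.isascii() and text.isdigit()):
        # Assumption: unexpected characters are reported as out of range.
        return Status.OUT_OF_RANGE
    return Status.OK if low <= int(text) <= high else Status.OUT_OF_RANGE
```

A caller would write an edit log entry for every status other than `Status.OK` and then move on to the next field.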

Moving the translated and edited data field into the output buffer is the last task performed in this phase of processing. When the entire document has been processed, the completed string of data is written to the data file. When the program encounters the end of a file, it closes the dataset and generates a hardcopy (paper) edit listing for non-image edited fields in the questionnaires. Image scanned items that require correction are displayed at an online edit terminal.
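The buffer handling in this phase can be pictured with a short sketch. It assumes the edited cover and each edited block are already fixed-width strings and that one document becomes one newline-terminated record in the data file; the function name and the record terminator are hypothetical.

```python
from typing import Iterable, TextIO


def write_document(cover: str, blocks: Iterable[str], out: TextIO) -> str:
    """Assemble one document's record in a buffer and write it to the data file."""
    buffer = [cover]           # the edited booklet cover goes in first
    for block in blocks:
        buffer.append(block)   # each edited block is appended in document order
    record = "".join(buffer)
    out.write(record + "\n")   # assumption: one record per line
    return record
```

For example, calling `write_document("C001", ["12", "3A"], f)` on an open text file `f` writes the single record `C001123A`.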

Last updated 08 May 2008 (MH)
