Project Topic

Design a data repository that incorporates archival data and current data capture practices for program management and research

Project Background

An office of the National Institute for Occupational Safety and Health (NIOSH) was given a mandate from the U.S. Congress and the Centers for Disease Control and Prevention to implement a program of compensation for employees of atomic weapons manufacturers, as well as for individuals employed in various nuclear energy-related jobs. One portion of the data collection process was managed by an offsite team operating independently of the project’s IT management. This offsite group developed a data acquisition process that relied on Excel workbooks. The workbooks were adequate for single-case reviews, but were unsuited to viewing data in the aggregate or to gathering basic statistical information.

I worked with this government client to clarify their longer-term research needs and to establish a data model and set of requirements that would allow for the aggregation and restructuring of the data so that they would be compatible with, and more useful to, the ongoing activities of the organization.

Actions Taken

Initial steps focused on reviewing the current data acquisition process, which, while not optimal, was still in use as the production system for data transcription. The costs of both retooling the data acquisition process and retraining staff were strong disincentives, as was any interruption to the ambitious production quotas that were currently being met. These disincentives precluded a redesign of the front-end process. Consequently, an alternative strategy was adopted to keep the resulting database current: programs were developed to read several versions of the Excel workbooks on an ongoing basis, evaluate the data for accuracy, and store quality-checked data in a SQL Server database.
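The read–validate–store loop described above can be sketched roughly as follows. This is a minimal illustration, not the production code: the field names, validation rules, and sample rows are hypothetical, the rows stand in for data already parsed out of a workbook, and an in-memory SQLite database stands in for the SQL Server back end.

```python
import sqlite3

# Hypothetical validation rules; the real process checked many more fields.
def validate_row(row):
    """Return a list of error messages for one transcribed row."""
    errors = []
    if not row.get("case_id"):
        errors.append("missing case_id")
    if row.get("dose_rem") is not None and row["dose_rem"] < 0:
        errors.append("negative dose value")
    return errors

def load_rows(rows, conn):
    """Store quality-checked rows; return rejected rows for feedback."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases (case_id TEXT PRIMARY KEY, dose_rem REAL)"
    )
    rejected = []
    for row in rows:
        errors = validate_row(row)
        if errors:
            # Rejected rows were reported back to the data entry team
            # rather than silently dropped.
            rejected.append((row, errors))
        else:
            conn.execute(
                "INSERT OR REPLACE INTO cases VALUES (?, ?)",
                (row["case_id"], row.get("dose_rem")),
            )
    conn.commit()
    return rejected

# Rows as they might arrive from a parsed workbook (hypothetical data).
rows = [
    {"case_id": "A-001", "dose_rem": 1.2},
    {"case_id": "", "dose_rem": 0.4},       # fails validation: no case_id
    {"case_id": "A-002", "dose_rem": -5.0}, # fails validation: negative dose
]
conn = sqlite3.connect(":memory:")
bad = load_rows(rows, conn)
```

Only rows that pass every check reach the destination table; everything else is collected for the feedback report.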

Results

Key to the success of this project was the development of a data quality process that assured the data were accurate. The need for such a review process was particularly pressing because the workbooks lacked the data filtering and relational integrity controls that are routinely built into contemporary information systems.
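The kind of relational integrity control the workbooks could not enforce can be sketched as follows. The table and column names are hypothetical, and SQLite again stands in for SQL Server: a measurement referencing a nonexistent case, or carrying an impossible value, is refused by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in
conn.executescript("""
CREATE TABLE cases (case_id TEXT PRIMARY KEY);
CREATE TABLE measurements (
    case_id  TEXT NOT NULL REFERENCES cases(case_id),  -- relational integrity
    dose_rem REAL CHECK (dose_rem >= 0)                -- data filtering
);
INSERT INTO cases VALUES ('A-001');
""")

# A measurement tied to a known case is accepted.
conn.execute("INSERT INTO measurements VALUES ('A-001', 1.2)")

# A measurement referencing a case that does not exist is rejected.
try:
    conn.execute("INSERT INTO measurements VALUES ('ZZZ', 0.5)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In a spreadsheet, the orphan row would simply sit there; here the constraint violation surfaces immediately.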

An unexpected but welcome outcome of this project was an overall improvement in the ongoing data transcription effort. Errors detected by the data filtering programs were logged and reported back to the data entry team. This feedback, previously unavailable, allowed for rapid corrective action and improved the quality of the data entry effort in “real time”, as well as preventing “bad data” from entering the final-destination database tables.
Another objective of this project was to provide the client with statistical summaries of the aggregated data; that is, to provide information about the data being collected. Traditional queries, views, and SQL reporting tools were developed to establish various views and summaries of the data for the client’s use. As a result of these efforts, the government agency discovered that the data collected had several uses beyond their original scope and were of immediate benefit to other related projects.
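A summary view of the kind delivered to the client might look like the sketch below. The schema, facility names, and figures are invented for illustration, with SQLite standing in for SQL Server; the point is that a stored view turns case-level rows into reusable aggregate statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical case-level data as loaded from the workbooks.
CREATE TABLE cases (case_id TEXT PRIMARY KEY, facility TEXT, dose_rem REAL);
INSERT INTO cases VALUES
    ('A-001', 'Plant X', 1.2),
    ('A-002', 'Plant X', 0.8),
    ('A-003', 'Plant Y', 2.0);

-- A view giving per-facility counts and mean doses, queryable on demand.
CREATE VIEW facility_summary AS
    SELECT facility,
           COUNT(*)                AS case_count,
           ROUND(AVG(dose_rem), 2) AS mean_dose_rem
    FROM cases
    GROUP BY facility;
""")

summary = conn.execute(
    "SELECT * FROM facility_summary ORDER BY facility"
).fetchall()
```

Because the summary lives in the database as a view, it stays current as new quality-checked rows arrive, with no change to the reporting side.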