In anticipation of continued proliferation of polling data and the growth of user communities interested in Roper Center data and services, one of our central priorities over the past four years has been to review and redesign the workflows and backend technologies that support data processing and access. The ultimate goal of our infrastructure redesign initiative is to ensure the efficiency and long-term sustainability of efforts to expand our collections, provide new user services, and improve the user experience. Read on to learn about the unique challenges we've faced and the progress we've made in our efforts to improve our infrastructure.
Background
Since its founding in 1947, the Roper Center has been at the forefront of collecting, preserving, and promoting the responsible use of public opinion data. As polling proliferated in the United States and throughout the world, the Roper Center grew correspondingly. Center staff developed unique software and analytic tools that have become crucial for public opinion research. Examples include iPOLL, which facilitates discovery and retrieval of question-level summary statistics, and iPOLLPlus, which supports online analysis of survey questions. We are a small but dedicated organization that has consistently fueled front-end innovation and prioritized the development of new tools for our users. The Center has made significant progress in developing new search and data analysis services. However, the technical and procedural infrastructure that supports growth in new data acquisitions and data services is becoming outdated. Streamlining and modernizing this infrastructure will allow for faster, more accurate processing of surveys and strengthen the foundation on which front-end innovation can continue to flourish.
Challenges
Variety of Data Sources
A unique challenge we face at the Roper Center is that, unlike most data archives, we process summary statistics before we acquire respondent-level datasets. Public opinion organizations release their results in summary form before the embargo on their full datasets ends. It is during this window that we acquire questions and marginals for release in iPOLL. The public opinion research community depends on the timeliness and comprehensiveness that iPOLL promises. We must therefore continue to process questions and responses as soon as they are made public, that is, before the complete datasets are available. Once the datasets do become available, we process them and link each question to the dataset that shares its provenance.
Dual-Track Workflow
Historically, complete datasets and iPOLL summary statistics have been processed on separate tracks, using different formats and terminology. The focus of the Review and Redesign project is to build a processing system with a single, integrated, non-duplicative workflow. That workflow begins at the point of ingest and supports operations across all types of source material. This creates a unique opportunity for the Center to integrate statistical resources with contextual materials. We can identify and implement the best methods for documenting question-level information and linking it with statistical datasets and with the metadata that describe their contents and use.
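To make the "questions first, datasets later" linking step more concrete, here is a minimal Python sketch of how a single ingest store might hold questions that arrive ahead of their datasets and attach them once the dataset shows up. The names `Question`, `Dataset`, `IngestStore`, `ingest_question`, and `ingest_dataset` are hypothetical illustrations, not the Roper Center's actual schema. The sketch also assumes, for simplicity, that each study has exactly one dataset and that a shared study identifier is what establishes common provenance.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Question:
    question_id: str
    study_id: str  # provenance: the study this question was released from
    text: str
    dataset_id: Optional[str] = None  # set once the full dataset is ingested


@dataclass
class Dataset:
    dataset_id: str
    study_id: str  # same provenance key as the questions it will link to


@dataclass
class IngestStore:
    """Hypothetical single point of ingest for questions and datasets."""
    questions: dict[str, Question] = field(default_factory=dict)
    datasets: dict[str, Dataset] = field(default_factory=dict)

    def ingest_question(self, q: Question) -> None:
        # Questions and marginals usually arrive while the dataset is embargoed.
        self.questions[q.question_id] = q
        # If the study's dataset has already been ingested, link right away.
        for ds in self.datasets.values():
            if ds.study_id == q.study_id:
                q.dataset_id = ds.dataset_id
                break

    def ingest_dataset(self, ds: Dataset) -> list[str]:
        """Store a dataset and link every unlinked question from the same study."""
        self.datasets[ds.dataset_id] = ds
        linked = []
        for q in self.questions.values():
            if q.study_id == ds.study_id and q.dataset_id is None:
                q.dataset_id = ds.dataset_id
                linked.append(q.question_id)
        return linked
```

For example, after ingesting two questions from study `S1` and one from `S2`, calling `ingest_dataset(Dataset("D1", "S1"))` returns `["Q1", "Q2"]` and leaves the `S2` question unlinked until its own dataset arrives. Because both kinds of material pass through one store, the link can be made in either arrival order. This reflects the idea of a single, non-duplicative point of ingest rather than two parallel tracks.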
Progress
Workflow Review
Thanks to funding from the Robert Wood Johnson Foundation, we brought in Ann Green, Digital Lifecycle Researcher & Consultant and a member of the Roper Center Board of Directors, to consult on this project. Green learned the intricacies of both processing workflows. She worked with staff to standardize our terminology and develop a single-stream processing workflow.
Digital Curation System
The next step was to begin designing a Digital Curation System database to support the implementation of our new single-stream workflow. The central function of the database is to serve as a single, centralized point of ingest for all incoming archival materials. Within this ingest unit, we will manage methodological and citation information about each study in our holdings. We will also manage the published releases, datasets, and questions from each study. For more technical information about the current version of the Digital Curation System, which is under ongoing development, see the presentation (.pptx) the Roper Center gave at IASSIST 40 in June 2014.
Preservation Planning
Another key component of the infrastructure redesign project is the formalization of our preservation processes and planning. Our core mission is to ensure that public opinion data remain accessible to depositors and secondary users in perpetuity. Digital curation best practices call for developing and maintaining a formal, stand-alone preservation policy to support this critical responsibility. We have been hard at work identifying and documenting the activities we perform to ensure that the public opinion data we archive remain accessible to researchers over the long term. Learn more about all of the work we do to guarantee data preservation by reviewing our Digital Preservation Policy.
Looking Forward
Now that the core ingest unit is in production, we will focus on developing units that support acquisition management and question-level and variable-level processing. We will also continually improve the system's quality assurance and workflow management functionality. We will continue to build out and improve the Digital Curation System over the next two to three years while ensuring uninterrupted access to our holdings throughout its development. The Review and Redesign of Roper Center Infrastructure project is made possible by funding from the Robert Wood Johnson Foundation, Grant Number 68713.