Since moving to Cornell in late 2015, the Roper Center has been rebuilding its archival structure and replacing old systems with new technology. At the heart of these improvements is a restructuring of Roper Center metadata.
Metadata is a term commonly used by archivists and librarians—and rarely by anyone else. But people use metadata all the time, whether they are aware of it or not. Metadata is data about data, information about information, and good metadata makes for good user experience.
Imagine a person watching a new film based on a popular novel. Inspired by the viewing, he goes to Amazon to buy a copy of the book and searches for the title. He sees several results, including entries for the movie version (not yet available on DVD or Blu-ray), a movie tie-in version of the book, and several editions of the original book, including an anniversary edition with an introduction written by a famous author and several translated editions. Because he wants a book, he clicks on an entry for an English hardcover edition. The amount of information provided is staggering: the number of pages, the year the book was originally published and the year of the edition on offer, the author, the publisher, the illustrator, and a synopsis. Reviews might be provided as well. When he chooses, frugally, to consider a particular used copy available from a secondhand seller, he is able to see information about the condition of the book, the presence of a dust jacket, shipping fees, and more.
Each of those pieces of information is an example of metadata.
Of course, researchers looking for datasets need different information than consumers searching for novels do. Roper Center users need to know when studies were conducted, who conducted them, and what methodologies they used. Sometimes metadata, such as a poll’s topics, is necessary for effective searching, and sometimes it is useful for filtering search results. For example, a researcher might want to view only studies available in their preferred file format. Sometimes, like a consumer who rejects a used copy of a book because there is highlighting in the text, a researcher doesn’t know how important certain information is until it is presented to them. Response rates, limitations on sampling coverage, or the presence of two different weighting variables in the data might be essential to understanding how to use a poll’s data.
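To make the filtering idea concrete, here is a minimal Python sketch of narrowing a list of studies to those offered in a preferred file format. The `Study` fields, the `filter_by_format` helper, and the sample records are all invented for illustration; they are assumptions, not the Roper Center's actual schema or search code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    # Hypothetical fields for illustration only; not an actual archive schema.
    title: str
    organization: str
    year: int
    file_formats: frozenset  # lowercase format names, e.g. {"csv", "spss"}


def filter_by_format(studies, wanted_format):
    """Return only the studies that offer the requested file format."""
    wanted = wanted_format.lower()
    return [s for s in studies if wanted in s.file_formats]


# Invented sample records, not real studies.
studies = [
    Study("Example Poll A", "Example Org 1", 2012, frozenset({"spss", "csv"})),
    Study("Example Poll B", "Example Org 2", 2014, frozenset({"ascii"})),
    Study("Example Poll C", "Example Org 1", 2016, frozenset({"csv", "stata"})),
]

csv_titles = [s.title for s in filter_by_format(studies, "CSV")]
print(csv_titles)
```

The filter is only possible because file format is recorded as its own piece of metadata on each study; if it were buried in a free-text description, the same question would require reading every record.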
To improve usability, Roper is making the metadata on its collection more comprehensive, more granular, and more standardized. More comprehensive metadata means collecting more information and, eventually, displaying that information on study listings. More granular metadata means breaking information down into its smallest units so that each unit can be used on its own. Anyone who has worked with a poorly designed spreadsheet, such as one in which first, middle, and last names are all entered in the same column, has had experience with information that was not usable because it was not granular enough.
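The spreadsheet problem can be sketched in a few lines of Python. The two tiny tables below are invented sample data, and the column names (`name`, `first`, `middle`, `last`) are assumptions chosen for illustration. The sketch contrasts guessing a last name from a single combined column with reading it directly from a dedicated column.

```python
import csv
import io

# Invented sample data, not taken from any real dataset.
COARSE = "name\nAna Garcia Lopez\nJohn Smith\n"
GRANULAR = "first,middle,last\nAna,,Garcia Lopez\nJohn,,Smith\n"


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


coarse_rows = read_rows(COARSE)
granular_rows = read_rows(GRANULAR)

# With one combined column, the last name has to be guessed,
# here by taking the final word, which splits a two-word surname.
guessed_last = [row["name"].split()[-1] for row in coarse_rows]

# With granular columns, the last name is read directly, with no guessing.
exact_last = [row["last"] for row in granular_rows]

print(guessed_last)
print(exact_last)
```

In this invented example the guess drops half of a two-word surname, while the granular table preserves it. The same reasoning applies to poll metadata: a field that holds exactly one kind of information can be sorted, searched, and filtered reliably.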
Standardizing metadata means looking beyond Roper to the communities it serves. While the American Association for Public Opinion Research (AAPOR) Transparency Initiative informs the Roper Center’s metadata collection practices, the Data Documentation Initiative (DDI) metadata standard defines how that information is captured and stored. By adopting DDI, Roper is making it easier for researchers to work with Roper data in conjunction with data from other archives.
Last week, representatives from the Roper Center attended the North American Data Documentation Initiative (NADDI) conference in Washington, D.C., a meeting of data archivists and researchers focused on the use of DDI metadata standards in the social sciences. We were proud to present on our efforts to improve Roper metadata, a project that will improve the discovery and usability of polls in the Roper archive for all researchers.