Using digital tools for archiving historical and contemporary data

by Magdalena Olszanowski and Sarah Ring

Virtual Daylighting is a creative investigation and virtual presentation of the “lost rivers” that once traversed the island of Montreal. These waterways have been buried over time by the processes of industrialization and urbanization and are all but extinct from our landscape and our histories.

Our approach to daylighting this history is multimodal, using digital and analog methods that we established as we went along. Specifically, our methods have been shaped by the structure of the platform we are using, ResourceSpace. We took advantage of brick-and-mortar and digital archives, both local ones, like neighbourhood historical societies, and national ones, like Library and Archives Canada. Some of our goals for the project are: 1) to test the potential of new forms of digital archiving; 2) to make a digital archive accessible for a variety of users and uses.

The Platform: ResourceSpace
To manage all our bilingual archival documents (photo, audio, video, text) we turned to ResourceSpace (RS), a web-based, open source digital asset management system designed to give content creators easy and fast access to print- and web-ready assets. We started using RS while it was being tailored for our project, and it is continuously being updated for our needs.
Unlike Dropbox, whose familiar folder/subfolder structure mirrors how we organize our own work and physical files, RS structures public access to its database in three ways: by keyword and description, by broader “collections,” or by one of the four categories the developer has specified (photo, video, audio, document).

Collecting and Sorting Material
Images and textual documents were pulled from a variety of online and traditional archival sources, all of which used different methods of information organization. As such, researchers had to navigate each source (whether a religious institution, library or local community organization) according to how it had initially classified its data. In addition, at the outset, there was no way of estimating how much information (primary or secondary sources) existed. Though the short-term focus was to collect documents pertaining to the five bodies of water for the app, the long-term focus was to build a more comprehensive database of all “lost rivers” on the Island of Montreal, to which future researchers could contribute knowledge.
Once located, images had to be deemed relevant to the researcher’s purposes. Throughout the image retrieval process, personal conceptions of what information was relevant (was a contemporary photograph of a street over which a river was built pertinent? was any image of sewer work?) were necessary in determining what would make the cut and be digitally archived. Given the nature of the final digital medium in which the sources were going to be used, visually and aesthetically pleasing maps, photographs and images were privileged over textual documents. Thus, the way in which these sources were envisioned to be archived and used in digital form determined in part their historical relevance. In this way we continued, reinforced and magnified, both consciously and unconsciously, the selective logic of the archives we drew on: what did others deem important? how did they organize data? what was ignored and discarded?
Challenges arose not only from the variety of archival sources from which to draw but also from the variety of people involved in collecting and sorting documents. Past researchers had used a hodgepodge of paper maps, binders, and online documents, which now needed to be organized and consolidated into a unified digital archive. These resources were sometimes not properly cited, making it difficult to track down the copyright information necessary to use items in both the Lost Rivers documentary and the iPhone application.
Once we determined the usefulness of the images, they had to be collected: images found in books were photographed with mobile phones, while higher resolution images needed to be ordered, sometimes for a fee. Maps, photographs, images, and newspaper articles were digitized and classified into various and sometimes overlapping collections; for example, a given document could be sorted and cross-referenced into “map,” “flood” and “Rivière Saint-Pierre.”
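To make the idea of overlapping collections concrete, here is a minimal sketch in Python of how one document can sit in several collections at once and still be found through any of them. It is an illustration only, not how RS stores collections internally; the document IDs are invented, and only the collection names come from our example above.

```python
from collections import defaultdict

def build_collection_index(documents):
    """Map each collection name to the set of document IDs filed under it."""
    index = defaultdict(set)
    for doc_id, collections in documents.items():
        for name in collections:
            index[name].add(doc_id)
    return index

# Hypothetical records: the document IDs are invented for illustration.
documents = {
    "doc-001": {"map", "flood", "Rivière Saint-Pierre"},
    "doc-002": {"map"},
    "doc-003": {"flood", "Rivière Saint-Pierre"},
}

index = build_collection_index(documents)

# Cross-referencing is a set intersection: documents that are both maps and flood-related.
maps_of_floods = index["map"] & index["flood"]
print(sorted(maps_of_floods))
```

Because a document is filed by reference rather than copied into a folder, adding it to a new collection never duplicates the file.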

Archiving through the Keyword
One of the key components of online digital archiving is ease of access to potentially infinite amounts of data, dependent on bandwidth of course. More specifically, the way to easily view large amounts of data is through the metadata keyword. The keyword is essential to categorizing, storing, and retrieving data. If a database holds a large amount of archival material without a consistent and rigorous initial taxonomy, it is useless. Material that cannot be searched for might as well not exist, unless of course we assume digital archivists will use a digital archive as they would a physical one, document by document, which defeats the point. Thus, best practices are crucial at this stage. Thinking through how to organize all the various documents, many with incomplete citation information, took months and continued to evolve as we started to upload the materials to RS. We started with an initial set of keywords based on larger themes we knew would come up in the project (e.g., river name, flood, map, archive from which the document was procured). But more specific keywords emerged as we archived more and more material; previously uploaded files then had to be revised using these new keywords. In addition, we had to decide whether to use the plural or singular of nouns, how much metadata to include, and the parameters of each keyword. For example, we had many conversations concerning the use of “archival” and “contemporary” as keywords to differentiate between images. The difficulty lay not only in what we now define as archival or contemporary; we also had to envisage how future users of the database will differentiate between these two terms. Classifying archival images from the 18th, 19th, and early 20th centuries was easy. They are clearly “historical,” but what about the materials from the 1980s and 1990s and so on? Those are technically “contemporary” because we assume them to be more recent, but are they?
How long will they be signified as contemporary?
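Two of the decisions described above, singular versus plural and the shifting meaning of “contemporary,” can be sketched in a few lines of Python. This is a hedged illustration rather than a description of RS: the variant list, the function names, and the idea of deriving the period label from a stored year with an adjustable cutoff are all our own assumptions.

```python
# Hypothetical controlled vocabulary: variant spelling -> canonical keyword.
CANONICAL = {
    "maps": "map",
    "floods": "flood",
    "flooding": "flood",
    "sewers": "sewer",
}

def normalize_keyword(raw):
    """Lowercase, collapse whitespace, and map known variants to one canonical form."""
    cleaned = " ".join(raw.strip().lower().split())
    return CANONICAL.get(cleaned, cleaned)

def normalize_keywords(raw_keywords):
    """Normalize a list of keywords, dropping duplicates while keeping first-seen order."""
    seen = []
    for raw in raw_keywords:
        keyword = normalize_keyword(raw)
        if keyword and keyword not in seen:
            seen.append(keyword)
    return seen

def period_label(year, cutoff_year):
    """Derive 'archival' or 'contemporary' from a stored year and an adjustable cutoff."""
    return "archival" if year < cutoff_year else "contemporary"

print(normalize_keywords(["Maps", "map", " Floods ", "Rivière Saint-Pierre"]))
print(period_label(1985, cutoff_year=1950))
print(period_label(1985, cutoff_year=2000))
```

Storing the year and computing the label means that a photograph from 1985 does not need to be re-tagged when our sense of the contemporary moves; only the cutoff changes.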
RS does not initially provide any instructions or collection of keywords to choose from, and this was our biggest hurdle. The RS keyword search box searches the descriptions of the material, which is useful for newcomers to the database. Also, at the bottom of each file’s main page there is a selection of “related keywords” that have been algorithmically decided based on all the text attached to the file.
We acknowledge that the database may have to be iteratively re-taxonomized as new material is added and in response to user feedback.
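Re-taxonomizing in practice means revisiting records that were tagged under an older scheme. The sketch below, in Python, shows the general shape of such a pass: replacing one keyword with another across every record and reporting which records changed. The record IDs and the in-memory dictionary are hypothetical stand-ins, not the RS data model.

```python
def retag(records, old, new):
    """Replace keyword `old` with `new` in every record; return the IDs that changed."""
    changed = []
    for record_id, keywords in records.items():
        if old in keywords:
            keywords.discard(old)
            keywords.add(new)
            changed.append(record_id)
    return changed

# Hypothetical records keyed by an invented ID, each with a set of keywords.
records = {
    "doc-001": {"map", "historical"},
    "doc-002": {"flood", "archival"},
    "doc-003": {"historical", "flood"},
}

# Suppose we settle on "archival" and retire "historical".
changed = retag(records, "historical", "archival")
print(changed)
```

Returning the list of changed records leaves a trail that can be kept alongside the taxonomy, so future researchers can see when and why a keyword was retired.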

Digital Archiving
Over time the Virtual Daylighting project has made use of various online digital platforms and storage methods. Currently our main archiving databases are stored and shared using RS, Dropbox, and Google Drive. Initially some of the collected data was archived on CDs and hard drives and then partially stored on Teambox, another collaborative tool. Researchers quickly recognized that moving between several platforms was confusing and led to files being dispersed and categorized in many different ways, depending on each platform’s framework. Eventually, we were able to pare down, move away from Teambox, and archive all the content from the CDs and various backup hard drives onto RS. Nonetheless, we decided to keep the original copies of all the multi-media documents in a shared Dropbox folder, since it can store very large files and consists of subfolders, a structure we are used to; we sometimes refer back to it when needing to reference documents for other projects. After classifying data to the best of our abilities, all the while thinking about future uses, we still wonder: will our methods be clear to people who will work on the project once we leave? If the RS database has all the original files and is on a secure backed-up server, why do we still keep original files on Dropbox that may or may not have corresponding names? Is this not how archives become unwieldy and why digital management tools exist? What does it mean to move between two digital sharing platforms? What are the benefits of using such a system for large-scale multi-media collection?
We don’t have clear answers to these questions yet, and welcome any thoughts.
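One narrow part of the problem, files on Dropbox whose names may not correspond to their copies in RS, can at least be checked mechanically. The Python sketch below pairs files across two local folders by content hash rather than by name. It assumes that both sets of files are available as local folders (for example a synced Dropbox folder and an export of the RS originals), which is our assumption and not something RS is known to provide in this form, and it only matches byte-identical copies.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to handle large media."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def index_by_digest(root):
    """Map each content digest to the list of files under `root` with that content."""
    index = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            index.setdefault(file_digest(path), []).append(path)
    return index

def match_copies(folder_a, folder_b):
    """Pair identical files across two folders regardless of their names.

    Returns (matched, only_in_a): `matched` maps a digest to the paths found
    in each folder; `only_in_a` lists files in folder_a with no copy in folder_b.
    """
    a = index_by_digest(folder_a)
    b = index_by_digest(folder_b)
    matched = {digest: (a[digest], b[digest]) for digest in a.keys() & b.keys()}
    only_in_a = [path for digest in a.keys() - b.keys() for path in a[digest]]
    return matched, only_in_a
```

Calling `match_copies` with the Dropbox folder first would list the originals that have no identical counterpart in the RS export, which is where renamed, resized or never-uploaded files would show up.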

Final Remarks
If you are interested in seeing how this works, we urge you to email us for a guest pass. In addition, if you would like more information about the process for your own research, please email Magdalena Olszanowski.