World Digital Preservation Day guest blog: managing the “digital deluge” at the National Library of Australia
In this special World Digital Preservation Day entry to our ‘Be More’ blog series, Terence Ingram, Director of IT Operations for the National Library of Australia (NLA), shares his digital preservation journey, the challenges of working with an increasingly diverse, complex and growing volume of digital content, and how Preservica’s active digital preservation platform is helping the National Library of Australia manage the “digital deluge”.
Since our partnership began more than seven years ago, Preservica has played an ever-increasing role in helping the library improve digital preservation of collections to ensure their future accessibility.
Over the last three decades, the Library has accumulated a wealth of culturally significant collections in digital format, such as 55,000 hours of unique recordings of oral history, 25 million pages of digitised newspapers, and 23 years of archives of the Australian web domain (comprising 9 billion URLs and thousands of electronic titles). The overall size of the digital collection is around 1.85 petabytes. We expect that, over time, most of this material will be managed in Preservica. However, due to the large volume of digital content and our limited resources, we have chosen a staged approach. Currently, we are focusing on born-digital published and unpublished collections, which are at higher risk than digitised material and are not yet managed in any system.
The first publication deposited into the National edeposit system (NED) by a publisher in May 2019
The man who saw the future
In the early 2000s, my now-retired colleague, Colin Webb, predicted a future need for a new, scalable digital preservation solution within the National Library. While the world was just learning how to Google, and the concept of the smartphone was frankly futuristic, Colin saw a future of millions, if not billions, of born-digital objects.
We are used to receiving truckloads of physical items to archive, but the Library recognised that the growing popularity of born-digital content would create a ‘digital deluge’, with digital data being produced and archived in far greater quantities.
This was a key reason to create a scalable, navigable digital library that could store and preserve billions of digitised and born-digital files. Implementing this change became a key part of the organisational mission between 2012 and 2017.
Preparing for the digital deluge
Beginning in 2008, the Library developed an in-house solution for processing content from physical carriers: Prometheus, a collaboration between IT and the Digital Preservation team. Within 12 to 18 months, the experimental alpha-style platform was up and running. The team continued to develop and improve the archiving platform between 2009 and 2012, relying on it to facilitate preservation of mainly published born-digital collections. However, the system was never designed to manage the entire digital preservation life cycle. With the rate of technological advancement and data production quickly increasing, investment in the project was eventually stopped in favour of finding a sustainable alternative solution.
We were faced with the challenge of adopting a digital preservation solution that could keep up with the constantly increasing amounts of data being created and archived. Any new solution would need to preserve vast amounts of data from a wide range of sources, both external and from within the NLA. Each year, the digital collection, a mix of born-digital and digitised material, grows by around 10%. In June 2019 the unique, non-replicated collection was around 1.85 petabytes, comprising a hugely diverse range of material including audio, photographs, websites, newspapers, maps, electronic books and personal digital archives.
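To give a rough sense of what that growth rate means, the short sketch below projects the collection size forward from the June 2019 figure. It assumes growth compounds once a year at a steady 10%, which is a simplification for illustration rather than a Library forecast, and the function name is hypothetical.

```python
def project_collection_size(start_pb: float, annual_growth: float, years: int) -> list[float]:
    """Return projected sizes in petabytes for year 0 through `years`.

    Assumes simple compound growth at a constant annual rate
    (an illustrative assumption, not a Library planning figure).
    """
    sizes = [start_pb]
    for _ in range(years):
        # Each year's size is the previous year's size plus the growth share.
        sizes.append(sizes[-1] * (1 + annual_growth))
    return sizes


if __name__ == "__main__":
    # 1.85 PB in June 2019, growing by around 10% a year (figures from the text).
    for offset, size in enumerate(project_collection_size(1.85, 0.10, 5)):
        print(f"{2019 + offset}: {size:.2f} PB")
```

Under these assumptions the collection would reach roughly 2.98 petabytes after five years, and would double in a little over seven.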
Funding the future and Preservica’s presence
In 2009/10, the Library, in collaboration with other national cultural agencies, approached the Australian government with a New Policy Proposal (NPP), requesting funding to invest in a Digital Library to digitally preserve those valuable archives and make them available to the public. Whilst the proposal was well received by government, it was ultimately rejected, putting the National Library back to square one.
The decision was a blow but showed us that we needed to self-fund a more modest program of work and look for partners that could grow with us. In 2012 we began collaborating with Preservica.
Having wanted to adopt an effective digital preservation solution for 10 to 15 years, our team identified that Preservica (then called Safety Deposit Box) was the best and most cost-effective digital preservation solution in the market. It was a proven platform already in use at several national organisations, and we saw both its potential and the opportunity to grow our archives alongside it.
Hunting emus, Wahgunyah Region, Victoria, 1880 [picture] / Tommy McRae
What next at the NLA?
Looking to the future, the NLA aims to improve and scale the way it collects and safeguards data. We’re hoping to reduce the amount of human input required for data to go from entering the Library in digital form to being processed and preserved in the Digital Library.
In July 2019, National edeposit (NED), a collaboration with Australian state and territory libraries to create a national infrastructure supporting digital collecting and preservation, was launched. NED is an online service for the deposit, archiving, management, discovery and delivery of published electronic material across Australia. All digital publications contained in the NED repository are stored in a managed preservation environment, namely Preservica, hosted by the National Library of Australia.
The Library is looking forward to upgrading to Preservica version 6 to leverage the new features and opportunities it presents.