Digital Preservation at scale with Yaso Arumugam, CIO of The National Archives of Australia
In this guest blog Q&A, Yaso Arumugam explores the challenges of digitally preserving a collection of over 40 million items.
The National Archives of Australia (NAA) is responsible for maintaining and preserving Australia’s official government records and plays a critical role in archiving the events, decisions and stories that have been pivotal in shaping the country into what it is today.
In this guest blog Q&A, Yaso Arumugam, Chief Information Officer of the NAA, explores the challenges of digitally preserving a collection of over 40 million items, and how collaborating with Preservica will create a preservation platform equipped to bridge the digital ‘capability gap’.
Credit - John Gollings
Could you tell us about your role within the National Archives of Australia?
I am the Chief Information Officer for the National Archives and also the Chief Information Security Officer. I have always wanted to do things a bit differently in my roles. People talk a lot about ‘agile’. So, what does agile mean? When is agile useful? I usually take the iterative approach, the learning approach: start small, learn, and then keep moving. Joining the NAA two years ago gave me an opportunity to bring my skills in, but also to learn more about the business. Now I have an opportunity to make a difference by addressing a critical capability gap through implementing a preservation platform, which is something really important for the NAA.
Could you expand more on what the capability gap means?
We currently have in-house capabilities that have been built over time to meet the NAA’s needs, and some of these will need to be redesigned to meet future needs. From a scalability perspective, it’s about bringing in more features and new features with agility, and about integration with other contemporary technologies.
Preservica provides the necessary support and aligns to one of our core accountabilities from the government; we need to hold the Australian government records for years to come with confidence that we are using modern ways of ensuring that they are preserved and safeguarded.
What are the types of records you hold and the size of the current collection?
We currently hold many different types of media - examples range from audio-visual tapes, to disc, to magnetic tape, to hard drives - and there’s fragility with all of them. We’ve got paper collections too – about 350 kilometres of documents that need to be digitized for longevity.
And then we also have around 1.2 petabytes of audio-visual and other material, and we see our born-digital collection growing to about five petabytes in the next few years. That gives you an idea of the complexity, the breadth, and the storage capacity that we need.
You touched on how you need to hold Australian government records into the future - how does this process work? Are there other collections that are really popular?
We have something called an ‘open period’; records are preserved until they come into that open period, at which point we release them. For example, Australian Government cabinet records come into the open period after 20 years, and we then examine them and release them for public access. The release happens every year on 1 January, so people get to see, through the cabinet records, how some key decisions in the history of the country were made 20 years earlier. Obviously, the records have been screened for sensitivities, but they give you a pretty good story of what happened, and that improves the transparency of government for the public.
The other popular collections are the World War service records that we hold. We've got WWI and WWII records, and they are cherished by family members and veterans themselves. We’ve obviously got some good audio-visual documentary material on Australia too.
Recently our collection of palace letters from former Governor-General John Kerr came into the news, and they garnered a lot of interest from researchers and the public.
I read that story. That's really interesting. So, you preserve those letters?
Government officials, at the end of their term, hand over their documents for preservation and public access. In that collection is a letter from the Governor-General explaining how this process will be important for future generations. The recent release of correspondence between the Queen and John Kerr during his tenure as Governor-General included information relating to the dismissal of one of our past Prime Ministers, hence the interest it produced, nationally and internationally.
Can you tell me about how Preservica came to work with the NAA and what the best aspects have been?
By the time I joined the National Archives, they were ready to see what the market could offer in terms of an integrated archival solution. Digital preservation, the archival control cataloguing system, and search and discovery were our three main priorities.
I have to credit Preservica for the support they gave during the pilot process. One of the reservations we had was how we would work with a company that is not based in Australia. We are finding that the team is great to work with, and having software that provides most of the capabilities out of the box reduces the customization we would otherwise have had to focus on.
Having a Preservica installer come down and spend time with us enabled a much more focused process, because when someone is in the country, they can just drop in and out. That gave the team more focus time and a good opportunity to establish some relationships and get across the product. The training was really in depth and effective. Those are the key factors that came into play to make the pilot a success.
An attestation (enlistment) form, setting out personal details such as age, next of kin and former occupation
Did any particular element of Preservica stand out and make people say, “wow, really been looking for that for years” or “we didn't even know this existed”?
I think what stood out for us is Preservica’s ability to scale and to meet demand. What attracted us most is a core preservation platform that can scale, ingest and preserve multiple formats, with the ability to view those formats and to transform them at mass scale. For example, if we had a thousand WordPerfect files and we wanted them to be migrated across to a new format, it was just a case of “click, click, click”, and you're done! You don't need to go and find the thousand individual documents. It sets us up for the future, especially when we want to ingest using digital online.
Preservica has that capability where we can ingest content much more safely, seamlessly and directly into a preservation platform, rather than second-handling or third-handling documents through different media before they come into our platform.
What are you looking to do with Preservica in the future? Are there other capabilities you want to help fill the gap in?
I definitely want to look at how we can influence the future roadmap of Preservica and, along the same lines, achieve a seamless digital transfer. We currently have an innovation trial going on with two Australian industry players, exploring how to automate the complex determinations of records. We currently have a process where we create record authority instruments and give them to government agencies to say, “okay, now use the instruments and tell us which records need to be kept and which can be disposed of”. This is a manual process at the moment.
The innovation trial I’ve currently got underway explores how we could use AI and machine learning to translate these record authority instruments into the AI-ML platforms, and then have this engine go to mass storage and look for the records that need to be kept. This is where Preservica comes in – the Australian government agencies can then automatically transfer these records over to us, and Preservica picks them up and preserves them. Everything's going to be automated all the way, from finding the record, to transferring it, to preserving it. That’s the future and we need to get there. I don't see it as a 10- or 20-year plan - it needs to happen sooner rather than later.