Harvesting Preservica metadata to your library discovery layer using APIs - Case Study: Preservica to Ex Libris' Primo
Why integrate a digital preservation system with a front-end discovery system?
Here at The University of Manchester (UoM) we have been working on integrating Preservica, our digital preservation system, with Ex Libris Primo, our front-end discovery system. We hold different types of content in Preservica, each requiring a different level of controlled access. Our University of Manchester theses, originally submitted in hard copy, have now been digitized and stored in a Preservica ‘collection’, and we wanted that collection to be publicly and easily available.
Background to the project
We wanted to make several thousand UoM student theses available in our discovery system. These were submitted in hard copy and date back to 1898. They had been digitized by ProQuest and already made available on the web. They were discoverable in our library catalog, Ex Libris Primo, but could sit behind a paywall depending on who the viewer was. Our Primo instance was already harvesting from our other systems (such as our image viewer Luna and our Archives Hub microsite), so we decided to try the same process with Preservica, using its OAI-PMH interface and APIs. One advantage of this method was the possibility of achieving a solution without the input of specialist developers. Preservica helpfully set up a user group of customers with similar systems and a common aim.
Watch the webinar replay
Jan recently presented the Preservica/Ex Libris Primo integration at an ELUNA Learns session. A replay of this educational webinar is available, or you can read on for the written account.
ProQuest supplied the theses in PDF format on hard drives in batches, and each batch was accompanied by a file of metadata in MARC format. We realized that we needed to convert the metadata to an XML format (we chose MODS), split it by record, and ingest it into Preservica. Crucially, we needed to ensure Preservica exposed only the content we wanted to expose, keeping the rest under restricted access. The final step was to surface it in Primo.
We used MarcEdit to convert the metadata file to a MODS metadata file and then used xml_split to divide it into individual records. I wrote a short script to make a basic edit to each record in the directory (adding a suitable XML header). The records could then be ingested into Preservica in batches. I created a new ‘Primo’ user with the appropriate credentials in Preservica and ensured that only the required content was designated as ‘public’.
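The header-adding step can be sketched in a few lines of Python. This is an illustration only, not the script used in the project: the function name, the `.xml` file extension, and the exact declaration text are assumptions.

```python
from pathlib import Path

# Assumed header text; the project's actual header is not given in this post.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def add_xml_header(directory: str) -> int:
    """Prepend an XML declaration to each .xml file that lacks one.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in sorted(Path(directory).glob("*.xml")):
        text = path.read_text(encoding="utf-8")
        # Skip records that already start with a declaration.
        if text.lstrip().startswith("<?xml"):
            continue
        path.write_text(XML_DECLARATION + text.lstrip(), encoding="utf-8")
        changed += 1
    return changed
```

Running `add_xml_header("records")` on a hypothetical `records` directory would update the split files in place, and running it a second time would change nothing.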
How we set up the API to integrate our Preservica digital preservation system with a Primo discovery system
I set up a discovery import profile in Primo and added the credentials of the new ‘Primo’ user. In this way, Primo is allowed restricted access to Preservica and can harvest from it using the OAI-PMH protocol. The profile pulls in Preservica’s XIP records and the selected fields can then be mapped to the required Dublin Core fields.
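For readers unfamiliar with OAI-PMH, the following Python sketch shows roughly what a harvester does on each request: it asks for a page of records with the `ListRecords` verb and follows the `resumptionToken` until none is returned. Primo does this internally, so none of this code is needed for the integration. The base URL and the `XIP` metadata prefix in the usage note are placeholders, and authentication is left out entirely.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument, so when one is
    supplied the metadataPrefix is not sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing token means the list is complete.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would call `list_records_url("https://example.org/oai", metadata_prefix="XIP")`, fetch the result, pass the response body to `parse_list_records`, and repeat with the returned token until it is `None`.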
Primo’s normalization rules manipulate the data as it comes into the system. Each rule looks for a particular field or piece of text in the record and acts on it, mapping the value to a Dublin Core field. There is a rule for each field. Primo has a built-in community section that lets you see and customize the rules other institutions have already created, which is useful when you are writing your own.
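The idea of one rule per field can be illustrated outside Primo with a small Python sketch. Primo's rules are configured in Primo itself and are not written in Python; the MODS paths and Dublin Core field names below are assumptions chosen for the example, not the mappings used at Manchester.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# One hypothetical rule per Dublin Core field: where to look in the MODS record.
RULES = {
    "title": "mods:titleInfo/mods:title",
    "creator": "mods:name/mods:namePart",
    "date": "mods:originInfo/mods:dateIssued",
}


def to_dublin_core(mods_xml):
    """Apply each rule to a single MODS record and collect the matches."""
    root = ET.fromstring(mods_xml)
    record = {}
    for dc_field, path in RULES.items():
        values = [
            el.text.strip()
            for el in root.findall(path, MODS_NS)
            if el.text and el.text.strip()
        ]
        # Fields with no match are simply left out of the result.
        if values:
            record[dc_field] = values
    return record
```

Keeping each rule as a separate entry mirrors the one-rule-per-field arrangement described above, so a mapping can be adjusted without touching the others.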
From Primo’s back end you can run the harvest (or schedule it to run later). ‘History’ usefully shows when the harvest has previously run and how successful it was in importing records into Primo. There is also a ‘Delete Source’ job, which simply removes all the records from Primo that come from your source.
Our library catalog is open access and is available online to see in action. We are hoping to add more theses (to the current 30,000) as they come in.
To conclude, we were delighted to find a way to integrate Preservica with Ex Libris Primo and make thousands of UoM theses freely available. Working on this project without the input of specialist developers was challenging, and the user group set up by Preservica, made up of clients with similar systems and a common aim, was invaluable. Looking ahead, another key piece of work is the integration of Preservica with our imaging platforms, Library Digital Collections and Manchester Digital Collections, in order to create a seamless public interface experience.