by Julian Fowler

Incorporating emulation into a business-as-usual digital preservation workflow.

We have a number of migration (and characterisation) tools supported in our Safety Deposit Box (SDB) solution.

April 19, 2012

Great piece by Euan Cochrane on his blog (http://digitalcontinuity.org/post/20400819609/incorporating-emulation-into-a-business-as-usual). We have just finished working on the KEEP (Keeping Emulation Environments Portable) project. However, you are correct to say that most efforts have been focussed on migration, and we have a number of migration (and characterisation) tools supported in our Safety Deposit Box (SDB) solution.

However, as part of KEEP we did add in an end-to-end emulation workflow to SDB as well. This allowed users to choose to emulate an object as an alternative to migration. My (personal) observations of this are:

  1. We can use the output of characterisation to determine an emulation pathway (as you suggested) by looking up the format (and potentially other properties).
  2. We still do format validation (as Jan suggested in his comment). I see DROID as a tool to use to identify a list of potential formats for each file. Format validation tools like Jhove then confirm this (and choose one from the list of candidates). This confirmation is as important to choosing the correct emulation approach as it is to choosing the correct migration approach. Having said that, DROID normally produces a single (correct) identification and often we don't have a format-validation tool, so I can't say it is vital.
  3. In addition we still need to determine the unit of emulation (in the same way we determine the unit of migration). We call this "conceptual characterisation" where we infer the presence of an intellectual entity (the "component") from the set of archived files. For example, we don't allow a stylesheet to be migrated or emulated but you could do either action to a web page (which consists of many files).
  4. As you also point out, we still need to determine the best fit to the stored software images. This is achievable for the simple cases we tried (e.g., a version of Word Perfect running on a specific version of DOS) but it could be quite complex.
  5. There may be legal / licensing issues with using the image. KEEP looked into this but to keep this post short(er) I'll just say it is complex!
  6. We had some problems with invocation. This depends on the access scenario but we really wanted to demonstrate web access to a Word Perfect document with a user experience similar to the one someone would get if they accessed a document that had been migrated to, say, PDF 1.4. This meant we renamed the file to a given name and placed it in a given location accessible to the image. Then we either had to give the user instructions for how to use the image to load it (i.e. how to start Word Perfect and then where the file would be located so it could be loaded) OR we had to tailor the image so it auto-loaded a given named file. The latter is a much smoother user experience (more comparable to that obtained by migration) but the stored software image is now even more tailored to a given scenario so we'd need to keep even more images!
  7. In doing this, we used up a fair bit of server resources (i.e. invoking the emulation framework, loading the image etc.). All this for a single user's access request to a single intellectual entity. Of course, we'd also need to keep this available for a while before we could tear it down.
  8. There are also security implications of allowing people to run emulated software on archival servers.
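
To make points 1, 2 and 6 a little more concrete, here is a minimal Python sketch. It is not how SDB or the KEEP framework does it. The format keys, the emulator and image names, the `PATHWAYS` table, the fixed file name and both function names are hypothetical and exist only for illustration. It assumes identification yields a list of candidate formats, that a validation result may or may not be available, and that the image expects the file under one fixed name in a shared directory.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EmulationPathway:
    emulator: str        # hypothetical emulator name
    software_image: str  # hypothetical stored software image


# Hypothetical lookup from a confirmed format to an emulation pathway.
# The keys are placeholders, not real PRONOM identifiers.
PATHWAYS: Dict[str, EmulationPathway] = {
    "example/wordperfect-5.1": EmulationPathway(
        emulator="dos-emulator",
        software_image="dos-with-wordperfect-5.1.img",
    ),
}


def choose_pathway(candidates: List[str],
                   validated: Optional[str] = None) -> Optional[EmulationPathway]:
    """Pick an emulation pathway from characterisation output.

    candidates: formats proposed by an identification step.
    validated:  format confirmed by a validation step, or None when
                no validation tool is available for these formats.
    """
    if validated is not None:
        # Validation must agree with one of the identified candidates.
        fmt = validated if validated in candidates else None
    else:
        # Without validation, accept only an unambiguous identification.
        fmt = candidates[0] if len(candidates) == 1 else None
    return PATHWAYS.get(fmt) if fmt is not None else None


def stage_for_image(source: Path, shared_dir: Path,
                    fixed_name: str = "DOCUMENT.WPD") -> Path:
    """Copy the archived file to the fixed name and location the image expects."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir / fixed_name
    shutil.copyfile(source, target)  # the archived original is left untouched
    return target
```

Calling `choose_pathway(["example/wordperfect-5.1"])` returns the Word Perfect pathway, while an ambiguous candidate list with no validation result returns `None`, which a workflow could treat as "fall back to migration or ask a person". Copying rather than moving the file in `stage_for_image` is a deliberate choice in this sketch so the archived copy is never altered.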

However, having said all of the above: it did work!

My feeling is that emulation is probably overkill for access to a simple text document. Hence, migration might be a better solution here. In saying that, I would acknowledge that this means I am making two assertions that I can't fully test: (i) the migration is not lossy and (ii) the behaviour required remains available in modern software. Emulation has the advantage of not requiring me to make these assertions. This is more important where the behaviour is complex and hard to reproduce in alternative software (e.g., a CAD application). In such scenarios, it is likely that the end user will be technically savvy, so it is possible that some of the server-side issues I mentioned above would go away: instead, the system could provide a link to the necessary files, the emulation framework and the appropriate images, and let them get on with it.

Thanks again for a stimulating post.