by Rob Sharpe

Do we need active preservation?

December 3, 2010

At recent conferences there has been a lot of talk about whether formats really do become obsolete (see for example http://blog.dshr.org/2010/11/half-life-of-digital-formats.html) and thus whether the research that has taken place to address such obsolescence is actually needed. In this posting, I'll address both of these questions.

Do formats become obsolete?

To address the first question, we first need to ask ourselves what an "obsolete" format really means. A loose, working definition might be "a format for which no supported software exists that can interpret it". Personally, I think there are lots of formats that meet this criterion (e.g., at Tessella we have a lot of old information trapped in Microsoft Project 98 files: a format that is obsolete according to my definition). Other people have come up with other examples, but such examples are sometimes dismissed as either too old or too unpopular to be of interest. Dismissing such examples strikes me as a bit of a circular argument: it is simply a statement that popular, modern formats are not obsolete. Well, yes (of course!) but that doesn't mean that there are not other formats that are, at least by this definition, obsolete.

Another point offered as evidence of the lack of obsolescence is that major memory institutions have not migrated much content for preservation reasons. This is a true statement but is also, to some extent, another circular argument, since most such institutions either reject content that is not in a "preservable" format (and thus only ingest content that doesn't need immediate preservation) or ingest such material but don't have the tools to deal with it (which is not a good argument that we don't need such tools).

Another part of the argument goes along these lines: any format can, in any case, be rescued through sufficiently clever "digital archeology". This strikes me as self-defeating (since it seems to accept that formats do become obsolete). Really it is saying that such obsolescence doesn't matter, but I'd argue that it does, because this approach restricts access to such material to people with the correct background, knowledge and tools to deal with such obsolescence: a restriction I think is unacceptable.

In fact, it is worth mentioning that a number of institutions already regularly perform a lot of migration for presentation purposes. These migrations enable content to be accessible to more people (e.g., migrating documents to a version of PDF or migrating images to a lower resolution, compressed format to allow web-based dissemination). This means ordinary users are not burdened with complex technical needs: it means information is made available to them in a form they readily understand.

Do we need to deal with obsolescence?

This brings me on to the second issue: i.e. if we don't do many preservation migrations, do we need "complex" "Active Preservation" functionality (i.e. the concept of measuring and comparing significant conceptual properties before and after migration) which was a major tenet of projects like Planets? I think the answer to this is yes and there are two main reasons.

First of all, the assumption in the question seems to be that presentation migrations are somehow less important or need less checking. I disagree with this assumption, as I think it is important that the authenticity of presentation migrations is checked if the information is being presented as a true interpretation of the original. For example, we have found image migrations that failed a check that the significant property of colour integrity had been preserved, despite using an image migration tool that had performed thousands of successful migrations in the past. Humans can detect the error easily, but with millions of migrations often needed, manual checking is not a solution. If anything, it is a statement of the problem: the check matters precisely because people will interpret the information differently if we don't get it right! Hence, the only realistic way of doing this is using an automated "Active Preservation" system.
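To make the idea of an automated check concrete, here is a minimal sketch in Python of comparing significant properties measured before and after a migration. It is an illustration only and not how SDB or the Planets tools work: the function name `compare_properties`, the property names (`width`, `colour_space`, `mean_red`) and the tolerance value are all assumptions made up for this example, and the property values are assumed to have already been measured by some characterisation tool.

```python
from numbers import Number


def compare_properties(before, after, tolerances=None):
    """Return (name, before_value, after_value) for each property that was not preserved.

    `before` and `after` are dicts of property name -> measured value.
    `tolerances` optionally maps a numeric property name to the largest
    acceptable absolute difference. All names here are hypothetical.
    """
    tolerances = tolerances or {}
    failures = []
    for name, old in before.items():
        new = after.get(name)
        tol = tolerances.get(name)
        if new is None:
            # The property could not be measured at all after migration.
            failures.append((name, old, None))
        elif tol is not None and isinstance(old, Number) and isinstance(new, Number):
            # Numeric property with a tolerance: allow small differences.
            if abs(new - old) > tol:
                failures.append((name, old, new))
        elif new != old:
            # Everything else must match exactly.
            failures.append((name, old, new))
    return failures


# Illustrative (invented) measurements for one image migration.
original = {"width": 800, "height": 600, "colour_space": "sRGB", "mean_red": 120.0}
migrated = {"width": 800, "height": 600, "colour_space": "sRGB", "mean_red": 97.5}

failures = compare_properties(original, migrated, tolerances={"mean_red": 1.0})
for name, old, new in failures:
    print(f"{name}: {old!r} -> {new!r}")
```

In this invented example the dimensions and colour space survive the migration but the average red value shifts well beyond the tolerance, so the migration would be flagged for attention without a human having to look at the image. Which properties to measure, and how strict to be, is exactly the kind of best-practice question that still needs discussion.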

Secondly, the "complexity" required is really a solved problem and is thus abstracted away from end users. For example, the functionality exists in our archival offering, SDB, and is available for our customers to use. It is true that there is a lack of best practice, but this is not really an issue of complexity: it is one of lack of practice and lack of tools to deal with every format.

Hence, I believe that mainstream digital preservation research over the last 5 years or so has moved us considerably further forwards and should be celebrated. I also think the lessons being learned today in dealing with presentation migrations will stand us in good stead when more preservation migrations are needed at some point in the future. Thus, the argument should be about what is best practice (which tools are the best for migration? which target formats should we aim at? which properties should we measure to validate the migration? etc.) and not whether any practice is needed at all.