Formats over time: exploring UK web history
The recent paper by Andy Jackson of the British Library (available at http://arxiv.org/pdf/1210.1714v1.pdf) argues that file formats for web-based technologies are lasting longer than originally thought. However, they do not last forever and will ultimately become unreadable.
On the one hand this is good news: we have longer to react to the threat of file format obsolescence. However, as he points out, there will come a time, maybe in 10 years rather than 5, when the quality and accuracy of the information starts to degrade. So this just delays the problem rather than removing it.
A further observation is that the paper only covers internet-based information. This is, by definition, publicly accessible and thus uses mainstream formats that are widely supported. The range of formats used within an organisation is much wider and can include highly niche formats that hold critical content. The digital preservation challenge for these files is even more acute.
A further challenge is that expectations of quick access are now much more demanding. If a file is found but can only be read using a special environment running archaic software, it is likely, in a post-Google world, that it will just be ignored.
We continue to recommend that digital preservation and the threat of file format obsolescence are taken very seriously, and that policy and practice are put in place to make sure critical information does not become lost or unreadable.