This blog is published as part of International Archives Week, which explores the theme of ‘Designing your archives in the 21st century’.
Preserving digital content over time is all about understanding and managing the risks. This is really no different to how we manage our older, physical collections. For example, our geographic location in Kew brings its own risk: the Thames could flood! However, the building has various design features to reduce the likelihood that flood water will get inside and cause damage. Even our lovely ponds are partly there to help control water run-off.
Digital risks
While there are some physical risks to digital collections (such as damage to hard drives and tapes), most digital risks are harder to visualise. For data stored on magnetic media, there is a tiny risk that the individual magnetic particles that record the data will ‘flip’. This changes a bit read from the media from a one to a zero (or vice versa). For some files that might not make much difference:
- In plain text a single character would change, introducing a small error
- For uncompressed images (TIFFs) the colour of a single pixel would change
However, for other file types, such as ZIPs or JPEGs, such a change could be more dramatic. It might even mean that a file wouldn’t open at all.
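The difference between the two cases can be sketched in a few lines of Python. This is only an illustration: the sample text, the `flip_bit` helper and the choice of which bit to flip are all made up for the example, and zlib compression stands in for compressed formats in general.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with a single bit inverted (hypothetical helper)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Made-up sample content standing in for a plain text record.
text = (
    b"Preserving digital content over time is all about understanding "
    b"and managing the risks. This is really no different to how we "
    b"manage our older, physical collections."
)

# Plain text: one flipped bit alters one character, the rest stays readable.
damaged_text = flip_bit(text, 80)
changed = sum(a != b for a, b in zip(text, damaged_text))
print(f"Plain text: {changed} byte(s) differ out of {len(text)}")

# Compressed data: flip one bit in the middle of the compressed stream.
compressed = zlib.compress(text)
damaged_compressed = flip_bit(compressed, len(compressed) * 4)
try:
    recovered = zlib.decompress(damaged_compressed)
    if recovered == text:
        outcome = "decompressed unchanged"
    else:
        outcome = "decompressed, but content differs"
except zlib.error as exc:
    outcome = f"could not decompress: {exc}"
print(f"Compressed: {outcome}")
```

In the plain text case the damage stays local to one character. In the compressed case every later byte depends on what came before, and zlib also carries its own integrity check, so a single flipped bit typically makes the whole stream unreadable.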
Simply being sure that a file can be opened may not be enough. If it is not identical to the file we originally received, is it still an authentic record? Will researchers trust us if we can’t show the authenticity of our records?
We can start to guard against these risks by having multiple copies of files. We also create checksums for each file, which provide a digital ‘fingerprint’ to show that a file is unchanged.
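As a rough sketch of how a checksum works as a fingerprint, the Python below records a checksum for a file and later checks two copies against it. SHA-256, the file names and the helper functions are assumptions chosen for illustration; they are not a description of the tools or algorithms we use in practice.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(copies, expected):
    """Return the copies whose checksum no longer matches the recorded one."""
    return [path for path in copies if checksum(path) != expected]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical record: its checksum is recorded when it is received.
    original = Path(tmp) / "record.txt"
    original.write_bytes(b"An example record received for preservation.\n")
    recorded = checksum(original)

    # Two copies: one intact, one with a single bit flipped to simulate damage.
    copy_a = Path(tmp) / "copy_a.txt"
    copy_b = Path(tmp) / "copy_b.txt"
    copy_a.write_bytes(original.read_bytes())
    damaged = bytearray(original.read_bytes())
    damaged[0] ^= 0x01
    copy_b.write_bytes(bytes(damaged))

    damaged_copies = find_damaged_copies([copy_a, copy_b], recorded)
    damaged_names = [path.name for path in damaged_copies]
    print("Copies that fail the check:", damaged_names)
```

Keeping multiple copies and keeping checksums work together: the checksum shows which copy has changed, and an intact copy can then be used to replace the damaged one.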
Understanding the risks
We really want a way to see how all the different risks interact. Ideally, this would also show us what the greatest risks actually are, which might not be the ones we assume are important. Or we might find that several risks can be guarded against easily and cheaply, and that dealing with them together has a bigger impact than tackling one big risk.
We think a dynamic Bayesian network will allow us to do all this. We carried out some internal statistical modelling work to investigate this approach. Now, we’ve begun to work with experts at the University of Warwick’s Applied Statistics and Risk Unit to take this further. We’ll be looking to bring in a wide range of perspectives.
If you are interested in reading more about this work, visit the Digital Preservation Coalition website.
David Underdown is a Senior Digital Archivist in the digital archiving department at The National Archives. He joined the archives from an IT background in 2005 and holds a BSc (Hons) in Mathematics from Imperial College London.