Digital archiving is a risky business

This blog is published as part of International Archives Week, which explores the theme of ‘Designing your archives in the 21st century’.

Preserving digital content over time is all about understanding and managing the risks. This is really no different to how we manage our older, physical collections. For example, our geographic location in Kew brings its own risk: the Thames could flood! However, the building has various design features to reduce the likelihood that flood water will get inside and cause damage. Even our lovely ponds are partly there to help control water run-off.

Digital risks

While there are some physical risks to digital collections (such as damage to hard drives and tapes), most digital risks are harder to visualise. For data stored on magnetic media, there is a tiny risk that one of the magnetic particles recording the data will ‘flip’. This changes a single bit read from the media from a one to a zero (or vice versa). For some files that might not make much difference:

  • In plain text a single character would change, introducing a small error
  • For uncompressed images (TIFFs) the colour of a single pixel would change

However, for compressed file types, such as ZIPs or JPEGs, such a change could be more dramatic, because a single changed bit can affect how everything after it is decoded. It might even mean that a file wouldn’t open at all.
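To make the difference concrete, here is a minimal Python sketch that flips one bit in a piece of plain text and one bit in a zlib-compressed copy of the same text. The helper names (flip_bit, try_decompress) and the choice of zlib as a stand-in for compressed formats are assumptions made for illustration only.

```python
import zlib


def flip_bit(data: bytes, byte_index: int, bit_index: int = 0) -> bytes:
    """Return a copy of data with a single bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit_index
    return bytes(damaged)


def try_decompress(data: bytes):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


original = b"Preserving digital content over time is all about managing risk."

# Plain text: only the byte containing the flipped bit changes.
damaged_text = flip_bit(original, 0)
changed_positions = [
    i for i, (a, b) in enumerate(zip(original, damaged_text)) if a != b
]

# Compressed data: flip one bit in the middle of the compressed stream.
compressed = zlib.compress(original)
damaged_compressed = flip_bit(compressed, len(compressed) // 2)
recovered = try_decompress(damaged_compressed)

print("Changed text positions:", changed_positions)
print("Damaged text:", damaged_text)
print("Compressed copy still opens:", recovered is not None)
```

In the plain text, the damage stays confined to one character. In the compressed copy, the same size of change typically means the data can no longer be decompressed at all, because zlib's built-in integrity check no longer matches.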

Simply being sure that a file can be opened may not be enough. If it is not identical to the file we originally received, is it an authentic record? Will we be trusted by researchers if we can’t show the authenticity of our records?

We can start to guard against these risks by keeping multiple copies of each file. We also create a checksum for each file, which provides a digital ‘fingerprint’: if the checksum calculated later matches the one recorded originally, the file is unchanged.
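As a rough illustration of how checksums and multiple copies work together, the Python sketch below records a checksum when a file is received and later uses it to spot which copy has changed. SHA-256 is an assumption here (the choice of algorithm is not stated above), and the copy names and file contents are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(recorded: str, copies: dict) -> list:
    """Return the names of copies whose checksum no longer matches."""
    return [name for name, data in copies.items() if checksum(data) != recorded]


# Checksum recorded when the file was first received (hypothetical content).
original = b"An example record"
recorded = checksum(original)

# Two stored copies; the final 'd' in copy_b has become 'e' (a one-bit change).
copies = {
    "copy_a": b"An example record",
    "copy_b": b"An example recore",
}

damaged = find_damaged_copies(recorded, copies)
print("Copies needing repair:", damaged)
```

Any copy that still matches the recorded checksum can then be used to replace a damaged one.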

Understanding the risks

We really want a way to see how all the different risks interact. Ideally, this would also show us what the greatest risks actually are – and these might not be ones we assume are important. Or we might find that several risks can be mitigated easily and cheaply, and that tackling them together has a bigger impact than tackling one big risk.
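A small worked example shows how several cheap fixes can outweigh one expensive one. The probabilities below are entirely hypothetical, and the sketch assumes the risks are independent of each other, which is exactly the kind of simplification a fuller model would need to move beyond.

```python
from math import prod


def chance_of_loss(risks):
    """Chance that at least one risk occurs, assuming independent risks."""
    return 1 - prod(1 - p for p in risks)


# Hypothetical annual probabilities: one big risk and three smaller ones.
baseline = [0.05, 0.02, 0.02, 0.02]

# Option 1: an expensive fix that reduces only the big risk.
tackle_big_risk = [0.03, 0.02, 0.02, 0.02]

# Option 2: three cheap fixes that each reduce a small risk.
tackle_small_risks = [0.05, 0.005, 0.005, 0.005]

loss_baseline = chance_of_loss(baseline)
loss_big = chance_of_loss(tackle_big_risk)
loss_small = chance_of_loss(tackle_small_risks)

print(f"No action:          {loss_baseline:.4f}")
print(f"Tackle big risk:    {loss_big:.4f}")
print(f"Tackle small risks: {loss_small:.4f}")
```

With these made-up numbers, the three small fixes together reduce the overall chance of loss by more than the single large fix does.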

We think a dynamic Bayesian network will allow us to do all this. We carried out some internal statistical modelling work to investigate this approach. Now, we’ve begun to work with experts at the University of Warwick’s Applied Statistics and Risk Unit to take this further. We’ll be looking to bring in a wide range of perspectives.

A diagram showing a small Bayesian network relating to the storage and conditions interacting with the likelihood of a digital file being opened.

A small Bayesian network developed at The National Archives during the initial research phase. This shows how considerations of storage media type and environmental conditions can interact to affect how likely it is that a digital file can be opened (percentages shown are illustrative only).

A diagram of a Bayesian network showing an incident exposing storage to high levels of magnetic flux.

The initial network being used to investigate the impact of standardising on disk storage and then experiencing an incident that exposes the storage to high levels of magnetic flux, increasing the risk of corruption.
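To give a feel for what such a network calculates, here is a minimal Python sketch of a three-node Bayesian network in which storage media type and magnetic flux level both influence whether a file can be opened. It is not the network shown in the diagrams: the structure, the node names and every probability are assumptions chosen for illustration, and it is a static network with no time dimension.

```python
from itertools import product

# Illustrative prior probabilities (assumptions, not measured values).
P_MEDIA = {"disk": 0.5, "tape": 0.5}
P_FLUX = {"normal": 0.99, "high": 0.01}

# Illustrative table of P(file can be opened | media, flux).
P_OPEN = {
    ("disk", "normal"): 0.999,
    ("disk", "high"): 0.70,
    ("tape", "normal"): 0.995,
    ("tape", "high"): 0.60,
}


def joint(media, flux, opened):
    """Probability of one complete state of the network."""
    p_open = P_OPEN[(media, flux)]
    return P_MEDIA[media] * P_FLUX[flux] * (p_open if opened else 1 - p_open)


def probability(query, evidence=None):
    """P(query | evidence), found by summing over every state of the network."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for media, flux, opened in product(P_MEDIA, P_FLUX, (True, False)):
        state = {"media": media, "flux": flux, "opened": opened}
        if any(state[key] != value for key, value in evidence.items()):
            continue  # this state contradicts what we have observed
        p = joint(media, flux, opened)
        denominator += p
        if all(state[key] == value for key, value in query.items()):
            numerator += p
    return numerator / denominator


# Overall chance that a file can be opened, with nothing observed.
baseline = probability({"opened": True})

# Chance after standardising on disk and observing a high-flux incident.
after_incident = probability(
    {"opened": True}, {"media": "disk", "flux": "high"}
)

print(f"Baseline:       {baseline:.5f}")
print(f"After incident: {after_incident:.5f}")
```

Summing over every state like this only works for very small networks. A realistic model with many interacting risks would need dedicated software, but the principle of setting evidence and seeing how the probabilities respond is the same.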

If you are interested in reading more about this work, visit the Digital Preservation Coalition website.

David Underdown is a Senior Digital Archivist in the digital archiving department at The National Archives. He joined the archives from an IT background in 2005 and holds a BSc (Hons) in Mathematics from Imperial College London.
