From digital dark age to digital enlightenment

You may have heard about the ‘digital Dark Age’ in recent media reports…

For us, as for other institutions in the archives sector across the world, managing, preserving and providing access to born-digital records (records natively created in a digital format, such as emails, documents and spreadsheets) is a major challenge, now and for the years to come.

Why is this important now?

This year some government departments are due to transfer born-digital records to The National Archives to meet their legal obligations under the Public Records Act.

I have been leading the Digital Transfer Project since July 2014 to ensure that we, The National Archives, as well as other government departments, are ready to embrace this challenge.

We have been very busy over the past year and a half, and our philosophy has been ‘learning by doing’. To avoid reinventing the wheel, we reviewed what other archival institutions around the world were doing in the field of digital records management and transfer. We interviewed key UK government departments in order to identify their challenges early, and be able to proactively find solutions. Alongside this we launched a series of pilot transfers to design and test the new process to appraise, select, sensitivity review, transfer, preserve and give access to born-digital records.

Two transfers completed!

We are proud to say that two transfers have already been completed. You can find born-digital records from both the Welsh Government (see WA 11, WA 12 and WA 13) and The National Archives (RW 33) on our online catalogue Discovery. These records are available to download for free from anywhere in the world. Four further transfers are planned for the coming months.

Learning by doing

We’ve learnt that two of the main challenges experienced by government departments as part of this transfer process are:

  • extracting meaning from unstructured digital record collections in order to make appraisal and selection decisions. We found that up to two thirds of government departments’ information is held on unstructured shared drives. Some departments also had up to 190 terabytes of information in email servers
  • sensitivity reviewing born-digital records at scale without having to read all the individual documents

Exciting technologies

We decided to look at what existing technologies could offer in the field of digital search, digital information management, digital appraisal and selection and sensitivity review to address these challenges. The results are really promising.

Image by Chris Shipton

We explored whether technology-assisted review – a process involving expert document reviewers using a combination of computer software and tools to electronically classify records – could have interesting applications for the archives sector. Technology-assisted review typically uses eDiscovery software. This type of software was originally designed to extract meaning or identify sensitive information from large unstructured digital collections for the purpose of disclosing electronic information between parties before a trial. Our underlying assumption was that if these technologies were good enough for the legal profession and the courts, they could also be good enough for information and records management.

What we learnt was really exciting. Technology-assisted review is starting to be widely accepted in court cases in the United States. Last year these technologies were also endorsed in a lawsuit by the High Court in the Republic of Ireland. Technology-assisted review can also be as accurate as, if not more accurate than, manual review. We found that traditional ‘keyword’ searches return only 20% of relevant documents, whereas it is possible for technology-assisted review to return a lot more. We also found that on average 40% of a digital collection is duplicated, so having a tool that can separate the wheat from the chaff and reduce the amount to review can be particularly helpful!
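To make the duplication point concrete, here is a minimal sketch of how byte-identical copies could be set aside before review. It is an illustration only, not how any particular eDiscovery product works: the function names and the sample collection are hypothetical, and it assumes that two files count as duplicates only when their contents match exactly.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(documents):
    """Group document names by the SHA-256 digest of their content.

    `documents` maps a name to its content as bytes. Documents whose
    bytes are identical end up in the same group.
    """
    groups = defaultdict(list)
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    return list(groups.values())


def unique_representatives(documents):
    """Keep the first document seen from each group of identical content."""
    return [group[0] for group in group_exact_duplicates(documents)]


# Hypothetical sample collection: the names and contents are made up.
sample = {
    "minutes_v1.txt": b"Minutes of the project board",
    "copy_of_minutes.txt": b"Minutes of the project board",
    "budget.txt": b"Budget forecast",
}

# Only one copy of each distinct document is left for manual review.
to_review = unique_representatives(sample)
```

A sketch like this only catches exact copies. Near-duplicates, such as the same email with a different signature block, would need a similarity-based approach instead.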

Although there is no ‘silver bullet’ or completely automated solution, technology-assisted review offers ways to prioritise and reduce the information to be manually reviewed. Particularly useful functionalities include categorisation and clustering, which group contextually similar information and therefore allow for macro-level decisions, be they appraisal and selection decisions or sensitivity review decisions.

In-depth research

We have just published two reports that detail these findings. The first is a snapshot of the digital landscape in the UK government, highlighting some of the current challenges experienced by government departments in the management and transfer of born-digital records. The second showcases how technology-assisted review could help address some of these challenges. You can download both reports for free.

We feel we have started our evolution from a digital dark age to a digital enlightenment. It is still early days and there is still a lot of work to be done, both collaboratively across UK government and also working with third party partners and the academic community. It’s also important to note that the answers to these challenges are not set in stone. We will have to adopt a ‘lean’ approach – evolving our solutions as technology evolves, which is a really exciting prospect!

Don’t hesitate to contact us at DigitalRecordsTransfer@nationalarchives.gov.uk if you have questions or comments or want to contribute your ideas to address these exciting challenges!

6 comments

  1. David Matthew says:

    UK Government departments should of course refer to Government departments of the UK Government (there is a difference) so that this, I assume, should mean that records of the Scottish Government will still (I hope) go to the National Records of Scotland and similarly for Northern Ireland to their own archive. It is interesting, as someone recently pointed out, that the UK Parliament have decided not to go digital for legislation because of doubts over how long digital records will last, a concern with which I agree.

    As far as sensitivity-checking goes (which has been largely abandoned in some departments from what I have seen), the idea of reading one e-mail after another must/will be boring in the extreme in my view; paper files are much better! Trying to macro sensitivity check digital records carries with it the same dangers that documents that should not be in the public domain could end up there. The macro level of selecting files has been shown to be not cost-effective, not least when you consider the number of Treasury files (37,000 plus by the end of this year) and with many duplicates and files not worthy of preservation, even keeping empty files and files marked for destruction, let alone being placed in the wrong series. This all adds to TNA storage costs.

    The reason why keyword searches for paper files don’t work is that the descriptions are often poor (having not changed since the file was created, when fewer papers had been filed) and the cross-references are not reflected in the descriptions.

    Whilst the transfer of digitally born records is to be welcomed, I would suggest that it will not be until transfers from the ‘big’ departments (No 10, FCO, Home Office, MOD and Treasury) that we can see how it will work and how effective it is. The final question is: do we need TNA in the format it is in at the moment, and will there be a time that we don’t need such a large building, or need it at all?

    1. Nell Brown (Admin) says:

      Hi David,

      Thanks for your comment. We’ve taken your first point on board and updated the blog accordingly.

      Best,

      Nell

  2. […] From digital dark age to digital enlightenment […]

  3. […] UK National Archives – From digital dark age to digital enlightenment  http://blog.nationalarchives.gov.uk/blog/digital-dark-age-digital-enlightenment/ UK National Archives – Researchers of the future  […]

  4. Caroline Pegden says:

    It is interesting to note that the England and Wales High Court have just started endorsing the use of predictive coding.

    In February 2016, Pyrrho Investments v. MWB Property was the first English court decision to consider and approve the use of predictive coding:
    http://www.bailii.org/ew/cases/EWHC/Ch/2016/256.html

  5. […] alongside active records.  For example, a National Archives of the UK blog post mentions that  up to two-thirds of government information is held on unstructured shared drives with some departments holding up to 190 terabytes of […]
