Dirty data at the UKAD Forum 2016

‘I have looked into the abyss … and it is data!’

So said Steve Jupe, Head of Archive Governance and Policy at the BBC, in a day of all things data at The National Archives.

Every year for the past six years, we have hosted the UKAD Forum as members of the UK Archives Discovery Network (UKAD), and for the past two, in partnership with the Archives and Records Association (ARA)’s Section for Archives and Technology (SAT).

The Forum is an opportunity for archivists to share ideas for networking collections, so that users can more easily make connections between them. This year the theme was ‘Data Matters’!

Joining Steve from the BBC, we had representatives from the British Library, the British Museum, the Chelsea College of Arts and the Royal College of Nursing, offering different approaches to managing data to make archive collections more discoverable.

David Reeve, Head of Information Strategy at Jisc, opened proceedings with the basics, challenging the delegates to think about digital data differently. He began what became a day of soundbites and cultural references with the help of Star Trek’s Lieutenant Commander Data – did we really need to duplicate him?

From The National Archives, Senior Archivist Andrew Janes wanted his ‘dirty data’ to pass the Ronseal Test: it should ‘do exactly what it says on the tin’. All data is ‘dirty’ to an extent – that is, it needs cleaning up and sorting out. What should our focus be?

Do we have time to improve the quality of our existing data, our current collections information? Or should we aim for data adequacy and provide enough good data to help users find things by just title or description, date and a unique number or code?

Image of slide from presentation saying: The bare minimum?

Is the bare minimum enough for discoverability?

So maybe archivists should approach things a bit differently.

At the British Library they have recently published their Collection Metadata Strategy – aptly named Unlocking the Value – and Bill Stockting described how thinking had shifted from managing collections as catalogues to managing them as data.

Athanasios Velios from the Chelsea College of Arts proposed an even more radical approach: perhaps we should be describing our collections as connected series of events? If archivists thought about archives as documenting history – as data providing evidence of historical events and not as separate collections of objects – it would help users navigate through them and between them.

But sometimes an individual object can tell almost the whole story. There was a momentary hush when the British Museum’s Glenn Cumiskey showed us The Lampedusa Cross. He explained that it had been fashioned from pieces of a boat that was wrecked off the coast of Lampedusa, Italy, in 2013. Over 300 Eritrean and Somali refugees drowned, and another 155 were saved by the townspeople.

He asked, ‘What is Data?’ and argued that while we had to know how to manage the bits and the bytes, we needed to celebrate the value and importance of our very human interactions with that data. It is those interactions which turn data into knowledge. He illustrated his point with a quote from Alfred, Lord Tennyson’s 1834 poem ‘The Two Voices’ to remind us:

‘Forerun thy peers, thy time, and let

Thy feet, millenniums hence, be set

In midst of knowledge, dream’d not yet.’

The poster sessions at the Forum offer a different way to engage with colleagues with knowledge and experiences to share. This year the buzzword was ‘maps’: from a crowd-sourced gazetteer to digitised Welsh tithe maps; from mapping between disparate performing arts, digital and physical collections to identifying gaps in digital preservation workflows.

An image of Rachel MacGregor's poster, entitled 'A map of data - finding your way'

Rachel MacGregor’s poster

In the afternoon session – aside from a look into the BBC’s ‘data abyss’ – the focus shifted to medicine.

Image of presentation slide entitled: 'Breaking News: Archival Data Infiltrates Library Resource Discovery System'

Breaking news at the Royal College of Nursing!

Teresa Doherty from the Royal College of Nursing broke the news that archive data was subtly infiltrating the College’s library resource discovery system – and to good effect! Users of the system can now access any one of their 17 datasets through a one-stop shop for different types of data including both the library and archive catalogues.

There was more of an invitation than a newsflash from The National Archives’ Jonathan Cates, asking for imaginative ideas on how to exploit the data contained within the Hospital Records (HOSPREC) database. Compiled by the Wellcome Library, the database is in need of a facelift – ‘Too much beige!’ Jonathan complained – but he wants to do more with the data itself and is open to any suggestion, evidenced by the final blank slide of his presentation.

The hot topic of managing academic and other research data – and whether it was something archivists should be doing – was taken up by Alex Eveleigh from University College London (UCL) and Victoria Cranna from the London School of Hygiene & Tropical Medicine (LSHTM). And their conclusion? Well, haven’t archivists always been managing research data? Is digital research data really any different?

To finish the day, the panel of speakers – led by UCL lecturer and Programme Director Jenny Bunn – left us with a welcome positive spin on managing data. We might have lots of different types of dirty data – huge and growing amounts of it – but if we accept that digital data is chaotic, then the possibilities for helping people to use that data in new and different ways are endless.

Image of panel members for the final session of the day.

Do panel members agree on the most important Data Matters?

So I would rather not focus on Joseph Conrad’s ‘The horror! The horror!’ – borrowed so eloquently by Steve at the BBC – but instead go back to Tennyson’s ‘The Two Voices’, which ends with the rousing cry: ‘Rejoice! Rejoice!’

Perhaps that’s how we should be feeling about Data Matters?

From the end of April you will be able to view the presentations and posters online, or listen to the presentations, panel discussions and Q&A sessions as podcasts: www.nationalarchives.gov.uk/archives-sector/engaging-with-ukad.htm  


  1. David Matthew says:

    Descriptions are all very well but information contained within them may be relevant, and in a few years we may have so much data that we don’t know what is relevant and what is not. Perhaps you need to ask people what they want and how they look for such material. In my view archivists have always had ‘research data’ but it is what they have been given by others. HOSPREC is a useful tool but it needs reorganising into types of records across the archives and not individually and not just what is or is not available.

  2. Colin Moretti says:

    “From the end of April you will be able to view the presentations and posters online, or listen to the presentations, panel discussions and Q&A sessions as podcasts: http://www.nationalarchives.gov.uk/archives-sector/engaging-with-ukad.htm”

    Although the programme for the meeting and the speaker biographies and their talk synopses are downloadable at the above link neither the presentations nor the posters seem to be available. When will they be available please?
