In June, we hosted a discussion between Professor Lisa Jardine CBE and Professor the Lord Hennessy of Nympsfield, subtitled ‘openness and the national collective memory’. The distinguished historians explored the value of our archival heritage and considered why ‘sustaining the collective memory of the nation is a first-order requirement’.
The event was live-tweeted with the hashtag #sptrail and, while many digital topics were touched on, we want to revisit and expand on some of the key themes that were raised. On Thursday 30 August, between 13:00 and 14:00 BST, we will host a live Twitter debate on @UkNatArchives using the hashtag #digitaltrail, featuring contributions from our Director of Technology David Thomas, Head of Digital Preservation Tim Gollins, and Research and Policy Manager Valerie Johnson.
Listen to the podcast of the original debate and please do join us on Twitter on Thursday, using the hashtag #digitaltrail, for an undoubtedly fascinating review of our digital past and future.
Topics that we’re keen to discuss include:
Quantity – will there simply be too much information?
Researchers of the 16th century have commented that relatively few records of the period survive, whether because of archivists’ selections, a lack of collection care until recent times or the ravages of time. But these same researchers attest to relishing the challenge of the hunt for information, of successfully following a difficult trail.
In principle we could keep a very large amount of digital data – but is this likely? And what about the cost and storage implications?
Also, do we need to worry that there will be vast amounts of data? There is a vast amount of information available on the internet, yet this doesn’t present a challenge because we use tools to find what we need. Why would or should a digital archive be any different?
Is it that historians and researchers fear they will miss out on what they enjoy most – the thrill of finding leads, the thrill of holding the physical record?
Serendipity – the joy of an unexpected discovery
With the paper record, it is sometimes a piece of paper filed away unnoticed for decades that proves the most interesting. Or the eureka moment when you make an unexpected discovery while looking for something completely different, just because it was in the same archive box.
Engineering for serendipity is quite hard. Several marketplace websites (eBay and Amazon, for example) try to provide a serendipitous experience through recommendations. Could this be recreated for an archive?
If it’s all in how information is presented, can we order digital files in ways that mimic archive boxes or physical files?
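Even a crude version of this is easy to imagine: surface other records from the same box or series as the one a researcher is viewing. Here is a minimal sketch in Python, assuming a toy in-memory catalogue (the references, titles and field names below are invented for illustration):

```python
import random

# A toy in-memory catalogue: each record notes the series (box) it sits in.
# The references and titles are invented for illustration.
catalogue = [
    {"ref": "HO 45/1", "series": "HO 45", "title": "Registered papers: disturbances"},
    {"ref": "HO 45/2", "series": "HO 45", "title": "Registered papers: aliens"},
    {"ref": "HO 45/3", "series": "HO 45", "title": "Miscellaneous memoranda"},
    {"ref": "WO 95/1", "series": "WO 95", "title": "War diary"},
]

def serendipitous_neighbours(ref, k=2):
    """Suggest up to k random records from the same series: a digital
    stand-in for noticing what else happens to be in the archive box."""
    record = next(r for r in catalogue if r["ref"] == ref)
    neighbours = [r for r in catalogue
                  if r["series"] == record["series"] and r["ref"] != ref]
    return random.sample(neighbours, min(k, len(neighbours)))

print(serendipitous_neighbours("HO 45/1"))
```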
Marginalia – will this be lost in digital records?
When looking at paper files it is often the annotations, scribbles on drafts, doodles and drawings, that make the content as rich as it is. The very nature of the digital medium means researchers will never get that sense of ‘this is the handwriting of X’. But does the digital record still contain the richness of opinion, the decisions made and the thought processes of the file creators?
Purist records managers may say: yes, we keep versions in electronic records management (ERM) systems. But are these kept in a way that can easily be seen?
Many file formats (e.g. Microsoft Word documents) contain an edit history; this can provide some clues as to who contributed what to a document, and when. Is this good enough? If we move away from the primacy of paper and adopt a purely digital paradigm, could a wiki show more edit history than a paper file ever could? Would that be too much information?
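To illustrate the kind of clues available: a .docx file is a ZIP package whose docProps/core.xml records the author, the last person to modify the file and a revision count. A minimal sketch using only the Python standard library (the file name is a placeholder):

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml inside a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def edit_clues(path):
    """Pull the 'who and when' metadata from a Word document's core properties."""
    with zipfile.ZipFile(path) as docx:
        root = ET.fromstring(docx.read("docProps/core.xml"))
    def text(tag):
        element = root.find(tag, NS)
        return element.text if element is not None else None
    return {
        "author": text("dc:creator"),
        "last_modified_by": text("cp:lastModifiedBy"),
        "revision": text("cp:revision"),
        "created": text("dcterms:created"),
        "modified": text("dcterms:modified"),
    }

print(edit_clues("draft_minute.docx"))  # placeholder file name
```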
But do we lose the sense of an author’s opinions and thoughts? Does this matter? Is it necessary in order to write history?
Forgery – is it easier to create digital forgeries than paper?
It can be quite straightforward to insert forgeries into paper files. Studies of forged documents (forged medieval charters, for example) can in themselves be quite interesting, both with regard to the nature of the forgery and why it was forged.
However, digital records, while in custody of an archive, are much harder to forge. Measures can be put in place to guarantee authenticity and checks can be made to ensure a document hasn’t been tampered with.
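One such check, standard practice in digital preservation, is fixity checking: record a cryptographic hash of each file on ingest and re-compute it later; any alteration to the bits changes the hash. A minimal sketch (the manifest layout and file path are invented):

```python
import hashlib

def sha256_of_file(path):
    """Hash a file in chunks so arbitrarily large records can be checked."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(manifest):
    """Compare each file against the hash recorded when it entered the archive.
    manifest maps file path -> expected hex digest."""
    return {path: sha256_of_file(path) == expected
            for path, expected in manifest.items()}

# Digests recorded at ingest (example value shown); re-checked on a schedule.
manifest = {
    "records/memo_1912.pdf":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}
print(verify_fixity(manifest))
```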
Can we really prevent forgeries with digital files? And if a digital forgery requires tampering with the original file, how do we ensure the original information isn’t lost?
Finding things – what is this trail we’re following?
How do we know what we’re looking for? And how do we know there will be something at the end of it?
Essentially we’re following a trail of references – clues we find, references to other documents within other documents, appendices, bibliographies and more.
As we go forward, and with the rise of the semantic web, linking and cross-referencing will increase and improve. If links are vital in the digital world, as they are when following a trail, has digital in fact improved the ‘trail’? Or does it make the trail more tangled?
Do you need to know which sequence of links to follow, to find what you are looking for?
But, if the links exist, you just need to follow them. Much like you would a paper trail…
Is this where digital serendipity lies? Following the right sequence of links, clicking on a link that just looks interesting …
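To make the trail concrete: if each document lists the documents it references, following the trail is a walk through a graph of links. A minimal sketch (the documents and their links are invented):

```python
from collections import deque

# An invented reference graph: document -> documents it cites.
references = {
    "committee_minutes": ["draft_report", "appendix_a"],
    "draft_report": ["earlier_memo"],
    "appendix_a": [],
    "earlier_memo": ["committee_minutes"],  # trails can loop back on themselves
}

def follow_trail(start):
    """Breadth-first walk of the reference graph, recording the order in
    which documents are reached: the digital equivalent of a paper trail."""
    seen, trail, queue = {start}, [], deque([start])
    while queue:
        doc = queue.popleft()
        trail.append(doc)
        for ref in references.get(doc, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return trail

print(follow_trail("committee_minutes"))
```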
You say “digital records, while in custody of an archive, are much harder to forge”. I don’t think this is correct. The issue is not the difficulty of forging them – it is whether they can successfully be inserted into the archival store. Hacking the underlying systems is clearly harder than simply adding a piece of paper to a checked-out file.
However, if the underlying systems are successfully hacked, then detecting a digital forgery may be much harder, as there is essentially no physical evidence of the age or means of production of the digital forgery. The age of paper and ink, and the means of printing, can be used to diagnose physical forgeries.
There is an interesting paper on content integrity of digital archives available here:
http://www.hpl.hp.com/techreports/2006/HPL-2006-54.pdf
This approach essentially relies on maintaining a cumulative hash of the digital content and publishing the hash value at regular intervals, allowing audit of integrity.
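In rough outline (a sketch of the general idea only, not the paper’s exact scheme): each new record’s hash is folded into a running value, and that value is published at intervals so the store can later be audited:

```python
import hashlib

def extend_chain(previous, content):
    """Fold a new record into the cumulative hash."""
    return hashlib.sha256(previous + hashlib.sha256(content).digest()).digest()

running = b"\x00" * 32          # arbitrary initial value
published = []                  # values released at intervals for later audit

for i, record in enumerate([b"record one", b"record two", b"record three"]):
    running = extend_chain(running, record)
    if (i + 1) % 2 == 0:        # e.g. publish every second record
        published.append(running.hex())

print(published)
```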
Hi Matt, thanks very much for commenting – we hope you can get involved tomorrow lunchtime!