Making connections: tracing people through our collection
Since February 2014, we’ve been investigating methods for linking people who appear in our records, to start revealing the connections that tell the real stories of people’s lives.
Two years later, this project, Traces through Time, has reached the point where users of our catalogue, Discovery, can benefit from our efforts.
If you’ve recently been researching a person who served in the First World War, there’s a chance you’ve noticed our new feature: a section reading ‘other possible matches’. This feature went live just a few days ago, at the end of March. If you have already come across it, we hope you liked it.
If you haven’t seen it, take a look at the description of Alfred Minall’s naval service record in series ADM 337. Scroll down to the bottom of the page for links to the same Alfred Minall in other records.
We’ve all made links like this on Discovery while researching our family history. It’s relatively easy for a person to decide whether two records relate to the same individual. We can make a judgement based on the information presented to us: maybe the date of birth is very close, or maybe an unusual middle name is the clincher.
But how can we ‘teach’ a computer to make these very human judgements?
Well, that’s exactly what our data scientists and statistical experts here at The National Archives have been working on. We have identified ways of linking names across records, with the added value of a confidence rating.
For example, when we look at Alfred Minall’s records we can be reasonably sure that they relate to the same individual, but how confident are we that Kenneth L’Estrange Davy in AIR 76 is the same person as K. L. E. Davey who appears elsewhere in the same series? Making links is only half the task; we also need to calculate the statistical likelihood that they really are the same individual. This is based on a range of measures, such as the similarity of names and dates, whether a name is unusual, and whether we have other information such as service numbers.
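To give a feel for how measures like these might be combined, here is a minimal Python sketch. It is not the method used in Traces through Time: the `PersonRecord` fields, the weights, the five-year date window and the surname counts are all assumptions invented for illustration, and the name comparison simply uses `SequenceMatcher` from Python’s standard library.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, Optional


@dataclass
class PersonRecord:
    """Hypothetical, simplified view of a catalogue description."""
    name: str
    birth_year: Optional[int] = None
    service_number: Optional[str] = None


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_confidence(a: PersonRecord, b: PersonRecord,
                     surname_counts: Dict[str, int]) -> float:
    """Combine several pieces of evidence into an illustrative 0..1 score.

    The weights (0.4, 0.2, 0.1, 0.3) are arbitrary assumptions.
    """
    name_score = name_similarity(a.name, b.name)

    # Dates: full marks for the same year, falling to zero at 5+ years apart.
    # Missing dates are treated as neutral evidence (0.5).
    if a.birth_year is None or b.birth_year is None:
        date_score = 0.5
    else:
        date_score = max(0.0, 1.0 - abs(a.birth_year - b.birth_year) / 5)

    # Rarity: the fewer records share the surname, the stronger the evidence.
    surname = a.name.split()[-1].lower()
    rarity_score = 1.0 / max(1, surname_counts.get(surname, 1))

    # Service numbers: decisive when both are present, neutral otherwise.
    if a.service_number and b.service_number:
        number_score = 1.0 if a.service_number == b.service_number else 0.0
    else:
        number_score = 0.5

    return (0.4 * name_score + 0.2 * date_score
            + 0.1 * rarity_score + 0.3 * number_score)


# Names taken from this post; "John Smith" and the counts are made up.
counts = {"lane": 40, "smith": 5000}
lane_a = PersonRecord("Gerald Nassau Stewart Lane")
lane_b = PersonRecord("Gerald Nasseau Stuart Lane")
other = PersonRecord("John Smith")

strong = match_confidence(lane_a, lane_b, counts)
weak = match_confidence(lane_a, other, counts)
print(f"Lane vs Lane: {strong:.2f}; Lane vs Smith: {weak:.2f}")
```

In this sketch the two spellings of Gerald Lane’s name score higher than the unrelated pair, but the number itself is only a weighted average, not a true statistical likelihood. A real system would need to estimate its weights from data rather than pick them by hand.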
We’ve already found a number of links that searching in Discovery would never have been able to unearth. Take the service record of Gerald Nassau Stewart Lane in ADM 273. Or is it actually Gerald Nasseau Stuart Lane? Either way, the new system links the two together as a ‘strong match’ despite the different spellings. As an added bonus, the system also suggests a link to ‘G. N. S. Lano’ – and a detailed look at the original documents reveals that in this case all three records do, indeed, relate to the same person. See Mark Bell’s recent blog post for more examples of weird and wonderful names.
The Beta 1 version of this new function covers twenty record series from the First World War period, and over half a million newly identified links are now available through Discovery, each with an associated confidence score. We will be linking to more series over time, and we’ll also extend this work to include records from earlier time periods and links to other archives’ collections.
This is our first public release of this new feature and we’ll continue to develop and improve it over the coming months. As always, we’d love to hear your feedback – please add your comments below. We’d be especially interested to hear from you if the ‘other possible matches’ feature helps you find links that you weren’t previously aware of. We hope this new feature opens up new and exciting avenues for your research!