Since February 2014, we’ve been investigating methods for linking people who appear in our records, to start revealing the connections that tell the real stories of people’s lives.
Two years later this project, Traces through Time, is at the point where users of our catalogue, Discovery, can benefit from our efforts.
If you’ve recently been researching a person who served in the First World War, there’s a chance you’ve noticed our new feature: a section reading ‘other possible matches’. This feature went live just a few days ago, at the end of March. If you have already come across it, we hope you liked it.
If you haven’t seen it, take a look at the description of Alfred Minall’s naval service record in series ADM 337. Scroll down to the bottom of the page for links to the same Alfred Minall in other records.
We’ve all made links like this on Discovery while researching our family history. It’s relatively easy for a person to decide whether two records relate to the same individual. We can make a judgement based on the information presented to us: maybe the date of birth is very close, or maybe an unusual middle name is the clincher.
But how can we ‘teach’ a computer to make these very human judgements?
Well, that’s exactly what our data scientists and statistical experts here at The National Archives have been working on. We have identified ways of linking names across records, with the added value of a confidence rating.
For example, when we look at Alfred Minall’s records we can be reasonably sure that they relate to the same individual, but how confident are we that Kenneth L’Estrange Davy in AIR 76 is the same person as K. L. E. Davey who appears elsewhere in the same series? Making links is only half the task; we also need to calculate the statistical likelihood that they really are the same individual. This is based on a range of measures, such as the similarity of names and dates, whether a name is unusual, or whether we have other information such as service numbers.
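To give a flavour of how measures like these might be combined, here is a minimal Python sketch. It is an illustration only and not the model used in Traces through Time: the function names, the weights (0.5, 0.3, 0.2), the two-year tolerance and the `name_count` input are all assumptions made for this example, and service numbers are left out for brevity.

```python
import math
from difflib import SequenceMatcher
from typing import Optional


def normalise(name: str) -> str:
    """Lower-case a name and drop punctuation, so L'Estrange becomes lestrange."""
    kept = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    return " ".join(kept.split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, tolerant of small spelling changes."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def year_closeness(year_a: Optional[int], year_b: Optional[int],
                   tolerance: int = 2) -> float:
    """1.0 for identical years, falling to 0.0 once the gap exceeds the tolerance."""
    if year_a is None or year_b is None:
        return 0.5  # assumption: a missing date is treated as neutral evidence
    gap = abs(year_a - year_b)
    return max(0.0, 1.0 - gap / (tolerance + 1))


def rarity(name_count: int) -> float:
    """1.0 for a name seen once, shrinking as the name becomes more common."""
    return 1.0 / (1.0 + math.log10(max(name_count, 1)))


def match_confidence(name_a: str, name_b: str,
                     year_a: Optional[int], year_b: Optional[int],
                     name_count: int) -> float:
    """Weighted blend of the three measures; the weights are illustrative guesses."""
    return (0.5 * name_similarity(name_a, name_b)
            + 0.3 * year_closeness(year_a, year_b)
            + 0.2 * rarity(name_count))


# Birth years and name counts below are hypothetical placeholders, not catalogue data.
unusual = match_confidence("Kenneth L'Estrange Davy", "K. L. E. Davey", None, None, 1)
common = match_confidence("John Smith", "John Smith", 1890, 1893, 5000)
print(f"unusual name, no dates: {unusual:.2f}")
print(f"common name, years 3 apart: {common:.2f}")
```

A simple weighted sum like this is easy to read but crude. A real system would more likely estimate probabilities from the data itself, which is why the links in Discovery carry a statistical confidence score and not a hand-tuned number.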
We’ve already found a number of links that searching in Discovery would never have been able to unearth. Take the service record of Gerald Nassau Stewart Lane in ADM 273. Or is it actually Gerald Nasseau Stuart Lane? Either way, the new system links the two together as a ‘strong match’ despite the different spellings. As an added bonus, the system also suggests a link to ‘G. N. S. Lano’ – and a detailed look at the original documents reveals that in this case all three records do, indeed, relate to the same person. See Mark Bell’s recent blog post for more examples of weird and wonderful names.
The Beta 1 version of this new function covers twenty record series from the First World War period, and over half a million newly identified links are now available through Discovery, each with an associated confidence score. We will be linking to more series over time and we’ll also extend this work to include records from earlier time periods and links to other archives’ collections.
This is our first public release of this new feature and we’ll continue to develop and improve it over the coming months. As always, we’d love to hear your feedback – please add your comments below. We’d be especially interested to hear from you if the ‘other possible matches’ feature helps you find links that you weren’t previously aware of. We hope this new feature opens up new and exciting avenues for your research!
Really impressed by this data mapping initiative. Hope it helps to reveal new insights into relationships, acquaintances, and associations that have otherwise been buried.
I’ve been looking for a tool like this to relate items/people/objects visually, beyond the usual genealogy approach of x is a parent of y. Is this a tool that you’ve developed yourselves or with a partner, and do you plan to make it available (commercially or as open source)?
Keep up the great work and innovation.
Thanks for the feedback, Andrew.
The ‘other possible matches’ feature is an output of the Traces Through Time research project: http://www.nationalarchives.gov.uk/about/our-role/plans-policies-performance-and-projects/our-projects/traces-through-time/
This is a beta version so, based on user feedback, we’ll be developing the feature further. We’re not at the stage yet where we can make a tool available but we are in discussions about how we do that in the future.
Please keep an eye on our website for further enhancements to this new feature.
I’m puzzled. I should have thought that “Timothy Goddard Elliott, Rifleman Queen Victoria’s Rifles 1914 to 1919” was quite enough, but alas, I get “No match”. He is described in full in my book “Tim’s Wars” (Loaghtan Books). Robin E Gregory
Whilst this may be useful, surely Discovery can do a similar job without employing scarce resources (which could be used for uncatalogued records) when the chances are limited of being sure you have found the right person, or even the person of the right sex. I am sure most people would be able, when told that their ancestors may not have a full name in the catalogue, to find them. Discovery often does not list the person’s names in full (if known); that is the way it used to be done (Treasury being a prime example) and in some cases still is. There are also a number of examples pre-WW1 where men used aliases; how do you match those up (I have one ancestor who joined the Royal Navy, went and joined the Liverpool Regiment and then re-joined the Royal Navy), and what happens when women marry? Of course it does depend on the entries in Discovery being correct, unlike ADM 188, which has so many errors I would not like to estimate how many thousands there are.
I should add that when my ancestor joined the Liverpool Regiment he did so under an alias.
Using clever matching should unearth valuable links that will help genealogists and historians alike to make significant strides in their research. Looking forward to seeing more and more of it. Lots of learning and refining will need to happen but starting somewhere is a huge leap in the right direction. Thanks TNA for taking that first leap!
Thank You TNA for skilling up to produce this wonderful tool. It is only early days yet for your project, but I am hopeful of some great ‘finds’ in the future from use of this facility. THANK YOU in advance, for what I eventually hope to find.
I have only just discovered this blog and only learned about Discovery through taking a genealogy course with the University of Strathclyde. This is very interesting information as I plan to visit in September.
I came across some of the suggested links but I am not impressed:-
BT 351/1/118208: John Rennie, born 1891, Liverpool: possible match with ADM 188/1114/108251, John Rennie b. 8.1.1891 Kilmarnock, Ayrshire
ADM 196/8/505: James McCraith (a surgeon) b 28.6.1836; possible match to ADM 188/1107/101100 b 7.6.1884 Largs, a short service stoker (and a gap of 48 years in the date of birth!)
ADM 188/1101/7886 Donald Campbell b.7.7.1899 ‘Duntroon’ (Ayrshire): strong match/possible/weak matches to BT 377/7/61172, b. 7.7.1898 Erodale, Ness, Stornoway and to BT 351/1/20396, b.1899 Barra and to BT 351/1/20391, b 1900 Forest Gate.
It should not just be a link to a name but with dates of birth as well as places of birth and of course names like Donald Campbell are quite common.
[…] as probabilities, not absolute values. Connections within the collection are then sought out to determine if, for example, different names are in fact referring to the same individual. I found this approach very intriguing, it reminds me of other projects that aim to use […]