In her presentation for The National Archives’ Catalogue Week 2022, Meada Wadman gives an introduction to the cataloguing of selected Second World War service personnel records. She discusses the enhancement and standardisation of data from transferring departments, series allocation, data loading and approaches to addressing project challenges.
Transcript
Thank you very much for listening to this Catalogue Week presentation about some of our work cataloguing three brand new series of military service personnel records. These series are WO 420, WO 421 and WO 427 and they form part of a much larger accession of records from the Ministry of Defence, which The National Archives will receive and catalogue over several years. I have been working on the project as part of my role in TNA’s Cataloguing, Taxonomy and Data team.
Although all three of the series contain records of those who served in the army as other ranks during the Second World War, they represent a diversity of experience. While WO 420 concerns the records of those who served in the Corps of the Royal Electrical and Mechanical Engineers and includes more than 56,000 files, WO 421 pertains to the records of those who served in selected smaller corps and includes almost 100,000 files. In contrast, WO 427 is smaller, comprising 471 service files of nurses and other ranks in Eastern Africa. Together these series comprise more than 150,000 records, and I’d like to explain some of the work behind preparing, cataloguing and releasing them to the public via our online catalogue, Discovery.
Unlike some of the cataloguing work that takes place at TNA, this work is not part of a catalogue enhancement project. Instead, these series were brand new accessions and we needed to start from scratch when it came to preparing and releasing the catalogue descriptions.
We received the physical files from MoD in big tranches, known as collations. For WO 420 and WO 421, alongside the physical records, MoD also provided us with a basic skeleton of information about the contents of each of the files, including, for example, the name of the serviceman. This data was certainly a very useful starting point for cataloguing the records, but we did have to be careful about how we approached it. We didn’t have ready-made catalogue descriptions, but rather functional databases that MoD had used to manage the records during their ‘working life’, so we knew the information provided would have to undergo a considerable transformation in order for it to be useful to us as catalogue descriptions.
When it came to WO 427, we didn’t have any metadata at all, just the physical files that MoD had sent to us. For this series, colleagues from Collections Expertise and Engagement indexed the files themselves, extracting key information to build an understanding of the history and experiences communicated by this series. They then passed this data on to us to catalogue and release to the public.
One area where the data provided by MoD was particularly useful was in helping us to decide what information we ought to include in the catalogue descriptions. We knew that we needed the descriptions to be useful to readers searching for individual service people and as consistent as possible. In other words, they had to be compliant with the standards framework surrounding TNA’s work. However, at the same time, we had to be realistic about the resources we had available for the project. Just as importantly, our descriptions also had to be ethically and legally sound; we needed to make sure that we were showing the respect required to each record subject, particularly as many of the files include considerable personal information. Taking all of this into account, we settled on catalogue descriptions showing initials, surnames, service numbers and dates of birth where this information was available. There would also be some caveats to protect personal information. Specifically, this meant that for individuals born less than 100 years ago – and therefore considered likely to still be alive – we would redact the descriptions to show just the initials and surname and the year of birth. It also meant that as a blanket rule we decided to close all files for public consultation until 115 years after the year of birth to protect personal information.
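The description and closure rules just described can be sketched in a few lines of Python. This is only an illustration, not the process TNA used: the field names, the wording of the output, and the handling of a missing date of birth are all assumptions made for the example. Only the 100-year and 115-year thresholds come from the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

LIKELY_ALIVE_AGE = 100  # born less than 100 years ago: considered likely to be alive
CLOSURE_YEARS = 115     # files stay closed until 115 years after the year of birth


@dataclass
class ServiceRecord:
    # Hypothetical field names; the real spreadsheet columns are not given.
    initials: str
    surname: str
    service_number: Optional[str] = None
    date_of_birth: Optional[date] = None


def age_on(born: date, today: date) -> int:
    """Whole years between the date of birth and the given day."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def describe(record: ServiceRecord, today: date) -> str:
    """Build a description, redacted for anyone born less than 100 years ago."""
    name = f"{record.initials} {record.surname}"
    born = record.date_of_birth
    if born is not None and age_on(born, today) < LIKELY_ALIVE_AGE:
        # Redacted form: initials, surname and year of birth only.
        return f"{name}. Year of birth: {born.year}"
    # Assumption: with no date of birth, show whatever fields are available.
    fields = [name]
    if record.service_number:
        fields.append(f"Service number: {record.service_number}")
    if born is not None:
        fields.append(f"Date of birth: {born.isoformat()}")
    return ". ".join(fields)


def closed_until_year(born: date) -> int:
    """Year in which the 115-year closure period ends."""
    return born.year + CLOSURE_YEARS
```

In this sketch, a record for someone born in 1925 and described in 2022 would show only the initials, surname and year of birth, and `closed_until_year` would give 2040 for that file. The talk does not say whether a file opens at the start or the end of that year, so the function returns only the year.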
From this foundation we were able to make a start on transforming the data into catalogue descriptions.
The first step was to decide whether or not we felt the records included in each collation transferred to us by MoD were cohesive enough to form a series in their own right. To make this decision we worked closely with military records experts in our Collections Expertise and Engagement department. These colleagues also undertook research into the content and history of each of the collations, building up the information which we later used as the series level description made available on Discovery. This provides readers with information about the type of files included within the series, their context, history and related series held by TNA. It was at this point that the files went from being contained in three collations organised by MoD into the form in which they will be preserved and made accessible by TNA, as three series: WO 420, WO 421 and WO 427.
Once we had assigned the records into series and had a good understanding of their context, we were able to start creating catalogue descriptions for each file. The first stage was to take intellectual control of the data we had from MoD. Colleagues in our IT services department delivered this data to our team in spreadsheets extending to tens of thousands of rows. We then assigned TNA references to each file, allowing each file to become a piece. A ‘piece’ is a valid unit of description that we could upload to Discovery, embedding each file in TNA’s catalogue and elucidating its context, as well as providing a deliverable unit for readers to request in the reading rooms.
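As a small illustration of assigning references, the sketch below numbers spreadsheet rows in sequence within a series. It is hypothetical: the talk does not describe how piece numbers were allocated, so the sequential numbering, the `reference` column name and the `series/piece` layout are assumptions made for the example.

```python
from typing import Dict, List


def assign_references(series: str, rows: List[Dict[str, str]], start: int = 1) -> List[Dict[str, str]]:
    """Return copies of the rows, each with a piece reference such as 'WO 420/1'.

    Assumes pieces are numbered sequentially in row order (not confirmed by the talk).
    """
    return [
        {**row, "reference": f"{series}/{number}"}
        for number, row in enumerate(rows, start=start)
    ]
```

The original rows are left unchanged and new dictionaries are returned, which makes it easier to compare the data before and after references are added.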
The next stage was to make sure that the data we had was accurate and consistent. Although the bulk of the data from MoD was of reasonably high quality and could be adapted to our needs, we did find several areas that needed to be amended. These instances ranged from examples of clear typos, such as an open square bracket featuring in place of a ‘P’ in a surname, to dates of birth that were clearly erroneous (such as the suggestion that someone born in 1820 could have fought in the Second World War), all the way through to completely absent rows of data, where MoD had not captured any information about particular files. In some cases we were able to amend the data from common sense, identifying and rectifying clear typos as we came across them; in other instances we needed to check the documents themselves to record the information that was missing. Because of the sheer scale of these series, even a small percentage of inaccuracies translated into considerable numbers of records to inspect. This has become a considerable area of work, and to date the team have checked a rather staggering 1,656 files in the repositories at Kew to collect, amend and clarify data that was either absent or was suspect enough to raise considerable doubt about its validity and accuracy. We updated the data in Excel spreadsheets and recorded each inaccuracy or inadequacy we found and its resolution.
Each of the three series presented slightly different data quality challenges. WO 421 presented more examples of completely unindexed files, whereas WO 420 included more erroneous dates of birth. We became familiar with the demographics of each series as we worked through them and developed a good sense of what inaccuracies we particularly needed to look out for in each series. For WO 427, indexing the material ourselves at TNA was a big undertaking but also gave us complete confidence over the quality of the cataloguing data.
Every stage of preparing and cataloguing this data has required close attention to detail and a sound understanding of archival practice as we forge a new process for dealing with these extensive sets of records. At each stage we recognised that we have a responsibility to the public and to each record-subject to employ a consistent and fair approach to cataloguing each file. In all of our work we have to maintain a quality standard while also making the files available to the public in a timely manner. As we catalogued WO 420, WO 421 and WO 427 we became aware that we needed some additional resources in order for this to be achieved. We were successful in acquiring a new Cataloguing Officer, James, who is a qualified archivist and has been able to share responsibility for data quality management: checking documents, amending faulty data and preparing catalogue descriptions.
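The kinds of problem described above (stray characters in surnames, implausible dates of birth and empty rows) lend themselves to simple automated flagging before anyone checks the physical files. The sketch below is an illustration only: the column names and the range of plausible birth years are assumptions, and the talk describes the team working in Excel rather than with a script like this.

```python
import re
from typing import Dict, List

# Assumed bounds for someone serving in the Second World War (hypothetical values).
EARLIEST_PLAUSIBLE_BIRTH_YEAR = 1870
LATEST_PLAUSIBLE_BIRTH_YEAR = 1930

# Letters only, optionally joined by a single space, apostrophe or hyphen.
SURNAME_PATTERN = re.compile(r"^[^\W\d_]+(?:[ '\-][^\W\d_]+)*$")


def find_issues(row: Dict[str, str]) -> List[str]:
    """Return a list of data quality issues for one spreadsheet row."""
    if not any((value or "").strip() for value in row.values()):
        return ["row is empty: check the physical file"]

    issues = []
    surname = (row.get("surname") or "").strip()
    if not surname:
        issues.append("surname missing")
    elif not SURNAME_PATTERN.match(surname):
        issues.append("surname contains unexpected characters")

    year_text = (row.get("birth_year") or "").strip()
    if not year_text:
        issues.append("birth year missing")
    elif not year_text.isdecimal():
        issues.append("birth year is not a number")
    elif not EARLIEST_PLAUSIBLE_BIRTH_YEAR <= int(year_text) <= LATEST_PLAUSIBLE_BIRTH_YEAR:
        issues.append("birth year implausible")
    return issues
```

A row with the surname `[arker` and a birth year of `1820` would be flagged on both counts, while a fully blank row would be reported as needing a check against the document itself. A script of this kind can only point to suspect rows; deciding the correct value still needs human judgement or the physical file.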
And the work was not over once we were happy with the quality of the descriptive information we had. We then had to transfer the data from our spreadsheets into the catalogue. This first involved importing an XML skeleton into the editorial element of our cataloguing system, which created the space for us to load the full descriptions. Once this stage was complete, we encoded the descriptions we had written into the XML format our systems recognise, meaning that the descriptive information for each series would appear in the right format on Discovery. We relied on Excel functions to encode the data in this way and we were then able to load the descriptions into the cataloguing system thanks to the earlier skeleton import.
Once the data was in the editorial part of our catalogue, we could perform a final check. This final check allowed us to identify and amend any previously missed inaccuracies in the data. As such it forms an important part of our data quality process. Finally, we were able to release the descriptions on Discovery for the public to browse and order to the reading rooms. It felt particularly momentous when we released the descriptions for the first WO 420 files to the public, but this is certainly not the end of our work. With each series catalogued we have learned important lessons about this data and its particularities and we’ve used this experience to improve and streamline our workflows.
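To give a sense of what encoding a description into XML involves, here is a minimal Python sketch. It is not the method described in the talk, which relied on Excel functions, and the element names `piece`, `reference` and `description` are invented for the example; the schema used by TNA's cataloguing system is not given here.

```python
import xml.etree.ElementTree as ET


def encode_piece(reference: str, description: str) -> str:
    """Wrap one catalogue description in a small XML fragment.

    Element names are hypothetical placeholders, not TNA's schema.
    """
    piece = ET.Element("piece")
    ET.SubElement(piece, "reference").text = reference
    ET.SubElement(piece, "description").text = description
    # ElementTree escapes characters such as & and < in element text.
    return ET.tostring(piece, encoding="unicode")
```

One reason to build the fragment with an XML library is that special characters in the data are escaped automatically. When XML is assembled by joining strings together, as a spreadsheet formula would do, that escaping has to be handled separately.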
As a consequence we now have three fully catalogued series of Second World War service personnel files and have built up a considerable body of knowledge about how to catalogue series extending to scores of thousands of records. Between them they provide a fascinating insight into military, social and family history, and represent a range of experiences including of women’s services. Crucially, we now also have a process in place and the resources necessary to allow us to catalogue further, similar, accessions to the same high standards. Work continues alongside our colleagues in MoD to arrange the transfer and cataloguing of further series, and we’re very much looking forward to furthering the progress we have already made.
For further information about the project, please do take a look at the FAQs and research guide.
“It also meant that as a blanket rule we decided to close all files for public consultation until 115 years after the year of birth to protect personal information.”
There is no such blanket closure by the MoD. If personnel can be shown to be deceased, at least some information is allowed to be sent out. So what is the advantage of transferring to TNA if less access is to be given?
If I am wrong, can we please be given a comparison of access?
And why is the given name not in the intended index? How many J Smith records do we access when we want Jeremiah Smith?
Thank you very much for your comment. To address each of your points in turn: firstly, while a standard closure period of 115 years applies to all records across these series, readers are still able to make an FOI request to see a file before the end of this period. You are quite right that, after receiving proof of death, the team in question would then be able to make a decision about whether or not material from the file could be released to the requester. A request can be made by following the link next to the catalogue description in question on Discovery. There is further information about this process in the context of service personnel files in the FAQs (here: https://cdn.nationalarchives.gov.uk/documents/mod-service-records-collection-faqs.pdf).
Secondly, thank you for your comment about including full first names in the catalogue descriptions. We have used data provided by MoD to build the catalogue descriptions. This data does not include full first names.
I hope this further information is helpful to you.