My secondment to the archive of Amnesty International’s International Secretariat (AI) has been my first exposure to the archival sector. I am one of eight digital archive trainees on the Bridging the Digital Gap programme, who started in January at archives in London and the South West. The programme is managed by The National Archives and funded by the National Lottery Heritage Fund.
I found acclimatising to AI a relatively painless experience, as this is the third role I’ve had in which I have worked predominantly remotely. ‘International’ is in the name, and the diffuse nature of the organisation suggested it would be well prepared for lockdown measures. It didn’t disappoint.
The traineeship started off slowly. The first month was dominated by inductions, online training, meet-and-greets and workshops arranged by The National Archives introducing us to archival theory. I am an impatient person, and I admit I struggled to accept that the role is first and foremost a traineeship: my overriding desire was to get stuck in and discover what archivists actually do on a day-to-day basis.
Then, having gained a degree of theoretical and semantic understanding of the archival sector, I became involved in various projects. There were three main projects in the first four months. The first was a web-archiving project, in which I was asked to help preserve the numerous microsites created by AI in the course of their campaigns.
An example of the type of site I had to archive is the teargas investigation website: quite complex, with many interactive elements. Typical automated web crawlers tend to struggle with such sites; colleagues recommended manual crawling tools such as Conifer instead.
I got to work crawling each site, ensuring I clicked on and interacted with every aspect of it. Within a week, I had the hang of it, opening multiple tabs simultaneously to record sites faster, and skipping through videos instead of waiting for each one to play in full.
After a month, I had recorded most of the sites and then ingested them into Preservica, AI’s digital preservation platform. I even created an elaborate spreadsheet to track the sites, their crawl status and how much time had elapsed since they were last crawled.
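The logic behind a tracking sheet like this can be sketched in a few lines of Python. This is only an illustration: the record fields, the status labels, the example.org URLs and the 180-day threshold are all assumptions made for the sketch, not details of the spreadsheet described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SiteRecord:
    """One row of a hypothetical crawl-tracking sheet."""
    url: str
    status: str                          # assumed labels: "pending", "recorded", "ingested"
    last_crawled: Optional[date] = None  # None means the site has never been crawled


def days_since_crawl(record: SiteRecord, today: date) -> Optional[int]:
    """Return the number of days since the last crawl, or None if never crawled."""
    if record.last_crawled is None:
        return None
    return (today - record.last_crawled).days


def due_for_recrawl(records: List[SiteRecord], today: date,
                    max_age_days: int = 180) -> List[SiteRecord]:
    """Return records never crawled, or last crawled more than max_age_days ago."""
    due = []
    for record in records:
        age = days_since_crawl(record, today)
        if age is None or age > max_age_days:
            due.append(record)
    return due


if __name__ == "__main__":
    sites = [
        SiteRecord("https://example.org/campaign-a", "ingested", date(2021, 1, 1)),
        SiteRecord("https://example.org/campaign-b", "pending"),
    ]
    for site in due_for_recrawl(sites, today=date(2021, 1, 31), max_age_days=90):
        print(site.url, site.status)
```

Keeping the elapsed time as a computed value rather than a typed-in one means the sheet never goes stale; only the last crawl date has to be maintained by hand.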
My second project was a large-scale migration of files from network drives, split into active documents destined for SharePoint and files deemed to have enduring value, which was where I came in. Each department had its own location in the network drive where it stored its files. I had to generate reports detailing the file names, extensions, size and file structure, among other metadata. I did this using TreeSize and DROID (built at The National Archives).
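For readers curious what such a report contains, here is a minimal Python sketch that walks a folder and writes one CSV row per file. It is not how TreeSize or DROID work internally (DROID, for instance, also identifies file formats, which this sketch does not attempt); the column names and the function names `inventory` and `write_report` are assumptions chosen for illustration.

```python
import csv
import os
from pathlib import Path

# Assumed column layout for the illustrative report.
FIELDS = ["relative_path", "name", "extension", "size_bytes", "depth"]


def inventory(root):
    """Walk `root` and return one dict of basic metadata per file."""
    root = Path(root)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            relative = path.relative_to(root)
            rows.append({
                "relative_path": relative.as_posix(),
                "name": name,
                "extension": path.suffix.lower(),   # normalise ".PDF" to ".pdf"
                "size_bytes": path.stat().st_size,
                "depth": len(relative.parts) - 1,   # 0 for files directly under root
            })
    return rows


def write_report(rows, csv_path):
    """Write the inventory rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder to be surveyed.
    write_report(inventory("."), "inventory_report.csv")
```

Writing rows out as plain CSV keeps the report readable in any spreadsheet tool, which matters when the results need to be shared with departments deciding what to keep.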
These reports could take time, especially as these folders had been growing for years, with numerous duplicates and even more draft versions. It was here I ran into hardware issues. The virtual machine I was using only had 4 GB of RAM, which is nigh-on unusable considering how resource-intensive Windows 10 is. I would come back to check on the reports to be met by a black screen – hours of report-generating ruined. While IT worked to resolve these issues, I focused on metadata clean-up within the archive catalogue, and divided the rest of my time among the other projects.
One of those was the third project, which is still very much in progress. I have been given the task of finding a suitable process with which to archive and preserve emails generated within the organisation. These emails range from announcements and organisational decisions to weekly updates and statements shared in response to external events.
You’d think archiving emails would be a simple job – you’d be wrong. The first issue was one of file formats. Outlook emails are stored in the MSG format, which is not an open standard – EML is preferable. Preservica has recently added functionality to migrate from MSG to EML, but the quality of the conversion is not of the required standard. I attempted to build a Microsoft Power Automate flow to convert MSG files to PDF automatically, but this too produced unsatisfactory output. I therefore looked for, and found, specialist software that met the archive’s requirements for metadata preservation, report generation, output structure and more. Currently, I am in the process of putting together a business appraisal to make the case for why we need the software.
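One practical advantage of EML is that it is plain text that standard tools can read. As a rough illustration, the Python sketch below uses the standard library's `email` package to check whether key headers survive in an EML file, the sort of spot check one might run on converted output. The sample message, the list of "required" headers and the function names are all assumptions made for the example; they are not the archive's actual requirements or the workflow described above.

```python
from email import message_from_string, policy

# A made-up message used purely for illustration.
SAMPLE_EML = """\
From: Sender <sender@example.org>
To: All Staff <staff@example.org>
Subject: Weekly update
Date: Mon, 01 Mar 2021 09:30:00 +0000
Message-ID: <abc123@example.org>
Content-Type: text/plain; charset="utf-8"

Hello all, this is an illustrative message body.
"""

# Assumed list of headers worth checking after a conversion.
REQUIRED_HEADERS = ["From", "To", "Subject", "Date", "Message-ID"]


def extract_metadata(eml_text):
    """Parse EML text and return the required headers (None if absent)."""
    message = message_from_string(eml_text, policy=policy.default)
    metadata = {}
    for name in REQUIRED_HEADERS:
        value = message[name]
        metadata[name] = str(value) if value is not None else None
    return metadata


def missing_headers(eml_text):
    """Return the names of required headers that the message lacks."""
    metadata = extract_metadata(eml_text)
    return [name for name, value in metadata.items() if value is None]


if __name__ == "__main__":
    print(extract_metadata(SAMPLE_EML)["Subject"])
    print(missing_headers(SAMPLE_EML))
```

A check like this only covers headers; attachments, embedded images and message threading would each need their own tests before a conversion could be trusted.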
A lot of the literature and articles we consume could not exist without archives preserving the information that goes on to be cited in these works. This traineeship has provided me with a perspective of the archival sector I wouldn’t have arrived at of my own volition – one of respect for the immense work that occurs behind the curtains in order to keep the accumulated knowledge of society perpetually accessible.
Good luck. In the distant past it was hoped PDF was the answer; then Adobe decided to make it possible to edit them! Being able to copy and paste extracts is one thing, but changing the original document is another! At least MS introduced protection for Word documents, so that a recipient of a document from another author has to unlock it. This was suggested to them in the early 1980s.