Trainee Tuesday: Tales from the Dark Archive

Checksums, dark archives, OAIS, trusted storage and ingest packages. No, these are not the vital components of some epic science fiction novel – although they are all terms that were completely alien to me before I started my Opening Up Archives Traineeship at Gloucestershire Archives. My name is Tom Charnock and since April I have been working with all of these terms (and more!) on an almost daily basis, as a large part of my traineeship is focused on Digital Preservation. Before I started at Gloucestershire, my knowledge of Digital Preservation was fairly minimal – I’d never even heard the phrase before. I did have a good idea what the term meant when I was introduced to it, seeing as I have a fairly good grasp of computer technology and software and am a bit of a tech geek at heart… but as far as being actively involved with Digital Preservation? No.

Gloucestershire Archives' building

That’s changed quite a bit in the five months that I’ve been at Gloucestershire. The first thing I had to learn to appreciate was what exactly the term ‘Digital Preservation’ actually means. At the most basic level, it clearly involves the preserving of digital objects, but there is so much more to it than that and, even though the learning curve has been a pretty steep one, I feel I’ve grasped both the concept and the actual practical implementation of the concept quite well.

Here at Gloucestershire Archives, we have developed a bespoke software tool that exists solely for use in a Digital Preservation workflow. It’s called SCAT (SCAT is Curation and Trust), and it is built on Fedora, a Linux distribution. The tool is open source and freely available to anybody who wants to use it in their own Digital Preservation activities. What SCAT does is take a digital object, turn it into an Archival Information Package (AIP), add metadata and fixity files, and then allow the package to be stored securely in a digital archive. Terms like ‘SIP,’ ‘AIP’ and ‘DIP’ (Submission Information Package, Archival Information Package and Dissemination Information Package) have become second nature to me in recent months, as have other concepts such as ‘fixity’ (the ways in which we can ensure a digital object remains unchanged in storage), ‘bit rot’ (how the individual binary ones and zeros that constitute digital objects at the base level can change over time), and the notion of ‘trusted storage’ – a data storage facility that can not only continually verify the content of the digital objects within it, but also guarantee the safety and retrievability of stored data in the event of, for example, a power outage, system failure or even a natural disaster.
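
To make ‘fixity’ and ‘bit rot’ a little more concrete, here is a minimal sketch in Python of how a checksum can show whether a file has changed in storage. This is not SCAT’s own code, and I am not claiming SCAT works this way internally: the choice of SHA-256, the function names (sha256_of, fixity_ok, flip_one_bit, demo) and the sample file are all assumptions made purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_checksum: str) -> bool:
    """True if the file still matches the checksum recorded at ingest."""
    return sha256_of(path) == recorded_checksum


def flip_one_bit(path: Path, byte_index: int = 0) -> None:
    """Simulate bit rot by flipping the lowest bit of a single byte."""
    data = bytearray(path.read_bytes())
    data[byte_index] ^= 0b00000001
    path.write_bytes(bytes(data))


def demo() -> tuple[bool, bool]:
    """Check fixity before and after simulated bit rot (hypothetical file)."""
    with tempfile.TemporaryDirectory() as folder:
        record = Path(folder) / "parish_register.txt"
        record.write_bytes(b"Baptisms, 1842")
        recorded = sha256_of(record)          # checksum stored at ingest
        before = fixity_ok(record, recorded)  # file untouched
        flip_one_bit(record)                  # one bit changes in storage
        after = fixity_ok(record, recorded)   # checksum no longer matches
    return before, after


if __name__ == "__main__":
    print(demo())
```

Run as a script, this prints (True, False): the file passes its fixity check straight after ingest, and fails it once a single bit has been flipped. A trusted storage system would, in principle, repeat this kind of check on a schedule and restore a good copy when a mismatch turns up.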

Digital Preservation is important because so much of what we produce today is in one digital format or another, and there needs to be some method of storing all of this information for future generations. Gone are the days when everything we write down is on parchment or paper, and the explosion of other digital media forms such as images, sound and movie files means that the only way we can capture and store memories is in their native digital form – these files are known as ‘born digital.’ Using an open source platform such as SCAT to package and store these artefacts means that we do not need to rely on proprietary software from major corporations who not only impose limits and restrictions on users, but may also charge for their service.

At the time of writing, Gloucestershire Archives is working continually to improve and update the SCAT tool, and I am heavily involved with the testing and day-to-day usage of the current build. I have ingested a whole range of digital objects into our digital archive and continue to keep up to date with the current trends in Digital Preservation. There is still quite a lot of work to be done with SCAT – most notably adding the ability to deliver Dissemination Information Packages to service users who may want to view the digital files we have ingested – but I’m sure that at the current rate the next few months will be very fruitful.


  1. Lee Durbin says:

    Great post – I loved the sci-fi comparison. The terminology does make it sound as though digital archivists are living in an Alastair Reynolds novel.

  2. Claire Collins says:

    I love the idea of being a dark archivist… It is fantastic that in just 5 months you have got a handle on all this.

  3. Michael Carden says:

    Nice to hear that SCAT is open source. My web-search-fu seems to be failing me because I can’t find a web site or a source repository for this open source project. I did find a quote from March 2010 that said “SCAT is still very much alpha code, and Viv Cothey (its developer) intends to do a bit of tidying up before putting it out on the web.”

    Two and a half years later, has this happened? Is it out there somewhere?


  4. Michael Carden says:

    Ah, it’s not been indexed by ubiquitous search engines, but it exists! Excellent.


    1. Tom Charnock says:

      Hi Michael, thanks for the comments. Viv Cothey is indeed the tool’s developer. I’ll speak to him this week and see if he can put the new beta on the download site you linked to.

  5. Anne Ramon says:

    It’s good to hear that the ‘problem’ of storing digital records for posterity is being tackled. Have any other Record Offices expressed interest?

  6. Sarah Fellows says:

    This is fascinating. I like the term ‘bit rot’! I didn’t realise that binary could change over time; this has got my mind racing trying to imagine what that entails. I will be reading up on this, thank you.

  7. David Underdown says:

    Well on magnetic media the binary digits are (essentially) represented by individual particles of magnetic material whose polarity represents either 1 or 0. Sometimes interference or just general degradation of the media can cause these particles to flip, reversing the meaning of a particular bit. Or on optical media, physical damage or decay of the dyes which are used in writable DVDs and CDs has similar effects. There’s usually a degree of error correction built in, but eventually this can build up and corrupt data irretrievably.

    1. shirley says:

      This is important. We are never told to keep CDs etc away from magnetic sources; imagine in years to come all those vital records being completely unreadable. Thanks for the warning.

      1. David Underdown says:

        CDs and magnets should be fine – they are an optical rather than magnetic medium. It’s hard drives, tapes and, if you still have any, floppies that are more of an issue.

  8. Zen says:

    SCAT certainly sounds interesting, and I think the way the world is going we will need all the backup and data preservation we can get. I’m dreading the day my HD stops working; backing up is quite a chore.
