Bring out your dead (files)
On Monday 28 January, the Digital Preservation Coalition (DPC) hosted a file formats day of action, creatively titled ‘Bring Out Your Dead (Files)’ at the Wellcome Collection. As The National Archives’ resident File Format Signature Developer, I was invited to deliver a presentation on DROID and PRONOM, our file format identification tool and file format registry, and a workshop on Developing File Format Signatures for PRONOM.
My own talk reviewed DROID and PRONOM developments in 2012:
- DROID 6.1 was released in August. DROID development has moved to GitHub, and we have opened a Google Groups discussion page for support enquiries.
- The PRONOM registry has grown considerably, with 100 new file formats, 177 new file format signatures, and a full-time researcher appointed.
- PRONOM has been able to grow this much partly thanks to the wealth of external contributors who continue to provide us with file format signatures and research information. Over a dozen institutions and individuals contributed last year.
- Finally, I was delighted to announce that the DROID download now has a permanent home on The National Archives’ own website.
My workshop focused on demystifying the file format research and signature development processes I undertake and allowed willing participants the chance to try developing their own signatures.
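To give a flavour of what a signature is, here is a minimal sketch under simplifying assumptions. It models a signature only as a fixed byte sequence at a fixed offset from the beginning of a file. Real PRONOM signatures use DROID's richer syntax (variable offsets, wildcards, end-of-file anchoring, multiple byte sequences), none of which is implemented here. The `BofSignature` class and `identify` function are hypothetical names for illustration. The PNG and PDF header bytes are the well-known published magic numbers for those formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BofSignature:
    """Hypothetical, simplified signature: fixed bytes at a fixed offset from the beginning of file."""
    name: str
    offset: int
    sequence: bytes

    def matches(self, data: bytes) -> bool:
        # Slice out the bytes where the sequence should sit; a short file yields a shorter slice and fails.
        end = self.offset + len(self.sequence)
        return data[self.offset:end] == self.sequence


# Illustrative signatures built from widely documented magic numbers.
SIGNATURES = [
    BofSignature("PNG", 0, bytes.fromhex("89504E470D0A1A0A")),
    BofSignature("PDF", 0, b"%PDF-"),
]


def identify(data: bytes) -> list[str]:
    """Return the names of every signature whose byte sequence matches the data."""
    return [sig.name for sig in SIGNATURES if sig.matches(data)]


# A valid PNG header, followed by padding bytes.
png_header = bytes.fromhex("89504E470D0A1A0A") + b"\x00" * 8

# Flip every bit of a single byte in the header to simulate corruption.
corrupted = bytearray(png_header)
corrupted[1] ^= 0xFF

print(identify(png_header))        # ['PNG']
print(identify(bytes(corrupted)))  # []
```

The second call shows why a damaged header matters: changing one byte in the signature region is enough for identification to fail entirely.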
Although I had a busy day, I was able to observe a number of other presentations.
Chris Rusbridge, an independent digital preservation consultant formerly of the UK Digital Curation Centre, talked about some of the challenges he has encountered over the years and how collaborative effort and crowd-sourcing may be the answer to many of the problems we, as digital preservation practitioners, are facing. In his wide-ranging presentation, Chris cited a recent and ongoing collaborative effort called ‘Just Solve the File Format Problem’, a public, wiki-based undertaking aimed at creating a permanent file format knowledge base. Launched in November 2012, the wiki has already captured a broad range of file format information and will hopefully grow to become a focal point for anyone interested in finding out more about file formats.
Next, Maureen Pennock from the British Library talked about a project called cRIsp (Crowd-sourced Representation Information for Supporting Preservation), which aims to harness the wisdom of the crowd to gather representation information about digital content. Representation information may include details about a particular file format, such as which tools will render the format successfully, what dependencies the format has, and which technical specifications or standards it conforms to. Any interested party is free to contribute, and a contribution form is available via the project's web page.
Paul Wheatley of the University of Leeds discussed a couple of collaborative activities. The first, the Atlas of Digital Damages, is a Flickr-hosted site to which users can contribute images of digital rendering gone wrong. It is fascinating to browse through the gallery, and the pictures really help to expose just how fragile digital objects can be. Corruption of a single byte can completely break a file in quite spectacular ways.
Paul also spoke of a Format Corpus held on GitHub. The purpose of this site is to collect samples of a wide range of file formats, freely licensed for individuals to use in their own research, whether testing preservation or presentation systems or, like me, researching the internal structure of particular formats.
All in all, I felt it was a very enjoyable and worthwhile day. I was struck by the sense that the digital preservation community really is a strong community, and that considerable effort is under way to work together to understand and overcome the issues we all share.
Organisations like the DPC, Jisc, and the Open Planets Foundation regularly organise events such as these and I would encourage anybody with an interest in digital preservation issues to attend any suitable event. The next event I’ll be attending will be the Jisc-funded SPRUCE project’s hackathon in Leeds on 11-12 March, aimed at unifying the community’s approach to characterisation by coordinating existing toolsets and improving their capabilities. I hope to see many of you there!
Presentations delivered at the File Formats Day of Action event are available through the DPC website. DPC events are usually live-tweeted so if you cannot attend an event, you can still follow the proceedings. The hashtag for the day was #DPC_ff.