My name is David Clipsham and I have been employed as the File Format Signature Developer for a month, having previously worked as Customer Service Manager for the cross-government social collaboration tool, Civil Pages. My role is to improve the coverage of The National Archives’ PRONOM file format registry. The internal and external signature information contained in the PRONOM registry is used by our file format identification tool, DROID, which identifies file formats so we can make informed decisions about the long-term preservation of digital records.
My day is typically spent researching obscure and not-so-obscure file formats, picking through the internal code of each format and identifying the key characteristics that make the file format what it is, as described in Ross Spencer’s recent blog post. I then recreate the key byte sequences, test them against sample files and upload them to PRONOM, ready for our bi-monthly signature release.
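To give a flavour of what testing a byte sequence against sample files involves, here is a minimal sketch in Python. It is not how DROID or PRONOM work internally, and it does not use PRONOM's signature syntax: the `ByteSignature` class, the `identify` function and the sample data are all hypothetical names invented for illustration, and the sketch assumes the simplest possible case of a fixed sequence at a fixed offset from the start of the file. The PDF and PNG sequences are the widely documented opening bytes of those formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteSignature:
    """Hypothetical, simplified signature: fixed bytes at a fixed offset from the start."""
    format_name: str
    sequence: bytes
    offset: int = 0

    def matches(self, data: bytes) -> bool:
        # Compare the slice at the expected position with the signature bytes.
        end = self.offset + len(self.sequence)
        return data[self.offset:end] == self.sequence


def identify(data: bytes, signatures: list[ByteSignature]) -> list[str]:
    """Return the name of every signature that matches the data."""
    return [sig.format_name for sig in signatures if sig.matches(data)]


# Two well-known opening byte sequences, used only as illustrations.
SIGNATURES = [
    ByteSignature("PDF", b"%PDF-"),
    ByteSignature("PNG", b"\x89PNG\r\n\x1a\n"),
]

# In-memory stand-ins for sample files (assumed content, not real records).
SAMPLES = {
    "report.pdf": b"%PDF-1.4\n...",
    "logo.png": b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR",
    "notes.txt": b"plain text with no signature",
}

for name, data in SAMPLES.items():
    print(name, identify(data, SIGNATURES))
```

A sample that matches no signature, like the text file here, comes back with an empty list. In practice that is the prompt to go and research the format.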
How do I focus my research?
My research is driven by The National Archives’ current and future holdings, which are generated by over 250 government departments and agencies. In short, the needs are:
• What have other government departments (OGDs) sent us already?
• What do OGDs want to send us soon?
• What are OGDs likely to send us in the future?
These needs ensure exposure to a wide variety of file formats, which certainly keeps the role interesting and fresh. For example, the London Organising Committee of the Olympic and Paralympic Games (LOCOG) is likely to generate a lot of multimedia files, so part of my role is to research these formats to ensure we are able to identify them.
On the other hand, the forthcoming transition to the 20-year rule will likely accelerate our exposure to more archaic file formats, whether they are generated by older versions of productivity software like Microsoft Office, Lotus Notes, or something entirely unexpected.
Our current focus is improving breadth – ensuring that DROID can identify as many file formats as possible. This dictates a signature-centric approach to the data we input into PRONOM.
PRONOM and DROID have a wide-ranging user-base, and many institutions across the world use these tools to assist with the management of their own digital archives. We therefore actively encourage anybody with an interest to contribute to the development and extend the coverage of our signature base.
Regular contributions have been submitted by multiple organisations, including the Georgia Tech Research Institute (GTRI), the Museum of London (MoL), the US National Archives and Records Administration (NARA), the National Library of New Zealand (NLNZ) and many more international institutions, as attributed in the PRONOM release notes.
We are always happy to receive data to input into the PRONOM database; however, we will always prioritise signature submissions, as this is where our focus currently lies. Ideally much of the groundwork will have been done, such as sourcing available file format specifications that describe potential signatures, and providing sample files so that we may validate your findings. Outside the scope of signature research it is harder for us to dedicate resource to validating detailed free-text information about each format, so we often direct users who require this level of information to other resources, such as Wikipedia, the Library of Congress website (digitalpreservation.gov) or fileinfo.com.
A special mention must be given to NARA, whose students have been working on, and providing us with, detailed text for our ‘description’ field as part of their own research project.
If you would like to contribute to PRONOM, then please use the form provided. If you would like a particular format to be researched, it would be great if you could provide us with a handful of sample files and, if you’re feeling particularly adventurous, our guide ‘How to research and develop signatures for file format identification’ will help you understand our research methodology. Finally, if you are able to build a file format signature of your own, this will greatly increase the speed at which we can publish it, to the benefit of the community!
I look forward to hearing from you.