I think that was the quote we were looking for? Ok, maybe not, but if I mention the word DROID you might figure the right one out!
Tenuous links over, in Digital Preservation today we’ve released a new version of the DROID (Digital Record and Object Identification) tool – version 6.1. We’ve spoken about the tool before when I blogged about the PRONOM and DROID user consultation we held at The National Archives last year. The day resulted in a consultation wiki where contributions are invited from all members of the public with an interest in a potential DROID 7. The wiki page lists requirements that users of the tool have for DROID 7 and all future versions.
The release of 6.1 addresses requirement 16: ‘DROID provides a database free mode for identification of formats without the need for profile reports.’ This requirement came out of the consultation event, but earlier this year it was also identified as a requirement in the Digital Records Infrastructure Project [PDF]. Meeting it means that DROID can run faster and provide output on the command line that is easier for users to work with and to manipulate. From a personal perspective I consider this release to be a major one for DROID. From a software versioning point of view it is a minor one, because it doesn’t break compatibility with existing systems, but it does offer a new streamlined mechanism for interacting with the tool and its results.
The following command reflects the majority of the additions we have made if you run DROID 6.1 from the command line:
C:\>java -jar droid-command-line-6.1.jar droid -Nr "C:\Files\A Folder"
-Ns DROID_SignatureFile_V63.xml -Nc container-signature-20120828 -A
- -Nr – is the folder or “resource” to be scanned by DROID
- -Ns – provides DROID with a standard signature file to work against – the signature file stores the byte sequences to match files against
- -Nc – provides DROID with a container signature file that contains extra detail for working with special container formats such as ZIP or OLE2
- -A – means that DROID will also output the identification results of the contents of archive files such as ZIP, TAR and GZIP
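For anyone scripting this step, the command above can be assembled programmatically. The following is a minimal Python sketch rather than anything shipped with DROID: the function name build_droid_command and its parameters are hypothetical, and the argument order simply mirrors the command line shown above.

```python
from typing import List


def build_droid_command(
    resource: str,
    signature_file: str,
    container_signature_file: str,
    include_archives: bool = True,
    jar: str = "droid-command-line-6.1.jar",
) -> List[str]:
    """Assemble the argument list for the database-free command shown above.

    Hypothetical helper: it only mirrors the flags described in this post
    (-Nr, -Ns, -Nc, -A) and does not check that any of the files exist.
    """
    command = [
        "java", "-jar", jar, "droid",    # same leading tokens as the post's command
        "-Nr", resource,                 # folder or "resource" to scan
        "-Ns", signature_file,           # standard signature file
        "-Nc", container_signature_file, # container signature file
    ]
    if include_archives:
        command.append("-A")             # also identify contents of archive files
    return command


if __name__ == "__main__":
    # File names are the ones used in the example command in this post.
    print(build_droid_command(
        r"C:\Files\A Folder",
        "DROID_SignatureFile_V63.xml",
        "container-signature-20120828",
    ))
```

Passing the arguments as a list (for example to subprocess.run) rather than as one string means a folder name containing spaces, such as "A Folder", needs no extra quoting.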
We hope that this addition to DROID will make it easier to integrate it into the workflows that are operated by other institutions like ourselves where records are accessioned from external departments. Wherever in the workflow the digital record goes, it will almost always go through an identification tool such as DROID first. The output generated by this command is simple comma-separated values:
The output simply lists the file that was identified by DROID followed by the PRONOM Unique Identifier. The command also displays some information about itself and how it was run; however, this can be switched off with a quiet mode switch (-q, --quiet).
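To illustrate how this output might be consumed further along a workflow, here is a small Python sketch. It assumes each result line has the form "file path, comma, PUID" as described above, and that PUIDs look like fmt/18 or x-fmt/263; the exact layout of DROID's informational lines is not assumed, so any line that does not end in a PUID is simply skipped. The function name and the sample paths are invented for the example.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# PRONOM Unique Identifiers look like "fmt/18" or "x-fmt/263".
PUID_PATTERN = re.compile(r"^(x-)?fmt/\d+$")


def parse_droid_output(lines: Iterable[str]) -> Dict[str, str]:
    """Map each identified file path to its PUID.

    Assumes one "path,PUID" pair per line. Splitting on the last comma
    keeps file names that themselves contain commas intact. Lines without
    a trailing PUID (such as informational messages) are ignored.
    """
    results: Dict[str, str] = {}
    for line in lines:
        path, separator, puid = line.strip().rpartition(",")
        if separator and PUID_PATTERN.match(puid):
            results[path] = puid
    return results


if __name__ == "__main__":
    # Invented sample lines, not real DROID output.
    sample = [
        "some informational line printed by the tool",
        r"C:\Files\A Folder\report.pdf,fmt/18",
        r"C:\Files\A Folder\notes, draft.doc,fmt/40",
    ]
    identified = parse_droid_output(sample)
    for path, puid in identified.items():
        print(puid, path)
    # Count how many files were identified per format.
    print(Counter(identified.values()))
```

With the quiet mode switch in use there should be fewer non-result lines to skip, but filtering on the PUID pattern keeps the sketch tolerant either way.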
What else have we achieved with this release?
While the requirements of the project were clear from the outset, one of the first things we had to do was stabilise the project for the development environments we use and for The National Archives to be able to work on it. This has resulted in fixing unit tests in the code, which provide a layer of automated testing for developers. We have also simplified the way the code is brought together and packaged into the application we deliver, and we’ve moved the project into a more transparent and accessible location on the internet for everyone to access, in the form of GitHub.
GitHub will allow developers unrestricted access to the code that DROID is built from. This means they can play with the tool, experiment with it and hopefully submit changes back to The National Archives which we can then roll into the project to benefit our own efforts. GitHub, as the name implies, provides a community hub which we can build around to encourage more involvement in the tool and the work we’re doing in digital preservation at The National Archives.
With this refresh in the way we approach the project, as opposed to the previous SourceForge pages, we’re also creating a Google Group to foster even more communication with the team and between community members. Again, the benefit of this is that everyone can see what we’re doing and talking about and everyone can get involved with our work.
Over the next few days, the ‘backlog’ of work we have remaining will be transcribed to the GitHub issues log for DROID. The backlog is a result of the agile development methodology used to develop this release and of reaching the end of that time-bounded period. Some important goals the project still wishes to meet are:
- Fixing the code to build and test correctly on other important platforms (Windows Server 2008, Travis CI)
- Getting DROID to work with the most recent version of Java (1.7)
- Improving the quality of the unit tests already provided by the tool
What about DROID 7?
The new year began with a report about the DROID consultation event. The consultation wiki is still open for all who want to comment on the tool and suggest features that they’d like to see implemented in future versions of DROID. Having stabilised the project and with the new project infrastructure in place the teams at The National Archives are in a better position to be able to implement features requested by the community.
As highlighted at the consultation event by Tim Gollins, head of Digital Preservation, this depends heavily on the requirements of The National Archives and the records we accession. As demonstrated with this release, though, meeting those requirements will often meet the requirements of the larger community too, so hopefully we’ll see them slowly ticked off, one by one, until the tool meets all of our needs. Hopefully in 900 years DROID will have helped to keep all of our digital records looking just as good as they do now.