Hedgehog Street and Linked Open Data
The best way to explain the title of this blog post is to begin by quoting directly from the Hedgehog Street website:
“Through Hedgehog Street, we are asking people to become Hedgehog Champions to rally support from their neighbours and work together to create ideal hedgehog habitat throughout their street, estate or communal grounds.”
I saw this initiative on BBC Springwatch a while back, and in particular one simple thing we can all do to become Hedgehog Champions: link your garden. Again, to quote the Hedgehog Street website:
“Hedgehogs travel around one mile every night through our parks and gardens in their quest to find enough food and a mate. If you have an enclosed garden you might be getting in the way of their plans. Hedgehogs have enough barriers to contend with such as roads and rivers that we can’t do much about. However we can make their life a little easier by removing the barriers within our control – for example making holes in or under our garden fences and walls for them to pass through. The gap need only be around 15cm in diameter and so should not affect your pets’ safety.”
The idea of doing something so simple to protect our cute friends is a nice one. We’re converting one garden into hundreds, and, combined with more naturally occurring wildlife corridors, potentially thousands. This is what we’re doing when we link data: the gardens represent our data and datasets, and the links we create give users and machines unrestricted access to navigate from one dataset to another. It’s an almost perfect analogy, and one which I hope will help to open up the concept to all our readers, technical and non-technical alike.
In previous blog posts Simon Demissie and Linda Stewart gave some good overviews of what people are doing with linked data. To go into more depth and expand on Simon’s thoughts, there are four principles behind linked open data:
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
- Include links to other URIs so that they can discover more things
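To make the four principles a little more concrete, here is a minimal sketch in plain Python that writes two RDF statements in the N-Triples syntax. The format URI under example.org is a hypothetical placeholder, not a real PRONOM identifier, and the DBpedia URI is assumed from DBpedia’s habit of naming resources after Wikipedia article titles. Only the RDF Schema vocabulary URIs are standard.

```python
# Sketch: the four linked data principles expressed as N-Triples.
# ASSUMPTIONS: FORMAT_URI is a hypothetical placeholder (not a real PRONOM URI);
# DBPEDIA_PDF is assumed to follow DBpedia's resource naming convention.

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

FORMAT_URI = "http://example.org/pronom/fmt/example-pdf"  # hypothetical
DBPEDIA_PDF = "http://dbpedia.org/resource/Portable_Document_Format"  # assumed


def uri(u: str) -> str:
    """Principles 1 and 2: name things with HTTP URIs, written <...> in N-Triples."""
    return f"<{u}>"


def literal(text: str) -> str:
    """A plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: subject, predicate, object, then a full stop."""
    return f"{subject} {predicate} {obj} ."


triples = [
    # Principle 3: useful, standards-based information about the thing itself.
    triple(uri(FORMAT_URI), uri(RDFS_LABEL), literal("Portable Document Format")),
    # Principle 4: a link out to somebody else's garden.
    triple(uri(FORMAT_URI), uri(RDFS_SEE_ALSO), uri(DBPEDIA_PDF)),
]

print("\n".join(triples))
```

The second statement is the hole in the fence: software that understands RDF can follow it from our record straight into DBpedia’s description of the same format.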
Contrary to De La Soul’s thoughts, for me four is the magic number: ‘Include links to other URIs so that they can discover more things.’ Principle four is the idea we’re embracing when we link our gardens, and it is the very heart of linked data, by name and definition. I feel that while the other principles are enablers, it is only by linking one dataset to others, be it about genealogy, music, wildlife, or the work we do in digital preservation, that we turn it from one resource into thousands.
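The “one resource into thousands” point can be sketched as a breadth-first walk that simply follows links, much like the hedgehog on its nightly round. Every identifier below is hypothetical and stands in for a record in a different dataset; the point is only that what you can reach depends entirely on the links.

```python
from collections import deque

# HYPOTHETICAL link graph: each key is a record in some dataset ("garden"),
# each value lists the records it links to.
LINKS = {
    "pronom:fmt/example": ["dbpedia:Example_Format", "swo:ExampleViewer"],
    "dbpedia:Example_Format": ["dbpedia:Example_Vendor"],
    "swo:ExampleViewer": ["companies:ExampleSoftwareLtd"],
    "dbpedia:Example_Vendor": [],
    "companies:ExampleSoftwareLtd": [],
}


def reachable(start: str, links: dict[str, list[str]]) -> list[str]:
    """Return every record reachable from `start` by following links,
    in breadth-first order, visiting each record once."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(reachable("pronom:fmt/example", LINKS))
# With the fence left intact (no outgoing links) only one garden is visited:
print(reachable("pronom:fmt/example", {"pronom:fmt/example": []}))
```

Without principle four the walk stops at the first record; with it, every linked dataset becomes part of the same journey.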
Like the hedgehog finding its way through the numerous linked gardens each night, what I seek are seamless links between preservation resources. With some of the work we are doing with PRONOM, we’re removing the proverbial brick in the wall to let our prickly friends through; the gap leads from one format registry to other data sources about software components, organisations and more.
For us, principle four opens our dataset up to more data about formats, and to auxiliary information we should also be referencing, such as software and compression algorithms.
To help understand the importance of linked open data we can talk about what we’re good at (which others might link to) and what others are good at (which we might link to). Within the department we’re experts at writing digital signatures used to identify file formats.
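For readers unfamiliar with format signatures, the core idea can be sketched as matching known byte sequences at known offsets. This is a deliberately simplified illustration rather than PRONOM’s signature language: real PRONOM signatures are richer (for example, sequences anchored at the end of a file, wildcards, and version-specific patterns). The three magic numbers used are the well-known leading bytes of PDF, PNG and GIF files; the `SIGNATURES` table and `identify` function are hypothetical names.

```python
# Simplified sketch of signature-based format identification.
# HYPOTHETICAL table: format name -> (offset from start of file, expected bytes).
SIGNATURES = {
    "PDF": (0, b"%PDF-"),
    "PNG": (0, b"\x89PNG\r\n\x1a\n"),
    "GIF": (0, b"GIF8"),  # covers both GIF87a and GIF89a headers
}


def identify(data: bytes) -> list[str]:
    """Return the names of all signatures whose bytes appear at their offset."""
    matches = []
    for name, (offset, magic) in SIGNATURES.items():
        if data[offset:offset + len(magic)] == magic:
            matches.append(name)
    return matches


print(identify(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))  # a PDF header
print(identify(b"plain text, no signature"))       # nothing matches
```

Writing and maintaining signatures like these, at much greater precision, is the expertise others might link to.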
Other people are experts in the field of software, or at describing organisational information. Instead of repeating their work, we need a mechanism to reuse it and let it enhance our own knowledge. Examples of sources include:
- Software Ontology Project (SWO): could lead to sources of information about software that we can link to
- Companies House Linked Data (e.g. Data about Microsoft): provides a mechanism for linking to factual information about companies, information that we would otherwise need specialists to populate ourselves
- dbpedia.org (e.g. Data about Portable Document Format (PDF)): based on Wikipedia, dbpedia data lets us link to general facts about formats that would be hard for us to populate in a way that satisfies everyone’s requirements. It is also available in different languages, helping to internationalise the work we’re doing
Being able to link to other experts’ knowledge reduces the cost of maintaining similar data ourselves, reduces the cost of researching it, and increases the value returned to the end user. The end user will normally be our own community, but the benefit will trickle down to the academics, historians and family researchers who all use records from The National Archives and who need our digital records to survive as resources long into the future.
Looking at the Companies House and dbpedia links in the list above, the data is presented as human-readable HTML. Following the third linked data principle, ‘When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)’, these resources also make representations available in machine-readable form that enable software developers to manipulate the data. A whole blog post could be written about the potential of manipulating this data, but once we’ve connected our resources we can create mash-ups and visualisations of it; in digital preservation, one could imagine a monitoring system that gives early warning signs of digital obsolescence.
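As a rough sketch of how a developer might ask for the machine-readable form instead of the HTML, the snippet below builds (but does not send) two requests using only Python’s standard library: one uses HTTP content negotiation, setting an `Accept` header for Turtle on a resource URI, and one encodes a SPARQL query for an endpoint. The DBpedia resource URI and the endpoint address are assumptions for illustration, and the query itself is only a placeholder; whether a given service honours a particular media type should be checked against its own documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTIONS: these addresses are illustrative; check each service's documentation.
RESOURCE_URI = "http://dbpedia.org/resource/Portable_Document_Format"
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def rdf_request(resource_uri: str, media_type: str = "text/turtle") -> Request:
    """Build a GET request asking for an RDF representation rather than HTML."""
    return Request(resource_uri, headers={"Accept": media_type})


def sparql_request(endpoint: str, query: str) -> Request:
    """Build a GET request carrying a SPARQL query in the `query` parameter,
    asking for results in the standard SPARQL JSON results format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = rdf_request(RESOURCE_URI)
print(req.get_method(), req.get_full_url(), req.get_header("Accept"))

# Placeholder query: up to ten facts about the resource above.
query = f"SELECT ?p ?o WHERE {{ <{RESOURCE_URI}> ?p ?o }} LIMIT 10"
sreq = sparql_request(SPARQL_ENDPOINT, query)
print(sreq.get_full_url())

# To actually fetch the data (network access required):
#   from urllib.request import urlopen
#   with urlopen(req) as response:
#       print(response.read().decode("utf-8"))
```

The same URI serves people and programs; only the requested representation changes, which is what makes mash-ups and monitoring tools possible on top of it.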
Too easy to get carried away
As a developer working in digital preservation, given such resources I could bury my head for years producing some of this work. Others in our field also see the potential for registries of this interrelated information, such as Andrew Jackson of the British Library and Bill Roberts of the Open Planets Foundation (OPF) and the National Archives of the Netherlands.
Connecting the datasets we’re interested in won’t be a killer application for linked data, but it’ll certainly be a killer medium over which to deliver format registry information. We’ve all got our bits and pieces locked in filing cabinets and on our desks; we just need to publish them and join them up. The links are our hedgehog streets and our wildlife corridors. As Linky, my fictional friend and (potential!) mascot of linked open data, might say: “we just need to get the hedgehogs yomping down them!”
This is before we get into quads and provenance – that’s where the really cool stuff begins!