Hedgehog Street and Linked Open Data

The best way to explain the title of this blog post is to begin by quoting directly from the Hedgehog Street website:

“Through Hedgehog Street, we are asking people to become Hedgehog Champions to rally support from their neighbours and work together to create ideal hedgehog habitat throughout their street, estate or communal grounds.”

I saw this initiative on BBC Springwatch a while back, which featured one simple thing we can all do to become Hedgehog Champions: link your garden. Again, to quote the Hedgehog Street website:

“Hedgehogs travel around one mile every night through our parks and gardens in their quest to find enough food and a mate. If you have an enclosed garden you might be getting in the way of their plans. Hedgehogs have enough barriers to contend with such as roads and rivers that we can’t do much about. However we can make their life a little easier by removing the barriers within our control – for example making holes in or under our garden fences and walls for them to pass through. The gap need only be around 15cm in diameter and so should not affect your pets’ safety.”

The idea of doing something so simple to protect our cute friends is a nice one. We’re converting one garden into hundreds and, combined with more naturally occurring wildlife corridors, potentially thousands. This is what we’re doing when we link data: the gardens represent our data and datasets, and the links we’ve created give users and machines unrestricted access to navigate from one dataset to another. It’s an almost perfect analogy, and one which I hope will help to open up the concept to all our readers, technical and non-technical alike.

Linky - The Linked Data Hedgehog

In previous blog posts, Simon Demissie and Linda Stewart provided good overviews of what people are doing with linked data. To go into more depth and expand on Simon’s thoughts, there are four principles, originally set out by Tim Berners-Lee, that underpin linked open data:

  • Use URIs as names for things
  • Use HTTP URIs so that people can look up those names
  • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  • Include links to other URIs, so that they can discover more things
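To make the four principles a little more concrete, here is a minimal sketch in Python (standard library only) that writes two RDF statements in N-Triples, one of the standard line-based RDF syntaxes. The example.org format URI is a hypothetical placeholder rather than a real PRONOM identifier, and the dbpedia URI and the choice of owl:sameAs as the linking property are assumptions for illustration; a real service would more likely use an RDF library and publish the result at the resource’s HTTP URI.

```python
from typing import Optional

# Well-known vocabulary URIs (RDF Schema and OWL).
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(u: str) -> str:
    """Principles 1 and 2: name things with HTTP URIs (serialised as <...>)."""
    return f"<{u}>"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Serialise a string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def triple(s: str, p: str, o: str) -> str:
    """One N-Triples statement: subject, predicate, object, then ' .'."""
    return f"{s} {p} {o} ."


# HYPOTHETICAL registry URI, used for illustration only.
format_uri = "http://example.org/format/pdf"

statements = [
    # Principle 3: useful information about the thing, in a standard RDF syntax.
    triple(uri(format_uri), uri(RDFS_LABEL), literal("Portable Document Format", "en")),
    # Principle 4: a link out to another dataset's URI (assumed dbpedia resource).
    triple(uri(format_uri), uri(OWL_SAME_AS), uri("http://dbpedia.org/resource/PDF")),
]

print("\n".join(statements))
```

Running it prints one statement per line: the first gives useful information about our own resource, the second is the link that lets people and machines step into someone else’s garden.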

Contrary to De La Soul’s thoughts, for me four is the magic number: ‘Include links to other URIs so that they can discover more things.’ Four is the idea we’re embracing when linking our gardens; it is the very heart of linked data, by name and by definition. While the other principles are enablers, I feel it is only by linking a dataset to others, be it about genealogy, music, wildlife, or the work we do in digital preservation, that we turn it from one resource into thousands.

Like the hedgehog finding its way through numerous linked gardens each night, I am looking for seamless links between preservation resources. With some of the work we are doing on PRONOM, we’re removing the proverbial brick in the wall to let our prickly friends through; the gap leads from our format registry to other data sources about software components, organisations and more.
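The hedgehog’s nightly route can be sketched as a breadth-first walk over RDF-style triples drawn from several separately published datasets. Everything here is hypothetical: the prefixes (fmt:, sw:, org:, dbp:) and properties (ex:createdBy and so on) are invented stand-ins for a format registry, a software dataset and an organisation dataset, and a real crawler would dereference HTTP URIs rather than read in-memory lists.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# HYPOTHETICAL triples from three separately published datasets ("gardens").
format_registry: List[Triple] = [
    ("fmt:pdf", "ex:createdBy", "sw:acrobat"),
    ("fmt:pdf", "ex:sameAs", "dbp:PDF"),
]
software_data: List[Triple] = [
    ("sw:acrobat", "ex:developer", "org:adobe"),
]
organisation_data: List[Triple] = [
    ("org:adobe", "ex:sameAs", "dbp:Adobe"),
]


def build_index(*datasets: List[Triple]) -> Dict[str, List[Triple]]:
    """Index every triple by its subject so that links can be followed."""
    index: Dict[str, List[Triple]] = {}
    for dataset in datasets:
        for s, p, o in dataset:
            index.setdefault(s, []).append((s, p, o))
    return index


def follow_links(start: str, index: Dict[str, List[Triple]]) -> List[str]:
    """Breadth-first walk from one resource across every link it can reach."""
    seen: Set[str] = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for _, _, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return order


index = build_index(format_registry, software_data, organisation_data)
print(follow_links("fmt:pdf", index))
```

Indexed over all three datasets, the walk from fmt:pdf reaches five resources; indexed over the format registry alone, it stops after three, because the onward links into the other gardens have nowhere to go.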

For us, principle number four opens our dataset up to more data about formats, and to auxiliary information we should also be referencing, such as data about software, compression algorithms and more.

To help understand the importance of linked open data, we can talk about what we’re good at (which others might link to) and what others are good at (which we might link to). Within the department, we’re experts at writing file format signatures: the patterns of bytes used to identify file formats.
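As a rough idea of what a format signature does, this sketch identifies a file from its leading bytes. It is a deliberate simplification and not how PRONOM signatures are actually expressed: real signatures, as used by DROID, can anchor byte sequences at other offsets or at the end of a file and can include wildcards. The table below is a hypothetical example, though the leading bytes shown for PDF, PNG and GIF are the standard ones for those formats.

```python
from typing import Optional

# HYPOTHETICAL, heavily simplified signature table: each key is a byte
# sequence expected at offset 0 of the file.
SIGNATURES = {
    b"%PDF-": "Portable Document Format (PDF)",
    b"\x89PNG\r\n\x1a\n": "Portable Network Graphics (PNG)",
    b"GIF87a": "Graphics Interchange Format (GIF) 87a",
    b"GIF89a": "Graphics Interchange Format (GIF) 89a",
}


def identify(header: bytes) -> Optional[str]:
    """Return the first format whose signature matches the start of `header`."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None  # no signature matched: format unidentified


print(identify(b"%PDF-1.4\n"))  # Portable Document Format (PDF)
print(identify(b"plain text"))  # None
```

In practice you would read only the first few bytes of a file (for example `open(path, "rb").read(16)`) and pass them to `identify`.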

Other people are experts in the field of software, or at describing organisational information. Instead of repeating their work, we need a mechanism that lets us use it and lets it enhance our own knowledge. Examples of sources include:

  • Software Ontology Project (SWO): Could potentially lead to sources of information about software that we can link to
  • Companies House Linked Data (e.g. Data about Microsoft): Provides a mechanism for linking to factual information about companies, information that would otherwise require specialists for us to populate ourselves
  • dbpedia.org (e.g. Data about Portable Document Format (PDF)): Based on Wikipedia, dbpedia allows us to link to general facts and information about formats, which can be hard to populate ourselves in a way that satisfies the different requirements placed on this data. It is also available in different languages, helping to internationalise the work we’re doing

Being able to link to other expert knowledge reduces the cost of maintaining similar data ourselves, reduces the cost of researching it, and increases the value returned to the end user. The end user will normally be our own community, but the benefit will trickle down to the academics, historians and family researchers who look at records from The National Archives and who need our digital records to survive as resources long into the future.

Looking at the Companies House and dbpedia links in the list above, the data is made available as HTML for people to read. Following the third linked data principle, ‘When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)’, these resources also make representations available in machine-readable form that enable software developers to manipulate the data. An entire blog post could be written about the potential of manipulating this data, but once we’ve connected our resources we can create mash-ups and visualisations of it; in digital preservation, one could imagine a monitoring system that gives early warning signs of digital obsolescence.
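Those machine-readable representations are commonly requested through HTTP content negotiation: the client looks up the same URI but sends an Accept header naming an RDF media type instead of HTML. The sketch builds such a request with Python’s urllib.request without sending it. Which media types a given publisher (Companies House, dbpedia or anyone else) actually serves, and whether it redirects to a separate data document, is an assumption to check against that service; the helper name `rdf_request` is hypothetical.

```python
import urllib.request

# Media types for common RDF serialisations.
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "ntriples": "application/n-triples",
}


def rdf_request(resource_uri: str, fmt: str = "turtle") -> urllib.request.Request:
    """Build (but do not send) a GET request asking for a machine-readable
    representation of a linked data resource rather than its HTML page."""
    return urllib.request.Request(
        resource_uri,
        headers={"Accept": RDF_MEDIA_TYPES[fmt]},
        method="GET",
    )


# Assumed resource URI, used for illustration only.
req = rdf_request("http://dbpedia.org/resource/PDF")
print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the lookup (network access required); keeping construction separate from sending makes it easy to inspect, or test, exactly what is being asked for.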

Too easy to get carried away

As a developer working in digital preservation, given such resources I could bury my head for years producing this kind of work. Others in our field also see the potential for registries of this interrelated information, such as Andrew Jackson of the British Library, and Bill Roberts of the Open Planets Foundation (OPF) and the National Archives of the Netherlands.

Connecting the datasets we’re interested in won’t be a killer application for linked data, but it will certainly be a killer medium over which to deliver format registry information. We’ve all got our bits and pieces locked in filing cabinets and on our desks; we just need to publish them and join them up. The links are our hedgehog streets and our wildlife corridors, and as Linky, my fictional friend and (potential!) mascot of linked open data, might say: “we just need to get the hedgehogs yomping down them!”

And… 

This is before we get into quads and provenance – that’s where the really cool stuff begins!

5 comments

  1. Hugh Glaser says:

    Hi Ross,
    Nice blog – good analogy!
    I hope you don’t mind me emailing off list, but the main topic is not really list or blog comment stuff.
    As I understand your blog, it is all about the National Archive now engaging with the links.

    I run the http://sameas.org site.
    For example:
    http://sameas.org/?uri=http://statistics.data.gov.uk/id/local-authority/00QP
    I also run sub-sites for Freebase (http://sameas.org/store/freebase/), VIAF (http://sameas.org/store/viaf/) and the British Library (http://sameas.org/store/britishlibrary/), among others.

    From your blog it seems to me that the time may finally have arrived when the National Archives might want to really engage with the sameAs world.
    (I have met with Richard Stirling about this and know John Sheridan and Jeni Tennison quite well, so we have explored this before.)

    So, what might we do?
    I would be very happy to get linkage data of the sameAs type to put into sameAs.org.
    I would also be happy to bring up a specific sameAs store for your linkage data.
    (I won’t bore you with why it is a good way of doing it, but can if you want.)
    Of course, I already harvested a lot of the data, but that is now quite old – you will see that the 00QP link has quite a few dead links on data.gov.uk
    But of course, that is good, it gives access to legacy stuff.
    You might well want to have a nice, shiny, store that only has the links that you want in, but can also be federated to the wider world if people want.
    This is what the stores above offer, for example, Freebase:
    http://sameas.org/store/freebase/?uri=http://rdf.freebase.com/ns/en.edinburgh
    compared with
    http://sameas.org/store/freebase/?uri=http://rdf.freebase.com/ns/en.edinburgh&federate=true

    The advantage of you generating the data and sending it to me (or pointing me at a link) is that we can be more confident of the accuracy and comprehensive nature – at the moment I would not bring up a data.gov.uk store, as I don’t think it would have exactly the right stuff in it.

    That’s it, I guess.

    Best
    Hugh

    PS
    By the way, from the blog point of view, I went looking for links to what the NA was actually doing, and found it very hard to find anything.
    Maybe that was deliberate, but it left me frustrated – lots of links to what other people were doing, but not even a link to data.gov.uk, as far as I could find.

    1. Ross Spencer says:

      Many thanks for the comments, Hugh; I’m glad you appreciate the analogy. With regard to engaging with sameAs, my own work on linked data, specifically the PRONOM project, is still some way from that stage. You can see an overview of this work, and where we are with it, on the labs blog:

      http://labs.nationalarchives.gov.uk/wordpress/index.php/2011/01/linked-data-and-pronom/

      When the project has progressed further, we will certainly look at technology to help us enhance the dataset, what we’re linking to and what might link to us.

      As for the blog and finding other work we’re doing, I’m just providing an overview here for all our blog readers. I am sure we will be able to expand on current and future work in other blog posts. I do recommend reading through our labs site, as it collects some of our more exciting and recent developments: http://labs.nationalarchives.gov.uk/wordpress/

      Hope that helps.

      Ross

  2. Adrian Walker says:

    Data by itself is necessary, but not enough, for many practical uses of an intranet or the Web.

    What’s also needed is knowledge about how to use the data to answer an ever increasing number of questions — such as, “How much could the US save through energy independence?”.

    There’s emerging technology that can leverage social networking for the significant task of acquiring and curating the necessary knowledge — in the form of Executable English.

    You can Google “Executable English” to find this.

    The technology underlies a Web site that works as a kind of Wiki, for collaborative content in open vocabulary, executable English.

    Example: http://www.reengineeringllc.com/demo_agents/RDFQueryLangComparison1.agent

    1. Ross Spencer says:

      Many thanks for the comment Adrian. I think your example seems at least as powerful as SPARQL. I’d need to look into it more but it could be useful for non-expert users.

      I certainly agree that we need knowledge about how to use data to answer questions. There is already a lot of data out there. Mash-ups are nice, but programmers working with this data will need to apply solid numerical and statistical analysis skills to draw meaningful conclusions about what people want to know.

  3. Rozella says:

    Excellent post. Keep writing this kind of info on your blog. I’m really impressed by your blog.
    Hello there, you’ve done an incredible job. I’ll certainly digg it and personally recommend it to my friends. I am sure they’ll benefit from this web site.
