Linked Data (and repositories, again)
This is another one of those posts that started life in the form of various drafts which I didn't publish because I thought they weren't quite "finished", but then seemed to become slightly redundant because anything of interest had already been said by lots of other people who were rather more on the ball than I was. But as there seems to be a rapid growth of interest in this area at the moment, and as it ties in with some of the themes Andy highlights in his recent posts about his presentation at VALA 2008, I thought I'd make an effort to pull try to pull some of these fragments together.
If I'd got round to compiling my year-end Top 5 Technical Documents list for 2007 (whaddya mean, you don't have a year-end Top 5 Technical Documents list?), my number one would have been How to Publish Linked Data on the Web by Chris Bizer, Richard Cyganiak and Tom Heath.
The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data. (emphasis added)
And the key to realising this, argues Berners-Lee, lies in following four base rules:
- Use URIs as names for things.
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information.
- Include links to other URIs. so that they can discover more things.
Bizer, Cyganiak & Heath present linked data as a combination of key concepts from the Web Architecture on the one hand (including the TAG's resolution to the httpRange-14 issue) and the RDF data model on the other, and distill them into a form which is on the one hand clear and concise, and on the other backed up by effective, practical guidelines for their application. While many of those guidelines are available in some form elsewhere (e.g. in TAG findings or in notes such as Cool URIs...), it's extremely helpful to have these ideas collated and presented in a very practically focused style.
As an aside, in the course of assembling those guidelines, they suggest that some of those principles might benefit from some qualification, in particular the use of URI aliases, which the Web Architecture document suggests are best avoided. For the authors,
URI aliases are common on the Web of Data, as it can not realistically be expected that all information providers agree on the same URIs to identify a non-information resources. URI aliases provide an important social function to the Web of Data as they are dereferenced to different descriptions of the same non-information resource and thus allow different views and opinions to be expressed. (emphasis added)
I'm prompted to mention Linked Data now in part by Andy's emphasis on Web Architecture and Semantic Web technologies, but also by a post by Mike Bergman a couple of weeks ago, reflecting on the growth in the quantity of data now available following the principles and conventions recommended by the Bizer, Cyganiak & Heath paper. In his post, Bergman includes a copy of a graphic from Richard Cyganiak providing a "birds-eye view "of the Linked Data landscape, and highlighting the principal sources by domain or provider.
"What's wrong with that picture?", as they say. I was struck (but not really surprised) by the absence - with the exception of the University of Southampton's Department of Electronics & Computer Science - of any of the data about researchers and their outputs that is being captured and exposed on the Web by the many "repository" systems of various hues within the UK education sector. While in at least some cases institutions (or trans-institutional communities) are having a modicum of success in capturing that data, it seems to me that the ways in which it is typically made available to other applications mean that it is less visible and less usable than it might be.
Or, to borrow an expression used by Paul Miller of Talis in a post on Nodalities, we need to think about how to make sure our repository systems are not simply "on the Web" but firmly "of the Web" - and the practices of the growing Linked Data community, it seems to me, provide a firm foundation for doing that.