
February 26, 2010

The 2nd Linked Data London Meetup & trying to bridge a gap

On Wednesday I attended the 2nd London Linked Data Meetup, organised by Georgi Kobilarov and Silver Oliver and co-located with the JISC Dev8D 2010 event at ULU.

The morning session featured a series of presentations:

  • Tom Heath from Talis started the day with Linked Data: Implications and Applications. Tom introduced the RDF model, and planted the idea that the traditional "document" metaphor (and associated notions like the "desktop" and the "folder") was inappropriate and unnecessarily limiting in the context of Linked Data's Web of Things. Tom really just scratched the surface of this topic, I think, with a few examples of the sort of operations we might want to perform, and there was probably scope for a whole day of exploring it.
  • Tom Scott from the BBC on the Wildlife Finder, the ontology behind it, and some of the issues around generating and exposing the data. I had heard Tom speak before, about the BBC Programmes and Music sites, and again this time I found myself admiring the way he covered potentially quite complex issues very clearly and concisely. The BBC examples provide great illustrations of how linked data is not (or at least should not be) something "apart from" a "Web site", but rather is an integral part of it: they are realisations of the "your Web site is your API" maxim (there's a minimal sketch of the idea just after this list). The BBC's use of Wikipedia as a data source also led into some interesting discussion of trust and provenance, and dealing with the risk of, say, an editor of a Wikipedia page creating malicious content which was then surfaced on the BBC page. At the time of the presentation, the wildlife data was still delivered only in HTML, but Tom announced yesterday that the RDF data was now being exposed, in a similar style to that of the Programmes and Music sites.
  • John Sheridan and Jeni Tennison described their work on initiatives to open up UK government data. This was really something of a whirlwind (or maybe, given the presenters' choice of Wild West metaphors, that should be a "twister") tour through a rapidly evolving landscape of current work, but I was impressed by the way they emphasised the practical and pragmatic nature of their approaches, from guidance on URI design, through work on provenance, to the current work on a "Linked Data API" (on which more below).
  • Lin Clark of DERI gave a quick summary of support for RDFa in Drupal 7. It was necessarily a very rapid overview, but it was enough to prompt a mental note to find the time to explore Drupal 7 in more detail.
  • Georgi Kobilarov and Silver Oliver presented Uberblic, which provides a single integrated point of access to a set of data sources. One of the very cool features of Uberblic is that updates to the sources (e.g. a Wikipedia edit) are reflected in the aggregator in real time.
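
As an illustration of that "your Web site is your API" maxim, here's a minimal sketch (in Python) of asking a single URI for two different representations via HTTP content negotiation. The programme URI is just an illustrative placeholder, and the exact formats served are my assumption rather than a documented BBC interface:

    # Ask one URI for two representations of the same resource.
    # The URI below is an illustrative placeholder.
    import urllib.request

    uri = "http://www.bbc.co.uk/programmes/b006q2x0"  # hypothetical programme page

    for mime in ("text/html", "application/rdf+xml"):
        req = urllib.request.Request(uri, headers={"Accept": mime})
        body = urllib.request.urlopen(req).read()
        print(mime, len(body), "bytes")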

The morning closed with a panel session chaired by Paul Miller, involving Jeni Tennison, Tom Scott, Ian Davis (Talis) and Timo Hannay (Nature Publishing) which picked up many of the threads from the preceding sessions. My notes (and memories!) from this session seem a bit thin (in my defence, it was just before lunch and we'd covered a lot of ground...), but I do recall discussion of the trade-offs between URI readability and opacity, and the impact on managing persistence, which I think spilled out into quite a lot of discussion on Twitter. IIRC, this session also produced my favourite quote of the day, from Tom Scott, which was something along the lines of, "The idea that doing linked data is really hard is a myth".

Perhaps the most interesting (and timely/topical) session of the day was the workshop at the end of the afternoon by Jeni Tennison, Leigh Dodds and Dave Reynolds, in which they introduced a proposal for what they call a "Linked Data API".

This defines a configurable "middleware" layer that sits in front of a SPARQL endpoint to provide RESTful access to the data: not only descriptions of individual identified resources, but also selection and filtering based on simple URI patterns rather than on SPARQL, and delivery of multiple output formats (including a serialisation of RDF in JSON - and the ability to generate HTML or XHTML). (It addresses only read access, not updates.)
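
To give a flavour of the difference this makes for a client, here's a sketch in Python; the endpoint URL and parameter names are invented for illustration, not taken from the proposal itself:

    import urllib.parse

    # Without the API layer, a client has to write SPARQL itself and send
    # it to the endpoint:
    sparql = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?school ?name WHERE {
      ?school a <http://example.org/def/School> ;
              foaf:name ?name .
    }
    LIMIT 10
    """
    sparql_url = "http://example.org/sparql?" + urllib.parse.urlencode({"query": sparql})

    # With the API layer in front of the endpoint, a similar selection might
    # be expressed as a plain, guessable URI, with the results available as
    # JSON rather than SPARQL's XML results format:
    api_url = "http://example.org/api/schools.json?" + urllib.parse.urlencode({"_pageSize": "10"})

    print(sparql_url)
    print(api_url)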

This initiative emerged at least in part out of responses to the data.gov.uk work, and comments on the UK Government Data Developers Google Group and elsewhere by developers unfamiliar with RDF and related technologies. It seeks to address the problem that providing query access only through SPARQL requires application developers to engage directly with the SPARQL query language, the RDF model and the possibly unfamiliar formats returned by SPARQL. At the same time, this approach seeks to retain the "essence" of the RDF model in the data, and to provide clients with access to the underlying queries if required: it complements the SPARQL approach rather than replacing it.

The configurability offers a considerable degree of flexibility in the interface that can be provided - without the requirement to create new application code. Leigh made the important point that the API layer might be provided by the publisher of the SPARQL endpoint, or it might be provided by a third party, acting as an intermediary/proxy to a remote SPARQL endpoint.
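
A toy sketch of that configuration idea, again in Python (the structure below is entirely invented for illustration; the actual proposal defines its own configuration mechanism):

    # Map URI patterns onto SPARQL templates: adding a new read-only
    # endpoint is then a configuration change, not new application code.
    CONFIG = {
        "/schools": "SELECT ?s WHERE { ?s a <http://example.org/def/School> } LIMIT %(page_size)s",
        "/schools/{id}": "DESCRIBE <http://example.org/id/school/%(id)s>",
    }

    # The endpoint being fronted; it could equally be a remote, third-party
    # endpoint that this layer proxies.
    SPARQL_ENDPOINT = "http://example.org/sparql"

    def resolve(path, page_size=10):
        """Turn an incoming API path into the SPARQL query the layer would run."""
        for pattern, template in CONFIG.items():
            if "{id}" in pattern:
                prefix = pattern.split("{id}")[0]
                if path.startswith(prefix) and len(path) > len(prefix):
                    return template % {"id": path[len(prefix):]}
            elif path == pattern:
                return template % {"page_size": page_size}
        raise KeyError(path)

    print(resolve("/schools"))      # SELECT ... LIMIT 10
    print(resolve("/schools/42"))   # DESCRIBE <http://example.org/id/school/42>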

IIRC, mentions were made of work in progress on implementations in Ruby, PHP and Java(?).

As a non-developer myself, I hope I haven't misrepresented any of the technical details in my attempt to summarise this. There was a lot of interest in this session at the meeting, and it seems to me this is potentially an important contribution to bridging the gap between the world of Linked Data and SPARQL on the one hand and Web developers on the other, both in terms of lowering immediate barriers to access and in terms of introducing SPARQL more gradually. There is now a Google Group for discussion of the API.

All in all it was an exciting if somewhat exhausting day. The sessions I attended were all pretty much full to capacity and generated a lot of discussion, and it generally felt like there is a great deal of excitement and optimism about what is becoming possible. The tools and infrastructure around linked data are still evolving, certainly, but I was particularly struck - through initiatives like the API project above - by the sense of willingness to respond to comments and criticisms and to try to "build bridges", and to do so in very real, practical ways.


Comments

"Perhaps the most interesting (and timely/topical) session of the day was the workshop at the end of the afternoon by Jeni Tennison, Leigh Dodds and Dave Reynolds, in which they introduced a proposal for what they call a "Linked Data API"."

I was gutted not to be able to make that session (I was doing a Dev8D session at the same time) but think that the easy URLs and JSON output are a real boon.

I've also been exploring a complementary approach, wondering how we might support live Linked Data queries from spreadsheet formulae; a first-pass demo of what I mean by this can be seen in the post "Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula)":
http://ouseful.wordpress.com/2010/02/17/using-data-from-linked-data-datastores-the-easy-way/

We can use a similar formula-based approach to pull in data from other sources too, e.g. as described in "Grabbing “Facts” from the Guardian Datastore with a Google Spreadsheets Formula":
http://ouseful.wordpress.com/2010/02/19/grabbing-facts-from-the-guardian-datastore-with-a-google-spreadsheets-formula/

(I intend to post a few more demos showing how to access sets of linked data rather than just single cell values over the next week or two.)
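
To give a rough flavour of the formula idea, here's a sketch in Python (the endpoint, query and output parameter are placeholders rather than what the demos linked above actually use):

    import json
    import urllib.parse
    import urllib.request

    def sparql_value(endpoint, query):
        """Behave like a spreadsheet formula: run a SELECT, return one cell value."""
        url = endpoint + "?" + urllib.parse.urlencode({"query": query, "output": "json"})
        with urllib.request.urlopen(url) as resp:
            results = json.load(resp)
        # Take the first variable binding of the first result row.
        row = results["results"]["bindings"][0]
        return next(iter(row.values()))["value"]

    # In a spreadsheet cell this would look something like:
    # =sparqlValue("http://example.org/sparql", "SELECT ?pop WHERE { ... } LIMIT 1")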

Aside from the technical details, I think there may be some mileage in comparing the design of spreadsheet formulas with the design of the URI scheme used in the Linked Data API. If you know of anyone who's looked at that sort of comparative design issue before, I'd really appreciate some references :-)

Hi Tony,

It was a pity you couldn't make that session. You're probably exactly the person to test out what Jeni, Leigh and Dave have come up with! :-)

I don't think I know of any work along the lines you describe at the end there, though Jeni, Leigh or Dave might do.

But my gut feeling is that working against the API should make it easier to construct at least some of the sorts of queries you were grappling with, i.e. the API layer might handle what you were doing with the script function? Though I imagine an API implementation may also offer less power/flexibility than the SPARQL endpoint itself, so there may be cases where you still need SPARQL if your particular query isn't supported...

Re "the URI scheme used in the Linked Data api", my understanding (which may be incorrect) is that rather than specifying a single set of URI patterns, it provides a good deal of flexibility in the way various patterns can be supported, and the various components mapped into parts of SPARQL queries.

