« Key trends in education? - a crowdsource request | Main | An App Store for the Government? »

September 17, 2010

On the length and winding nature of roads

I attended, and spoke at, the ISKO Linked Data - the future of knowledge organization on the Web event in London earlier this week. My talk was intended to have a kind of "what 10 years of working with the Dublin Core community has taught me about the challenges facing Linked Data" theme but probably came across more like "all librarians are stupid and stuck in the past". Oh well... apologies if I offended anyone in the audience :-).

Here are my slides:

They will hopefully have the audio added to them in due course - in the meantime, a modicum of explanation is probably helpful.

My fundamental point was that if we see Linked Data as the future of knowledge organization on the Web (the theme of the day), then we have to see Linked Data as the future of the Web, and (at the risk of kicking off a heated debate) that means that we have to see RDF as the future of the Web. RDF has been on the go for a long time (more than 10 years), a fact that requires some analysis and explanation - it certainly doesn't strike me as having been successful over that period in the way that other things have been successful. I think that Linked Data proponents have to be able to explain why that is the case rather better than simply saying that there was too much emphasis on AI in the early days, which seemed to be the main reason provided during this particular event.

My other contention was that the experiences of the Dublin Core community might provide some hints at where some of the challenges lie. DC, historically, has had a rather librarian-centric make-up. It arose from a view that the Internet could be manually catalogued for example, in a similar way to that taken to catalogue books, and that those catalogue records would be shipped between software applications for the purposes of providing discovery services. The notion of the 'record' has thus been quite central to the DC community.

The metadata 'elements' (what we now call properties) used to make up those records were semantically quite broad - the DC community used to talk about '15 fuzzy buckets' for example. As an aside, in researching the slides for my talk I discovered that the term fuzzy bucket now refers to an item of headgear, meaning that the DC community could quite literally stick it's collective head in a fuzzy bucket and forget about the rest of the world :-). But I digress... these broad semantics (take a look at the definition of dcterms:coverage if you don't believe me) were seen as a feature, particularly in the early days of DC... but they become something of a problem when you try to transition those elements into well crafted semantic web vocabularies, with domains, ranges and the rest of it.

Couple that with an inherent preference for "strings" vs. "things", i.e. a reluctance to use URIs to identify the entities at the value end of a property relationship - indeed, couple it with a distinct scepticism about the use of 'http' URIs for anything other than locating Web pages - and a large dose of relatively 'flat' and/or fuzzy modelling and you have an environment which isn't exactly a breeding ground for semantic web fundamentalism.

When we worked on the original DCMI Abstract Model, part of the intention was to come up with something that made sense to the DC community in their terms, whilst still being basically the RDF model and, thus, compatible with the Semantic Web. In the end, we alienated both sides - librarians (and others) saying it was still too complex and the RDF-crowd bemused as to why we needed anything other than the RDF model.

Oh well :-(.

I should note that a couple of things have emerged from that work that are valuable I think. Firstly, the notion of the 'record', and the importance of the 'record' as a mechanism for understanding provenance. Or, in RDF terms, the notion of bounded graphs. And, secondly, the notion of applying constraints to such bounded graphs - something that the DC community refers to as Application Profiles.

On the basis of the above background, I argued that some of the challenges for Linked Data lie in convincing people:

  • about the value of an open world model - open not just in the sense that data may be found anywhere on the Web, but also in the sense that the Web democratises expertise in a 'here comes everybody' kind of way.
  • that 'http' URIs can serve as true identifiers, of anything (web resources, real-world objects and conceptual stuff).
  • and that modelling is both hard and important. Martin Hepp, who spoke about GoodRelations just before me (his was my favorite talk of the day), indicated that the model that underpins his work has taken 10 years or so to emerge. That doesn't surprise me. (One of the things I've been thinking about since giving the talk is the extent to which 'models build communities', rather than the other way round - but perhaps I should save that as the topic of a future post).

There are other challenges as well - overcoming the general scepticism around RDF for example - but these things are what specifically struck me from working with the DC community.

I ended my talk by reading a couple of paragraphs from Chris Gutteridge's excellent blog post from earlier this month, The Modeller, which seemed to go down very well.

As to the rest of the day... it was pretty good overall. Perhaps a tad too long - the panel session at the end (which took us up to about 7pm as far as I recall) could easily have been dropped.  Ian Mulvany of Mendeley has a nice write up of all the presentations so I won't say much more here. My main concern with events like this is that they struggle to draw a proper distinction between the value of stuff being 'open', the value of stuff being 'linked', and the value of stuff being exposed using RDF. The first two are obvious - the last less-so. Linked Data (for me) implies all three... yet the examples of applications that are typically shown during these kinds of events don't really show the value of the RDFness of the data. Don't get me wrong - they are usually very compelling examples in their own right but usually it's a case of 'this was built on Linked Data, therefore Linked Data is wonderful' without really making a proper case as to why.

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d8345203ba69e20133f451eb15970b

Listed below are links to weblogs that reference On the length and winding nature of roads:

Comments

I'm a librarian, and I'm stupid. I happily admit that I am stupid.

These systems have got to work with and for stupid people. I have yet to see them do so. That's my problem with RDF (writ large; RDF+OWL+SPARQL+everything else). If it only works for the eggheads among us, it's dead in the water.

Also I am skeptical about http identifiers. :)

Picking up on one point - your last sentence:
"they are usually very compelling examples in their own right but usually it's a case of 'this was built on Linked Data, therefore Linked Data is wonderful' without really making a proper case as to why."

Absolutely! I find this really frustrating. Recently I sat through a series of such demonstrations at a workshop at the British Library. I came away thinking that the creators of these demonstrations were basically proud of the fact that they had managed to create a standard web application (albeit well-designed and executed) on an RDF platform at all.

"Look at what you can do with Linked Data!", they cry. "Yes, but we've been doing that with relational databases and CGI for 15 years", I mutter grumpily....

I'm not at all clear that RDF offers much in the way of immediate benefit to content creators - enough for them to struggle with the awful tools and general complexity. And, it seems to me, the real vision for the Semantic Web (in its Linked Data) incarnation depends on RDF being pretty much ubiquitous.

When even the best demonstrations do not actually demonstrate the benefits of RDF, when it is still so poorly supported in terms of CMS systems, tools, storage etc. I struggle to see why the average content/data provider should care about RDF. And, exactly as Dorothea says in the previous comment, "if it only works for the eggheads among us, it's dead in the water"

The comments to this entry are closed.

About

Search

Loading
eFoundations is powered by TypePad