
July 21, 2009

Linked data vs. Web of data vs. ...

On Friday I asked what I thought would be a pretty straightforward question on Twitter:

is there an agreed name for an approach that adopts the 4 principles of #linkeddata minus the phrase, "using the standards (RDF, SPARQL)" ??

Turns out not to be so straightforward, at least in the eyes of some of my Twitter followers. For example, Paul Miller responded with:

@andypowe11 well, personally, I'd argue that Linked Data does NOT require that phrase. But I know others disagree... ;-)

and

@andypowe11 I'd argue that the important bit is "provide useful information." ;-)

Paul has since written up his views more thoughtfully in his blog, Does Linked Data need RDF?, a post that has generated some interesting responses.

I have to say I disagree with Paul on this, not in the sense that I disagree with his focus on "provide useful information", but in the sense that I think it's too late to re-appropriate the "Linked Data" label to mean anything other than "use http URIs and the RDF model".

To back this up I'd go straight to the horse's mouth, Tim Berners-Lee, who gave us his personal view way back in 2006 with his 'design issues' document on Linked Data. This gave us the 4 key principles of Linked Data that are still widely quoted today:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  4. Include links to other URIs, so that they can discover more things.

Whilst I admit that there is some wriggle room in the interpretation of the 3rd point - does his use of "RDF, SPARQL" suggest these as possible standards or is the implication intended to be much stronger? - more recent documents indicate that the RDF model is mandated. For example, in Putting Government Data online Tim Berners-Lee says (referring to Linked Data):

The essential message is that whatever data format people want the data in, and whatever format they give it to you in, you use the RDF model as the interconnection bus. That's because RDF connects better than any other model.

So, for me, Linked Data implies use of the RDF model - full stop. If you put data on the Web in other forms, using RSS 2.0 for example, then you are not doing Linked Data, you're doing something else! (Addendum: I note that Ian Davis makes this point rather better in The Linked Data Brand).
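To make the "interconnection bus" idea a little more concrete, here is a minimal sketch of what mapping a non-RDF record onto the RDF model might look like. It uses plain Python tuples rather than a real RDF library, and the record, the example.org URIs, the vocabulary namespace and the to_triples helper are all made up for illustration; it is not anyone's actual implementation.

```python
# Hypothetical record as it might arrive in some non-RDF format
# (for example, a parsed feed item or a database row).
record = {
    "id": "http://example.org/id/school/123",
    "name": "Example Primary School",
    "local_authority": "http://example.org/id/authority/45",
}


def to_triples(record, vocab="http://example.org/def/"):
    """Turn a flat record into (subject, predicate, object) triples.

    The subject is the record's HTTP URI. Every other key becomes a
    predicate URI under the (assumed) vocabulary namespace.
    """
    subject = record["id"]
    return [
        (subject, vocab + key, value)
        for key, value in record.items()
        if key != "id"
    ]


triples = to_triples(record)
for triple in triples:
    print(triple)
```

The point of the sketch is only that the shape of the data changes: whatever format the record arrived in, what gets connected up is a set of statements about things named by HTTP URIs.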

Which brings me back to my original question - "what do you call a Linked Data-like approach that doesn't use RDF?" - because, in some circumstances, adhering to a slightly modified form of the 4 principles, namely:

  1. use URIs as names for things
  2. use HTTP URIs so that people can look up those names
  3. when someone looks up a URI, provide useful information
  4. include links to other URIs, so that they can discover more things

might well be a perfectly reasonable and useful thing to do. As purists, we can argue about whether it is as good as 'real' Linked Data but sometimes you've just got to get on and do whatever you can.
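As a rough illustration of the modified principles, the sketch below names two things with HTTP URIs, returns plain JSON (not RDF) when a URI is looked up, and follows the links in each response to discover more things. The WEB dictionary stands in for real HTTP requests, and the URIs, the "links" key and the function names are all assumptions of mine rather than any agreed convention.

```python
import json

# Hypothetical documents keyed by HTTP URI. This dictionary stands in
# for real HTTP GET requests so that the sketch runs offline.
WEB = {
    "http://example.org/id/school/123": json.dumps({
        "name": "Example Primary School",
        "links": ["http://example.org/id/authority/45"],
    }),
    "http://example.org/id/authority/45": json.dumps({
        "name": "Example Council",
        "links": [],
    }),
}


def look_up(uri):
    """Principles 2 and 3: look up a URI and get useful information back."""
    return json.loads(WEB[uri])


def discover(start):
    """Principle 4: follow links from one URI to find more things."""
    seen, queue = [], [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen or uri not in WEB:
            continue  # skip URIs already visited or not resolvable here
        seen.append(uri)
        queue.extend(look_up(uri)["links"])
    return seen


found = discover("http://example.org/id/school/123")
```

Nothing here uses RDF or SPARQL, yet a client can still start from one name and work its way outwards, which is the behaviour I'm looking for a label for.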

A couple of people suggested that the phrase 'Web of data' might capture what I want. Possibly... though looking at Tom Coates' Native to a Web of Data presentation it's clear that his 10 principles go further than the 4 above. Maybe that doesn't matter? Others suggested "hypermedia" or "RESTful information systems" or "RESTful HTTP", none of which strikes me as quite right.

I therefore remain somewhat confused. I quite like Bill de hÓra's post on "links in content", Snowflake APIs, but, again, I'm not sure it gets us closer to an agreed label?

In a comment on a post by Michael Hausenblas, What else?, Dan Brickley says:

I have no problem whatsoever with non-RDF forms of data in “the data Web”. This is natural, normal and healthy. Statistical information, geographic information, data-annotated SVG images, audio samples, JSON feeds, Atom, whatever.

We don’t need all this to be in RDF. Often it’ll be nice to have extracts and summaries in RDF, and we can get that via GRDDL or other methods. And we’ll also have metadata about that data, again in RDF; using SKOS for indicating subject areas, FOAF++ for provenance, etc.

The non-RDF bits of the data Web are – roughly – going to be the leaves on the tree. The bit that links it all together will be, as you say, the typed links, loose structuring and so on that come with RDF. This is also roughly analogous to the HTML Web: you find JPEGs, WAVs, flash files and so on linked in from the HTML Web, but the thing that hangs it all together isn’t flash or audio files, it’s the linky extensible format: HTML. For data, we’ll see more RDF than HTML (or RDFa bridging the two). But we needn’t panic if people put non-RDF data up online…. it’s still better than nothing. And as the LOD scene has shown, it can often easily be processed and republished by others. People worry too much! :)

Count me in as a worrier then!

I ask because Eduserv, as a not-for-profit provider of hosting and Web development solutions to the UK public sector, needs to start thinking about what Tim Berners-Lee's appointment as an advisor to the UK government on 'open data' issues means for the kinds of solutions we provide. Clearly, Linked Data is going to feature heavily in this space but I fully expect that lots of stuff will also happen outside the RDF fold. It's important for us to understand this landscape and the impact it might have on future services.


Comments

Andy

some good points. It's worth noting, though, that the inclusion of the 'offending' RDF and SPARQL stipulation does not date back to the genesis of Tim's 2006 document; it's a far more recent addition. See, for example, this from Ed Summers...

Your points and concerns remain valid; but now you can make them because they're VALID... rather than because you think TimBL wrote it thus in 2006...

Paul

Bah. TypePad ignored the link under 'this', above. The link is http://twitter.com/edsu/status/2740552720.

Paul,
thanks. Actually, I think the issue of when he wrote that specific phrase is pretty much irrelevant because it's not a very good phrase anyway (as I argue in the post - and as I think you also say in yours). The issue is less, "what did he mean then?", more, "what has the idea come to mean since then?". I don't see much consensus with your view on that. There are lots of documents that say, "Linked Data == http URIs and RDF", but very few that agree with you.

To be clear... I agree more or less with your "provide something useful" view of the world, I just don't think we can label that view "Linked Data". We have to find a different label.

It would certainly be useful to have a label for this whole "web of data that's not Linked Data" thing! It seems clear that most research data will fall into this category, along with most things in government databases etc. Some of these might be expressible as RDF but don't seem to be naturally framed as such.

I asked NERC data managers what they felt the role of RDF was in relation to their kind of (really important environmental) data, and the answer seemed to be largely in the metadata. So this is a dataset measuring these variables at this place and time with this context etc. But the variable measurement at 1816 is a field in the dataset rather than a piece of Linked Data.

I noticed you mentioned the word: metadata. That's the essence of the matter. You cannot have a World Wide Web of Linked Data without a metadata model that uses HTTP URIs to achieve the following:

1. Subject or Entity Identification mechanism
2. Conduit to negotiated representations of Entity or Subject or Resource Metadata (the "Description" part of RDF acronym).

I would like to state one more time:
The Linked Data meme is about an implicit binding of an Entity and negotiated representations of its Metadata via a single HTTP URI.

The magic is in the HTTP URI courtesy of its inherent Data Identity & Data Access duality.

RDF is the moniker for an EAV/CR data model. Thus, as long as you have HTTP URIs integral to an EAV/CR data model [1] you are compatible with the Linked Data meme.

Now what do we have re. Linked Data without an HTTP URI based EAV/CR model? What we have now, a Web of Linked Data Containers that lack more granular Datum level Links re. the HTTP based Network Cluster within the Internet known as the World Wide Web.

I hope I've answered your initial question? :-)

Links:

1. http://en.wikipedia.org/wiki/Entity-attribute-value_model

Kingsley
