Readability and linkability
In July last year I noted that the terminology around Linked Data was not necessarily as clear as we might wish it to be. Via Twitter yesterday, I was reminded that my colleague, Mike Ellis, has a very nice presentation, Don't think websites, think data, in which he introduces the term MRD - Machine Readable Data.
It's worth a quick look if you have time.
We also used the 'machine-readable' phrase in the original DNER Technical Architecture, the work that went on to underpin the JISC Information Environment, though I think we went on to use both 'machine-understandable' and 'machine-processable' in later work (both even more of a mouthful), usually with reference to what we loosely called 'metadata'. We also used 'm2m - machine to machine' a lot, a phrase introduced by Lorcan Dempsey I think. Remember that this was back in 2001, well before the time when the idea of offering an open API had become as widespread as it is today.
All these terms suffer, it seems to me, from emphasising the 'readability' and 'processability' of data over its 'linkedness'. Linkedness is what makes the Web what it is. With hindsight, the major thing that our work on the JISC Information Environment got wrong was to play down the importance of the Web, in favour of a set of digital library standards that focused on sharing 'machine-readable' content for re-use by other bits of software.
Looking at things from the perspective of today, the terms 'Linked Data' and 'Web of Data' both play up the value in content being inter-linked as well as it being what we might call machine-readable.
For example, if we think about open access scholarly communication, the JISC Information Environment (in line with digital libraries more generally) promotes the sharing of content largely through the harvesting of simple DC metadata records, each of which typically contains a link to a PDF copy of the research paper, which, in turn, carries only human-readable citations to other papers. The DC part of this is certainly MRD... but, overall, the result isn't very inter-linked or Web-like. How much better would it have been to focus some effort on getting more Web links between papers embedded into the papers themselves - using what we would now loosely call a 'microformat'? One of the reasons I like some of the initiatives around the DOI (CrossRef springs to mind), even though I don't like the DOI much as a technology, is that they potentially enable a world where we have the chance of real, solid, persistent Web links between scholarly papers.
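To make the point about citations a little more concrete, here is a minimal sketch, in Python, of turning a human-readable citation into a Web link wherever it happens to carry a DOI. The citation text, the function name and the pattern are all illustrative assumptions rather than anything the JISC Information Environment or CrossRef actually does; the only real-world piece is the https://doi.org/ resolver prefix.

```python
import re

# Loose pattern for DOI-like strings; an assumption for illustration,
# not an authoritative DOI grammar.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def citation_links(citation_text):
    """Return resolver URLs for any DOI-like strings found in a citation."""
    links = []
    for match in DOI_PATTERN.finditer(citation_text):
        # Drop sentence punctuation that often trails a DOI in running text.
        doi = match.group(0).rstrip(".,;)")
        links.append(f"https://doi.org/{doi}")
    return links


# Hypothetical citation, made up for this example.
citation = (
    "Smith, J. (2008). An example paper. "
    "Journal of Examples, 1(2). doi:10.1000/xyz123."
)
print(citation_links(citation))
```

A citation with no identifier yields an empty list, which is the situation most PDFs leave us in: the reference is there for a person to read, but there is nothing for software to follow.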
RDF, of course, offers the possibility of machine-readability, machine-processable semantics, and links to other content - which is why it is so important and powerful and why initiatives like data.gov.uk need to go beyond the CSV and XML files of this world (which some people argue are good enough) and get stuff converted into RDF form.
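As a rough illustration of what 'going beyond CSV' might mean, the sketch below takes a couple of CSV rows and writes them out as N-Triples, with each row becoming a thing with its own URI and one of its columns becoming a link to another thing rather than a bare string. The data, the example.org URIs, the column names and the localAuthority property are all hypothetical, chosen only to show the shape of the conversion; it is not how data.gov.uk or any real dataset does it.

```python
import csv
import io

# Hypothetical CSV of the kind a data publisher might release.
CSV_TEXT = """school_id,name,local_authority
1001,Hill Primary,bath
1002,Vale Secondary,bristol
"""

BASE = "http://example.org"  # placeholder namespace, not a real dataset
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def row_to_ntriples(row):
    """Turn one CSV row into two N-Triples statements."""
    subject = f"<{BASE}/school/{row['school_id']}>"
    # Escape backslashes and quotes so the literal stays valid.
    label = row["name"].replace("\\", "\\\\").replace('"', '\\"')
    return [
        # A plain literal: readable, but it links to nothing.
        f'{subject} <{RDFS_LABEL}> "{label}" .',
        # A URI object: this is the link to another resource.
        f"{subject} <{BASE}/def/localAuthority> "
        f"<{BASE}/authority/{row['local_authority']}> .",
    ]


def csv_to_ntriples(text):
    """Convert every row of the CSV text into N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(row_to_ntriples(row))
    return triples


if __name__ == "__main__":
    print("\n".join(csv_to_ntriples(CSV_TEXT)))
```

The CSV and the triples carry the same facts. The difference is that in the second form the local authority is something another dataset can point at, or be pointed at from, which is the 'linkedness' the CSV lacks.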
As an aside, DCMI have done some interesting work on Interoperability Levels for Dublin Core Metadata. While this work is somewhat specific to DC metadata I think it has some ideas that could be usefully translated into the more general language of the Semantic Web and Linked Data (and probably to the notions of the Web of Data and MRD).
Mike, I think, would probably argue that this is all the musing of a 'purist' and that purists should be ignored - and he might well be right. I certainly agree with the main thrust of the presentation that we need to 'set our data free', that any form of MRD is better than no MRD at all, and that any API is better than no API. But we also need to remember that it is fundamentally the hyperlink that has made the Web what it is and that those forms of MRD that will be of most value to us will be those, like RDF, that strongly promote the linkability of content, not just to other content but to concepts and people and places and everything else.
The labels 'Linked Data' and 'Web of Data' are both helpful in reminding us of that.