
January 14, 2008

Following your nose

I think one of the most helpful principles I've picked up from following various discussions of Web architecture is the one sometimes described as "following your nose".

I'm not sure there's a concise one-document summary of the principle anywhere (or at least I struggle to find one with Google). (Edit: It looks as if this draft W3C TAG finding by Noah Mendelsohn is an attempt to provide one. And I should emphasise that it is very much a draft, as it clearly indicates that it is incomplete.) The principle has been highlighted in a number of recent presentations related to the GRDDL specification (see e.g. Dan Connolly, Practical Semantic Web Deployment with Microformats and GRDDL), but I think it's important to emphasise that it is in no way specific to that context. Rather, it is a general principle of the Web, and indeed it arises from one of the central constraints of the REST architectural style: that messages should be self-descriptive. Or as Mark Baker phrases it in a presentation from 2004, "The meaning of a message is fully grounded in public specification, or the Web itself". Each message which forms a representation of resource state should carry information which indicates - in Web-friendly ways - the conventions used in that message that are important for its interpretation.

Mark's presentation in turn references Tim Berners-Lee's keynote presentation from the World Wide Web Conference of 2002, in which he emphasises that working on the Web involves "a serious commitment to common meanings", and traces the application of the principle to bitstreams on the Web, illustrating the role of the chain of unambiguous references to various public specifications in the interpretation of messages on the Web.

And the "follow your nose" approach is not an "optional extra"; on the contrary, it is fundamentally necessary in order to support the highly devolved, loosely coupled nature of interaction on the Web. As the Mendelsohn draft puts it, "Web architecture dictates that any user agent may at any time issue a GET and attempt to interpret representations for any HTTP resource." It is not sufficient to rely on an expectation of some additional pre-coordination between provider and consumer, some private agreement on the use of specialised conventions, in order to enable interpretation.

The use of URIs as names - and in particular URIs that can be dereferenced using the HTTP protocol - is a critical enabling factor in the "follow your nose" approach. Just at the level of naming/identification, the use of a URI provides for disambiguation in the global context in a way a plain character string cannot. But further, when a server provides a URI in a representation, the client can in turn seek to dereference that URI to try to obtain more information, provided by its owner, about the resource identified by that URI. That information may take the form of a human-readable document, but it may also provide information to enable further processing of the original representation. (Aside: This was another note that I started writing before Christmas, and I noticed that in the meantime Ed Summers has posted a draft of a forthcoming Information Standards Quarterly (available from NISO) article titled "Following your nose to the Web of Data", in which he explores this further.)

Over the last couple of years there has been a good deal of interest in embedding structured data in X/HTML documents, not only in the case of GRDDL but also that of microformats. One of the important "hooks" for establishing this "chain of meaning" in this context is the use of HTML's meta data profile feature and the profile attribute of the HTML head element. And indeed a page on the microformats wiki notes that "it is ACCEPTED that each microformat should have a profile URI". For each microformat, an HTML meta data profile provides more information about the interpretation of that microformat, either for a human reader or an application or both. (In practice, unfortunately, as Dave Orchard notes, many microformats implementers - even where a profile has been defined by the microformat creator - ignore the recommendation to use the profile URI in their HTML instances.)
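To make the "hook" concrete, here is a minimal sketch, in Python, of how a consumer might find the profile URIs that a page declares on its head element. The sample page and its profile URI are placeholders I've made up for illustration, not a real microformat profile, and the function names are my own. A real agent would go on to dereference each URI it finds.

```python
from html.parser import HTMLParser


class ProfileFinder(HTMLParser):
    """Collect the profile URIs declared on an HTML document's head element."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        # HTML 4 allows the profile attribute to hold several URIs
        # separated by white space, so split the value rather than
        # treating it as a single URI.
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            self.profiles.extend(value.split())


def find_profiles(html_text):
    """Return the list of profile URIs declared in html_text (possibly empty)."""
    finder = ProfileFinder()
    finder.feed(html_text)
    return finder.profiles


# Hypothetical page: the profile URI is a placeholder, not a real profile.
SAMPLE = """<html>
<head profile="http://example.org/profiles/my-microformat">
<title>Example</title>
</head>
<body><p class="vcard">...</p></body>
</html>"""

if __name__ == "__main__":
    for uri in find_profiles(SAMPLE):
        print("profile to follow:", uri)
```

An empty result is the case Dave Orchard describes: the page may well use a microformat, but it gives the consumer nothing to follow.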

There are examples of conventions used within the digital library and e-learning communities (and indeed more broadly), at least some of them enshrined in de facto or de jure standards, which ignore, or at least do not adhere as closely as perhaps they should to, the "self-describing messages" principle. In many cases, I guess that can be put down to a case of "if we'd known then what we know now...", but I'd like to think we now recognise the need to ground our future specifications firmly in the Web. I'll mention a couple of examples where I think a relatively minor change could make a substantial step towards addressing the problem.

The OpenID Authentication specification (Version 1.1, Version 2), for example, makes use of a number of simple tokens which are used as application-specific link types (e.g. openid.server, openid.delegate, openid2.provider, openid2.local_id) in link elements in the headers of HTML documents to represent relationships between resources. These tokens are defined by the OpenID specification, and are supplementary to the "built-in" link types defined by the HTML specification itself. While the intent in the HTML spec is indeed that the list is extensible, AFAICT, the OpenID specification ignores the advice of the HTML specification to use a meta data profile and the profile attribute to provide access to documentation of those extensions:

Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types.

So currently an agent processing an HTML document and encountering one of these OpenID link types cannot "follow its nose" to obtain further information; OpenID relies on the client having prior knowledge of the OpenID link types and the set of character strings that represent those types. Dan Connolly provides an example of how the provision of such a profile, and the use of the HTML profile attribute to reference that profile, would ground OpenID more firmly in the Web.
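As a rough illustration of the agent's predicament, the Python sketch below looks for link types that are not among those listed in the HTML 4.01 specification and reports them when the document declares no profile to explain them. The function and class names are my own, the example.org URIs are placeholders, and this is only a check for the presence of a profile; it doesn't attempt to dereference one or to verify that it actually documents the link types in question.

```python
from html.parser import HTMLParser

# Link types listed in the HTML 4.01 specification (matched case-insensitively).
HTML4_LINK_TYPES = {
    "alternate", "stylesheet", "start", "next", "prev", "contents",
    "index", "glossary", "copyright", "chapter", "section",
    "subsection", "appendix", "help", "bookmark",
}


class LinkTypeAudit(HTMLParser):
    """Record declared profiles and any link types beyond the HTML 4.01 list."""

    def __init__(self):
        super().__init__()
        self.profiles = []
        self.extension_types = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "head":
            # The profile attribute may hold several white-space-separated URIs.
            self.profiles.extend((attributes.get("profile") or "").split())
        elif tag == "link":
            # rel may also hold several white-space-separated link types.
            for link_type in (attributes.get("rel") or "").split():
                if link_type.lower() not in HTML4_LINK_TYPES:
                    self.extension_types.append(link_type)


def ungrounded_link_types(html_text):
    """Return extension link types that no declared profile could explain."""
    audit = LinkTypeAudit()
    audit.feed(html_text)
    return [] if audit.profiles else audit.extension_types


# Hypothetical pages: the example.org URIs are placeholders.
WITHOUT_PROFILE = """<html><head>
<link rel="openid.server" href="http://example.org/openid/server">
<link rel="stylesheet" href="style.css">
</head><body></body></html>"""

WITH_PROFILE = WITHOUT_PROFILE.replace(
    "<head>", '<head profile="http://example.org/profiles/openid">'
)

if __name__ == "__main__":
    print(ungrounded_link_types(WITHOUT_PROFILE))
    print(ungrounded_link_types(WITH_PROFILE))
```

In the first page the agent is left holding the string openid.server with nowhere to go; in the second it at least has a URI it can try to dereference.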

Similarly, the current alpha drafts of the OAI ORE specifications propose a set of conventions for embedding data in HTML documents. However, this too relies on the consumer of the document having advance built-in knowledge of those specific conventions and of specific character strings used as HTML attribute values. I'd suggest that for the ORE case, what is required is:

  1. Clarification of what relationships we wish to assert, and what RDF triples are required to make those assertions, including any additional terms required. (The core ORE data model is based on RDF, so this should be relatively straightforward to do)
  2. Development of a convention for representing those triples in HTML which is firmly grounded in the Web and compatible with the "self-describing message"/"follow your nose" principle, either (a) by defining an ORE-specific microformat with its own associated profile URI and using that profile to enable GRDDL-based extraction of those triples from an HTML instance; or (b) by adopting the use of (a small subset of) RDFa. (My slight concern about the latter is that RDFa still seems to be work-in-progress - but OTOH so is ORE, and as long as that is made clear in our documentation that may not be an issue.)

I'm currently in Washington, D.C. for a meeting of the OAI ORE Technical Committee over the next two days, so I guess I'll get to have these discussions in a few hours' time :-) Having said that, the combination of time zone adjustment (which I seem to find ever harder these days) and a slightly noisy hotel room means that I've managed only about five hours' sleep for two nights running, so at this rate, far from resolutely fighting the corner of Web architecture, I'll probably be dozing over my laptop by lunchtime.





eFoundations is powered by TypePad