URIs and protocol dependence
In responding to my recent post, The Nature of OAI, identifiers and linked data, Herbert Van de Sompel says:
There is a use case for both variants: the HTTP variant is the one to use in a Web context (for obvious reasons; so totally agreed with both Andy and PeteJ on this), and the doi: variant is technology independent (and publishers want that, I think, if only to print it on paper).
(My emphasis added).
I'm afraid to say that I could not disagree more with Herbert on this. There is no technological dependence of http URIs on HTTP. [Editorial note: in the first comment below, Herbert complains that I have mis-represented him here, and I am happy to concede that I have and to apologise for it. By "technology independent" Herbert meant "independent of URIs", not "independent of HTTP". I stand by the more general assertion made in the remainder of this post: that a mis-understanding of the relationship between http URIs and HTTP (the protocol) led the digital library community into a situation where it felt the need to invent alternative approaches to identifying digital content and, further, that those alternative approaches are harmful both to the Web and to digital libraries. I think those mis-understandings are still widely held in the digital library community, and I disagree with those who continue to promote relatively new non-http forms of URI for 'scholarly' content (by which I primarily mean info URIs and doi URIs) as though their use were good practice. On that basis, this blog post may or may not represent a disagreement between Herbert and me. See the comment thread for further discussion. Note also that my reference to Stu Weibel below is intended to indicate only what he said at the time, not his current views (about which I know nothing).] As I said to Marco Streefkerk in the same thread:
there is nothing in the 'http' prefix of the http URI that says, "this must be dereferenced using HTTP". In that sense, there is no single 'service' permanently associated with the http URI - rather, there happens to be a current, and very helpful, default de-referencing mechanism available.
At the point that HTTP dies, which it surely will at some point, people will build alternative de-referencing mechanisms (which might be based on Handle, or LDAP, or whatever replaces HTTP). The reason they will have to build something else is that the weight of deployed http URIs will demand it.
That's the reasoning behind my "the only good long-term identifier is a good short-term identifier" mantra.
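The separation described in that quote can be illustrated with a small sketch. It treats a URI as a plain name and keeps the dereferencing mechanism in a replaceable table, so the mechanism behind a scheme can be swapped without touching any deployed URI. All names here (Resolver, via_http, via_successor) are hypothetical and invented for illustration; no network access happens, and each mechanism just returns a string describing what it would do.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

# A dereferencer takes a URI and returns (a description of) a representation.
Dereferencer = Callable[[str], str]


class Resolver:
    """Maps URI schemes to whichever dereferencing mechanism is currently configured.

    The URIs themselves never change; only the mechanism registered for a scheme does.
    """

    def __init__(self) -> None:
        self._mechanisms: Dict[str, Dereferencer] = {}

    def register(self, scheme: str, mechanism: Dereferencer) -> None:
        # Re-registering a scheme swaps its mechanism without touching any URI.
        self._mechanisms[scheme.lower()] = mechanism

    def dereference(self, uri: str) -> str:
        scheme = urlsplit(uri).scheme  # urlsplit already lower-cases the scheme
        mechanism = self._mechanisms.get(scheme)
        if mechanism is None:
            raise LookupError(f"no dereferencing mechanism configured for {scheme!r}")
        return mechanism(uri)


def via_http(uri: str) -> str:
    # Hypothetical stand-in for today's default mechanism; no request is actually sent.
    return f"HTTP GET {uri}"


def via_successor(uri: str) -> str:
    # Hypothetical stand-in for whatever eventually replaces HTTP.
    return f"successor lookup {uri}"


resolver = Resolver()
resolver.register("http", via_http)
uri = "http://example.org/article/42"
print(resolver.dereference(uri))  # HTTP GET http://example.org/article/42

# Years later HTTP is retired: the same deployed URI is still dereferenceable.
resolver.register("http", via_successor)
print(resolver.dereference(uri))  # successor lookup http://example.org/article/42
```

The point of the design is that nothing in the stored identifier has to change when the mechanism does; the weight of deployed URIs stays put while the table is updated.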
The mis-understanding that there is a dependence between the http URI and HTTP (the protocol) is endemic in the digital library community and has wasted who knows how many person-years on inventing and deploying unnecessary, indeed unhelpful, URI schemes such as 'oai', 'doi' and 'info'. Not only does this waste time and effort, it also distances the digital library community from the mainstream Web, something that we cannot afford to happen.
Many URI schemes define a default interaction protocol for attempting access to the identified resource. That interaction protocol is often the basis for allocating identifiers within that scheme, just as "http" URIs are defined in terms of TCP-based HTTP servers. However, this does not imply that all interaction with such resources is limited to the default interaction protocol.
This has been re-iterated numerous times in discussion, not least by Roy Fielding:
"Really, it has far more to do with a basic misunderstanding of web architecture, namely that you have to use HTTP to get a representation of an "http" named resource. I don't think there is any simple solution to that misbelief aside from just explaining it over and over again."
"However, once named, HTTP is no more inherent in "http" name resolution than access to the U.S. Treasury is to SSN name resolution."
"The "http" resolution mechanism is not, and never has been, dependent on DNS. The authority component can contain anything properly encoded within the defined URI syntax. It is only when an http identifier is dereferenced on a local host that a decision needs to be made regarding which global name resolution protocol should be used to find the IP address of the corresponding authority. It is a configuration choice."
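Fielding's point about the authority component can be sketched in the same way. In this minimal sketch, which assumes the authority is a host name and uses hypothetical names throughout, finding an address for the authority of an http URI is delegated to whatever name-resolution strategies the local host has configured, tried in order, much as an operating system may consult a hosts file before DNS. The dns strategy uses the standard socket.gethostbyname call but is not exercised in the usage lines, so the example needs no network.

```python
import socket
from typing import Callable, Mapping, Optional, Sequence
from urllib.parse import urlsplit

# A name-resolution strategy maps a host name to an address string, or None if it cannot.
NameResolver = Callable[[str], Optional[str]]


def static_table(table: Mapping[str, str]) -> NameResolver:
    """Strategy backed by a locally configured table (comparable to a hosts file)."""
    def resolve(host: str) -> Optional[str]:
        return table.get(host)
    return resolve


def dns(host: str) -> Optional[str]:
    """Strategy that asks the system resolver (usually DNS) for an IPv4 address."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def locate_authority(uri: str, strategies: Sequence[NameResolver]) -> str:
    """Find an address for the authority of `uri`, trying each configured strategy in order."""
    host = urlsplit(uri).hostname  # lower-cased; None if the URI has no authority
    if host is None:
        raise ValueError(f"{uri!r} has no authority component")
    for strategy in strategies:
        address = strategy(host)
        if address is not None:
            return address
    raise LookupError(f"no configured strategy could resolve {host!r}")


# Local configuration, not the URI, decides how the authority is located.
local_only = [static_table({"journal.example": "192.0.2.10"})]
print(locate_authority("http://journal.example/article/42", local_only))  # 192.0.2.10
# A host that also wanted DNS would configure: [static_table({...}), dns]
```

The URI string is identical whichever strategy list is configured; only the local choice of resolution protocol differs.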
The (draft) W3C TAG finding, URI Schemes and Web Protocols, makes similar statements (e.g. in section 4.1):
A server MAY offer representations or operations on a resource using any protocol, regardless of URI scheme. For example, a server might choose to respond to HTTP GET requests for an ftp resource. Of course, this is possible only for protocols that allow references to URIs in the given scheme.
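The same decoupling shows up in everyday HTTP practice. An HTTP/1.1 request may name its target as a complete URI (the "absolute-form" used when talking to a proxy), and that URI need not be an http one; some proxies, Squid among them, have historically fetched ftp resources on behalf of HTTP clients in this way. This minimal sketch only builds the bytes of such a GET request for an ftp URI and sends nothing; the function name and example URI are hypothetical.

```python
from urllib.parse import urlsplit


def absolute_form_get(uri: str) -> bytes:
    """Build an HTTP/1.1 GET request whose request-target is `uri` in absolute form.

    The URI keeps its own scheme (ftp, say); HTTP is only the protocol used to ask for it.
    """
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"{uri!r} is not an absolute URI with an authority component")
    # The Host header carries the authority without any "user@" userinfo prefix.
    host = parts.netloc.rpartition("@")[2]
    lines = [
        f"GET {uri} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # these two empty entries produce the CRLF CRLF
        "",  # that terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


request = absolute_form_get("ftp://ftp.example.org/pub/readme.txt")
print(request.decode("ascii"))
```

Nothing about the ftp URI has to change for it to be requested over HTTP; whether a given server or proxy actually answers such a request is, as the finding says, up to that server.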
I know that some people don't like this interpretation of http URIs, claiming that W3C (and presumably others) have changed their thinking over time. I remember Stu Weibel starting a presentation about URIs at a DCC meeting in Glasgow with the phrase:
Can you spell revisionist?
I disagree with this view. The world evolves. Our thinking evolves. This is a good thing, isn't it? It's only a bad thing if we refuse to acknowledge the new because we are too wedded to the old.