August 11, 2007

PURL's single point of failure?

It strikes me that PURLs are now a pretty critical part of the Web infrastructure.  Of course, that statement won't ring true for everyone - I suspect quite a few of you are thinking, "Huh... I've never created a PURL in my life!".  But certainly in the Semantic Web arena, PURLs are widely used as identifiers for metadata terms - DCMI started doing this ages ago, and many other metadata initiatives have followed suit.

As Pete and Thom noted a while back, OCLC have now funded an activity to renew the architecture of the PURL system, and Pete notes some reasons why this is important.

But the PURL system is not without problems.  Several years ago I tried to highlight the fact that the PURL system represents a single point of failure in the Web infrastructure - in persistence terms, the ongoing provision of the PURL service relies on the goodwill of OCLC.  Not something that I doubt particularly - but not an ideal situation either.

My initial thoughts on solving this problem were around mirroring - to which end I briefly created http://purl.ue.org/ (though I note that it no longer exists).  But I quickly realised that mirroring was a useless solution.  Why?  Because it results in multiple URIs for the same resource, something that the Web Architecture tells us to avoid if at all possible.

A better solution lies in DNS hiding: running multiple instances of the PURL software around the planet but hiding them all behind http://purl.org/, using the DNS to share the load between the different servers.  Who would run such a networked set of services?  Like any infrastructural, and largely invisible, service, the business models for running this aren't clear.  But one could imagine, for example, national libraries having an interest in running, or funding, an instance of the PURL software for the benefit of their own, and other, communities.
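To make the DNS hiding idea a little more concrete, here is a rough sketch of round-robin resolution behind a single hostname. It is a toy simulation only, not a description of how purl.org is (or would be) configured: the `RoundRobinZone` class and the server addresses are made up for illustration (the addresses come from the reserved documentation range), and a real DNS server would typically hand back the whole rotated record set rather than one address at a time.

```python
from itertools import cycle

# Hypothetical addresses (reserved documentation range), not real PURL servers.
MIRROR_ADDRESSES = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]


class RoundRobinZone:
    """Toy stand-in for a DNS zone that rotates addresses for one hostname."""

    def __init__(self, hostname, addresses):
        self.hostname = hostname
        self._rotation = cycle(addresses)

    def resolve(self, hostname):
        # Only one name is served; anything else is an unknown host.
        if hostname != self.hostname:
            raise KeyError(hostname)
        # Each lookup hands out the next server in the rotation.
        return next(self._rotation)


zone = RoundRobinZone("purl.org", MIRROR_ADDRESSES)
answers = [zone.resolve("purl.org") for _ in range(6)]
```

The point is that clients only ever see the one hostname, so there is still a single URI for each resource, while successive lookups are spread across the servers behind it.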

Of course, one could only hide multiple PURL servers behind a single DNS domain if mechanisms for rapidly replicating the data between systems are put into place.  Perhaps now is a good time to think about adding that functionality into the PURL system?
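As a very rough illustration of the kind of replication that would be needed, the sketch below pushes a table of PURL mappings from one server to its peers and keeps whichever entry carries the highest version number. Everything here is an assumption made for the sake of the example: the `PurlServer` class, the `replicate` function, the version numbers and the example.org URLs are all hypothetical, and none of it reflects how the PURL software actually stores or shares its data.

```python
from dataclasses import dataclass, field


@dataclass
class PurlServer:
    """Hypothetical PURL server holding path -> (version, target URL)."""

    name: str
    table: dict = field(default_factory=dict)

    def update(self, path, target, version):
        current = self.table.get(path)
        # Keep the entry with the highest version number.
        if current is None or version > current[0]:
            self.table[path] = (version, target)

    def resolve(self, path):
        return self.table[path][1]


def replicate(source, peers):
    """Push every entry held by source to each peer."""
    for path, (version, target) in source.table.items():
        for peer in peers:
            peer.update(path, target, version)


primary = PurlServer("primary")
replicas = [PurlServer("replica-a"), PurlServer("replica-b")]
primary.update("/example/terms/", "http://example.org/terms/v1", version=1)
primary.update("/example/terms/", "http://example.org/terms/v2", version=2)
replicate(primary, replicas)
```

After replication, every server gives the same answer for the same PURL, which is the property that matters if they are all to sit behind one DNS name. How quickly that has to happen, and how conflicting edits should be handled, are exactly the questions a real design would need to answer.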

Comments

Interestingly, this is also why the Handle system is a bad idea.

Andy --

What do you think of ARKs? They leverage the DNS (for now) to make the identifiers actionable. And in theory (I think) any ARK resolver ought to be able to resolve -- or at least redirect -- any ARK.

"Of course, one could only hide multiple PURL servers behind a single DNS domain if mechanisms for rapidly replicating the data between systems are put into place. Perhaps now is a good time to think about adding that functionality into the PURL system?"

It is starting to sound like a handle proxy. Why reinvent the wheel?

Peter,

ARKs do, in theory, handle this sort of proxying. However, no software or infrastructure is available for this currently, AFAIK. So it is just "in theory", as you say.

John Kunze, author of the ARK spec and software, has written about this issue and also about the namespace splitting problem that Handles, ARKs, and related identifier persistence schemes all suffer from:

http://n2t.info/

eFoundations is powered by TypePad