Repositories and Web 2.0
[Editorial note: I've updated the title and content of this item in response to comments that correctly pointed out that I was over-emphasising the importance of Flash in Web 2.0 service user-interfaces.]
The relationship between digital repositories as we currently know them in the education sector and Web 2.0 has come up at a couple of recent meetings: first at the CETIS Metadata and Digital Repositories SIG meeting in Glasgow that looked at Item Banks, then again at the eBank/R4L/Spectra meeting in London.
In both cases, I found myself asking "What would a Web 2.0 repository look like?" At the Glasgow meeting there was an interesting discussion about the desirability of separating back-end functionality from the front-end user-interface. From a purist point of view, this is very much the approach to take - and it's an argument I would have made myself until recently. Let the repository worry about managing the content and let someone (or something) else build the user-interface based on a set of machine-oriented APIs.
Yet what we see in Web 2.0 services is not such a clean separation. What has become the norm is a default user-interface, typically built with AJAX though often using other technologies such as Flash, that is closely integrated with the back-end content of the Web 2.0 service. For example, both Flickr and SlideShare follow this model. Of course, the services also expose an API of some kind (the minimal API being persistent URIs to content and various kinds of RSS feeds) - allowing other services to integrate ("mash") the content and other people to develop their own user-interfaces. But in some cases at least, the public API isn't rich enough to allow me to build my own version of the default user-interface.
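To make the "minimal API" point concrete, here is a rough sketch of how little an external service needs in order to mash content that is exposed as an RSS 2.0 feed with persistent 'http' URIs. The feed content, the example.org URIs and the function name items_from_feed are all made up for illustration; they are not taken from Flickr, SlideShare or any real service.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example slide collection</title>
    <link>http://slides.example.org/</link>
    <item>
      <title>Item banks overview</title>
      <link>http://slides.example.org/items/1</link>
    </item>
    <item>
      <title>Repositories and Web 2.0</title>
      <link>http://slides.example.org/items/2</link>
    </item>
  </channel>
</rss>"""


def items_from_feed(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    # Each link is an ordinary 'http' URI, so another service can embed
    # or link to the resource without knowing anything else about the source.
    for title, link in items_from_feed(SAMPLE_FEED):
        print(f"{title}: {link}")
```

The point of the sketch is that nothing here is specific to the service that produced the feed: a generic XML parser and dereferenceable URIs are enough.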
More recently, there has been a little thread on the UK email@example.com list about the mashability of digital repositories. However, it struck me that most of that discussion centered on the repository as the locus of mashing - i.e. external stuff is mashed into the repository user-interface, based on metadata held in repository records. There seemed to be little discussion about the mashability of the repository content itself - i.e. where resources held in repositories can be easily integrated into external services.
One of the significant hurdles to making repository content more mashable is the way that identifiers are assigned to repository content. Firstly, there is currently little coherence in the way that identifiers are assigned to research publications in repositories. This is one of the things we set out to address in the work on the Eprints Application Profile. Secondly, the 'oai' URIs typically assigned to metadata 'items' in the repository are not Web-friendly and do not dereference (i.e. are not resolvable) in any real sense, without every application developer having to hardcode knowledge about how to dereference them. To make matters worse, the whole notion of what an 'item' is in the OAI-PMH is quite difficult conceptually, especially for those new to the protocol.
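As a rough illustration of the second problem, the sketch below shows what a client has to do to get from an 'oai' URI to something it can actually fetch. It assumes identifiers of the form oai:<namespace>:<local-id> and builds a standard OAI-PMH GetRecord request. The KNOWN_BASE_URLS table, the eprints.example.org repository and the function names are hypothetical; the table stands in for exactly the kind of hardcoded knowledge that every application developer would otherwise need to carry around.

```python
from urllib.parse import urlencode

# Hypothetical lookup table. The 'oai' identifier itself says nothing about
# where the repository's OAI-PMH endpoint lives, so the client must know it.
KNOWN_BASE_URLS = {
    "eprints.example.org": "http://eprints.example.org/cgi/oai2",
}


def parse_oai_identifier(identifier):
    """Split 'oai:<namespace>:<local-id>' into (namespace, local_id)."""
    scheme, namespace, local_id = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an oai identifier: {identifier!r}")
    return namespace, local_id


def get_record_url(identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL for a known repository.

    Raises KeyError when the repository is not in KNOWN_BASE_URLS, which is
    the situation a generic Web client is in for almost every repository.
    """
    namespace, _local_id = parse_oai_identifier(identifier)
    base_url = KNOWN_BASE_URLS[namespace]
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(get_record_url("oai:eprints.example.org:1234"))
```

Compare this with an 'http' URI, where the identifier and the means of dereferencing it are one and the same, and no lookup table is needed.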
Digital repositories would be significantly more usable in the context of Web 2.0 if they used 'http' URIs throughout, and if those URIs were assigned in a more coherent fashion across the range of repositories being developed.