
June 15, 2007

Repositories roadmap, cont...

I wanted to respond to some of the comments made on my presentation to the JISC Repositories Conference, you know... the one where I waffled on about Web 2.0 and sadly concluded that we need to take a different approach :-)

In the comments to my original blog entry Herbert says:

I do not get the point you are making re "OAI becoming less important". OAI (I think you mean OAI-PMH) fits under the Resource Discovery category of this blog. Just like RSS, and Sitemaps do. Nothing more, nothing less. And I think resource discovery is and remains important and the OAI-PMH offers one approach to allow for batch discovery of resources. Unfortunately, not because of some flaw in OAI-PMH (I think), but rather because of ambiguities in unqualified Dublin Core (or in the implementation thereof) regarding referencing actual resources by means of their URIs, many OAI-PMH harvested records turn out to be of little use to the major search engines. As far as I understood from discussions with Google people, this is the major reason why they do not promote (do not read "do not use") the OAI-PMH as a way to discover resources. Replace unqualified Dublin Core by some more meaningful resource description approach (I am, for example, thinking OAI-ORE serializations of named graphs; see Pete's entry), and I think that OAI-PMH still has quite something to offer in the realm of resource discovery.

I'm a theoretical fan of the OAI-PMH, but the sad fact is that it hasn't changed the world in the way that RSS has.  I don't know quite why but I don't think that one can simply blame DC.  I suspect it has to do with complexity, not just at the protocol level, but also in terms of our inconsistent modeling of the objects in repositories and the way we use OAI to expose metadata about them.  The primary problems with the use of DC in repositories are to do with the fact that it is used to disclose metadata about different kinds of objects in different repositories - conceptual 'works' in some, digital 'items' in others, and so on.  Add the conceptually challenging notion of an OAI 'item' into the mix and you have a potential problem.  That's not a problem with DC it seems to me, but with an inconsistency in how we see the world we are dealing with.
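Herbert's point about unqualified DC can be made concrete with a small sketch (the record and URIs below are invented for illustration): nothing in an oai_dc record tells a harvester which dc:identifier, if any, is the URI of the actual resource rather than a splash page or a plain citation string.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical oai_dc record of the kind OAI-PMH exposes.
# Unqualified DC does not say whether dc:identifier is the URI of the
# full text, a jump-off page, or just a bibliographic citation.
record = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Some eprint</dc:title>
  <dc:identifier>http://repository.example.org/id/eprint/123</dc:identifier>
  <dc:identifier>Jnl. of Examples 4(2), 2007</dc:identifier>
</oai_dc:dc>
"""

ns = {"dc": "http://purl.org/dc/elements/1.1/"}
ids = [e.text for e in ET.fromstring(record).findall("dc:identifier", ns)]

# A harvester sees two "identifiers" with no machine-readable way to
# tell the resource's URI apart from the citation.
print(ids)
```

This is the ambiguity that reportedly makes harvested records of little use to the search engines: the URI of the thing itself is not reliably recoverable.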

I also tend to think that the fact that OAI-PMH adopts a service-oriented approach where RSS and Atom adopt a resource-oriented approach is probably significant.  It's about fitting in with the way the Web works.

Apart from this, I would also like to agree with Pete when he suggests that the eventual OAI-ORE approach will most likely not be complex. The theory may look complex at this moment, but I think the practice should be relatively simple. I think we agree that simplicity is a major factor when it comes to getting buy-in for interoperability specs. We have learned lessons from the past.

I think we could argue about simplicity vs complexity for a long time - and we probably wouldn't strongly disagree.  From my point of view it kind of misses the point.  The key issue for me is that we haven't managed to build a set of repository services that support the social dimension of what we want to achieve - improved scholarly communication.  RSS, it seems to me, is one of the features that has allowed Web 2.0 services to flourish - something we are still struggling to achieve with repositories in any real sense.  I'll return to this in my next post.

Rachel said:

In my view (and this is what I have argued in the past) it is significant that institutional repositories are 'well-managed' and for this reason have a level of sustainability and trustworthiness over and above an individual academic's or even a department's Web site. There are a number of actors involved with 'repositories' - depositors of content; searchers and users of content; repository administrators (ultimately the institution). That the repository is 'well-managed' (sustainable, backed up by institutional mandates, trusted) is an important characteristic which should encourage in particular the depositor to populate the repository and the administrator to keep content safe.

To that extent I think the repository is a particular sort of 'Web site', it has institutional commitment to keep it up-to-date and high quality.

Over and above that characteristic, I would suggest the manner in which the 'repository' interfaces with both the depositor and the searcher (both of whom might be considered as consumers I think?) can be as much Web 2.0 as you like....

I don't strongly disagree with any of this - yes, stuff needs managing somewhere - except that I think it interprets 'Web 2.0' as having a technical/technology meaning, whereas I'm more interested in the social aspects - Web 2.0 as attitude.  For example, in my talk I described ArXiv as the first (academic) Web 2.0 service, even though it pre-dates the Web.  This attitude needs to pervade every aspect of the way we deploy repositories; it's not just a surface gloss that we layer on top of an existing bit of software.
I accept that I'm finding it hard to get my thoughts across here.  Part of the problem, I think, is that we all interpret Web 2.0 as meaning slightly different things. I'll try and clarify some of this in my next post...




Do you see the FRBR model playing a role in this?


Some of the issues faced by service providers wanting to harvest using OAI-PMH are detailed at http://www.icbl.hw.ac.uk/perx/setupmaintenance.htm

Interestingly, the ticTOCs project which I'm now involved in, which will aggregate RSS feeds of journal TOCs, is finding vaguely similar issues with RSS feeds - e.g. that publishers don't use RSS in very standardised ways, and that all sorts of information can appear in even the title field. However, this will not prove insurmountable to the project's aims.


Andy, I'll look forward to your next post but in the meantime my two penn'orth would be that RSS has, and has had for some time, the benefit of prevalence. At the JISC conference we heard from one project where the developers struggled to find any OAI-compliant repository to interoperate with. Sure, there are some, but his experience was telling I think. On the other hand, finding services with which to exchange syndicated content via RSS is far less of a problem. Modestly (well no, why should I), I built a repository (actually more an LCMS) with an RSS service layer that allowed interoperability with a wide range of other systems (such as there were back then) and data types (some 'content' was learning outcomes for example) back in '99/00. We syndicated content with a number of other institutions. The tipping point was not the way metadata described digital or real-world objects, but rather access to other services to interoperate with that contained data that people wanted.

Don't be too hard on yourself. I don't recall you waffling on about web 2.0 and sadly concluding that we need to take a different approach. We're all in the same boat trying to gain insight into the application of web 2.0 to educational technology, and as you say, there's always the next post.


I could obviously not agree more with your:

"The key issue for me is that we haven't managed to build a set of repository services that support the social dimension of what we want to achieve - improved scholarly communication."

As you know, that is what I have been preaching (and researching) for many years, now, and I hope that OAI-ORE (which is a direct result of those efforts) can be one of many necessary steps in the right direction.

Joining the fashionable trend, I will also very much agree that there are quite a few issues with the OAI-PMH. But I would like to note the following:

(*) The OAI-PMH remains one of the only components of a very thin interoperability layer that we currently have across repositories. It would be kind of sad to throw it out (before we have something else/better).

(*) Let's keep things in perspective: the OAI-PMH was created to allow for the creation of cross-repository search services; an alternative to the distributed searching approaches that some, in those days (1999, when OAI started), still believed in. The OAI-PMH did not anticipate the need for Web 2.0 style object re-use mash-ups. Lack of vision, I guess.

> It's about fitting in with the way the Web works.

There is now a much greater mainstream understanding of the importance of the architecture of the web, as outlined in documents like /Architecture of the World Wide Web, Volume One/ and Fielding's thesis. Every function of OAI-PMH could have been implemented with HTTP URIs, HTTP, and a well-defined XML document format. It was a mistake, though an understandable one at the time, to have built a protocol which did not comply with the web architecture. At this point in time it would be a mistake, but no longer an understandable one, not to work with the web architecture. If this meant dropping OAI-PMH in favor of a system based on the architecture of the web, it might be the right thing to do.
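The contrast can be sketched very roughly (all URLs here are hypothetical). OAI-PMH tunnels its operations through 'verb' query parameters against a single service endpoint, whereas a resource-oriented design simply gives each harvestable thing its own URI that any plain HTTP GET - including a crawler's - can fetch:

```python
from urllib.parse import urlencode

# Hypothetical repository base URL, for illustration only.
BASE = "http://repository.example.org/oai"

def oai_request(verb, **params):
    # OAI-PMH style: one endpoint, operations tunnelled through
    # query parameters ("verbs") rather than expressed as resources.
    return BASE + "?" + urlencode({"verb": verb, **params})

# Harvesting is a conversation with the service endpoint...
url = oai_request("ListRecords", metadataPrefix="oai_dc",
                  **{"from": "2007-01-01"})

# ...whereas in a resource-oriented design the record list is itself
# a web resource with its own URI, fetched by plain HTTP GET - which
# is exactly what a crawler or feed reader already knows how to do.
feed_url = "http://repository.example.org/records/recent.atom"

print(url)
print(feed_url)
```

The point is not that the query-string form is unworkable, but that only the second form makes the records first-class citizens of the web.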

A decent heuristic is this: if you are building a system which works with the web architecture, which works with the web, then a web crawler (say, Google's) can crawl it. If you are building a system that Google cannot crawl, it is not part of the web, and you ought to have a very good reason for building a system which is not part of the web.

RSS is not important; the atom format is not important. What is important is the thought that went into the Atom format & into the Atom protocol. The hard part of defining the web architecture has largely been finished; we just need to catch up.



