OAI-PMH vs. Atom vs. Sitemaps
For some time now I've been meaning to write a blog entry summarising the functional capabilities of the OAI-PMH and then looking at whether and how the same functionality could be delivered using RSS, Atom or Sitemaps.
Jim Downing has beaten me to it.
I have one minor quibble - which may be to do with my lack of understanding about Atom - in that I don't fully understand what Jim means by:
I have a feeling that the resource representations in Atom / RSS feeds are unlikely to satisfy most repository clients’ needs. Isn’t a more resource-oriented approach to simply link to the resource and let the client negotiate with the resource for an appropriate representation?
That said, I certainly agree with the thrust of his post.


I've just re-read Jim's post and think I now get it... sorry, I don't know why I didn't see what he was saying first time round :-(
I think that what Jim is arguing is that the added richness in an Atom or RSS feed over simply giving a list of URLs isn't sufficiently useful to repository clients to bother with and that we might as well therefore just use Sitemaps.
From a pure harvesting viewpoint, I think I agree - it is just the plain old list of URLs that is of interest.
But from my new-found repository as research feed viewpoint - i.e. from a viewpoint that sees an eprint repository as a collection of research blogs - RSS is important in the context of surfacing content in Technorati and the like.
So I think we actually need both Sitemaps and RSS feeds.
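To make the "both" concrete, here is a minimal sketch of producing a Sitemap and an Atom feed from the same list of records. The record data, the example.org URLs, the feed id and the function names are all assumptions made for illustration, not anything a particular repository platform provides. The point is only that the Sitemap carries a bare URL and date, while the Atom entry adds the title and summary that aggregators display.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical eprint records; every value here is made up.
RECORDS = [
    {"url": "http://repository.example.org/eprints/1",
     "title": "First eprint",
     "summary": "A short human-readable abstract.",
     "updated": "2007-06-20T10:00:00Z"},
    {"url": "http://repository.example.org/eprints/2",
     "title": "Second eprint",
     "summary": "Another abstract.",
     "updated": "2007-06-21T15:30:00Z"},
]


def build_sitemap(records):
    """Return a <urlset> element: just locations and modification dates."""
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for rec in records:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = rec["url"]
        ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = rec["updated"]
    return urlset


def build_atom(records, feed_id="http://repository.example.org/feed",
               feed_title="Example repository", author="Example Repository"):
    """Return an Atom <feed> element: same URLs plus title and summary."""
    feed = ET.Element("{%s}feed" % ATOM_NS)
    ET.SubElement(feed, "{%s}id" % ATOM_NS).text = feed_id
    ET.SubElement(feed, "{%s}title" % ATOM_NS).text = feed_title
    # Timestamps share one UTC format, so string max gives the latest one.
    ET.SubElement(feed, "{%s}updated" % ATOM_NS).text = max(
        rec["updated"] for rec in records)
    author_el = ET.SubElement(feed, "{%s}author" % ATOM_NS)
    ET.SubElement(author_el, "{%s}name" % ATOM_NS).text = author
    for rec in records:
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        ET.SubElement(entry, "{%s}id" % ATOM_NS).text = rec["url"]
        ET.SubElement(entry, "{%s}title" % ATOM_NS).text = rec["title"]
        ET.SubElement(entry, "{%s}updated" % ATOM_NS).text = rec["updated"]
        ET.SubElement(entry, "{%s}link" % ATOM_NS,
                      {"rel": "alternate", "href": rec["url"]})
        ET.SubElement(entry, "{%s}summary" % ATOM_NS).text = rec["summary"]
    return feed


if __name__ == "__main__":
    print(ET.tostring(build_sitemap(RECORDS), encoding="unicode"))
    print(ET.tostring(build_atom(RECORDS), encoding="unicode"))
```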
Posted by: Andy Powell | June 22, 2007 at 07:08 AM
Yes, that's what I meant - put a lot more clearly!
There's been a shift in the way syndication is described; from being a content feed, to being a representation of link lists (which is how Atom is repeatedly described in the REST book, for example), which muddies the water when it comes to harvesting.
The conclusion I came to was based on the observation that syndication feeds are going to be a fact of life for all types of repository (not just eprint repos). The question for me was: if we can turn syndication feeds into an efficient harvesting API, why implement sitemaps in addition?
I suppose the pragmatic answer is that the major search engines want us to, and that the syndication standards chose not to support harvesting - it's the web way.
Posted by: Jim Downing | June 22, 2007 at 09:33 AM
> if we can turn syndication feeds into an
> efficient harvesting API, why implement
> sitemaps in addition?
Isn't that actually what OAI-PMH is already? I completely agree that we need RSS to promote the content to the many aggregators.
But sitemaps were introduced by the search engines with a different mindset. Their focus was simply easy resource discovery for the websites they need to crawl. It needed to be simple enough to attract a lot of website owners to adopt sitemaps. Why didn't they promote OAI-PMH? Because of its slightly greater complexity and technical demands. Adoption wouldn't have been as fast as with sitemaps.
So I would think it would be easier to strip down OAI-PMH for the general purpose use of web resource representation.
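For comparison, this is roughly what the harvesting side of OAI-PMH looks like from a client: a single HTTP GET with a verb and a few arguments. The sketch below only builds the request URL. The base URL is a made-up example and list_records_url is a hypothetical helper name; the verb and argument names (ListRecords, metadataPrefix, from, set, resumptionToken) are those defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: when continuing a
        # partial list, no other argument is sent alongside the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed endpoint; a real repository publishes its own base URL.
    print(list_records_url("http://repository.example.org/oai",
                           from_date="2007-06-01"))
```

The from argument is what makes incremental harvesting cheap, and it is the piece that a plain list of URLs or a "latest N items" feed does not give a harvester directly.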
Posted by: Lars Kirchhoff | June 22, 2007 at 10:38 AM
Well, some of the W3C lists use this purist resource-oriented view, and are basically useless when you scan them in an aggregator - I NEED the summary to know if it's worth my while viewing the resource itself. Or in other words, their feeds are only usefully read by machines.
Personally I think if you make a feed with no human summary information you're cutting out a lot of uses. Sure, it's not "good enough" metadata for some repository harvesters.
Pragmatically, any repository owner today is going to have to do both OAI-PMH and Atom. Hardly a hardship, though, is it?
Posted by: Scott | June 22, 2007 at 09:08 PM