
June 21, 2007

OAI-PMH vs. Atom vs. Sitemaps

For some time now I've been meaning to write a blog entry summarising the functional capabilities of the OAI-PMH and then looking at whether and how the same functionality could be delivered using RSS, Atom or Sitemaps.

Jim Downing has beaten me to it.

I have one minor quibble - which may be to do with my lack of understanding about Atom - in that I don't fully understand what Jim means by:

I have a feeling that the resource representations in Atom / RSS feeds are unlikely to satisfy most repository clients’ needs. Isn’t a more resource-oriented approach to simply link to the resource and let the client negotiate with the resource for an appropriate representation?

That said, I certainly agree with the thrust of his post.
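To make the suggestion in the quoted passage a little more concrete, here is a minimal sketch of what "let the client negotiate with the resource" could look like from a harvester's side. It only builds the HTTP request and sends nothing. The function name `negotiated_request`, the `repository.example.org` URL and the choice of media types are all assumptions made for illustration, not anything taken from Jim's post.

```python
from urllib.request import Request


def negotiated_request(resource_url, preferred_types):
    """Build (but do not send) a GET request whose Accept header lists
    the representations the client would like, most preferred first."""
    parts = []
    for i, mime in enumerate(preferred_types):
        if i == 0:
            # The first type gets the implicit default quality of 1.0.
            parts.append(mime)
        else:
            # Later types get decreasing q-values, never below 0.1.
            q = max(0.1, 1.0 - 0.1 * i)
            parts.append(f"{mime};q={q:.1f}")
    return Request(resource_url, headers={"Accept": ", ".join(parts)})


# Hypothetical eprint URL; the client prefers PDF but will accept HTML.
req = negotiated_request(
    "http://repository.example.org/eprints/101",
    ["application/pdf", "text/html"],
)
print(req.get_header("Accept"))
```

Whether a given repository actually honours the Accept header in this way is a separate question; the sketch only shows the client half of the exchange.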


Listed below are links to weblogs that reference OAI-PMH vs. Atom vs. Sitemaps:

» More OAI-PMH vs Atom vs Sitemaps Or Why I'm A Bit Down On OAI-PMH from Jim Downing
There were a couple of comments on Andy Powell's reply post to my post comparing OAI-PMH, Atom and sitemaps for repository harvesting that make it worth revisiting the issue (sorry I didn't pick them up at the time, I failed to add the conv...


I've just re-read Jim's post and think I now get it... sorry, I don't know why I didn't see what he was saying first time round :-(

I think that what Jim is arguing is that the added richness in an Atom or RSS feed over simply giving a list of URLs isn't sufficiently useful to repository clients to bother with and that we might as well therefore just use Sitemaps.

From a pure harvesting viewpoint, I think I agree - it is just the plain old list of URLs that is of interest.
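As a rough illustration of the "plain old list of URLs" view, the sketch below pulls the URLs out of a Sitemap and keeps those modified on or after a given date. The sample sitemap, the `repository.example.org` URLs and the function name `urls_modified_since` are invented for the example. It also assumes `lastmod` values start with a `YYYY-MM-DD` date, and treats entries with no `lastmod` as needing a re-fetch, which is a design choice rather than anything the protocol requires.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for an imaginary repository.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://repository.example.org/eprints/101</loc>
    <lastmod>2007-06-01</lastmod>
  </url>
  <url>
    <loc>http://repository.example.org/eprints/102</loc>
    <lastmod>2007-06-20</lastmod>
  </url>
</urlset>
"""


def urls_modified_since(sitemap_xml, since):
    """Return the <loc> values whose <lastmod> date is on or after
    `since` (a 'YYYY-MM-DD' string)."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        # ISO dates compare correctly as strings; only the date part is used.
        if not lastmod or lastmod[:10] >= since:
            changed.append(loc)
    return changed


print(urls_modified_since(SAMPLE_SITEMAP, "2007-06-10"))
```

Everything beyond the URL and the date (titles, summaries, authors) is simply absent here, which is the trade-off being discussed.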

But from my new-found "repository as research feed" viewpoint - i.e. one that sees an eprint repository as a collection of research blogs - RSS is important for surfacing content in Technorati and the like.

So I think we actually need both Sitemaps and RSS feeds.

Yes, that's what I meant - put a lot more clearly!

There's been a shift in the way syndication is described; from being a content feed, to being a representation of link lists (which is how Atom is repeatedly described in the REST book, for example), which muddies the water when it comes to harvesting.

The conclusion I came to was based on the observation that syndication feeds are going to be a fact of life for all types of repository (not just eprint repos). The question for me was: if we can turn syndication feeds into an efficient harvesting API, why implement sitemaps in addition?

I suppose the pragmatic answer is that the major search engines want us to, and that the syndication standards chose not to support harvesting - it's the web way.

> if we can turn syndication feeds into an
> efficient harvesting API, why implement
> sitemaps in addition?
Isn't that actually what OAI-PMH is already? I completely agree that we need RSS to promote the content to the many aggregators.
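For readers less familiar with the protocol, an incremental OAI-PMH harvest boils down to requests like the ones built below. This is only a sketch of the request side: the base URL `http://repository.example.org/oai` and the function name `list_records_url` are made up for illustration, while `verb`, `metadataPrefix`, `from` and `resumptionToken` are the protocol's own argument names.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build the URL for an OAI-PMH ListRecords request."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: when continuing a
        # partial list, no other arguments accompany the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Only ask for records changed on or after this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, harvesting changes since 1 June 2007.
print(list_records_url("http://repository.example.org/oai",
                       from_date="2007-06-01"))
```

A harvester would fetch that URL, process the records, and keep requesting with each returned resumption token until the list is complete.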

But sitemaps were introduced by the search engines with a different mindset. Their focus was simply on easy resource discovery for the websites they need to crawl. It needed to be simple in order to attract a lot of web site owners to adopt sitemaps. Why didn't they promote OAI-PMH? Because of its slightly greater complexity and technical demands. Adoption wouldn't have been as fast as with sitemaps.

So I would think it would be easier to strip down OAI-PMH for the general purpose use of web resource representation.

Well, some of the W3C lists use this purist resource-oriented view, and are basically useless when you scan them in an aggregator - I NEED the summary to know if it's worth my while viewing the resource itself. Or in other words, their feeds are only usefully read by machines.

Personally I think if you make a feed with no human summary information you're cutting out a lot of uses. Sure, it's not "good enough" metadata for some repository harvesters.

Pragmatically, any repository owner today is going to have to do both OAI-PMH and Atom. Hardly a hardship, though, is it?
