
January 22, 2009

Why can't I find a library book in my search engine?

There's a story in today's Guardian, 'Why you can't find a library book in your search engine' (seen online, but I assume that it is also in the paper version), covering the ongoing situation around the licensing of OCLC WorldCat catalog records.  Rob Styles provides some of the background to this in 'OCLC, Record Usage, Copyright, Contracts and the Law', though, as he notes, he works for Talis, which is one of the commercial organisations that stand to benefit from a change in OCLC's approach.

I don't want to comment in too much detail on this story since I freely admit to not having properly done my homework, but I will note that my default position on this kind of issue is that we (yes, all of us) are better off in those cases where data can be made available on an 'open' rather than 'proprietary' basis, and I think this view of the world definitely applies in this case.

The Guardian story is somewhat simplistic, IMHO, not on the question of 'open' vs. 'closed' but on how easy it would be for such data, assuming that it was to be made openly available, to get into search engines (by which I assume the article really means Google?) in a meaningful way.  Flooding the Web with multiple copies of metadata about multiple copies of books is non-trivial to get right (just think of the issues around sensibly assigning 'http' URIs to this kind of stuff for example) such that link counting, ranking of books vs. other Web resources, and providing access to appropriate copies can be done sensibly.  There has to be some point of 'concentration' (to use Lorcan Dempsey's term) around which such things can happen - whether that is provided by Google, Amazon, Open Library, OCLC, Talis, the Library of Congress or someone else.  Too many points of concentration and you have a problem... or so it seems to me.


Comments

Having published this post, I note that Ed Summers discusses some of the issues around crawling bibliographic data at http://inkdroid.org/journal/2009/01/22/crawling-bibliographic-data/. Shame I didn't see it sooner.

No worries Andy, we were writing at the same time and you finished first. I noticed your blog entry turn up just as I was submitting.

I agree that having a 'point of concentration' is essential...but I'd argue that we need points (plural) of concentration, and that they need to emerge dynamically...and not be artificially imposed.

Ed,
yes I agree with your first point, that "we need points (plural) of concentration". Irrespective of 'need', multiple points of concentration are likely to emerge and we'll have to live with that but clearly, for every additional point of concentration there's a proportional reduction in Google-juice for all the existing points of concentration (if you see what I mean?).

On your second point, that "they need to emerge dynamically...and not be artificially imposed" - I don't understand what you mean. OCLC has "emerged" (admittedly over a long period) but it has emerged nonetheless - just as Facebook, Talis, Google and a whole spectrum of other things have emerged. We live in a mixed economy where things can emerge in different forms - loosely coordinated community activities, not-for-profits, for-profits, ... - this is good, isn't it?

In the form that OCLC has emerged (which I'll characterise as a not-for-profit membership organisation) evolution will presumably happen in response to what its members want. If enough of the members disagree with the licensing approach (particularly if it is to the point that they will consider leaving the organisation) then, presumably, OCLC policy will change. But the members (and others) also have to understand that the organisation has to be sustainable in order to deliver its benefits to the membership and that there will be some kind of costs associated with that sustainability, whether they are financial or something else. OCLC's task is to make sure that, overall, the benefits to members outweigh the costs, whilst still trying to maximise the benefits for the wider community.

I'm not a member, so I can't really comment on whether they are succeeding in doing that - but I would say, from our perspective here at Eduserv as a UK charity, that maintaining that kind of balance is distinctly non-trivial (though I would note that the revenue model here is quite different to that at OCLC).

It’s pretty amazing to me that OCLC have managed to keep the underlying business model of WorldCat going so long. I guess it is the classic 'cash cow' situation and that subscription to WorldCat forms an overly significant part of OCLC revenues making it very risky for them to change the business model.

As Karen Coyle mentions in her comment on Rob Styles's blog, it is the holdings info attached to bib records that provides much of the value that innovative services might build on.

A couple of thoughts. First, as I understand it, most UK libraries (even HE libraries) do not subscribe to WorldCat - their catalogue records and holdings info are derived from and held on other systems - taking cat records from the BL, LC and shared library mgt system databases with holdings info on COPAC, UnityUK, Talis and shared LMS system databases. It would be interesting to consider how these various systems could be made more open to the Web, with added value services using their data.

Also, it would be worth considering whether COPAC would be affected by the changes OCLC are proposing.

On a wider level, moving towards less elaborate automatically generated metadata (rather than MARC) would open up the market for new 'union catalogues'.

Re: "taking cat records from the BL, LC and shared library mgt system databases with holdings info on COPAC, UnityUK, Talis and shared LMS system databases. It would be interesting to consider how these various systems could be made more open to the Web, with added value services using their data."

Sure, agreed... but I think my point, in the comments to Ed (not in the original post), is that sustainable business models - i.e. something more than just, "Here's some funding from JISC" - also have to emerge around that open sharing in order to build anything long term.

Re: "On a wider level, moving towards less elaborate automatically generated metadata (rather than MARC) would open up the market for new 'union catalogues'." Yup, interesting point - I hadn't really thought about it that way round. It is the complexity of the records, in part at least, that demands the current processes. Changing that might allow new processes to emerge more easily. Certainly, at the point that there is a full-text index of every book ever published (how far away is that?) the world changes completely.

Just thinking out loud...

eFoundations is powered by TypePad