May 02, 2008

SWAP and ORE

There's an interesting mini-thread on the jisc-repositories list, prompted by Google's announcement that it is dropping support for the OAI-PMH, which Paul Walk blogged about recently.

I reproduce my contribution here, partly because I haven't been blogging much lately and it seems a shame to waste the text :-) and partly because email messages have a tendency to disappear into the ether.

It seems to me that Google's lack of support for the OAI-PMH is largely a non-event - because their support for it was (ultimately) a non-event.  They never supported it fully in any case AFAIK and, in some cases at least, support was broken because they didn't recognise links higher in the server tree than the OAI base URL.

It highlights the fact that the OAI-PMH will never be a mainstream Web protocol, but so what... I think we spotted that anyway!

There are technical reasons why the OAI-PMH was always going to struggle (I say that only with the benefit of hindsight) because of its poor fit with the Web Architecture.  Whilst I don't suppose that directly factored into Google's thinking in any sense, I think it is worth remembering.

On the 'social' thing I very strongly agree and I've argued several times in the past that we need to stop treating stores of content purely as stores of content and think about the social networks that need to build up around them.  It seems to me that the OAI-PMH has never been a useful step in that direction in the way that, say, RSS has been in the context of blogging.

Simple DC suffers from being both too complex (i.e. more complex than RSS) and too simple (i.e. not rich enough to meet some scholarly functional requirements).  Phil Cross suggests that we need to move towards a more complex solution, i.e. SWAP.  OAI-ORE takes a different but similar step in the direction of complexity - though it is probably less conceptually challenging than SWAP in many ways.  ORE's closeness to Atom might be its saving grace - on the other hand, its differences from Atom might be its undoing.  Come back in three years' time and I'll tell you which! :-)

I like SWAP because I like FRBR... and whenever I've sat down and worked with FRBR I've been totally sold on how well it models the bibliographic world.  But, and it's a very big but, however good the model is, SWAP is so conceptually challenging that it is hard to see it being adopted easily.

For me, I think the bottom line question is, "do SWAP or ORE help us build social networks around content?".  If the answer is "no", and I guess in reality I think the answer might well be "no", then we are focusing our attention in the wrong place.

More positively, I note that "SWAP and ORE" has quite a nice ring to it! :-)

Inside out - symposium update

Our annual symposium takes place next Thursday (8th May) at the British Library in London:

Inside Out: What do current Web trends tell us about the future of ICT provision for learners and researchers?

The day is intended to give people a chance to think about the potentially disruptive impact of current Web trends on the provision and use of ICT services within the educational sector, particularly higher education, and will feature talks from a range of perspectives including:

  • Larry Johnson (New Media Consortium, US),
  • Bobbie Johnson (Guardian),
  • Jem Stone (BBC),
  • Geoffrey Bilder (CrossRef),
  • Chris Adie (University of Edinburgh),
  • David Harrison (UCISA / Cardiff University),
  • and Grainne Conole (Open University).

I'm really looking forward to it... though right now things are a bit hectic with all the final preparations and what not.

The event is full but we are planning on streaming all the talks live on the Web, coupled with a live chat facility that will allow delegates (both those in the room and those watching the video stream) to discuss the presentations and ask questions of the speakers.

Presentations start at 10.30am, UK time.

Please note that it is not necessary to register to watch the video stream or take part in the live chat.  However, we have set up a social network for the event and we encourage you to sign up for this if you are planning on attending (either in person or via the video stream).  Doing so will give all delegates a better feel for who is in the audience.

Also note that all the presentations and streamed media will be made available after the event for those not able to see it live.

Finally, we are encouraging people to blog and Twitter about the event - if you do, please use the event tag, efsym2008.

For those with an interest in such things, we are using I S Media to do the live video streaming for us - the same people we used for the symposium last year.  The live chat facility is being done using Coveritlive, which is really a live blogging tool but supports quite a nice moderated comment facility, so we are going to use it slightly outside its intended space.  It should work OK though.  The social network has been built using Ning.  I'm very impressed with the flexibility and power of Ning and I strongly suspect that it would be possible to do an awful lot with it (given the necessary time!) - you basically get full access to the source code if you want it.  Despite that, in some ways I would have preferred to use Crowdvine for our social network, which I think offers a really nicely put together suite of social tools aimed specifically at conference delegates - but unfortunately, the costs were prohibitive for us given the money we are spending on other parts of the event.

Anyway, I'll be keeping my fingers firmly crossed between now and next Thursday and hoping that everything runs smoothly.

April 24, 2008

Slideshare, Tibet, China and DoS attacks

I Twittered briefly (yes, I know that all tweets are brief by definition) this morning that Slideshare appeared to be down again.  Within minutes I got a response from a member of staff at Slideshare indicating that they were under attack from hackers.

As an aside, I should note that this is not the first time I've received very prompt and helpful technical support as a direct result of tweeting about an issue (and not just from Slideshare either).  As an end-user, I find that very impressive.  At its current stage of development, Twitter seems to work very well for this kind of thing.  I'm not sure it will last - not because the will won't be there but because the growing number of Twitter users will become increasingly difficult to deal with.

Anyway, it turns out (as reported by Techcrunch, SlideShare Slammed with DDOS Attacks from China) that Slideshare is suffering from a series of Denial of Service (DoS) attacks launched from somewhere in China at the moment, apparently in protest at various presentations on Slideshare covering the situation in Tibet.

Now, I'm not in a position to comment on where these attacks originate, nor why they are happening.  But I assume that they are real and, if so, that their effects can be felt by ordinary end-users of the Slideshare service.

In recent comments on my own blog post about Jorum I suggested that the global impact of services like Slideshare is hard to ignore when thinking about where content is best surfaced on the Web.  But success brings with it both negatives and positives I guess.  Most obvious are the issues around sustainability and reliability - like many such services, Slideshare uses Amazon S3 behind the scenes to help cope with peaks in demand and, by and large, it seems to do so reasonably well.  This is a different kind of threat - that success brings with it attention of a less healthy kind.  We've seen similar but different things of late with Second Life, where Linden Lab seem to have come increasingly under the scrutiny of political interests in the US - not of the direct action kind we are seeing here but certainly capable of having a significant impact on the way the service grows and develops.

I'm not suggesting this as a reason for not using the likes of Slideshare - just noting an interesting aspect of the globalised world in which we live and that service architectures and delivery models need to be mindful of those cases where the wrong kind of people want to do the wrong kind of things.  The Internet itself being a classic example I suppose.

April 21, 2008

Jorum to move to open access

The JISC have announced that Jorum, the national learning object repository hosted and run jointly by MIMAS and EDINA, is to move to an 'open access' model.

This is good news, though one is tempted to wonder why it has taken so long!  I've argued for a while now that using a relatively closed licensing model and forcing registration before use would more or less stop the service in its tracks.

Through the development of JorumOpen, lecturers and teachers will be able to share materials under the Creative Commons licence framework: this makes sharing easier, granting users greater rights for use and re-use of online content and easier to understand. Importantly, it does not require prior registration. As a result availability is global as well as across UK universities and colleges. JorumOpen will run alongside a 'members only' facility, JorumEducationUK, that will support sharing of material just within the UK educational sector; this will be available only to registered users and contributors, as is currently the case.

Is the addition of JorumOpen enough to turn the service around?  I'm not sure to be honest.  It might be, though I'm not fully convinced that the notion of learning objects, as relatively complex packages of other objects, is compelling and/or simple enough to really succeed.  Can something like Jorum really take on the likes of Slideshare, Flickr and YouTube?

Libraries of the future

There's a special supplement in tomorrow's Education Guardian (Tuesday, 22 April) looking at college and university libraries of the future.  This has been prepared in collaboration with the JISC as part of their Libraries of the Future programme.  Material from the supplement, covering information literacy, physical learning spaces, Library 2.0, business models, digitisation, users and librarians is already available on the Guardian Web site.

April 15, 2008

IMLS Digital Collections & Content

Another somewhat belated post.... Andy and I both get occasional invitations to be members of advisory/steering groups for various programmes and projects operating in the areas in which we have an interest. I'm currently a member of the Advisory Group for the second phase of the Digital Collections and Content project which is funded by the Institute of Museum and Library Services and led by a team at the University of Illinois at Urbana-Champaign. Given the UK focus of the Foundation, it's probably slightly unusual for me to take on such a role for a US project, but it combines a number of our interests - repositories, resource discovery, metadata, the use of cultural heritage resources for learning and research, and I have also worked with some members of the project team in the past in the development of the Dublin Core Collections Application Profile.

The group met recently in Chicago, and although I wasn't able to attend the meeting in person, I managed to join in by phone for a couple of hours. One area in which the project seems to be doing some interesting work is in the relationships between collection-level description and item description, and in particular the use of algorithms/rules by which item-level metadata might be inferred from collection-level metadata.

The project is also exploring how collection-level metadata might be presented more effectively during searching, particularly to provide contextual information for individual items.
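The post doesn't describe the project's actual inference rules, but the general idea of deriving item-level metadata from a collection-level record can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the property names, the `INHERITABLE` set and the `infer_item_metadata` function are all hypothetical, item-level values always take precedence, and inferred values are flagged so that a search interface could present them as collection context rather than as statements about the item itself.

```python
# A minimal, hypothetical sketch of collection-to-item metadata inference.
# The property names and rules are illustrative assumptions, not the
# rules being developed by the IMLS Digital Collections & Content project.

# Properties an item may inherit from its collection when it has no value of its own.
INHERITABLE = {"rights", "language", "publisher", "spatial_coverage"}


def infer_item_metadata(item: dict, collection: dict) -> dict:
    """Return a copy of `item` with gaps filled from `collection`.

    Item-level values always win. Only properties listed in INHERITABLE
    are copied down, and the names of inferred properties are recorded
    so a search interface can label them as collection-level context.
    """
    result = dict(item)
    inferred = []
    for prop in sorted(INHERITABLE):
        if prop not in item and prop in collection:
            result[prop] = collection[prop]
            inferred.append(prop)
    result["inferred_from_collection"] = inferred
    return result


if __name__ == "__main__":
    # Hypothetical example records.
    collection = {"title": "Civil War Letters", "rights": "Public domain", "language": "en"}
    item = {"title": "Letter, 12 May 1863", "language": "de"}
    print(infer_item_metadata(item, collection))
```

Here the item keeps its own `title` and `language` but picks up the collection's `rights` value, which is recorded in `inferred_from_collection`. Deciding which properties can safely be inherited is exactly the judgement such rules have to encode: a collection's rights statement may well apply to every item, while its title clearly does not.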

April 14, 2008

Open Repositories 2008

I spent a large part of the week before last (Tuesday, Wednesday & Friday) at the Open Repositories 2008 conference at the University of Southampton.

There were something around 400 delegates there, I think, which I guess is an indicator of the considerable current level of interest around the R-word. Interestingly, if I recall conference chair Les Carr's introductory summary of stats correctly, nearly a quarter of these had described themselves as "developers", so the repository sphere has become a locus for debate around technical issues, as well as the strategic, policy and organisational aspects. The JISC Common Repository Interfaces Group (CRIG) had a visible presence at the conference, thanks to the efforts of David Flanders and his comrades, centred largely around the "Repository Challenge" competition (won by Dave Tarrant, Ben O’Steen and Tim Brody with their "Mining with ORE" entry).

The higher than anticipated number of people did make for some rather crowded sessions at times. There was a long queue for registration, though that was compensated for by the fact that I came away from that process with exactly two small pieces of paper: a name badge inside an envelope on which were printed the login details for the wireless network. (With hindsight, I could probably have done with a one page schedule of what was on in which location - there probably was one which I missed picking up!) Conference bags (in a rather neat "vertical" style which my fashion-spotting companions reliably informed me was a "man bag") were available, but optional. (I was almost tempted, as I do sport such an accessory at weekends, and it was black rather than dayglo orange, but decided to resist on the grounds that there was a high probability of it ending up in the hotel wastepaper bin as I packed up to leave.) Nul points, however, to those advertisers who thought it was a good idea to litter every desktop surface in the crowded lecture theatre with their glossy propaganda, with the result that a good proportion of it ended up on the floor as (newly manbagged-up) delegates squeezed their way to their seats.

The opening keynote was by Peter Murray-Rust of the Unilever Centre for Molecular Informatics, University of Cambridge. Despite some technical glitches, which must have been quite daunting in the circumstances (Peter has posted a quick note on his view of the experience: "I have no idea what I said" :-) ), Peter delivered a somewhat "non-linear" but always engaging and entertaining overview of the role of repositories for scientific data. He noted the very real problem that while ever-increasing quantities of data are being generated, very little of it is being successfully captured, stored and made accessible to others. Peter emphasised that any attempt to capture this data effectively must fit in with the existing working practices of scientists, and must be perceived as supporting the primary aims of the scientist, rather than introducing new tasks which might be regarded as tangential to those aims. And the practices of those scientists may, in at least some areas of scientific research, be highly "locally focused", i.e. the scientists see their "allegiances" as primarily to a small team with whom data is shared, at least in the first instance: an approach categorised as "long tail science" (a term attributed to Peter's colleague Jim Downing). Peter supported his discussion with examples drawn from several different e-Chemistry projects and initiatives, including the impressive OSCAR-3 text mining software, which extracts descriptions of chemical compounds from documents.

Most of the remainder of the Tuesday and Wednesday I spent in paper sessions. The presentation I enjoyed most was probably the one by Jane Hunter from the University of Queensland on the work of the HarvANA project on a distributed approach to annotation and tagging of resources from the Picture Australia collection (in the first instance at least; at the end, Jane whipped through a series of examples of applying the same techniques to other resources). Jane covered a model for annotation and tagging based on the W3C Annotea model; a technical architecture for gathering and merging distributed annotations/taggings, using OAI-PMH to harvest from targets at quite short time intervals (though those intervals could be extended if preferred/required); and browser-based plug-in tools to perform annotation/tagging. She also touched on the relationships between tagging and formally-defined ontologies. The HarvANA retrieval system currently uses an ontology to enhance tag-based retrieval ("ontology-based or ontology-directed folksonomy"), but the tags provided could also contribute to the development/refinement of that ontology ("folksonomy-directed ontology"). Although it was in many ways a repository-centric approach and Jane focused on the use of existing, long-established technologies, she also succeeded in placing repositories firmly in the context of the Web: as systems which enable us to expose collections of resources (and collections of descriptions of those resources), which then enter the Web of relationships with other resources managed and exposed by other systems - here, the collections of annotations exposed by the Annotea servers, but potentially other collections too.
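Short-interval harvesting of the kind Jane described typically relies on the OAI-PMH `ListRecords` verb with a `from` datestamp, so that each harvest only fetches records added or changed since the previous one. The Python sketch below builds such requests and merges the results by annotated resource; it is not HarvANA's code. The base URL, the annotation record fields (`id`, `target`, `body`) and both function names are hypothetical, and the `from` value assumes the repository supports seconds-level datestamp granularity (day granularity, `YYYY-MM-DD`, is the only one every OAI-PMH repository must support). Network access and XML parsing are left out so that only request construction and the merge step are shown.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def list_records_url(base_url, last_harvest=None, metadata_prefix="oai_dc",
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so when continuing
    an incomplete list only the verb and the token are sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if last_harvest is not None:
            # Seconds granularity in UTC (assumed to be supported by the repository).
            utc = last_harvest.astimezone(timezone.utc)
            params["from"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return base_url + "?" + urlencode(params)


def merge_annotations(harvests):
    """Merge annotation records harvested from several servers.

    Records are grouped by the resource they annotate ("target") and
    de-duplicated by annotation id; a later harvest overwrites an earlier
    copy of the same annotation, so updated annotations replace stale ones.
    """
    merged = {}
    for records in harvests:
        for record in records:
            merged.setdefault(record["target"], {})[record["id"]] = record
    return merged


if __name__ == "__main__":
    base = "http://annotations.example.org/oai"  # hypothetical base URL
    since = datetime(2008, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(list_records_url(base, since))
```

Shortening or lengthening the harvest interval then only changes how often `list_records_url` is called with the time of the last successful harvest; the merge step stays the same.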

At Wednesday lunch time, (once I managed to find the room!) I contributed to a short "birds of a feather" session co-ordinated by Rosemary Russell of UKOLN and Julie Allinson of the University of York on behalf of the Dublin Core Scholarly Communications Community. We focused mainly on the Scholarly Works Application Profile and its adoption of a FRBR-based model, and talked around the extension of that approach to other resource types which is under consideration in a number of sibling projects currently being funded by JISC. (Rather frustratingly for me, this meeting clashed with another BoF session on Linked Data which I would really have liked to attend!)

I should also mention the tremendously entertaining presentation by Johan Bollen of the Los Alamos National Laboratory on the research into usage metrics carried out by the MESUR project. Yes, I know, "tremendously entertaining" and "usage statistics" aren't the sort of phrases I expect to see used in close proximity either. Johan's base premise was, I think, that seeking to illustrate impact through blunt "popularity" measures was inadequate, and he drew a distinction between citation - the resources which people announce in public that they have read - and usage - the actual resources they have downloaded. Based on a huge dataset of usage statistics provided by a range of popular publishers and aggregators, he explored a variety of other metrics, comparing the (surprisingly similar) rankings of journals obtained via several of these metrics with the rankings provided by the citation-based Thomson impact factor. I'm not remotely qualified to comment on the appropriateness of Johan's choice of algorithms, but the fact that Johan kept a large audience engaged at the end of a very long day was a tribute to his skill as a presenter. (Though I'd still take issue with the Britney (popular but insubstantial?)/Big Star (low-selling but highly influential/lauded by the cognoscenti) opposition: nothing by Big Star can compare with the strutting majesty of "Toxic". No, not even "September Gurls".)
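Comparing the journal rankings produced by two different metrics, as Johan did with usage-based metrics and the impact factor, is commonly done with a rank correlation coefficient. The sketch below uses Spearman's rho, which the talk did not necessarily use; the journal names and scores are invented purely for illustration, and the simple formula shown assumes there are no tied scores.

```python
def ranks(scores):
    """Map each journal to its rank (1 = highest score). Assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {journal: position + 1 for position, journal in enumerate(ordered)}


def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation between two metrics over the same journals.

    Uses the no-ties formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference between a journal's two ranks.
    """
    if set(scores_a) != set(scores_b):
        raise ValueError("both metrics must cover the same journals")
    if len(scores_a) < 2:
        raise ValueError("need at least two journals to compare rankings")
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    n = len(rank_a)
    d_squared = sum((rank_a[j] - rank_b[j]) ** 2 for j in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Invented scores for four hypothetical journals.
    citation_metric = {"J1": 9.1, "J2": 5.0, "J3": 3.2, "J4": 1.1}
    usage_metric = {"J1": 1200, "J2": 300, "J3": 800, "J4": 50}
    print(spearman_rho(citation_metric, usage_metric))  # → 0.8
```

A value of 1 means the two metrics order the journals identically and -1 means they order them in exactly the opposite way; here swapping J2 and J3 gives 0.8. Real usage data contains ties, so a tie-aware version (averaging tied ranks, then taking the Pearson correlation of the ranks) would be needed in practice.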

On the Friday, I attended the OAI ORE Information Day, but I'll make that the subject of a separate post.

All in all - give or take a few technical hiccups - it was a successful conference, I think (and thanks to Les and his team for their hard work) - perhaps more so in terms of the "networking" that took place around the formal sessions, and the general "buzz" there seemed to be around the place, than because of any ground-breaking presentations.

And yet, and yet... at the end of the week I did come away from some of the sessions with my niggling misgivings about the "repository-centric" nature of much of the activity I heard described slightly reinforced. Yes, I know: what did I expect to hear at a conference called "Open Repositories"?! :-) But I did feel an awful lot of the emphasis was on how "repository systems" communicate with each other (or how some other app communicates with one repository system and then with another repository system), e.g. how can I "get something out" of your repository system and "put it into" my repository system, and so on. It seems to me that - at the technical level at least - we need to focus less on seeing repository systems as "specific" and "different" from other Web applications, and focus more on commonalities. Rather than concentrating on repository interfaces we should ensure that repository systems implement the uniform interface defined by the RESTful use of the HTTP protocol. And then we can shift our focus to our data, and to

  • the models or ontologies (like FRBR and the CIDOC Conceptual Reference Model, or even basic one-object-is-made-available-in-multiple-formats models) which condition/determine the sets of resources we expose on the Web, and see the use of those models as choices we make rather than something "technologically determined" ("that's just what insert-name-of-repository-software-app-of-choice does");
  • the practical implementation of formalisms like RDF which underpin the structure of our representations describing instances of the entities defined by those models, through the adoption of conventions such as those advocated by the Linked Data community
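To make the "uniform interface" point a little more concrete: a client should be able to ask for a resource with nothing more than an HTTP GET and an `Accept` header, without knowing whether a "repository" sits behind the URI. The Python WSGI sketch below shows one widely used Linked Data convention, in which the URI of an abstract entity (here a FRBR-style work) answers with `303 See Other`, redirecting to an HTML or RDF/XML document about it depending on content negotiation. The URIs, the function names and the simplified `Accept` parsing (only exact media types and `*/*` are matched) are assumptions for illustration, not the behaviour of any particular repository software.

```python
# Hypothetical URIs: /id/work/42 names an abstract work (not a document);
# the documents describing it live under /doc/.
WORK_URI = "/id/work/42"
REPRESENTATIONS = {
    "text/html": "/doc/work/42.html",
    "application/rdf+xml": "/doc/work/42.rdf",
}
DEFAULT_TYPE = "text/html"  # served when the client accepts anything


def parse_accept(header):
    """Return (q, media_type) pairs, highest q first (header order kept on ties)."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type:
            prefs.append((q, media_type))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[0], reverse=True)


def choose_representation(accept_header):
    """Pick a document URI for the Accept header, or None if nothing matches."""
    for q, media_type in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type == "*/*":
            return REPRESENTATIONS[DEFAULT_TYPE]
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
    return None


def app(environ, start_response):
    """Minimal WSGI app: 303-redirect the work URI to a negotiated description."""
    if environ.get("PATH_INFO") != WORK_URI:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    target = choose_representation(environ.get("HTTP_ACCEPT", "*/*"))
    if target is None:
        start_response("406 Not Acceptable", [("Content-Type", "text/plain")])
        return [b"No acceptable representation\n"]
    start_response("303 See Other", [("Location", target), ("Vary", "Accept")])
    return [b""]
```

A browser sending `Accept: text/html` is redirected to the HTML page, while an RDF-aware client sending `Accept: application/rdf+xml` gets the RDF/XML description; in both cases the client relies only on generic HTTP behaviour. The app can be tried locally with the standard library's `wsgiref.simple_server.make_server`; a production implementation would need fuller `Accept` matching (e.g. `text/*` and media-type parameters).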

In this world, the focus shifts to "Open (Managed) Collections" (or even "Open Linked Collections"): collections of documents, datasets, images, of whatever resources we choose to model and expose to the world. And as a consumer of those resources, I (and, perhaps more to the point, my client applications) really don't need to know whether the system that manages and exposes those collections is a "repository" or a "content management system" or something else (or whether the provider changes that system from one day to the next): they apply the same principles to interactions with those resources as they do to any other set of resources on the Web.

April 12, 2008

UKOLN 30th Anniversary video

I just know this is what you've been waiting for...

The images are on Flickr and my Facebook profile.

April 10, 2008

Shibboleth Technical Reading List

Simon McLeish at LSE has produced a nice (and compact) list of technical reading material for those who are new to Shibboleth and the UK Access Management Federation.

I think he is looking for suggestions of what else to include, so if you have any, get in touch with him directly.

UKOLN's 30th bash

Pete and I are travelling up to London later today to attend UKOLN's 30th anniversary celebrations, being held at the British Library.  30 years old - blimey... who'd have thought it!

I joined UKOLN in 1996 and I look back on my nearly ten years there with a lot of fondness.  The early years of the eLib Programme were awash with excitement - particularly that librarians would somehow shape the way that the Web evolved - and UKOLN was fairly central to much of the activity, both in the UK and more widely.  With hindsight, this was, well, how shall I put it... somewhat naive?  But that didn't make it any less fun at the time.  The Web moved on and most of us have spent the rest of our working lives trying to catch up with it but I still like to think that we helped in a small way to lay the foundations of what has come since.  UKOLN's focus has broadened somewhat since those days but it continues to offer the community a valuable thought-leadership role in all aspects of digital information management.

Working at UKOLN gave me an opportunity to get involved in a whole range of interesting projects, services and standards-related activities including the Dublin Core, the OAI-PMH, OpenURL, RDF and the semantic Web, RSS, persistent identifiers, the JISC Information Environment, Intute (I should add hyperlinks to each of these but they are now so well known that it hopefully isn't necessary) and a whole range of other things.  Hey, there were times when I even enjoyed European projects!

I'd like to offer my thanks to both Lorcan and Liz for the way they have led UKOLN over the years, not least in dealing with the difficult task of balancing competing demands from funders, the University of Bath and the other stakeholders, to Lorcan for his ongoing vision and inspiration (for me - and others I suspect - the chance to work with Lorcan was a big draw in moving to UKOLN in the first place), to the other staff at UKOLN over the years (far too many to mention) for making UKOLN such a great place to work, to people like Cliff Lynch who have offered UKOLN significant support over the years, and to the funders for (mostly :-) ) remembering that UKOLN does what it does best when it is left to get on with things in its own way.

Here's to another 30 years...

[Image: UKOLN staff circa 1996 - taken from the UKOLN Facebook group.]