
July 29, 2010


I woke up this morning to find a very excited flurry of posts in my Twitter stream pointing to the launch by the UK National Archives of the legislation.gov.uk site, which provides access to all UK legislation, including revisions made over time. A post on the data.gov.uk blog provides some of the technical background and highlights the ways in which the data is made available in machine-processable forms. Full details are provided in the "Developer Zone" documents.

I don't for a second pretend to have absorbed all the detail of what is available, so I'll just highlight a couple of points.

First and foremost, this is being delivered with an eye firmly on the Linked Data principles. From the blog post I mentioned above:

For the web architecturally minded, there are three types of URI for legislation on legislation.gov.uk. These are identifier URIs, document URIs and representation URIs. Identifier URIs are of the form http://www.legislation.gov.uk/id/{type}/{year}/{number} and are used to denote the abstract concept of a piece of legislation - the notion of how it was, how it is and how it will be. These identifier URIs are designed to support the use of legislation as part of the web of Linked Data. Document URIs are for the document. Representation URIs are for the different types of possible rendition of the document, so htm, pdf or xml.

(Aside: I admit to a certain squeamishness about the notion of "representation URIs" and I kinda prefer to think in terms of URIs for Generic Documents and for Specific Documents, along the lines described by Tim Berners-Lee in his "Generic Resources" note, but that's a minor niggle of terminology on my part, and not at all a disagreement with the model.)
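To make the three kinds of URI a little more concrete, here is a minimal Python sketch. Only the identifier template comes from the quoted blog post. The document and representation patterns are assumptions, guessed from the single data.rdf URL mentioned further down this post, and the function names are purely illustrative.

```python
BASE = "http://www.legislation.gov.uk"


def identifier_uri(leg_type: str, year: int, number: int) -> str:
    """Identifier URI, following the template quoted from the data.gov.uk post."""
    return f"{BASE}/id/{leg_type}/{year}/{number}"


def document_uri(leg_type: str, year: int, number: int) -> str:
    # ASSUMPTION: the document URI is the identifier URI without "/id".
    # This is inferred from one example URL, not from the Developer Zone docs.
    return f"{BASE}/{leg_type}/{year}/{number}"


def representation_uri(leg_type: str, year: int, number: int, tail: str) -> str:
    # ASSUMPTION: a representation adds a path tail such as "contents/data.rdf".
    return f"{document_uri(leg_type, year, number)}/{tail}"


if __name__ == "__main__":
    print(identifier_uri("ukpga", 2010, 24))
    print(representation_uri("ukpga", 2010, 24, "contents/data.rdf"))
```

The point of keeping the three functions separate is that only the first one names the abstract piece of legislation; the other two name things you can actually fetch.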

A second aspect I wanted to highlight (given some of my (now slightly distant) past interests) is that, on looking at the RDF data (e.g. http://www.legislation.gov.uk/ukpga/2010/24/contents/data.rdf), I noticed that it appears to make use of a FRBR-based model to deal with the challenge of representing the various flavours of "versioning" relationships.

I haven't had time to look in any detail at the implementation, other than to observe that the data can get quite complex - necessarily so - when dealing with a lot of whole-part and revision-of/variant-of/format-of relationships. (There was one aspect where I wondered if the FRBR concepts were being "stretched" somewhat, but I'm writing in haste and I may well be misreading/misinterpreting the data, so I'll save that question for another day.)

It's fascinating to see the FRBR approach being deployed as a practical solution to a concrete problem, outside of the library community in which it originated.

Pretty cool stuff, and congratulations to all involved in providing it. I look forward to seeing how the data is used.

July 21, 2010

Getting techie... what questions should we be asking of publishers?

The Licence Negotiation team here are thinking about the kinds of technical questions they should be asking publishers and other content providers as part of their negotiations with them. The aim isn't to embed the answers to those questions in contractual clauses - rather, it is to build up a useful knowledge base of surrounding information that may be useful to institutions and others who are thinking about taking up a particular agreement.

My 'starter for 10' set of questions goes like this:

  • Do you make any commitment to the persistence of the URLs for your published content? If so, please give details. Do you assign DOIs to your published content? Are you members of CrossRef?
  • Do you support a search API? If so, what standard(s) do you support?
  • Do you support a metadata harvesting API? If so, what standard(s) do you support?
  • Do you expose RSS and/or Atom feeds for your content? If so, please describe what feeds you offer.
  • Do you expose any form of Linked Data about your published content? If so, please give details.
  • Do you generate OpenURLs as part of your web interface? Do you have a documented means of linking to your content based on bibliographic metadata fields? If so, please give details.
  • Do you support SAML (Service Provider) as a means of controlling access to your content? If so, which version? Are you a member of the UK Access Management Federation? If you also support other methods of access control, please give details.
  • Do you grant permission for the preservation of your content using LOCKSS, CLOCKSS and/or PORTICO? If so, please give details.
  • Do you have a statement about your support for the Web Accessibility Initiative (WAI)? If so, please give details.

Does this look like a reasonable and sensible set of questions for us to be asking of publishers? What have I missed? Something about open access perhaps?

July 20, 2010

SLOODLE gets further funding from the JISC

I don't do much thinking about 3D virtual worlds these days but it's good to see the recent announcement by one of our early Second Life projects, SLOODLE, that they have been awarded a Learning & Teaching Innovation Grant from the JISC:

The year long project on Supporting Education in Virtual Worlds with Virtual Learning Environments will conduct pilots at each participating institution and will explore how web-based learning environments (esp. Moodle) can effectively support and enhance learning in virtual worlds.

July 16, 2010

Finding e-books - a discovery to delivery problem

Some of you will know that we recently ran a quick survey of academic e-book usage in the UK - I hope to be able to report on the findings here shortly. One of the things that we didn't ask about in the survey but that has come up anecdotally in our discussions with librarians is the ease (or not) with which it is possible to find out if a particular e-book title is available.

A typical scenario goes like this. "Lecturer adds an entry for a physical book to a course reading list. Librarian checks the list and wants to know if there is an e-book edition of the book, in order to offer alternatives to the students on that course". Problemo. Having briefly asked around, it seems (somewhat surprisingly?) that there is no easy solution to this problem.

If we assume that the librarian in question knows the ISBN of the physical book, what can be done to try and ease the situation? Note that in asking this question I'm conveniently ignoring the looming, and potentially rather massive, issue around "what the hell is an e-book anyway?" and "how are we going to assign identifiers to them once we've worked out what they are?" :-). For some discussion around this see Eric Hellman's recent piece, What IS an eBook, anyway?

But, let's ignore that for now... we know that OCLC's xISBN service allows us to navigate different editions of the same book (I'm desperately trying not to drop into FRBR-speak here). Taking a quick look at the API documentation for xISBN yesterday, I noticed that the metadata returned for each ISBN can include both the fact that something is a 'Book' and that it is 'Digital' (i.e. its 'form' values include both 'BA' and 'DA') - that sounds like the working definition of an e-book to me (at least for the time being) - as well as listing the ISBNs for all the other editions/formats of the same book. So I knocked together a quick demonstrator. The result is e-Book Finder and you are welcome to have a play. To get you started, here are a couple of examples:

Of course, because e-Book Finder is based on xISBN, which is in turn based on WorldCat, you can only use it to find e-books that are listed in the catalogues of WorldCat member libraries (but I'm assuming that is a big enough set of libraries that the coverage is pretty good). Perhaps more importantly, it also only represents the first stage of the problem. It allows you to 'discover' that an e-book exists - but it doesn't get the thing 'delivered' to you.
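For the curious, the 'Book' plus 'Digital' test can be sketched in a few lines of Python. This is not the e-Book Finder code and it makes no call to xISBN. The record shape (a dict with "isbn" and "form" lists) is an assumption for illustration, and the ISBNs are made-up placeholders.

```python
def is_ebook(record: dict) -> bool:
    """True if the record's form codes include both 'BA' (Book) and 'DA' (Digital)."""
    forms = set(record.get("form", []))
    return {"BA", "DA"} <= forms


def find_ebooks(editions: list) -> list:
    """Collect the ISBNs of every edition that passes the e-book test."""
    return [
        isbn
        for record in editions
        if is_ebook(record)
        for isbn in record.get("isbn", [])
    ]


if __name__ == "__main__":
    # Hypothetical edition records; placeholder ISBNs, not real books.
    editions = [
        {"isbn": ["0000000001"], "form": ["BA"]},        # print only
        {"isbn": ["0000000002"], "form": ["BA", "DA"]},  # book and digital
        {"isbn": ["0000000003"], "form": ["DA"]},        # digital, not a book
    ]
    print(find_ebooks(editions))
```

Requiring both codes, rather than either one, is what keeps print editions and non-book digital items out of the results.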

Wouldn't it be nice if e-Book Finder could also answer questions like, "is this e-book covered by my existing institutional subscriptions?", "can I set up a new institutional subscription that would cover this e-book?" or simply "can I buy a one-off copy of this e-book?". It turns out that this is a pretty hard problem. My Licence Negotiation colleagues at Eduserv suggested doing some kind of search against myilibrary, dawsonera, Amazon, eBrary, eblib and SafariBooksOnline. The bad news is that (as far as I can tell), of those, only Amazon and SafariBooksOnline allow users to search their content before making them sign in and only Amazon offer an API. (I'm not sure why anyone would design a website that has the sole purpose of selling stuff such that people have to sign in before they can find out what is on offer, nor why that information isn't available in an openly machine-readable form but anyway...). So in this case, moving from discovery to delivery looks to be non-trivial. Shame. Even if each of these e-book 'aggregators' simply offered a list [1] of the ISBNs of all the e-books they make available, it would be a step in the right direction.

On the other hand, maybe just pushing the question to the institutional OpenURL resolver would help answer these questions. Any suggestions for how things could be improved?
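As a rough illustration of what "pushing the question to the resolver" might look like, here is a sketch that builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a book identified by ISBN. The resolver base URL and the ISBN are hypothetical placeholders, and whether any given resolver can answer the subscription question from this is an open assumption.

```python
from urllib.parse import urlencode


def book_openurl(resolver_base: str, isbn: str) -> str:
    """Build an OpenURL 1.0 KEV link asking a resolver about a book by ISBN."""
    params = [
        ("url_ver", "Z39.88-2004"),                    # OpenURL 1.0 version tag
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # KEV metadata format for books
        ("rft.genre", "book"),
        ("rft.isbn", isbn),
    ]
    return f"{resolver_base}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical institutional resolver and placeholder ISBN.
    print(book_openurl("https://resolver.example.edu/openurl", "0000000002"))
```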

1. It's a list so that means RSS or Atom, right?

July 08, 2010

Going LOCAH: a Linked Data project for JISC

Recently I worked with Adrian Stevenson of UKOLN and Jane Stevenson and Joy Palmer of MIMAS, University of Manchester on a bid for a project under the JISC O2/10 call, Deposit of research outputs and Exposing digital content for education and research, and I'm very pleased to be able to say that the proposal has been accepted and the project has been funded.

The project is called "Linked Open Copac Archives Hub" (LOCAH). It aims to address the "expose" section of the call, and focuses on making available data held by the Copac and Archives Hub services hosted by MIMAS - i.e. library catalogue data and data from archival finding aids - in the form of Linked Data; developing some prototype applications illustrating the use of that data; and analysing some of the issues arising from that work. The main partners in the work are UKOLN and MIMAS, with contributions from Eduserv, OCLC and Talis. The Eduserv contribution will take the form of some input from me, probably mostly in the area of working with Jane on modelling some of the archival finding aid data, currently held in the form of EAD-encoded XML documents, so that it can be represented in RDF - though I imagine I'll be sticking my oar in on various other aspects along the way.

UKOLN is managing the project and hosting a project weblog. I'm not sure at the moment how I'll divide up thoughts between here and there; I'll probably end up with a bit of duplication along the way.

July 07, 2010

On federated access management, usability and discovery

A little over a week ago I attended a meeting in London organised by the JISC Collections team entitled From discovery to log-in and use: a workshop for publishers, content owners and service providers.

The meeting was targeted at academic publishers (and other service providers), of whom there were between 30 and 40 in the room. It started with presentations about two reports, the first by William Wong et al (Middlesex University), User Behaviour in Resource Discovery: Final Report, the second by Rhys Smith (Cardiff University), JISC Service Provider Interface Study. Both reports are worth reading, though, as I noted somewhat cheekily on Twitter prior to the meeting, if the JISC had paid more for the first one it might have been shorter!

Anyway... the eagle-eyed amongst you will have noticed that the two reports are somewhat different in scope and scale. Both talk about 'discovery' but the first uses that word in a very broad 'resource discovery' sense whilst the second uses it in the context of the 'discovery problem' as it applies to federated access management - i.e. the problem of how a 'service provider' knows which institutional login page to send the user to when they want to access their site. This difference in focus left me thinking that the day overall was a little out of balance.

For this blog post I don't intend to say anything more about 'resource discovery' in its wider sense, other than to note that Lorcan Dempsey has been writing some interesting stuff about this topic recently, that there are issues about SEO and how publishers of paid-for academic content can best interact with services like Google that could usefully be discussed somewhere (though they weren't discussed at this particular meeting), and that, in my humble opinion, any approach to resource discovery that assumes that institutions can dictate or control which service(s) the end-user is going to use to discover stuff is pretty much doomed from the start. On that basis, I'm not a big believer in library (or any other kind of) portals, nor in any architectural approach that assumes that a particular portal is what the user wants to use!

The two initial presentations were followed by a talk about the 'business case' for an 'EduID' brand - essentially a logo and/or button signifying to the user that they are about to undertake an 'academic federated login' (as opposed to an OpenID login, a Facebook Connect login, a Google login, or whatever else). Such a brand was one of the recommendations coming out of the Cardiff study. I fundamentally disagree with this approach (though I struggled to put my case across on the day). I'm not convinced that we have a 'branding' problem here and I'm worried that the way this work was presented makes it look as though the decision that we need a new 'brand' has already been taken.

During the ensuing discussion about the 'discovery problem' I mentioned the work of the Kantara Initiative and, in particular, the ULX group which is developing a series of recommendations about how the federated access management user experience should be presented to users. I think this group is coming up with a very sensible set of pragmatic recommendations and I think we need to collectively sit up and take some notice and/or get involved. Unfortunately, when I mentioned the initiative at the meeting, it appeared that the bulk of the publishers in the room were not aware of it.

To try and marshal my thoughts a little bit around the Kantara work I decided to try and implement a working demo based on their recommendations. I took as my starting point a fictitious academic service called EduStuff with a requirement to offer three login routes:

  • for UK university students and staff via the UK Federation,
  • for NHS staff via Athens, and
  • for other users via a local EduStuff login.

I'm assuming that this is a reasonably typical scenario for many academic publishers (with the exception of the UK-only targeting on the academic side of things, something I'll come back to later).

Note that this scenario is narrower than the scope of the Kantara ULX work, which includes things like Facebook Connect, Google, OpenID and so on, so I've had to interpret their recommendations somewhat, rather than implement them in their totality.

You can see the results on the demo site. Note that the site itself does nothing other than to provide a backdrop for demonstrating how the 'sign in' process might look - none of the other links work for example.

The process starts by clicking on the 'Sign in' link at the top right (as per the Kantara recommendations). This generates a pop-up 'sign in' box offering the three options. Institutional accounts are selected using a dynamic jQuery search interface which, once an institution has been selected, takes the user to their institutional login page. (My thanks to Mike Edwards at Eduserv for the original code for this.) The NHS Athens option takes the user to an Athens login page. The EduStuff option goes to a fairly typical local login/register page, but one which also carries a warning about using one of the other two account types if that is more appropriate.
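The demo does its institution search in the browser. Purely to illustrate the idea, here is a small Python sketch of the matching step instead: filter a list of institutions by what the user has typed and put prefix matches first. The institution names, login URLs and the record shape are all made up, and real federation metadata would need parsing into this form first.

```python
def search_institutions(institutions: list, query: str, limit: int = 10) -> list:
    """Return institutions whose name contains the query, prefix matches first."""
    q = query.strip().casefold()
    if not q:
        return []  # nothing typed yet, so nothing to suggest
    matches = [inst for inst in institutions if q in inst["name"].casefold()]
    # False sorts before True, so names starting with the query come first.
    matches.sort(
        key=lambda inst: (not inst["name"].casefold().startswith(q), inst["name"])
    )
    return matches[:limit]


# Hypothetical entries standing in for parsed federation metadata.
INSTITUTIONS = [
    {"name": "Example University", "login_url": "https://idp.example.org/login"},
    {"name": "Sample College", "login_url": "https://idp.sample.example/login"},
    {"name": "University of Somewhere", "login_url": "https://idp.somewhere.example/login"},
]

if __name__ == "__main__":
    for inst in search_institutions(INSTITUTIONS, "uni"):
        print(inst["name"], "->", inst["login_url"])
```

Once the user picks a result, the sign-in box would send them to that entry's login URL.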

Whichever account type is chosen, the selection is remembered in a cookie so that future visits to the pop-up 'sign in' box can offer that as the default (again, as per Kantara).
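Remembering the choice needs very little machinery. The sketch below uses Python's standard http.cookies module to write the chosen account type into a Set-Cookie value and to read it back on a later visit. The cookie name, the three account-type labels and the one-year lifetime are assumptions for illustration, not details of the demo site.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "signin_choice"  # hypothetical cookie name
ACCOUNT_TYPES = ("institution", "nhs_athens", "edustuff")  # illustrative labels


def remember_choice(account_type: str) -> str:
    """Return a Set-Cookie header value that records the chosen account type."""
    if account_type not in ACCOUNT_TYPES:
        raise ValueError(f"unknown account type: {account_type!r}")
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = account_type
    cookie[COOKIE_NAME]["max-age"] = 365 * 24 * 60 * 60  # assumed: keep for a year
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie.output(header="").strip()


def default_choice(cookie_header: str):
    """Read the remembered account type from a Cookie header, or return None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in ACCOUNT_TYPES:
        return morsel.value
    return None  # no usable cookie, so show all options with no default


if __name__ == "__main__":
    print(remember_choice("nhs_athens"))
    print(default_choice("signin_choice=nhs_athens"))
```

Checking the stored value against the known account types means a stale or tampered cookie simply falls back to showing no default.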

Have a play and see what you think.

Ok, some thoughts from my perspective...

  • In the more general Kantara scenario, some options (Facebook, Google, OpenID, etc.) are presented using clickable buttons/icons. I haven't done this for my scenario because the text wording felt more helpful to me. If icons were to be used, for example if a publisher wanted to offer a Google-based login, then I would probably present the NHS Athens and EduStuff choices as icons as well.
  • You'll note that the word 'Athens' only appears next to the NHS option. I think that our Athens/OpenAthens branding should become largely invisible to users in the context of the UK Federation - or, to put it another way, one of our current usability problems is that publishers are still presenting Athens as an explicit 'sign in' option when they really do not need to do so. In the context of the UK Federation, OpenAthens is just an implementation choice for SAML - users need be no more aware of it than they are of the fact that Apache is being used as the Web server. (The same can be said of Shibboleth of course). Part of our current problem is that we are highlighting the wrong brands - i.e. Shibboleth and OpenAthens/Athens rather than the institution - something that both the JISC and Eduserv have been guilty of encouraging in the past.
  • The institutional search box part of the demo is currently built on UK Federation metadata, so it only offers access to UK institutions. There is no reason why this interface couldn't deal with metadata from multiple federations. Indeed, I see no reason why it wouldn't scale to every institution in the world (with some sensible naming). So although the current demo is UK-specific, I think the approach adopted here can be expanded quite significantly.
  • On that basis, you'll note that there is no need in this interface for an EduID brand/button. Users need only concern themselves with the name of their institution - other brands become largely superficial, except where things like Google, Facebook, OpenID and so on are concerned.
  • I've presented only the front page for the EduStuff site. On the basis that we can't control how users discover stuff, i.e. we have to assume that users might arrive directly at any page of our site as the result of a Google search, the 'sign in' process has to be available on each and every page of the site.
  • Finally, the demo only deals with the usability of the first part of the process. It doesn't consider the usability of the institutional login screen, nor of what happens when the user arrives back at the publisher site after they have successfully (or otherwise) authenticated with their institution. I think there are probably significant usability issues at this point as well - for example, how to best indicate that the user is signed in - but I haven't addressed this as part of the current demo.

I'd be very interested in people's views on this work. It's at a very early stage - I haven't even presented it properly to other Eduserv staff yet - but we have some agreement (internally) that work in this area will likely be of value both to ourselves and our current customers and to the wider community. On that basis, I'm hopeful that we will do more work with this demo:

  • to make it more fully functional, i.e. to complete the round-trip back to the EduStuff site after successful authentication,
  • to make the 'sign in' pop-up into a re-usable 'widget' of some kind,
  • and to experiment with the usability of much larger lists of institutions, taken from multiple federations.

Whatever our conclusions, any results will be shared publicly.

Overall the day was very interesting. I'll leave you with my personal highlight... the point at which one of the (non-publisher) participants said (somewhat naively), "What would it take to make all this [publisher] content available for free? Then we wouldn't need to worry about authentication". Oh boy... there was a collective sharp intake of breath and you could almost hear the tumble-weed blowing for a minute there! :-)

Addendum (8 July 2010): in light of comments below I have re-worked my demo using a more icon-based approach. This is much more in line with the current Kantara ULX mockups (version 4) including the addition of a 'more options'/'less options' toggle on second and subsequent sign ins. Overall, it is, I think, rather better than my initial text-based approach. I stand by my assertion that an EduID button is not required in the 'sign in' process demonstrated here (irrespective of whether the icon-based or text-based approach is used). That said, I'd welcome views on how/where such a button would fit in.

July 02, 2010

Now don't tell me I've nothin' to do


Clay Shirky gave a polished performance at the Watershed in Bristol the other night for his talk, Our Cognitive Surplus: Creativity and Generosity in a Connected Age, given as part of the Bristol Festival of Ideas. One would expect nothing less of course.

The basic premise of the talk was that a combination of free time, talent, goodwill (our 'cognitive surplus') and the social Web are now allowing things to happen in ways that were previously not possible. The talk was peppered with anecdotal evidence for the kinds of changes being wrought by new technology and social media, from struggles for women's rights in India thru to changes of government policy on the environment (specifically car-sharing) in Canada and, yes, even to our use of Lolcats.

The individual examples were all new to me, though I've seen the general theme being covered several times before, using different examples of much the same thing. For me, there was a certain sense of, "Well, yes... but so what?" - perhaps I missed something? - though, oddly, that didn't detract from a very enjoyable evening.

Listening to the talk though did cause me to question my own use of social networks, something that I actually find quite hard to justify in any rational sense.

Here's an example...

For the last 574 days I have taken a photograph every day and put it on Blipfoto.com along with a few words of text. Blipfoto is a photo-blogging site - a social network, at least at the level of the number of "Wow... nice image" type comments that get exchanged, though it probably comes closer to the Lolcats end of the spectrum than the 'changing the planet' end. I probably spend somewhere between 30 minutes and an hour and a half on each photo - by the time I've taken the photo, edited it, uploaded it, written some text and so on. That probably represents something like 400 hours of my life over the last couple of years. Boggle!

To which one might sensibly ask, "Why?". And I don't think I'd be able to give you a coherent answer to such a question.

It's the closest thing I have to an artistic outlet I guess - which is certainly not a bad thing. My photography is getting better... maybe? There's a slight competitive element to it, both in the sense of forcing oneself to do something every day and in the sense of getting good comments and ratings. And there's the "Woo hoo... this is me... I'm over here" type of thing going on as well I suppose (something that is present in all social networks). But beyond that I'm not sure I can offer any rationalisation that will convince either you or me about why I am doing it. I'm certainly not making the world a better place with my time, whereas I could be. I could use that time to be a governor of a school again. Or use it to edit Wikipedia. Or to spend additional time working on my local school's website. Or to campaign on environmental issues. Or any number of other things. I could even do some private consultancy and make some money!

But I don't do any of those things... instead, I spend my time faffing around with a camera and a website in the vain hope of getting one or two positive comments from people that I've never met and who I will probably never meet.

Or as the Statler Brothers put it:

Countin' flowers on the wall
That don't bother me at all
Playin' solitaire till dawn with a deck of fifty-one
Smokin' cigarettes and watchin' Captain Kangaroo
Now don't tell me I've nothin' to do
[Photo created using Autostitch on an iPhone 3G]


