
August 24, 2010

Resource discovery revisited...

...revisited for me that is!

Last week I attended an invite-only meeting at the JISC offices in London, notionally entitled a "JISC IE Technical Review" but in reality a kind of technical advisory group for the JISC and RLUK Resource Discovery Taskforce Vision [PDF], about which the background blurb says:

The JISC and RLUK Resource Discovery Taskforce was formed to focus on defining the requirements for the provision of a shared UK resource discovery infrastructure to support research and learning, to which libraries, archives, museums and other resource providers can contribute open metadata for access and reuse.

The morning session felt slightly weird (to me), a strange time-warp back to the kinds of discussions we had a lot of as the UK moved from the eLib Programme, through the DNER (briefly), into what became known (in the UK) as the JISC Information Environment - discussions about collections and aggregations and metadata harvesting and ... well, you get the idea.

In the afternoon we were split into breakout groups and I ended up in the one tasked with answering the question "how do we make better websites in the areas covered by the Resource Discovery Taskforce?", a slightly strange question now I look at it but one that was intended to stimulate some pragmatic discussion about what content providers might actually do.

Paul Walk has written up a general summary of the meeting - the remainder of this post focuses on the discussion in the 'Making better websites' afternoon breakout group and my more general thoughts.

Our group started from the principles of Linked Data - assign 'http' URIs to everything of interest, serve useful content (both human-readable and machine-processable, structured according to the RDF model) at those URIs, and create lots of links between stuff (internal to particular collections, across collections and to other stuff). OK... we got slightly more detailed than that, but it was a fairly straightforward view that Linked Data would help and was the right direction to go in. (Actually, there was a strongly expressed view that simply creating 'http' URIs for everything and exposing human-readable content at those URIs would be a huge step forward.)
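To make those principles concrete, here is a minimal sketch (in Python, using an entirely hypothetical URI and record, not drawn from any real collection) of the basic Linked Data pattern: one 'http' URI per thing, with a human-readable and a machine-processable representation of that thing selected by content negotiation:

```python
# A minimal sketch of the Linked Data pattern described above.
# The URI, record and function names are illustrative only.

RECORD = {
    "uri": "http://example.org/id/book/123",   # hypothetical 'http' URI
    "title": "An Example Catalogue Record",
    "creator": "A. N. Author",
}

def html_view(record):
    """Human-readable representation of the thing."""
    return (f"<html><body><h1>{record['title']}</h1>"
            f"<p>By {record['creator']}</p></body></html>")

def turtle_view(record):
    """Machine-processable representation: RDF triples in Turtle syntax."""
    return (f"<{record['uri']}> "
            f"<http://purl.org/dc/terms/title> \"{record['title']}\" ;\n"
            f"    <http://purl.org/dc/terms/creator> \"{record['creator']}\" .")

def negotiate(accept_header, record):
    """Pick the representation matching the client's Accept header."""
    if "text/turtle" in accept_header:
        return turtle_view(record)
    return html_view(record)
```

In a real deployment this decision would sit at the HTTP server layer (Accept headers, 303 redirects and so on), but the shape of the idea - one URI, multiple representations - is the same.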

Then we had a discussion about what the barriers to adoption might be - the problems of getting buy-in from vendors and senior management, the need to cope with a non-obvious business model (particularly in the current economic climate), the lack of technical expertise (not to mention semantic expertise) in parts of those sectors, the endless discussions that might take place about how to model the data in RDF, the general perception that Semantic Web is permanently just over the horizon and so on.

And, in response, we considered the kinds of steps that JISC (and its partners) might have to undertake to build any kind of political momentum around this idea.

To cut a long story short, we more-or-less talked ourselves out of a purist Linked Data approach as a way forward, instead preferring a 4-layer model of adoption, with increasing levels of semantic richness and machine-processability at each stage:

  1. expose data openly in any format available (.csv files, HTML pages, MARC records, etc.)
  2. assign 'http' URIs to things of interest in the data, expose it in any format available (.csv files, HTML pages, etc.) and serve useful content at each URI
  3. assign 'http' URIs to things of interest in the data, expose it as XML and serve useful content at each URI
  4. assign 'http' URIs to things of interest in the data and expose Linked Data (as per the discussion above).

These would not be presented as steps to work through (do 1, then 2, then 3, ...) but as alternatives with increasing levels of semantic value. Good practice guidance would encourage the adoption of option 4, laying out the benefits of such an approach, but the alternatives would provide lower barriers to adoption and offer a simpler 'sell' politically.
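To make the four options a little more tangible, the sketch below (Python, with a made-up record and a hypothetical URI space) shows the same catalogue record exposed at levels 1 and 3; level 2 is the same URI assignment as level 3 with any available format, and level 4 adds the RDF serialisation discussed above:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative data only - not a real record or URI space.
record = {"id": "123", "title": "An Example Record", "creator": "A. N. Author"}
BASE = "http://example.org/id/item/"  # hypothetical 'http' URI space

def level1_csv(rec):
    """Level 1: expose the data in whatever format is to hand - here CSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec))
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()

def level3_xml(rec):
    """Level 3: an 'http' URI for the thing of interest plus an XML
    serialisation. (Level 2 is the same URI with any available format.)"""
    item = ET.Element("item", uri=BASE + rec["id"])
    for field in ("title", "creator"):
        ET.SubElement(item, field).text = rec[field]
    return ET.tostring(item, encoding="unicode")

# Level 4 would serve an RDF representation at the same URI, as in the
# Linked Data discussion earlier in this post.
```

The point of the model is that the same underlying record can move up the levels incrementally, without the provider having to commit to RDF modelling on day one.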

The heterogeneity of data being exposed would leave a significant implementation challenge for the aggregation services attempting to make use of it and the JISC (and partners) would have to fund some pretty convincing demonstrators of what might usefully be achieved.

One might characterise these approaches as 'data.glam.uk' (echoing 'data.gov.uk' but where 'glam' is short for 'galleries, libraries, archives and museums') and/or Digital UK (echoing the pragmatic approaches being successfully adopted by the Digital NZ activity in New Zealand).

Despite my reservations about the morning session, the day ended up being quite a useful discussion. That said, I remain somewhat uncomfortable with its outcomes. I'm a purist at heart and the 4 levels above are anything but pure. To make matters worse, I'm not even sure that they are pragmatic. The danger is that people will adopt only the lowest, least semantic, option and think they've done what they need to do - something that I think we are seeing some evidence of happening within data.gov.uk.

Perhaps even more worryingly, having now stepped back from the immediate talking-points of the meeting itself, I'm not actually sure we are addressing a real user need here any more - the world is so different now from what it was when we first started having conversations about exposing cultural heritage collections on the Web, particularly library collections - conversations that essentially pre-dated Google, Google Scholar, Amazon, WorldCat, CrossRef, ... the list goes on. Do people still get agitated by, for example, the 'book discovery' problem in the way they did way back then? I'm not sure... but I don't think I do. At the very least, the book 'discovery' problem has largely become an 'appropriate copy' problem - at least for most people? Well, actually, let's face it... for most people the book 'discovery' and 'appropriate copy' problems have been solved by Amazon!

I also find the co-location of libraries, museums and archives, in the context of this particular discussion, rather uncomfortable. If anything, this grouping serves only to prolong the discussion and put off any decision-making.

Overall then, I left the meeting feeling somewhat bemused about where this current activity has come from and where it is likely to go.


August 23, 2010

Upcoming identity events

A couple of upcoming UK identity events worth highlighting...

Firstly, FAM10, which is being held in Cardiff on the 5th and 6th October:

Some of the areas that delegates can look forward to hearing about this year include:

  • Detailed overview of the progress made in the schools sector to adopt federated access management;
  • Interfederation use cases, including links to the Government Gateway to allow parental access to schools data;
  • Detailed technical information for advanced programmers;
  • Update on JISC Services for accounting and statistical management;
  • Updates on identity management, user management and licence management;
  • International speakers on areas such as Kantara, Shibboleth and Gartner;
  • Service updates from the UK federation.

Secondly, the Internet Identity Workshop - Europe, which is being held a few days later in London on the 11th October:

IIW’s focus is on "user-centric identity" or "user-driven identity" – addressing the technical and adoption challenge of how people can manage their own identity across the range of websites, services, companies and organizations with which they interact. The focus of this first IIW-Europe will be on the whole range of global and European initiatives in this space.

August 16, 2010

e-Book survey - summary of responses now available

A summary of the responses to our recent survey of current and future institutional attitudes to e-books is now available. For those of you who don't want to read the whole thing (25 pages or so), the executive summary is only two sides of A4. And for those of you who don't want to read that, here are my even briefer thoughts:

  • The drivers for adopting e-books do not currently seem to be coming from faculty, with 2/3 of respondents suggesting that none or less than 10% of course modules at their institution currently recommend or mandate the use of e-books.
  • That said, uptake across subject areas is variable, with business and management, social sciences and health and medicine apparently making more use of e-books than other disciplines.
  • Distance learners are generally seen as an important user group for e-books. In addition, 'demand from students', 'shelf space', 'cost savings', 'convenience of access', 'accessibility' and 'coping with peaks in demand' were all given as drivers.
  • A significant growth in the use of e-books is predicted over the next two years, with 77% of respondents thinking that use of e-books will double or more than double.
  • However, budgets are not predicted to rise in line with this (not surprisingly). Coupled with the lack of a separate 'e-book budget', a growth in spending on e-books seems likely to impact on the budget for other resources (particularly print books).
  • In terms of suppliers, Coutts and Dawsonera are currently the most widely used.

I don't really know what to make of this. My suspicion is that we are at a point in the hype curve around e-books that has tended to push librarians (most of the respondents to this survey were librarians) into thinking, "we ought to be doing something here and we probably should expect a sharp rise in uptake" even though general demand from the user community (i.e. students and teaching staff) remains quite low to date. Perhaps that is unfair?

Mind you... I like books - you know, the old-fashioned paper ones.

August 13, 2010

Cloud infrastructures for academia - the FleSSR project

Yesterday, I attended the kick-off meeting for a new JISC-funded project called FleSSR - Flexible Services for the Support of Research. From the, as yet very new, project blog:

Our project will create a hybrid public-private Infrastructure as a Service cloud solution for academic research. The two pilot use cases chosen follow the two university partners' interests: software development and multi-platform support, and on-demand research data storage space.

We will be implementing open standards for cloud management through the OGF Open Cloud Computing Interface.

The project is a collaboration led by the Oxford e-Research Centre and involving STFC, Eduserv, the University of Reading, EoverI, Eucalyptus Inc. and Canonical Ltd.

Our role at Eduserv will primarily be to build a public cloud into which private clouds at Oxford and Reading can burst both compute resource and storage at times of high demand, as generated by pilot demonstrators at those two institutions. My colleagues Matt Johnson and Tim Lawrence will lead our work on this here. The clouds will be built on some variant of Eucalyptus and Ubuntu - one of the early pieces of work for the project team being to compare Open Eucalyptus, Enterprise Eucalyptus and Ubuntu Enterprise Cloud.

My own involvement with the project will start properly after Christmas and will contribute to the project's thinking about sustainable business models for cloud providers like Eduserv in this space. One of the interesting aspects of the project will be some technical work on policy enforcement and accounting that will allow business models other than 'top-sliced central-funding' to come into play in academia for this kind of provision.

I'm really looking forward to this work. The project itself, funded as part of the JISC's Flexible Service Delivery Programme, is only 10 months in duration but is attempting to cover a lot of ground very quickly. I'm very hopeful that the outputs will be of widespread interest to the community, as well as helping to shape our own potential offerings in this area.

Federation Metadata Explorer - update

Quick note: I've updated the Federation Metadata Explorer to support metadata from six European federations. 



eFoundations is powered by TypePad