March 21, 2011

UMF pilot cloud infrastructure - size matters?

It's been a while since the HEFCE announcement about UMF and there was quite a bit of discussion about UMF, virtualisation and the cloud at the recent JISC conference in Liverpool (at least from what I could see on the live video stream). It therefore seems appropriate to mention our role in this activity.

By way of background, the University Modernisation Fund (UMF) is a HEFCE initiative that aims to help universities and colleges in the UK deliver better efficiency and value for money through the development of shared services. Managed by the JISC, the programme has two core elements:

  • investment of up to £10 million in cloud computing, shared IT infrastructure, support to deliver virtual servers, storage and data management applications;
  • investment of up to £2.5 million to establish cloud computing and shared services in central administration functions to support learning, teaching, and research.

As part of our involvement in this activity, Eduserv is building a generalised virtualisation and cloud platform to serve up compute and storage resources as Infrastructure as a Service (IaaS). Both compute and storage resources will be offered at different tiers to enable delivery of a wide range of applications. At this stage, we expect the platform to offer the following services (though exact details are still under discussion with both JANET and the JISC):

  • VMware-based virtual machines;
  • physical blade servers;
  • block-level SAN disk storage;
  • file/object-level archive disk storage.

Whilst the platform is designated a pilot service, it will be delivered to production-quality standards, both to build consumer confidence in service availability and so that we can understand and mitigate any transitional or operational concerns as quickly as possible during the pilot. The platform will be designed to offer virtualisation and cloud infrastructure to any projects funded through the UMF programme (at no cost) and to the wider UK HE community (using pricing and billing models that are still to be determined). Eduserv will be developing pricing and billing models that are both sympathetic to the needs of the academic community and that support a sustainable service in the future - again, in discussion with JANET.

We are in the process of designing this infrastructure in such a way as to provide the following tactical benefits to HE institutions and supporting organisations:

  • a fully-configurable virtualised environment that allows for the configuration of customer-specific infrastructure in segregated environments, offering high levels of security and performance;
  • a resilient network, capable of delivering wire-speed 10 Gigabit Ethernet connectivity from the physical or virtual server through to the JANET backbone, which enables institutions to make use of UMF services as though they were located on-premises;
  • a highly-scalable compute infrastructure that can simultaneously accommodate a number of different initiatives, from UMF-funded pilot SaaS services through to institution-specific virtualisation and cloud provisioning;
  • a multi-tier storage architecture providing a range of data services, from raw data processing through to research data management and longer-term storage.

From our perspective, the intention is to investigate and support the following strategic benefits across the HE community:

  • the potential to significantly reduce the amount of time and effort spent by HEIs in developing plans and associated business cases for institutional data centre infrastructure, leading to a reduction in capital expenditure on HEI-specific data centre construction, refit and on-going operations;
  • the provision of a focal point for new shared service development, offering on-demand development and test environments as well as high-quality production infrastructure capable of delivering enterprise-level SLAs;
  • a long-term, sustainable service blueprint that delivers IaaS services at pricing competitive with existing commercial providers, but with the benefit of direct JANET connectivity and HE-suited pricing models;
  • a service platform that offers IPv6 capability, assisting institutions in the transition from the currently depleted IPv4 address space.

During the morning cloud session at the JISC Conference, there were a couple of comments relating to "industrial scale" clouds, the implication being both that the education community can't build such clouds itself and that massive size matters in order to realise sufficient economies of scale to be worthwhile.

As I tweeted at the time, I don't believe that to be the case. Or, rather, I don't know if that is the case - one of the things we need to make sure comes out of the next 12 months or so of activity within UMF is a much better understanding of what the education community is capable of building itself, whether cloud infrastructure services within that community are likely to be sustainable, and what cost savings are likely to be made.

It seems to me that there is a scale that sits somewhere between "a single institution" and "industrial scale" (which I take to mean Amazon, Microsoft, etc.), a scale that the education community is well able to deliver, that is sufficiently far along the scale/cost curve for significant savings to be made.

The further one can move along the scale axis, the better - clearly. As in most things, size matters! But I suspect there are also diminishing returns here. It remains to be seen how far educational providers can move along the scale, either individually or in collaboration, and whether the resulting infrastructure can be delivered in an attractive and sustainable way.
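To make the diminishing-returns intuition a little more concrete, here is a toy model, assuming (purely for illustration) that a shared platform has a fixed cost split across n participating institutions plus a constant per-institution running cost. The figures are made up and do not come from UMF or Eduserv; only the shape of the curve matters. Under this model, each doubling of scale saves half as much per institution as the previous doubling did.

```python
# A hypothetical cost model, for illustration only: a fixed platform cost
# shared across n institutions, plus a constant per-institution running cost.
# Neither default figure comes from UMF or Eduserv; they are placeholders.

def unit_cost(n: int, fixed: float = 1_000_000.0, per_unit: float = 20_000.0) -> float:
    """Average annual cost per institution when n institutions share a platform."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return fixed / n + per_unit


def saving_from_doubling(n: int) -> float:
    """How much the average cost falls when the number of sharers doubles from n to 2n."""
    return unit_cost(n) - unit_cost(2 * n)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"n={n:5d}  cost/institution={unit_cost(n):12,.0f}  "
              f"saving from doubling={saving_from_doubling(n):10,.0f}")
```

In this sketch the average cost approaches the per-institution running cost but never falls below it, which is one way of reading the suggestion that a scale well short of "industrial" might already capture much of the available saving.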

If you are interested in this kind of stuff, our annual free Eduserv Symposium (May 12th in London) will be focusing on virtualisation and the cloud in general, and UMF in particular - more on this shortly.

October 25, 2010

A few brief thoughts on iTunesU

The use of iTunesU by UK universities has come up in discussions a couple of times recently, on Brian Kelly's UK Web Focus blog (What Are UK Universities Doing With iTunesU? and iTunes U: an Institutional Perspective) and on the closed ALT-C discussion list. In both cases, as has been the case in previous discussions, my response has been somewhat cautious, an attitude that always seems to be interpreted as outright hostility for some reason.

So, just for the record, I'm not particularly negative about iTunesU and in some respects I am quite positive - if nothing else, I recognise that the adoption of iTunesU is a very powerful motivator for the generation of openly available content and that has got to be a good thing - but a modicum of scepticism is always healthy in my view (particularly where commercial companies are involved) and I do have a couple of specific concerns about the practicalities of how it is used:

  • Firstly, that students who do not own Apple hardware and/or who choose not to use iTunes on the desktop are not disenfranchised in any way (e.g. by having to use a less functional Web interface). In general, the response to this is that they are not and, in the absence of any specific personal experience either way, I have to concede that to be the case.
  • Secondly (and related to the first point), that in an environment where most of the emphasis seems to be on the channel (iTunesU) rather than on the content (the podcasts), confusion isn't introduced as to how material is cited and referred to - i.e. do some lecturers only ever refer to 'finding stuff on iTunesU', while others offer a non-iTunesU Web URL, and others still remember to cite both? I'm interested in whether universities who have adopted iTunesU but who also make the material available in other ways have managed to adopt a single way of citing the material that is on offer.

Both these concerns relate primarily to the use of iTunesU as a distribution channel for teaching and learning content within the institution. They apply much less to its use as an external 'marketing' channel. iTunesU seems to me (based on a gut feel more than on any actual numbers) to be a pretty effective way of delivering OER outside the institution and to have a solid 'marketing' win on the back of that. That said, it would be good to have some real numbers as confirmation (note that I don't just mean numbers of downloads here - I mean conversions into 'actions' (new students, new research opps, etc.)). Note that I also don't consider 'marketing' to be a dirty word (in this context) - actually, I guess this kind of marketing is going to become increasingly important to everyone in the HE sector.

There is a wider, largely religious, argument about whether "if you are not paying for it, you aren't the customer, you are part of the product" but HE has been part of the MS product for a long while now and, worse, we have paid for the privilege – so there is nothing particularly new there. It's not an argument that particularly bothers me one way or the other, provided that universities have their eyes open and understand the risks as well as the benefits. In general, I'm sure that they do.

On the other hand, while somebody always owns the channel, some channels seem to me to be more 'open' (I don't really want to use the word 'open' here because it is so emotive but I can't think of a better one) than others. So, for example, I think there are differences in an institution adopting YouTube as a channel as compared with adopting iTunesU as a channel and those differences are largely to do with the fit that YouTube has with the way the majority of the Web works.

October 13, 2010

What current trends tell us about the future of federated access management in education

As mentioned previously, I spoke at the FAM10 conference in Cardiff last week, standing in for another speaker who couldn't make it and using material crowdsourced from my previous post, Key trends in education - a crowdsource request, to inform some of what I was talking about. The slides and video from my talk follow:

As it turns out, describing the key trends is much easier than thinking about their impact on federated access management - I suppose I should have spotted this in advance - so the tail end of the talk gets rather weak and wishy-washy. And you may disagree with my interpretation of the key trends anyway. But in case it is useful, here's a summary of what I talked about. Thanks to those of you who contributed comments on my previous post.

By way of preface, it seems to me that the core working assumptions of the UK Federation have been with us for a long time - like, at least 10 years or so - essentially going back to the days of the centrally-funded Athens service. Yet over those 10 years the Internet has changed in almost every respect. Ignoring the question of whether those working assumptions still make sense today, I think it certainly makes sense to ask ourselves about what is coming down the line and whether our assumptions are likely to still make sense over the next 5 years or so. Furthermore, I would argue that federated access management as we see it today in education, i.e. as manifested thru our use of SAML, shows a rather uncomfortable fit with the wider (social) web that we see growing up around us.

And so... to the trends...

The most obvious trend is the current financial climate, which won't be with us for ever of course, but which is likely to cause various changes while it lasts and where the consequences of those changes, university funding for example, may well be with us much longer than the current crisis. In terms of access management, one impact of the current belt-tightening is that making a proper 'business case' for various kinds of activities, both within institutions and nationally, will likely become much more important. In my talk, I noted that submissions to the UCISA Award for Excellence (which we sponsor) often carry no information about staff costs, despite an explicit request in the instructions to entrants to indicate both costs and benefits. My point is not that institutions are necessarily making the wrong decisions currently but that the basis for those decisions, in terms of cost/benefit analysis, will probably have to become somewhat more rigorous than has been the case to date. Ditto for the provision of national solutions like the UK Federation.

More generally, one might argue that growing financial pressure will encourage HE institutions into behaving more and more like 'enterprises'. My personal view is that this will be pretty strongly resisted, by academics at least, but it may have some impact on how institutions think about themselves.

Secondly, there is the related trend towards outsourcing and shared services, with the outsourcing of email and other apps to Google being the most obvious example. Currently that is happening most commonly with student email but I see no reason why it won't spread to staff email as well in due course. At the point that an institution has outsourced all its email to Google, can one assume that it has also outsourced at least part of its 'identity' infrastructure as well? So, for example, at the moment we typically see SAML call-backs being used to integrate Google mail back into institutional 'identity' and 'access management' systems (you sign into Google using your institutional account) but one could imagine this flipping around such that access to internal systems is controlled via Google - a 'log in with Google' button on the VLE for example. Eric Sachs, of Google, has recently written about OpenID in the Enterprise SaaS market, endorsing this view of Google as an outsourced identity provider.

Thirdly, there is the whole issue of student expectations. I didn't want to talk about this in detail but it seems obvious that an increasingly 'open' mashed and mashable experience is now the norm for all of us - and that will apply as much to the educational content we use and make available as it does to everything else. Further, the mashable experience is at least as much about being able to carry our identities relatively seamlessly across services as it is about the content. Again, it seems unclear to me that SAML fits well into this kind of world.

There are two other areas where our expectations and reality show something of a mismatch. Firstly, our tightly controlled, somewhat rigid approach to access management and security is at odds with the rather fuzzy (or at least fuzzily interpreted) licences negotiated by Eduserv and JISC Collections for the external content to which we have access. And secondly, our over-arching sense of the need for user privacy (the need to prevent publishers from cross-referencing accesses to different resources by the same user, for example) is holding back the development of personalised services and runs somewhat counter to the kinds of things we see happening in mainstream services.

Fourthly, there's the whole growth of mobile - the use of smart-phones, mobile handsets, iPhones, iPads and the rest of it - and the extent to which our access management infrastructure works (or not) in that kind of 'app'-based environment.

Then there is the 'open' agenda, which carries various aspects to it - open source, open access, open science, and open educational resources. It seems to me that the open access movement cuts right to the heart of the primary use-case for federated access management, i.e. controlling access to published scholarly literature. But, less directly, the open science movement, in part, pushes researchers towards the use of more open 'social' web services for their scholarly communication where SAML is not typically the primary mechanism used to control access.

Similarly, the emerging personal learning environment (PLE) meme (a favourite of educational conferences currently), where lecturers and students work around their institutional VLE by choosing to use a mix of external social web services (Flickr, Blogger, Twitter, etc.), again encourages the use of external services that are not impacted by our choices around the identity and access management infrastructure and over which we have little or no control. I was somewhat sceptical about the reality of the PLE idea until recently. My son started at the City of Bath College - his letter of introduction suggested that he create himself a Google Docs account so that he could do his work there and submit it using email or Facebook. I doubt this is college policy but it was a genuine example of the PLE in practice, so perhaps my scepticism is misplaced.

We also have the changing nature of the relationship between students and institutions - an increasingly mobile and transitory student body, growing disaggregation between the delivery of learning and accreditation, a push towards overseas students (largely for financial reasons), and increasing collaboration between institutions (both for teaching and research) - all of which have an impact on how students see their relationship with the institution (or institutions) with whom they have to deal. Will the notion of a mandated 3 or 4 year institutional email account still make sense for all (or even most) students in 5 or 10 years time?

In a similar way, there's the changing customer base for publishers of academic content to deal with. At the Eduserv Symposium last year, for example, David Smith of CABI described how they now find that having exposed much of their content for discovery via Google they have to deal with accesses from individuals who are not affiliated with any institution but who are willing to pay for access to specific papers. Their access management infrastructure has to cope with a growing range of access methods that sit outside the 'educational' space. What impact does this have on their incentives for conforming to education-only norms?

And finally there's the issue of usability, and particularly the 'where are you from' discovery problem. Our traditional approach to this kind of problem is to build a portal and try to control how the user gets to stuff, such that we can generate 'special' URLs that get them to their chosen content in such a way that they can be directed back to us seamlessly in order to log in. I hate portals, at least insofar as they have become an architectural solution, so the less said the better. As I said in my talk, WAYFless URLs are an abomination in architectural terms, saved only by the fact that they work currently. In my presentation I played up the alternative usability work that the Kantara ULX group have been doing in this area, which it seems to me is significantly better than what has gone before. But I learned at the conference that Shibboleth and the UK WAYF service have both also been doing work in this area - so that is good. My worry, though, is that this will remain an unsolvable problem, given the architecture we are presented with. (I hope I'm wrong but that is my worry.) As a counterpoint, in the more... err... mainstream world we are seeing a move towards what I call the 'First Bus' solution (on the basis that in many UK cities you only see buses run by the First Group (despite the fact that bus companies are supposed to operate in a free market)) where you only see buttons to log in using Google, Facebook and one or two others.

I'm not suggesting that this is the right solution - just noting that it is one strategy for dealing with an otherwise difficult usability problem.
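For readers unfamiliar with the 'special' WAYFless URLs described above, here is a minimal sketch of how such a link is typically built. It assumes the publisher runs a Shibboleth SP with its Login session initiator at /Shibboleth.sso/Login, accepting an entityID parameter (naming the user's identity provider) and a target parameter (where to send the user afterwards). Other SP software, and OpenAthens, use different conventions; all hostnames and entity IDs here are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a Shibboleth SP session initiator at /Shibboleth.sso/Login that
# accepts "entityID" (which IdP to use) and "target" (where to go afterwards).
# All hostnames and entity IDs used with this function are hypothetical.

def wayfless_url(sp_base: str, idp_entity_id: str, target: str) -> str:
    """Build a link that skips the 'where are you from' step by naming the IdP up front."""
    query = urlencode({"entityID": idp_entity_id, "target": target})
    return f"{sp_base.rstrip('/')}/Shibboleth.sso/Login?{query}"


if __name__ == "__main__":
    print(wayfless_url(
        "https://publisher.example.com",
        "https://idp.example.ac.uk/idp/shibboleth",
        "https://publisher.example.com/journals/article/123",
    ))
```

The architectural objection is visible even in this small sketch: the link bakes one institution's identity provider into a publisher's URL, so every institution needs its own variant of every link it wants to publish.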

Note that we are also seeing some consolidation around technology as well - notably OpenID and OAuth - though often in ways that hide it from public view (e.g. hidden behind a 'login with google' or 'login with facebook' button).

Which essentially brings me to my concluding screen - you know, the one where I talk about all the implications of the trends above - which is where I have less to say than I should! Here's the text more-or-less copy-and-pasted from my final slide:

  • ‘education’ is a relatively small fish in a big pond (and therefore can't expect to drive the agenda)
  • mainstream approaches will win (in the end) - ignoring the difficult question of defining what is mainstream
  • for the Eduserv OpenAthens product, Google is as big a threat as Shibboleth (and the same is true for Shibboleth)
  • the current financial climate will have an effect somewhere
  • HE institutions are probably becoming more enterprise-like but they are still not totally like commercial organisations and they tend to occupy an uncomfortable space between the ‘enterprise’ and the ‘social web’ driven by different business needs (c.f. the finance system vs PLEs and open science)
  • the relationships between students (and staff) and institutions are changing

In his opening talk at FAM10 the day before, David Harrison had urged the audience to become leaders in the area of federated access management. In a sense I want the same. But I also want us, as a community, to become followers - to accept that things happen outside our control and to stop fighting against them the whole time.

Unfortunately, that's a harder rallying call to make!

Your comments on any/all of the above are very much welcomed.

September 06, 2010

On funding and sustainable services

I write this post with some trepidation, since I know that it will raise issues that are close to the hearts of many in the community. Discussion on the jisc-repositories list following Steve Hitchcock's post a few days ago (which I posted in full here recently) has turned to the lessons that the withdrawal of JISC funding for the Intute service might teach us about transitioning JISC- (or other centrally-) funded activities into self-sustaining services.

I'm reminded of a recent episode of Dragons' Den on BBC TV where it emerged that the business idea being proposed for investment had survived thus far on European project funding. The dragons took a dim view of this, on the basis, I think, that such funding would only rarely result in a viable business because of a lack of exposure to 'real' market forces, and the proposer was dispatched forthwith (the dragons clearly never having heard of Google! :-) ).

On the mailing list, views have been expressed that projects find it hard to turn into services because they attract the wrong kind of staff, or that the IPR situation is wrong, or that they don't get good external business advice. All valid points I'm sure. But I wonder if one could make the argument that it is the whole model of centralised project funding for activities that are intended to transition into viable, long-term, self-sustaining businesses that is part of the problem. (Note: I don't think this applies to projects that are funded purely in the pursuit of knowledge). By that I mean that such funding tends to skew the market in rather unhelpful ways, not just for the projects in question but for everyone else - ultimately in ways that make it hard for viable business models to emerge at all.

There are a number of reasons for this - reasons that really did not become apparent to me until I started working for an organisation that can only survive by spending all its time worrying about whether its business models are viable.

Firstly, centralised funding tends to mean that ideas are not subject to market forces early enough - not only are they not subjected to them, but market forces are not even considered by those proposing/evaluating the projects. Often we can barely get people to use the results of project funding when we give them away for free - imagine if we actually tried to charge people for them!? The primary question is not 'can I get user X or institution Y to pay for this?' but 'can I get the JISC to pay for this?', which is a very different proposition.

Secondly, centralised funding tends to support people (often very clever people) who can then cherry-pick good ideas and develop them without any concern for sustainable business-models, and who subsequently may or may not be in a position to support them long term, but who thus prevent others, who might develop something more sustainable, from even getting started.

Thirdly, the centrally-funded model contributes to a wider 'free-at-the-point-of-use' mindset where people simply are not used to thinking in terms of 'how much is it really costing to do this?' and 'what would somebody actually be prepared to pay for this?' and where there is little incentive to undertake a cost/benefit analysis or prepare a proper business case. As I've mentioned here before, I've been on the receiving end of many proposals under the UCISA Award for Excellence programme whose authors were explicitly asked to assess their costs and benefits but chose to treat staff time at zero cost simply because those staff were in the employ of the institutions anyway.
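A small worked example shows why treating staff time as free matters. This is a minimal sketch with entirely hypothetical figures (including an assumed cost of 300 GBP per staff day); none of it is drawn from real award submissions.

```python
from dataclasses import dataclass

# Hypothetical figures for illustration only; none come from actual
# UCISA Award submissions. day_rate is an assumed fully loaded daily cost.

@dataclass
class Proposal:
    annual_benefit: float      # estimated value delivered per year (GBP)
    direct_costs: float        # licences, hardware, etc. per year (GBP)
    staff_days: float          # staff effort per year, in days
    day_rate: float = 300.0    # assumed cost of one staff day (GBP)

    def net_benefit(self, include_staff: bool = True) -> float:
        """Benefit minus costs, optionally ignoring the cost of staff time."""
        staff_cost = self.staff_days * self.day_rate if include_staff else 0.0
        return self.annual_benefit - self.direct_costs - staff_cost


if __name__ == "__main__":
    p = Proposal(annual_benefit=50_000, direct_costs=10_000, staff_days=200)
    print("staff time ignored:", p.net_benefit(include_staff=False))
    print("staff time costed: ", p.net_benefit(include_staff=True))
```

With these made-up numbers the same proposal swings from a net gain of 40,000 to a net loss of 20,000 once 200 days of staff effort are costed, which is exactly the kind of difference a 'zero cost' assumption hides.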

Now... before you all shout at me, I don't think market forces are the be-all and end-all of this and I think there are plenty of situations where services, particularly infrastructural services, are better procured centrally than by going out to the market. This post is absolutely not a rant that everything funded by the JISC is necessarily pants - far from it.

That said, my personal view is that Intute did not fall into that class of infrastructural service and that it was rightly subjected to an analysis of whether its costs outweighed its benefits. I wasn't involved in that analysis, so I can't really comment on it - I'm sure there is a debate to be had about how the 'benefits' were assessed and measured. But my suspicion is that if one had asked every UK HE institution to pay a subscription to Intute not many would have been willing to do so - were that the case, I presume that Intute would be exploring that model right now? That, it seems to me, is the ultimate test of viability - or at least one of them. As I mentioned before, one of the lessons here is the speed with which we, as a community, can react to the environmental changes around us and how we deal with the fall-out - which is as much about how the viability of business models changes over time as it is about technology.

I certainly don't think there are any easy answers.

Comparing Yahoo Directory and the eLib Subject Gateways (the fore-runners of Intute), which emerged at around the same time and which attempted to meet a similar need (see Lorcan Dempsey's recent post, Curating the web ...), it's interesting that the Yahoo offering has proved to be longer lasting than the subject gateways, albeit in a form that is largely hidden from view, supported (I guess) by an advertising- and paid-for-listings- based model, a route that presumably wasn't/isn't considered appropriate or sufficient for an academic service?

Addendum (8 September 2010): Related to this post, and well worth reading, see Lorcan Dempsey's post from last year, Entrepreneurial skills are not given out with grant letters.

August 24, 2010

Resource discovery revisited...

...revisited for me that is!

Last week I attended an invite-only meeting at the JISC offices in London, notionally entitled a "JISC IE Technical Review" but in reality a kind of technical advisory group for the JISC and RLUK Resource Discovery Taskforce Vision [PDF], about which the background blurb says:

The JISC and RLUK Resource Discovery Taskforce was formed to focus on defining the requirements for the provision of a shared UK resource discovery infrastructure to support research and learning, to which libraries, archives, museums and other resource providers can contribute open metadata for access and reuse.

The morning session felt slightly weird (to me), a strange time-warp back to the kinds of discussions we had a lot of as the UK moved from the eLib Programme, thru the DNER (briefly) into what became known (in the UK) as the JISC Information Environment - discussions about collections and aggregations and metadata harvesting and ... well, you get the idea.

In the afternoon we were split into breakout groups and I ended up in the one tasked with answering the question "how do we make better websites in the areas covered by the Resource Discovery Taskforce?", a slightly strange question now I look at it but one that was intended to stimulate some pragmatic discussion about what content providers might actually do.

Paul Walk has written up a general summary of the meeting - the remainder of this post focuses on the discussion in the 'Making better websites' afternoon breakout group and my more general thoughts.

Our group started from the principles of Linked Data - assign 'http' URIs to everything of interest, serve useful content (both human-readable and machine-processable (structured according to the RDF model)) at those URIs, and create lots of links between stuff (internal to particular collections, across collections and to other stuff). OK... we got slightly more detailed than that but it was a fairly straight-forward view that Linked Data would help and was the right direction to go in. (Actually, there was a strongly expressed view that simply creating 'http' URIs for everything and exposing human-readable content at those URIs would be a huge step forward).
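The principle of serving both human-readable and machine-processable content at the same 'http' URI is usually implemented with HTTP content negotiation. Here is a minimal sketch, assuming a single hard-coded record, a made-up example.org URI and the Dublin Core terms 'title' property. Real deployments also honour q-values and often use 303 redirects to separate document URIs; both are omitted here.

```python
# A minimal sketch of serving two representations of one 'http' URI,
# chosen by the HTTP Accept header. The record and URI are hypothetical,
# and q-values are ignored (any mention of text/turtle wins).

RECORD = {
    "uri": "http://example.org/id/book/123",
    "title": "An Example Book",
}

def negotiate(accept_header: str) -> tuple[str, str]:
    """Return (content_type, body) for the requested representation."""
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if "text/turtle" in accepted:
        # Machine-processable: one RDF triple describing the resource.
        body = f'<{RECORD["uri"]}> <http://purl.org/dc/terms/title> "{RECORD["title"]}" .\n'
        return "text/turtle", body
    # Default: a human-readable page that links back to the same URI.
    body = (f'<html><body><h1>{RECORD["title"]}</h1>'
            f'<p><a href="{RECORD["uri"]}">{RECORD["uri"]}</a></p></body></html>')
    return "text/html", body
```

A browser sending text/html (or */*) gets the page; an RDF-aware client asking for text/turtle gets the triple, from the same URI.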

Then we had a discussion about what the barriers to adoption might be - the problems of getting buy-in from vendors and senior management, the need to cope with a non-obvious business model (particularly in the current economic climate), the lack of technical expertise (not to mention semantic expertise) in parts of those sectors, the endless discussions that might take place about how to model the data in RDF, the general perception that the Semantic Web is permanently just over the horizon, and so on.

And, in response, we considered the kinds of steps that JISC (and its partners) might have to undertake to build any kind of political momentum around this idea.

To cut a long story short, we more-or-less convinced ourselves out of a purist Linked Data approach as a way forward, instead preferring a 4 layer model of adoption, with increasing levels of semantic richness and machine-processability at each stage:

  1. expose data openly in any format available (.csv files, HTML pages, MARC records, etc.)
  2. assign 'http' URIs to things of interest in the data, expose it in any format available (.csv files, HTML pages, etc.) and serve useful content at each URI
  3. assign 'http' URIs to things of interest in the data, expose it as XML and serve useful content at each URI
  4. assign 'http' URIs to things of interest in the data and expose Linked Data (as per the discussion above).

These would not be presented as steps to go thru (do 1, then 2, then 3, ...) but as alternatives with increasing levels of semantic value. Good practice guidance would encourage the adoption of option 4, laying out the benefits of such an approach, but the alternatives would provide lower barriers to adoption and offer a simpler 'sell' politically.
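To make the difference between the four options concrete, here is a minimal sketch that exposes one hypothetical catalogue record at each level using only the Python standard library. The record, the example.org base URI and the choice of Dublin Core properties are assumptions for illustration, not something the group specified.

```python
import csv
import io
import xml.etree.ElementTree as ET

# One hypothetical record and base URI, used to illustrate the four levels.
RECORD = {"id": "123", "title": "An Example Book", "creator": "A. N. Author"}
BASE = "http://example.org/id/book/"

def level1_csv(rec):
    """Level 1: open data in whatever format is to hand (here CSV), no URIs."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "creator"])
    writer.writeheader()
    writer.writerow({"title": rec["title"], "creator": rec["creator"]})
    return buf.getvalue()

def level2_csv_with_uri(rec):
    """Level 2: same kind of format, but the thing of interest gets an 'http' URI."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["uri", "title", "creator"])
    writer.writeheader()
    writer.writerow({"uri": BASE + rec["id"], "title": rec["title"],
                     "creator": rec["creator"]})
    return buf.getvalue()

def level3_xml(rec):
    """Level 3: 'http' URIs plus a structured XML serialisation."""
    book = ET.Element("book", {"uri": BASE + rec["id"]})
    ET.SubElement(book, "title").text = rec["title"]
    ET.SubElement(book, "creator").text = rec["creator"]
    return ET.tostring(book, encoding="unicode")

def level4_ntriples(rec):
    """Level 4: Linked Data, here as N-Triples using Dublin Core terms.

    Simplifications: literals are not escaped, and the creator is a plain
    string; a purist version would give the creator its own URI and link to it.
    """
    subject = f"<{BASE}{rec['id']}>"
    dct = "http://purl.org/dc/terms/"
    return (f'{subject} <{dct}title> "{rec["title"]}" .\n'
            f'{subject} <{dct}creator> "{rec["creator"]}" .\n')
```

Each step up the list reduces the amount of format-specific guesswork an aggregator has to do, which is the implementation challenge discussed next.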

The heterogeneity of data being exposed would leave a significant implementation challenge for the aggregation services attempting to make use of it and the JISC (and partners) would have to fund some pretty convincing demonstrators of what might usefully be achieved.

One might characterise these approaches as '' (echoing '' but where 'glam' is short for 'galleries, libraries, archives and museums') and/or Digital UK (echoing the pragmatic approaches being successfully adopted by the Digital NZ activity in New Zealand).

Despite my reservations about the morning session, the day ended up being quite a useful discussion. That said, I remain somewhat uncomfortable with its outcomes. I'm a purist at heart and the 4 levels above are anything but pure. To make matters worse, I'm not even sure that they are pragmatic. The danger is that people will adopt only the lowest, least semantic, option and think they've done what they need to do - something that I think we are seeing some evidence of happening within 

Perhaps even more worryingly, having now stepped back from the immediate talking-points of the meeting itself, I'm not actually sure we are addressing a real user need here any more - the world is so different now than it was when we first started having conversations about exposing cultural heritage collections on the Web, particularly library collections - conversations that essentially pre-dated Google, Google Scholar, Amazon, WorldCat, CrossRef, ... the list goes on. Do people still get agitated by, for example, the 'book discovery' problem in the way they did way back then? I'm not sure... but I don't think I do. At the very least, the book 'discovery' problem has largely become an 'appropriate copy' problem - at least for most people? Well, actually, let's face it... for most people the book 'discovery' and 'appropriate copy' problems have been solved by Amazon!

I also find the co-location of libraries, museums and archives, in the context of this particular discussion, rather uncomfortable. If anything, this grouping serves only to prolong the discussion and put off any decision making?

Overall then, I left the meeting feeling somewhat bemused about where this current activity has come from and where it is likely to go.


August 13, 2010

Cloud infrastructures for academia - the FleSSR project

Yesterday, I attended the kick-off meeting for a new JISC-funded project called FleSSR - Flexible Services for the Support of Research. From the, as yet very new, project blog:

Our project will create a hybrid public-private Infrastructure as a Service cloud solution for academic research. The two pilot use cases chosen follow the two university partners interests, software development and multi-platform support and on-demand research data storage space.

We will be implementing open standards for cloud management through the OGF Open Cloud Computing Interface.

The project is a collaboration led by the Oxford e-Research Centre and involving STFC, Eduserv, the University of Reading, EoverI, Eucalyptus Inc. and Canonical Ltd.

Our role at Eduserv will primarily be to build a public cloud into which private clouds at Oxford and Reading can burst both compute and storage resources at times of high demand, as generated by pilot demonstrators at those two institutions. My colleagues Matt Johnson and Tim Lawrence will lead our work on this here. The clouds will be built on some variant of Eucalyptus and Ubuntu - one of the early pieces of work for the project team being to compare Open Eucalyptus, Enterprise Eucalyptus and Ubuntu Enterprise Cloud.
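The 'bursting' idea can be sketched as a simple placement rule: run work on the institution's private cloud while it has spare capacity and overflow to the public cloud otherwise. The Job type, the core counts and the 'private'/'public' labels are hypothetical, and this greedy first-fit rule says nothing about how Eucalyptus or the FleSSR pilots actually schedule work.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cores: int

def place_jobs(jobs, private_capacity):
    """Greedy first-fit: keep each job on the private cloud while cores remain,
    otherwise burst it out to the public cloud."""
    placements = {}
    free = private_capacity
    for job in jobs:
        if job.cores <= free:
            placements[job.name] = "private"
            free -= job.cores
        else:
            placements[job.name] = "public"  # private cloud is full: burst
    return placements

jobs = [Job("build-linux", 8), Job("build-windows", 8), Job("data-staging", 4)]
print(place_jobs(jobs, private_capacity=12))
```

With 12 private cores, the second 8-core job bursts to the public cloud while the smaller staging job still fits locally.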

My own involvement with the project will start properly after Christmas and will contribute to the project's thinking about sustainable business models for cloud providers like Eduserv in this space. One of the interesting aspects of the project will be some technical work on policy enforcement and accounting that will allow business models other than 'top-sliced central-funding' to come into play in academia for this kind of provision.

I'm really looking forward to this work. The project itself, funded as part of the JISC's Flexible Service Delivery Programme, is only 10 months in duration but is attempting to cover a lot of ground very quickly. I'm very hopeful that the outputs will be of widespread interest to the community, as well as helping to shape our own potential offerings in this area.

July 16, 2010

Finding e-books - a discovery to delivery problem

Some of you will know that we recently ran a quick survey of academic e-book usage in the UK - I hope to be able to report on the findings here shortly. One of the things that we didn't ask about in the survey but that has come up anecdotally in our discussions with librarians is the ease (or not) with which it is possible to find out if a particular e-book title is available.

A typical scenario goes like this. "Lecturer adds an entry for a physical book to a course reading list. Librarian checks the list and wants to know if there is an e-book edition of the book, in order to offer alternatives to the students on that course". Problemo. Having briefly asked around, it seems (somewhat surprisingly?) that there is no easy solution to this problem.

If we assume that the librarian in question knows the ISBN of the physical book, what can be done to try and ease the situation? Note that in asking this question I'm conveniently ignoring the looming, and potentially rather massive, issue around "what the hell is an e-book anyway?" and "how are we going to assign identifiers to them once we've worked out what they are?" :-). For some discussion around this see Eric Hellman's recent piece, What IS an eBook, anyway?

But, let's ignore that for now... we know that OCLC's xISBN service allows us to navigate different editions of the same book (I'm desperately trying not to drop into FRBR-speak here). Taking a quick look at the API documentation for xISBN yesterday, I noticed that the metadata returned for each ISBN can include both the fact that something is a 'Book' and that it is 'Digital' (form == 'BA' && form == 'DA') - that sounds like the working definition of an e-book to me (at least for the time being) - as well as listing the ISBNs for all the other editions/formats of the same book. So I knocked together a quick demonstrator. The result is e-Book Finder and you are welcome to have a play. To get you started, here are a couple of examples:

Of course, because e-Book Finder is based on xISBN, which is in turn based on WorldCat, you can only use it to find e-books that are listed in the catalogues of WorldCat member libraries (but I'm assuming that is a big enough set of libraries that the coverage is pretty good). Perhaps more importantly, it also only represents the first stage of the problem. It allows you to 'discover' that an e-book exists - but it doesn't get the thing 'delivered' to you.
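The form-code test described above can be sketched as a filter over a list of editions. Read literally, "form == 'BA' && form == 'DA'" can never be true, so the sketch takes it to mean "the edition's list of form codes contains both". It works on already-parsed records rather than calling the service; the record shape and the placeholder ISBNs are assumptions, not xISBN's actual response format.

```python
# Hypothetical, already-parsed editions of one work: each has an ISBN and a
# list of form codes. The shape and the ISBNs are invented placeholders.
editions = [
    {"isbn": "9780000000001", "form": ["BA"]},        # book, not digital
    {"isbn": "9780000000002", "form": ["BA", "DA"]},  # book and digital: an e-book
    {"isbn": "9780000000003", "form": ["DA"]},        # digital, not flagged as a book
]

def ebook_isbns(editions):
    """Return ISBNs whose form codes include both 'BA' (Book) and 'DA' (Digital)."""
    return [e["isbn"] for e in editions
            if {"BA", "DA"} <= set(e.get("form", []))]

print(ebook_isbns(editions))
```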

Wouldn't it be nice if e-Book Finder could also answer questions like, "is this e-book covered by my existing institutional subscriptions?", "can I set up a new institutional subscription that would cover this e-book?" or simply "can I buy a one-off copy of this e-book?". It turns out that this is a pretty hard problem. My Licence Negotiation colleagues at Eduserv suggested doing some kind of search against myilibrary, dawsonera, Amazon, eBrary, eblib and SafariBooksOnline. The bad news is that (as far as I can tell), of those, only Amazon and SafariBooksOnline allow users to search their content before making them sign in and only Amazon offer an API. (I'm not sure why anyone would design a website that has the sole purpose of selling stuff such that people have to sign in before they can find out what is on offer, nor why that information isn't available in an openly machine-readable form but anyway...). So in this case, moving from discovery to delivery looks to be non-trivial. Shame. Even if each of these e-book 'aggregators' simply offered a list[1] of the ISBNs of all the e-books they make available, it would be a step in the right direction.
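In the spirit of the footnote, here is a minimal sketch of what such a list might look like as an Atom feed, built with Python's standard xml.etree.ElementTree. The feed id, title, author name and the use of urn:isbn: entry ids are all assumptions made for illustration; this is not a format any of the aggregators above is known to offer.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialise Atom as the default namespace

def isbn_feed(isbns, feed_id, title, author):
    """Build a minimal Atom feed with one entry per available e-book ISBN."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = now
    # Atom requires an author at feed level unless every entry has one.
    ET.SubElement(ET.SubElement(feed, f"{{{ATOM}}}author"), f"{{{ATOM}}}name").text = author
    for isbn in isbns:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = f"urn:isbn:{isbn}"
        ET.SubElement(entry, f"{{{ATOM}}}title").text = f"ISBN {isbn}"
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = now
    return ET.tostring(feed, encoding="unicode")

xml_text = isbn_feed(["9780000000002"], "tag:example.org,2010:ebooks",
                     "E-books on offer", "Example aggregator")
print(xml_text)
```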

On the other hand, maybe just pushing the question to the institutional OpenURL resolver would help answer these questions. Any suggestions for how things could be improved?
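Pushing the question to a resolver would mean sending it an OpenURL. A minimal sketch of an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a book identified only by ISBN is below; the resolver base URL is hypothetical, and whether a given resolver can report e-book holdings depends entirely on its knowledge base.

```python
from urllib.parse import urlencode

def openurl_for_isbn(resolver_base, isbn):
    """Build an OpenURL 1.0 KEV query asking a resolver about a book by ISBN."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",  # the 'book' metadata format
        "rft.isbn": isbn,
    }
    return f"{resolver_base}?{urlencode(params)}"

# Hypothetical institutional resolver address.
url = openurl_for_isbn("https://resolver.example.ac.uk/openurl", "9780000000002")
print(url)
```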

1. It's a list so that means RSS or Atom, right?

March 23, 2010

Federating ?

I suggested a while back that PURLs have become quite important, at least for some aspects of the Web (particularly Linked Data as it happens), and that the current service at may therefore represent something of a single point of failure.

I therefore note with some interest that Zepheira, the company developing the PURL software, have recently announced a PURL Federation Architecture:

A PURL Federation is proposed which will consist of multiple independently-operated PURL servers, each of which have their own DNS hostnames, name their PURLs using their own authority (different from the hostname) and mirror other PURLs in the federation. The authorities will be "outsourced" to a dynamic DNS service that will resolve to proxies for all authorities of the PURLs in the federation. The attached image illustrates and summarizes the proposed architecture.

Caching proxies are inserted between the client and federation members. The dynamic DNS service responds to any request with an IP address of a proxy. The proxy attempts to contact the primary PURL member via its alternative DNS name to fulfill the request and caches the response for future requests. In the case where the primary PURL member is not responsive, the proxy attempts to contact another host in the federation until it succeeds. Thus, most traffic for a given PURL authority continues to flow to the primary PURL member for that authority and not other members of the federation.
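The proxy behaviour described in the second paragraph (try the primary member, fall back to other members, cache what comes back) can be sketched as follows. The member hostnames and the pluggable fetch function are hypothetical stand-ins for real HTTP calls; this is an illustration of the idea, not Zepheira's implementation.

```python
class FederationProxy:
    """Caching proxy: try the primary PURL member first, then the others."""

    def __init__(self, members, fetch):
        self.members = members  # ordered: primary first, then the rest of the federation
        self.fetch = fetch      # callable(member, path) -> response; raises OSError if down
        self.cache = {}

    def resolve(self, path):
        if path in self.cache:
            return self.cache[path]
        last_error = None
        for member in self.members:
            try:
                response = self.fetch(member, path)
            except OSError as err:
                last_error = err  # member unresponsive: try the next one
                continue
            self.cache[path] = response  # cache for future requests
            return response
        raise LookupError(f"no federation member could resolve {path}") from last_error

def fake_fetch(member, path):
    """Stub fetch in which the primary member is down."""
    if member == "primary.example.org":
        raise OSError("timeout")
    return f"302 Found via {member}"

proxy = FederationProxy(["primary.example.org", "mirror.example.org"], fake_fetch)
print(proxy.resolve("/net/example/doc"))
```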

I don't know what is planned in this space, and I may not have read the architecture closely enough, but it seems to me that there is now a significant opportunity for OCLC to work with a small number of national libraries (the British Library, The Library of Congress and the National Library of Australia spring to mind as a usefully geographically-dispersed set) to federate the current service at ?

February 19, 2010

In the clouds

So, the Repositories and the Cloud meeting, jointly organised by ourselves and the JISC, takes place on Tuesday next week and I promised to write up my thoughts in advance.  Trouble is... I'm not sure I actually have any thoughts :-(

Let's start from the very beginning (it's a very good place to start)...

The general theory behind cloud solutions - in this case we are talking primarily about cloud storage solutions but I guess this applies more generally - is that you outsource parts of your business to someone else because:

  • they can do it better than you can,
  • they can do it more cheaply than you can,
  • they can do it in a more environmentally-friendly way than you can, or
  • you simply no longer wish to do it yourself for other reasons.

Seems simple enough and I guess that all of these apply to the issues at hand for the meeting next week, i.e. what use is there for utility cloud storage solutions for the data currently sitting in institutional repositories (and physically stored on disks inside the walls of the institution concerned).

Against that, there are a set of arguments or issues that militate against a cloud solution, such as:

  • security
  • data protection
  • sustainability
  • resilience
  • privacy
  • loss of local technical knowledge
  • ...

You know the arguments.  Ultimately institutions are going to end up asking themselves questions like, "how important is this data to us?", "are we willing to hand it over to one or more cloud providers for long term storage?", "can we afford to continue to store this stuff for ourselves?", "what is our exit strategy in the future?", and so on.

Wrapped up in this will be issues about the specialness of the kind of stuff one typically finds in institutional repositories - either because of volume of data (large research data-sets for example), or because stuff is seen as being especially important for various reasons (it's part of the scholarly record for example).

None of which is particularly helpful in terms of where the meeting will take us!  I certainly don't expect any actual answers to come out of it, but I am expecting a good set of discussions both about current capabilities (what the current tools are capable of), policy issues, and about where we are likely to go in the future.

One of the significant benefits the current interest in cloud solutions brings is the abstraction of the storage layer from the repository services.  Even if I never actually make use of Amazon S3, I might still get significant benefit from the cloud storage mindset because my internal repository 'storage' layer is separated from the rest of the software.  That means that I can do things like sharing data across multiple internal stores, sharing data across multiple external stores, or some combination of both, much more easily.  It also potentially opens up the market to competing products.
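A minimal sketch of that separation: the repository talks only to an abstract storage interface, and local, external or replicated stores can sit behind it. The class names are hypothetical, and the in-memory stores stand in for a local disk and a cloud service such as S3; this is not how DuraSpace, EPrints or any other platform actually structures its storage layer.

```python
from abc import ABC, abstractmethod

class BlobStore(ABC):
    """The storage layer a repository talks to, independent of where bytes live."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStore(BlobStore):
    """Stand-in for either a local disk store or an external (cloud) store."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]  # raises KeyError if absent

class ReplicatedStore(BlobStore):
    """Writes to every underlying store; reads from the first that has the key."""

    def __init__(self, *stores: BlobStore):
        self.stores = stores

    def put(self, key, data):
        for store in self.stores:
            store.put(key, data)

    def get(self, key):
        for store in self.stores:
            try:
                return store.get(key)
            except KeyError:
                continue
        raise KeyError(key)

local, cloud = InMemoryStore(), InMemoryStore()
repo_storage = ReplicatedStore(local, cloud)
repo_storage.put("eprint/123/paper.pdf", b"%PDF-1.4 ...")
print(cloud.get("eprint/123/paper.pdf"))
```

Because the repository code only sees BlobStore, swapping the 'cloud' stand-in for a real provider, or dropping it, needs no change elsewhere.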

So, I think this space has wider implications than a simple, "should I use cloud storage?" approach might imply.

From an Eduserv point of view, both as a provider of not-for-profit services to the public, health and education sectors and as an organisation with a brand spanking new data centre I don't think there's any secret in the fact that we want to understand whether there is anything useful we can bring to this space - as a provider of cloud storage solutions that are significantly closer to the community than the utility providers are for example.  That's not to say that we have such an offer currently - but it is the kind of thing we are interested in thinking about.

I don't particularly buy into the view that the cloud is nothing new.  Amazon S3 and its ilk didn't exist 10 years ago and there's a reason for that.  As markets and technology have matured new things have become possible.  But that, on its own, isn't a reason to play in the cloud space. So, I suppose that the real question for the meeting next week is, "when, if ever, is the right time to move to cloud storage solutions for repository content... and why?" - both from a practical and a policy viewpoint.

I don't know the answers to those questions but I'm looking forward to finding out more about it next week.

February 11, 2010

Repositories and the Cloud - tell us your views

It's now a little over a week to go until the Repositories and the Cloud event (jointly organised by Eduserv and the JISC) takes place in London.  The event is sold out (sorry to those of you that haven't got a place) and we have a full morning of presentations from DuraSpace, Microsoft and EPrints and an afternoon of practical experience (Terry Harmer of the Belfast eScience Centre) and parallel discussion groups looking at both policy and technical issues.

To those of you that are coming, please remember that the afternoon sessions are for discussion.  We want you to get involved, to share your thoughts and to challenge the views of other people at the event (in the nicest way possible of course).  We'd love to know what you think about repositories and the cloud (note that, by that phrase, I mean the use of utility cloud providers as back-end storage for repository-like services).  Please share your thoughts below, or blog them using the event tag - 'repcloud' - or just bring them with you on the day!

I will share my thoughts separately here next week but let me be honest... I don't actually know what I think about the relationship between repositories and the cloud.  I'm coming to the meeting with an open mind.  As a community, we now have some experience of the policy and technical issues in our use of the cloud for things like undergraduate email but I expect the policy issues and technical requirements around repositories to be significantly different.  On that basis, I am really looking forward to the meeting.

The chairs of the two afternoon sessions, Paul Miller (paul.miller (at) who is leading the policy session and Brad McLean (bmclean (at) who is leading the technical session, would also like to hear your views on what you hope their sessions will cover.  If you have ideas please get in touch, either thru the comments form below, via Twitter (using the '#repcloud' hashtag) or by emailing them directly.


February 02, 2010

Second Life, scalability and data centres

Interesting article about the scalability issues around Second Life, What Second Life can teach your datacenter about scaling Web apps. (Note: this is not about the 3-D virtual world aspects of Second Life but about how the infrastructure to support it is delivered.)

Plenty of pixels have been spilled on the subject of where you should be headed: to single out one resource at random, Microsoft presented a good paper ("On Designing and Deploying Internet-Scale Services" [PDF]) with no less than 71 distinct recommendations. Most of them are good ("Use production data to find problems"); few are cheap ("Document all conceivable component failure modes and combinations thereof"). Some of the paper's key overarching principles: make sure all your code assumes that any component can be in any failure state at any time, version all interfaces such that they can safely communicate with newer and older modules, practice a high degree of automated fault recovery, auto-provision all resources. This is wonderful advice for very large projects, but herein lies a trap for smaller ones: the belief that you can "do it right the first time." (Or, in the young-but-growing scenario, "do it right the second time.") This is unlikely to be true in the real world, so successful scaling depends on adapting your technology as the system grows.
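As a small illustration of "assume any component can fail" combined with automated fault recovery, here is a retry wrapper with exponential backoff and jitter. Backoff with jitter is a common technique chosen here for illustration, not something taken from the Microsoft paper; the function name and default values are hypothetical.

```python
import random
import time

def call_with_retries(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on ConnectionError wait base_delay * 2**n plus random
    jitter and try again, re-raising once all attempts are used up."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the failure
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # jitter avoids synchronised retries

calls = {"n": 0}

def flaky():
    """A component that fails twice before recovering."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("component unavailable")
    return "ok"

print(call_with_retries(flaky, sleep=lambda s: None))  # no real sleeping in the demo
```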

December 03, 2009

On being niche

I spoke briefly yesterday at a pre-IDCC workshop organised by REPRISE.  I'd been asked to talk about Open, social and linked information environments, which resulted in a re-hash of the talk I gave in Trento a while back.

My talk didn't go too well to be honest, partly because I was on last and we were over-running so I felt a little rushed but more because I'd cut the previous set of slides down from 119 to 6 (4 really!) - don't bother looking at the slides, they are just images - which meant that I struggled to deliver a very coherent message.  I looked at the most significant environmental changes that have occurred since we first started thinking about the JISC IE almost 10 years ago.  The resulting points were largely the same as those I have made previously (listen to the Trento presentation) but with a slightly preservation-related angle:

  • the rise of social networks and the read/write Web, and a growth in resident-like behaviour, means that 'digital identity' and the identification of people have become more obviously important and will remain an important component of provenance information for preservation purposes into the future;
  • Linked Data (and the URI-based resource-oriented approach that goes with it) is conspicuous by its absence in much of our current digital library thinking;
  • scholarly communication is increasingly diffusing across formal and informal services both inside and outside our institutional boundaries (think blogging, Twitter or Google Wave for example) and this has significant implications for preservation strategies.

That's what I thought I was arguing anyway!

I also touched on issues around the growth of the 'open access' agenda, though looking at it now I'm not sure why because that feels like a somewhat orthogonal issue.

Anyway... the middle bullet has to do with being mainstream vs. being niche.  (The previous speaker, who gave an interesting talk about MyExperiment and its use of Linked Data, made a similar point).  I'm not sure one can really describe Linked Data as being mainstream yet, but one of the things I like about the Web Architecture and REST in particular is that they describe architectural approaches that have proven to be hugely successful, i.e. they describe the Web.  Linked Data, it seems to me, builds on these in very helpful ways.  I said that digital library developments often prove to be too niche - that they don't have mainstream impact.  Another way of putting that is that digital library activities don't spend enough time looking at what is going on in the wider environment.  In other contexts, I've argued that "the only good long-term identifier is a good short-term identifier" and I wonder if that principle can and should be applied more widely.  If you are doing things on a Web-scale, then the whole Web has an interest in solving any problems - be that around preservation or anything else.  If you invent a technical solution that only touches on scholarly communication (for example) who is going to care about it in 50 or 100 years - answer, not all that many people.

It worries me, for example, when I see an architectural diagram (as was shown yesterday) which has channels labelled 'OAI-PMH', 'XML' and 'the Web'!

After my talk, Chris Rusbridge asked me if we should just get rid of the JISC IE architecture diagram.  I responded that I am happy to do so (though I quipped that I'd like there to be an archival copy somewhere).  But on the train home I couldn't help but wonder if that misses the point.  The diagram is neither here nor there; it's the "service-oriented, we can build it all" mentality that it encapsulates that is the real problem.

Let's throw that out along with the diagram.

November 23, 2009

Memento and negotiating on time

Via Twitter, initially in a post by Lorcan Dempsey, I came across the work of Herbert Van de Sompel and his comrades from LANL and Old Dominion University on the Memento project:

The project has since been the topic of an article in New Scientist.

The technical details of the Memento approach are probably best summarised in the paper "Memento: Time Travel for the Web", and Herbert has recently made available a presentation which I'll embed here, since it includes some helpful graphics illustrating some of the messaging in detail:

Memento seeks to take advantage of the Web Architecture concept that interactions on the Web are concerned with exchanging representations of resources. And for any single resource, representations may vary - at a single point in time, variant representations may be provided, e.g. in different formats or languages, and over time, variant representations may be provided reflecting changes in the state of the resource. The HTTP protocol incorporates a feature called content negotiation which can be used to determine the most appropriate representation of a resource - typically according to variables such as content type, language, character set or encoding. The innovation that Memento brings to this scenario is the proposition that content negotiation may also be applied to the axis of date-time. i.e. in the same way that a client might express a preference for the language of the representation based on a standard request header, it could also express a preference that the representation should reflect resource state at a specified point in time, using a custom accept header (X-Accept-Datetime).
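A minimal sketch of what such a request looks like from the client side, built (but not sent) with Python's urllib. The target URI is hypothetical. The header carries an HTTP-date; note that urllib normalises header-name capitalisation, which is harmless because HTTP header names are case-insensitive. When Memento was later published as RFC 7089, the header was standardised without the "X-" prefix, as Accept-Datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

def datetime_negotiated_request(uri, when):
    """Build (but do not send) a GET asking for the resource's state at `when`."""
    req = Request(uri, method="GET")
    # Memento's proposed custom header, carrying an HTTP-date in GMT.
    req.add_header("X-Accept-Datetime", format_datetime(when, usegmt=True))
    # Ordinary content negotiation on format still applies alongside it.
    req.add_header("Accept", "text/html")
    return req

req = datetime_negotiated_request("http://example.org/quote-of-the-day",
                                  datetime(2009, 1, 1, 12, 0, tzinfo=timezone.utc))
print(req.header_items())
```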

More specifically, Memento uses a flavour of content negotiation called "transparent content negotiation" where the server provides details of the variant representations available, from which the client can choose. Slides 26-50 in Herbert's presentation above illustrate how this technique might be applied to two different cases: one in which the server to which the initial request is sent is itself capable of providing the set of time-variant representations, and a second in which that server does not have those "archive" capabilities but redirects to (a URI supported by) a second server which does.
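On the client side of transparent negotiation, choosing among the listed time-variants might look like the sketch below: take the latest archived state at or before the requested datetime, falling back to the earliest if all are later. Both the variant list (with hypothetical archive URIs) and the "latest not after" rule are assumptions for illustration; the Memento paper may specify selection differently.

```python
from bisect import bisect_right
from datetime import datetime

def choose_variant(variants, wanted):
    """variants: non-empty list of (datetime, uri) pairs sorted by datetime.
    Return the URI of the latest variant not after `wanted`, or the
    earliest variant if every one of them is later."""
    times = [t for t, _ in variants]
    i = bisect_right(times, wanted)
    return variants[i - 1][1] if i > 0 else variants[0][1]

variants = [
    (datetime(2009, 1, 1), "http://archive.example.org/20090101/quote"),
    (datetime(2009, 1, 2), "http://archive.example.org/20090102/quote"),
]
print(choose_variant(variants, datetime(2009, 1, 1, 18, 0)))
```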

This does seem quite an ingenious approach to the problem, and one that potentially has many interesting applications, several of which Herbert alludes to in his presentation.

What I want to focus on here is the technical approach, which did raise a question in my mind. And here I must emphasise that I'm really just trying to articulate a question that I've been trying to formulate and answer for myself: I'm not in a position to say that Memento is getting anything "wrong", just trying to compare the Memento proposition with my understanding of Web architecture and the HTTP protocol, or at least the use of that protocol in accordance with the REST architectural style, and understand whether there are any divergences (and if there are, what the implications are).

In his dissertation in which he defines the REST architectural style, Roy Fielding defines a resource as follows:

More precisely, a resource R is a temporally varying membership function M_R(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers. A resource can map to the empty set, which allows references to be made to a concept before any realization of that concept exists -- a notion that was foreign to most hypertext systems prior to the Web. Some resources are static in the sense that, when examined at any time after their creation, they always correspond to the same value set. Others have a high degree of variance in their value over time. The only thing that is required to be static for a resource is the semantics of the mapping, since the semantics is what distinguishes one resource from another.

On representations, Fielding says the following, which I think is worth quoting in full. The emphasis in the first and last sentences is mine.

REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.

A representation consists of data, metadata describing the data, and, on occasion, metadata to describe the metadata (usually for the purpose of verifying message integrity). Metadata is in the form of name-value pairs, where the name corresponds to a standard that defines the value's structure and semantics. Response messages may include both representation metadata and resource metadata: information about the resource that is not specific to the supplied representation.

Control data defines the purpose of a message between components, such as the action being requested or the meaning of a response. It is also used to parameterize requests and override the default behavior of some connecting elements. For example, cache behavior can be modified by control data included in the request or response message.

Depending on the message control data, a given representation may indicate the current state of the requested resource, the desired state for the requested resource, or the value of some other resource, such as a representation of the input data within a client's query form, or a representation of some error condition for a response. For example, remote authoring of a resource requires that the author send a representation to the server, thus establishing a value for that resource that can be retrieved by later requests. If the value set of a resource at a given time consists of multiple representations, content negotiation may be used to select the best representation for inclusion in a given message.

So at a point in time t1, the "temporally varying membership function" maps to one set of values, and - in the case of a resource whose representations vary over time - at another point in time t2, it may map to another, different set of values. To take a concrete example, suppose at the start of 2009, I launch a "quote of the day", and I define a single resource that is my "quote of the day", to which I assign the URI And I provide variant representations in XHTML and plain text. On 1 January 2009 (time t1), my quote is "From each according to his abilities, to each according to his needs", and I provide variant representations in those two formats, i.e. the set of values for 1 January 2009 is those two documents. On 2 January 2009 (time t2), my quote is "Those who do not move, do not notice their chains", and again I provide variant representations in those two formats, i.e. the set of values for 2 January 2009 (time t2) is two XHTML and plain text documents with different content from those provided at time t1.
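The quote-of-the-day example can be modelled directly as Fielding's membership function: for each day, a set of equivalent representations (one per media type), with ordinary content negotiation choosing only among that day's members. The dictionary layout and the simplified XHTML fragment are assumptions made for illustration.

```python
from datetime import date

# Hypothetical "quote of the day" resource: its state on each day.
QUOTES = {
    date(2009, 1, 1): "From each according to his abilities, to each according to his needs",
    date(2009, 1, 2): "Those who do not move, do not notice their chains",
}

def value_set(day):
    """M_R(t): the equivalent representations available on `day`, keyed by
    media type, all reflecting the same (current) state of the resource."""
    quote = QUOTES.get(day)
    if quote is None:
        return {}  # the empty set: no realisation of the resource yet
    return {
        "text/plain": quote,
        "application/xhtml+xml": f"<p>{quote}</p>",  # simplified XHTML fragment
    }

def negotiate(day, media_type):
    """Ordinary content negotiation: choose only among members of M_R(day)."""
    return value_set(day).get(media_type)

print(negotiate(date(2009, 1, 2), "text/plain"))
```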

So, moving on to that second piece of text I cited, my interpretation of the final sentence as it applies to HTTP (and, as I say, I could be wrong about this) would be that the RESTful use of the HTTP GET method is intended to retrieve a representation of the current state of the resource. It is the value set at that point in time which provides the basis for negotiation. So, in my example here, on 1 January 2009, I offer XHTML and plain text versions of my "From each according to his abilities..." quote via content negotiation, and on 2 January 2009, I offer XHTML and plain text versions of my "Those who do not move..." quote. That is, at two different points in time t1 and t2, different (sets of) representations may be provided for a single resource, reflecting the different state of that resource at those two different points in time, but at either of those points in time, the expectation is that each representation of the set available represents the state of the resource at that point in time, and only members of that set are available via content negotiation. So although representations may vary by language, content-type etc, they should be in some sense "equivalent" (Roy Fielding's term) in terms of their representation of the current state of the resource.

I think the Memento approach suggests that on 2 January 2009, I could, using the date-time-based negotiation convention, offer all four of those variants listed above (and on each day into the future, a set which increases in membership as I add new quotes). But it seems to me that is at odds with the REST style, because the Memento approach requires that representations of different states of the resource (i.e. the state of the resource at different points in time) are all made available as representations at a single point in time.
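Writing the resource as Fielding's membership function makes the contrast explicit. Using x_i and p_i for the XHTML and plain-text representations of the quote on day i, and V(t_2) for the set Memento would negotiate over on 2 January (notation introduced here for illustration):

```latex
\begin{aligned}
M_R(t_1) &= \{x_1,\ p_1\} \\
M_R(t_2) &= \{x_2,\ p_2\} \\
V(t_2)   &= M_R(t_1) \cup M_R(t_2) = \{x_1,\ p_1,\ x_2,\ p_2\}
\end{aligned}
```

In the REST reading, negotiation at t_2 draws only from M_R(t_2); the concern is that Memento negotiates over V(t_2), whose members are not all "equivalent" representations of the current state.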

I appreciate that (even if my interpretation is correct, which it may not be) the constraints specified by the REST architectural style are just that: a set of constraints which, if observed, generate certain properties/characteristics in a system. And if some of those constraints are relaxed or ignored, then those properties change. My understanding is not good enough to pinpoint exactly what the implications of this particular point of divergence (if indeed it is one!) would be - though as Herbert notes in his presentation, it would appear that there would be implications for caching.

But as I said, I'm really just trying to raise the questions which have been running around my head and which I haven't really been able to answer to my own satisfaction.

As an aside, I think Memento could probably achieve quite similar results by providing some metadata (or a link to another document providing that metadata) which expressed the relationships between the time-variant resource and all the time-specific variant resources, rather than seeking to manage this via HTTP content negotiation.
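A minimal sketch of that alternative: publish a separate listing linking the time-variant resource to each of its time-specific versions, rather than negotiating over them. The output mimics the link-format style Memento later adopted for its "TimeMaps" in RFC 7089 (rel="original", and rel="memento" with a datetime attribute), but the URIs are hypothetical and the output is illustrative rather than a validated TimeMap.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def link_listing(original, versions):
    """Return link-format text relating `original` to its dated versions.
    versions: list of (timezone-aware UTC datetime, uri) pairs, in any order."""
    links = [f'<{original}>;rel="original"']
    for when, uri in sorted(versions):
        stamp = format_datetime(when, usegmt=True)  # HTTP-date, e.g. "Thu, 01 Jan 2009 ..."
        links.append(f'<{uri}>;rel="memento";datetime="{stamp}"')
    return ",\n".join(links)

versions = [
    (datetime(2009, 1, 2, tzinfo=timezone.utc), "http://example.org/quote/2009-01-02"),
    (datetime(2009, 1, 1, tzinfo=timezone.utc), "http://example.org/quote/2009-01-01"),
]
print(link_listing("http://example.org/quote-of-the-day", versions))
```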

Postscript: I notice that, in the time it has taken me to draft this post, Mark Baker has made what I think is a similar point in a couple of messages (first, second) to the W3C public-lod mailing list.

November 16, 2009

The future has arrived


About 99% of the way thru Bill Thompson's closing keynote to the CETIS 2009 Conference last week I tweeted:

great technology panorama painted by @billt in closing talk at #cetis09

And it was a great panorama - broad, interesting and entertainingly delivered. It was a good performance and I am hugely in awe of people who can give this kind of presentation. However, what the talk didn't do was move from the "this is where technology has come from, this is where it is now and this is where it is going" kind of stuff to the "and this is what it means for education in the future". Which was a shame because in questioning after his talk Thompson did make some suggestions about the future of print news media (not surprising for someone now at the BBC) and I wanted to hear similar views about the future of teaching, learning and research.

As Oleg Liber pointed out in his question after the talk, universities, and the whole education system around them, are lumbering beasts that will be very slow to change in the face of anything. On that basis, whilst the observations that (for example) we can now just about store a bit on an atom (meaning that we can potentially store a digital version of all human output on something the weight of a human body), that we can pretty much wire things directly into the human retina, and that Africa will one day overtake 'digital' Britain in the broadband stakes are interesting propositions in their own right, there comes a "so what?" moment where one is left wondering what it actually all means.

As an aside, and on a more personal note, I suggest that my daughter's experience of university (she started at Sheffield Hallam in September) is not actually going to be very different to my own, 30-odd years ago. Lectures don't seem to have changed much. Project work doesn't seem to have changed much. Going out drinking doesn't seem to have changed much. She did meet all her 'hall' flat-mates via Facebook before she arrived in Sheffield I suppose :-) - something I never had the opportunity to do (actually, I never even got a place in hall). There is a big difference in how it is all paid for of course but the interesting question is how different university will be for her children. If the truth is, "not much", then I'm not sure why we are all bothering.

At one point, just after the bit about storing a digital version of all human output I think, Thompson did throw out the question, "...and what does that mean for copyright law?". He didn't give us an answer. Well, I don't know either to be honest... though it doesn't change the fact that creative people need to be rewarded in some way for their endeavours I guess. But the real point here is that the panorama of technological change that Thompson painted for us, interesting as it was, demands some serious thinking about what the future holds. Maybe Thompson was right to lay out the panorama and leave the serious thinking to us?

He was surprisingly positive about Linked Data, suggesting that the time is now right for this to have a significant impact. I won't disagree because I've been making the same point myself in various fora, though I tend not to shout it too loudly because I know that the Semantic Web has a history of not quite making it. Indeed, the two parallel sessions that I attended during the conference, University API and the Giant Global Graph, both focused quite heavily on the kinds of resources that universities are sitting on (courses, people/expertise, research data, publications, physical facilities, events and so on) that might usefully be exposed to others in some kind of 'open' fashion. And much of the debate, particularly in the second session (about which there are now some notes), was around whether Linked Data (i.e. RDF) is the best way to do this - a debate that we've also seen played out recently on the uk-government-data-developers Google Group.

The three primary issues seemed to be:

  • Why should we (universities) invest time and money exposing our data in the hope that people will do something useful/interesting/of value with it when we have many other competing demands on our limited resources?
  • Why should we take the trouble to expose RDF when it's arguably easier for both the owner and the consumer of the data to expose something simpler like a CSV file?
  • Why can't the same ends be achieved by offering one or more services (i.e. a set of one or more APIs) rather than the raw data itself?
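To make the second question concrete, here is a minimal standard-library sketch of what actually separates a CSV export from a Linked Data one: the same rows, but each course is given an HTTP URI and each column becomes a predicate URI, serialised as N-Triples. The CSV content, the base URI `http://data.example.ac.uk/` and the `department` vocabulary URI are invented for illustration; only the Dublin Core `title` term is a real, widely used property.

```python
import csv
import io

# Hypothetical CSV export of a university's course list.
CSV_DATA = """code,title,department
CS101,Introduction to Programming,computing
CS202,Databases,computing
"""

# Hypothetical base URI; dcterms:title is real, the department predicate is not.
BASE = "http://data.example.ac.uk/"
PREDICATES = {
    "title": "http://purl.org/dc/terms/title",
    "department": "http://vocab.example.ac.uk/department",
}

def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def csv_to_ntriples(csv_text):
    """Mint a URI per row and emit one triple per non-key column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}course/{row['code']}>"
        lines.append(f'{subject} <{PREDICATES["title"]}> "{escape_literal(row["title"])}" .')
        # The department becomes a link to another URI, not just a string.
        lines.append(f"{subject} <{PREDICATES['department']}> <{BASE}department/{row['department']}> .")
    return "\n".join(lines)

print(csv_to_ntriples(CSV_DATA))
```

The extra effort is essentially the minting of URIs and the choice of predicates; the pay-off is that the department becomes something others can link to and look up, rather than a string local to one file.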

In the ensuing debate about the why and the how, there was a strong undercurrent of, "two years ago SOA was all the rage, now Linked Data is all the rage... this is just a fashion thing and in two years' time there'll be something else". I'm not sure that we (or at least I) have a well-honed argument against this view but, for me at least, it lies somewhere in the fit with resource-orientation, with the way the Web works, with REST, and with the Web Architecture.

On the issue of the length of time it is taking for the Semantic Web to have any kind of mainstream impact, Ian Davis has an interesting post, Make or Break for the Semantic Web?, arguing that this is not unusual for standards track work:

Technology, especially standards track work, takes years to cross the chasm from early adopters (the technology enthusiasts and visionaries) to the early majority (the pragmatists). And when I say years, I mean years. Take CSS for example. I’d characterise CSS as having crossed the chasm and it’s being used by the early majority and making inroads into the late majority. I don’t think anyone would seriously argue that CSS is not here to stay.

According to this semi-official history of CSS the first proposal was in 1994, about 13 years ago. The first version that was recognisably the CSS we use today was CSS1, issued by the W3C in December 1996. This was followed by CSS2 in 1998, the year that also saw the founding of the Web Standards Project. CSS 2.1 is still under development, along with portions of CSS3.

Paul Walk has also written an interesting post, Linked, Open, Semantic?, in which he argues that our discussions around the Semantic Web and Linked Data tend to mix up three memes (open data, linked data and semantics) in rather unhelpful ways. I tend to agree, though I worry that Paul's proposed distinction between Linked Data and the Semantic Web is actually rather fuzzier than we may like.

On balance, I feel a little uncomfortable that I am not able to offer a better argument against the kinds of anti-Linked Data views expressed above. I think I understand the issues (or at least some of them) pretty well but I don't have them to hand in the form of a 'this is why Linked Data is the right way forward' elevator pitch.

Something to work on I guess!

[Image: a slide from Bill Thompson's closing keynote to the CETIS 2009 Conference]

October 14, 2009

Open, social and linked - what do current Web trends tell us about the future of digital libraries?

About a month ago I travelled to Trento in Italy to speak at a Workshop on Advanced Technologies for Digital Libraries organised by the EU-funded CACOA project.

My talk was entitled "Open, social and linked - what do current Web trends tell us about the future of digital libraries?" and I've been holding off blogging about it or sharing my slides because I was hoping to create a slidecast of them. Well... I finally got round to it and here is the result:

Like any 'live' talk, there are bits where I don't get my point across quite as I would have liked but I've left things exactly as they came out when I recorded it. I particularly like my use of "these are all very bog standard... err... standards"! :-)

Towards the end, I refer to David White's 'visitors vs. residents' stuff, about which I note he has just published a video. Nice one.

Anyway... the talk captures a number of threads that I've been thinking and speaking about for the last while. I hope it is of interest.

October 06, 2009


FOTE (the Future of Technology in Education conference organised by ULCC), which I attended on Friday, is a funny beast.  For two years running it has been a rather mixed conference overall but one that has been rescued by one or two outstanding talks that have made turning up well worthwhile and left delegates going into the post-conference drinks reception with something of a buzz.

Last year it was Miles Metcalfe of Ravensbourne College who provided the highlight.  This year it was down to Will McInnes (of Nixon/McInnes) to do the same, kicking off the afternoon with a great talk, making up for a rather ordinary morning, followed closely by James Clay (of Gloucestershire College).  If this seems a little harsh... don't get me wrong.  I thought that much of the afternoon session was worth listening to and, overall, I think that any conference that can get even one outstanding talk from a speaker is doing pretty well - this year we had at least two.  So I remain a happy punter and would definitely consider going back to FOTE in future years.

My live-blogged notes are now available in a mildly tidied up form. This year's FOTE was heavily tweeted (the wifi network provided by the conference venue was very good) and about half-way thru the day I began to wonder whether my live-blogging was adding anything to the overall stream. On balance, and looking back at it now, I think the consistency added by a single-person viewpoint is helpful. As I've noted before, I live-blog primarily as a way of taking notes. The fact that I choose to take my notes in public is an added bonus (hopefully!) for anyone that wants to watch my inadequate fumblings.

The conference was split into two halves - the morning session looking at Cloud Computing and the afternoon looking at Social Media.  The day was kicked off by Paul Miller (of Cloud of Data) who gave a pretty reasonable summary of the generic issues but who fell foul, not just of trying to engage in a bit of audience participation very early in the day, but of trying to characterise issues that everyone already understood to be fuzzy and grey into shows of hands that required black and white, yes/no answers.  Nobody fell for it I'm afraid.

And that set the scene for much of the morning session.  Not enough focus on what cloud computing means for education specifically (though to his credit Ray Flamming (of Microsoft) did at least try to think some of that through and the report by Robert Moores (of Leeds Met) about their experiences with Google Apps was pretty interesting) and not enough acknowledgment of the middle ground.  Even the final panel session (for which there was nowhere near enough time by the way) tried to position panelists as either for or against but it rapidly became clear there was no such divide.  The biggest point of contention seemed to be between those who wanted to "just do it" and those who wanted to do it with greater reference to legal and/or infrastructural considerations - a question largely of pace rather than substance.

If the day had ended at lunchtime I would have gone home feeling rather let down.  But the afternoon recovered well.  My personal highlights were Will McInnes, James Clay and Dougald Hine (of School of Everything), all of whom challenged us to think about where education is going.  Having said that, I think that all of the afternoon speakers were pretty good and would likely have appealed to different sections of the audience, but those are the three that I'd probably go back and re-watch first. All the video streams are available from the conference website but here is Will's talk:

One point of criticism was that the conference time-keeping wasn't very good, leaving the final two speakers, Shirley Williams (of the University of Reading, talking about the This is Me project that we funded) and Lindsay Jordan (of the University of Bath/University of the Arts) with what felt like less than their allotted time.

For similar reasons, the final panel session on virtual worlds also felt very rushed.  I'd previously been rather negative about this panel (what, me?), suggesting that it might descend into pantomime.  Well, actually I was wrong.  I don't think it did (though I still feel a little bemused as to why it was on the agenda at all).  Its major problem was that there was only time to talk about one topic - simulation in virtual worlds - which left a whole range of other issues largely untouched.  Shame.

Overall then, a pretty good day I think.  Well done to the organisers... I know from my own experience with our symposium that getting this kind of day right isn't an easy thing to do.  I'll leave you with a quote (well, as best as I can remember it) from Lindsay Jordan who closed her talk with a slightly sideways take on Darwinism:

in the social media world the ones who survive - the fittest - are the ones who give the most

July 23, 2009

Give me an R

My rather outspoken outburst against the e-Framework a while back resulted in a mixed response, most of it in private replies for various reasons, largely supportive but some of which suggested that my negative tone was not conducive to encouraging debate.  For what it's worth, I agree with this latter point - I probably shouldn't have used the language that I did - but sometimes you just end up in a situation where you feel like letting off steam.  As a result, I didn't achieve any public debate about the value of the e-Framework and my negativity remains.  I would genuinely like to be set straight on this because I want to understand better what it is for and what its benefits have been (or will be).

Since writing that post I have been thinking, on and off, about why I feel so negative about it. In the main I think it comes down to a fundamental change in architectural thinking over the last few years. The e-Framework emerged at a time when 'service oriented' approaches (notably the WS- stack) were all the rage. The same can also be said to a certain extent about the JISC Information Environment of course (actually the JISC IE predated the rise of 'SOA' terminology but like most digital library initiatives it was certainly 'service oriented' in nature - I can remember people using phrases like, "we don't expose the resource, we expose services on the resource" very clearly) which I think explains some of my discomfort when I look back at that work now.

Perhaps SOA is still all the rage in 'enterprise' thinking, which is where the e-Framework seems to be most heavily focused; I don't know.¹ It's not a world in which I dabble. But in the wider Web world it seems to me that all the interesting architectural thinking (at the technical level) is happening around REST and Linked Data at the moment. In short, the architectural focus has shifted from the 'service' to the 'resource'.

So, it's just about fashion then? No, not really - it's about the way architectural thinking has evolved over the last 10 years or so. Ultimately, it seems more useful to think in resource-centric ways than in service-centric ways. Note that I'm not arguing that service oriented approaches have no role to play in our thinking at a business level; clearly they do, even in a resource oriented world. But I would suggest that if you adopt, at the outset, a technical architectural perspective in which the world is service oriented, then it is very hard to think in resource oriented terms later on, and ultimately that is harmful to our use of the Web.
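A toy contrast may help make the distinction concrete. In the service-centric style, clients call named operations on a service endpoint and the items themselves are addressed only by parameters; in the resource-centric style, each thing of interest has its own URI and is manipulated through a small, uniform set of methods. Everything below (class names, operation names, URIs) is hypothetical and deliberately minimal: a sketch of the two styles, not a description of the e-Framework or of any JISC service.

```python
# Service-centric: one endpoint, many operation names, items addressed by parameters.
class RepositoryService:
    def __init__(self):
        self._records = {"42": {"title": "On names"}}

    def invoke(self, operation, **params):
        if operation == "getRecord":
            return self._records.get(params["id"])
        if operation == "searchRecords":
            return [r for r in self._records.values() if params["q"] in r["title"]]
        raise ValueError(f"unknown operation: {operation}")

# Resource-centric: every record has its own URI; the verbs are uniform (GET, PUT, DELETE).
class ResourceStore:
    def __init__(self):
        self._resources = {"http://repo.example.org/records/42": {"title": "On names"}}

    def get(self, uri):
        return self._resources.get(uri)

    def put(self, uri, representation):
        self._resources[uri] = representation

    def delete(self, uri):
        self._resources.pop(uri, None)

service = RepositoryService()
store = ResourceStore()
print(service.invoke("getRecord", id="42"))
print(store.get("http://repo.example.org/records/42"))
```

The practical difference is that the record in the second style has a name that anyone on the Web can link to, bookmark or cache, whereas in the first it exists only as the result of a call to one particular service.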

I think there are other problems with the e-Framework - the confusing terminology, the amount of effort that has gone into describing the e-Framework itself (as opposed to describing the things that are really of interest), the lack of obvious benefits and impact, and the heavyweight nature of the resulting ‘service’ descriptions – but it is this architectural difference that lies at the heart of my problem with it.

1) For what it's worth, I don't see why a resource oriented approach (at the technical level) shouldn't be adopted inside the enterprise as well as outside.

July 21, 2009

Linked data vs. Web of data vs. ...

On Friday I asked what I thought would be a pretty straight-forward question on Twitter:

is there an agreed name for an approach that adopts the 4 principles of #linkeddata minus the phrase, "using the standards (RDF, SPARQL)" ??

Turns out not to be so straight-forward, at least in the eyes of some of my Twitter followers. For example, Paul Miller responded with:

@andypowe11 well, personally, I'd argue that Linked Data does NOT require that phrase. But I know others disagree... ;-)


@andypowe11 I'd argue that the important bit is "provide useful information." ;-)

Paul has since written up his views more thoughtfully in his blog, Does Linked Data need RDF?, a post that has generated some interesting responses.

I have to say I disagree with Paul on this, not in the sense that I disagree with his focus on "provide useful information", but in the sense that I think it's too late to re-appropriate the "Linked Data" label to mean anything other than "use http URIs and the RDF model".

To back this up I'd go straight to the horse's mouth, Tim Berners-Lee, who gave us his personal view way back in 2006 with his 'design issues' document on Linked Data. This gave us the 4 key principles of Linked Data that are still widely quoted today:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  4. Include links to other URIs. so that they can discover more things.

Whilst I admit that there is some wriggle room in the interpretation of the 3rd point - does his use of "RDF, SPARQL" suggest these as possible standards or is the implication intended to be much stronger? - more recent documents indicate that the RDF model is mandated. For example, in Putting Government Data online Tim Berners-Lee says (referring to Linked Data):

The essential message is that whatever data format people want the data in, and whatever format they give it to you in, you use the RDF model as the interconnection bus. That's because RDF connects better than any other model.

So, for me, Linked Data implies use of the RDF model - full stop. If you put data on the Web in other forms, using RSS 2.0 for example, then you are not doing Linked Data, you're doing something else! (Addendum: I note that Ian Davis makes this point rather better in The Linked Data Brand).

Which brings me back to my original question - "what do you call a Linked Data-like approach that doesn't use RDF?" - because, in some circumstances, adhering to a slightly modified form of the 4 principles, namely:

  1. use URIs as names for things
  2. use HTTP URIs so that people can look up those names
  3. when someone looks up a URI, provide useful information
  4. include links to other URIs, so that they can discover more things
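As a rough sketch of what the modified principles might look like in practice, the Python below uses plain JSON rather than RDF: things are named with HTTP URIs, looking one up returns a useful description, and each description carries links to other URIs that a client can follow. An in-memory dictionary stands in for HTTP GET, and the URIs and the `name`/`links` fields are invented for illustration.

```python
import json
from collections import deque

# A hypothetical, in-memory stand-in for HTTP GET: URI -> JSON description.
WEB = {
    "http://example.org/id/person/1": json.dumps({
        "name": "A. Researcher",
        "links": ["http://example.org/id/dept/physics"],
    }),
    "http://example.org/id/dept/physics": json.dumps({
        "name": "Department of Physics",
        "links": ["http://example.org/id/org/uni"],
    }),
    "http://example.org/id/org/uni": json.dumps({
        "name": "Example University",
        "links": [],
    }),
}

def look_up(uri):
    """Principles 2 and 3: dereference the HTTP URI and get useful information back."""
    return json.loads(WEB[uri])

def follow_links(start):
    """Principle 4: discover more things by following links breadth-first."""
    seen, queue, names = {start}, deque([start]), {}
    while queue:
        uri = queue.popleft()
        doc = look_up(uri)
        names[uri] = doc["name"]
        for link in doc["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return names

print(follow_links("http://example.org/id/person/1"))
```

What this loses, compared with RDF, is any shared model of what a link means: the client has to know, out of band, that `links` is the field to follow and that every link is of the same untyped kind.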

might well be a perfectly reasonable and useful thing to do. As purists, we can argue about whether it is as good as 'real' Linked Data but sometimes you've just got to get on and do whatever you can.

A couple of people suggested that the phrase 'Web of data' might capture what I want. Possibly... though looking at Tom Coates' Native to a Web of Data presentation it's clear that his 10 principles go further than the 4 above. Maybe that doesn't matter? Others suggested "hypermedia", "RESTful information systems" or "RESTful HTTP", none of which strikes me as quite right.

I therefore remain somewhat confused. I quite like Bill de hÓra's post on "links in content", Snowflake APIs, but, again, I'm not sure it gets us closer to an agreed label?

In a comment on a post by Michael Hausenblas, What else?, Dan Brickley says:

I have no problem whatsoever with non-RDF forms of data in “the data Web”. This is natural, normal and healthy. Statistical information, geographic information, data-annotated SVG images, audio samples, JSON feeds, Atom, whatever.

We don’t need all this to be in RDF. Often it’ll be nice to have extracts and summaries in RDF, and we can get that via GRDDL or other methods. And we’ll also have metadata about that data, again in RDF; using SKOS for indicating subject areas, FOAF++ for provenance, etc.

The non-RDF bits of the data Web are – roughly – going to be the leaves on the tree. The bit that links it all together will be, as you say, the typed links, loose structuring and so on that come with RDF. This is also roughly analogous to the HTML Web: you find JPEGs, WAVs, flash files and so on linked in from the HTML Web, but the thing that hangs it all together isn’t flash or audio files, it’s the linky extensible format: HTML. For data, we’ll see more RDF than HTML (or RDFa bridging the two). But we needn’t panic if people put non-RDF data up online…. it’s still better than nothing. And as the LOD scene has shown, it can often easily be processed and republished by others. People worry too much! :)

Count me in as a worrier then!

I ask because, as a not-for-profit provider of hosting and Web development solutions to the UK public sector, Eduserv needs to start thinking about the implications of Tim Berners-Lee's appointment as an advisor to the UK government on 'open data' issues on the kinds of solutions we provide.  Clearly, Linked Data is going to feature heavily in this space but I fully expect that lots of stuff will also happen outside the RDF fold.  It's important for us to understand this landscape and the impact it might have on future services.

July 20, 2009

On names

There was a brief exchange of messages on the jisc-repositories mailing list a couple of weeks ago concerning the naming of authors in institutional repositories. When I say naming, I really mean identifying, because a name, as in a string of characters, doesn't guarantee any kind of uniqueness - even locally, let alone globally.

The thread started from a question about how to deal with the situation where one author writes under multiple names (is that a common scenario in academic writing?) but moved on to a more general discussion about how one might assign identifiers to people.

I quite liked Les Carr's suggestion:

Surely the appropriate way to go forward is for repositories to start by locally choosing a scheme for identifying individuals (I suggest coining a URI that is grounded in some aspect of the institution's processes). If we can export consistently referenced individuals, then global services can worry about "equivalence mechanisms" to collect together all the various forms of reference that.

This is the approach taken by the Resist Knowledgebase, which is the foundation for the (just started) dotAC JISC Rapid Innovation project.

(Note: I'm assuming that when Les wrote 'URI' he really meant 'http URI').

Two other pieces of current work seem relevant and were mentioned in the discussion. Firstly, the JISC-funded Names project, which is working on a pilot Names Authority Service. Secondly, the RLG Networking Names report. I might be misunderstanding the nature of these bits of work but both seem to me to be advocating rather centralised, registry-like, approaches. For example, both talk about centrally assigning identifiers to people.

As an aside, I'm constantly amazed by how many digital library initiatives end up looking and feeling like registries. It seems to be the DL way... metadata registries, metadata schema registries, service registries, collection registries. You name it and someone in a digital library will have built a registry for it.

My favoured view is that the Web is the registry. Assign identifiers at source, then aggregate appropriately if you need to work across stuff (as Les suggests above). The <sameAs> service is a nice example of this:

The Web of Data has many equivalent URIs. This service helps you to find co-references between different data sets.

As Hugh Glaser says in a discussion about the service:

Our strong view is that the solution to the problem of having all these URIs is not to generate another one. And I would say that with services of this type around, there is no reason.
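The "assign at source, aggregate later" approach can be sketched in a few lines: each repository mints its own HTTP URIs and publishes equivalence assertions, and an aggregator computes co-reference bundles from them without coining any new identifier. The sketch below uses a simple union-find; the URIs and the pair-list input format are hypothetical, and this is not a description of how the <sameAs> service is actually implemented.

```python
from collections import defaultdict

# Hypothetical equivalence assertions harvested from different sources,
# e.g. owl:sameAs links published alongside each repository's own URIs.
SAME_AS = [
    ("http://eprints.uni-a.example/id/person/jsmith", "http://research.uni-a.example/staff/1234"),
    ("http://research.uni-a.example/staff/1234", "http://repo.uni-b.example/authors/smith-j"),
    ("http://repo.uni-c.example/people/ajones", "http://dblp.example/pers/jones-a"),
]

def coreference_bundles(pairs):
    """Group URIs into equivalence classes using union-find."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for a, b in pairs:
        parent[find(a)] = find(b)

    bundles = defaultdict(set)
    for uri in parent:
        bundles[find(uri)].add(uri)
    return list(bundles.values())

for bundle in coreference_bundles(SAME_AS):
    print(sorted(bundle))
```

Note that no bundle is given a new central identifier: any member URI can stand in for the group, which is the point Hugh Glaser makes above.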

In thinking about some of the issues here I had cause to go back and re-read a really interesting interview by Martin Fenner with Geoffrey Bilder of CrossRef (from earlier this year). Regular readers will know that I'm not the world's biggest fan of the DOI (on which CrossRef is based), partly for technical reasons and partly on governance grounds, but let's set that aside for the moment. In describing CrossRef's "Contributor ID" project, Geoff makes the point that:

... “distributed” begets “centralized”. For every distributed service created, we’ve then had to create a centralized service to make it useable again (ICANN, Google, Pirate Bay, CrossRef, DOAJ, ticTocs, WorldCat, etc.). This gets us back to square one and makes me think the real issue is - how do you make the centralized system that eventually emerges accountable?

I think this is a fair point but I also think there is a very significant architectural difference between a centralised service that aggregates identifiers and other information from a distributed base of services (in order to provide some useful centralised function, for example) and a centralised service that assigns identifiers which it then pushes out into the wider landscape. It seems to me that only the former makes sense in the context of the Web.

July 09, 2009

e-Framework - time to stop polishing guys!

The e-Framework for Education and Research has announced a couple of new documents, the e-Framework Rationale and the e-Framework Technical Model, and has invited the community to comment on them.

In looking around the e-Framework website I stumbled on a definition for the 'Read' Service Genre. Don't know what a Service Genre is? Join the club... but for the record, they are defined as follows:

Service Genres describe generic capabilities expressed in terms of their behaviours, without prescribing how to make them operational.

The definition of Read runs to 9 screens' worth of fairly dense text in my browser window, summarised as:

Retrieve a known business object from a collection.

I'm sorry... but how is this much text of any value to anyone? What is being achieved here? There is public money (from several countries) being spent on this (I have no way of knowing how much) with very, very little return on investment. I can't remember how long the e-Framework activity has been going on but it must be of the order of 5 years or so? Where are the success stories? What things have happened that wouldn't have happened without it?

When you raise these kinds of questions, as I did on Twitter, the natural response is, "please take the time to comment on our documents and tell us what is wrong". The trouble is, when something is so obviously broken, it's hard to justify taking time to comment on it. Or as I said on Twitter:

i'm sorry to be so blunt - i appreciate this is people's baby - but you're asking the community to help polish a 5 year old turd

it's time to kick the turd into the gutter and move on

(For those of you that think I'm being overly rude here, the use of this expression is reasonably common in IT circles!)

Of course, one is then asked to justify why the e-Framework is a 'turd' :-(.

For me, the lack of any concrete value speaks for itself. There comes a time when you just have to bite the bullet and admit that nothing is being achieved.  Trying to explain why something is broken isn't necessary - it just is! The JISC don't even refer to the e-Framework in their own ITTs anymore (presumably because they have given up trying to get projects to navigate the maze of complex terminology in order to contribute the odd Service Usage Model (SUM) or two). It doesn't matter... there are very few Service Usage Models anyway, and even fewer Service Expressions. In fact, as far as I can tell the e-Framework consists only of a half-formed collection of unusable 'service' descriptions.

So, how come this thing still has any life left in it?

July 01, 2009

RESTful Design Patterns, httpRange-14 & Linked Data

Stefan Tilkov recently announced the availability of the video of a presentation he gave a few months ago on design patterns (& anti-patterns) for REST. I recommend having a look at it, as it covers a lot of ground and has lots of useful examples, and I find his presentational style strikes a nice balance of technical detail and reflection. If you haven't got time to listen, the slides are also available in PDF (though I do think hearing the audio clarifies quite a lot of the content).

One of the questions that this presentation (and other similar ones) planted at the back of my mind is that of how some of the patterns presented might be impacted by the W3C TAG's httpRange-14 resolution and the Cool URIs conventions for distinguishing between what it calls "real world objects" and "Web documents", some of which describe those "real world objects". The Cool URIs document focuses on the implications of this distinction on the use of the HTTP protocol to request representations of resources, using the GET method, but does not touch on the question of whether/how it affects the use of HTTP methods other than GET.

In the early part of his presentation, Stefan introduces the notion of "representation" and the idea that a single resource may have multiple representations. Some of the resources referred to in his examples, like "customers" (slide 16 in the PDF; slide 16 in the video presentation), when seen from the perspective of the Cool URIs document, fall, I think, into the category of "real world objects" - things which may be described (by distinct resources) but are not themselves represented on the Web. So, following the Cool URIs guidelines, the URI of a customer would be a "hash URI" (URI with fragment id) or a URI for which the response to an HTTP GET request is a 303 redirect to the (distinct) URI of a document describing the customer.
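The two Cool URIs options can be sketched as follows, using a plain handler function rather than a real HTTP server so that the logic is easy to see. The `/id/` versus `/doc/` URI layout is an assumption made for illustration (the Cool URIs document does not mandate any particular pattern), as are the example URIs.

```python
from urllib.parse import urldefrag

# Hypothetical layout: /id/... names real-world things, /doc/... names documents about them.
DESCRIPTIONS = {
    "/doc/customer/123": "A description of customer 123 (the customer itself is /id/customer/123)",
}

def handle_get(path):
    """Return (status, headers, body) for a GET request on `path`."""
    if path.startswith("/id/"):
        # A real-world object has no representation: redirect to its description.
        return 303, {"Location": "/doc/" + path[len("/id/"):]}, ""
    if path in DESCRIPTIONS:
        return 200, {"Content-Type": "text/plain"}, DESCRIPTIONS[path]
    return 404, {}, ""

def request_uri_for(uri):
    """With a hash URI the client strips the fragment before making the request,
    so the server only ever sees the URI of the document."""
    document_uri, _fragment = urldefrag(uri)
    return document_uri

print(handle_get("/id/customer/123"))
print(request_uri_for("http://shop.example.org/doc/customers#c123"))
```

The hash URI variant needs no server-side logic at all, since the fragment never reaches the server; the 303 variant costs an extra round trip but keeps one URI per customer.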

But what about non-"read-only" interactions, and using methods other than GET? The third "design pattern" in the presentation is one for "resource creation" (slide 55 in the PDF; slide 98 in the video presentation). Here a client POSTs a representation of a resource to a "collection resource" (slide 50 in the PDF; slide 93 in the video presentation). The example of a "collection resource" used is a collection of customers, with the implication, I think, that the corresponding "resource creation" example would involve the posting of a representation of a customer, and the server responding 201 with a new URI for the customer.

I think (but I'm not sure, so please do correct me!) that the implication of the httpRange-14 resolution is that in this example, the "collection resource", the resource to which a POST is submitted, would be a collection of "customer descriptions", and the thing posted would be a representation of a customer description for the new customer, and the URI returned for the newly created resource would be the URI of a new customer description. And a GET for the URI of the description would return a representation which included the URI of the new customer.


(In the diagram above, is the URI of a customer; is the URI of a document describing that customer.)

And, finally, a GET for the URI of the customer (assuming it isn't a "hash URI") would - following the Cool URIs conventions - return a 303 redirect to the URI of the description.
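Here is how I read that flow, as a minimal in-memory sketch rather than a real HTTP server. The POST goes to a collection of customer descriptions; the server mints two URIs, one for the customer and one for the description; the 201 response's Location header points at the description; the description includes the customer's URI; and a GET on the customer URI is answered with a 303. The URI patterns and the `CustomerDescriptions` class are hypothetical, and this is my interpretation of the httpRange-14 resolution, not something the resolution or Stefan's presentation spells out.

```python
import itertools

class CustomerDescriptions:
    """Hypothetical collection resource at /customer-descriptions/."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def post(self, representation):
        """Create a new customer description; return (status, headers)."""
        n = next(self._ids)
        customer_uri = f"/id/customer/{n}"  # the real-world customer
        description_uri = f"/customer-descriptions/{n}"  # the Web document about it
        self._docs[description_uri] = dict(representation, about=customer_uri)
        return 201, {"Location": description_uri}

    def get(self, uri):
        """GET on a description returns it; GET on a customer URI redirects (303)."""
        if uri in self._docs:
            return 200, {}, self._docs[uri]
        if uri.startswith("/id/customer/"):
            description_uri = f"/customer-descriptions/{uri.rsplit('/', 1)[1]}"
            if description_uri in self._docs:
                return 303, {"Location": description_uri}, None
        return 404, {}, None

collection = CustomerDescriptions()
status, headers = collection.post({"name": "Jane Doe"})
print(status, headers)
print(collection.get(headers["Location"]))
print(collection.get("/id/customer/1"))
```

On this reading, PUT and DELETE would likewise be applied to `/customer-descriptions/1`, the document, rather than to `/id/customer/1`.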

There is some discussion of this in a short post by Richard Cyganiak, and I think the comments there bear out what I'm suggesting here, i.e. that POST/PUT/DELETE are applied to "Web documents" and not to "real-world objects".

The comment by Leo Sauermann on that post refers to the use of a SPARQL endpoint for updates - the SPARQL Update specification certainly addresses this area. It talks in terms of adding/deleting triples to/from a graph, and adding/deleting graphs to/from a "graph store". I think the "adding a graph to a graph store" case is pretty close to the requirement that is being addressed by the "post representation to Collection Resource" pattern. But I admit I struggle slightly to reconcile the SPARQL Update approach with Stefan's design pattern - and indeed, he highlights the "endpoint" notion, with different methods embedded in the content of the representation, as part of one of his "anti-patterns", their presence typically being an indicator that an architecture is not really RESTful.
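To illustrate why the two are hard to reconcile, the sketch below puts the same operation, adding a graph to a graph store, in both styles: an "endpoint" that receives the operation name inside the request body (roughly the shape of a SPARQL Update request, though a real one carries SPARQL text rather than a dictionary) and a collection resource to which a graph is POSTed. All class, method and URI names are hypothetical; this is not the SPARQL Update API.

```python
import itertools

class GraphStore:
    """Hypothetical graph store: graph URI -> set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = {}
        self._ids = itertools.count(1)

    # Endpoint style: a single URI; the operation travels inside the request body.
    def endpoint(self, request):
        op = request["operation"]
        if op == "create_graph":
            self.graphs[request["graph"]] = set(request["triples"])
        elif op == "drop_graph":
            self.graphs.pop(request["graph"], None)
        else:
            raise ValueError(f"unsupported operation: {op}")
        return 200

    # Resource style: POST a graph to the store (the collection); the server mints its URI.
    def post(self, triples):
        graph_uri = f"/graphs/{next(self._ids)}"
        self.graphs[graph_uri] = set(triples)
        return 201, {"Location": graph_uri}

    # ...and later changes use uniform methods on that URI.
    def delete(self, graph_uri):
        return 204 if self.graphs.pop(graph_uri, None) is not None else 404

store = GraphStore()
triples = [("http://example.org/a", "http://example.org/p", "http://example.org/b")]
print(store.endpoint({"operation": "create_graph", "graph": "/graphs/mine", "triples": triples}))
print(store.post(triples))
```

In the first style the HTTP method tells an intermediary nothing about what is happening; in the second, the method and the target URI carry that information, which is the property Stefan's anti-pattern is concerned with.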

I should emphasise that I'm trying to avoid seeming to adopt a "purist" position here: I recognise that "RESTfulness" is a choice rather than an absolute requirement. However, interest in the RESTful use of HTTP has grown considerably in recent years (to the extent that some developers seem keen to apply the label "RESTful", regardless of whether their application meets the design constraints specified by the architectural style or not). And now the "linked data" approach - which of course makes use of the httpRange-14 conventions - also seems to be gathering momentum, not least following the announcement by the UK government that Tim Berners-Lee would be advising them on opening up government data (and his issuing of a new note in his Design Issues series focussed explicitly on government data). It seems to me it would be helpful to be clear about how/where these two approaches intersect, and how/where they diverge (if indeed they do!). Purely from a personal perspective, I would like to be clearer in my own mind about whether/how the sort of patterns recommended by Stefan apply in the post-httpRange-14/linked data world.

June 22, 2009

The rise of green?

I attended the Terena Networking Conference 2009 in Malaga a couple of weeks ago where several of the keynote talks focused on the environment, global warming, the impact that data centres and ICT more generally have on this, and the potential for cloud-based solutions to help.  The talks were all really interesting actually, though I must confess I was slightly confused as to why they appeared so heavily in that particular conference. I particularly liked Bill St. Arnaud's suggestion that facilities powered by sources of renewable energy (wind, wave or solar for example) will be subject to periods of non-availability, meaning that network routing architectures will have to be devised to move compute and storage resources around the network dynamically in response.

We've just announced our new Data Centre facility in Swindon and I initially commented (internally) that I felt the environmental statement to be a little weak. My interest is partly environmental (I want the organisation I work for to be as environmentally neutral as possible) and partly business-related (if the ICT green agenda is on the rise then one can reasonably expect that HEI business decisions around outsourcing will increasingly be made on the back of it). On that basis, I want Eduserv's messaging on environmental issues to be as transparent as possible. (I think this is true for any concerned individual working for any organisation - I'm not picking on my employer here).  It is worth noting that we now have an internal 'green team' with a remit to consider environmental issues across Eduserv as a whole.

Based on a completely trivial and unscientific sample of 4 'data centre'-related organisations in the UK - Edina, Eduserv, Mimas and ULCC - I make the following, largely unhelpful, observations...

  • It's marginally easier to find an accessibility statement than it is to find an environmental statement (not surprising I guess) though ULCC's Green Statement is quite prominent,
  • it's not easy for Joe Average (i.e. me!) to work out what the hell it all means in any practical sense.

On balance, and despite my somewhat negative comment above about its weakness, the fact that we are making any kind of statement about our impact on the environment is a step in the right direction.

June 19, 2009

Repositories and linked data

Last week there was a message from Steve Hitchcock on a UK repository-related mailing list noting Tim Berners-Lee's comments that "giving people access to the data 'will be paradise'". In response, I made the following suggestion:

If you are going to mention TBL on this list then I guess that you really have to think about how well repositories play in a Web of linked data?

My thoughts... not very well currently!

Linked data has 4 principles:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.

Of these, repositories probably do OK at 1 and 2 (though, as I’ve argued before, one might question the coolness of some of the http URIs in use and, I think, the use of cool URIs is implicit in 2).

3, at least according to TBL, really means “provide RDF” (or RDFa embedded into HTML I guess), something that I presume very few repositories do?

Given lack of 3, I guess that 4 is hard to achieve. Even if one was to ignore the lack of RDF or RDFa, the fact that content is typically served as PDF or MS formats probably means that links to other things are reasonably well hidden?

It’d be interesting (academically at least), and probably non-trivial, to think about what a linked data repository would look like? OAI-ORE is a helpful step in the right direction in this regard.
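As a rough illustration of what principles 3 and 4 might mean for a single repository item, here is a hypothetical Python sketch that returns a Turtle description when RDF is asked for. It treats the item as an OAI-ORE Aggregation of its files and links to creators by URI rather than by name string. The function names and URIs are invented, the content negotiation is deliberately naive, and a full ORE Resource Map would also describe itself (via `ore:describes`), which is omitted here.

```python
# Namespace prefixes for Dublin Core terms and OAI-ORE.
PREFIXES = (
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix ore: <http://www.openarchives.org/ore/terms/> .\n"
)


def describe_item(item_uri, title, creator_uris, file_uris):
    """Return Turtle for one repository item (hypothetical helper).

    Assumes `title` contains no double quotes or backslashes, so no
    escaping is attempted.
    """
    lines = [PREFIXES, f"<{item_uri}> a ore:Aggregation ;"]
    lines.append(f'    dcterms:title "{title}" ;')
    for creator in creator_uris:
        # Principle 4: link to people by URI, not just by name.
        lines.append(f"    dcterms:creator <{creator}> ;")
    for file_uri in file_uris:
        # The PDF (or other file) becomes one aggregated part of the item.
        lines.append(f"    ore:aggregates <{file_uri}> ;")
    lines[-1] = lines[-1][:-1] + "."  # swap the final ';' for '.'
    return "\n".join(lines)


def choose_media_type(accept_header):
    """Naive content negotiation: Turtle if the client asks for it, else HTML."""
    return "text/turtle" if "text/turtle" in accept_header else "text/html"
```

The point is less the serialisation than the fact that the description links outwards, to people and to files, by URI.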

In response, various people noted that there is work in this area: Mark Diggory on work at DSpace, Sally Rumsey (off-list) on the Oxford University Research Archive and parallel data repository (DataBank), and Les Carr on the new JISC dotAC Rapid Innovation project. And I'm sure there is other stuff as well.

In his response, Mark Diggory said:

So the question of "coolness" of URI tends to come in second to ease of implementation and separation of services (concerns) in a repository. Should "Coolness" really be that important? We are trying to work on this issue in DSpace 2.0 as well.

I don't get the comment about "separation of services". Coolness of URIs is about persistence. It's about our long-term ability to retain the knowledge that a particular URI identifies a particular thing and to interact with the URI in order to obtain a representation of it. How coolness is implemented is not important, provided that it doesn't undermine our long-term ability to meet those two aims.
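One way to see that the implementation is secondary: a persistent URI can be kept stable by a thin redirect layer, so that the repository software behind it can change freely. The Python sketch below is a hypothetical illustration (the paths and host names are invented), not how DSpace or any other particular repository does it.

```python
# Hypothetical table mapping persistent URI paths to wherever each item
# currently lives. Only this table changes when the repository software,
# host or internal URL scheme changes.
CURRENT_LOCATIONS = {
    "/id/eprint/1234": "https://repo.example.ac.uk/dspace/handle/123/456",
}


def resolve(path, locations=CURRENT_LOCATIONS):
    """Return (status, headers) for a request to a persistent URI path."""
    target = locations.get(path)
    if target is None:
        return 404, {}
    # 302 (Found) rather than 301 (Moved Permanently), so that clients keep
    # treating the persistent URI, not the current location, as the name.
    return 302, {"Location": target}
```

Moving to new repository software then means rewriting the table, not the URIs that people have already cited.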

Les Carr also noted the issues around a repository minting URIs "for things it has no authority over (e.g. people's identities) or no knowledge about (e.g. external authors' identities)" suggesting that the "approach of dotAC is to make the repository provide URIs for everything that we consider significant and to allow an external service to worry about mapping our URIs to "official" URIs from various "authorities"". An interesting area.
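A minimal sketch of the split Les describes: the repository mints its own URIs, and a separate service supplies mappings to "official" URIs, published here as `owl:sameAs` statements. Using `owl:sameAs` is my assumption (it is one common choice, not necessarily what dotAC does), and all the URIs are invented.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(mapping):
    """Return N-Triples lines asserting <local URI> owl:sameAs <official URI>.

    `mapping` is a dict {local_uri: official_uri}, assumed to come from an
    external mapping service rather than from the repository itself.
    """
    return [
        f"<{local}> <{OWL_SAME_AS}> <{official}> ."
        for local, official in sorted(mapping.items())
    ]


def canonical(uri, mapping):
    """Prefer the official URI where a mapping exists; otherwise keep ours."""
    return mapping.get(uri, uri)
```

The repository never needs authority over the external identifiers; it simply keeps working with its own URIs until a mapping turns up.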

As I noted above, I think that the work on OAI-ORE is an important step in helping to bring repositories into the world of linked data. That said, there was some interesting discussion on Twitter during the recent OAI6 conference about the value of ORE's aggregation model, given that distinct domains will need to layer their own (different) domain models onto those aggregations in order to do anything useful. My personal take on this is that it probably is useful to have abstracted out the aggregation model, but that the hypothesis that a primitive aggregation model is useful, despite every domain needing its own richer data model, has still to be tested. Indeed, we need to see whether the way the ORE model gets applied in the field turns out to be sensible and useful.

March 20, 2009

Unlocking Audio

I spent the first couple of days this week at the British Library in London, attending the Unlocking Audio 2 conference.  I was there primarily to give an invited talk on the second day.

You might notice that I didn't have a great deal to say about audio, other than to note that what strikes me as interesting about the newer ways in which I listen to music online (Spotify, for example) is that they are highly social (almost playful) in their approach and that they are very much of the Web (as opposed to just being 'on' the Web).

What do I mean by that last phrase?  Essentially, it's about an attitude.  It's about seeing being mashed as a virtue.  It's about an expectation that your content, URLs and APIs will be picked up by other people and re-used in ways you could never have foreseen.  Or, as Charles Leadbeater put it on the first day of the conference, it's about "being an ingredient".

I went on to talk about the JISC Information Environment (which is surprisingly(?) not that far off its 10th birthday if you count from the initiation of the DNER), using it as an example of digital library thinking more generally and suggesting where I think we have parted company with the mainstream Web (in a generally "not good" way).  I noted that while digital library folks can discuss identifiers forever (if you let them!) we generally don't think a great deal about identity.  And even where we do think about it, the approach is primarily one of, "who are you and what are you allowed to access?", whereas on the social Web identity is at least as much about, "this is me, this is who I know, and this is what I have contributed". 

I think that is a very significant difference - it's a fundamentally different world-view - and it underpins one critical aspect of the difference between, say, Shibboleth and OpenID.  In digital libraries we haven't tended to focus on the social activity that needs to grow around our content and (as I've said in the past) our institutional approach to repositories is a classic example of how this causes 'social networking' issues with our solutions.

I stole a lot of the ideas for this talk, not least Lorcan Dempsey's use of concentration and diffusion.  As an aside... on the first day of the conference, Charles Leadbeater introduced a beach analogy for the 'media' industries, suggesting that in the past the beach was full of a small number of large boulders and that everything had to happen through those.  What the social Web has done is to make the beach into a place where we can all throw our pebbles.  I quite like this analogy.  My one concern is that many of us do our pebble throwing in the context of large, highly concentrated services like Flickr, YouTube, Google and so on.  There are still boulders - just different ones?  Anyway... I ended with Dave White's notions of visitors vs. residents, suggesting that in the cultural heritage sector we have traditionally focused on building services for visitors but that we need to focus more on residents from now on.  I admit that I don't quite know what this means in practice... but it certainly feels to me like the right direction of travel.

I concluded by offering my thoughts on how I would approach something like the JISC IE if I was asked to do so again now.  My gut feeling is that I would try to stay much more mainstream and focus firmly on the basics, by which I mean adopting the principles of linked data (about which there is now a TED talk by Tim Berners-Lee), cool URIs and REST and focusing much more firmly on the social aspects of the environment (OpenID, OAuth, and so on).

Prior to giving my talk I attended a session about iTunesU and how it is being implemented at the University of Oxford.  I confess a strong dislike of iTunes (and iTunesU by implication) and it worries me that so many UK universities are seeing it as an appropriate way forward.  Yes, it has a lot of concentration (and the benefits that come from that) but its diffusion capabilities are very limited (i.e. it's a very closed system), resulting in the need to build parallel Web interfaces to the same content.  That feels very messy to me.  That said, it was an interesting session with more potential for debate than time allowed.  If nothing else, the adoption of systems about which people can get religious serves to get people talking/arguing.

Overall then, I thought it was an interesting conference.  I suspect that my contribution wasn't liked by everyone there - but I hope it added usefully to the debate.  My live-blogging notes from the two days are here and here.

March 03, 2009

What became of the JISC IE?

Having just done an impromptu, and very brief, 1:1 staff development session about Z39.50 and OpenURL for a colleague here at Eduserv, I was minded to take a quick look at the JISC Information Environment Technical Standards document. (I strongly suspect that the reason he was asking me about these standards, before going to a meeting with a potential client, was driven by the JISC IE work.)

As far as I can tell, the standards document hasn't been updated since I left UKOLN (more than 3 years ago). On that basis, one is tempted to conclude that the JISC IE has no relevance, at least in terms of laying out an appropriate framework of technical standards. Surely stuff must have changed significantly in the intervening years? There is no mention of Atom, REST, the Semantic Web, SWORD, OpenSocial, OpenID, OAuth, Google Sitemaps, OpenSearch, ... to name but a few.

Of course, I accept that this document could simply now be seen as irrelevant?  But, if so, why isn't it flagged as such?  It's sitting there with my name on it as though I'd checked it yesterday and the JISC-hosted Information Environment pages still link to that area as though it remains up to date.  This is somewhat frustrating, both for me as an individual and, more importantly, for people in the community trying to make sense of the available information.

Odd... what is the current status of the JISC IE, as a framework of technical standards?

February 12, 2009

Clouds on the Horizon

I note that the NMC's Horizon Report for 2009 was published back in January, available as both a PDF file and in a rather nice Web version supporting online commentary.

The report discusses 6 topics (mobiles, cloud computing, geo-everything, the personal Web, semantic-aware applications, and smart objects) each of which it suggests will have an impact over the next 5 years.

I was drawn to the cloud computing section first, partly because of other interests here and partly because Larry Johnson (one of the co-PIs on the Horizon project) spoke on this very topic at our symposium last year. On cloud computing, the report says:

Educational institutions are beginning to take advantage of ready-made applications hosted on a dynamic, ever-expanding cloud that enable end users to perform tasks that have traditionally required site licensing, installation, and maintenance of individual software packages. Email, word processing, spreadsheets, presentations, collaboration, media editing, and more can all be done inside a web browser, while the software and files are housed in the cloud. In addition to productivity applications, services like Flickr, YouTube, and Blogger, as well as a host of other browser-based applications, comprise a set of increasingly powerful cloud-based tools for almost any task a user might need to do.

Cloud-based applications can handle photo and video editing or publish presentations and slide shows. Further, it is very easy to share content created with these tools, both in terms of collaborating on its creation and distributing the finished work. Applications like those listed here can provide students and teachers with free or low-cost alternatives to expensive, proprietary productivity tools. Browser-based, thin-client applications are accessible with a variety of computer and even mobile platforms, making these tools available anywhere the Internet can be accessed. The shared infrastructure approaches embedded in the cloud computing concept offer considerable potential for large scale experiments and research that can make use of untapped processing power.

We are just beginning to see direct applications for teaching and learning other than the simple availability of platform-independent tools and scalable data storage. This set of technologies has clear potential to distribute applications across a wider set of devices and greatly reduce the overall cost of computing. The support for group work and collaboration at a distance embedded in many cloud-based applications could be a benefit applicable to many learning situations.

However, the report also notes that a level of caution is necessary:

The cloud does have certain drawbacks. Unlike traditional software packages that can be installed on a local computer, backed up, and are available as long as the operating system supports them, cloud-based applications are services offered by companies and service providers in real time. Entrusting your work and data to the cloud is also a commitment of trust that the service provider will continue to be there, even in face of changing market and other conditions. Nonetheless, the economics of cloud computing are increasingly compelling. For many institutions, cloud computing offers a cost-effective solution to the problem of how to provide services, data storage, and computing power to a growing number of Internet users without investing capital in physical machines that need to be maintained and upgraded on-site.

The report goes on to provide some examples of use.

I doubt that there will be much here that is exactly 'news' to regular readers of this blog (though this is not true for other sections of the report which cover areas that we don't really deal with here). On the other hand, it is good to see this stuff laid out in a relatively mainstream publication. I remain bemused (as I was last year) at the relatively low level of coverage this report gets in the UK and wonder, in a kind of off the top of my head way, whether a UK or European version of this report would be a worthwhile activity?

February 06, 2009

Open orienteering

It seems to me that there is now quite a general acceptance of what the 'open access' movement is trying to achieve. I know that not everyone buys into that particular world-view but, for those of us that do, we know where we are headed and most of us will probably recognise it when we get there. Here, for example, is Yishay Mor writing to the open-science mailing list:

I would argue that there's a general principle to consider here. I hold that any data collected by public money should be made freely available to the public, for any use that contributes to the public good. Strikes me as a no-brainer, but of course - we have a long way to go.

A fairly straight-forward articulation of the open access position and a goal that I would thoroughly endorse.

The problem is that we don't always agree as a community about how best to get there.

I've been watching two debates flow past today, both showing some evidence of lack of consensus in the map reading department, though one much more long-standing than the other. Firstly, the old chestnut about the relative merits of central repositories vs. institutional repositories (initiated in part by Bernard Rentier's blog post, Institutional, thematic or centralised repositories?) but continued on various repository-related mailing lists (you know the ones!). Secondly, a newer debate about whether formal licences or community norms provide the best way to encourage the open sharing of research data by scientists and others, a debate which I tried to sum up in the following tweet:

@yishaym summary of open data debate... OD is good & needs to be encouraged - how best to do that? 1 licences (as per CC) or 2 social norms

It's great what can be done with 140 characters.

I'm more involved in the first than the second and therefore tend to feel more aggrieved at the lack of what I consider to be sensible progress. In particular, I find the recurring refrain that we can join stuff back together using the OAI-PMH, and that therefore everything is going to be OK, both tiresome and laughable.

If there's a problem here, and perhaps there isn't, then it is that the arguments and debates are taking place between people who ultimately want the same thing. I'm reminded of Monty Python's Life of Brian:

Brian: Excuse me. Are you the Judean People's Front?
Reg: Fuck off! We're the People's Front of Judea

It's like we all share the same religion but we disagree about which way to face while we are praying. Now, clearly, some level of debate is good. The point at which it becomes not good is when it blocks progress which is why, generally speaking, having made my repository-related architectural concerns known a while back, I try and resist the temptation to reiterate them too often.

Cameron Neylon has a nice summary of the licensing vs. norms debate on his blog. It's longer and more thoughtful than my tweet! This is a newer debate and I therefore feel more positive that it is able to go somewhere. My initial reaction was that a licensing approach is the most sensible way forward but having read through the discussion I'm no longer so sure.

So what's my point? I'm not sure really... but if I wake up in 4 years' time and the debate about licensing vs. norms is still raging, as has pretty much happened with the discussion around CRs vs. IRs, I'll be very disappointed.

January 14, 2009

If you're API and you know it clap your hands

There's a question doing the rounds in JISC circles at the moment, courtesy of the 'Good APIs' project being led by UKOLN, which is essentially, "What makes a good API?":

The ‘Good APIs’ project aims to provide JISC and the sector with information and advice on best practice which should be adopted when developing and consuming APIs.

I have to confess that the question doesn't make a great deal of sense to me. Or at least, a good deal more contextual information is required before a sensible answer can be given - is HTTP considered to be an API in the context of this work, for example? If nothing else, the question tends to lean towards an SOA way of thinking IMHO.

A more fruitful line of inquiry might be, "What makes a good architectural approach?", in which case, it seems to me, REST might be a sensible answer.

Anyway... if you think you know what makes a good API, you can provide the answer on a postcard via the project's survey.

December 18, 2008

JISC IE and e-Research Call briefing day

I attended the briefing day for the JISC's Information Environment and e-Research Call in London on Monday and my live-blogged notes are available on eFoundations LiveWire for anyone that is interested in my take on what was said.

Quite an interesting day overall but I was slightly surprised at the lack of name badges and a printed delegate list, especially given that this event brought together people from two previously separate areas of activity. Oh well, a delegate list is promised at some point.  I also sensed a certain lack of buzz around the event - I mean there's almost £11m being made available here, yet nobody seemed that excited about it, at least in comparison with the OER meeting held as part of the CETIS conference a few weeks back.  At that meeting there seemed to be a real sense that the money being made available was going to result in a real change of mindset within the community.  I accept that this is essentially second-phase money, building on top of what has gone before, but surely it should be generating a significant sense of momentum or something... shouldn't it?

A couple of people asked me why I was attending given that Eduserv isn't entitled to bid directly for this money and now that we're more commonly associated with giving grant money away rather than bidding for it ourselves.

The short answer is that this call is in an area that is of growing interest to Eduserv, not least because of the development effort we are putting into our new data centre capability.  It's also about us becoming better engaged with the community in this area.  So... what could we offer as part of a project team? Three things really: 

  • Firstly, we'd be very interested in talking to people about sustainable hosting models for services and content in the context of this call.
  • Secondly, software development effort, particularly around integration with Web 2.0 services.
  • Thirdly, significant expertise in both Semantic Web technologies (e.g. RDF, Dublin Core and ORE) and identity standards (e.g. Shibboleth and OpenID).

If you are interested in talking any of this through further, please get in touch.

November 07, 2008

Some (more) thoughts on repositories

I attended a meeting of the JISC Repositories and Preservation Advisory Group (RPAG) in London a couple of weeks ago.  Part of my reason for attending was to respond (semi-formally) to the proposals being put forward by Rachel Heery in her update to the original Repositories Roadmap that we jointly authored back in April 2006.

It would be unfair (and inappropriate) for me to share any of the detail in my comments since the update isn't yet public (and I suppose may never be made so).  So other than saying that I think that, generally speaking, the update is a step in the right direction, what I want to do here is rehearse the points I made which are applicable to the repositories landscape as I see it more generally.  To be honest, I only had 5 minutes in which to make my comments in the meeting, so there wasn't a lot of room for detail in any case!

Broadly speaking, I think three points are worth making.  (With the exception of the first, these will come as no surprise to regular readers of this blog.)


Metrics

There may well be some disagreement about this but it seems to me that the collection of material we are trying to put into institutional repositories of scholarly research publications is a reasonably well understood and measurable corpus.  It strikes me as odd therefore that the metrics we tend to use to measure progress in this space are very general and uninformative.  Numbers of institutions with a repository for example - or numbers of papers with full text.  We set targets for ourselves like, "a high percentage of newly published UK scholarly output [will be] made available on an open access basis" (a direct quote from the original roadmap).  We don't set targets like, "80% of newly published UK peer-reviewed research papers will be made available on an open access basis" - a more useful and concrete objective.

As a result, we have little or no real way of knowing if we are actually making significant progress towards our goals.  We get a vague feel for what is happening but it is difficult to determine if we are really succeeding.

Clearly, I am ignoring learning object repositories and repositories of research data here because those areas are significantly harder, probably impossible, to measure in percentage terms.  In passing, I suggest that the issues around learning object repositories, certainly the softer issues like what motivates people to deposit, are so totally different from those around research repositories that it makes no sense to consider them in the same space anyway.

Even if the total number of published UK peer-reviewed research papers is indeed hard to determine, it seems to me that we ought to be able to reach some kind of suitable agreement about how we would estimate it for the purposes of repository metrics.  Or we could base our measurements on some agreed sub-set of all scholarly output - the peer-reviewed research papers submitted to the current RAE (or forthcoming REF) for example.

A glass half empty view of the world says that by giving ourselves concrete objectives we are setting ourselves up for failure.  Maybe... though I prefer the glass half full view that we are setting ourselves up for success.  Whatever... failure isn't really failure - it's just a convenient way of partitioning off those activities that aren't worth pursuing (for whatever reason) so that other things can be focused on more fully.  Without concrete metrics it is much harder to make those kinds of decisions.

The other issue around metrics is that if the goal is open access (which I think it is), as opposed to full repositories (which are just a means to an end) then our metrics should be couched in terms of that goal.  (Note that, for me at least, open access implies both good management and long-term preservation and that repositories are only one way of achieving that).

The bottom-line question is, "what does success in the repository space actually look like?".  My worry is that we are scared of the answers.  Perhaps the real problem here is that 'failure' isn't an option?

Executive summary: our success metrics around research publications should be based on a percentage of the newly published peer-reviewed literature (or some suitable subset thereof) being made available on an open access basis (irrespective of how that is achieved).
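The arithmetic behind such a target is trivial once the corpus has been agreed, which is rather the point. A hypothetical Python sketch: the `Paper` record and its fields are my own assumptions, the corpus is whatever subset is agreed (REF submissions, say), and the 80% default simply echoes the example target above. `open_access` is counted irrespective of the route by which a paper became open.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    identifier: str
    year: int
    peer_reviewed: bool
    # True if freely available by any route: institutional or subject
    # repository, open access journal, etc.
    open_access: bool


def open_access_percentage(papers, year):
    """Percentage of peer-reviewed papers published in `year` that are OA."""
    corpus = [p for p in papers if p.peer_reviewed and p.year == year]
    if not corpus:
        return 0.0
    open_count = sum(1 for p in corpus if p.open_access)
    return 100.0 * open_count / len(corpus)


def meets_target(percentage, target=80.0):
    """Compare against a concrete target, e.g. the 80% example."""
    return percentage >= target
```

The hard part is agreeing the denominator, not computing the ratio.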

Emphasis on individuals

Across the board we are seeing a growing emphasis on the individual, on user-centricity and on personalisation (in its widest sense).  Personal Learning Environments, Personal Research Environments and the suite of 'open stack' standards around OpenID are good examples of this trend.  Yet in the repository space we still tend to focus most on institutional wants and needs.  I've characterised this in the past in terms of us needing to acknowledge and play to the real-world social networks adopted by researchers.  As long as our emphasis remains on the institution we are unlikely to bring much change to individual research practice.

Executive summary: we need to put the needs of individuals before the needs of institutions in terms of how we think about reaching open access nirvana.

Fit with the Web

I have written and spoken a lot about this in the past and don't want to simply rehash old arguments.  That said, I think three things are worth emphasising:


Global concentration

Global discipline-based repositories are more successful at attracting content than institutional repositories.  I can say that with only minimal fear of contradiction because our metrics are so poor - see above :-).  This is no surprise.  It's exactly what I'd expect to see.  Successful services on the Web tend to be globally concentrated (as that term is defined by Lorcan Dempsey) because social networks tend not to follow regional or organisational boundaries any more.

Executive summary: we need to work out how to take advantage of global concentration more fully in the repository space.

Web architecture

Take three guiding documents - the Web Architecture itself, REST, and the principles of linked data.  Apply liberally to the content you have at hand - repository content in our case.  Sit back and relax. 

Executive summary: we need to treat repositories more like Web sites and less like repositories.

Resource discovery

On the Web, the discovery of textual material is based on full-text indexing and link analysis.  In repositories, it is based on metadata and pre-Web forms of citation.  One approach works, the other doesn't.  (Hint: I no longer believe in metadata as it is currently used in repositories).  Why the difference?  Because repositories of research publications are library-centric and the library world is paper-centric - oh, and there's the minor issue of a few hundred years of inertia to overcome.  That's the only explanation I can give anyway.  (And yes, since you ask... I was part of the recent movement that got us into this mess!). 

Executive summary: we need to 1) make sure that repository content is exposed to mainstream Web search engines in Web-friendly formats and 2) make academic citation more Web-friendly so that people can discover repository content using everyday tools like Google.
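Google Sitemaps (the sitemaps.org protocol) is one mainstream mechanism for point 1. Here is a minimal Python sketch, assuming the repository can enumerate each item's landing page URL and last-modified date (the URLs are invented), that emits a file in the sitemaps.org 0.9 format:

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(items):
    """Build a sitemap from (url, lastmod) pairs, lastmod as 'YYYY-MM-DD'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in items:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the URL text.
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The resulting file can be advertised to crawlers with a `Sitemap:` line in the site's robots.txt. Pointing `<loc>` at HTML landing pages, which in turn link to the full text, is a design choice made here, not a requirement of the protocol.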

Simple huh?!  No, thought not...

I realise that most of what I say above has been written (by me) on previous occasions in this blog.  I also strongly suspect that variants of this blog entry will continue to appear here for some time to come.

October 24, 2008

Thoughts on FOTE

Pete's recent post about DC-2008 reminds me that I never wrote up my thoughts on FOTE 2008, the Future of Technology in Education event organised recently by Tim Bush and colleagues at ULCC.

It's probably too late now to do any kind of lengthy write-up of the day.  Suffice to say that there were some good talks and some bad talks.  See my live-blog on eFoundations LiveWire if you want to know more but my closing remark pretty much sums it up:

AP: summing up... i think there have been some very good talks today and some very bad talks.  on balance, i think it has been a good and useful day.  as i mentioned, i think that suppliers (with the exception of the Huddle guy) have a tendency to talk down to the audience - we know the world is changing - what we want is help in thinking about how to respond

One of the best talks (actually, probably one of the best talks I'll see this year) was by Miles Metcalfe of Ravensbourne College.  I include the slides below but you won't get the full effect without the very humorous presentation that went with it.

His closing slide, which he suggested was originally going to be entitled "Like I trust the fuckers", poked fun at the proposal that institutions can trust external service providers such as Google (who were also presenting at FOTE) to provide services critical to their business. Having said that, the earlier parts of the talk also acknowledged that individuals within institutions can now make many of those kinds of outsourcing decisions for themselves - irrespective of institutional policy.

The whole thrust of the presentation was to ask, "where does that leave institutional computing service provision?". We used to think that at least the institutional network was sacred (and to a large extent it still is) but with the advent of widely available 3G, of which the iPhone is the classic example, even that is being nibbled away at.

On the same slide, Metcalfe also argues that moving towards OpenID makes more sense than Shibboleth (in the current environment), a view that I tend to share, albeit acknowledging some of the usability issues that still have to be resolved.

All in all it was a very entertaining and thought-provoking presentation, and well worth turning up at the event to see.

(Note that slides from most of the other presentations during the day are also available.)

An evolutionary view of cloud computing

Quote from the tail end of a special report about 'cloud computing' in the Economist:

Irving Wladawsky-Berger, a technology visionary at IBM, compares cloud computing to the Cambrian explosion some 500m years ago when the rate of evolution speeded up, in part because the cell had been perfected and standardised, allowing evolution to build more complex organisms. Similarly, argues Mr Wladawsky-Berger, the IT industry spent much of its first few decades developing the basic components of computing. Now that these are essentially standardised, bigger and more diverse systems can emerge. “For computing to reach a higher level”, he says, “its cells had to be commoditised.”

Thanks to @PaulMiller on Twitter for the pointer.

October 20, 2008

Building in the cloud

Via @timoreilly on Twitter, I note that George Reese has written a short piece about developing cloud applications, Considerations in Building Web Applications for the Amazon Cloud, his four areas of consideration being licensing, persistence, horizontal scalability and disaster recovery. 

The licensing one caught my eye because it wasn't what I was expecting based on the concerns about cloud computing that I've heard raised at educational events in the recent past.

Reese's point about licensing is that if you've built an application running on hardware you own, using licensed software whose cost is based on the number of CPUs, and you then try to move it into the cloud, you may be in for a shock, because the question "how many CPUs is my application now running on?" is non-trivial to answer.  On that basis, open source solutions may get a "shot in the arm" from any kind of mass movement into the cloud (as noted by Tim Bray at FOWA in London recently).

On the other hand, in education, the licensing issues I hear raised most frequently around cloud computing have to do with the terms and conditions under which you are storing material in the cloud and whether there are IPR, privacy/data protection and data recovery considerations that need to be taken into account.

Both concerns are valid of course.  I guess these different perspectives come from a developer-centric vs. a policy maker-centric view of the world.

October 16, 2008

Buzzwords as a service

In his joint session (with Jeff Barr) on cloud computing at FOWA last week, Tony Lucas from xCalibre introduced three acronyms:

  • SaaS - Software as a Service
  • PaaS - Platform as a Service
  • IaaS - Infrastructure as a Service

From top to bottom they are (approximately)... applications hosted in the cloud (e.g. Google Apps), cloud-based platforms on which you can build your own stuff, and cloud-based low-level (typically virtualised) compute infrastructure (e.g. Amazon EC2).

I appreciate that these aren't particularly new terms or anything... but I confess that two of the three were new to me (and on that basis may be new to others). 

Sitting under(?) these three I guess you have managed hosting (the phrase Hardware as a Service (HaaS) has been superseded by IaaS, at least according to Wikipedia).  And then there's Data as a Service (DaaS), where data is hosted as a service provided to customers across the Internet.

All of which leads to Everything as a Service (EaaS, XaaS or aaS), the concept of being able to call up re-usable, fine-grained software components across a network.

I have to confess that I find the distinctions between these terms somewhat blurry... but that is pretty inevitable I guess.  Picking on something at random, the Talis Platform for example... I have no real sense of whether it is best described as SaaS, DaaS, PaaS or IaaS.  Perhaps it doesn't matter.

I particularly like the fact that the Wikipedia entry for PaaS currently says, "This article or section appears to contain a large number of buzzwords".  Quite!

October 14, 2008

Thoughts on FOWA

I spent Thursday and Friday last week at the Future of Web Apps Expo (FOWA) in London, a pretty good event overall in retrospect and certainly one that left me with a lot to think about.  I'm not going to write up any of the individual talks in any kind of detail - videos of all the talks are now available, as is my (lo-fat) live-blogging from the event - but I do want to touch on several thoughts that occurred to me while I was there.

Firstly, the somewhat mundane issue of wireless access at conferences...  I say mundane because one might expect that providing wireless access to conference delegates should have become pretty much routine by now - a bit like making sure that tea and coffee are available?  But that didn't seem to be the case at this event.  My (completely unscientific and non-exhaustive) experience was that everyone with a Mac in the venue had no trouble with the wifi network but that everyone with a PC seemed to have little or no connectivity.  (Actually, that's not quite true, I did find one person with a PC laptop who had no problem using the wifi).  Whatever... my poor little brand new EeePC didn't get on the network for any significant period of time at any point in the two days :-(

So, OK, we all know that Macs are better than PCs in every way but I was amazed at the stark difference that seemed to be in evidence during this particular event.

The lack of wifi connectivity was of particular annoyance to yours truly, since I was hoping to live-blog the whole event.  In the end, I used the mobile interface to Coveritlive via my iPhone over a 3G connection to cover some of the sessions - not an easy thing to do given the soft-keyboard but actually an interesting experiment in what is possible with mobile technology these days.  By day 2 of the conference my typing on the soft-keyboard was getting pretty good - though not always very accurate.

The conference had quite a young and entrepreneurial feel to it - I'm not saying that everyone there was under 30 but there were a lot of aspects to the style of the conference that were in stark contrast to the rather more... err... traditional feel of many 'academic' conferences.  I don't want to argue that age and attitude are necessarily linked (for obvious reasons) but the entrepreneurial thing is particularly interesting I think because it is something that has a non-obvious fit with how things happen in education.  Being an entrepreneur is about taking risks - risks with money more than anything I guess.  I don't quite know how this translates into the academic space but my gut feeling is that it would be worth thinking about.  Note that I'm not thinking about money here - I'm thinking about attitude.  What I suppose I mean is our ability to break out of a conservative approach to things - our ability to overcome the inertia associated with how things have been done in the past.

I realise that there are plenty of startups in the education space - Huddle springs to mind as a good current example of a company that seems to have the potential to cross the education/enterprise divide - my concern is more about what happens inside educational institutions.  A 24-year-old can run the world's biggest social network yet we don't see similar things happening in education... do we?  Calling all 24-year-old directors of university computing services...

Is that something we should worry about?  Is it something we should applaud?  Does it matter?  Is it an inevitable consequence of the kinds of institutions we find in education?

Funding by JISC, Eduserv and the like should be about encouraging an entrepreneurial approach to the use of ICT in education but I'm not sure it fully succeeds in doing that.  Project funding is by its nature a largely low risk activity - except at the transition points between funding.  There are exceptions of course - there are people that I would say are definitely educational entrepreneurs (in the attitude sense) but they tend to be the exception rather than the rule overall and even where they exist I think it is very difficult for them to have a significant impact on wider practice.

The entrepreneurial theme came out strongly in several sessions: Tim Bray's keynote, for example, my favourite talk of the conference, in which he focused on what startups need to do to react to the current economic climate; and a somewhat contrived debate about 'work-life balance' in which Jason Calacanis argued that "it's ok to be average but not in my company" - ever heard that in the education sector?  I'm not saying that his was the right attitude, and to a large extent he was playing devil's advocate anyway, but these are the kinds of issues that we tend to be pretty shy about even discussing in education.

Unfortunately, the whole entrepreneurial thing brings with it a less positive facet, in that there tends to be an "it's not what you know, but who you know" kind of attitude.  This comes out both face-to-face (people looking over your shoulder for a more interesting person to talk to - yes, I know I'm a boring git, thank you!) and in people's use of social networks.  The people I'd unfollow first on Twitter are those who spend the most time tweeting who they are meeting up with next. Yawn.

Much of FOWA was split into two parallel tracks - a developer track and a business track.  I spent most time in the former.  Overall I was slightly disappointed with this track and found the talks that I went to in the business track slightly better.  It's not that there weren't a lot of good talks in the developer track - just that they didn't seem like good developer talks.  My take was that many of them would have been more appropriate for managers who wanted to get up to speed on the latest technology-related issues and thinking.  It didn't seem to me that real developers (of which I'm not one) would have got much from many of those talks - they were too superficial or something.

Now, clearly, running a developer track aimed at 700-odd delegates is not an easy task - I certainly wouldn't be able to do any better - but more than anything you've got to try and inspire people to go away and learn about and deploy new technology, not try and teach it directly during the conference.  For whatever reason, it didn't feel like there was much really new technological stuff to get inspired about.  This is not the conference organisers' fault - just timing I guess.  The business track on the other hand had plenty to focus on, given the current economic climate.

As you'd expect, there was also a lot about the cloud over the two days.  Most of it positive... but interestingly (to me, since it was the first time I'd heard something like this) there was an impassioned plea from the floor (during Jeff Barr and Tony Lucas's joint slot on the important bits of cloud computing) for consumers of cloud computing to band together in order to put pressure on suppliers for better terms and conditions, prices, and the like.

Overall then... FOWA was a different kind of event to those I normally attend and to be honest it was a very last-minute decision to go at all but I did so because there were some interesting looking speakers that I wanted to see.  It wasn't a total success (hey, what is!?) but on balance I'm really glad I went and I got a lot out of it.

Two final mini-thoughts...

Firstly, virtual economies came up a couple of times.  Once in the Techcrunch Pitch at the end of the first day, where one of the panel (sorry, I forget who) suggested that virtual economies would increasingly replace subscriptions as the way services are supported.  I think he was referring to services outside the virtual world space where these kinds of economies are regularly found - Second Life being the best known example of a virtual world economy - though I must confess that I don't really understand how it might work in other contexts.  Then again in Tim Bray's talk where he noted the sales of iPhone applications at very low unit costs (e.g. 59p a time) - a model that will become increasingly sustainable and profitable because of the growing size of the mobile market.  (I appreciate that these two aren't quite the same - but think they are close enough to be of passing interest).

Secondly, I had my first chance to play on a Microsoft Surface - a kind of table-sized iPhone multi-user touch interface.  These things are beautiful to watch and interact with, and the ability to almost literally touch digital content is amazing, with obvious possibilities in the education and cultural sectors, as well as elsewhere.  Costs are prohibitive at the moment of course - but that will no doubt change.  I can't wait!

And finally... to that Mark Zuckerberg interview at the end of day 2.  I really enjoyed it actually.  Despite being well rehearsed and choreographed I thought he came across very well.  He certainly made all the right kinds of noises about making Facebook more open though whether it is believable or not remains to be seen!

It's easy to knock successful people - particularly ones so young.  But at the end of the day I suspect that many of us simply wish we could achieve half as much!?

September 17, 2008

Thoughts on ALT-C 2008

A few brief reflections on ALT-C 2008, which took place last week.

Overall, I thought it was a good event.  Hot water in my halls of residence rooms would have been an added bonus but that's a whole other story that I won't bother you with here.

I particularly enjoyed the various F-ALT sessions (the unofficial ALT-C Fringe), which were much better than I expected.  Actually, I don't know why I say that, since I didn't really know what to expect, but whatever... it seemed to me that those sessions were the main place in the conference where there was any real debate (at least from what I saw).  Good stuff and well done to the F-ALT organisers.  I hope we see better engagement between the fringe and the main conference next year because this is something that has the potential to bring real value to all conference delegates.

I also enjoyed the conference keynotes, though I think all three were somewhat guilty of not sufficiently tailoring their material to the target audience and conference themes.  I also suspect that my willingness to just sit back and accept the keynotes at face value, particularly the one by Itiel Dror, shows what little depth of knowledge I have in the 'learning' space - I know there were people in the audience who wanted to challenge his 'cognitive psychologist' take on learning as we understand it.

I live-blogged all three, as well as some of the other sessions I attended:

I should say that I live-blog primarily as a way of keeping my own notes of the sessions I attend - it's largely a personal thing.  But it's nice when I get a few followers watching my live note taking, especially when they chip in with useful comments and questions that I can pass on to the speakers, as happened particularly well with the "identity theft in VLEs" session.

I should also mention the ALT-C 2008 social network which was delivered using Crowdvine and which was, by all accounts, very successful.  Having been involved with a few different approaches to this kind of thing, I think Crowdvine offers a range of functionality that is hard to beat.  At the time of writing, over 440 of the conference's 500+ delegates had signed up to Crowdvine!  This is a very big proportion, certainly in my experience.  But it's not just about the number of sign-ups... it's the fact that Crowdvine was actively used to manage people's schedules, engage in debates (before, during and after the conference) and make contacts that is important.  I think it would be really interesting to do some post-conference analysis (both quantitative and qualitative) about how Crowdvine was really used - not that I'm offering to do it you understand.  The findings would be interesting when thinking about future events.

The conference dinner was also a triumph... it was an inspired choice to ask local FE students to both cater for us and serve the meal, and in my opinion it resulted in by far the best conference meal I've had for a long time.  Not that the conference meal makes or breaks a conference - but it's a nice bonus when things work out well :-).  Thinking about it now, it seems to me that more academic/education conferences should take this kind of approach - certainly if this particular meal was anything to go by - not just in terms of the meal, but also for other aspects of the event.  How about asking media students to use a variety of new media to make their own record of a conference for example.  These are win-win situations it seems to me.

Finally, the slides from my sponsor's session are now available on Slideshare:

As I mentioned previously, the point of the talk was to think out loud about the way in which the availability of notionally low-cost or free Web 2.0 services (services in the cloud) impacts on our thinking about service delivery, both within institutions and in community-based service providers such as Eduserv.  What is it that we (institutions and service providers 'within' the community) can offer that external providers can't (sustainability, commitment to preservation of resources, adherence to UK law, and so on)?  What do they offer that we don't, or that we find it difficult to offer?  I'm thinking particularly of the user-experience here! :-) How do we make our service offerings compelling in an environment where 'free' is also 'easy'?

In the event, I spent most time talking about Eduserv - which is not necessarily a bad thing since I don't think we are a well understood organisation - and there was some discussion at the end which was helpful (to me at least).  But I'm not sure that I really got to the nub of the issue.

This is a theme that I would certainly like to return to.  The Future of Technology in Education (FOTE2008) event being held in London on October 3rd will be one opportunity.  It's now sold out but I'll live-blog if at all possible (i.e. wireless network permitting) - see you there.

September 08, 2008

Both sides, now - are we builders or users of services in the cloud?

"I've looked at clouds from both sides now
From up and down, and still somehow
It's cloud illusions I recall
I really don't know clouds at all"
(Joni Mitchell – Both sides, now)

As an educational charity with a mission to "realise the benefits of ICT for learners and researchers", Eduserv must constantly ask itself how to make the best of its available resources for the benefit of the community.

What kinds of services should we be offering? What maximises our impact?

The answers lie in the expectations, needs and desires of the education community itself. But in an environment where the "cloud" offers us an increasing array of apparently very high quality, very low cost services, those answers are not necessarily easy to come by.

These issues affect not just Eduserv, but funding bodies, institutions and individuals in the community.

For those of you at ALT-C 2008, I'll be thinking about this stuff out loud in our sponsor's session - Wednesday, 11.00am in the Conference Auditorium 1. You are very welcome to come and help me shape my thoughts.

August 28, 2008

Lost in the JISC Information Environment?

Tony Ross, writing in the current issue of Ariadne, Lost in the JISC Information Environment, gives a nice summary of some of the issues around the JISC Information Environment technical architecture.  He hits a lot of nails on the head, despite using that diagram twice in the same article!  On that basis he seems anything but lost.  That said, one is kind of left feeling "so what" by the end.

The IE does not, can not, have existence. The term is a description of a set of interoperable services and technologies which have been created to enhance the resource discovery access and use for users in HE/FE; it exists to aid conceptualisation of this ephemeral subject. No more, no less.

Well hang on, either it exists or it doesn't (that paragraph says both!) but the creation of services tends to indicate, to me, that it does.  Whatever... it's an angels on the head of a pin type of discussion, best moved to the pub.

In his response, Paul Walk suggests that all models are wrong, but some are useful, the JISC IE architecture being one of the useful ones, and broadly speaking I agree, though one might argue that the prescriptive nature of the architecture (or at least, the prescriptive way in which it has often been interpreted) has got us to a place where we no longer want to be?  And, leaving the diagram to one side for a moment, the technical standards certainly were intended to be prescriptive.  I can remember discussions in UKOLN about the relative merits of such an approach vs. a more ad hoc, open-ended and experimental one but I argued at the time that we wouldn't build a coherent environment if we just let people do whatever the hell they wanted.  Maybe I was wrong?

Referring to the "myth" quotes in the Ariadne article, I don't have a problem with trying to be prescriptive but at the same time recognising that what ends up happening on the ground may well be different in detail to what the blueprint says.

Looking back, I do find it somewhat frustrating that the diagram came to epitomise everything that the JISC IE was about whilst much of the associated work, the work on UML use case analysis for example (which was very much focused on end-user needs), largely got forgotten.  Such is life I suppose?  But let's ignore that... the work certainly had impact, and by and large it was for the good.  Think about when the DNER effort first started, way back at the end of the last century (yes, really!), a time when any notions of machine to machine interaction were relatively immature and not widely accepted (certainly not in the way they are today).  The idea that any service provider would care about exposing content in a machine-readable form for other bits of software to consume and display somewhere else on the Web was alien to many in the community.  Remember Lorcan Dempsey talking about us needing to overcome the information brandscape? :-)

If nothing else, the IE architecture helped contribute to the idea that there is value in going beyond the simple building of an HTML Web site.  In that sense, it had a Web 2.0 flavour to it well before Web 2.0 was a gleam in anybody's eye.  The world has come a long way since then... a long, long way.  The IE architecture got things wrong in the same way that most digital library activities got things wrong - it didn't anticipate the way the Web would evolve and it adopted a set of technologies that, with the exception of RSS, were rather non-Web-friendly in their approach (OAI-PMH, Z39.50, SRW/SRU, OpenURL and so on).  The Web Architecture, the Semantic Web, Web 2.0 (and in particular the emergence of the Web as a social environment and the predominance of user-generated content), REST and so on never really got a look in - nor could they, since the work on the JISC IE came too early for them in many ways.

With hindsight, the appearance of registries down the left-hand side was probably a mistake - what we missed, again, was that the Web would become the only 'registry' that anyone would need.  But it is the largely exclusive focus on resource discovery through metadata rather than full-text, as though the Web was a library of physical books, that is the JISC IE's most harmful legacy - a legacy that we still see being played out in discussions around open access repositories today.  If I've done harm to the community through the work on the JISC IE, then that is where I feel it has been worst.  Remember that in the early days of the JISC IE the primary aim was around the discovery, access and use of commercially valuable content that was not being exposed (to anyone) for full-text indexing, so the initial focus on metadata was probably excusable. Unfortunately, the impact of that design choice has now gone well beyond that.

The addition of the 'indexes' box to the diagram (it wasn't in the original versions) was recognition that Google was doing something that the IE could not - but it was too little, too late - the damage had been done.  That's not to say that metadata doesn't have a place.  It certainly does.  Metadata is about much more than resource discovery after all, and in any case, it brings things to resource discovery that are not possible with full-text indexing alone.  But we need balance in the way it is adopted and used and, looking back, I don't think we properly had such balance in the JISC IE.

Towards the end of his blog entry Paul says:

Turning to the reworked diagram which Tony offers at the end of his piece - I presume this is not offered too seriously as an alternative but is, rather, meant simply to show an ‘non-deterministic’ version. It is interesting that this version seems to miss what is, in my view, the most important issue with the original, in the way it simply copies the same depiction of the client desktop/browser.

That diagram was created by me, initially for a small group of JISC-people but then re-used in the presentation that Tony cites.  It originally had the caption, "what the user sees" and was preceded by the usual diagram with the caption, "what the architecture says".  So yes, some humour was intended.  But the serious point was that every variant of every box on the diagram necessarily offers a human Web interface, irrespective of whether it also presents a machine-interface, so the user just sees a Web of stuff, some of which is joined together behind the scenes in various ways.

As to that "client desktop/browser" icon!?  Yes, its use was somewhat simplistic, even at the time - certainly now, where we have a much wider range of mobile and other client devices.  But as with the rest of the diagram, there was/is a tension between drawing something that people can easily engage with vs. drawing something that correctly captures more abstract principles.

On balance, I think the UK HE and FE community is better off for having had that diagram and the associated work, around which a useful and significant programme of activities has been able to be built by the JISC, as described by Paul.  Does the diagram remain useful now?  I'm less sure about that tbh.

August 26, 2008

Web futures - who ordered the pragmatic semantic organism with fries?

In the first of his Ten Futures (which is an interesting read by the way) Stephen Downes suggests that the Semantic Web will never happen and that we need the Pragmatic Web instead:

Forget about the Semantic Web. Whether or not it ever gets built, you can be sure that we will be complaining about it. Because while the Semantic Web gives us meaning, it doesn’t give us context. It will give us what we can get from an encyclopedia, but not what we can get from phoning up our best buddy.

The pragmatic web, by contrast, is all about context. Your tools know who you are, what you’re doing, who you’ve been talking to, what you know, where you want to go, where you are now, and what the weather is like outside.

Whilst I remain unsure about the likely arrival date of the Semantic Web or indeed whether it will ever arrive at all (in any real terms), and whilst I quite like the Pragmatic Web label, I can't agree with him about the cause.  Success or failure of the Semantic Web does not rest with context - there is plenty of semantic work in that area it seems to me, typically referred to as the graph or the social graph.  As Tim BL said at the end of last year:

Its not the Social Network Sites that are interesting -- it is the Social Network itself. The Social Graph. The way I am connected, not the way my Web pages are connected.

The Semantic Web's problem, if indeed it has one, has to do with complexity and a high cost/benefit ratio.  That said, even given my world view rather than Stephen's, I accept that the 'Pragmatic Web' label still works well as a nice alternative to 'Semantic Web'.

And while I'm on the subject of both the future and the Semantic Web, Kevin Kelly's TED talk, Predicting the next 5,000 days of the web, makes use of a lot of Semantic Web thinking and suggests that the Web of the future is not just going to be today's Web "but only better" but that it will be:

  • smarter,
  • more personalized,
  • more ubiquitous

and the price of that will be more transparency (of us to the Web).  The Web as "an organism" that we interact with.  Hmmm, nice...  I sense a Hollywood blockbuster coming on.

August 20, 2008

The right time for outsourcing

Paul Walk has an interesting post, “Did Google just make me look like an idiot?”, questioning whether the time is right for universities to start outsourcing services in a Web 2.0, SaaS kind of way.

As Paul notes, this was very much the focus for our symposium earlier on this year.

To be slightly frivolous, I have a gut feeling that no time is the right time but I very strongly agree with Paul that the question needs to be asked, especially given the possibility of global recession and its potential impact on Web 2.0 business models.  The sustainability of whoever you choose to outsource to has to be a major consideration in any decision - whether at an institutional, departmental or personal level.

July 18, 2008

AtomPub Video Tutorial

From Joe Gregorio of Google, a short video introduction to the Atom Publishing Protocol (RFC 5023):

Which, following Tim Bray's exhortation, I shall henceforth refer to only as "AtomPub".
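For anyone who prefers text to video, here is a minimal Python sketch of the central AtomPub interaction in RFC 5023: creating a new member by POSTing an Atom entry to a collection URI. The collection URI, the entry values and the helper names are hypothetical. The sketch only builds the request (it does not send it), using the standard library's ElementTree and urllib.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.request import Request

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom elements without a prefix


def atom(tag):
    """Qualify a local name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def build_entry(title, author, text, entry_id):
    """Build a minimal Atom entry document (RFC 4287) as UTF-8 bytes."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("id")).text = entry_id
    ET.SubElement(entry, atom("updated")).text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    author_el = ET.SubElement(entry, atom("author"))
    ET.SubElement(author_el, atom("name")).text = author
    ET.SubElement(entry, atom("content"), type="text").text = text
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)


def build_create_request(collection_uri, entry_xml, slug=None):
    """Prepare (but do not send) an AtomPub POST that creates a new member."""
    headers = {"Content-Type": "application/atom+xml;type=entry"}
    if slug:
        # Slug (RFC 5023, section 9.7) hints at the URI the client would like.
        headers["Slug"] = slug
    return Request(collection_uri, data=entry_xml, headers=headers,
                   method="POST")


# Hypothetical collection URI and entry values.
entry_xml = build_entry("Hello AtomPub", "A. Blogger", "First post via AtomPub.",
                        "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a")
req = build_create_request("http://example.org/blog/entries", entry_xml,
                           slug="hello-atompub")
```

Passing `req` to `urllib.request.urlopen` would actually send it. Per RFC 5023, a server that accepts the entry replies `201 Created` with a `Location` header giving the URI of the newly created entry. Subsequent edits use PUT and DELETE on the entry's edit URI.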

July 02, 2008

Can you show them a better way?

I don't know... you wait ages for a competition and then two come along at once :-)

Hot on the heels of Elsevier's Article 2.0, the Show us a better way competition asks people to come up with ideas that improve health, education, justice or society at large through the innovative re-use of existing public information:

The UK Government wants to hear your ideas for new products that could improve the way public information is communicated. The Power of Information Taskforce is running a competition on the Government's behalf, and we have a £20,000 prize fund to develop the best ideas to the next level. You can see the type of thing we are are looking for here.

To support the competition, a range of public data sources are being made available (though many require a free Click-Use PSI Licence to re-use Crown copyright information) including crime statistics, information about schools and health care services, map data from the Ordnance Survey and so on.

To help show the kinds of things they are looking for (and possibly to prevent the wheel being re-invented too much!) a list of examples is provided.

June 27, 2008

Doin' the Museum Mash

The Eduserv Foundation has a programme of sponsorship and I'm pleased to say that we sponsored the recent Mashed Museum 2008 event, just prior to the UK Museums and the Web Conference 2008 in Leicester.  Mike Ellis, a colleague at Eduserv who organised the day, has just released this video of what people got up to - taking museum-related data sources and connecting them together in a range of different ways. In Mike's words, the remit of the day was to:

"...give us an environment free from political or monetary constraints. The not IPR, copyright, funding or museum politics. Our energies will be channeled into embracing the 'new web': envisaging, demonstrating and (hopefully) building some lightweight distributed applications."

For more information, visit

June 25, 2008

Putting the 'new' into New South Wales' schools - outsourcing email to Google

So schools in New South Wales become the latest area of education to migrate their email from an 'in-house' solution (Outlook and Exchange in this case) to Google.  And a pretty sizable migration it is by the sounds of it - Australian schools dump Outlook for Gmail for 1.3 million students.


On the face of it, the transition brings significant benefits, not just in terms of cost (the Education Department are reputedly saving something like 11 million quid over three years) but also in the amount of storage available to each student (6GB instead of 35MB).  But the Google garden isn't to everyone's taste and one certainly hears some grumblings about limitations and performance issues with the Google Apps offering.  How significant these are, I'm not sure.

Outsourcing email to Google was one of the areas touched on in discussions at our symposium last month.  As some of you will already know, we are currently funding a series of snapshots tracking the use of Second Life within UK higher and further education.  I wonder if now would be a good time to start doing a similar series of snapshots around institutional outsourcing of services like email?

June 18, 2008

Interview with Stefan Tilkov on REST

One of the commentators/bloggers I most enjoy reading/hearing on the topic of the REST architectural style and resource-oriented approaches is Stefan Tilkov. Stefan was the guest interviewee in a recent episode of the Software Engineering Radio podcast, and the result is a very clear introduction to the principles of REST and its implementation in the HTTP protocol, and an entertaining conversation around the value of the approach.

June 16, 2008

Web 2.0 and repositories - have we got our repository architecture right?

For the record... this is the presentation I gave at the Talis Xiphos meeting last week, though to be honest, with around 1000 Slideshare views in the first couple of days (presumably thanks to a blog entry by Lorcan Dempsey and it being 'featured' by the Slideshare team) I guess that most people who want to see it will have done so already:

Some of my more recent presentations have followed the trend towards a more "picture-rich, text-poor" style of presentation slides.  For this presentation, I went back towards a more text-centric approach - largely because that makes the presentation much more useful to those people who only get to 'see' it on Slideshare and it leads to a more useful slideshow transcript (as generated automatically by Slideshare).

As always, I had good intentions around turning it into a slidecast but it hasn't happened yet, and may never happen to be honest.  If it does, you'll be the first to know ;-) ...

After I'd finished the talk on the day there was some time for Q&A.  Carsten Ulrich (one of the other speakers) asked the opening question, saying something along the lines of, "Thanks for the presentation - I didn't understand a word you were saying until slide 11".  Well, it got a good laugh :-).  But the point was a serious one... Carsten admitted that he had never really understood the point of services like arXiv until I said it was about "making content available on the Web".

OK, it's a sample of one... but this endorses the point I was making in the early part of the talk - that the language we use around repositories simply does not make sense to ordinary people and that we need to try harder to speak their language.

May 20, 2008

Streaming media from the symposium now available

All the streaming media from the Eduserv Foundation Symposium 2008 is now available.  See the symposium presentations page for a full listing.

May 16, 2008

JISC IE blog, Val Doonican and *that* diagram


A very quick note to say that the JISC Information Environment team are now blogging... good stuff.

And while I'm on the subject of the JISC IE, I should perhaps note that that diagram still seems to be doing the rounds.  At the UKOLN 30th celebration Paul Walk invited me to say a few words about it from the floor, at which point I stood up and joked that I'd left UKOLN to get away from the diagram and had no intention of saying anything about it!  Not quite true actually... I had prepared something to say about both the diagram and the work that went on around it but in the end I felt that the day needed something lighter and more anecdotal, so I sat down, stage front, mic in hand and said "I want to tell you a story" instead.

This resulted in much piss being extracted by various of my current and ex-colleagues by way of reference to Val Doonican - Rachel Bruce (one of the authors of the new blog above) even went so far as to send me a photo of good old Val with the caption, "picture of you" :-)  I'll tell the story here, for posterity, another time - I know you're desperate to hear it.  I don't have the time right now.

Anyway, I digress... back to the diagram.  So having turned down an opportunity to talk about the diagram that day, I tuned into the live video stream from the JISC conference a few weeks later and found Sir Ron Cooke (Chair of the JISC) speaking to it in front of an audience of practically millions.

Hey, if nothing else, it's certainly been good value.  If someone did a tag cloud of Powerpoint slides based on the number of times they'd been shown in JISC-related events, I reckon that diagram would be pretty sizable.

May 14, 2008

Symposium thoughts

Some brief thoughts on the symposium which happened last Thursday...

Overall, it seemed to go well I think, with relatively few hiccups.  We had one near miss - a Mac which decided not to work 5 minutes before its owner was due to go on stage.  Oh, and the air-conditioning at the venue, which appeared to be totally broken.  Other than that, things went pretty smoothly.  Note that we've still got to read through the evaluation forms in detail, so it may be that I've got this all wrong and people hated it! :-)

The talks seemed to be well received and I'm grateful to all the speakers for turning up and doing their stuff.  One of the problems with both chairing and getting involved in the technical side of the event (which I love doing) is that I find it very difficult to concentrate on what the speakers are talking about.  It's also difficult to do any real socialising :-(  We're currently waiting for the media from our streaming company to turn up at which point I'll watch all the talks again.  We expect this to be available via the Web site by Friday this week.

Photos from the event are available in the following video:

Alternatively, if you prefer your photos in a more static form, look on Flickr.

We had 180 delegates registered for the day.  For one reason or another about 25 of those were unable to make it, though some sent replacements in their place.  This is understandable - given illness, travel problems and so on - though somewhat frustrating.  With the delegate day rate we were paying for the venue it is perhaps worth noting that it probably represents something like £2000 wasted investment on our part.

We streamed the whole event live on the Web and about 60 additional people watched throughout the day.  In his opening talk, Larry Johnson of the NMC noted in passing that, given the rise of free video streaming services, it is no longer necessary to pay large sums of money to stream events live on the Web.  He may be right, though my personal view is that it is worth paying to get an experienced camera operator, sound engineer and vision editor.  Decent sound is, above all, absolutely critical in my experience.

On this occasion we chose to stream in Windows Media format (.wmv).  In part this was because the streaming company assured us that, given the greater number of Windows machines out there, this approach would lead to fewer compatibility problems than streaming in QuickTime (.mov).  I was also a little worried that if we streamed in a format compatible with Second Life, our virtual audience would fork into two sections (those in-world and those not), whereas we wanted them in one place to maximise the social aspects of the live chat facility (see later).  On reflection this was perhaps a bit of a hard-line approach.  I certainly lost some sleep the night before the event, worried that Mac users wouldn't be able to see the stream.  However, as far as I can tell, this wasn't a problem for people.  I am aware of one issue, noted by a couple of bloggers including Joe Blogg - the streamed video wasn't good enough to read many of the slides.  Apologies for this.  With slightly more forward planning we could have got all the slides uploaded to Slideshare before the event (though I should note that at least two of the speakers were still tweaking their slides right up to the start of their talks!).

Anyway, there's definitely room for some improvement in that area.

The use of Coveritlive as a live chat facility for both the delegates in the room on wireless and the remote delegates watching the video stream also seemed to be very successful.  Again, as chair, I didn't get as involved in this as I would have liked, but the virtual discussion certainly seemed to be flowing for most of the day.  We had a member of Eduserv staff in the venue (Mike Ellis) monitoring the chat for possible questions and asking them from the floor during the question and answer sessions at the end of each talk.  Furthermore, my co-author on this blog, Pete Johnston, spent the whole day moderating the chat from the back of the room - a thankless task if ever there was one, especially seeing as moderation wasn't really necessary, but one that was imposed on us by the use of Coveritlive as the chat tool.  Note that Coveritlive is not really designed for this purpose; it is really a live-blogging tool, so we were stretching its capabilities in rather unusual directions.  However, its ease of use (for delegates rather than for Pete) proved successful.  We also displayed the live chat on the screen in the venue during the Q&A sessions and this really helped to bring the remote audience into the room.

In his blog entry about the event, David Harrison (who spoke during the afternoon session) noted how odd it felt to be giving a presentation at the British Library in London, while his colleagues back in Cardiff answered questions in the live chat as he was speaking.

At this point I should stress that the video stream and the use of Coveritlive were completely separate.  We chose to co-locate them on the same Web page - such is the beauty of small tools, loosely joined - but they were unrelated and separate tools.  Coveritlive doesn't actually do streaming as far as I know.

Finally, we offered a Ning social network for the event, offering a chance for delegates, both real and virtual, to create a profile and share information about their interests - a kind of virtual delegate list if you like.  This worked reasonably well - at the last count there were 119 delegates signed up, though I don't currently know the balance between real and virtual delegates.

In his presentation, Chris Adie questioned whether Eduserv were taking risks in hosting such a network on an external service (because of data protection concerns primarily).  While I think there are valid issues to think about in this area, I don't think we were taking a risk at all - part of the point of using an external tool was to emphasise the topic of the day.  Indeed, I tend to think there is a greater danger in the paralysis that comes from being over-sensitive to concerns about data protection, privacy and other legal matters.  Chris also made this point.

In his blog, Michael Webb questions the value brought by the social network, particularly for a one day symposium.  I have to confess I'm not sure either.  Some delegates reported using it to see who else was going to be at the event beforehand and it was very cheap to set up - free actually, though we paid a small amount to get rid of the Google adverts for a month.  So I'm not sure it matters too much.

Anyway, clearly there are things we could have done better - and we hope to do so next time - but all in all I'm pleased with the way the event turned out.  I hope those who took part feel likewise.

May 02, 2008

Inside out - symposium update

Our annual symposium takes place next Thursday (8th May) at the British Library in London:

Inside Out: What do current Web trends tell us about the future of ICT provision for learners and researchers?

The day is intended to give people a chance to think about the potentially disruptive impact of current Web trends on the provision and use of ICT services within the educational sector, particularly higher education, and will feature talks from a range of perspectives including:

  • Larry Johnson (New Media Consortium, US),
  • Bobbie Johnson (Guardian),
  • Jem Stone (BBC),
  • Geoffrey Bilder (CrossRef),
  • Chris Adie (University of Edinburgh),
  • David Harrison (UCISA / Cardiff University),
  • and Grainne Conole (Open University).

I'm really looking forward to it... though right now things are a bit hectic with all the final preparations and what not.

The event is full but we are planning on streaming all the talks live on the Web, coupled with a live chat facility that will allow delegates (both those in the room and those watching the video stream) to discuss the presentations and ask questions of the speakers.

Presentations start at 10.30am, UK time.

Please note that it is not necessary to register to watch the video stream or take part in the live chat.  However, we have set up a social network for the event and we encourage you to sign up for this if you are planning on attending (either in person or via the video stream).  Doing so will give all delegates a better feel for who is in the audience.

Also note that all the presentations and streamed media will be made available after the event for those not able to see it live.

Finally, we are encouraging people to blog and Twitter about the event - if you do, please use the event tag, efsym2008.

For those with an interest in such things, we are using I S Media to do the live video streaming for us - the same people we used for the symposium last year.  The live chat facility is being done using Coveritlive, which is really a live blogging tool but it supports quite a nice moderated comment facility, so we are going to use it slightly outside its intended space.  It should work OK though.  The social network has been built using Ning.  I'm very impressed with the flexibility and power of Ning and I strongly suspect that it would be possible to do an awful lot with it (given the necessary time!) - you basically get full access to the source code if you want it.  Despite that, in some ways I would have preferred to use Crowdvine for our social network, which I think offers a really nicely put together suite of social tools aimed specifically at conference delegates - but unfortunately, the costs were prohibitive for us given the money we are spending on other parts of the event.

Anyway, I'll be keeping my fingers firmly crossed between now and next Thursday and hoping that everything runs smoothly.



eFoundations is powered by TypePad