May 18, 2012

Big Data - size doesn't matter, it's the way you use it that counts... at least, that's what they tell me!

Here's my brief take on this year's Eduserv Symposium, Big Data, big deal?, which took place in London last Thursday and which was, by all the accounts I've seen and heard, a pretty good event.

The day included a mix of talks, from an expansive opening keynote by Rob Anderson to a great closing keynote by Anthony Joseph. Watching either, or both, of these talks will give you a very good introduction to big data. Between the two we had some specifics: Guy Coates and Simon Metson talking about their experiences of big data in genomics and physics respectively (though the latter also included some experiences of moving big data techniques between different academic disciplines); a view of the role of knowledge engineering and big data in bridging the medical research/healthcare provision divide by Anthony Brookes; a view of the potential role of big data in improving public services by Max Wind-Cowie; and three shorter talks immediately after lunch - Graham Prior talking about big data and curation, Devin Gafney talking about his 140Kit twitter-analytics project (which, coincidentally, is hosted on our infrastructure) and Simon Hodson talking about the JISC's big data activities.

All of the videos and slides from the day are available at the links above. Enjoy!

For my part, there were several take-home messages:

  • Firstly, that we shouldn’t get too hung up on the word ‘big’. Size is clearly one dimension of the big data challenge but of the three words most commonly associated with big data - volume, velocity and variety - it strikes me that volume is the least interesting and I think this was echoed by several of the talks on the day.
  • In particular, it strikes me there is some confusion between ‘big data’ and ‘data that happens to be big’ - again, I think we saw some of this in some of the talks. Whilst the big data label has helped to generate interest in this area, it seems to me that its use of the word 'big' is rather unhelpful in this respect. It also strikes me that the JISC community, in particular, has a history of being more interested in curating and managing data than in making use of it, whereas big data is more about the latter than the former.
  • As with most innovations (though 'evolution' is probably a better word here) there is a temptation to focus on the technology and infrastructure that makes it work, particularly amongst a relatively technical audience. I am certainly guilty of this. In practice, it is the associated cultural change that is probably more important. Max Wind-Cowie’s talk, in particular, referred to the kinds of cultural inertia that need to be overcome in the public sector, on both the service provider side and the consumer side, before big data can really have an impact in terms of improving public services. Attitudes like, "how can a technology like big data possibly help me build a *closer* and more *personal* relationship with my clients?" or "why should I trust a provider of public services to know this much about me?" seem likely to be widespread. Though we didn't hear about it on the day, my gut feeling is that a similar set of issues would probably apply in education were we, for example, to move towards a situation where we make significant use of big data techniques to tailor learning experiences at an individual level. My only real regret about the event was that I didn't find someone to talk on this theme from an education perspective.
  • Several talks referred to the improvements in 'evidence-based' decision-making that big data can enable. For example, Rob Anderson talked about poor business decisions currently being based on poor data and Anthony Brookes discussed the role of knowledge engineering in improving the ability of those involved in front-line healthcare provision to take advantage of the most recent medical research. As Adam Cooper of CETIS argues in Analytics and Big Data - Reflections from the Teradata Universe Conference 2012, we need to find ways to ask questions that have efficiency or effectiveness implications and we need to look for opportunities to exploit near-real-time data if we are to see benefits in these areas.
  • I have previously raised the issue of possible confusion, especially in the government sector, between 'open data' and 'big data'. There was some discussion of this on the day. Max Wind-Cowie, in particular, argued that 'open data' is a helpful - indeed, a necessary - step in encouraging the public sector to move toward a more transparent use of public data. The focus is currently on the open data agenda but this will encourage an environment in which big data tools and techniques can flourish.
  • Finally, the issue that almost all speakers touched on to some extent was that of the need to grow the pool of people who can undertake data analytics. Whether we choose to refer to such people as data scientists, knowledge engineers or something else there is a need for us to grow the breadth and depth of the skills-base in this area and, clearly, universities have a critical role to play in this.

As I mentioned in my opening to the day, Eduserv's primary interest in Big Data is somewhat mundane (though not unimportant) and lies in the enabling resources that we can bring to the communities we serve (education, government, health and other charities), either in the form of cloud infrastructure on which big data tools can be run or in the form of data centre space within which physical kit dedicated to Big Data processing can be housed. We have plenty of both and plenty of bandwidth to JANET so if you are interested in working with us, please get in touch.

Overall, I found the day enlightening and challenging and I should end with a note of thanks to all our speakers who took the time to come along and share their thoughts and experiences.

[Photo: Eliot Hall, Eduserv]

April 02, 2012

Big data, big deal?

Some of you may have noticed that Eduserv's annual symposium is happening on May 10. Once again, we're at the Royal College of Physicians in London and this year we are looking at big data, appropriate really... since 2012 has been widely touted as being the year of big data.

Here's the blurb for our event:

Data volumes have been growing exponentially for a long while – so what’s new now? Is Big Data [1] just the latest hype from vendors chasing big contracts? Or does it indeed present wholly new challenges and critical new opportunities, and if so what are they?

The 2012 Symposium will investigate Big Data, uncovering what makes it different from what has gone before and considering the strategic issues it brings with it: both how to use it effectively and how to manage it.  It will look at what Big Data will mean across research, learning, and operations in HE, and at its implications in government, health, and the commercial sector, where large-scale data is driving the development of a whole new set of tools and techniques.

Through presentations and debate delegates will develop their understanding of both the likely demands and the potential benefits of data volumes that are growing disruptively fast in their organisation.

[1] Big Data is "data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."  What is big data?  Edd Dumbill, O'Reilly Radar, Jan 2012

As usual, the event is free to attend and will be followed by a drinks reception.

You'll note that we refer to Edd Dumbill's What is big data? article in order to define what we mean by big data and I recommend reading this by way of an introduction for the day. The Wikipedia page for Big data provides a good level of background and some links for further reading. Finally, O'Reilly's follow-up publication, Planning for Big Data - A CIO's Handbook to the Changing Data Landscape is also worth a look (and is free to download as an e-book).

You'll also note that the defining characteristics of big data include not just 'size' (though that is certainly an important dimension) but also 'rate of creation and/or change', and 'structural coherence'. These are typically known as the three Vs - "volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources)". In looking around for speakers, my impression is that there is a strong emphasis on the first of these in people's general understanding about what big data means (which is not surprising given the name) and that in the government sector in particular there is potential confusion between 'big data' and 'open data' and/or 'linked data' which I think it would be helpful to unpick a little - big data might be both 'open' and 'linked' but isn't necessarily so.

So, what do we hope to get out of the day? As usual, it's primarily a 'bringing people up to speed' type of event. The focus will be on our charitable beneficiaries, i.e. organisations working in the area of 'public good' - education, government, health and the charity sector - though I suspect that the audience will be mainly from the first of these. The intention is for people to leave with a better understanding of why big data might be important to them and what impact it might have in both strategic and practical terms on the kinds of activities they undertake.

We have a range of speakers, providing perspectives from inside and outside of those sectors, both hands-on and more theoretical - this is one of the things we always try and do at our symposia. Our sessions include keynotes by Anthony D. Joseph (Chancellor's Associate Professor in Computer Science at University of California, Berkeley) and Rob Anderson (CTO EMEA, Isilon Storage Division at EMC) as well as talks by Professor Anthony J Brookes (Department of Genetics at the University of Leicester), Dr. Guy Coates (Informatics Systems Group at The Wellcome Trust Sanger Institute) and Max Wind-Cowie (Head of the Progressive Conservatism Project, Demos - author of The Data Dividend).

By the way... we still have a couple of speaking slots available and are particularly interested in getting a couple of short talks from people with practical experience of working with big data, either using Hadoop or something else. If you are interested in speaking for 15 minutes or so (or if you know of someone who might be) please get in touch. Thanks. Another area that I was hoping to find a speaker to talk about, but haven't been able to so far, is someone who is looking at the potential impact of big data on learning analytics, either at the level of a single institution or, more likely, at a national level. Again, if this is something you are aware of, please get in touch. Crowd-sourced speakers FTW! :-)

All in all, I'm confident that this will be an interesting and informative day and a good follow-up to last year's symposium on the cloud - I look forward to seeing you there.

October 03, 2011

UMF Cloud Pilot update

As a quick update on where we are with work on the UMF Cloud Pilot, our emerging cloud offer for UK HE, here are the slides that I spoke to at the All Hands Meeting (AHM 2011) in York last week.

I wrote a longer trip report from the conference over on our UMF Cloud Pilot blog (which is where I do most of my blogging these days) so I won't repeat it here. It was a good conference, though more 'hands-on' than 'strategic', and much smaller than I remember.

As to the UMF Cloud Pilot, the key things to note at this stage are that we'll be opening up the infrastructure to the UMF SaaS projects shortly, we'll make announcements about our pricing in a few weeks' time, and the vCloud service will be more openly available from the end of the year (or early next year) - with an OpenStack Compute offer coming on stream beyond that.

June 24, 2011

Our HE Cloud Pilot - the real work begins

I'm spending an increasing amount of my time thinking about/talking about/in meetings about "the cloud" - well, specifically the HE Cloud Pilot that we are now putting in place as part of the JISC UMF Shared Services and the Cloud Programme.

This is both intimidating and exciting...

Intimidating because a lot of this stuff is new to me and, if I'm completely honest, the hardware side of things really doesn't do it for me. Luckily, we have some pretty good people now assigned to designing and building this stuff and I'm impressed at the progress being made very quickly. But there are times, as happened in an architectural design meeting this afternoon, where the discussion gets too much for me and I have to leave (about networking in this particular case). I figure that if I've sat in a meeting for 30 minutes and not understood a single word, and therefore couldn't summarise for someone else what that meeting was about, then I'm probably not being useful :-).

Exciting, because I think we have the potential to do something really useful and innovative here. That's not to say that success is guaranteed and I think that there are very real challenges for us in terms of building something that is of value to the education community and that has a sustainable business model associated with it. We spent an afternoon earlier this week (in the form of a premortem) brainstorming a pretty long list of everything that had gone wrong with the project from the perspective of 12 months hence - technology, usability, resourcing, business models and so on - and then thinking about the kinds of actions we wished we'd taken to mitigate them. It was a very useful exercise.

In delivering the infrastructure we'll start by targeting the various SaaS projects that are now being funded by the JISC (we've already made contact with most of them). But we'll also keep one eye on a much wider application of the infrastructure, to meet administrative and academic compute and storage requirements of institutions and individual researchers more generally because, long term, that is where our success lies.

Note that 'cloud' is a little narrow here since, on the compute side, our offer will include physical servers (though I think we'll try and discourage this to a certain extent), virtualisation in the form of VMware vCloud as well as true cloud in the form of OpenStack Compute.

I want to try and put something public in place where we can document the design decisions we are taking as we take them - partly as a sounding board for the community. I'm not sure where yet, though I suspect it won't be here. If you have an interest in what we are doing, keep an eye out.

May 25, 2011

Virtualisation and the cloud - the Eduserv Symposium 2011, a brief review

"I know what you're thinking... but don't worry, this is a talk on cloud computing and being lost is normal".

So started the Eduserv Symposium 2011 two weeks ago, with an opening keynote by Simon Wardley (Leading Edge Forum) that was anything but confusing. In fact, I heard several people comment after his talk that it was the best overview of cloud computing that they'd seen and that they couldn't wait for the video to be made available (below) so that they could show it to their senior management team as a way of highlighting the business and technological changes that are driving, and being driven by, the cloud.

Simon's opening talk was followed by a series of talks - Chris Cobb (Roehampton University), Kenji Takeda (University of Southampton), Phil Richards (Loughborough University) and Terry Harmer (Belfast e-Science Centre) - which provided an institutional perspective, both strategic and practical, on the ways in which shared services, virtualisation and the cloud might impact on administrative and research computing in UK universities and colleges.

And, in the middle of these, there was a short series of lightning (10 minute) talks by Rachel Bruce (JISC), Kevin Ashley (DCC), Dan Perry (JANET (UK)) and Matt Johnson (Eduserv) covering some aspects of what the JISC UMF Programme hopes to achieve over the next 12 months or so.

Rounding off the day was a great closing keynote by Armando Fox (UC Berkeley) (below), providing a US academic perspective that looked in some detail at the thinking around 'cloud' being done within the department of Electrical Engineering and Computer Science at UC Berkeley.

All the videos and presentation slides from the day are now available online.

The day was a significant challenge to put together. We wanted something that would try and cover the breadth of activity happening in the cloud space currently, particularly as it relates to the UK education community. We wanted something that would introduce people to what is likely to be happening over the next 12 months or so as part of the UMF Programme. And we wanted something that would challenge our thinking (both 'our' as in the education community and 'our' as in Eduserv) around the development and use of the cloud. Did we succeed? Yes, I think so... and, overall, I'm very happy with the way the day panned out. I'm particularly grateful to all our speakers.

The talks threw down some pretty significant challenges for those of us, like Eduserv, who are interested in building cloud services targeted at the education sector. I went into the day with a question: is the education community a consumer of infrastructural services in the cloud or can it also be a sustainable provider? (By 'sustainable' I mean something other than 'funded centrally by public money'). Do I now know the answer? No. But I am significantly more aware of the challenges and issues that lie behind that question. This is something that I will return to in a future post.

In lieu of that deeper discussion, I have three rather more superficial take-home messages from the day. Firstly, that adopting the cloud (i.e. moving to commodity computing) is at least as much about changes to management structure, market competition and disruption as it is about technology (though I must admit that I don't quite understand how this might play out in, say, higher education). Secondly, that the adoption of cloud infrastructure should not be seen primarily as a way of saving money. Rather it is a way of enabling innovation and allowing things to be done that were not possible before. And thirdly, that the sustainability issues (for educational cloud providers) are at least as much about the ability to keep up with a rapidly changing and highly innovative environment as they are about price.

For us, the next 12 months look really interesting. As part of the UMF Programme we are receiving funding to build some pilot infrastructure for use by HE (some of which will be true 'cloud' infrastructure). This initial funding has, to a certain extent, reduced the risks for us in getting involved in this space but we'll still have to work very hard to create a sustainable service in the longer term. Whatever else the UMF Programme achieves over the next 12 months or so, what I hope our involvement can do is to help build a better understanding of some of the issues and challenges laid out at the symposium.

As to my, as yet, unanswered question about whether an organisation like Eduserv can be a sustainable provider of cloud infrastructure... ask me again in 12 months :-)

May 11, 2011

Symposium rush

We're in the final stages of preparing for our 2011 symposium, Virtualisation and the Cloud: Realising the benefits of shared infrastructure, which takes place in London tomorrow so things are a bit hectic as you might expect.  If you are not registered, there's a live video stream, as per usual, once again provided by Switch New Media. (Note: worldwide timings for the video stream are available).

In preparing some notes for my introduction to the day, I've been thinking about what makes this year's symposium feel rather different to those in previous years, and it does feel different (at least to me). I think there are two factors. Firstly, the changing environment within which HE in the UK finds itself having to operate (see Christine Sexton's notes from today's HE Futures Forum for a lengthier explanation of that but none of this will come as a surprise to anyone who hasn't been under a rock for the last year or so) and secondly that we (Eduserv) now find ourselves increasingly drawn to providing 'cloud' solutions as part of our service portfolio. This includes the HE Cloud pilot that we'll be providing as part of the JISC's UMF Programme (and about which you'll hear more tomorrow) but is certainly not limited to the education space. Developing cloud offerings is fairly high-risk stuff for us, not least in the sense that it will change the way we deliver and run our own infrastructure, but also because the business models that might sustain this kind of activity are unclear, at best, in the education space currently.

All of which makes this year's symposium much more relevant to future 'core' business for us than has been the case previously.

We are also wanting to use the symposium as a way of engaging with the community across the range of services we provide. So, although the main focus of the day is cloud and shared services, we've put together a mini-expo in the lunch room where there will be representatives from all parts of Eduserv (Web Hosting and Development, the Data Centre, Licence Negotiation and OpenAthens). If you have any questions for us, or are interested in any of the services we offer, there'll be plenty of people on hand to help you out.

Finally, in light of the potential interest in UMF, we've also asked JANET (UK) and the DCC, both of whom are contributing significantly to the UMF Programme (and both of whom will be giving lightning talks immediately after lunch), to take part in the mini-expo as well.

I'm really looking forward to the event tomorrow. I think we have a great set of talks, including two keynotes (Simon Wardley and Armando Fox) that bring in perspectives from outside UK education. I look forward to seeing you there in person, or at least at the far end of the live-stream.

March 21, 2011

Virtualisation and the cloud - the Eduserv Symposium 2011

In my last post I mentioned that there was a 'Cloud solutions: risk or reward?' session at the recent JISC Conference in Liverpool. You can watch the three presentations that were made as part of that session by visiting the conference website: Paul Watson (Professor of Computer Science, University of Newcastle) giving a nice overview of work they have been doing to allow non-technical people to use cloud infrastructure more easily; Phil Richards (University of Loughborough) talking about the recent work that Loughborough have been doing with Logicalis; and Henry Hughes (Strategic Programmes Manager, JANET(UK)) talking about the new JANET cloud brokerage service.

Cloud infrastructure is clearly one of the big topics for academia in the UK this year, not least because of the recent UMF funding announcement from HEFCE/JISC (of which the JANET brokerage service (above) is a part). As a result, this particular JISC Conference session came hot on the heels of various other 'cloud' university events including one organised by UCISA that I reported on recently. What struck me while watching it was that we have rapidly reached the point where people are up to speed with the general principles of cloud infrastructure. We don't need too many more 'What is the cloud?' type sessions. What we do need are more sessions that get into the detail of cloud infrastructure, how it might be delivered and consumed in the context of academia, what business models are going to be sustainable, and so on.

This was quite a sobering thought for me personally because I'm currently in the closing stages of organising this year's annual Eduserv Symposium, an event that will focus on - yes, you guessed it - the provision of cloud infrastructure. That said, I think we have a pretty good line-up of speakers - see the symposium website for details - including an opening keynote from Simon Wardley (previously of Canonical and now at Leading Edge Forum) and talks by Chris Cobb (Pro Vice Chancellor, Roehampton University), Phil Richards (Director of IT, Loughborough University) and Kenji Takeda (Senior Lecturer, University of Southampton). I'm also pleased to say that our closing keynote will be given by Armando Fox, Adjunct Associate Professor, Electrical Engineering and Computer Science, UC Berkeley, who was one of the authors of the influential position paper, Above the Clouds: A Berkeley View of Cloud Computing [PDF].

As with last year, we'll start the afternoon session with a set of short 'lightning talks', this time covering what JANET, the Digital Curation Centre (DCC) and ourselves are doing as part of the UMF programme (given by Dan Perry, Kevin Ashley and Matt Johnson).

We have a stated set of aims for the symposium, namely that it will allow people to:

  • hear about the latest developments in the University Modernisation Fund (UMF) shared services in cloud computing infrastructure programme;
  • understand the strategic role of virtualisation and the cloud in the delivery of shared IT services;
  • find out about current and future directions in the provision of cloud solutions for compute and storage, both within academia and beyond;
  • cover the issues and challenges associated with these approaches and their impact on efficiency and cost effectiveness;
  • listen to practical experiences from institutions already working in the area; and
  • network with peers who have a shared interest in these issues.

I'm really hopeful that the symposium will help us begin to move this debate forward, to inform Eduserv's thinking as we begin to roll out cloud services, and to help shape the wider UMF programme. I appreciate that it is difficult to get concrete stuff out of a day like this but (as always) I'm really looking forward to it and think we have the makings of a great day.

The event is free to delegates and we have plenty of room. If you are interested in this area and want to get a good handle on what is going on, please sign up via the website. Oh, and we have a drinks reception afterwards which always helps with the networking!

Addendum: I'm very pleased to report that Terry Harmer (Co-Principal Investigator, Belfast eScience Centre) has also agreed to speak.

UMF pilot cloud infrastructure - size matters?

It's been a while since the HEFCE announcement about UMF and there was quite a bit of discussion about UMF, virtualisation and the cloud at the recent JISC conference in Liverpool (at least from what I could see on the live video stream). It therefore seems appropriate to mention our role in this activity.

By way of background, the University Modernisation Fund (UMF) is a HEFCE initiative that aims to help universities and colleges in the UK deliver better efficiency and value for money through the development of shared services. Managed by the JISC, the programme has two core elements:

  • investment of up to £10 million in cloud computing, shared IT infrastructure, support to deliver virtual servers, storage and data management applications;
  • investment of up to £2.5 million to establish cloud computing and shared services in central administration functions to support learning, teaching, and research.

As part of our involvement in this activity, Eduserv is building a generalised virtualisation and cloud platform to serve up compute and storage resources as IaaS.  Both compute and storage resources will be offered at different tiers to enable delivery of a wide range of applications. At this stage, we expect the platform to offer the following services (though exact details are still under discussion with both JANET and the JISC):

  • VMware-based virtual machines;
  • physical blade servers;
  • block-level SAN disk storage;
  • file/object-level archive disk storage.

Whilst the platform is designated a pilot service, it will be delivered to production-quality standards, both to build consumer confidence in service availability and to let us understand and mitigate any transitional or operational concerns as quickly as possible during the pilot. The platform will be designed to offer virtualisation and cloud infrastructure to any projects funded through the UMF programme (at no cost) and to the wider UK HE community, using pricing and billing models that are still to be determined. Eduserv will be developing pricing and billing models that are both sympathetic to the needs of the academic community and supportive of a sustainable service in the future, again in discussion with JANET.

We are in the process of designing this infrastructure in such a way as to provide the following tactical benefits to HE institutions and supporting organisations:

  • a fully-configurable virtualised environment that allows for the configuration of customer-specific infrastructure in segregated environments, offering high levels of security and performance;
  • a resilient network, capable of delivering wire-speed 10 Gigabit Ethernet connectivity from the physical or virtual server through to the JANET backbone, which enables institutions to make use of UMF services as though they were located locally on-premise;
  • a highly-scalable compute infrastructure that can simultaneously accommodate a number of different initiatives, from UMF-funded pilot SaaS services through to institution-specific virtualisation and cloud provisioning;
  • a multi-tier storage architecture providing a range of data services, from raw data processing through to research data management and longer-term storage.

From our perspective, the intention is to investigate and support the following strategic benefits across the HE community:

  • the potential to significantly reduce the amount of time and effort spent by HEIs in developing plans and associated business cases for institutional data centre infrastructure, leading to a reduction in capital expenditure on HEI-specific data centre construction, refit and on-going operations;
  • the provision of a focal point for new shared service development, offering on-demand development and test environments as well as high-quality production infrastructure capable of delivering enterprise-level SLAs;
  • a long-term, sustainable service blueprint that delivers IaaS services at pricing competitive with existing commercial providers, but with the benefit of direct JANET connectivity and HE-suited pricing models;
  • a service platform that offers IPv6 capability, assisting institutions in the transition from the currently depleted IPv4 address space.

During the morning cloud session at the JISC Conference, there were a couple of comments relating to "industrial scale" clouds, the implication being both that the education community can't build such clouds itself and that massive size matters in order to realise sufficient economies of scale to be worthwhile.

As I tweeted at the time, I don't believe that to be the case. Or, rather, I don't know if that is the case - one of the things we need to make sure comes out of the next 12 months or so of activity within UMF is a much better understanding of what the education community is capable of building itself, whether cloud infrastructure services within that community are likely to be sustainable, and what cost savings are likely to be made.

It seems to me that there is a scale that sits somewhere between "a single institution" and "industrial scale" (which I take to mean Amazon, Microsoft, etc.), a scale that the education community is well able to deliver and that is sufficiently far along the scale/cost curve for significant savings to be made.

The further one can move along the scale axis, the better - clearly. As in most things, size matters! But I suspect it is also the case that there are diminishing returns here. It remains to be seen how far educational providers can move along the scale, either individually or in collaboration, and whether the resulting infrastructure can be delivered in an attractive and sustainable way.
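
To make the diminishing-returns point a little more concrete, here is a toy sketch of a scale/cost curve. It is only an illustration: the cost model (a fixed running cost shared across all servers, plus a constant cost per server) and every number in it are invented assumptions, not figures from UMF, Eduserv or any real provider.

```python
# Toy model only: all figures are invented for illustration,
# not taken from any UMF or Eduserv costing.
FIXED_COST = 1_000_000   # hypothetical annual cost of running the facility at all
MARGINAL_COST = 500      # hypothetical annual cost of each additional server


def unit_cost(servers: int) -> float:
    """Average annual cost per server when fixed costs are shared across `servers`."""
    return FIXED_COST / servers + MARGINAL_COST


def saving_from_doubling(servers: int) -> float:
    """Reduction in per-server cost gained by doubling the current scale."""
    return unit_cost(servers) - unit_cost(2 * servers)


if __name__ == "__main__":
    # Each step up in scale still lowers the unit cost, but by less each time.
    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} servers: unit cost {unit_cost(n):>9.2f}, "
              f"saving from doubling {saving_from_doubling(n):>8.2f}")
```

Under these made-up assumptions, doubling from 100 servers cuts the unit cost by far more than doubling from 10,000 does, which is the shape of curve I have in mind. Whether real educational infrastructure follows anything like it is exactly what remains to be seen.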

If you are interested in this kind of stuff, our annual free Eduserv Symposium (May 12th in London) will be focusing on virtualisation and the cloud in general, and UMF in particular - more on this shortly.

February 22, 2011

UCISA cloud computing event

I attended UCISA's Cloud Computing Seminar last week, a pretty good event overall though, like many 'cloud' events, there was quite a mix of IaaS (e.g. Amazon Web Services (AWS)) and SaaS (e.g. Google Apps) presentations so it sometimes felt like the programme was jumping around a bit. There is no doubt that the 'cloud' is generating a lot of interest at the moment, which is gratifying since it is also the topic of our annual Eduserv Symposium this year (May 12th in London).

Phil Richards, of Loughborough, talked about their partnership with Logicalis, building what he called a 'hybrid cloud' comprising both an on-campus virtualisation infrastructure and some in-the-cloud burst capacity (based on Logicalis' Cooperative Cloud Service). He seemed to make the point that research and teaching, being two of the key differentiators between universities, are inappropriate for outsourcing to the cloud. Well, yes, I tend to agree - but that doesn't mean that the compute infrastructure on which those things are built can't be outsourced? With the exception of some very specialised cases, I doubt that many people choose their place of research or learning based on the size of its data centre?

And, as David Wallom of OeRC pointed out in his talk about the FleSSR project, outsourcing to cloud infrastructure is already happening, albeit in a rather ad hoc and bottom-up way. He suggested that most institutions (certainly research intensive institutions) will probably have around 200-300 researchers that are already using AWS (or equivalent) for some aspects of their research. So, for at least some researchers, the decision to use cloud infrastructure has already been taken, often on the back of a personal credit card! The problem for universities is that it is happening in an unmanaged (and in some senses unmanageable) way.

Given the announcement of UMF funding a couple of weeks ago, which includes a pilot "virtual server infrastructure (a 'cloud')" hosted by us, and given our involvement in the FleSSR project, we now fall rather squarely into the camp of those people thinking about building shared 'cloud' infrastructure services for the education sector. Understanding both the needs of those individual researchers who are currently choosing to go to Amazon and those of university IT Services who likely have more strategic 'virtualisation' issues in mind brings, I suspect, some interesting tensions, not least around business models (which was the topic of Matt Johnson's talk), pricing models and, ultimately, sustainability.

Interestingly, JANET's new role as a broker for the "procurement of shared virtual servers and data centre capacity" (as part of the UMF funding announcement) got positive support on a couple of occasions during the day, with the speakers from UCD saying that they'd like to see a similar service being set up in Ireland.

So-called 'cloud bursting' was also referred to several times during the day as being an attractive option. This approach, like that adopted by Loughborough, retains virtualisation/compute and storage capacity in-house but uses the cloud to meet demand when it exceeds local capacity ('bursting'). This is also the architectural approach being investigated by the FleSSR project. What is not clear to me, when we view the UK HE community as a whole, is the extent to which this kind of approach is able to achieve such significant overall cost savings when compared to a more whole-hearted 'push everything out to a shared cloud provider' model, nor the ease with which such cloud-bursting services can become sustainable.
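To make the 'bursting' idea concrete, here is a minimal sketch in Python. It is not how Loughborough or FleSSR implement anything; the function name, the capacity figure and the demand numbers are all made up for illustration. It simply shows the placement rule: use in-house capacity first, and send only the overflow to the cloud.

```python
def split_demand(demand, local_capacity):
    """Split demand into (served locally, burst to the cloud).

    Both arguments are in the same arbitrary unit (e.g. virtual machines).
    """
    if demand < 0 or local_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    local = min(demand, local_capacity)  # fill in-house capacity first
    burst = demand - local               # only the overflow goes outside
    return local, burst


# Hypothetical hourly demand and in-house capacity (illustrative only).
LOCAL_CAPACITY = 100
hourly_demand = [40, 75, 120, 95, 60]

placements = [split_demand(d, LOCAL_CAPACITY) for d in hourly_demand]
burst_hours = sum(1 for _, burst in placements if burst > 0)

for demand, (local, burst) in zip(hourly_demand, placements):
    print(f"demand={demand:>3}  local={local:>3}  burst={burst:>3}")
print(f"hours that needed the cloud: {burst_hours}")
```

The cost question raised above then becomes: how much in-house capacity is it worth paying for all year round, versus paying the cloud provider only for the hours that burst?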

From our perspective, the issues around business models, costing and sustainability are taxing our minds at least as much as the nuts and bolts of building the infrastructure as we consider the future of both FleSSR and our UMF pilot. More anon...

December 10, 2010

Cloud storage - costing and pricing

I've been doing some cloud-related (cloudy?) thinking as part of my work on the FleSSR project over the last couple of days, ultimately with the aim of delivering a piece on business models for cloud services (one of the project deliverables) but initially just looking at the costs of storage in the cloud (Amazon, Dropbox and Rackspace) and the costs of building cloud storage in-house.

The result is a couple of posts on the FleSSR project blog and a Google spreadsheet. Please have a read. I'm keen to get feedback!

So, what can we conclude? Looking at the cost per TB per year, the Dropbox and Rackspace prices are pretty much flat (i.e. the same irrespective of how much data is being stored) at around £1530/TB/year and £1220/TB/year respectively (though, as noted above, the Dropbox prices are only applicable for 50GB and 100GB). Amazon's pricing is cheaper, particularly so for large amounts of data (anything over 100TB data where the price starts dipping below £1000/TB/year) but never reaches the kind of baseline figures I've seen others quote for Amazon storage alone (i.e. without network costs) of around £450/TB/year. (My lowest estimate is around £510/TB/year for 500PB data but, as mentioned above, this estimate is probably unrealistic for other reasons.)
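For anyone wanting to repeat this kind of sum, here is a rough Python sketch of how an effective cost per TB per year falls out of a tiered price list. It assumes a provider charges a different rate for each successive band of storage; the tiers and prices below are invented placeholders, not the figures from the spreadsheet or from any provider's price list, and network costs are left out.

```python
def annual_cost(tb_stored, tiers):
    """Total yearly cost of storing tb_stored terabytes.

    tiers is a list of (band_size_tb, price_per_tb_per_year) pairs,
    applied in order; a band size of None means "everything above".
    """
    remaining = tb_stored
    total = 0.0
    for band_size, price in tiers:
        if remaining <= 0:
            break
        used = remaining if band_size is None else min(remaining, band_size)
        total += used * price
        remaining -= used
    if remaining > 0:
        raise ValueError("tiers do not cover the requested amount")
    return total


def cost_per_tb_year(tb_stored, tiers):
    """Effective (average) cost per TB per year at a given volume."""
    return annual_cost(tb_stored, tiers) / tb_stored


# Placeholder tiers in GBP per TB per year: NOT real prices.
HYPOTHETICAL_TIERS = [(1, 1200.0), (49, 1000.0), (None, 800.0)]

for volume in (1, 50, 100):
    rate = cost_per_tb_year(volume, HYPOTHETICAL_TIERS)
    print(f"{volume} TB: {rate:.0f} per TB per year")
```

With flat pricing the list has a single band, so the effective rate is the same at every volume; with tiers the average drifts down towards the cheapest band as the volume grows, but never quite reaches it.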

Superficially, these prices seem quite high - they are certainly higher than I was expecting. What is interesting is whether they can be matched or beaten by academic providers (such as Eduserv) and/or in-house institutional provision, and if so by how much?

In the second post I try to identify a 'shopping list' of things that would need to be paid for if one were to build a cloud storage infrastructure oneself, partly as a simple reminder that setting up this kind of service isn't just about buying some kit - there are all sorts of costs that need to be met (some up-front and some on an ongoing basis):

  • Disks
  • Network infrastructure (switching, etc.)
  • Router/firewall
  • Physical space costs
  • Energy
  • Operator cover
  • Development effort
  • Project/service management
  • Procurement/financial effort

I don't go as far as identifying specific costs (in terms of amounts of money) because doing so is subject to all kinds of variables. However, the list itself is intended to help think about costs when considering things like whether to outsource to the cloud or not. I'm hoping that this will prove useful to people but if you think I've got things majorly (or even a little bit) wrong, please shout.
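As a rough illustration of how the shopping list might be turned into a figure comparable with the cloud prices, here is a small Python sketch. Everything in it is an assumption: which items count as up-front and which as ongoing, the three-year lifetime, the 100 TB capacity and every amount are placeholders chosen only to keep the arithmetic easy to follow, not estimates of real costs.

```python
def in_house_cost_per_tb_year(upfront, ongoing, lifetime_years, capacity_tb):
    """Yearly cost per TB of an in-house storage service.

    upfront: one-off costs, spread evenly over lifetime_years
    ongoing: costs that recur every year
    """
    if lifetime_years <= 0 or capacity_tb <= 0:
        raise ValueError("lifetime and capacity must be positive")
    amortised = sum(upfront.values()) / lifetime_years
    yearly_total = amortised + sum(ongoing.values())
    return yearly_total / capacity_tb


# Placeholder amounts (GBP), NOT estimates. The split between the two
# groups is also an assumption made for this sketch.
upfront = {
    "disks": 30000,
    "network infrastructure": 10000,
    "router/firewall": 5000,
    "development effort": 15000,
    "procurement/financial effort": 3000,
}
ongoing = {
    "physical space": 4000,
    "energy": 6000,
    "operator cover": 12000,
    "project/service management": 7000,
}

rate = in_house_cost_per_tb_year(upfront, ongoing, lifetime_years=3, capacity_tb=100)
print(f"{rate:.0f} per TB per year")
```

The point of the sketch is the shape of the calculation rather than the answer: the result is very sensitive to the assumed lifetime and to how full the storage actually is, which is one reason I have not tried to put real numbers on the list.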

November 02, 2010

FleSSR public cloud infrastructure update

I wrote a brief update for the FleSSR project blog yesterday, covering some work we did last week at our (relatively new) Swindon Data Centre to build the initial infrastructure for the project's public cloud. I won't repeat any of that here but would just like to note that the FAS 3140 SAN cluster (Storage Area Network) that we are being loaned by NetApp via Q Associates for the duration of the project, of which we'll use about 10 Tbytes for FleSSR, will be up and running over the next couple of days, meaning that this infrastructure will be substantial enough for some real testing.

As an aside, when Eduserv's new Swindon Data Centre originally opened, all staff were encouraged to go over from Bath to have a look round. I didn't bother because "what's the point of looking round a shed?" - it wasn't one of my more popular in-house comments :-)

As it happens, I was quite wrong... the Data Centre is actually quite impressive, not just because of the available space (which is much bigger than I was expecting) but also the quality of the one 'vault' that has been built so far and the associated infrastructure. It looks (to my eyes) like a great resource... now we've just got to get it used by our primary communities - education, government and health. I'm hopeful that FleSSR represents a small step towards what will eventually become a well-valued community resource.

October 13, 2010

What current trends tell us about the future of federated access management in education

As mentioned previously, I spoke at the FAM10 conference in Cardiff last week, standing in for another speaker who couldn't make it and using material crowdsourced from my previous post, Key trends in education - a crowdsource request, to inform some of what I was talking about. The slides and video from my talk follow:

As it turns out, describing the key trends is much easier than thinking about their impact on federated access management - I suppose I should have spotted this in advance - so the tail end of the talk gets rather weak and wishy-washy. And you may disagree with my interpretation of the key trends anyway. But in case it is useful, here's a summary of what I talked about. Thanks to those of you who contributed comments on my previous post.

By way of preface, it seems to me that the core working assumptions of the UK Federation have been with us for a long time - like, at least 10 years or so - essentially going back to the days of the centrally-funded Athens service. Yet over those 10 years the Internet has changed in almost every respect. Ignoring the question of whether those working assumptions still make sense today, I think it certainly makes sense to ask ourselves about what is coming down the line and whether our assumptions are likely to still make sense over the next 5 years or so. Furthermore, I would argue that federated access management as we see it today in education, i.e. as manifested thru our use of SAML, shows a rather uncomfortable fit with the wider (social) web that we see growing up around us.

And so... to the trends...

The most obvious trend is the current financial climate, which won't be with us for ever of course, but which is likely to cause various changes while it lasts and where the consequences of those changes, university funding for example, may well be with us much longer than the current crisis. In terms of access management, one impact of the current belt-tightening is that making a proper 'business case' for various kinds of activities, both within institutions and nationally, will likely become much more important. In my talk, I noted that submissions to the UCISA Award for Excellence (which we sponsor) often carry no information about staff costs, despite an explicit request in the instructions to entrants to indicate both costs and benefits. My point is not that institutions are necessarily making the wrong decisions currently but that the basis for those decisions, in terms of cost/benefit analysis, will probably have to become somewhat more rigorous than has been the case to date. Ditto for the provision of national solutions like the UK Federation.

More generally, one might argue that growing financial pressure will encourage HE institutions into behaving more and more like 'enterprises'. My personal view is that this will be pretty strongly resisted, by academics at least, but it may have some impact on how institutions think about themselves.

Secondly, there is the related trend towards outsourcing and shared services, with the outsourcing of email and other apps to Google being the most obvious example. Currently that is happening most commonly with student email but I see no reason why it won't spread to staff email as well in due course. At the point that an institution has outsourced all its email to Google, can one assume that it has also outsourced at least part of its 'identity' infrastructure as well? So, for example, at the moment we typically see SAML call-backs being used to integrate Google mail back into institutional 'identity' and 'access management' systems (you sign into Google using your institutional account) but one could imagine this flipping around such that access to internal systems is controlled via Google - a 'log in with Google' button on the VLE for example. Eric Sachs, of Google, has recently written about OpenID in the Enterprise SaaS market, endorsing this view of Google as an outsourced identity provider.

Thirdly, there is the whole issue of student expectations. I didn't want to talk to this in detail but it seems obvious that an increasingly 'open' mashed and mashable experience is now the norm for all of us - and that will apply as much to the educational content we use and make available as it does to everything else. Further, the mashable experience is at least as much about being able to carry our identities relatively seamlessly across services as it is about the content. Again, it seems unclear to me that SAML fits well into this kind of world.

There are two other areas where our expectations and reality show something of a mis-match. Firstly, our tightly controlled, somewhat rigid approach to access management and security is at odds with the rather fuzzy (or at least fuzzily interpreted) licences negotiated by Eduserv and JISC Collections for the external content to which we have access. And secondly, our over-arching sense of the need for user privacy (the need to prevent publishers from cross-referencing accesses to different resources by the same user for example) is holding back the development of personalised services and runs somewhat counter to the kinds of things we see happening in mainstream services.

Fourthly, there's the whole growth of mobile - the use of smart-phones, mobile handsets, iPhones, iPads and the rest of it - and the extent to which our access management infrastructure works (or not) in that kind of 'app'-based environment.

Then there is the 'open' agenda, which carries various aspects to it - open source, open access, open science, and open educational resources. It seems to me that the open access movement cuts right to the heart of the primary use-case for federated access management, i.e. controlling access to published scholarly literature. But, less directly, the open science movement, in part, pushes researchers towards the use of more open 'social' web services for their scholarly communication where SAML is not typically the primary mechanism used to control access.

Similarly, the emerging personal learning environment (PLE) meme (a favourite of educational conferences currently), where lecturers and students work around their institutional VLE by choosing to use a mix of external social web services (Flickr, Blogger, Twitter, etc.) again encourages the use of external services that are not impacted by our choices around the identity and access management infrastructure and over which we have little or no control. I was somewhat sceptical about the reality of the PLE idea until recently. My son started at the City of Bath College - his letter of introduction suggested that he create a Google Docs account for himself so that he could do his work there and submit it using email or Facebook. I doubt this is college policy but it was a genuine example of the PLE in practice so perhaps my scepticism is misplaced.

We also have the changing nature of the relationship between students and institutions - an increasingly mobile and transitory student body, growing disaggregation between the delivery of learning and accreditation, a push towards overseas students (largely for financial reasons), and increasing collaboration between institutions (both for teaching and research) - all of which have an impact on how students see their relationship with the institution (or institutions) with whom they have to deal. Will the notion of a mandated 3 or 4 year institutional email account still make sense for all (or even most) students in 5 or 10 years time?

In a similar way, there's the changing customer base for publishers of academic content to deal with. At the Eduserv Symposium last year, for example, David Smith of CABI described how they now find that having exposed much of their content for discovery via Google they have to deal with accesses from individuals who are not affiliated with any institution but who are willing to pay for access to specific papers. Their access management infrastructure has to cope with a growing range of access methods that sit outside the 'educational' space. What impact does this have on their incentives for conforming to education-only norms?

And finally there's the issue of usability, and particularly the 'where are you from' discovery problem. Our traditional approach to this kind of problem is to build a portal and try and control how the user gets to stuff, such that we can generate 'special' URLs that get them to their chosen content in such a way that they can be directed back to us seamlessly in order to login. I hate portals, at least insofar as they have become an architectural solution, so the less said the better. As I said in my talk, WAYFless URLs are an abomination in architectural terms, saved only by the fact that they work currently. In my presentation I played up the alternative usability work that the Kantara ULX group have been doing in this area, which it seems to me is significantly better than what has gone before. But I learned at the conference that Shibboleth and the UK WAYF service have both also been doing work in this area - so that is good. My worry though is that this will remain an unsolvable problem, given the architecture we are presented with. (I hope I'm wrong but that is my worry.) As a counterpoint, in the more... err... mainstream world we are seeing a move towards what I call the 'First Bus' solution (on the basis that in many UK cities you only see buses run by the First Group (despite the fact that bus companies are supposed to operate in a free market)) where you only see buttons to log in using Google, Facebook and one or two others.

I'm not suggesting that this is the right solution - just noting that it is one strategy for dealing with an otherwise difficult usability problem.

Note that we are also seeing some consolidation around technology - notably OpenID and OAuth - though often in ways that hide it from public view (e.g. hidden behind a 'login with google' or 'login with facebook' button).

Which essentially brings me to my concluding screen - you know, the one where I talk about all the implications of the trends above - which is where I have less to say than I should! Here's the text more-or-less copy-and-pasted from my final slide:

  • ‘education’ is a relatively small fish in a big pond (and therefore can't expect to drive the agenda)
  • mainstream approaches will win (in the end) - ignoring the difficult question of defining what is mainstream
  • for the Eduserv OpenAthens product, Google is as big a threat as Shibboleth (and the same is true for Shibboleth)
  • the current financial climate will have an effect somewhere
  • HE institutions are probably becoming more enterprise-like but they are still not totally like commercial organisations and they tend to occupy an uncomfortable space between the ‘enterprise’ and the ‘social web’ driven by different business needs (c.f. the finance system vs PLEs and open science)
  • the relationships between students (and staff) and institutions are changing

In his opening talk at FAM10 the day before, David Harrison had urged the audience to become leaders in the area of federated access management. In a sense I want the same. But I also want us, as a community, to become followers - to accept that things happen outside our control and to stop fighting against them the whole time.

Unfortunately, that's a harder rallying call to make!

Your comments on any/all of the above are very much welcomed.

September 28, 2010

An App Store for the Government?

I listened in to a G-Cloud web-cast organised by Intellect earlier this month, the primary intention of which was to provide an update on where things have got to. I use the term 'update' loosely because, with the election and change of government and what-not, there doesn't seem to have been a great deal of externally visible progress since the last time I heard someone speak about the G-Cloud. This is not surprising I guess.

The G-Cloud, you may recall, is an initiative of the UK government to build a cloud infrastructure for use across the UK public sector. It has three main strands of activity:

The last of these strikes me as the hardest to get right. As far as I can tell, it's an idea that stems (at least superficially) from the success of the Apple App Store though it's not yet clear whether an approach that works well for low-cost, personal apps running on mobile handsets is also going to work for the kinds of software applications found running across government. My worry is that, because of the difficulty, the ASG will distract from progress on the other two fronts, both of which strike me as very sensible and potentially able to save some of the tax-payer's hard-earned dosh.

App stores (the real ones I mean) work primarily because of their scale (global), the fact that people can use them to showcase their work and/or make money, their use of relatively micro-payments, and their socialness. I'm not convinced that any of these factors will have a role to play in a government app store so the nature of the beast is quite different. During the Q&A session at the end of the web-cast someone asked if government departments and/or local councils would be able to 'sell' their apps to other departments/councils via the ASG. The answer seemed to be that it was unlikely. If we aren't careful we'll end up with a simple registry of government software applications, possibly augmented by up-front negotiated special deals on pricing or whatever and a nod towards some level of social engagement (rating, for example) but where the incentives for taking part will be non-obvious to the very people we need to take part - those people who procure government software. It's the kind of thing that Becta used to do for the schools sector... oh, wait! :-(

For the ASG to work, we need to identify those factors that might motivate people to use it (other than an outright mandate) - as individuals, as departments and as government as a whole. I think this will be quite a tricky thing to get right. That's not to say that it isn't worth trying - it may well be. But I wonder if it would be better unbundled from the other strands of the G-Cloud concept, which strike me as being quite different.

Addendum: A G-Cloud Overview [PDF, dated August 2010] is available from the G-Digital Programme website:

G-Digital will establish a series of digital services that will cover a wide range of government’s expected digital needs and be available across the public sector. G-Digital will look to take advantage of new and emerging service and commercial models to deliver benefits to government.

August 13, 2010

Cloud infrastructures for academia - the FleSSR project

Yesterday, I attended the kick-off meeting for a new JISC-funded project called FleSSR - Flexible Services for the Support of Research. From the, as yet very new, project blog:

Our project will create a hybrid public-private Infrastructure as a Service cloud solution for academic research. The two pilot use cases chosen follow the two university partners interests, software development and multi-platform support and on-demand research data storage space.

We will be implementing open standards for cloud management through the OGF Open Cloud Computing Interface.

The project is a collaboration led by the Oxford e-Research Centre and involving STFC, Eduserv, the University of Reading, EoverI, Eucalyptus Inc. and Canonical Ltd.

Our role at Eduserv will primarily be to build a public cloud into which private clouds at Oxford and Reading can burst both compute resource and storage at times of high demand, as generated by pilot demonstrators at those two institutions. My colleagues Matt Johnson and Tim Lawrence will lead our work on this here. The clouds will be built on some variant of Eucalyptus and Ubuntu - one of the early pieces of work for the project team being to compare Open Eucalyptus, Enterprise Eucalyptus and Ubuntu Enterprise Cloud.

My own involvement with the project will start properly after Christmas and will contribute to the project's thinking about sustainable business models for cloud providers like Eduserv in this space. One of the interesting aspects of the project will be some technical work on policy enforcement and accounting that will allow business models other than 'top-sliced central-funding' to come into play in academia for this kind of provision.

I'm really looking forward to this work. The project itself, funded as part of the JISC's Flexible Service Delivery Programme, is only 10 months in duration but is attempting to cover a lot of ground very quickly. I'm very hopeful that the outputs will be of widespread interest to the community, as well as helping to shape our own potential offerings in this area.


