
February 19, 2010

In the clouds

So, the Repositories and the Cloud meeting, jointly organised by ourselves and the JISC, takes place on Tuesday next week and I promised to write up my thoughts in advance.  Trouble is... I'm not sure I actually have any thoughts :-(

Let's start from the very beginning (it's a very good place to start)...

The general theory behind cloud solutions - in this case we are talking primarily about cloud storage solutions but I guess this applies more generally - is that you outsource parts of your business to someone else because:

  • they can do it better than you can,
  • they can do it more cheaply than you can,
  • they can do it in a more environmentally-friendly way than you can, or
  • you simply no longer wish to do it yourself for other reasons.

Seems simple enough and I guess that all of these apply to the issues at hand for the meeting next week, i.e. what use is there for utility cloud storage solutions for the data currently sitting in institutional repositories (and physically stored on disks inside the walls of the institution concerned).

Against that, there is a set of arguments or issues that militate against a cloud solution, such as:

  • security
  • data protection
  • sustainability
  • resilience
  • privacy
  • loss of local technical knowledge
  • ...

...you know the arguments.  Ultimately institutions are going to end up asking themselves questions like, "how important is this data to us?", "are we willing to hand it over to one or more cloud providers for long term storage?", "can we afford to continue to store this stuff for ourselves?", "what is our exit strategy in the future?", and so on.

Wrapped up in this will be issues about the specialness of the kind of stuff one typically finds in institutional repositories - either because of volume of data (large research data-sets for example), or because stuff is seen as being especially important for various reasons (it's part of the scholarly record for example).

None of which is particularly helpful in terms of where the meeting will take us!  I certainly don't expect any actual answers to come out of it, but I am expecting a good set of discussions about current capabilities (what the current tools are capable of), about policy issues, and about where we are likely to go in the future.

One of the significant benefits the current interest in cloud solutions brings is the abstraction of the storage layer from the repository services.  Even if I never actually make use of Amazon S3, I might still get significant benefit from the cloud storage mindset because my internal repository 'storage' layer is separated from the rest of the software.  That means that I can do things like sharing data across multiple internal stores, sharing data across multiple external stores, or some combination of both, much more easily.  It also potentially opens up the market to competing products.
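To make the storage-abstraction point concrete, here is a minimal Python sketch of a repository that codes against a storage interface rather than against disks. All the names (`BlobStore`, `LocalDiskStore`, `InMemoryStore`, `ReplicatedStore`, `deposit`) are hypothetical and purely illustrative. `InMemoryStore` only stands in for an external or cloud store and does not call any real provider's API.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Minimal storage interface that the repository software codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalDiskStore(BlobStore):
    """An 'internal' store: each object is a file under a local directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary item identifiers map to safe file names.
        return os.path.join(self.root, hashlib.sha256(key.encode()).hexdigest())

    def put(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None


class InMemoryStore(BlobStore):
    """Hypothetical stand-in for an external (e.g. cloud) store."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if absent


class ReplicatedStore(BlobStore):
    """Writes to every backend; reads from the first backend holding the object."""

    def __init__(self, *backends: BlobStore) -> None:
        self.backends = list(backends)

    def put(self, key: str, data: bytes) -> None:
        for backend in self.backends:
            backend.put(key, data)

    def get(self, key: str) -> bytes:
        for backend in self.backends:
            try:
                return backend.get(key)
            except KeyError:
                continue
        raise KeyError(key)


def deposit(store: BlobStore, item_id: str, content: bytes) -> None:
    """Repository-level operation: knows nothing about where the bytes end up."""
    store.put(item_id, content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # One internal copy plus one (simulated) external copy.
        store = ReplicatedStore(LocalDiskStore(tmp), InMemoryStore())
        deposit(store, "item-1234", b"%PDF-1.4 ...")
        print(store.get("item-1234"))
```

The design choice is that `deposit` never changes when backends are swapped, added, or mixed. Moving from internal disks to an external provider, or to a combination of the two, then becomes a configuration decision rather than a rewrite of the repository software.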

So, I think this space has wider implications than a simple, "should I use cloud storage?" approach might imply.

From an Eduserv point of view, both as a provider of not-for-profit services to the public, health and education sectors and as an organisation with a brand spanking new data centre, I don't think it's any secret that we want to understand whether there is anything useful we can bring to this space, for example as a provider of cloud storage solutions that are significantly closer to the community than the utility providers are.  That's not to say that we have such an offer currently, but it is the kind of thing we are interested in thinking about.

I don't particularly buy into the view that the cloud is nothing new.  Amazon S3 and its ilk didn't exist 10 years ago and there's a reason for that.  As markets and technology have matured, new things have become possible.  But that, on its own, isn't a reason to play in the cloud space. So, I suppose that the real question for the meeting next week is, "when, if ever, is the right time to move to cloud storage solutions for repository content... and why?", both from a practical and a policy viewpoint.

I don't know the answers to those questions but I'm looking forward to finding out more about it next week.




The ravens have deserted the tower. Opinions have deserted Eduserv. Funding has deserted JISC. Woe is us.

No worries... normal service will be resumed shortly.


I feel that you missed some of the key strengths of the cloud. On-demand self-service and rapid scalability seem to me to be the features that no existing non-profit data center can compete with. I think the real characteristics of the cloud often get lost behind the outsourcing opportunities. If you haven't read it yet, I recommend reading NIST's definition of cloud computing.
From time to time I get lost in thinking about the cloud's potential, and reading it often helps me see the bigger picture.

At the same time, I think you're on to something with S3 providing viable options to institutions in terms of archival storage. S3's newest feature, versioning, makes this an even more viable option.
I think this solves the problem many institutions have when it comes to providing archival storage and providing it right.

I personally would really like to see an educational/non-profit cloud with the feature-rich services that AWS provides. While I understand most people see the majority of the potential in terms of storage, I think the cloud has the ability to revolutionize the way not-for-profits work with one another.

re: "self service and rapid scalability"... yes agreed. I'd kind of subsumed those into the 'better' and 'cheaper' bullet points but I agree it is probably helpful to enumerate these things more fully. And, yes, I also agree very much with the competition issue.

One of the things we are currently thinking about funding is a study comparing the various open source 'cloud' solutions - Eucalyptus and the like. The aim would be to help inform people thinking of building their own clouds. Would such a thing be useful?

I think your list of "reasons for moving to the cloud" reads more like a list of "reasons for outsourcing" in general. Which is not to argue that the cloud is nothing new - but that the list of motivations you gave could just drive someone to outsource all or part of their repository provision to someone like Eduserv or ULCC without any cloudiness being involved.

Rosalyn's mentioned the self-service, easily scalable aspects which distinguish cloud solutions. They lead me to think that simply thinking of this problem as being about cloud storage is too confining; the cloud makes it easy to imagine firing up entire repository instances quickly.

I agree absolutely that even the possibility of cloud storage should encourage better thinking about repository architecture, clearly separating storage and preservation functions from deposit and access provision. That can only be good. It also encourages us to evaluate whether in-house provision is necessarily best. Even if the answer is "yes", we end up being more confident about why it is yes.

My contribution ahead of Tuesday's event is now up at http://cloudofdata.com/2010/02/repositories-in-the-cloud-why-on-earth-not/


I like the idea of a study to look at the feasibility. Building your own cloud might be a lot of work. If the work a sys admin would need to do could be reduced, that would be awesome.

As far as creating a repository in the cloud, I wonder if there could be two options.

1) Create your own cloud using something like Eucalyptus and then top it off with whatever repository software/tools you want.

2) Have a pre-built system based on something like Amazon or Rackspace. You could have a server image with the tools needed on it; users would then just need to start up that image, start up the necessary storage solution, and connect the two.

Those are some of my thoughts on clouds and repositories. I'll be following you all as much as I can (I'll be heading off to code4lib).


Having read Paul's blog and your piece, I am minded to make a few comments. One of the critical parts of the discussion that I have not seen mentioned is: what are clouds not good for? The critical one from my perspective is security. This is a real issue.

My experience is that there is a really woolly understanding of what is important in a university. As recent security breaches have shown, there is a temptation to place too much emphasis on products rather than on the process and its surrounding artefacts, e.g. email. The technology/information separation is nice, but that separation also potentially makes abuse of the infrastructure easier.

The temptation is to move process into the cloud because of the connectivity benefits and only "publish" when the scholarly product is ready. In very real security terms, a cloud approach is potentially publishing while still thinking. I believe we have all hit send on an email and, after a night's sleep, thought better of it. What if someone unfriendly has access to that?

The idea of potentially exposing ourselves without our active agreement in the cloud for the sake of expedience and cost seems misguided to me. Would we be happy to conduct all our private interactions with people in public and just hope no-one is looking?

I work in security, often involving clouds, so I spend my time preoccupied with the impact and risk of things as well as their value, but all the same it looks very risky to me, kind of like living your life in a hotel. As long as everyone plays fair we can all get along, but we do not necessarily know who has the keys at any given moment.

We perhaps need some sort of a user-managed strong authentication layer, separate from the technology provider, but this is likely to be very tricky to pull off in a cloud environment.

I am all for cloud computing but the provider and the risk profile need to be considered and the applications well planned.




eFoundations is powered by TypePad