May 18, 2012

Big Data - size doesn't matter, it's the way you use it that counts least, that's what they tell me!

IMG_6404Here's my brief take on this year's Eduserv Symposium, Big Data, big deal?, which took place in London last Thursday and which was, by all the accounts I've seen and heard, a pretty good event.

The day included a mix of talks, from an expansive opening keynote by Rob Anderson to a great closing keynote by Anthony Joseph. Watching either, or both, of these talks will give you a very good introduction to big data. Between the two we had some specifics: Guy Coates and Simon Metson talking about their experiences of big data in genomics and physics respectively (though the latter also included some experiences of moving big data techniques between different academic disciplines); a view of the role of knowledge engineering and big data in bridging the medical research/healthcare provision divide by Anthony Brookes; a view of the potential role of big data in improving public services by Max Wind-Cowie; and three shorter talks immediately after lunch - Graham Prior talking about big data and curation, Devin Gafney talking about his 140Kit twitter-analytics project (which, coincidentally, is hosted on our infrastructure) and Simon Hodson talking about the JISC's big data activities.

All of the videos and slides from the day are avaialble at the links above. Enjoy!

For my part, there were several take-home messages:

  • Firstly, that we shouldn’t get too hung up on the word ‘big’. Size is clearly one dimension of the big data challenge but of the three words most commonly associated with big data - volume, velocity and variety - it strikes me that volume is the least interesting and I think this was echoed by several of the talks on the day.
  • In particular, it strikes me there is some confusion between ‘big data’ and ‘data that happens to be big’ - again, I think we saw some of this in some of the talks. Whilst the big data label has helped to generate interest in this area, it seems to me that its use of the word 'big' is rather unhelp in this respect. It also strikes me that the JISC community, in particular, has a history of being more interested in curating and managing data than in making use of it, whereas big data is more about the latter than the former.
  • As with most new innovations (though 'evolution' is probably a better word here) there is a temptation to focus on the technology and infrastructure that makes it work, particularly amoungst a relatively technical audience. I am certainly guilty of this. In practice, it is the associated cultural change that is probably more important. Max Wind-Cowie’s talk, in particular, referred to the kinds of cultural inertia that need to be overcome in the public sector, on both the service provider side and the consumer side, before big data can really have an impact in terms of improving public services. Attitudes like, "how can a technology like big data possibly help me build a *closer* and more *personal* relationship with my clients?" or "why should I trust a provider of public services to know this much about me?" seem likely to be widespread. Though we didn't hear about it on the day, my gut feeling is that a similar set of issues would probably apply in education were we, for example, to move towards a situation where we make significant use of big data techniques to tailor learning experiences at an individual level. My only real regret about the event was that I didn't find someone to talk on this theme from an education perspective.
  • Several talks refered to the improvements in 'evidence-based' decision-making that big data can enable. For example, Rob Anderson talked about poor business decisions being based on poor data currently and Anthony Brookes discussed the role of knowledge engineering in improving the ability of those involved in front-line healthcare provision to take advantage of the most recent medical research. As Adam Cooper of CETIS argues in Analytics and Big Data - Reflections from the Teradata Universe Conference 2012, we need to find ways to ask questions that have efficiency or effectiveness implications and we need to look for opportunities to exploit near-real-time data if we are to see benefits in these areas.
  • I have previously raised the issue of possible confusion, especially in the government sector, between 'open data' and 'big data'. There was some discussion of this on the day. Max Wind-Cowie, in particular, argued that 'open data' is a helpful - indeed, a necessary - step in encouraging the public sector to move toward a more transparent use of public data. The focus is currently on the open data agenda but this will encourage an environment in which big data tools and techniques can flourish.
  • Finally, the issue that almost all speakers touched on to some extent was that of the need to grow the pool of people who can undertake data analytics. Whether we choose to refer to such people as data scientists, knowledge engineers or something else there is a need for us to grow the breadth and depth of the skills-base in this area and, clearly, universities have a critical role to play in this.

As I mentioned in my opening to the day, Eduserv's primary interest in Big Data is somewhat mundane (though not unimportant) and lies in the enabling resources that we can bring to the communities we serve (education, government, health and other charities), either in the form of cloud infrastructure on which big data tools can be run or in the form of data centre space within which physical kit dedicated to Big Data processing can be housed. We have plenty of both and plenty of bandwidth to JANET so if you are interested in working with us, please get in touch.

Overall, I found the day enlightening and challenging and I should end with a note of thanks to all our speakers who took the time to come along and share their thoughts and experiences.

[Photo: Eliot Hall, Eduserv]

April 02, 2012

Big data, big deal?

Some of you may have noticed that Eduserv's annual symposium is happening on May 10. Once again, we're at the Royal College of Physicians in London and this year we are looking at big data, appropriate really... since 2012 has been widely touted as being the year of big data.

Here's the blurb for our event:

Data volumes have been growing exponentially for a long while – so what’s new now? Is Big Data [1] just the latest hype from vendors chasing big contracts? Or does it indeed present wholly new challenges and critical new opportunities, and if so what are they?

The 2012 Symposium will investigate Big Data, uncovering what makes it different from what has gone before and considering the strategic issues it brings with it: both how to use it effectively and how to manage it.  It will look at what Big Data will mean across research, learning, and operations in HE, and at its implications in government, health, and the commercial sector, where large-scale data is driving the development of a whole new set of tools and techniques.

Through presentations and debate delegates will develop their understanding of both the likely demands and the potential benefits of data volumes that are growing disruptively fast in their organisation.

[1] Big Data is "data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."  What is big data?  Edd Dumbill, O'Reilly Radar, Jan 2012

As usual, the event is free to attend and will be followed by a drinks reception.

You'll note that we refer to Edd Dumbill's What is big data? article in order to define what we mean by big data and I recommend reading this by way of an introduction for the day. The Wikipedia page for Big data provides a good level of background and some links for further reading. Finally, O'Reilly's follow-up publication, Planning for Big Data - A CIO's Handbook to the Changing Data Landscape is also worth a look (and is free to download as an e-book).

You'll also note that the defining characteristics of big data include not just 'size' (though that is certainly an important dimension) but also 'rate of creation and/or change', and 'structural coherence'. These are typically known as the three Vs - "volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources)". In looking around for speakers, my impression is that there is a strong emphasis on the first of these in people's general understanding about what big data means (which is not surprising given the name) and that in the government sector in particular there is potential confusion between 'big data' and 'open data' and/or 'linked data' which I think it would be helpful to unpick a little - big data might be both 'open' and 'linked' but isn't necessarily so.

So, what do we hope to get out of the day? As usual, it's primarily a 'bringing people up to speed' type of event. The focus will be on our charitable beneficiaries, i.e. organisations working in the area of 'public good' - education, government, health and the charity sector - though I suspect that the audience will be mainly from the first of these. The intention is for people to leave with a better understand of why big data might be important to them and what impact it might have in both strategic and practical terms on the kinds of activities they undertake.

We have a range of speakers, providing perspectives from inside and outside of those sectors, both hands-on and more theoretical - this is one of the things we always try and do at our sympoisia. Our sessions include keynotes by Anthony D. Joseph (Chancellor's Associate Professor in Computer Science at University of California, Berkeley) and Rob Anderson (CTO EMEA, Isilon Storage Division at EMC) as well as talks by Professor Anthony J Brookes (Department of Genetics at the University of Leicester), Dr. Guy Coates (Informatics Systems Group at The Wellcome Trust Sanger Institute) and Max Wind-Cowie (Head of the Progressive Conservatism Project, Demos - author of The Data Dividend).

By the way... we still have a couple of speaking slots available and are particularly interested in getting a couple of short talks from people with practical experience of working with big data, either using Hadoop or something else. If you are interested in speaking for 15 minutes or so (or if you know of someone who might be) please get in touch. Thanks. Another area that I was hoping to find a speaker to talk about, but haven't been able to so far, is someone who is looking at the potential impact of big data on learning analytics, either at the level of a single institution or, more likely, at a national level. Again, if this is something you are aware of, please get in touch. Crowd-sourced speakers FTW! :-)

All in all, I'm confident that this will be an interesting and informative day and a good follow-up to last year's symposium on the cloud - I look forward to seeing you there.



eFoundations is powered by TypePad