May 18, 2012

Big Data - size doesn't matter, it's the way you use it that counts... at least, that's what they tell me!

Here's my brief take on this year's Eduserv Symposium, Big Data, big deal?, which took place in London last Thursday and which was, by all the accounts I've seen and heard, a pretty good event.

The day included a mix of talks, from an expansive opening keynote by Rob Anderson to a great closing keynote by Anthony Joseph. Watching either, or both, of these talks will give you a very good introduction to big data. Between the two we had some specifics: Guy Coates and Simon Metson talking about their experiences of big data in genomics and physics respectively (though the latter also included some experiences of moving big data techniques between different academic disciplines); a view of the role of knowledge engineering and big data in bridging the medical research/healthcare provision divide by Anthony Brookes; a view of the potential role of big data in improving public services by Max Wind-Cowie; and three shorter talks immediately after lunch - Graham Prior talking about big data and curation, Devin Gafney talking about his 140Kit twitter-analytics project (which, coincidentally, is hosted on our infrastructure) and Simon Hodson talking about the JISC's big data activities.

All of the videos and slides from the day are available at the links above. Enjoy!

For my part, there were several take-home messages:

  • Firstly, that we shouldn’t get too hung up on the word ‘big’. Size is clearly one dimension of the big data challenge but of the three words most commonly associated with big data - volume, velocity and variety - it strikes me that volume is the least interesting and I think this was echoed by several of the talks on the day.
  • In particular, it strikes me there is some confusion between ‘big data’ and ‘data that happens to be big’ - again, I think we saw some of this in some of the talks. Whilst the big data label has helped to generate interest in this area, it seems to me that its use of the word 'big' is rather unhelpful in this respect. It also strikes me that the JISC community, in particular, has a history of being more interested in curating and managing data than in making use of it, whereas big data is more about the latter than the former.
  • As with most new innovations (though 'evolution' is probably a better word here) there is a temptation to focus on the technology and infrastructure that makes it work, particularly amongst a relatively technical audience. I am certainly guilty of this. In practice, it is the associated cultural change that is probably more important. Max Wind-Cowie’s talk, in particular, referred to the kinds of cultural inertia that need to be overcome in the public sector, on both the service provider side and the consumer side, before big data can really have an impact in terms of improving public services. Attitudes like, "how can a technology like big data possibly help me build a *closer* and more *personal* relationship with my clients?" or "why should I trust a provider of public services to know this much about me?" seem likely to be widespread. Though we didn't hear about it on the day, my gut feeling is that a similar set of issues would probably apply in education were we, for example, to move towards a situation where we make significant use of big data techniques to tailor learning experiences at an individual level. My only real regret about the event was that I didn't find someone to talk on this theme from an education perspective.
  • Several talks referred to the improvements in 'evidence-based' decision-making that big data can enable. For example, Rob Anderson talked about how poor business decisions are currently being based on poor data, and Anthony Brookes discussed the role of knowledge engineering in improving the ability of those involved in front-line healthcare provision to take advantage of the most recent medical research. As Adam Cooper of CETIS argues in Analytics and Big Data - Reflections from the Teradata Universe Conference 2012, we need to find ways to ask questions that have efficiency or effectiveness implications and we need to look for opportunities to exploit near-real-time data if we are to see benefits in these areas.
  • I have previously raised the issue of possible confusion, especially in the government sector, between 'open data' and 'big data'. There was some discussion of this on the day. Max Wind-Cowie, in particular, argued that 'open data' is a helpful - indeed, a necessary - step in encouraging the public sector to move toward a more transparent use of public data. The focus is currently on the open data agenda but this will encourage an environment in which big data tools and techniques can flourish.
  • Finally, the issue that almost all speakers touched on to some extent was that of the need to grow the pool of people who can undertake data analytics. Whether we choose to refer to such people as data scientists, knowledge engineers or something else there is a need for us to grow the breadth and depth of the skills-base in this area and, clearly, universities have a critical role to play in this.

As I mentioned in my opening to the day, Eduserv's primary interest in Big Data is somewhat mundane (though not unimportant) and lies in the enabling resources that we can bring to the communities we serve (education, government, health and other charities), either in the form of cloud infrastructure on which big data tools can be run or in the form of data centre space within which physical kit dedicated to Big Data processing can be housed. We have plenty of both and plenty of bandwidth to JANET so if you are interested in working with us, please get in touch.

Overall, I found the day enlightening and challenging and I should end with a note of thanks to all our speakers who took the time to come along and share their thoughts and experiences.

[Photo: Eliot Hall, Eduserv]

April 02, 2012

Big data, big deal?

Some of you may have noticed that Eduserv's annual symposium is happening on May 10. Once again, we're at the Royal College of Physicians in London and this year we are looking at big data, appropriate really... since 2012 has been widely touted as being the year of big data.

Here's the blurb for our event:

Data volumes have been growing exponentially for a long while – so what’s new now? Is Big Data [1] just the latest hype from vendors chasing big contracts? Or does it indeed present wholly new challenges and critical new opportunities, and if so what are they?

The 2012 Symposium will investigate Big Data, uncovering what makes it different from what has gone before and considering the strategic issues it brings with it: both how to use it effectively and how to manage it.  It will look at what Big Data will mean across research, learning, and operations in HE, and at its implications in government, health, and the commercial sector, where large-scale data is driving the development of a whole new set of tools and techniques.

Through presentations and debate delegates will develop their understanding of both the likely demands and the potential benefits of data volumes that are growing disruptively fast in their organisation.

[1] Big Data is "data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."  What is big data?  Edd Dumbill, O'Reilly Radar, Jan 2012

As usual, the event is free to attend and will be followed by a drinks reception.

You'll note that we refer to Edd Dumbill's What is big data? article in order to define what we mean by big data and I recommend reading this by way of an introduction for the day. The Wikipedia page for Big data provides a good level of background and some links for further reading. Finally, O'Reilly's follow-up publication, Planning for Big Data - A CIO's Handbook to the Changing Data Landscape is also worth a look (and is free to download as an e-book).

You'll also note that the defining characteristics of big data include not just 'size' (though that is certainly an important dimension) but also 'rate of creation and/or change', and 'structural coherence'. These are typically known as the three Vs - "volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources)". In looking around for speakers, my impression is that there is a strong emphasis on the first of these in people's general understanding about what big data means (which is not surprising given the name) and that in the government sector in particular there is potential confusion between 'big data' and 'open data' and/or 'linked data' which I think it would be helpful to unpick a little - big data might be both 'open' and 'linked' but isn't necessarily so.

So, what do we hope to get out of the day? As usual, it's primarily a 'bringing people up to speed' type of event. The focus will be on our charitable beneficiaries, i.e. organisations working in the area of 'public good' - education, government, health and the charity sector - though I suspect that the audience will be mainly from the first of these. The intention is for people to leave with a better understanding of why big data might be important to them and what impact it might have in both strategic and practical terms on the kinds of activities they undertake.

We have a range of speakers, providing perspectives from inside and outside of those sectors, both hands-on and more theoretical - this is one of the things we always try and do at our symposia. Our sessions include keynotes by Anthony D. Joseph (Chancellor's Associate Professor in Computer Science at University of California, Berkeley) and Rob Anderson (CTO EMEA, Isilon Storage Division at EMC) as well as talks by Professor Anthony J Brookes (Department of Genetics at the University of Leicester), Dr. Guy Coates (Informatics Systems Group at The Wellcome Trust Sanger Institute) and Max Wind-Cowie (Head of the Progressive Conservatism Project, Demos - author of The Data Dividend).

By the way... we still have a couple of speaking slots available and are particularly interested in getting a couple of short talks from people with practical experience of working with big data, either using Hadoop or something else. If you are interested in speaking for 15 minutes or so (or if you know of someone who might be) please get in touch. Thanks. Another area for which I was hoping to find a speaker, but haven't been able to so far, is the potential impact of big data on learning analytics, either at the level of a single institution or, more likely, at a national level. Again, if you are aware of anyone working on this, please get in touch. Crowd-sourced speakers FTW! :-)

All in all, I'm confident that this will be an interesting and informative day and a good follow-up to last year's symposium on the cloud - I look forward to seeing you there.

April 08, 2011

Scholarly communication, open access and disruption

I attended part of UKSG earlier this week, listening to three great presentations in the New approaches to research session on Monday afternoon (by Philip Bourne, Cameron Neylon and Bill Russell) and presenting first thing Tuesday morning in the Rethinking 'content' session.

(A problem with my hearing meant that I was very deaf for most of the time, making conversation in the noisy environment rather tiring, so I decided to leave the conference early Tuesday afternoon. Unfortunately, that meant that I didn't get much of an opportunity to network with people. If I missed you, sorry. Looking at the Twitter stream, it also meant that I missed what appear to have been some great presentations on the final day. Shame.)

Anyway, for what it's worth, my slides are below. I was speaking on the theme of 'open, social and linked', something that I've done before, so for regular readers of this blog there probably won't be too much in the way of news.

With respect to the discussion of 'social' and its impact on scholarly communication, there is room for some confusion because 'social' is often taken to mean, "how does one use social media like Facebook, Twitter, etc. to support scholarly communication?". Whilst I accept that as a perfectly sensible question, it isn't quite what I meant in this talk. What I meant was that we need to better understand the drivers for social activity around research and research artefacts, which probably needs breaking down into the various activities that make up the scholarly research workflow/cycle, in order that we can build tools that properly support that social activity. That is something that I don't think we have yet got right, particularly in our provision of repositories. Indeed, as I argued in the talk, our institutional repository architecture is more or less in complete opposition to the social drivers at play in the research space. Anyway... you've heard all this from me before.

Cameron Neylon's talk was probably the best of the ones that I saw and I hope my talk picked up on some of the themes that he was developing. I'm not sure if Cameron's UKSG slides are available yet but there's a very similar set, The gatekeeper is dead, long live the gatekeeper, presented at the STM Innovation Seminar last December. Despite the number of slides, these are very quick to read thru, and very understandable, even in the absence of any audio. On that basis, I won't re-cap them here. Slides 112 onwards give a nice summary: "we are the gatekeepers... enable, don't block... build platforms, not destinations... sell services, not content... don't think about filtering or control... enable discovery". These are strong messages for both the publishing community and libraries. All in all, his points about 'discovery deficit' rather than 'filter failure' felt very compelling to me.

On the final day there were talks about open access and changing subscription models, particularly from 'reader pays' to 'author pays', based partly on the recently released study commissioned by the Research Information Network (RIN), JISC, Research Libraries UK (RLUK), the Publishing Research Consortium (PRC) and the Wellcome Trust, Heading for the open road: costs and benefits of transitions in scholarly communications. We know that the web is disruptive to both publishers and libraries but it seemed to me (from afar) that the discussions at UKSG missed the fact that the web is potentially also disruptive to the process of scholarly communication itself. If all we do is talk about shifting the payment models within the confines of the current peer-review process we are missing a trick (at least potentially).

What strikes me as odd, thinking back to that original hand-drawn diagram of the web done by Tim Berners-Lee, is that, while the web has disrupted almost every aspect of our lives to some extent, it has done relatively little to disrupt scholarly communication except in an 'at the margins' kind of way. Why is that the case? My contention is that there is such a significant academic inertia to overcome, coupled with a relatively small and closed 'market', that the momentum of change hasn't yet grown sufficiently - but it will. The web was invented as a scholarly device, yet it has, in many ways, resulted in less transformation there than in most other fields. Strange?

Addendum: slides for Philip Bourne's talk are now available on Slideshare.

December 10, 2010

Cloud storage - costing and pricing

I've been doing some cloud-related (cloudy?) thinking as part of my work on the FleSSR project over the last couple of days, ultimately with the aim of delivering a piece on business models for cloud services (one of the project deliverables) but initially just looking at the costs of storage in the cloud (Amazon, Dropbox and Rackspace) and the costs of building cloud storage in-house.

The result is a couple of posts on the FleSSR project blog and a Google spreadsheet. Please have a read. I'm keen to get feedback!

So, what can we conclude? Looking at the cost per TB per year, the Dropbox and Rackspace prices are pretty much flat (i.e. the same irrespective of how much data is being stored) at around £1530/TB/year and £1220/TB/year respectively (though, as noted above, the Dropbox prices are only applicable for 50GB and 100GB). Amazon's pricing is cheaper, particularly so for large amounts of data (above about 100TB the price starts dipping below £1000/TB/year), but never reaches the kind of baseline figures I've seen others quote for Amazon storage alone (i.e. without network costs) of around £450/TB/year. (My lowest estimate is around £510/TB/year for 500PB of data but, as mentioned above, this estimate is probably unrealistic for other reasons.)
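Why does the effective Amazon price per TB fall with volume while the flat-rate services don't? A rough sketch, assuming (as with Amazon S3's published price list) that storage is charged in volume tiers, with each tier's rate applying only to the data that falls within that tier. The tier boundaries and prices below are hypothetical placeholders, not Amazon's actual prices, and network transfer and request charges are ignored.

```python
# Hypothetical tiered storage prices: (tier size in GB, GBP per GB per month).
# These are illustrative placeholders, NOT real Amazon S3 prices.
HYPOTHETICAL_TIERS = [
    (50_000, 0.09),        # first 50 TB
    (450_000, 0.08),       # next 450 TB
    (float("inf"), 0.06),  # everything above 500 TB
]

GB_PER_TB = 1000  # decimal TB, as commonly used in storage pricing


def monthly_cost(gb: float, tiers=HYPOTHETICAL_TIERS) -> float:
    """Cost of storing `gb` GB for one month, filling each tier in turn."""
    remaining = gb
    total = 0.0
    for tier_size, price in tiers:
        in_tier = min(remaining, tier_size)
        total += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return total


def cost_per_tb_per_year(tb: float, tiers=HYPOTHETICAL_TIERS) -> float:
    """Effective (blended) price per TB per year for a given volume."""
    return monthly_cost(tb * GB_PER_TB, tiers) * 12 / tb


def flat_cost_per_tb_per_year(price_per_gb_month: float) -> float:
    """A flat-rate service costs the same per TB whatever the volume."""
    return price_per_gb_month * GB_PER_TB * 12
```

With these made-up tiers, 10TB works out at £1080/TB/year and 1000TB at about £846/TB/year. The point is the shape of the curve (the blended rate only ever drifts down towards the cheapest tier's rate) rather than the numbers themselves.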

Superficially, these prices seem quite high - they are certainly higher than I was expecting. What is interesting is whether they can be matched or beaten by academic providers (such as Eduserv) and/or in-house institutional provision, and if so by how much?

In the second post I try to identify a 'shopping list' of things that would need to be paid for if one were to build a cloud storage infrastructure oneself, partly as a simple reminder that setting up this kind of service isn't just about buying some kit - there are all sorts of costs that need to be met (some up-front and some on an ongoing basis):

  • Disks
  • Network infrastructure (switching, etc.)
  • Router/firewall
  • Physical space costs
  • Energy
  • Operator cover
  • Development effort
  • Project/service management
  • Procurement/financial effort

I don't go as far as identifying specific costs (in terms of amounts of money) because doing so is subject to all kinds of variables. However, the list itself is intended to help think about costs when considering things like whether to outsource to the cloud or not. I'm hoping that this will prove useful to people but if you think I've got things majorly (or even a little bit) wrong, please shout.
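One way to use the list is as the skeleton of a simple cost model that produces a £/TB/year figure comparable with the cloud prices quoted above. The sketch below is only that: which headings count as up-front (amortised over the kit's lifetime) and which as ongoing is my assumption, and every number you feed in is yours to supply.

```python
from dataclasses import dataclass


@dataclass
class InHouseStorageCosts:
    """Headings from the shopping list; all values are hypothetical GBP inputs."""
    # Up-front costs (assumed one-off), amortised over the kit's lifetime
    disks: float
    network_infrastructure: float
    router_firewall: float
    development_effort: float
    procurement_effort: float
    # Ongoing costs (assumed annual)
    physical_space_per_year: float
    energy_per_year: float
    operator_cover_per_year: float
    service_management_per_year: float

    def annual_cost(self, lifetime_years: float) -> float:
        """Amortised up-front spend plus one year of running costs."""
        up_front = (self.disks + self.network_infrastructure + self.router_firewall
                    + self.development_effort + self.procurement_effort)
        ongoing = (self.physical_space_per_year + self.energy_per_year
                   + self.operator_cover_per_year + self.service_management_per_year)
        return up_front / lifetime_years + ongoing

    def cost_per_tb_per_year(self, usable_tb: float, lifetime_years: float) -> float:
        """Divide by usable (not raw) capacity, since redundancy eats disk space."""
        return self.annual_cost(lifetime_years) / usable_tb
```

For example, with placeholder figures of £70,000 up-front spread over 5 years and £20,000 a year of running costs, 100TB of usable space comes out at £340/TB/year. Using usable rather than raw capacity matters, because RAID and replication overheads can easily double the disk you need to buy.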

November 02, 2010

FleSSR public cloud infrastructure update

I wrote a brief update for the FleSSR project blog yesterday, covering some work we did last week at our (relatively new) Swindon Data Centre to build the initial infrastructure for the project's public cloud. I won't repeat any of that here but would just like to note that the FAS 3140 SAN (Storage Area Network) cluster that NetApp is loaning us via Q Associates for the duration of the project, of which we'll use about 10 Tbytes for FleSSR, will be up and running over the next couple of days, meaning that this infrastructure will be substantial enough for some real testing.

As an aside, when Eduserv's new Swindon Data Centre originally opened all staff were encouraged to go over from Bath to have a look round. I didn't bother because "what's the point of looking round a shed?" - it wasn't one of my more popular in-house comments :-)

As it happens, I was quite wrong... the Data Centre is actually quite impressive, not just because of the available space (which is much bigger than I was expecting) but also the quality of the one 'vault' that has been built so far and the associated infrastructure. It looks (to my eyes) like a great resource... now we've just got to get it used by our primary communities - education, government and health. I'm hopeful that FleSSR represents a small step towards what will eventually become a well-valued community resource.

October 13, 2010

What current trends tell us about the future of federated access management in education

As mentioned previously, I spoke at the FAM10 conference in Cardiff last week, standing in for another speaker who couldn't make it and using material crowdsourced from my previous post, Key trends in education - a crowdsource request, to inform some of what I was talking about. The slides and video from my talk follow:

As it turns out, describing the key trends is much easier than thinking about their impact on federated access management - I suppose I should have spotted this in advance - so the tail end of the talk gets rather weak and wishy-washy. And you may disagree with my interpretation of the key trends anyway. But in case it is useful, here's a summary of what I talked about. Thanks to those of you who contributed comments on my previous post.

By way of preface, it seems to me that the core working assumptions of the UK Federation have been with us for a long time - like, at least 10 years or so - essentially going back to the days of the centrally-funded Athens service. Yet over those 10 years the Internet has changed in almost every respect. Ignoring the question of whether those working assumptions still make sense today, I think it certainly makes sense to ask ourselves about what is coming down the line and whether our assumptions are likely to still make sense over the next 5 years or so. Furthermore, I would argue that federated access management as we see it today in education, i.e. as manifested thru our use of SAML, shows a rather uncomfortable fit with the wider (social) web that we see growing up around us.

And so... to the trends...

The most obvious trend is the current financial climate, which won't be with us for ever of course, but which is likely to cause various changes while it lasts and where the consequences of those changes, university funding for example, may well be with us much longer than the current crisis. In terms of access management, one impact of the current belt-tightening is that making a proper 'business case' for various kinds of activities, both within institutions and nationally, will likely become much more important. In my talk, I noted that submissions to the UCISA Award for Excellence (which we sponsor) often carry no information about staff costs, despite an explicit request in the instructions to entrants to indicate both costs and benefits. My point is not that institutions are necessarily making the wrong decisions currently but that the basis for those decisions, in terms of cost/benefit analysis, will probably have to become somewhat more rigorous than has been the case to date. Ditto for the provision of national solutions like the UK Federation.

More generally, one might argue that growing financial pressure will encourage HE institutions into behaving more and more like 'enterprises'. My personal view is that this will be pretty strongly resisted, by academics at least, but it may have some impact on how institutions think about themselves.

Secondly, there is the related trend towards outsourcing and shared services, with the outsourcing of email and other apps to Google being the most obvious example. Currently that is happening most commonly with student email but I see no reason why it won't spread to staff email as well in due course. At the point that an institution has outsourced all its email to Google, can one assume that it has also outsourced at least part of its 'identity' infrastructure as well? So, for example, at the moment we typically see SAML call-backs being used to integrate Google mail back into institutional 'identity' and 'access management' systems (you sign into Google using your institutional account) but one could imagine this flipping around such that access to internal systems is controlled via Google - a 'log in with Google' button on the VLE for example. Eric Sachs, of Google, has recently written about OpenID in the Enterprise SaaS market, endorsing this view of Google as an outsourced identity provider.

Thirdly, there is the whole issue of student expectations. I didn't want to talk to this in detail but it seems obvious that an increasingly 'open' mashed and mashable experience is now the norm for all of us - and that will apply as much to the educational content we use and make available as it does to everything else. Further, the mashable experience is at least as much about being able to carry our identities relatively seamlessly across services as it is about the content. Again, it seems unclear to me that SAML fits well into this kind of world.

There are two other areas where our expectations and reality show something of a mis-match. Firstly, our tightly controlled, somewhat rigid approach to access management and security is at odds with the rather fuzzy (or at least fuzzily interpreted) licences negotiated by Eduserv and JISC Collections for the external content to which we have access. And secondly, our over-arching sense of the need for user privacy (the need to prevent publishers from cross-referencing accesses to different resources by the same user, for example) is holding back the development of personalised services and runs somewhat counter to the kinds of things we see happening in mainstream services.
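The privacy point rests on a simple idea: give each publisher a different, opaque identifier for the same user, which is roughly what targeted identifiers such as eduPersonTargetedID do in SAML federations. A minimal sketch of that idea follows. It uses HMAC-SHA256 as a stand-in and is not the exact algorithm any particular identity provider product uses; the secret, user ID and entity IDs are all hypothetical.

```python
import base64
import hashlib
import hmac

# Secret known only to the institution's identity provider (hypothetical value).
IDP_SECRET = b"replace-with-a-long-random-secret"


def pairwise_id(user_id: str, sp_entity_id: str, secret: bytes = IDP_SECRET) -> str:
    """Derive a stable, opaque identifier for one user at one service provider.

    The same user always gets the same value at a given provider, which is
    enough for that provider to personalise its service, but different
    providers see unrelated values, so they cannot cross-reference accesses
    by comparing identifiers. (Assumes "!" does not occur in user IDs.)
    """
    message = f"{sp_entity_id}!{user_id}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    # URL-safe base64 without padding: 32 bytes -> 43 characters
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The tension described above is visible here: personalisation works within a single publisher, but anything that needs a consistent view of the user across publishers is deliberately ruled out.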

Fourthly, there's the whole growth of mobile - the use of smart-phones, mobile handsets, iPhones, iPads and the rest of it - and the extent to which our access management infrastructure works (or not) in that kind of 'app'-based environment.

Then there is the 'open' agenda, which carries various aspects to it - open source, open access, open science, and open educational resources. It seems to me that the open access movement cuts right to the heart of the primary use-case for federated access management, i.e. controlling access to published scholarly literature. But, less directly, the open science movement, in part, pushes researchers towards the use of more open 'social' web services for their scholarly communication where SAML is not typically the primary mechanism used to control access.

Similarly, the emerging personal learning environment (PLE) meme (a favourite of educational conferences currently), where lecturers and students work around their institutional VLE by choosing to use a mix of external social web services (Flickr, Blogger, Twitter, etc.), again encourages the use of external services that are not impacted by our choices around the identity and access management infrastructure and over which we have little or no control. I was somewhat sceptical about the reality of the PLE idea until recently, when my son started at the City of Bath College: his letter of introduction suggested that he create a Google Docs account so that he could do his work there and submit it using email or Facebook. I doubt this is college policy but it was a genuine example of the PLE in practice, so perhaps my scepticism is misplaced.

We also have the changing nature of the relationship between students and institutions - an increasingly mobile and transitory student body, growing disaggregation between the delivery of learning and accreditation, a push towards overseas students (largely for financial reasons), and increasing collaboration between institutions (both for teaching and research) - all of which have an impact on how students see their relationship with the institution (or institutions) with which they have to deal. Will the notion of a mandated 3 or 4 year institutional email account still make sense for all (or even most) students in 5 or 10 years' time?

In a similar way, there's the changing customer base for publishers of academic content to deal with. At the Eduserv Symposium last year, for example, David Smith of CABI described how they now find that having exposed much of their content for discovery via Google they have to deal with accesses from individuals who are not affiliated with any institution but who are willing to pay for access to specific papers. Their access management infrastructure has to cope with a growing range of access methods that sit outside the 'educational' space. What impact does this have on their incentives for conforming to education-only norms?

And finally there's the issue of usability, and particularly the 'where are you from' discovery problem. Our traditional approach to this kind of problem is to build a portal and try and control how the user gets to stuff, such that we can generate 'special' URLs that get them to their chosen content in such a way that they can be directed back to us seamlessly in order to log in. I hate portals, at least insofar as they have become an architectural solution, so the less said the better. As I said in my talk, WAYFless URLs are an abomination in architectural terms, saved only by the fact that they currently work. In my presentation I played up the alternative usability work that the Kantara ULX group have been doing in this area, which seems to me significantly better than what has gone before. But I learned at the conference that Shibboleth and the UK WAYF service have both also been doing work in this area, so that is good. My worry, though, is that this will remain an unsolvable problem given the architecture we are presented with. (I hope I'm wrong but that is my worry.) As a counterpoint, in the more... err... mainstream world we are seeing a move towards what I call the 'First Bus' solution (so named because in many UK cities you only see buses run by the First Group, despite the fact that bus companies are supposed to operate in a free market), where you only see buttons to log in using Google, Facebook and one or two others.

I'm not suggesting that this is the right solution - just noting that it is one strategy for dealing with an otherwise difficult usability problem.

Note that we are also seeing some consolidation around technology - notably OpenID and OAuth - though often in ways that hide it from public view (e.g. hidden behind a 'login with Google' or 'login with Facebook' button).
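To make the WAYFless URLs discussed above a little more concrete, here is a minimal sketch. It assumes a publisher running the Shibboleth SP software, whose Login handler accepts the user's home identity provider as an `entityID` parameter and the post-login destination as `target`; other products (OpenAthens included) use their own formats, and all the host names are hypothetical. It also shows why such links can be seen as architecturally awkward: each one bakes knowledge of both the publisher's login machinery and the user's institution into a URL that some portal has to generate and maintain.

```python
from urllib.parse import urlencode


def wayfless_url(sp_login_handler: str, idp_entity_id: str, target: str) -> str:
    """Build a WAYFless URL for a Shibboleth-style service provider.

    Because the IdP's entityID is supplied up front, the SP can skip the
    'where are you from?' step, send the user straight to their home
    institution's login page, and return them to `target` afterwards.
    """
    query = urlencode({"entityID": idp_entity_id, "target": target})
    return f"{sp_login_handler}?{query}"
```

Calling `wayfless_url("https://sp.example.org/Shibboleth.sso/Login", "https://idp.example.ac.uk/idp/shibboleth", "https://sp.example.org/journal/article/123")` yields a single link that only works for members of that one (hypothetical) institution, which is exactly the coupling the 'First Bus' buttons avoid by offering a short, fixed list of providers instead.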

Which essentially brings me to my concluding screen - you know, the one where I talk about all the implications of the trends above - which is where I have less to say than I should! Here's the text more-or-less copy-and-pasted from my final slide:

  • ‘education’ is a relatively small fish in a big pond (and therefore can't expect to drive the agenda)
  • mainstream approaches will win (in the end) - ignoring the difficult question of defining what is mainstream
  • for the Eduserv OpenAthens product, Google is as big a threat as Shibboleth (and the same is true for Shibboleth)
  • the current financial climate will have an effect somewhere
  • HE institutions are probably becoming more enterprise-like but they are still not totally like commercial organisations and they tend to occupy an uncomfortable space between the ‘enterprise’ and the ‘social web’ driven by different business needs (cf. the finance system vs PLEs and open science)
  • the relationships between students (and staff) and institutions are changing

In his opening talk at FAM10 the day before, David Harrison had urged the audience to become leaders in the area of federated access management. In a sense I want the same. But I also want us, as a community, to become followers - to accept that things happen outside our control and to stop fighting against them the whole time.

Unfortunately, that's a harder rallying call to make!

Your comments on any/all of the above are very much welcomed.

September 17, 2010

Key trends in education? - a crowdsource request

I've been asked to give a talk at FAM10 (an event "to discuss federated identity and access management within the UK") replacing someone who has had to drop out, hence the rather late notice. I therefore wasn't first choice, nor would I expect to be, but having been asked I feel reluctant to simply say no and my previous posts here tend to indicate that I do have views on the subject of federated access management, particularly as it is being implemented in the UK. On the down side, there's a strong possibility that what I have to say will ruffle feathers with some combination of people in my own company (Eduserv), people at the JISC and people in the audience (probably all of them) so I need to be a bit careful. Still, that's never stopped me before :-)

I can't really talk about the technology - at least, not at a level that would be interesting for what is likely to be a highly technical FAM10 crowd. What I want to try and do instead is to take a look at current and emerging trends (technical, political and social), both in education in the UK and more broadly, and try to think about what those trends tell us about the future for federated access management.

To that end, I need your help!

Clearly, I have my own views on what the important trends might be but I don't work in academia and therefore I'm not confident that my views are sufficiently based in reality. I'd therefore like to try and crowdsource some suggestions for what you (I'm speaking primarily to people who work inside the education sector here - though I'm happy to hear from others as well) think are the key trends. I'm interested in both teaching/learning and research/scholarly communication and trends can be as broad or as narrow, as technological or as non-technological, as specific to education or as general as you like.

To keep things focused, how about I ask people to list their top 5 trends (though fewer is fine if you are struggling). I probably need more than one-word answers (sorry) so, for example, rather than just saying 'mobile', 'student expectations', 'open data' or 'funding pressure', indicate what you think those things might mean for education (particularly higher education) in the UK. I'd love to hear from people outside the UK as well as those who work here. Don't worry about the impact on 'access management' - that's my job... just think about what you think the current trends affecting higher and further education are.

Any takers? Email me at [email protected] or comment below.

And finally... to anyone who just thinks that I'm asking them to do my work for me - well, yes, I am :-) On the plus side, I'll collate the answers (in some form) into the resulting presentation (on Slideshare) so you will get something back.


July 08, 2010

Going LOCAH: a Linked Data project for JISC

Recently I worked with Adrian Stevenson of UKOLN and Jane Stevenson and Joy Palmer of MIMAS, University of Manchester on a bid for a project under the JISC 02/10 call, Deposit of research outputs and Exposing digital content for education and research, and I'm very pleased to be able to say that the proposal has been accepted and the project has been funded.

The project is called "Linked Open Copac Archives Hub" (LOCAH). It aims to address the "expose" section of the call, and focuses on making data from the Copac and Archives Hub services hosted by MIMAS (i.e. library catalogue data and data from archival finding aids) available in the form of Linked Data; developing some prototype applications illustrating the use of that data; and analysing some of the issues arising from that work. The main partners in the work are UKOLN and MIMAS, with contributions from Eduserv, OCLC and Talis. The Eduserv contribution will take the form of some input from me, probably mostly in the area of working with Jane on modelling some of the archival finding aid data, currently held in the form of EAD-encoded XML documents, so that it can be represented in RDF - though I imagine I'll be sticking my oar in on various other aspects along the way.
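To give a flavour of what "representing EAD in RDF" involves at its very simplest, here is a minimal sketch, assuming a namespace-less EAD fragment and a hypothetical base URI (`http://data.example.org/archive/`). Mapping `unittitle` and `unitdate` to Dublin Core terms is my own illustrative assumption, not the model the project will actually arrive at; the real modelling questions (levels of description, agents, places) are exactly where the work lies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-less EAD fragment (real EAD files are usually namespaced)
EAD_SAMPLE = """<ead>
  <eadheader><eadid>gb133-abc</eadid></eadheader>
  <archdesc level="fonds">
    <did>
      <unittitle>Papers of an example family</unittitle>
      <unitdate>1850-1900</unitdate>
    </did>
  </archdesc>
</ead>"""

BASE = "http://data.example.org/archive/"  # hypothetical URI space for finding aids
DCTERMS = "http://purl.org/dc/terms/"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ead_to_ntriples(xml_text):
    """Return N-Triples lines describing the top-level unit of description."""
    root = ET.fromstring(xml_text)
    eadid = root.findtext("eadheader/eadid").strip()
    subject = f"<{BASE}{eadid}>"  # coin an http URI from the EAD identifier
    did = root.find("archdesc/did")
    triples = []
    title = did.findtext("unittitle")
    if title:
        triples.append(f"{subject} <{DCTERMS}title> {literal(title.strip())} .")
    date = did.findtext("unitdate")
    if date:
        triples.append(f"{subject} <{DCTERMS}date> {literal(date.strip())} .")
    return triples


if __name__ == "__main__":
    for line in ead_to_ntriples(EAD_SAMPLE):
        print(line)
```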

UKOLN is managing the project and hosting a project weblog. I'm not sure at the moment how I'll divide up thoughts between here and there; I'll probably end up with a bit of duplication along the way.

February 11, 2010

Repositories and the Cloud - tell us your views

There's now a little over a week to go until the Repositories and the Cloud event (jointly organised by Eduserv and the JISC) takes place in London. The event is sold out (sorry to those of you that haven't got a place) and we have a full morning of presentations from DuraSpace, Microsoft and EPrints and an afternoon of practical experience (Terry Harmer of the Belfast eScience Centre) and parallel discussion groups looking at both policy and technical issues.

To those of you that are coming, please remember that the afternoon sessions are for discussion.  We want you to get involved, to share your thoughts and to challenge the views of other people at the event (in the nicest way possible of course).  We'd love to know what you think about repositories and the cloud (note that, by that phrase, I mean the use of utility cloud providers as back-end storage for repository-like services).  Please share your thoughts below, or blog them using the event tag - 'repcloud' - or just bring them with you on the day!
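To make that meaning of "repositories and the cloud" a bit more concrete, here is a minimal sketch of the architectural split, assuming a repository-like service that keeps its own records but hands the bytes to a pluggable storage back-end. The `BlobStore`, `InMemoryStore` and `Repository` names are hypothetical; the in-memory store merely stands in for a utility cloud provider, and no real cloud API is called.

```python
import hashlib
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Back-end storage interface; a utility cloud provider would be one implementation."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(BlobStore):
    """Hypothetical stand-in for cloud storage, used here only for illustration."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class Repository:
    """Repository-like service: keeps metadata locally, delegates content to the store."""

    def __init__(self, store: BlobStore):
        self.store = store
        self.records = {}

    def deposit(self, title: str, content: bytes) -> str:
        # Use a content hash as the storage key so identical deposits share one blob
        key = hashlib.sha256(content).hexdigest()
        self.store.put(key, content)
        self.records[key] = {"title": title, "size": len(content)}
        return key

    def retrieve(self, key: str) -> bytes:
        return self.store.get(key)
```

The point of the split is that the policy questions (where the bytes live, under whose jurisdiction) attach to the `BlobStore` implementation, while the repository's own functions stay the same.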

I will share my thoughts separately here next week but let me be honest... I don't actually know what I think about the relationship between repositories and the cloud.  I'm coming to the meeting with an open mind.  As a community, we now have some experience of the policy and technical issues in our use of the cloud for things like undergraduate email but I expect the policy issues and technical requirements around repositories to be significantly different.  On that basis, I am really looking forward to the meeting.

The chairs of the two afternoon sessions, Paul Miller (paul.miller (at) ...), who is leading the policy session, and Brad McLean (bmclean (at) ...), who is leading the technical session, would also like to hear your views on what you hope their sessions will cover. If you have ideas please get in touch, either thru the comments form below, via Twitter (using the '#repcloud' hashtag) or by emailing them directly.


February 09, 2010

Virtual World Watch survey call for information

John Kirriemuir has issued a request for updated information for his eighth Virtual World Watch "snapshot" survey of the use of virtual worlds in UK Higher and Further Education.

Previous survey reports can be found on the VWW site.

For further information about the sort of information John is after, see his post. He would like responses by the end of February 2010.

Our period of funding for this work is approaching its end, so this will be the last survey funded under the Eduserv Research Programme. John is planning to continue some Virtual World Watch activity, at least through 2010, as he indicates in this presentation which he gave to the recent "Where next for Virtual Worlds?" (wn4vw) meeting in London:

The slides from the other presentations from the wn4vw meeting (including a video of the opening presentation by Ralph Schroeder) are also available here, and you can find an archive of tagged Twitter posts from the day here.

I enjoyed the meeting (even if I'm not sure we really arrived at many concrete answers to the question of "where next?"), but it also felt quite sad. It marked the end of the projects Eduserv funded in 2007 on the use of virtual worlds in education. That grants call was the first one I was involved with after joining Eduserv in 2006, and although it was an area that was completely new to me, the response we got, both in terms of the number of proposals and their quality, seemed very exciting. And I still look back on the 2007 Symposium as one of the most successful (if rather nerve-wracking at the time!) events I've been involved in. As things worked out, I wasn't able to follow the progress of the projects as closely as I'd have liked, but the recent meeting reminded me again of the strong sense of community that seems to have built up amongst researchers, learning technologists and educators working in this area, which seems to have outlived particular projects and programmes. Of course we only funded a handful of projects, and other funding agencies helped develop that community too (I'm thinking particularly of JISC with its Open Habitat project, and the EU MUVEnation project), but it's something I'm pleased we were able to contribute to in a small way.

December 21, 2009

Scanning horizons for the Semantic Web in higher education

The week before last I attended a couple of meetings looking at different aspects of the use of Semantic Web technologies in the education sector.

On the Wednesday, I was invited to a workshop of the JISC-funded ResearchRevealed project at ILRT in Bristol. From the project weblog:

ResearchRevealed [...] has the core aim of demonstrating a fine-grained, access controlled, view layer application for research, built over a content integration repository layer. This will be tested at the University of Bristol and we aim to disseminate open source software and findings of generic applicability to other institutions.

ResearchRevealed will enhance ways in which a range of user stakeholder groups can gain up-to-date, accurate integrated views of research information and thus use existing institutional, UK and potentially global research information to better effect.

I'm not formally part of the project, but Nikki Rogers of ILRT mentioned it to me at the recent VoCamp Bristol meeting, and I expressed a general interest in what they were doing; they were also looking for some concrete input on the use of Dublin Core vocabularies in some of their candidate approaches.

This was the third in a series of small workshops, attended by representatives of the project from Bristol, Oxford and Southampton, and the aim was to make progress on defining a "core Research ontology". The morning session circled mainly around usage scenarios (support for the REF (and other "impact" assessment exercises), building and sustaining cross-institutional collaboration etc), and the (somewhat blurred) boundaries between cross-institutional requirements and institution-specific ones; what data might be aggregated, what might be best "linked to"; and the costs/benefits of rich query interfaces (e.g. SPARQL endpoints) v simpler literal- or URI-based lookups. In the afternoon, Nick Gibbins from the University of Southampton walked through a draft mapping of the CERIF standard to RDF developed by the dotAC project. This focused attention somewhat and led to some - to me - interesting technical discussions about variant ways of expressing information with differing degrees of precision/flexibility. I had to leave before the end of the meeting, but I hope to be able to continue to follow the project's progress, and contribute where I can.
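As a rough illustration of the "rich query interface vs simple lookup" trade-off discussed in the morning, here is a minimal sketch over a tiny in-memory set of triples. The URIs and data are hypothetical and this is not the project's design: `lookup` returns everything said about one URI (cheap to provide), while `query` evaluates a conjunction of triple patterns, roughly what a SPARQL basic graph pattern does (a real SPARQL endpoint offers far more, such as OPTIONAL and FILTER).

```python
EX = "http://example.org/"  # hypothetical namespace

TRIPLES = {
    (EX + "person/1", EX + "memberOf", EX + "org/bristol"),
    (EX + "person/1", EX + "name", "A. Researcher"),
    (EX + "person/2", EX + "memberOf", EX + "org/oxford"),
    (EX + "person/2", EX + "name", "B. Researcher"),
    (EX + "project/x", EX + "hasMember", EX + "person/1"),
    (EX + "project/x", EX + "hasMember", EX + "person/2"),
}


def lookup(uri):
    """Simple URI-based lookup: all (predicate, object) pairs for one subject."""
    return {(p, o) for (s, p, o) in TRIPLES if s == uri}


def is_var(term):
    return term.startswith("?")


def query(patterns):
    """Join a list of (s, p, o) patterns; terms starting with '?' are variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in TRIPLES:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        if term in candidate and candidate[term] != value:
                            ok = False
                            break
                        candidate[term] = value
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions
```

For example, "which members of project x are at Oxford, and what are they called?" needs a join across three patterns, which a lookup interface would leave to the client:

```python
query([(EX + "project/x", EX + "hasMember", "?p"),
       ("?p", EX + "memberOf", EX + "org/oxford"),
       ("?p", EX + "name", "?name")])
```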

A long train journey later, the following day I was at a meeting in Glasgow organised by the CETIS Semantic Technologies Working Group to discuss the report produced by the recent JISC-funded Semtech project, and to try to identify potential areas for further work in that area by CETIS and/or JISC. Sheila MacNeill from CETIS liveblogged proceedings here. Thanassis Tiropanis from the University of Southampton presented the project report, with a focus on its "roadmap for semantic technology adoption". The report argues that, in the past, the adoption of semantic technologies may have been hindered by a tendency towards a "top-down" approach requiring the widespread agreement on ontologies; in contrast the "linked data" approach encourages more of a "bottom-up" style in which data is first made available as RDF, and then later application-specific or community-wide ontologies are developed to enable more complex reasoning across the base data (which may involve mapping that initial data to those ontologies as they emerge). While I think there's a slight risk of overstating the distinction - in my experience many "linked data" initiatives do seem to demonstrate a good deal of thinking about the choice of RDF vocabularies and compatibility with other datasets - and I guess I see rather more of a continuum, it's probably a useful basis for planning. The report recommends a graduated approach which focusses initially on the development of this "linked data field" - in particular where there are some "low-hanging fruit" cases of data already made available in human-readable form which could relatively easily be made available in RDF, especially using RDFa.
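The "low-hanging fruit" case is worth making concrete. Where a page is already generated from structured data, adding RDFa can be as small a change as emitting a few extra attributes in the template. This is a minimal sketch, assuming a hypothetical record with `uri`, `title` and `creator` fields and using Dublin Core terms; the choice of vocabulary is an assumption, not a recommendation from the report.

```python
from html import escape

DC = "http://purl.org/dc/terms/"


def to_rdfa(record):
    """Render a record as HTML that is both human-readable and RDFa 1.1 annotated."""
    about = escape(record["uri"], quote=True)
    title = escape(record["title"])
    creator = escape(record["creator"])
    return (
        f'<div prefix="dc: {DC}" about="{about}">\n'
        f'  <h2 property="dc:title">{title}</h2>\n'
        f'  <p>By <span property="dc:creator">{creator}</span></p>\n'
        f'</div>'
    )


if __name__ == "__main__":
    print(to_rdfa({"uri": "http://example.org/course/101",
                   "title": "Maths & Stats",
                   "creator": "A. Lecturer"}))
```

An RDFa processor reading that output would extract a `dc:title` and a `dc:creator` triple about `http://example.org/course/101`, while a browser shows the same page as before.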

One of the issues I was slightly uneasy with in the Glasgow meeting was that occasionally there were mentions of delivering "interoperability" (or "data interoperability") without really saying what was meant by that - and I say this as someone who used to have the I-word in my job title ;-) I feel we probably need to be clearer, and more precise, about what different "semantic technologies" (for want of a better expression) enable. What does the use of RDF provide that, say, XML typically doesn't? What does, e.g., RDF Schema add to that picture? What about convergence on shared vocabularies? And so on. Of course, the learners, teachers, researchers and administrators using the systems don't need to grapple with this, but it seems to me such aspects do need to be conveyed to the designers and developers, and perhaps more importantly - as Andy highlighted in his report of related discussions at the CETIS conference - to those who plan and prioritise and fund such development activity. (As an aside, I think this is also something of an omission in the current version of the DCMI document on "Interoperability Levels": it tells me what characterises each level, and how I can test for whether an application meets the requirements of the level, but it doesn't really tell me what functionality each level provides/enables, or why I should consider level n+1 rather than level n.)
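By way of one partial answer to my own questions, here is a minimal sketch of two things RDF and RDF Schema give you, using hypothetical data. Because every RDF statement is a self-contained triple using global URIs, merging two independently published graphs (without blank nodes) is just a set union; two XML documents with different schemas have no equivalent generic merge. RDFS then adds a little shared meaning: the sketch applies only the RDFS entailment rules rdfs9 (type propagation via `rdfs:subClassOf`) and rdfs11 (transitivity of `rdfs:subClassOf`), not full RDFS reasoning.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace

# Two independently published graphs (hypothetical data)
graph_a = {(EX + "alice", RDF_TYPE, EX + "Lecturer")}
graph_b = {(EX + "Lecturer", RDFS_SUBCLASS, EX + "StaffMember"),
           (EX + "alice", EX + "teaches", EX + "course/101")}

# With shared URIs and no blank nodes, merging graphs is a set union of triples
merged = graph_a | graph_b


def rdfs_type_closure(triples):
    """Apply rdfs9 and rdfs11 repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in inferred if p == RDFS_SUBCLASS}
        new = set()
        for (a, b) in sub:  # rdfs11: subClassOf is transitive
            for (c, d) in sub:
                if b == c:
                    new.add((a, RDFS_SUBCLASS, d))
        for (s, p, o) in inferred:  # rdfs9: instances inherit superclass types
            if p == RDF_TYPE:
                for (a, b) in sub:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= inferred:
            return inferred
        inferred |= new
```

Neither source graph says that alice is a `StaffMember`, but the merged graph plus the schema entails it. That is a small, testable example of the kind of "what does level n+1 buy me?" statement I'd like to see spelled out.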

Rather by chance, I came across a recent presentation by Richard Cyganiak to the Vienna Linked Data Camp, which I think addresses some similar questions, albeit from a slightly different starting point: Richard asks the questions, "So, if we have linked data sources, what's stopping the development of great apps? What else do we need?", and highlights various dimensions of "heterogeneity" which may exist across linked data sources (use of identifiers, differences in modelling, differences in RDF vocabularies used, differences in data quality, differences in licensing, and so on).

Finally, I noticed that last Friday, Paul Miller (who was also at the CETIS meeting) announced the availability of a draft of a "Horizon Scan" report on "Linked Data" which he has been working on for JISC, as part of the background for a JISC call for projects in this area some time early in 2010. It's a relatively short document (hurrah for short reports!) but I've only had time for a quick skim through. It aims for some practical recommendations, ranging from general guidance on URI creation and the use of RDFa to more specific actions on particular resources/datasets. And here I must reiterate what Paul says in his post - it's a draft on which he is seeking comments, not the final report, and none of those recommendations have yet been endorsed by JISC. (If you have comments on the document, I suggest that you submit them to Paul (contact details here or comment on his post) rather than commenting on this post.)

In short, it's encouraging to see the active interest in this area growing within the HE sector. On reading Paul's draft document, I was struck by the difference between the atmosphere now (both at the Semtech meeting, and more widely) and what Paul describes as the "muted" conclusions of Brian Matthews' 2005 survey report on Semantic Web Technologies for JISC Techwatch. Of course, many of the challenges that Andy mentioned in his report of the CETIS conference session remain to be addressed, but I do sense that there is a momentum here - an excitement, even - which I'm not sure existed even eighteen months ago. It remains to be seen whether and how that enthusiasm translates into applications of benefit to the educational community, but I look forward to seeing how the upcoming JISC call, and the projects it funds, contribute to these developments.

December 04, 2009

Moving beyond the typical 15% deposit level

In an email to the [email protected] mailing list, Steve Hitchcock writes:

... authors of research papers everywhere want "to reach the eyes and minds of peers, fellow esoteric scientists and scholars the world over, so that they can build on one another's contributions in that cumulative, collaborative enterprise called learned inquiry."

[This] belief was founded on principle, but also on observed practice, that in 1994 we saw authors spontaneously making their papers available on the Web. From those small early beginnings we just assumed the practice would grow. Why wouldn't it? The Web was new, and open, and people were learning quickly how they could make use of it. Our instincts about the Web were not wrong. Since then, writing to the Web has become even easier.

So this is the powerful idea ..., and what we haven't yet understood is why, beyond the typical 15% deposit level, self-archiving does not happen without mandates. The passage of 15 years should tell us something about the other 85% of authors. Do they not share this belief? Does self-archiving not serve the purpose? ...

This is the part that needs to be re-examined, the idea, and why it has yet to awaken and enthuse our colleagues, as it has us, to the extent we envisaged. Might we have misunderstood and idealised the process of 'learned inquiry'?

I completely agree.

In passing, I'd be interested to know what uptake of Mendeley is like, and whether it looks likely to make any in-roads into the 85%, either as an adjunct to institutional repositories or as an alternative?

December 03, 2009

On being niche

I spoke briefly yesterday at a pre-IDCC workshop organised by REPRISE.  I'd been asked to talk about Open, social and linked information environments, which resulted in a re-hash of the talk I gave in Trento a while back.

My talk didn't go too well to be honest, partly because I was on last and we were over-running so I felt a little rushed but more because I'd cut the previous set of slides down from 119 to 6 (4 really!) - don't bother looking at the slides, they are just images - which meant that I struggled to deliver a very coherent message.  I looked at the most significant environmental changes that have occurred since we first started thinking about the JISC IE almost 10 years ago.  The resulting points were largely the same as those I have made previously (listen to the Trento presentation) but with a slightly preservation-related angle:

  • the rise of social networks and the read/write Web, and a growth in resident-like behaviour, means that 'digital identity' and the identification of people have become more obviously important and will remain an important component of provenance information for preservation purposes into the future;
  • Linked Data (and the URI-based resource-oriented approach that goes with it) is conspicuous by its absence in much of our current digital library thinking;
  • scholarly communication is increasingly diffusing across formal and informal services both inside and outside our institutional boundaries (think blogging, Twitter or Google Wave for example) and this has significant implications for preservation strategies.

That's what I thought I was arguing anyway!

I also touched on issues around the growth of the 'open access' agenda, though looking at it now I'm not sure why because that feels like a somewhat orthogonal issue.

Anyway... the middle bullet has to do with being mainstream vs. being niche. (The previous speaker, who gave an interesting talk about MyExperiment and its use of Linked Data, made a similar point.) I'm not sure one can really describe Linked Data as being mainstream yet, but one of the things I like about the Web Architecture and REST in particular is that they describe architectural approaches that have proven to be hugely successful, i.e. they describe the Web. Linked Data, it seems to me, builds on these in very helpful ways. I said that digital library developments often prove to be too niche - that they don't have mainstream impact. Another way of putting that is that digital library activities don't spend enough time looking at what is going on in the wider environment. In other contexts, I've argued that "the only good long-term identifier, is a good short-term identifier" and I wonder if that principle can and should be applied more widely. If you are doing things on a Web scale, then the whole Web has an interest in solving any problems - be that around preservation or anything else. If you invent a technical solution that only touches on scholarly communication (for example), who is going to care about it in 50 or 100 years? Answer: not all that many people.

It worries me, for example, when I see an architectural diagram (as was shown yesterday) which has channels labelled 'OAI-PMH', 'XML' and 'the Web'!

After my talk, Chris Rusbridge asked me if we should just get rid of the JISC IE architecture diagram. I responded that I am happy to do so (though I quipped that I'd like there to be an archival copy somewhere). But on the train home I couldn't help but wonder if that misses the point. The diagram is neither here nor there; it's the "service-oriented, we can build it all" mentality that it encapsulates that is the real problem.

Let's throw that out along with the diagram.

September 29, 2009

The Google Book Settlement

The JISC have made a summary of the proposed Google Book Settlement available for comment on Writetoreply (a service that I really like by the way), along with a series of questions that might usefully be considered by interested parties. Thanks to Naomi Korn and Rachel Bruce for their work on this.

Not knowing a great deal about the proposed settlement I didn't really feel able to comment but in an effort to get up to speed I decided to put together a short set of Powerpoint slides, summarising my take on the issues, based largely on the JISC text.

Here's what I came up with:

Of course, my timing isn't ideal because the proposed review meeting on the 7th October has now been replaced with a 'status update' meeting [PDF] that will "decide how to proceed with the case as expeditiously as possible". Ongoing discussion between Google and the US Department of Justice looks likely to result in changes to the proposed settlement before it gets to the review stage.

Nonetheless, I think it's useful to understand the issues that have led up to any revised settlement and in any case, it was a nice excuse to put together a set of slides using CC images of books from Flickr!

September 16, 2009

Edinburgh publish guidance on research data management

The University of Edinburgh has published some local guidance about the way that research data should be managed, Research data management guidance, covering How to manage research data and Data sharing and preservation, as well as detailing local training, support and advice options.

One assumes that this kind of thing will become much more common at universities over the next few years.

Having had a very quick look, it feels like the material is more descriptive than prescriptive - which isn't meant as a negative comment, it just reflects the current state of play. The section on Data documentation & metadata for example, gives advice as simple as:

Have you created a "readme.txt" file to describe the contents of files in a folder? Such a simple act can be invaluable at a later date.

but also provides a link to the UK Data Archive's guidance on Data Documentation and Metadata, which at first sight appears hugely complex. I'm not sure what your average researcher will make of it.
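The "readme.txt" advice is simple enough that it can even be partly automated. Here is a minimal sketch, assuming a researcher wants a plain-text file listing each file in a folder with a one-line description; `write_readme` and its output layout are hypothetical and not part of the Edinburgh guidance. Files without a supplied description get a TODO line, so the gaps are at least visible.

```python
import tempfile
from pathlib import Path


def write_readme(folder, descriptions=None):
    """Write folder/readme.txt listing each file, its size and a short description."""
    folder = Path(folder)
    descriptions = descriptions or {}
    lines = [f"README for {folder.name}", "", "Files in this folder:"]
    files = sorted(p for p in folder.iterdir()
                   if p.is_file() and p.name != "readme.txt")
    for path in files:
        note = descriptions.get(path.name, "TODO: describe contents")
        lines.append(f"- {path.name} ({path.stat().st_size} bytes): {note}")
    readme = folder / "readme.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Usage: describe a throwaway folder containing one hypothetical data file
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "survey.csv").write_bytes(b"id,answer\n")
        print(write_readme(tmp, {"survey.csv": "Raw survey responses"}).read_text())
```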

(In passing, I note that the UKDA seem to be promoting the use of the Data Documentation Initiative standard at what they call the 'catalogue' level, a standard that I've not come across before but one that appears to be rooted firmly outside the world of linked data, which is a shame.)

Similarly, the section on Methods for data sharing lists a wide range of possible options (from "posting on a University website" thru to "depositing in a data repository") without being particularly prescriptive about which is better and why.

(As a second aside, I am continually amazed by this firm distinction in the repository world between 'posting on the website' and 'depositing in a repository' - from the perspective of the researcher, both can, and should, achieve the same aims, i.e. improved management, more chance of persistence and better exposure.)

As we have found with repositories of research publications, it seems to me that research data repositories (the Edinburgh DataShare in this case) need to hide much of this kind of complexity, and do most of the necessary legwork, in order to turn what appears to be a simple and obvious 'content management' workflow (from the point of view of the individual researcher) into a well managed, openly shared, long term resource for the community.

August 20, 2009

What researchers think about data preservation and access

There's an interesting report in the current issue of Ariadne by Neil Beagrie, Robert Beagrie and Ian Rowlands, Research Data Preservation and Access: The Views of Researchers, fleshing out some of the data behind the UKRDS Report, which I blogged about a while back.

I have a minor quibble with the way the data has been presented in the report, in that it's not overly clear how the 179 respondents represented in Figure 1 have been split across the three broad areas (Sciences, Social Sciences, and Arts and Humanities) that appear in subsequent figures. One is left wondering how significant the number of responses in each of the 3 areas was?  I would have preferred to see Figure 1 organised in such a way that the 'departments and faculties' were grouped more obviously into the broad areas.

That aside, I think the report is well worth reading.  I'll just highlight what the authors perceive to be the emerging themes:

  • It is clear that different disciplines have different requirements and approaches to research data.
  • Current provision of facilities to encourage and ensure that researchers have data stores where they can deposit their valuable data for safe-keeping and for sharing, as appropriate, varies from discipline to discipline.
  • Local data management and preservation activity is very important with most data being held locally.
  • Expectations about the rate of increase in research data generated indicate not only higher data volumes but also an increase in different types of data and data generated by disciplines that have not until recently been producing volumes of digital output.
  • Significant gaps and areas of need remain to be addressed.

The Findings of the Scoping Study and Research Data Management Workshop (undertaken at the University of Oxford and part of the work that informed the Ariadne article) provides an indication of the "top requirements for services to help [researchers] manage data more effectively":

  • Advice on practical issues related to managing data across their life cycle. This help would range from assistance in producing a data management/sharing plan; advice on best formats for data creation and options for storing and sharing data securely; to guidance on publishing and preserving these research data.
  • A secure and user-friendly solution that allows storage of large volumes of data and sharing of these in a controlled way, allowing fine-grained access control mechanisms.
  • A sustainable infrastructure that allows publication and long-term preservation of research data for those disciplines not currently served by domain specific services such as the UK Data Archive, NERC Data Centres, European Bioinformatics Institute and others.
  • Funding that could help address some of the departmental challenges to manage the research data that are being produced.

Pretty high level stuff so nothing particularly surprising there. It seems to me that some work drilling down into each of these areas might be quite useful.

July 20, 2009

On names

There was a brief exchange of messages on the jisc-repositories mailing list a couple of weeks ago concerning the naming of authors in institutional repositories. When I say naming, I really mean identifying, because a name, as in a string of characters, doesn't guarantee any kind of uniqueness - even locally, let alone globally.

The thread started from a question about how to deal with the situation where one author writes under multiple names (is that a common scenario in academic writing?) but moved on to a more general discussion about how one might assign identifiers to people.

I quite liked Les Carr's suggestion:

Surely the appropriate way to go forward is for repositories to start by locally choosing a scheme for identifying individuals (I suggest coining a URI that is grounded in some aspect of the institution's processes). If we can export consistently referenced individuals, then global services can worry about "equivalence mechanisms" to collect together all the various forms of reference that.

This is the approach taken by the Resist Knowledgebase, which is the foundation for the (just started) dotAC JISC Rapid Innovation project.

(Note: I'm assuming that when Les wrote 'URI' he really meant 'http URI').
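Here is a minimal sketch of what "coining a URI grounded in some aspect of the institution's processes" might look like, assuming the institution already manages an alphanumeric staff (or payroll) number for each person. The base URI `http://id.example.ac.uk/person/` and the `person_uri` function are hypothetical. The point is that the URI derives from an identifier the institution already maintains, not from a name string, so different name forms for the same author resolve to the same http URI.

```python
# Hypothetical institution-controlled http URI space for people
PEOPLE_BASE = "http://id.example.ac.uk/person/"


def person_uri(staff_number: str) -> str:
    """Coin a stable http URI from an institutionally managed staff number."""
    key = staff_number.strip().lower()
    if not (key.isascii() and key.isalnum()):
        raise ValueError("expected an ASCII alphanumeric staff number")
    return PEOPLE_BASE + key


if __name__ == "__main__":
    # Two name forms for the same (hypothetical) author share one staff number
    for name, staff_no in [("Carr, L.", "AB1234"), ("Leslie Carr", "ab1234")]:
        print(name, "->", person_uri(staff_no))
```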

Two other pieces of current work seem relevant and were mentioned in the discussion. Firstly, the JISC-funded Names project, which is working on a pilot Names Authority Service. Secondly, the RLG Networking Names report. I might be misunderstanding the nature of these bits of work but both seem to me to be advocating rather centralised, registry-like, approaches. For example, both talk about centrally assigning identifiers to people.

As an aside, I'm constantly amazed by how many digital library initiatives end up looking and feeling like registries. It seems to be the DL way... metadata registries, metadata schema registries, service registries, collection registries. You name it and someone in a digital library will have built a registry for it.

My favoured view is that the Web is the registry. Assign identifiers at source, then aggregate appropriately if you need to work across stuff (as Les suggests above). The <sameAs> service is a nice example of this:

The Web of Data has many equivalent URIs. This service helps you to find co-references between different data sets.

As Hugh Glaser says in a discussion about the service:

Our strong view is that the solution to the problem of having all these URIs is not to generate another one. And I would say that with services of this type around, there is no reason.
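The core of a co-reference service can be sketched in a few lines. This is a minimal illustration, not how the <sameAs> service is actually implemented: it assumes we harvest owl:sameAs-style pairs asserted at source and groups them with a union-find structure, so each bundle of equivalent URIs is represented by one of its existing URIs and no new identifier is minted. The `CoreferenceIndex` class and all URIs are hypothetical.

```python
class CoreferenceIndex:
    """Group URIs declared equivalent into bundles using union-find."""

    def __init__(self):
        self._parent = {}

    def _find(self, uri):
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every URI on the path straight at the root
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def add_same_as(self, a, b):
        """Record that URIs a and b identify the same thing."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def equivalents(self, uri):
        """All known URIs co-referent with uri (including uri itself), sorted."""
        root = self._find(uri)
        return sorted(u for u in list(self._parent) if self._find(u) == root)


if __name__ == "__main__":
    idx = CoreferenceIndex()
    idx.add_same_as("http://id.example.ac.uk/person/ab1234",
                    "http://repo.example.org/author/42")
    idx.add_same_as("http://repo.example.org/author/42",
                    "http://names.example.net/n/999")
    print(idx.equivalents("http://names.example.net/n/999"))
```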

In thinking about some of the issues here I had cause to go back and re-read a really interesting interview by Martin Fenner with Geoffrey Bilder of CrossRef (from earlier this year). Regular readers will know that I'm not the world's biggest fan of the DOI (on which CrossRef is based), partly for technical reasons and partly on governance grounds, but let's set that aside for the moment. In describing CrossRef's "Contributor ID" project, Geoff makes the point that:

... “distributed” begets “centralized”. For every distributed service created, we’ve then had to create a centralized service to make it useable again (ICANN, Google, Pirate Bay, CrossRef, DOAJ, ticTocs, WorldCat, etc.). This gets us back to square one and makes me think the real issue is - how do you make the centralized system that eventually emerges accountable?

I think this is a fair point but I also think there is a very significant architectural difference between a centralised service that aggregates identifiers and other information from a distributed base of services, in order to provide some useful centralised function for example, vs. a centralised service that assigns identifiers which it then pushes out into the wider landscape. It seems to me that only the former makes sense in the context of the Web.

June 17, 2009

JISC Data Management Infrastructure call

I've been reading thru the JISC's Data Management Infrastructure call, largely on the basis that we are interested in being considered as a project partner on a bid under the call (note that we can't bid directly because we are not an HEI). So, what might Eduserv offer to a potential partner?

  • detailed knowledge of the identity and access management space including long-standing service provision and development expertise,
  • hosting experience and Web development expertise including, potentially, our new data centre facility in Swindon,
  • digital library standards expertise, particularly w.r.t. the semantic Web, linked data, metadata and persistent identifiers.

Please get in touch if any of this looks to be of interest in the context of the call.

In reading through the call I created a list of the reports, studies and projects listed in Appendix E and Appendix F; it might be of use to others. I was surprised at how lightly bookmarked many of the listed resources are, given that these are presumably the key texts in this space. This might indicate a low uptake of social bookmarking within this particular community? However, it might also indicate 'cool URI' issues, meaning that different people are bookmarking the same resource using different URLs. For example, several of the reports (including at least one hosted by us, unfortunately :-( ) have URLs which work both with and without a '.aspx' suffix.

Social bookmarking services are one reason why moving towards 'cool' and unique URIs is a good idea.
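The split-bookmark effect is easy to illustrate. The sketch below assumes we have a plain list of bookmarked URLs and applies a hypothetical normalisation rule (lower-case the scheme and host, drop any fragment, strip a trailing '.aspx' and any trailing slash) before counting. The rule, the function names and the example URLs are all illustrative assumptions; in general a client cannot safely assume that '/report' and '/report.aspx' are the same resource, which is exactly why publishing one cool URI beats leaving consumers to guess.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url):
    """Hypothetical normalisation rule, for illustration only."""
    parts = urlsplit(url)  # urlsplit already lower-cases the scheme
    path = parts.path
    # Assumption: '.aspx' is an implementation detail leaking into the URL.
    if path.lower().endswith(".aspx"):
        path = path[: -len(".aspx")]
    path = path.rstrip("/") or "/"
    # Keep the query string, drop the fragment, lower-case the host.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def bookmark_counts(bookmarked_urls):
    """Count bookmarks per normalised URL rather than per raw URL."""
    return Counter(normalise_url(u) for u in bookmarked_urls)


# Three bookmarks that a naive count would treat as three different resources.
urls = [
    "http://www.example.ac.uk/pub/report.aspx",
    "http://www.example.ac.uk/pub/report",
    "HTTP://WWW.EXAMPLE.AC.UK/pub/report.aspx#s2",
]
print(bookmark_counts(urls))
```

Counted naively, each of these URLs looks lightly bookmarked; counted after normalisation, they collapse onto a single resource.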

May 18, 2009

Symposium live-streaming and social media

We are providing a live video stream from our symposium again this year, giving people who have not registered to attend in person a chance to watch all the talks and discussion and to contribute their own thoughts and questions via Twitter and a live chat facility (this year based on ScribbleLive).

Our streaming partner for this year is Switch New Media and we are looking forward to working with them on the day.  Some of you will probably be familiar with them because they provided streaming from this year's JISC Conference and the JISC Libraries of the Future event in Oxford.

If you plan on watching all or part of the stream, please sign up for the event’s social network so that we (and others) know who you are.  The social network has an option to indicate whether you are attending the symposium in person or remotely.

Also, for anyone tweeting, blogging or sharing other material about the event, remember that the event tag is ‘esym09’ (‘#esym09’ on Twitter).  If you want to follow the event on Twitter, you can do so using the Twitter search facility.

March 20, 2009

Unlocking Audio

I spent the first couple of days this week at the British Library in London, attending the Unlocking Audio 2 conference.  I was there primarily to give an invited talk on the second day.

You might notice that I didn't have a great deal to say about audio, other than to note that what strikes me as interesting about the newer ways in which I listen to music online (specifically and Spotify) is that they are both highly social (almost playful) in their approach and that they are very much of the Web (as opposed to just being 'on' the Web).

What do I mean by that last phrase?  Essentially, it's about an attitude.  It's about seeing being mashed as a virtue.  It's about an expectation that your content, URLs and APIs will be picked up by other people and re-used in ways you could never have foreseen.  Or, as Charles Leadbeater put it on the first day of the conference, it's about "being an ingredient".

I went on to talk about the JISC Information Environment (which is surprisingly(?) not that far off its 10th birthday if you count from the initiation of the DNER), using it as an example of digital library thinking more generally and suggesting where I think we have parted company with the mainstream Web (in a generally "not good" way).  I noted that while digital library folks can discuss identifiers forever (if you let them!) we generally don't think a great deal about identity.  And even where we do think about it, the approach is primarily one of, "who are you and what are you allowed to access?", whereas on the social Web identity is at least as much about, "this is me, this is who I know, and this is what I have contributed". 

I think that is a very significant difference - it's a fundamentally different world-view - and it underpins one critical aspect of the difference between, say, Shibboleth and OpenID.  In digital libraries we haven't tended to focus on the social activity that needs to grow around our content and (as I've said in the past) our institutional approach to repositories is a classic example of how this causes 'social networking' issues with our solutions.

I stole a lot of the ideas for this talk, not least Lorcan Dempsey's use of concentration and diffusion.  As an aside... on the first day of the conference, Charles Leadbeater introduced a beach analogy for the 'media' industries, suggesting that in the past the beach was full of a small number of large boulders and that everything had to happen through those.  What the social Web has done is to make the beach into a place where we can all throw our pebbles.  I quite like this analogy.  My one concern is that many of us do our pebble throwing in the context of large, highly concentrated services like Flickr, YouTube, Google and so on.  There are still boulders - just different ones?  Anyway... I ended with Dave White's notions of visitors vs. residents, suggesting that in the cultural heritage sector we have traditionally focused on building services for visitors but that we need to focus more on residents from now on.  I admit that I don't quite know what this means in practice... but it certainly feels to me like the right direction of travel.

I concluded by offering my thoughts on how I would approach something like the JISC IE if I was asked to do so again now.  My gut feeling is that I would try to stay much more mainstream and focus firmly on the basics, by which I mean adopting the principles of linked data (about which there is now a TED talk by Tim Berners-Lee), cool URIs and REST and focusing much more firmly on the social aspects of the environment (OpenID, OAuth, and so on).

Prior to giving my talk I attended a session about iTunesU and how it is being implemented at the University of Oxford.  I confess a strong dislike of iTunes (and iTunesU by implication) and it worries me that so many UK universities are seeing it as an appropriate way forward.  Yes, it has a lot of concentration (and the benefits that come from that) but its diffusion capabilities are very limited (i.e. it's a very closed system), resulting in the need to build parallel Web interfaces to the same content.  That feels very messy to me.  That said, it was an interesting session with more potential for debate than time allowed.  If nothing else, the adoption of systems about which people can get religious serves to get people talking/arguing.

Overall then, I thought it was an interesting conference.  I suspect that my contribution wasn't liked by everyone there - but I hope it added usefully to the debate.  My live-blogging notes from the two days are here and here.

March 11, 2009

Eduserv Symposium 2009

Yesterday we announced our annual symposium for 2009, Evolution or revolution: The future of identity and access management for research [title updated 23 March 2009], which this year will focus on the intersection between identity management, access management and e-research. I think this is an important conjunction of themes, and one where most of the focus to date has been on controlling access to resources, whereas I think the interesting issues in the future will be around the changing nature of a researcher's online identity.

We think we've put together a nice mix of speakers, including those speaking from the perspective of researchers, funders, publishers, providers of national services and providers of institutional services. We also have a couple of speaking slots for which we are awaiting confirmation before we can go public.

This meeting is the 5th in our symposium series and comes at a time when we are transitioning from a Foundation to a Research Programme (about which, more later). As usual, attendance on the day is free. The symposium will be held at the Royal College of Physicians in London on Thurs 21 May 2009. Hope to see you there.

October 08, 2007

DCMI Scholarly Communications Community launched

Mentioning the Eprints DCAP reminds me that I should draw attention to the fact that DCMI has recently launched a "Scholarly Communications Community" as a focus for work in that area:

The DCMI Scholarly Communications Community is a forum for individuals and organisations to exchange information, knowledge and general discussion on issues relating to using Dublin Core for describing research papers, scholarly texts, data objects and other resources created and used within scholarly communications. This includes providing a forum for discussion around the Eprints Application Profile, also known as the Scholarly Works Application Profile (SWAP) and for other existing and future application profiles created to describe items of scholarly communication.

The group is co-ordinated by Julie Allinson from the University of York and Rosemary Russell from UKOLN, and as usual for DCMI communities, participation is open to anyone who wishes to subscribe to the mailing list and join in (or indeed start) discussions.



eFoundations is powered by TypePad