
January 31, 2010

Readability and linkability

In July last year I noted that the terminology around Linked Data was not necessarily as clear as we might wish it to be.  Via Twitter yesterday, I was reminded that my colleague, Mike Ellis, has a very nice presentation, Don't think websites, think data, in which he introduces the term MRD - Machine Readable Data.

It's worth a quick look if you have time.

We also used the 'machine-readable' phrase in the original DNER Technical Architecture, the work that went on to underpin the JISC Information Environment, though I think we went on to use both 'machine-understandable' and 'machine-processable' in later work (both even more of a mouthful), usually with reference to what we loosely called 'metadata'.  We also used 'm2m - machine to machine' a lot, a phrase introduced by Lorcan Dempsey I think.  Remember that this was back in 2001, well before the time when the idea of offering an open API had become as widespread as it is today.

All these terms suffer, it seems to me, from emphasising the 'readability' and 'processability' of data over its 'linkedness'. Linkedness is what makes the Web what it is. With hindsight, the major thing that our work on the JISC Information Environment got wrong was to play down the importance of the Web, in favour of a set of digital library standards that focused on sharing 'machine-readable' content for re-use by other bits of software.

Looking at things from the perspective of today, the terms 'Linked Data' and 'Web of Data' both play up the value in content being inter-linked as well as it being what we might call machine-readable.

For example, if we think about open access scholarly communication, the JISC Information Environment (in line with digital libraries more generally) promotes the sharing of content largely through the harvesting of simple DC metadata records, each of which typically contains a link to a PDF copy of the research paper, which, in turn, carries only human-readable citations to other papers.  The DC part of this is certainly MRD... but, overall, the result isn't very inter-linked or Web-like. How much better would it have been to focus some effort on getting more Web links between papers embedded into the papers themselves - using what we would now loosely call a 'microformat'?  One of the reasons I like some of the initiatives around the DOI (CrossRef springs to mind), even though I don't like the DOI much as a technology, is that they potentially enable a world where we have the chance of real, solid, persistent Web links between scholarly papers.

RDF, of course, offers the possibility of machine-readability, machine-processable semantics, and links to other content - which is why it is so important and powerful and why initiatives like data.gov.uk need to go beyond the CSV and XML files of this world (which some people argue are good enough) and get stuff converted into RDF form.
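
To make the contrast between 'readable' and 'linked' a little more concrete, here is a minimal sketch in Python. It is not real RDF tooling: plain tuples stand in for triples, and the example.org URIs, the record layout and the helper names (cited_by, title_of) are all made up for illustration. The only things borrowed from the real world are the DCMI Metadata Terms namespace and its 'title' and 'references' properties.

```python
# Illustrative only: the example.org URIs and the record layout are hypothetical.
DCTERMS = "http://purl.org/dc/terms/"  # DCMI Metadata Terms namespace

# A flat, machine-readable record: the citation is just a string for humans to read.
flat_record = {
    "title": "Paper A",
    "identifier": "http://example.org/papers/a.pdf",
    "citations": ["Smith, J. (2008). Paper B. Journal of Examples."],
}

# Similar information as (subject, predicate, object) triples:
# each citation is now a link to another identified resource.
triples = [
    ("http://example.org/papers/a", DCTERMS + "title", "Paper A"),
    ("http://example.org/papers/a", DCTERMS + "references", "http://example.org/papers/b"),
    ("http://example.org/papers/b", DCTERMS + "title", "Paper B"),
    ("http://example.org/papers/c", DCTERMS + "title", "Paper C"),
    ("http://example.org/papers/c", DCTERMS + "references", "http://example.org/papers/b"),
]


def cited_by(graph, paper_uri):
    """Return the sorted URIs of papers that reference paper_uri."""
    return sorted(
        s for s, p, o in graph
        if p == DCTERMS + "references" and o == paper_uri
    )


def title_of(graph, uri):
    """Return the title recorded for uri, or None if there isn't one."""
    for s, p, o in graph:
        if s == uri and p == DCTERMS + "title":
            return o
    return None


if __name__ == "__main__":
    # Following links backwards: which papers cite Paper B?
    for uri in cited_by(triples, "http://example.org/papers/b"):
        print(uri, title_of(triples, uri))
```

The flat record can be parsed by software, but answering 'who cites Paper B?' from it would mean interpreting citation strings. With the linked form the same question is a simple traversal, which is roughly the point about linkedness.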

As an aside, DCMI have done some interesting work on Interoperability Levels for Dublin Core Metadata. While this work is somewhat specific to DC metadata I think it has some ideas that could be usefully translated into the more general language of the Semantic Web and Linked Data (and probably to the notions of the Web of Data and MRD).

Mike, I think, would probably argue that this is all the musing of a 'purist' and that purists should be ignored - and he might well be right.  I certainly agree with the main thrust of the presentation that we need to 'set our data free', that any form of MRD is better than no MRD at all, and that any API is better than no API.  But we also need to remember that it is fundamentally the hyperlink that has made the Web what it is and that those forms of MRD that will be of most value to us will be those, like RDF, that strongly promote the linkability of content, not just to other content but to concepts and people and places and everything else.

The labels 'Linked Data' and 'Web of Data' are both helpful in reminding us of that.

January 22, 2010

The right and left hands of open government data in the UK

As I'm sure everyone knows by now, the UK Government's data.gov.uk site was formally launched yesterday to a significant fanfare on Twitter and elsewhere.  There's not much I can add other than to note that I think this initiative is a very good thing and I hope that we can contribute more in the future than we have done to date.

[Edit: I note that the video of the presentation by Tim Berners-Lee and Nigel Shadbolt is now available.]

I'd like to highlight two blog posts that hurtled past in my Twitter stream yesterday.  The first, by Brian Hoadley, rightly reminds us that Open data is not a panacea – but it is a start:

In truth, I’ve been waiting for Joe Bloggs on the street to mention in passing – “Hey, just yesterday I did ‘x’ online” and have it be one of those new ‘Services’ that has been developed from the release of our data. (Note: A Joe Bloggs who is not related to Government or those who encircle Government. A real true independent Citizen.)

It may be a long wait.

The reality is that releasing the data is a small step in a long walk that will take many years to see any significant value. Sure there will be quick wins along the way – picking on MP’s expenses is easy. But to build something sustainable, some series of things that serve millions of people directly, will not happen overnight. And the reality, as Tom Loosemore pointed out at the London Data Store launch, it won’t be a sole developer who ultimately brings it to fruition.

The second, from the Daily Mash, is rather more flippant, New website to reveal exactly why Britain doesn't work:

Sir Tim said ordinary citizens will be able to use the data in conjunction with Ordnance Survey maps to show the exact location of road works that are completely unnecessary and are only being carried out so that some lazy, stupid bastard with a pension the size of Canada can use up his budget before the end of March.

The information could also be used to identify Britain's oldest pothole, how much business it has generated for its local garage and why in the name of holy buggering fuck it has never, ever been fixed.

And, while we are on the subject of maps and so on, today's posting to the Ernest Marples Blog, Postcode Petition Response — Our Reply, makes for an interesting read about the government's somewhat un-joined-up response to a petition to "encourage the Royal Mail to offer a free postcode database to non-profit and community websites":

The problem is that the licence was formed to suit industry. To suit people who resell PAF data, and who use it to save money and do business. And that’s fine — I have no problem with industry, commercialism or using public data to make a profit.

But this approach belongs to a different age. One where the only people who needed postcode data were insurance and fulfilment companies. Where postcode data was abstruse and obscure. We’re not in that age any more.

We’re now in an age where a motivated person with a laptop can use postcode data to improve people’s lives. Postcomm and the Royal Mail need to confront this and change the way that they do things. They may have shut us down, but if they try to sue everyone who’s scraping postcode data from Google, they’ll look very foolish indeed.

Finally — and perhaps most importantly — we need a consistent and effective push from the top. Number 10’s right hand needs to wake up and pay attention to the fantastic things the left hand’s doing.

Without that, we won’t get anywhere.

Hear, hear.

On the use of Microsoft SharePoint in UK universities

A while back we decided to fund a study looking at the uptake of SharePoint within UK higher education institutions, an activity undertaken on our behalf by a team from the University of Northumbria led by Julie McLeod.  At the time of the announcement of this work we took some stick about the focus on a single, commercially licensed, piece of software - something I attempted to explain in a blog post back in May last year.  On balance, I still feel we made the right decision to go with such a focused study, and I think the popularity of the event that we ran towards the end of last year confirms that to a certain extent.

I'm very pleased to say that the final report from the study is now available.  As with all the work we fund, the report has been released under a Creative Commons licence so feel free to go ahead and make use of it in whatever way you find helpful.  I think it's a good study that summarises the current state of play very nicely.  The key findings are listed on the project home page so I won't repeat them here.  Instead, I'd like to highlight what the report says about the future:

This research was conducted in the summer and autumn of 2009. Looking ahead to 2010 and beyond the following trends can be anticipated:

  • Beginnings of the adoption of SharePoint 2010
    SharePoint 2010 will become available in the first half of 2010. Most HEIs will wait until a service pack has been issued before they think about upgrading to it, so it will be 2011 before SharePoint 2010 starts to have an impact. SharePoint 2010 will bring improvements to the social computing functionality of My Sites, with Facebook/Twitter style status updates, and with tagging and bookmarking. My Sites are significant in an HE context because they are the part of SharePoint that HEIs consider providing to students as well as staff. We have hitherto seen lacklustre take up of My Sites in HE. Some HEIs implementing SharePoint 2007 have decided not to roll out My Sites at all, others have only provided them to staff, others have made them available to staff and students but decided not to actively promote them. We are likely to see increasing provision and take up of My Sites from those HEIs that move to SharePoint 2010.
  • Fuzzy boundary between SharePoint implementations and Virtual Learning Environments
    There is no prospect, in the near future, of SharePoint challenging Blackboard’s leadership in the market for institutional VLEs for teaching and learning. Most HEIs now have both an institutional VLE, and a SharePoint implementation. Institutional VLEs are accustomed to battling against web hosted applications such as Facebook for the attention of staff and students. They now also face competition internally from SharePoint. Currently SharePoint seems to be being used at the margins of teaching and learning, filling in for areas where VLEs are weaker. HEIs have reported SharePoint’s use for one-off courses and small scale courses; for pieces of work requiring students to collaborate in groups, and for work that cannot fit within the confines of one course. Schools or faculties that do not like their institution’s proprietary VLE have long been able to use an open source VLE (such as Moodle) and build their own VLE in that. Now some schools are using SharePoint and building a school specific VLE in SharePoint. However, SharePoint has a long way to go before it is anything more than marginal to teaching and learning.
  • Increase in average size of SharePoint implementations
    At the point of time in which the research was conducted (summer and autumn of 2009) many of the implementations examined were at an early stage. The boom in SharePoint came in 2008 and 2009, as HEIs started to pick up on SharePoint 2007. We will see the maturation of many implementations which are currently less than a year old. This is likely to bring with it some governance challenges (for example ‘SharePoint sprawl’) which are not apparent when implementations are smaller. It will also increase the percentage of staff and students in HE familiar with SharePoint as a working environment. One HEI reported that some of their academics, unaware that the University was about to deploy SharePoint, have been asking for SharePoint because they have been working with colleagues at other institutions who are using it.
  • Competition from Google Apps for the collaboration space
    SharePoint seems to have competed successfully against other proprietary ECM vendors in the collaboration space (though it faces strong competition from both proprietary and open source systems in the web content management space and the portal space). It seems that the most likely form of new competition in the collaboration space will come in the shape of Google Apps which offers significantly less functionality, but operates on a web hosted subscription model which may appeal to HEIs that want to avoid the complexities of the configuration and management of SharePoint.
  • Formation of at least one Higher Education SharePoint User Group
    It is surprising that there is a lack of Higher Education SharePoint user groups. There are two JISCmail groups (SharePoint-Scotland and YH-SharePoint) but traffic on these two lists is low. The formation of one or more active SharePoint user groups would seem to be essential given the high level of take up in the sector, the complexity of the product, the customisation and configuration challenges it poses, and the range of uses to which it can be put. Such a user group or groups could support the sharing of knowledge across the sector, provide the sector with a voice in relation to both Microsoft and to vendors within the ecosystem around SharePoint, and enable the sector to explore the implications of Microsoft’s increasing dominance within higher education, as domination of the collaboration space is added to its domination of operating systems, e-mail servers, and office productivity software.

On the last point, I wonder what a user group actually looks like in these days of blogs, Twitter and other social networks. Superficially, it feels to me like a concept rooted firmly in the last century. That's not to say that there isn't value in collectively being able to share our experiences with a particular product, both electronically and face-to-face, nor in being able to represent a collective view to a particular vendor - so there's nothing wrong with the underlying premise. Perhaps it is just the label that feels outdated?


