November 17, 2012

Analysing the police and crime commissioner election data

UK readers will know that elections for police and crime commissioners (PCCs) took place earlier this week, the first time we have had such elections in England and Wales. They will also know that some combination of lack of knowledge, apathy and disagreement with the whole process led to our collective lowest voter turnout for a long time.

In the UK, election results are typically reported in terms of the percentage of votes cast. This makes reasonable sense in cases where the turnout is good; one can make the assumption that the votes cast are representative of the electorate as a whole. Where voter turnout is very low, as was the case this time, I think it makes far less sense. People have stayed away for a reason and I think it is more interesting to look at the mandate that newly elected commissioners have received in terms of the overall electorate.

I took a quick look at the PCC results data that is available from the Guardian data blog and came up with the following chart:

The result makes for pretty uncomfortable viewing, at least for anyone worried about how well democracy served England and Wales in this case.

The data itself comes in the form of a Google Spreadsheet containing a row for each police force and columns for the winning party and the turnout (as votes and as a percentage), followed by various columns for each of the candidates.

The data does not contain a column indicating the size of the electorate in each area, so I added a column called 'Potential vote' and reverse engineered it from the 'Turnout, votes' and 'Turnout, %' columns.
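The reverse engineering is simple arithmetic: if turnout is given both as a count and as a percentage of the electorate, the electorate is roughly the count divided by the percentage. Here is a minimal Python sketch of that step, assuming the percentage is on a 0-100 scale; the function name and the figures are made up for illustration and are not taken from the Guardian data:

```python
def potential_vote(turnout_votes: int, turnout_pct: float) -> int:
    """Estimate the electorate from 'Turnout, votes' and 'Turnout, %'.

    Assumes turnout_pct is on a 0-100 scale. Because the published
    percentage is rounded, the result is only an approximation.
    """
    if turnout_pct <= 0:
        raise ValueError("turnout percentage must be positive")
    return round(turnout_votes * 100 / turnout_pct)


# Illustrative figures only (not real results).
print(potential_vote(150_000, 15.0))
```

Since the percentage column is rounded, the derived electorate will be off by a small amount, which is worth bearing in mind when reading the chart.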

The data also does not contain a single column for each of the winners - instead, this data is spread across multiple columns representing each of the candidates. I created three new columns called 'Winner', 'Voted for winner' and 'Voted for winner as % of turnout' and manually copied the data across from the appropriate cells. (There is probably an automated way of doing this but I couldn't think of it and there's not that many rows to deal with so I did it one at a time).
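One possible way to automate this step, sketched in Python rather than in the spreadsheet itself: treat each row as a mapping from candidate to votes and pick the largest. This assumes the per-candidate columns hold each candidate's final vote total and that a row has already been read into a dictionary; the helper names and the sample figures are hypothetical.

```python
from __future__ import annotations


def find_winner(candidate_votes: dict[str, int]) -> tuple[str, int]:
    """Return (name, votes) for the candidate with the most votes."""
    if not candidate_votes:
        raise ValueError("no candidates given")
    name = max(candidate_votes, key=candidate_votes.get)
    return name, candidate_votes[name]


def winner_columns(candidate_votes: dict[str, int], turnout_votes: int) -> dict:
    """Build the three derived columns for one police force area."""
    winner, votes = find_winner(candidate_votes)
    return {
        "Winner": winner,
        "Voted for winner": votes,
        "Voted for winner as % of turnout": round(100 * votes / turnout_votes, 1),
    }


# Made-up row: three candidates, 150,000 votes cast in total.
row = {"Candidate A": 60_000, "Candidate B": 50_000, "Candidate C": 40_000}
print(winner_columns(row, turnout_votes=150_000))
```

If the candidate columns only hold first-round counts, the highest figure is not guaranteed to be the declared winner, so the result would need checking against the winning party column.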

From there, it is pretty straightforward to populate columns called 'Voted for other' ('Turnout, votes' less 'Voted for winner') and 'Didn't vote' ('Potential vote' less 'Voted for winner' less 'Voted for other') and to turn these into corresponding percentages of the electorate.
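The same subtraction can be written out as a short Python sketch. The function name and the numbers are invented for illustration; only the column names come from the spreadsheet described above.

```python
def electorate_breakdown(potential_vote: int, turnout_votes: int,
                         voted_for_winner: int) -> dict:
    """Split the electorate into three groups, as (count, % of electorate)."""
    voted_for_other = turnout_votes - voted_for_winner
    didnt_vote = potential_vote - voted_for_winner - voted_for_other
    counts = {
        "Voted for winner": voted_for_winner,
        "Voted for other": voted_for_other,
        "Didn't vote": didnt_vote,
    }
    return {
        label: (count, round(100 * count / potential_vote, 1))
        for label, count in counts.items()
    }


# Made-up area: electorate of 1,000,000, 150,000 votes cast, 60,000 for the winner.
for label, (count, pct) in electorate_breakdown(1_000_000, 150_000, 60_000).items():
    print(f"{label}: {count} ({pct}% of electorate)")
```

With these invented figures, a winner who took 40% of the votes cast would have been backed by only 6% of the electorate, which is the kind of gap the chart is meant to show.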


As an aside, the data does not contain any information about spoiled ballot papers, of which there were allegedly a large number in this election. A revised version of the spreadsheet today contains columns for this data but they are currently unpopulated, so I assume that this information is coming. I think this would make a useful addition to this chart.

If you are interested, my data is available here.

May 18, 2012

Big Data - size doesn't matter, it's the way you use it that counts (at least, that's what they tell me!)

Here's my brief take on this year's Eduserv Symposium, Big Data, big deal?, which took place in London last Thursday and which was, by all the accounts I've seen and heard, a pretty good event.

The day included a mix of talks, from an expansive opening keynote by Rob Anderson to a great closing keynote by Anthony Joseph. Watching either, or both, of these talks will give you a very good introduction to big data. Between the two we had some specifics: Guy Coates and Simon Metson talking about their experiences of big data in genomics and physics respectively (though the latter also included some experiences of moving big data techniques between different academic disciplines); a view of the role of knowledge engineering and big data in bridging the medical research/healthcare provision divide by Anthony Brookes; a view of the potential role of big data in improving public services by Max Wind-Cowie; and three shorter talks immediately after lunch - Graham Prior talking about big data and curation, Devin Gafney talking about his 140Kit twitter-analytics project (which, coincidentally, is hosted on our infrastructure) and Simon Hodson talking about the JISC's big data activities.

All of the videos and slides from the day are available at the links above. Enjoy!

For my part, there were several take-home messages:

  • Firstly, that we shouldn’t get too hung up on the word ‘big’. Size is clearly one dimension of the big data challenge but of the three words most commonly associated with big data - volume, velocity and variety - it strikes me that volume is the least interesting and I think this was echoed by several of the talks on the day.
  • In particular, it strikes me there is some confusion between ‘big data’ and ‘data that happens to be big’ - again, I think we saw some of this in some of the talks. Whilst the big data label has helped to generate interest in this area, it seems to me that its use of the word 'big' is rather unhelpful in this respect. It also strikes me that the JISC community, in particular, has a history of being more interested in curating and managing data than in making use of it, whereas big data is more about the latter than the former.
  • As with most new innovations (though 'evolution' is probably a better word here) there is a temptation to focus on the technology and infrastructure that makes it work, particularly amongst a relatively technical audience. I am certainly guilty of this. In practice, it is the associated cultural change that is probably more important. Max Wind-Cowie’s talk, in particular, referred to the kinds of cultural inertia that need to be overcome in the public sector, on both the service provider side and the consumer side, before big data can really have an impact in terms of improving public services. Attitudes like, "how can a technology like big data possibly help me build a *closer* and more *personal* relationship with my clients?" or "why should I trust a provider of public services to know this much about me?" seem likely to be widespread. Though we didn't hear about it on the day, my gut feeling is that a similar set of issues would probably apply in education were we, for example, to move towards a situation where we make significant use of big data techniques to tailor learning experiences at an individual level. My only real regret about the event was that I didn't find someone to talk on this theme from an education perspective.
  • Several talks referred to the improvements in 'evidence-based' decision-making that big data can enable. For example, Rob Anderson talked about poor business decisions being based on poor data currently and Anthony Brookes discussed the role of knowledge engineering in improving the ability of those involved in front-line healthcare provision to take advantage of the most recent medical research. As Adam Cooper of CETIS argues in Analytics and Big Data - Reflections from the Teradata Universe Conference 2012, we need to find ways to ask questions that have efficiency or effectiveness implications and we need to look for opportunities to exploit near-real-time data if we are to see benefits in these areas.
  • I have previously raised the issue of possible confusion, especially in the government sector, between 'open data' and 'big data'. There was some discussion of this on the day. Max Wind-Cowie, in particular, argued that 'open data' is a helpful - indeed, a necessary - step in encouraging the public sector to move toward a more transparent use of public data. The focus is currently on the open data agenda but this will encourage an environment in which big data tools and techniques can flourish.
  • Finally, the issue that almost all speakers touched on to some extent was that of the need to grow the pool of people who can undertake data analytics. Whether we choose to refer to such people as data scientists, knowledge engineers or something else there is a need for us to grow the breadth and depth of the skills-base in this area and, clearly, universities have a critical role to play in this.

As I mentioned in my opening to the day, Eduserv's primary interest in Big Data is somewhat mundane (though not unimportant) and lies in the enabling resources that we can bring to the communities we serve (education, government, health and other charities), either in the form of cloud infrastructure on which big data tools can be run or in the form of data centre space within which physical kit dedicated to Big Data processing can be housed. We have plenty of both and plenty of bandwidth to JANET so if you are interested in working with us, please get in touch.

Overall, I found the day enlightening and challenging and I should end with a note of thanks to all our speakers who took the time to come along and share their thoughts and experiences.

[Photo: Eliot Hall, Eduserv]

April 02, 2012

Big data, big deal?

Some of you may have noticed that Eduserv's annual symposium is happening on May 10. Once again, we're at the Royal College of Physicians in London and this year we are looking at big data, appropriate really... since 2012 has been widely touted as being the year of big data.

Here's the blurb for our event:

Data volumes have been growing exponentially for a long while – so what’s new now? Is Big Data [1] just the latest hype from vendors chasing big contracts? Or does it indeed present wholly new challenges and critical new opportunities, and if so what are they?

The 2012 Symposium will investigate Big Data, uncovering what makes it different from what has gone before and considering the strategic issues it brings with it: both how to use it effectively and how to manage it.  It will look at what Big Data will mean across research, learning, and operations in HE, and at its implications in government, health, and the commercial sector, where large-scale data is driving the development of a whole new set of tools and techniques.

Through presentations and debate delegates will develop their understanding of both the likely demands and the potential benefits of data volumes that are growing disruptively fast in their organisation.

[1] Big Data is "data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."  What is big data?  Edd Dumbill, O'Reilly Radar, Jan 2012

As usual, the event is free to attend and will be followed by a drinks reception.

You'll note that we refer to Edd Dumbill's What is big data? article in order to define what we mean by big data and I recommend reading this by way of an introduction for the day. The Wikipedia page for Big data provides a good level of background and some links for further reading. Finally, O'Reilly's follow-up publication, Planning for Big Data - A CIO's Handbook to the Changing Data Landscape is also worth a look (and is free to download as an e-book).

You'll also note that the defining characteristics of big data include not just 'size' (though that is certainly an important dimension) but also 'rate of creation and/or change', and 'structural coherence'. These are typically known as the three Vs - "volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources)". In looking around for speakers, my impression is that there is a strong emphasis on the first of these in people's general understanding about what big data means (which is not surprising given the name) and that in the government sector in particular there is potential confusion between 'big data' and 'open data' and/or 'linked data' which I think it would be helpful to unpick a little - big data might be both 'open' and 'linked' but isn't necessarily so.

So, what do we hope to get out of the day? As usual, it's primarily a 'bringing people up to speed' type of event. The focus will be on our charitable beneficiaries, i.e. organisations working in the area of 'public good' - education, government, health and the charity sector - though I suspect that the audience will be mainly from the first of these. The intention is for people to leave with a better understanding of why big data might be important to them and what impact it might have in both strategic and practical terms on the kinds of activities they undertake.

We have a range of speakers, providing perspectives from inside and outside of those sectors, both hands-on and more theoretical - this is one of the things we always try and do at our symposia. Our sessions include keynotes by Anthony D. Joseph (Chancellor's Associate Professor in Computer Science at University of California, Berkeley) and Rob Anderson (CTO EMEA, Isilon Storage Division at EMC) as well as talks by Professor Anthony J Brookes (Department of Genetics at the University of Leicester), Dr. Guy Coates (Informatics Systems Group at The Wellcome Trust Sanger Institute) and Max Wind-Cowie (Head of the Progressive Conservatism Project, Demos - author of The Data Dividend).

By the way... we still have a couple of speaking slots available and are particularly interested in getting a couple of short talks from people with practical experience of working with big data, either using Hadoop or something else. If you are interested in speaking for 15 minutes or so (or if you know of someone who might be) please get in touch. Thanks. Another area that I was hoping to find a speaker to talk about, but haven't been able to so far, is someone who is looking at the potential impact of big data on learning analytics, either at the level of a single institution or, more likely, at a national level. Again, if this is something you are aware of, please get in touch. Crowd-sourced speakers FTW! :-)

All in all, I'm confident that this will be an interesting and informative day and a good follow-up to last year's symposium on the cloud - I look forward to seeing you there.

March 21, 2012

MOOCing about with SaaS

I've been taking part in a MOOC, or Massive Open Online Course, over the last 4 weeks or so. The course in question is called Software Engineering for SaaS and is being offered by the University of California, Berkeley under the umbrella of Coursera, who are currently offering a range of online courses delivered by various US universities. This particular course is led jointly by Armando Fox (who spoke at our symposium last year) and David Patterson and is highly recommended - I'm finding it hard work but very enjoyable.

The course itself covers Ruby on Rails, the theory and underlying standards related to the development of SaaS applications, agile development, behavior-driven design (BDD) and test-driven development (TDD), Cucumber, RSpec and a range of other stuff. For a five week course there's a lot to take in (especially for someone starting from pretty much zero knowledge) and I'm probably taking more than the anticipated 10 hours per week to get thru it.

Course materials are delivered in the form of a $10 e-book (for which I bought a Kindle, though I have to say I'm rather disappointed with the Kindle as a means of reading technical text books - but that's another story), videos of lectures and associated material, backed up by an online forum where problems can be discussed.

Grading (i.e. homework) takes the form of 4 programming exercises (one per week) and 2 quizzes though it should be noted that there is no formal qualification awarded at the end. Other than the e-book (which is technically optional, though I doubt that I could have completed the course without it) the course is free.

The programming homework is submitted online and graded automatically (which is very neat) and can be submitted multiple times up to the deadline. Each piece of homework has to be completed in 1 week, though submissions up to 1 week late get 75% of the marks.

Homework is completed on your own infrastructure, though it can be done on a free Amazon EC2 account (and free credits for a better Amazon account were given out to everyone who completed the first week's homework). In my case, I have used Eduserv's Education Cloud as my infrastructure (as have a few others at Eduserv). You also have to use GitHub and Heroku as part of the course.

Some idle thoughts on the whole MOOC thing...

1) Despite the lack of a qualification at the end, deadlines really do feel like deadlines and I've spent at least one Sunday night up until 1 in the morning trying to get homework finished before the Monday deadline. Hey, if I hadn't had to get up for work the next day it would have been just like a real student experience :-).

2) How massive is massive? I've heard rumours that more than 60,000 people signed up for the course but I'm somewhat doubtful that as many as that are actively taking part. Homework 3 suggested that people should take a fork of a GitHub repository before starting their work, and that appears to have happened about 3000 times, which is obviously a much smaller number (though it turns out that people didn't have to fork the repository to complete the assignment, so that number isn't very helpful). The course organisers say that around 7,000 people are actively submitting homework, which is pretty impressive. And presumably there are a lot of others who are following the course but not submitting the homework.

3) In general, MOOCs are premised on the idea of connectivism as a pedagogic approach which I'll summarise somewhat trivially by saying, "you may not know the answer but in a large enough social network you'll probably know someone who does". I suspect this works particularly well in what I'll call "softer" disciplines - for example, where homework submissions take the form of essays. As it happens, it has also worked quite well for this course, not because people have directly given away the answers in the discussion forum but because the general discussion around problems and issues (with all aspects of the course) has been incredibly useful. There have been several occasions where I've only been able to get past a massive stumbling block because of hints left by other people in the forums. (Of course, just plain old Googling for stuff has also been very helpful, and has been actively encouraged during the course.)

All in all, if the other Coursera courses are anything like this one, I highly recommend them.

Addendum: I've just finished the final week, handed in my final piece of homework and taken the last quiz. I have to say that this week's focus on RSpec and test-driven development has taken me well outside my comfort zone. I really haven't understood it and the forums haven't helped (though I must admit that I haven't asked directly in them). I basically haven't grasped the fundamentals of why TDD is good and what I'm trying to achieve as I do it. Oh well, this week has slightly taken the edge off the course for me but hasn't fundamentally changed my overall summing up in the final sentence above. I joined the course primarily to force myself to learn Ruby, and I've succeeded in that.

Addendum 2: Armando Fox has a couple of blog posts (the most recent sounding somewhat relieved :-), Made it to Spring Break, things still holding together) giving his side of running the course. As of March 23 he says that about 5,000 students were still actively taking part in the course so my guesstimate above doesn't look too far wrong.

March 03, 2012

Moving on

As some of you may have heard by now, yesterday was my last day working at Eduserv. In a few weeks, I'll be taking up a post at Cambridge University Library, working on metadata for their Digital Library.

This feels like a fantastic opportunity, and I'm very excited about the move. I hope it will allow me to apply some of my existing knowledge and skills and also gain some experience in new areas - and to contribute to the development of a high quality resource. I very much enjoyed meeting the team there, and the library's digital collections are superb - as I think the set of materials available already show. It's not often I get to prepare for a job interview by listening to Melvyn Bragg on Radio 4 talking about Isaac Newton's (not always for the faint-hearted!) accounts of his experiments.

I'm sure it's no secret that as Eduserv's focus has changed the divergence from my own experience and interests has become more marked - probably starting right back with the demise of the Eduserv Foundation, but becoming particularly apparent over the last 12-18 months or so, with the strong shift of emphasis towards the provision of cloud services, and with the rest of my Research & Innovation Group colleagues now working pretty much exclusively in this area.

This move will also mark the end of a period of over 11 years working alongside Andy, first at UKOLN and then at Eduserv. We haven't worked closely together on a project for a while now - the last piece of work we did jointly was probably authoring the JISC RDTF/Discovery metadata guidelines. But on the occasions we did collaborate, I always valued and enjoyed the experience, and I like to think we complemented each other and made a good team.

I'll miss Andy's clear-sightedness and common sense approach - though I imagine I'll still be firing off late night emails saying, "OK, here's the thing: I'm a bit stuck here. I could take solution X, or I could take solution Y. What do you think?". I'm also very grateful for the support I've received for my work and ideas, from Andy, and also from the RIG team leader, Matt Johnson. I wish them and the rest of the team all the best with their future work.

I've had so much work to do over the last few weeks and have been working ridiculous (even by my standards!) hours. Sitting here at home looking out at a sunny spring morning in Bristol, I feel it's the first time in several weeks I've been able to catch my breath, and really start to look forward.

Andy and I haven't had a chance to discuss what this change means for this blog, but I dare say we will manage that in the next few days and something will appear here.

Meanwhile, I'm also trying to take this opportunity to reorganise various bits of my personal Web presence (like I need more to do when I have to tie up lots of things and organise a move across the country...). I'm not sure how that is all going to end up, but in the short term the best places to find me are probably on Twitter, or Diaspora.

January 10, 2012

Introducing Bricolage

My last post here (two months ago - yikes, must do better...) was an appeal to anyone who might be interested in my making a contribution to a project under the JISC call 16/11: JISC Digital infrastructure programme to get in touch. I'm pleased to say that Jasper Tredgold of ILRT at the University of Bristol contacted me about a proposal he was putting together for a project called Bricolage, with the prospect of my doing some consultancy. The proposal was to work with the University Library's Special Collections department and the University Geology Museum to make available metadata for two collections - the archive of Penguin Ltd. and the specimen collection of the Geology Museum - as Linked Open Data.

And just before I went off for the Christmas break, Jasper let me know that the proposal had been accepted and the project was being funded. I'm very pleased to have another opportunity to carry on applying some of what I've learned in the other JISC-funded projects I've contributed to recently, and also to explore some new categories of data. It's also nice to be working with a local university - I worked on a few projects with folks from ILRT during my time at UKOLN, and from a selfish perspective I look forward to project meetings which involve a twenty-minute walk up the hill for me rather than a 7am start and a three or four hour train journey!

The project will start in February and run through to July. I'm sure there'll be a project blog once we get going and I'll add a URI here when it is available.

November 04, 2011

JISC Digital Infrastructure programme call

JISC currently has a call, 16/11: JISC Digital infrastructure programme, open for project proposals in a number of "strands"/areas, including the following:

  • Resource Discovery: "This programme area supports the implementation of the resource discovery taskforce vision by funding higher education libraries archives and museums to make open metadata about their collections available in a sustainable way. The aim of this programme area is to develop more flexible, efficient and effective ways to support resource discovery and to make essential resources more visible and usable for research and learning."

This strand advances the work of the UK Discovery initiative, and is similar to the "Infrastructure for Resource Discovery" strand of the JISC 15/10 call under which the SALDA project (in which I worked with the University of Sussex Library on the Mass Observation Archive data) was funded. There is funding for up to ten projects of between £25,000 and £75,000 per project in this strand.

First, I should say this is a great opportunity to explore this area of work and I think we're fortunate that JISC is able to fund this sort of activity. A few particular things I noticed about the current call:

  • a priority for "tools and techniques that can be used by other institutions"
  • a focus on unique resources/collections not duplicated elsewhere
  • should build on lessons of earlier projects, but must avoid duplication/reinvention
  • a particular mention of "exploring the value of integrating structured data into webpages using microformats, microdata, RDFa and similar technologies" as an area in scope
  • an emphasis on sharing the experience/lessons learned: "The lessons learned by projects funded under this call are expected to be as important as the open metadata produced. All projects should build sharing of lessons into their plans. All project reporting will be managed by a project blog. Bidders should commit to sharing the lessons they learn via a blog"

Re that last point, as I've said before, one of the things I most enjoyed about the SALDA and LOCAH projects was the sense that we were interested in sharing the ideas as well as getting the data out there.

I'm conscious the clock is ticking towards the submission deadline, and I should have posted this earlier, but if anyone reading is considering a proposal and thinks that I could make a useful contribution, I'd be interested to hear from you. My particular areas of experience/interest are around Linked Data, and are probably best reflected by the posts I made on the LOCAH and SALDA blogs, i.e. data modelling, URI pattern design, identification/selection of useful RDF vocabularies, identification of potential relationships with things described in other datasets, construction of queries using SPARQL, etc. I do have some familiarity with RDFa, rather less with microdata and microformats. I'm not a software developer, but I can do a little bit of XSLT (and maybe enough PHP to hack together rather flakey basic demonstrators). And I'm not a technical architect, but I did get some experience of working with triple stores in those recent projects.

My recent work has been mainly with archival metadata, and I'd be particularly interested in further work which complements that. I'm conscious of the constraint in the call of not repeating earlier work, so I don't think "reapplying" the sort of EAD to RDF work I did with LOCAH and SALDA would fit the bill. (I'd love to do something around the event/narrative/storytelling angle that I wrote about recently here, for example.) Having said that, I certainly don't mean to limit myself to archival data. Anyway, if you think I might be able to help, please do get in touch ([email protected]).

October 21, 2011

Two UK government consultations related to open data

This is just a very quick note to highlight that there are two UK government consultations in the area of open data currently in progress and due to close very shortly - next week on 27 October 2011:

  • Making Open Data Real, from the Cabinet Office, on the Transparency and Open Data Strategy, and "establishing a culture of openness and transparency in public services".
  • A Consultation on Data Policy for a Public Data Corporation, from BIS, on the role of the planned Public Data Corporation and "key aspects of data policy – charging, licensing and regulation of public sector information produced by the PDC for re-use – that will determine how a PDC can deliver against all its objectives".

Below are a few pointers to notes and comments I've seen around and about recently via Twitter:

Related to the former consultation is a very interesting report by Kieron O'Hara from the University of Southampton, published by the Cabinet Office as Transparent Government, not Transparent Citizens, on the issues for privacy raised by the government's transparency programme, and on reconciling the desire for openness from government with the privacy of individuals, which makes the argument that "privacy is a necessary condition for a successful transparency programme".

October 05, 2011

Storytelling, archives and linked data

Yesterday on Twitter I saw Owen Stephens (@ostephens) post a reference to a presentation titled "Every Story has a Beginning", by Tim Sherratt (@wragge), "digital historian, web developer and cultural data hacker" from Canberra, Australia.

The presentation content is available here, and the text of the talk is here. I think you really need to read the text in one window and click through the presentation in another. I found it fascinating, and pretty inspiring, from several perspectives.

First, I enjoyed the form of the presentation itself. The content is built up incrementally on the screen, with an engaging element of "dynamism" but kept simple enough to avoid the sort of vertiginous barrage that seems to characterise the few Prezi presentations I've witnessed. And perhaps most important of all, the presentation itself is very much "a thing of the Web": many of the images are hyperlinked through to the "live" resources pictured, providing not only a record of "provenance" for the examples, but a direct gateway into the data sources themselves, allowing people to explore the broader context of those individual records or fragments or visualisations.

Second, it provides some compelling examples of how digitised historical materials and data extracted or derived from them can be brought together in new combinations and used to uncover and (re)tell stories - and stories not just of the "famous", the influential and powerful, but of ordinary people whose life events were captured in historical records of various forms. (Aside: Kate Bagnall has some thoughtful posts looking at some of the ethical issues of making people who were "invisible" "visible").

Finally, what really intrigued me from the technical perspective was that - if I understand correctly - the presentation is being driven by a set of RDF data. (Tim said on Twitter he'd post more explaining some of the mechanics of what he has done, and I admit I'm jumping the gun somewhat in writing this post, so I apologise for any misinterpretations.) In his presentation, Tim says:

What we need is a data framework that sits beneath the text, identifying people, dates and places, and defining relationships between them and our documentary sources. A framework that computers could understand and interpret, so that if they saw something they knew was a placename they could head off and look for other people associated with that place. Instead of just presenting our research we’d be creating a whole series of points of connection, discovery and aggregation.

Sounds a bit far-fetched? Well it’s not. We have it already — it’s called the Semantic Web.

The Semantic Web exposes the structures that are implicit in our web pages and our texts in ways that computers can understand. The Linked Data movement takes the basic ideas of the Semantic Web and turns them into a collaborative activity. You share vocabularies, so that other people (and computers) know when you’re talking about the same sorts of things. You share identifiers, so that other people (and computers) know that you’re talking about a specific person, place, object or whatever.

Linked Data is Storytelling 101 for computers. It doesn’t have the full richness, complexity and nuance that we invest in our narratives, but it does at least help computers to fit all the bits together in meaningful ways. And if we talk nice to them, then they can apply their newly-acquired interpretative skills to the things that they’re already good at — like searching, aggregating, or generating the sorts of big pictures that enable us to explore the contexts of our stories.

So, if we look at the RDF data for Tim's presentation, it includes "descriptions" of many different "things", including people, like Alexander Kelley, the subject of his first "story" (to save space, I've skipped the prefix declarations in these snippets but I hope they convey the sense of the data):

story:kelley a foaf1:Person ;
     bio:death story:kelley_death ;
         story:kelley_wounded_2 ;
     foaf1:familyName "Kelley"@en-US ;
     foaf1:givenName "Alexander"@en-US ;
     foaf1:isPrimaryTopicOf story:kelley_moa ;
     foaf1:name "Alexander Dewar Kelley"@en-US ;
       <> . 

There is data about events in his life:

story:kelley_discharge a bio:Event ;
       "Discharged from the Australian Imperial Force."@en-US ;
     dc:date "1918-11-22"@en-US . 

story:kelley_enlistment a bio:Event ;
       "Enlistment in the Australian Imperial Force for 
        service in the First World War."@en-US ;
     dc:date "1916-01-22"@en-US . 

story:kelley_ww1_service a bio:Interval ;
     bio:concludingEvent story:kelley_discharge ;
     bio:initiatingEvent story:kelley_enlistment ;
     foaf1:isPrimaryTopicOf story:kelley_ww1_record . 

and about the archival materials that record/describe those events:

story:kelley_ww1_record a locah:ArchivalResource ;
       <> ;
     dc:identifier "B2455, KELLEY ALEXANDER DEWAR"@en-US ;
       ""@en-US . 

The presentation itself, the conference at which it was presented, various projects and researchers mentioned - all of these are also "things" described in the data.
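
As a rough illustration of what "a framework that computers could understand" buys us, here is a minimal Python sketch that follows the links from the service interval to its two events and works out how long the service lasted. It is not how Tim's presentation works: I have simply transcribed a few of the statements above into plain (subject, predicate, object) tuples by hand, and the helper names `value` and `interval_days` are my own inventions. A real application would load the Turtle with an RDF library rather than hard-coding it.

```python
from datetime import date

# A few statements transcribed by hand from the snippets above, as plain
# (subject, predicate, object) tuples. "rdf:type" stands for Turtle's "a".
TRIPLES = [
    ("story:kelley_enlistment", "rdf:type", "bio:Event"),
    ("story:kelley_enlistment", "dc:date", "1916-01-22"),
    ("story:kelley_discharge", "rdf:type", "bio:Event"),
    ("story:kelley_discharge", "dc:date", "1918-11-22"),
    ("story:kelley_ww1_service", "rdf:type", "bio:Interval"),
    ("story:kelley_ww1_service", "bio:initiatingEvent", "story:kelley_enlistment"),
    ("story:kelley_ww1_service", "bio:concludingEvent", "story:kelley_discharge"),
]


def value(subject, predicate):
    """Return the first object found for (subject, predicate), or None."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None


def interval_days(interval):
    """Follow the interval's two event links and return the days between them."""
    start = value(value(interval, "bio:initiatingEvent"), "dc:date")
    end = value(value(interval, "bio:concludingEvent"), "dc:date")
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


print(interval_days("story:kelley_ww1_service"))
```

Nothing in the code knows anything about Alexander Kelley or the First World War. It only knows that an interval points at an initiating event and a concluding event, and that events carry dates, which is exactly the kind of shared vocabulary the quoted passage describes.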

I'd be interested in hearing more about how this data was created, the extent to which it was possible to extract the description of people, events, archival resources etc directly from existing data sources and the extent to which it was necessary to "hand craft" parts of it.

But I get very excited when I think about the potential in this sort of area if (when!?) we do have the data for historical records available as linked data (and available under open licences that support its free use).

Imagine having a "story building tool" which enables a "narrator" to visit a linked data page provided by the National Archives of Australia or the Archives Hub or one of the other projects Tim refers to, and to select and "intelligently clip" a chunk of data which you can then arrange into the "story" you are constructing - in much the way that bookmarklets for tools like Tumblr and Posterous enable you to do for text and images now. That "clipped chunk of data" could include a description of a person and some of their life events and metadata about digitised archival resources, including URIs of images - as in Tim's examples. You might follow pointers to other datasets from which additional data could be pulled. You might supplement the "clipped" data with your own commentary. Then imagine doing the same with data from the BBC describing a TV programme or radio broadcast related to the same person or events, or with data from a research repository describing papers about the person or events. The tool could generate some "provenance data" for each "chunk" saying "this subgraph was part of that graph over there, which was created by agency A on date D" in much the way that the blogging bookmarklets provide backlinks to their sources.
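
To make the "clipping" idea slightly more concrete, here is a small Python sketch of what such a tool might do under the hood. Everything in it is hypothetical: the functions `clip` and `with_provenance`, the provenance keys, the example.org graph URI, and the agency and date values are all placeholders of mine, not anything taken from Tim's data or from an existing tool. The graph is again just a list of (subject, predicate, object) tuples.

```python
def clip(graph, start, max_depth=1):
    """Collect the triples about `start`, plus the triples about anything it
    points to, following links for up to `max_depth` hops."""
    clipped = []
    frontier = {start}
    seen = {start}
    for _ in range(max_depth + 1):
        next_frontier = set()
        for s, p, o in graph:
            if s in frontier:
                clipped.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    next_frontier.add(o)
        frontier = next_frontier
    return clipped


def with_provenance(triples, source_graph, agency, created):
    """Wrap a clipped chunk with a note saying where it came from."""
    return {
        "triples": triples,
        "provenance": {
            "partOf": source_graph,  # "that graph over there"
            "createdBy": agency,     # "agency A"
            "created": created,      # "date D"
        },
    }


# Hand-transcribed sample data; the person triples should NOT be clipped
# when we start from the service interval.
GRAPH = [
    ("story:kelley", "foaf1:name", "Alexander Dewar Kelley"),
    ("story:kelley", "bio:death", "story:kelley_death"),
    ("story:kelley_ww1_service", "bio:initiatingEvent", "story:kelley_enlistment"),
    ("story:kelley_ww1_service", "bio:concludingEvent", "story:kelley_discharge"),
    ("story:kelley_enlistment", "dc:date", "1916-01-22"),
    ("story:kelley_discharge", "dc:date", "1918-11-22"),
]

chunk = with_provenance(
    clip(GRAPH, "story:kelley_ww1_service"),
    source_graph="http://example.org/graphs/source",  # placeholder URI
    agency="Agency A",                                # placeholder
    created="D",                                      # placeholder
)
for triple in chunk["triples"]:
    print(triple)
print(chunk["provenance"])
```

The `max_depth` parameter is one crude way of deciding how "intelligent" the clip is. A real tool would presumably let the narrator choose which links to follow rather than taking everything within a fixed number of hops.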

And the same components might be reorganised, or recombined with others, to tell different stories, or variants of the same story.

Now, yes, I'm sure there are some thorny issues to grapple with here, and coming up with an interface that balances usability and the required degree of control may be a challenge - so maybe I'm getting carried away with my enthusiasm, but it doesn't seem to be an entirely unreasonable proposition.

I think it's important here that, as Tim emphasises towards the end of his text, it is the human "narrator", not an algorithm, who decides on the structure of the story and selects its components and brings them into (possibly new) relationships with each other.

I'm aware that there's other work in this area of narratives and stories, particularly from some of the people at the BBC, but I admit I haven't been able to keep up with it in detail. See e.g. Tristan Ferne on "The Mythology Engine" and Michael Smethurst's thoughtful "Storytellin'".

For me, Tim's very concrete examples made the potential of these approaches seem very real. They suggest a vision of Linked Data not as one more institutional "output", but as a basis for individual and collective creativity and empowerment, for the (re)telling of stories that have been at least partly concealed - stories which may even challenge the "dominant" stories told by the powerful. It seems all too infrequent these days that I come across something that reminds me why I bothered getting interested in metadata and data on the Web in the first place: Tim's presentation was one of those things.

October 03, 2011

Virtual World Watch taking submissions for new Snapshot Report

John Kirriemuir has put out a new call for contributions to a tenth Virtual World Watch "snapshot report" on the use of virtual worlds in education in the UK and, this time, in Ireland too. His deadline for submissions is November 14, 2011.

The activity is no longer funded under the Eduserv Research Programme, but John has obtained "a small amount of independent funding to carry out another snapshot over the remainder of the year", and Andy and I continue to be members of an informal "advisory board" for the activity (which means, err, we get the occasional email from John which prods us into writing blog posts like this one!)

Part of John's plan is to try to draw attention to the resulting report (and to contributors' work covered in it) by "pushing" it to various agencies, including:

  • UK funding bodies who fund virtual world in education activities
  • Journalists who specialise in technology in education news
  • Relevant government and civil service departments
  • The owners/developers of key virtual worlds
  • Major research groups (worldwide) involved in virtual world in education research

Previous reports are available here


