The end of July formally marked the end of two projects I've been contributing to recently:
- the LOCAH project, funded under the JISCexpo programme, working with MIMAS and UKOLN on making available data from the Archives Hub and from Copac as Linked Data
- the SALDA project, funded under the Infrastructure for Resource Discovery programme, working with the University of Sussex Library on making available data describing the Mass Observation Archive as Linked Data
There are still some things to finish for LOCAH, particularly the publication of the Copac data.
A few (not terribly original or profound) thoughts, prompted mainly by my experience of working with the Archives Hub data:
- Sometimes data is "clean" and consistent and "normalised", but more often than not it has content in inconsistent forms or is incomplete: it is "messy". Data aggregated from multiple sources over a long period of time is probably going to be messier. (With an XML format like EAD, with what I think of as its somewhat "hybrid" document/data character, the potential for messiness increases).
- Doing things with data requires some sort of processing by software, and while there are off-the-shelf apps and various configurable tools and frameworks that can provide some functions, some element of software development is usually required.
- It may be relatively easy to identify in advance the major tasks where developer effort is required, and to plan for that, but sometimes there are additional tasks which it's more difficult to anticipate; rather they emerge as you attempt some of the other processes (and I think messy data probably makes that more likely).
- Even with developers on board, development work has to be specified, and that is a task in itself, and one that can be quite complex and time-consuming (all the more so if you find yourself trying to think through and describe what to do with a range of different inputs from a messy data source).
It's worth emphassing that most of the above is not specific to generating Linked Data: it applies to data in all sorts of formats, and it applies to all sorts of tasks, whether you're building a plain old Web site or exposing RSS feeds or creating some other Web application to do something with the data.
Sort of related to all of the above, I was reminded that my limited programming skills often leave me in the position where I can identify what needs doing but I'm not able to "make it happen", and that is something I'd like to try to change. I can "get by" in XSLT, and I can do a bit of PHP, but I'd like to get to the point where I can do more of the simpler data manipulation tasks and/or pull together simple presentation prototypes.
I've enjoyed working on both the projects. I'm pleased that we've managed to deliver some Linked Data datasets, and I'm particularly pleased to have been able to contribute to doing this for archival description data, as it's a field in which I worked quite a long time ago.
Both projects gave me opportunities to learn by actually doing stuff, rather than by reading about it or prodding at other people's data. And perhaps more than anything, I've enjoyed the experience of working together as a group. We had a lot of discussion and exchange of ideas, and I'd like to think that this process was valuable in itself. In an early post on the SALDA blog, Karen Watson noted:
it is perhaps not the end result that is most important but the journey to get there. What we hope to achieve with SALDA is skills and knowledge to make our catalogues Linked Data and use those skills and that knowledge to inform decisions about whether it would be beneficial for make all our data Linked Data.
Of course it wasn't all plain sailing, and particularly near the start there were probably times when we ran up against differences of perception and terminology. Jane Stevenson has written about some of these issues from her point of view on the LOCAH blog (e.g. here, here and here). As the projects progressed, I think we moved closer towards more of a shared understanding - and I think that is a valuable "output" in its own right, even if it is one which it may be rather hard to "measure".
So, a big thank you to everyone I worked with on both projects.
Looking forward, I'm very pleased to be able to say that Jane prepared a proposal for some further work with the Archives Hub data, under JISC's Capital Funded Service Enhancements initiative, and that bid has been successful, and I'll be contributing some work to that project as a consultant. The project is called "Linking Lives" and is focused on providing interfaces for researchers to explore the data (as well as making any enhancements/extensions to the data and the "back-end" processes that are required to enable that). More on that work to come once we get moving.
Finally, as I'm sure many of you are aware, JISC recently issued some new calls for projects, including call 13/11 for projects "to develop services that aim to make content from museums, libraries and archives more visible and to develop innovative ways to reuse those resources". If there are any institutions out there who are considering proposals and think that I could make a useful contribution - despite my lamenting my limitations above, I hope my posts on the LOCAH and SALDA blogs give an idea of the sorts of areas I can contribute to! - , please do get in touch.