Contrary to appearances, I haven't completely abandoned eFoundations, but recently I've mostly been working on the JISC-funded LOCAH project which I mentioned here a while ago, and my recent scribblings have mostly been over on the project blog.
LOCAH is working on making available some data from the Archives Hub (a collection of archival finding-aids i.e. metadata about archival collections and their constituent items) and from Copac (a "union catalogue" of bibliographic metadata from major research & specialist libraries) as linked data.
So far, I've mostly been working with the EAD data, with Jane Stevenson and Bethan Ruddock from the Archives Hub. I've posted a few pieces on the LOCAH blog, on the high-level architecture/workflow, on the model for the archival description data (also here), and most recently on the URI patterns we're using for the archival data.
I've got an implementation of this as an XSLT transform that reads an EAD XML document and outputs RDF/XML, and have uploaded the results of applying that to a small subset of data to a Talis Platform instance. We're still ironing out some glitches but there'll be information about that on the project blog coming in the not too distant future.
On a personal note, I'm quite enjoying the project. It gives me a chance to sit down and try to actually apply some of the principles that I read about and talk about, and I'm working through some of the challenges of "real world" data, with all its variability and quirks. I worked in special collections and archives for a few years back in the 1990s, when the institutions where I was working were really just starting to explore the potential of the Web, so it's interesting to see how things have changed (or not! :-)), and to see the impact of and interest in some current technological (and other) trends within those communities. It also gives me a concrete incentive to explore the use of tools (like the Talis Platform) that I've been aware of but have only really tinkered with: my efforts in that space inevitably bring me face to face with the limited scope of my development skills, though it's also nice to find that the availability of a growing range of tools has enabled me to get some results even with my rather stumbling efforts.
It'a also an opportunity for me to discuss the "linked data" approach with the archivists and librarians within the project - in very concrete ways based on actual data - and to try to answer their questions and to understand what aspects are perceived as difficult or complex - or just different from existing approaches and practices.
So while some of my work necessarily involves me getting my head down and analysing input data or hacking away at XSLT or prodding datasets with SPARQL queries, I've been doing my best to discuss the principles behind what I'm doing with Jane and Bethan as I go along, and they in turn have reflected on some of the challenges as they perceive them in posts like Jane's here.
One of the project's tasks is to:
Explore and report on the opportunities and barriers in making content structured and exposed on the Web for discovery and use. Such opportunities and barriers may coalesce around licensing implications, trust, provenance, sustainability and usability.
I think we're trying to take a broad view of this aspect of the project, so that it extends not just to the "technical" challenges in cranking out data and how we address them, but also incorporates some of these "softer" elements of how we, as individuals with backgrounds in different "communities", with different practices and experiences and perspectives, share our ideas, get to grips with some of the concepts and terminology and so on. Where are the "pain points" that cause confusion in this particular context? Which means of explaining or illustrating things work, and which don't? What (if any!) is the value of the "linked data" approach for this sort of data? How is that best demonstrated? What are the implications, if any, for information management practices within this community? It may not be the case that SPARQL becomes a required element of archivists' training any time soon, but having these conversations, and reflecting on them, is, I think, an important part of the LOCAH experience.