RDTF metadata guidelines - Limp Data or Linked Data?
Having just finished reading thru the 196 comments we received on the draft metadata guidelines for the UK RDTF I'm now in the process of wondering where we go next. We (Pete and I) have relatively little effort to take this work forward (a little less than 5 days to be precise) so it's not clear to me how best we use that effort to get something useful out for both RDTF and the wider community.
By the way... many thanks to everyone who took the time to comment. There are some great contributions and, if nothing else, the combination of the draft and the comments form a useful snapshot of the current state of issues around library, museum and archival metadata in the context of the web.
Here's my brief take on what the comments are saying...
Firstly, there were several comments asking about the target audience for the guidelines and whether, as written, they will be meaningful to... well... anyone I guess! It's worth pointing out that my understanding is that any guidelines we come up with thru the current process will be taken forward as part of other RDTF work. What that means is that the guidelines will get massaged into a form (or forms) that are digestable by the target audience (or audiences), as determined by other players with the RDTF activity. What we have been tasked with are the guidelines themselves - not how they are presented. We perhaps should have made this clearer in the draft guidelines. In short, I don't think the document, as written, will be put directly in-front of anyone who doesn't go to the trouble of searching it out explicitly.
Secondly, there were quite a number of detailed comments on particular data formats, data models, vocabularies and so on. This is great and I'm hopeful that as a result we can either extend the list of examples given at various points in the guidelines (or, in some cases, drop back to not having examples but simply say, "do whatever is the emerging norm here in your community").
Thirdly, the were some concerns about what we meant by "open". As we tried to point out in the draft, we do not consider this to be our problem - it is for other activity in RDTF to try and work out what "open" means - we just felt the need to give that word a concrete definition, so that people could understand where we were coming from for the purposes of these guidelines.
Finally, there were some bigger concerns - these are the things that are taxing me right now - that broadly fell into two, related, camps. Firstly, that the step between the community formats approach and the RDF data approach is too large (though no-one really suggested what might go in the middle). And secondly, that we are missing a trick by not encouraging the assignment of 'https' URIs to resources as part of the community formats approach.
As it stands, we have, on the one hand, what one might call Limp Data (MARC records, XML files, CVS, EAD and the rest) and, on the other, Linked Data and all that entails, with a rather odd middle ground that we are calling RDF data (in the current guidelines).
I was half hoping that someone would simply suggest collapsing our RDF data and Linked Data approaches into one - on the basis that separating them into two is somewhat confusing (but as far as I can tell no-one did... OK, I'm doing it now!). That would leave a two-pronged approach - community formats and Linked Data - to which we could add a 'community formats with https URIs' middle ground. My gut feel is that there is some attraction in such an approach but I'm not sure how feasible it is given the characteristics of many existing community formats.
As part of his commentary around encouraging https URIs (building a 'better web' was how he phrased it), Owen Stephens suggested that every resource should have a corresponding web page. I don't disagree with this... well, hang on... actually I do (at least in part)! One of the problems faced by this work is the fundamental difference between the library world and museums and archives. The former is primarily dealing with non-unique resources (at the item level), the latter with unique resources. (I know that I'm simplifying things here but bear with me). Do I think that resource discovery will be improved if every academic library in the UK (or indeed in the world) creates an https URI for every book they hold at which they serve a human-readable copy of their catalogue record? No, I don't. If the JISC and RLUK really want to improve web-scale resource discovery of books in the library sector, they would be better off spending their money encouraging libraries to sign up to OCLC WorldCat and contributing their records there. (I'm guessing that this isn't a particular popular viewpoint in the UK - at least, I'm not sure that I've ever heard anyone else suggest it - but it seems to me that WorldCat represents a valuable shared service approach that will, in practice, be hard to beat in other ways.) Doing this would both improve resource discovery (e.g. thru Google) and provide a workable 'appropriate copy' solution (for books). Clearly, doing this wouldn't help build a more unified approach across the GLAM domains but, as at least one commenter pointed out, it's not clear that the current guidelines do either. Note: I do agree with Owen that every unique museum and archival resource should have an https URI and a web page.
All of which, as I say, leaves us with a headache in terms of how we take these guidelines forward. Ah well... such is life I guess.