Metadata guidelines for the UK RDTF - please comment
As promised last week, our draft metadata guidelines for the UK Resource Discovery Taskforce are now available for comment in JISCPress. The guidelines are intended to apply to UK libraries, museums and archives in the context of the JISC and RLUK Resource Discovery Taskforce activity.
The comment period will last two weeks from tomorrow and we have seeded JISCPress with a small number of questions (see below) about issues that we think are particularly worth addressing. Of course, we welcome comments on all aspects of the guidelines, not just where we have raised issues. (Note that you don't have to leave public comments in JISCPress if you don't want to - an email to me or Pete will suffice. Or you can leave a comment here.)
The guidelines recommend three approaches to exposing metadata (to be used individually or in combination), referred to as:
- the community formats approach;
- the RDF data approach;
- the Linked Data approach.
We've used words like 'must' and 'should' but it is worth noting that at this stage we are not in a position to say how these guidelines will be applied, if at all, nor whether any mechanisms for compliance will be put in place. On that basis, treat phrases like 'must do this' as meaning, 'you must do this for your metadata to comply with one or other approach as recommended by these guidelines' - no more, no less. I hope that's clear.
When we started this work, we began by trying to think about functional requirements - always a good place to start. In this case, however, that turned out not to make much sense. We are not starting from a green field here. Lots of metadata formats are already in use and we are not setting out with the intent of changing current cataloguing practice across libraries, museums and archives. What we can say is that:
- we have tried to keep as many people happy as possible (hence the three approaches), and
- we want to help libraries, museums and archives expose existing metadata (and new metadata created using existing practice) in ways that support the development of aggregator services and that integrate well with the web (of data).
As mentioned previously, the three approaches correspond roughly to the 3-star, 4-star and 5-star ratings in the W3C's Linked Data Star Ratings Scheme. To try and help characterise them, we prepared the following set of bullet points for a meeting of the RDTF Technical Advisory Group earlier this week:
The community formats approach
- the “give us what you’ve got” bit
- share existing community formats (MARC, MODS, BibTeX, DC, SPECTRUM, EAD, XML, CSV, JSON, RSS, Atom, etc.) over RESTful HTTP or OAI-PMH
- for RESTful HTTP, use sitemaps and robots.txt to advertise availability and GZip for compression
- for CSV, give us a column called ‘label’ or ‘title’ so we’ve got something to display and a column called 'identifier' if you have them
- provide separate records about separate resources
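To make the CSV bullet concrete, here is a minimal Python sketch of the kind of check an aggregator might run on an incoming file. The function name, the sample data and the case-insensitive matching of column names are all assumptions made for illustration; the guidelines only ask for a 'label' or 'title' column and, where available, an 'identifier' column.

```python
import csv
import io


def check_csv_columns(csv_text):
    """Return (display_column, has_identifier) for a CSV export.

    display_column is 'label' or 'title' if either is present, else None.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: column names are matched case-insensitively.
    columns = {name.strip().lower() for name in (reader.fieldnames or [])}
    display = next((c for c in ("label", "title") if c in columns), None)
    return display, "identifier" in columns


# Hypothetical sample export: one header row, one record.
sample = "identifier,title,creator\nrec-001,A History of Bath,Jane Smith\n"
print(check_csv_columns(sample))
```

A file with neither 'label' nor 'title' comes back with None as the display column, which is the case where an aggregator would have nothing sensible to show.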
The RDF data approach
- use RDF
- model according to FRBR, CIDOC CRM or EAD and ORE where you can
- re-use existing vocabularies where you can
- assign URIs to everything of interest
- make big buckets of RDF (e.g. as RDF/XML, N-Triples, N-Quads or RDF/Atom) available for others to play with
- use Semantic Sitemaps and the Vocabulary of Interlinked Datasets (VoID) to advertise availability of the buckets
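As a rough illustration of what a small 'bucket' of RDF might look like, the sketch below writes a couple of statements out as N-Triples using only the Python standard library. The example.org URIs and the helper names are hypothetical; the two predicates are Dublin Core terms, chosen here only as an example of re-using an existing vocabulary. The sketch does not validate URIs, so treat it as a starting point rather than a complete serialiser.

```python
def escape_literal(value):
    """Escape the characters that cannot appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    """Serialise (subject, predicate, object, object_is_uri) tuples as N-Triples."""
    lines = []
    for subject, predicate, obj, obj_is_uri in triples:
        obj_term = f"<{obj}>" if obj_is_uri else f'"{escape_literal(obj)}"'
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines) + "\n"


# Hypothetical resource URIs; every thing of interest gets its own URI.
triples = [
    ("http://example.org/id/book/123",
     "http://purl.org/dc/terms/title",
     'A "History" of Bath', False),
    ("http://example.org/id/book/123",
     "http://purl.org/dc/terms/creator",
     "http://example.org/id/person/jane-smith", True),
]
print(to_ntriples(triples), end="")
```

In practice an RDF library would normally do this job; the point here is only that each line is one self-contained statement, which is what makes the format convenient for bulk dumps.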
The Linked Data approach
- like the RDF data approach but also...
- use ‘http’ URIs and follow recommended URI patterns for data.gov.uk
- serve HTML and RDF/RDFa at your own URIs and follow the “cool URIs for the semantic web” recommended practice
- become part of the web of data - link to other people’s stuff using their URIs
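The "serve HTML and RDF at your own URIs" bullet is usually implemented with a 303 redirect from the URI of the thing to a document about it, chosen according to the Accept header. The sketch below shows one way that decision could be made. The /id/, /doc/ and /data/ path segments, the list of media types and the function names are assumptions for illustration, not something the draft guidelines prescribe, and the Accept parsing is deliberately simplified.

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def redirect_for(thing_uri, accept_header):
    """Return (status, location) for a request to the URI of a thing."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in RDF_TYPES:
            return 303, thing_uri.replace("/id/", "/data/", 1)
        if media_type in HTML_TYPES:
            return 303, thing_uri.replace("/id/", "/doc/", 1)
    # Assumption: fall back to the HTML page when nothing matches.
    return 303, thing_uri.replace("/id/", "/doc/", 1)


print(redirect_for("http://example.org/id/book/123", "application/rdf+xml"))
print(redirect_for("http://example.org/id/book/123", "text/html,application/rdf+xml;q=0.5"))
```

A browser asking for HTML ends up at the human-readable page, while an RDF-aware client asking for RDF/XML is sent to the data, and both started from the same URI.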
Dunno if that is a helpful summary but we look forward to your comments on the full draft. Do your worst!
For the record, the issues we are asking questions about mainly fall into the following areas:
- is offering a choice of three approaches helpful?
- for the community formats approach, are the example formats we list correct, are our recommendations around the use of CSV useful and are JSON and Atom significant enough that they should be treated more prominently?
- does the suggestion to use FRBR and CIDOC CRM as the basis for modeling in RDF set the bar too high for libraries and museums?
- where people are creating Linked Data, should we be recommending particular RDF datasets/vocabularies as the target of external links?
- do we need to be more prescriptive about the ways that URIs are assigned and dereferenced?
Note that a printable version of the draft is also available from Google Docs.