October 19, 2009

Helpful Dublin Core RDF usage patterns

In the beginning [*] there was the HTML meta element and we used to write things like:

<meta name="DC.Creator" content="Andy Powell">
<meta name="DC.Subject" content="something, something else, something else again">
<meta name="DC.Date.Available" scheme="W3CDTF" content="2009-10-19">
<meta name="DC.Rights" content="Open Database License (ODbL) v1.0">

Then came RDF and a variety of 'syntax' guidance from DCMI and we started writing:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.net/something">
    <dc:creator>Andy Powell</dc:creator>
    <dcterms:available>2009-10-19</dcterms:available>
    <dc:subject>something</dc:subject>
    <dc:subject>something else</dc:subject>
    <dc:subject>something else again</dc:subject>
    <dc:rights>Open Database License (ODbL) v1.0</dc:rights>
  </rdf:Description>
</rdf:RDF>

Then came the decision to add 15 new properties to the DC terms namespace, mirroring the original 15 DC elements but adding a liberal smattering of domains and ranges. So now we write:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.net/something">
    <dcterms:creator>
      <dcterms:Agent>
        <rdf:value>Andy Powell</rdf:value>
        <foaf:name>Andy Powell</foaf:name>
      </dcterms:Agent>
    </dcterms:creator>
    <dcterms:available
      rdf:datatype="http://purl.org/dc/terms/W3CDTF">2009-10-19</dcterms:available>
    <dcterms:subject>
      <rdf:Description>
        <rdf:value>something</rdf:value>
      </rdf:Description>
    </dcterms:subject>
    <dcterms:subject>
      <rdf:Description>
        <rdf:value>something else</rdf:value>
      </rdf:Description>
    </dcterms:subject>
    <dcterms:subject>
      <rdf:Description>
        <rdf:value>something else again</rdf:value>
      </rdf:Description>
    </dcterms:subject>
    <dcterms:rights
      rdf:resource="http://opendatacommons.org/licenses/odbl/1.0/" />
  </rdf:Description>
</rdf:RDF>

Despite the added verbosity and rather heavy use of blank nodes in this third form, I think there are good reasons why moving towards this kind of DC usage pattern is a 'good thing'. In particular, it allows the same usage pattern to indicate a subject term by URI or literal (or both - see addendum below), meaning that software developers only need to code to a single pattern. It also allows multiple literals (e.g. in different languages) to be attached to a single value resource.
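
A minimal sketch of that last point: one value resource for the subject, carrying two value strings in different languages. The French string and the xml:lang tags are assumptions added purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.net/something">
    <dcterms:subject>
      <!-- A single blank node stands for the subject concept -->
      <rdf:Description>
        <!-- Two value strings for the same concept (illustrative values) -->
        <rdf:value xml:lang="en">Physics</rdf:value>
        <rdf:value xml:lang="fr">Physique</rdf:value>
      </rdf:Description>
    </dcterms:subject>
  </rdf:Description>
</rdf:RDF>
```

Both strings hang off the same node, so a consumer sees one subject with two strings rather than two separate subjects.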

The trouble is, a lot of existing usage falls somewhere between the first two forms shown here.  I've seen examples of both coming up in discussions/blog posts about both open government data and open educational resources in recent days.

So here are some useful rules of thumb around DC RDF usage patterns:

  • DC properties never, ever, start with an upper-case letter (e.g. dcterms:Creator simply does not exist).
  • DC properties never, ever, contain a full-stop character (e.g. dcterms:date.available does not exist either).
  • If something can be named by its URI rather than a literal (e.g. the ODbL licence in the above examples) do so using rdf:resource.
  • Always check the range of properties before use. If the range is anything other than a literal (as is the case with both dcterms:subject and dcterms:creator, for example) and you don't know the URI of the value, use a blank or typed node to indicate the value and rdf:value to indicate the value string.
  • Do not provide lists of separate keywords as a single dc:subject value.  Repeat the property multiple times, as necessary.
  • Syntax encoding schemes, W3CDTF in this case, are indicated using rdf:datatype.
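
As a rough illustration of these rules, the sketch below pairs each recommended form with the kind of legacy form it replaces. The "Avoid" lines sit inside XML comments and are constructed examples of the mistakes described above, not quotations from real data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.net/something">

    <!-- Avoid: <dcterms:Date.Available>2009-10-19</dcterms:Date.Available>
         Lower-case name, no full stop, datatype for the encoding scheme: -->
    <dcterms:available
      rdf:datatype="http://purl.org/dc/terms/W3CDTF">2009-10-19</dcterms:available>

    <!-- Avoid: <dcterms:subject>something, something else</dcterms:subject>
         One property per keyword, each with a value node and rdf:value: -->
    <dcterms:subject>
      <rdf:Description>
        <rdf:value>something</rdf:value>
      </rdf:Description>
    </dcterms:subject>
    <dcterms:subject>
      <rdf:Description>
        <rdf:value>something else</rdf:value>
      </rdf:Description>
    </dcterms:subject>

    <!-- Avoid: <dcterms:rights>Open Database License (ODbL) v1.0</dcterms:rights>
         The licence has a URI, so point at it: -->
    <dcterms:rights rdf:resource="http://opendatacommons.org/licenses/odbl/1.0/" />

  </rdf:Description>
</rdf:RDF>
```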

See the Expressing Dublin Core metadata using the Resource Description Framework (RDF) DCMI Recommendation for more examples and guidance.

[*] The beginning of Dublin Core metadata obviously! :-)

Addendum

As Bruce notes in the comments below, the dcterms:subject pattern that I describe above applies in those situations where you do not know the URI of the subject term. In cases where you do know the URI (as is the case with LCSH for example), the pattern becomes:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.net/something">
    <dcterms:subject>
      <rdf:Description rdf:about="http://id.loc.gov/authorities/sh85101653#concept">
        <rdf:value>Physics</rdf:value>
      </rdf:Description>
    </dcterms:subject>
  </rdf:Description>
</rdf:RDF>
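
The rdf:value label in this pattern is optional. Where only the URI is wanted, a shorter form using rdf:resource should be equivalent apart from the missing label. A minimal sketch, reusing the same LCSH URI:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.net/something">
    <!-- The subject is named by URI only; no label is supplied here -->
    <dcterms:subject
      rdf:resource="http://id.loc.gov/authorities/sh85101653#concept" />
  </rdf:Description>
</rdf:RDF>
```

The trade-off is that an application wanting a display string then has to dereference the URI to find one.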

Comments

The use of blank nodes throughout the revised example is going to tend to privilege that (less than ideal) approach for people not that familiar with RDF. Would be nice if you could have at least a few (more) examples where the object is a URI. Maybe use LCSH URIs for the subjects?

How, or more precisely with what should I be writing such RDF in my web pages?

Let's put this another way. We are presently moving our web site to be based on DRUPAL, with the expectation that more (or most) of our staff will become content/web page creators. In effect, we're trying to deskill our web page creation! However, at the same time I want to add metadata in RDF, and add other RDF into the documents where appropriate (I'm still not _certain_ quite what that's going to mean, but bibliographic references would be one strong possibility). And this rather feels like upskilling rather than deskilling. So I'm looking for an easy tool to help write XHTML pages with RDF/RDFa in them...

Any ideas?

I understand why it's a good idea (more or less), but I wonder if its complexity is going to make it tricky for it to catch on. Part of the beauty of the 15-term DC stuff is how easy it is to implement.

Being a newcomer to RDF, I still don't entirely understand how to apply DC-RDF the way you've described. And that technical document you link to is kind of rough reading for simple takeaways.

@Bruce - thanks. I've added a brief addendum to try and make this clear.

@Chris - off the top of my head, no. Sorry. I'm not up to date on tools like this.

@Jonathan - yes, I agree that this is non-trivial to grasp and I suspect one of the reasons that the old patterns of usage keep coming up is that they were simpler.

People seem to have a mental association between 'DC' and 'simple usage' which I'm not convinced is appropriate anymore. DC provides a set of properties that are no different in any substantive way from properties that you might use from other sources. Unfortunately, DC brings with it a legacy expectation of simplicity: many people seem to assume that if they want to do anything other than plain old "literal values" they have to go elsewhere for their properties :-(

Andy, I find it interesting that your LCSH example includes both the URI and a literal value:

<rdf:Description rdf:about="http://id.loc.gov/authorities/sh85101653#concept">
<rdf:value>Physics</rdf:value>

Do you think this is the predominant way that value URIs will be used? It looks odd to me -- it's like the value is given twice, once as a URI and once as a literal. This makes the literal value redundant (and a possible source of error if for some reason it doesn't really match the URI). It does match your statement on dc-arch that the value functions as a "preferred label." I was, however, under the impression that the URI *is* the value (which would be why it is called a "value URI").

What would the code look like for a controlled list of values where the values themselves are literals? An example would be the ISO language codes. Assume we have a URI that represents the list of codes, but the codes themselves are not URIs.

Thanks.

Karen,
yes, I think this is the predominant way value URIs will be used (ignoring the current debate about whether it would be better to use skos:prefLabel in place of rdf:value). The use of rdf:value (or skos:prefLabel if we choose to go that way) is optional here - we could just provide the value URI. But adding the rdf:value is helpful for applications that choose not to keep following the graph to retrieve the RDF representation of the value (by dereferencing the value URI).

Your characterisation of "giving the value twice" is wrong, I think. The value of dcterms:subject is a thing (in this case a concept) - it is a resource. That resource may or may not have a URI... but irrespective of whether it has a URI or not, it is that thing (that concept in this case) that is the value.

The string is an annotation (label) on that value. The string is *not* the value (in this case).

This is an issue of "strings vs. things".

Whether a value is a string or a thing is always a modelling choice. That modelling choice is indicated by the range of the property. In the case of dcterms:subject we have said that the value is always a thing (a concept, a person, a place, etc.). In such cases, i.e. in all cases where DCMI says that the value is a thing, the string is *not* the value - it is just an annotation (or label) on the value - and the value node needs to be explicitly modelled in the RDF.

In the case of dcterms:language we have said that the range is dcterms:LinguisticSystem - this is also a thing (a concept) - so the modelling will be the same - the language codes will be provided as an rdf:value hanging off a blank node and the RDF data type will indicate the syntax encoding scheme being used (dcterms:ISO639-2 or whatever).
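
A minimal sketch of the dcterms:language pattern just described, assuming the ISO 639-2 code "eng" (English) as an illustrative value:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.net/something">
    <dcterms:language>
      <!-- Blank node standing for the language itself (the "thing") -->
      <rdf:Description>
        <!-- The code is a value string; the datatype names the encoding scheme -->
        <rdf:value rdf:datatype="http://purl.org/dc/terms/ISO639-2">eng</rdf:value>
      </rdf:Description>
    </dcterms:language>
  </rdf:Description>
</rdf:RDF>
```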

Thanks, again, Andy. I hope you can stand more questions....

You say:

"In the case of dcterms:subject we have said that the value is always a thing ... the string is *not* the value - it is just an annotation (or label) on the value - and the value node needs to be explicitly modelled in the RDF."

So in this case, is rdf:value a label on an un-identified thing?

<dcterms:subject>
  <rdf:Description>
    <rdf:value>something</rdf:value>
  </rdf:Description>
</dcterms:subject>

Hi Karen,

Andy is away for a few days, but in response to your question, essentially, yes: in that example the string "something" is a string which (in the language of the DCAM) "represents" an un-identified thing.

Part of the current discussion around the use of skos:prefLabel is around whether that DCAM notion of "representing" is quite the same thing as "labelling" (in the SKOS sense).

But leaving aside that argument for a moment, the key point is that the string is something distinct from the (un-identified) value itself.

And if that thing did have a URI (the case in the "Addendum" example), that URI is an identifier for the thing, still distinct from the thing itself. Where the DCAM uses the expression "value URI", it means "URI which identifies a value".

The comments to this entry are closed.
