
March 23, 2007

DCMI meetings in Barcelona

Pete and I spent the tail end of last week in Barcelona for a couple of Dublin Core meetings.

On the Thursday, we got together with Tom Baker and Mikael Nilsson to brainstorm application profiles and specifically to think about a UML model for the machine-readable part of an application profile - what we've come to term a Description Set Profile.  More of that later, since although we made a lot of progress during the meeting there is still quite a way to go before we have something ready for general consumption.

On the Friday and Saturday, there was a meeting of the DC Usage Board.  Overall, we had a pretty good meeting and we got through a lot of decision making, helped in part, for me at least, by knowing that the hotel we were staying at had a nice hot tub, pool and sauna to go back to after the meetings!

The first item on the UB agenda was the literal value vs. non-literal value issue that was raised on the DC-Architecture mailing list during the recent comment period on the domains and ranges proposal.  The issue, briefly, is that it is somewhat problematic in OWL-DL applications for us to define the range of a DC property as being something that can be both a literal and a non-literal - which was exactly what we proposed doing for the new dcterms:title, dcterms:description and dcterms:date properties.

Clearly, one question we need to think about is whether OWL-DL compatibility is important to the DCMI community.  But let's assume for a moment that it is.  If so, then we need to decide one way or another whether the values of the above properties are literals (strings) or non-literals.
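As a rough illustration of why a mixed range is awkward, the Python sketch below scans a handful of statements and flags any property that turns up with both literal and non-literal values. OWL-DL keeps datatype properties and object properties apart, so a property flagged here could not be declared as either without excluding some of the data. The statement layout, the example.org URI, the blank node label and the function names are assumptions made for the example, not part of any DCMI specification.

```python
# Each statement is (subject, property, (kind, value)), where kind is
# "literal" for a string value or "resource" for a non-literal value.
# All identifiers below are hypothetical and exist only for illustration.
STATEMENTS = [
    ("http://example.org/book/1", "dcterms:title", ("literal", "I Am a Cat")),
    ("http://example.org/book/1", "dcterms:title", ("resource", "_:title1")),
    ("http://example.org/book/1", "dcterms:date", ("literal", "1905")),
]


def value_kinds(statements):
    """Map each property to the set of value kinds seen for it."""
    kinds = {}
    for _subject, prop, (kind, _value) in statements:
        kinds.setdefault(prop, set()).add(kind)
    return kinds


def mixed_range_properties(statements):
    """Return properties used with both literal and non-literal values."""
    return sorted(p for p, k in value_kinds(statements).items() if len(k) > 1)


if __name__ == "__main__":
    print(mixed_range_properties(STATEMENTS))
```

Picking one kind of value per property, which is what the rest of the discussion is about, is what makes the flagged list empty.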

Now... readers in the western world are probably thinking to themselves, how can a title be anything other than a literal string? They might even be thinking that treating titles as anything other than literals is so non-intuitive that to do so would be madness.  But interestingly it became very clear during the meeting that for languages that have more than one written form (such as Japanese) treating the title as a non-literal resource, off of which you can hang the multiple written forms, is the only approach that really makes sense.
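To make the two options a little more concrete, here is a minimal Python sketch that models statements as plain tuples. The example.org URI, the blank node label, the use of rdf:value to hang written forms off the title resource and the choice of language tags are all assumptions made for illustration; nothing here is a serialisation that the Usage Board agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A string value with an optional language tag."""
    value: str
    lang: str = ""


BOOK = "http://example.org/book/1"  # hypothetical resource
TITLE_NODE = "_:title1"             # hypothetical blank node for the title

# Option 1: the value of dcterms:title is a literal string.
literal_model = [
    (BOOK, "dcterms:title", Literal("吾輩は猫である", "ja")),
]

# Option 2: the value is a non-literal resource, and the written forms
# hang off that resource (rdf:value is an assumption for this sketch).
non_literal_model = [
    (BOOK, "dcterms:title", TITLE_NODE),
    (TITLE_NODE, "rdf:value", Literal("吾輩は猫である", "ja")),
    (TITLE_NODE, "rdf:value", Literal("わがはいはねこである", "ja-Hira")),
    (TITLE_NODE, "rdf:value", Literal("Wagahai wa Neko de Aru", "ja-Latn")),
]


def written_forms(statements, resource, prop="dcterms:title"):
    """Collect every literal form reachable as the value of `prop`."""
    forms = []
    for s, p, o in statements:
        if s != resource or p != prop:
            continue
        if isinstance(o, Literal):
            forms.append(o)  # the value itself is a literal
        else:
            # the value is a resource: gather the literals attached to it
            forms.extend(
                obj for subj, _pred, obj in statements
                if subj == o and isinstance(obj, Literal)
            )
    return forms


if __name__ == "__main__":
    for form in written_forms(non_literal_model, BOOK):
        print(form.lang, form.value)
```

In the second model the three forms are grouped under a single title, whereas the first model has nowhere to say that several strings are forms of the same thing.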

This is a tricky situation for DCMI, since we are torn between doing things in an intuitive way, at least from the perspective of a large part of the planet, and doing things in a generic enough way to handle all written languages.

On balance, the UB decided to go with the literal approach, but to go back out for another comment period specifically asking implementers in regions with languages that have multiple written forms to consider how they would deal with dcterms:title values being simple literal strings.

One of the other issues that has been taxing the UB of late is trying to categorise the existing list of DCMI-endorsed encoding schemes into syntax encoding schemes and vocabulary encoding schemes.  One might assume that this would be a trivial exercise.  Unfortunately not!  Part of the problem is that the labels we use for our two kinds of encoding schemes - labels that were essentially given to us by the history of DCMI - are no longer very appropriate in the context of the DCMI Abstract Model.

In short, what the Abstract Model tells us is that syntax encoding schemes pertain to value strings: essentially, they provide the RDF datatype of the string.  Vocabulary encoding schemes, on the other hand, tell us the set of things of which the value is a member.  At the risk of being somewhat simplistic, syntax encoding schemes define a set of strings, while vocabulary encoding schemes define a set of things.

So even where something smells and tastes just like a vocabulary (as that word is used in common parlance), a good example being the list of language tags defined by RFC 1766, if it defines a set of strings then it is categorised as a syntax encoding scheme rather than a vocabulary encoding scheme.
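One way to picture the distinction is as two different membership tests, sketched below in Python. The regular expression is a simplified reading of the RFC 1766 tag grammar (a primary tag plus optional subtags, each of one to eight letters), and the concept URIs are hypothetical stand-ins for a vocabulary, so treat both as assumptions rather than as definitions of any real scheme.

```python
import re

# Syntax encoding scheme: a set of *strings*, described here by a pattern.
# Simplified reading of RFC 1766: letter-only tags of 1-8 characters
# separated by hyphens. It checks the shape only, not registered codes.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def in_syntax_scheme(value_string: str) -> bool:
    """True if the string has the shape of a language tag."""
    return LANGUAGE_TAG.fullmatch(value_string) is not None


# Vocabulary encoding scheme: a set of *things*, identified here by URI.
# These URIs are hypothetical and only stand in for real concepts.
HYPOTHETICAL_CONCEPTS = {
    "http://example.org/concept/cats",
    "http://example.org/concept/metadata",
}


def in_vocabulary_scheme(value_uri: str) -> bool:
    """True if the identified thing is a member of the vocabulary."""
    return value_uri in HYPOTHETICAL_CONCEPTS


if __name__ == "__main__":
    print(in_syntax_scheme("en-GB"))
    print(in_vocabulary_scheme("http://example.org/concept/cats"))
```

The first test only ever looks at the characters in a string, while the second asks whether an identified thing belongs to a set, which is the sense in which a list of language tags ends up on the syntax side.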

It takes a while to get your head round this, but once done, everything becomes clearer.

Coincidentally, later in the meeting we came to look at the definitions for all our encoding schemes.  These are somewhat problematic at the moment, because the current definitions simply provide the expanded form of the encoding scheme name - hardly something that can be counted as a good definition!

As we went through the list it suddenly became clear... if we define encoding schemes along the lines of either "The set of concepts defined by ..." or "The set of strings defined by ..." then DCMI's interpretation of other people's systems as being a syntax encoding scheme or a vocabulary encoding scheme would become much easier to grasp.  At least that's the theory!  So, for example, the issue of whether LCSH defines a set of strings or a set of concepts becomes a moot point.  DCMI interprets the dcterms:LCSH encoding scheme to be the set of concepts defined by the Library of Congress Subject Headings.  I.e. we define it to be a vocabulary encoding scheme.

Image: Restaurant Litoral on the Barcelona sea-front.  No, we didn't eat there and yes, I know it's a crap pun!  Sorry.




Pete, re:

"But interestingly it became very clear during the meeting that for languages that have more than one written form (such as Japanese) treating the title as a non-literal resource, off of which you can hang the multiple written forms, is the only approach that really makes sense."

What about indicating different language forms using xml:lang? If possible, that would be a way to keep the simple literal form.
