« SharePoint in UK universities - literature review | Main | FOTE09 »

October 05, 2009

QNames, URIs and datatypes

I posted a version of this on the dc-date Jiscmail list recently, but I thought it was worth a quick posting here too, as I think it's a slightly confusing area and the differences in naming systems is something which I think has tripped up some DC implementers in the past.

The Library of Congress has developed a (very useful looking) XML schema datatype for dates, called the Extended Date Time Format (EDTF). It appears to address several of the issues which were discussed some time ago by the DCMI Date working group. It looks like the primary area of interest for the LoC is the use of the data type in XML and XML Schema, and the documentation for the new datatype illustrates its usage in this context.

However, as far as I can see, it doesn't provide a URI for the datatype, which is a prerequisite for referencing it in RDF. I think sometimes we assume that providing what the XML Namespaces spec calls an expanded name is sufficient, and/or that doing that automatically also provides a URI, but I'm not sure that is the case.

In XML Schema, a datatype is typically referred to using an XML Qualified Name (QName), which is essentially a "shorthand" for an "expanded name". An expanded name has two parts: a (possibly null-valued) "namespace name" and a "local name".

Take the case of the built-in datatypes defined as part of the XML Schema specification. In XML/XML Schema, these datatypes are referenced using these two part "expanded names", typically represented in XML documents as QNames e.g.

<count xsi:type="xsd:int">42</count>

where the prefix xsd is bound to the XML namespace name "http://www.w3.org/2001/XMLSchema". So the two-part expanded name of the XML Schema integer datatype is

( http://www.w3.org/2001/XMLSchema , int )

Note: no trailing "#" on that namespace name.

The key point here is that the expanded name system is a different naming system from the URI system. True, the namespace name component of the expanded name is a URI (if it isn't null), but the expanded name as a whole is not. And furthermore, there is no generalised mapping from expanded name to URI.

To refer to a datatype in RDF, I need a URI, and for the built-in datatypes provided, the XML Schema Datayping spec tells me explicitly how to construct such a URI:

Each built-in datatype in this specification (both *primitive* and *derived*) can be uniquely addressed via a URI Reference constructed as follows:

  1. the base URI is the URI of the XML Schema namespace
  2. the fragment identifier is the name of the datatype

For example, to address the int datatype, the URI is:

* http://www.w3.org/2001/XMLSchema#int

The spec tells me that for the names of this set of datatypes, to generate a URI from the expanded name, I use the local part as the fragment id - but some other mechanism might have been applied.

And this is different from, say, the way RDF/XML maps the QNames of some XML elements to URIs, where the mechanism is simple concatenation of "namespace name" and "local name"

(And, as an aside, I think this makes for a bit of "gotcha" for XML folk coming to RDF syntaxes like Turtle where datatype URIs can be represented as prefixed names, because the "namespace URI" you use in Turtle, where the simple concatenation mechanism is used, needs to include the trailing "#" (http://www.w3.org/2001/XMLSchema#), whereas in XML/XML Schema people are accustomed to using the "hash-less" XML namespace name (http://www.w3.org/2001/XMLSchema)).

But my understanding is that that the mapping defined by the XML Schema datatyping document is specific to the datatypes defined by that document ("Each built-in datatype in this specification... can be uniquely addressed..." (my emphasis)), and the spec is silent on URIs for other user-defined XML Schema datatypes.

So it seems to me that if a community defines a datatype, and they want to make it available for use in both XML/XML Schema and in RDF, they need to provide both an expanded name and a URI for the datatype.

The LoC documentation for the datatype has plenty of XML examples so I can deduce what the expanded name is:

(info:lc/xmlns/edtf-v1 , edt)

But that doesn't tell me what URI I should use to refer to the datatype.

I could make a guess that I should follow the same mapping convention as that provided for the XML Schema built-in datatypes, and decide that the URI to use is

info:lc/xmlns/edtf-v1#edt

But given that there is no global expanded name-URI mapping, and the XML Schema specs provide a mapping only for the names of the built-in types, I think I'd be on shaky ground if I did that, and really the owners of the datatype need to tell me explicitly what URI to use - as the XML Schema spec does for those built-in types.

Douglas Campbell responded to my message pointing to a W3C TAG finding which provides this advice:

If it would be valuable to directly address specific terms in a namespace, namespace owners SHOULD provide identifiers for them.

Ray Denenberg has responded with a number of suggestions: I'm less concerned about the exact form of the URI than that a URI is explicitly provided.

P.S. I didn't comment on the list on the choice of URI scheme, but of course I agree with Andy's suggestion that value would be gained from the use of the http URI scheme.... :-)

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d8345203ba69e20120a5b5d759970b

Listed below are links to weblogs that reference QNames, URIs and datatypes:

Comments

The comments to this entry are closed.

About

Search

Loading
eFoundations is powered by TypePad