
June 03, 2008

Is DCMI hiding its light under a bushel?

Good grief... Dublin Core gets some bad press at times - some of it justified, some of it not - and I have a tendency to blow hot and cold on the subject myself every so often. But my blood near boils when I see people misrepresenting Dublin Core as being just "a set of 15 basic fields" and then comparing it to other metadata standards that have, oh, let's say, 80 fields, as though that makes them necessarily better and more expressive.

The recent "Metadata for digital libraries: state of the art and future directions" report published by JISC TechWatch is a case in point.  "State of the art and future directions"?  I'm sorry... I think I might have missed something?  The report doesn't even mention the Semantic Web or RDF - not even once.  So, if you want a report looking at the state of the art of METS and MODS in digital libraries this is the report for you - otherwise look elsewhere.  My suggestion for the TechWatch people is, "give your reports more appropriate titles - after all, it is probably the single most important metadata field!".  And for the rest of you... here's my Bluffer's Guide to the Dublin Core tip - if anyone starts using constructs like "creator.author" when they are talking about Dublin Core you can confidently tell them that they are about 5 years behind the curve.

Just for the record, Dublin Core hasn't been just "a set of 15 basic fields" since about 1995.  The current list of DCMI metadata terms stands at 50 or 60 I guess (not all of which are properties by the way) but the numbers are largely irrelevant.  What the Dublin Core provides is a set of flexible and extensible frameworks (primarily the DCMI Abstract Model but also the more recent and ongoing work looking at application profiles in the form of the Singapore Framework for Dublin Core Application Profiles) that are tightly bound to the core standards that make up the Semantic Web and that provide a toolkit for building metadata applications rich enough to meet any (yes I really do mean any) set of functional requirements whilst still remaining semantically interoperable with each other.
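As a rough illustration of the difference between a fixed list of fields and an extensible framework, here is a minimal sketch in Python. It is heavily simplified and is not taken from any DCMI document: the class names, the example.org URIs and the "reviewStatus" property are all hypothetical, and the real Abstract Model covers much more (value surrogates, encoding schemes and so on). The only point is that a description is a set of statements, each naming its property by URI, so terms from more than one vocabulary can sit side by side.

```python
from dataclasses import dataclass, field

# Namespace of the DCMI Metadata Terms vocabulary.
DC = "http://purl.org/dc/terms/"
# Hypothetical local vocabulary, invented for this sketch.
EX = "http://example.org/terms/"


@dataclass(frozen=True)
class Statement:
    """One property/value pair; the property is identified by a URI."""
    property_uri: str
    value: str


@dataclass
class Description:
    """A set of statements about one resource (simplified)."""
    resource_uri: str
    statements: list = field(default_factory=list)

    def add(self, property_uri, value):
        self.statements.append(Statement(property_uri, value))
        return self  # allow chaining

    def values(self, property_uri):
        return [s.value for s in self.statements
                if s.property_uri == property_uri]


# Two DCMI terms and one local term in the same description.
record = (
    Description("http://example.org/report/1")
    .add(DC + "title", "An example report")
    .add(DC + "creator", "A. N. Author")
    .add(EX + "reviewStatus", "draft")
)

print(record.values(DC + "title"))
```

Nothing in this structure limits the description to 15 properties; an application chooses which property URIs it uses and documents that choice in an application profile.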

OK, rant over, and I apologise in part to the TechWatch report author.  As I say, if you want to know more about METS and MODS and how they fit with digital libraries then I'm pretty sure that the report is an excellent place to start.  More importantly, there are mitigating circumstances which make it understandable why people make the assumption that Dublin Core is just "a set of 15 basic fields".  DCMI has an identity crisis - it is torn between, on the one hand, promoting the highly extensible, flexible, semantically rich but conceptually challenging frameworks outlined above and, on the other, the simple, easy to understand but ultimately rather useless original 15 elements.  The only formal standards documents produced by the DCMI (ISO 15836 / NISO Z39.85 and RFC 2413) both focus solely on the original 15 elements, presumably leaving some people with the view that this is all that matters.

What I think has happened is that over the years the DCMI have tried, with some success, to associate the "Dublin Core" brand with only the 15 elements, using other terminology (usually with the prefix "DCMI") for everything else.  The result is something of a confusing mess, leaving the real value proposition offered by DCMI hidden under a bushel.  This is a shame IMHO.

To sum up... Dublin Core (at least in its widest interpretation) is definitely, 100%, absolutely, categorically not just 15 metadata elements but if you want to know what it is you'll have to look beyond the old standards documents and spend some time understanding the thinking that underpins the DCMI Abstract Model, the Singapore Framework for Dublin Core Application Profiles and various associated documents.  Only at that point will it be possible to have a sensible discussion about whether MODS and/or METS (or anything else for that matter) are "richer" than the Dublin Core or not.

To refer back to a comment by Irvin Flack on Pete's "Dublin Core layered model" post, Donkey may be right to suggest that Dublin Core stinks like an onion but if he is it isn't because it only has 15 metadata fields!




Oh my, that really is an anachronistic article.

To be fair, it at least discusses XML namespaces, from 1999, so it seems pretty modern!

How on earth can this be published without anyone noticing?

Until digital library systems in wide use (like CONTENTdm or DSpace) show Dublin Core as something more than 15 elements or 15 elements with qualifiers, Dublin Core is going to be misunderstood as 15 elements.

While the JISC report should have gotten this right, I don't expect many practitioners to really understand the Abstract Model until they see it practically applied in their systems. This isn't because they can't understand it, but because it takes a lot of time to really get what the Abstract Model and the Singapore Framework are doing. Many practitioners just don't have the time to do that. (I'm thinking here not of those in big research libraries who are doing research and development, but of those in smaller institutions with limited resources who just want to move forward with digitization programs, and for whom systems like CONTENTdm are going to work best.)

I'm not sure how this is best resolved. I try to talk about the Abstract Model now when I'm speaking about Dublin Core, but it's hard to get folks to understand the shift without some practical applications to show them.

It would help if the decision were taken to separate committee outputs from dissemination efforts. The inability to communicate certain ideas to the public may plausibly be related to the likelihood that the public are not a core target audience. It's easier to build beautiful, complex, detailed conceptual models without having to stop every two minutes to explain them to the dunces at the back of the class. But one has to decide who one's audience is. Other researchers?

Rhetorically, why does the Abstract Model matter? The words on the page make sense but http://dublincore.org/documents/abstract-model/ is a set of factoids without any narrative text or evident point. Why is forcing great lumps of UML down the throats of the general public considered an inspiring or even sane first step towards dissemination? Tell us in English why we should care and what it ought to mean to us. This documentation no verb. Acting as though it should be obvious to all who are true disciples is a passive-aggressive and inappropriate way to treat the general public. If they give up and go with something they can understand, who can blame them?

There is an understandable limit to the amount of time that the general public can be bothered to put into a given document, even assuming that the importance of said document is obvious, which it apparently is not. As a rule, if it can't be adequately explained in a paragraph of readable English, it may as well have never happened. Consequence of the attention economy.

So don't lay all the blame on the authors of the TechWatch article (incidentally, I can see why they separate 'metadata' from 'semantic web'; that modern DCMI fuses these concepts is intriguing in its own right). Think of this article as a sample viewpoint from a set of generalists who will have made some reasonable research effort, at least of the documents that were accessible to them. The symptoms mentioned here probably result from a lack of documentation that is intelligible to the general public. And that, I suspect, is because the concerns dealt with in these areas are sufficiently divorced from reality that it takes significant amounts of exposition, explanation and demonstration to get back to planet Generalist - a standard weakness of frameworks and models. The real WTF here is not the TechWatch report, but the docs, the procedures and the attitudes that cause them. "If you want to know what it is you'll have to look beyond the old standards documents and spend some time understanding the thinking that underpins..." The general public should not be asked to decompile the distilled wisdom of the Committee.

Summary: If you want to encourage reading and reuse, begin by writing for reading and reuse. And beta-test all documents OUTSIDE your immediate domain of co-authors and co-workers, who probably drank the same Kool-Aid anyway.

@Sarah and @George Thanks... I tend to agree with what you both say here, especially with @George's "The real WTF here is not the TechWatch report, but the docs, the procedures and the attitudes that cause them". That is what I was trying to say in the second part of the post, but it probably didn't come through strongly enough. Andy.
