The DCMI Abstract Model in 2010
The Dublin Core Metadata Initiative's 2010 conference, DC-2010, takes place this week in Pittsburgh. I won't be attending, but Tom Baker and I have been working on a paper, "A review of the DCMI Abstract Model with scenarios for its future", for the meeting of the DCMI Architecture Forum - actually, a joint meeting with the W3C Library Linked Data Incubator Group.
This is a two-part meeting. The first part looks at the position of the DCMI Abstract Model in 2010, five years on from its becoming a DCMI Recommendation, in a new context in which the emergence of the "Linked Data" approach has brought a wider understanding and take-up of the RDF model.
The second part of the meeting looks at the question of what the DCMI community calls "application profiles", descriptions of "structural patterns" within data, and "validation" against such patterns. Within the DCMI context, work in this area has built on the DCAM, in the form of the draft Description Set Profile specification. But, as I've mentioned before, there is interest in this topic within some sectors of the "Linked Data" community.
Our paper tries to outline the historical factors which led to the development of the DCAM, to take stock of the current position, and suggest a number of possible paths forward. The aim is to provide a starting point for discussions at the face-to-face meeting, and the suggestions for ways forward are not intended to be an exhaustive list, but we felt it was important to have some concrete choices on the table:
- DCMI carries on developing DCAM as before, including developing the DSP specification and concrete syntaxes based on DCAM
- DCMI develops a "DCAM 2" specification (initial rough draft here), simplified and better aligned with RDF, and with a cleaner separation of syntax and semantics, and either:
  - develops the DSP specification and concrete syntaxes based on DCAM; or
  - treats "DCAM 2" as a clarification and a transitional step towards promoting the RDF model and RDF abstract syntax
- DCMI deprecates the DCAM and henceforth promotes the RDF abstract syntax (and examines the question of "structural constraints" within this framework)
- DCMI does nothing to change the statuses of existing DCAM-related specifications
For my own part, in 2010, I do rather tend to look at the DCAM as an artefact "of its time". The DCAM was created during a period when the DCMI community was between two world views: one, which I tend to think of as a "classical view", reflected in Tom's 2000 article for D-Lib Magazine, "A Grammar of Dublin Core", and based on the use of "appropriate literals" - character strings - as values; and a second based on the RDF model, emphasising the use of URIs as global names and supported by a formal semantics. In developing the DCAM, we tried to do two things:
- To provide a formalisation of that "classical" view, the "DCMI community" metadata model, if you like: in 2003, DCMI had "a typology of terms" but little consensus on the nature of the data structure(s) in which those terms were referenced.
- To provide a "bridge" between that "classical" model and the RDF model, through the use of RDF concepts, and the provision of a mapping to the RDF abstract syntax in "Expressing Dublin Core metadata using the Resource Description Framework (RDF)".
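To make the contrast between the two views a little more concrete, here is a minimal sketch in Python. It is not taken from the DCAM or from any DCMI specification: the class names, the `value_kind` helper and the `example.org` URIs are all hypothetical, chosen purely for illustration. Only the `http://purl.org/dc/terms/` namespace and its `creator` property are real.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """A character-string value, as in the "classical" view (illustrative only)."""
    text: str


@dataclass(frozen=True)
class URIRef:
    """A URI used as a global name, as in the RDF view (illustrative only)."""
    uri: str


@dataclass(frozen=True)
class Statement:
    """One property/value pair about a described resource."""
    subject: URIRef
    prop: URIRef
    value: Union[Literal, URIRef]


DCTERMS = "http://purl.org/dc/terms/"          # real DCMI terms namespace
doc = URIRef("http://example.org/doc/1")       # hypothetical resource

# "Classical" style: the value is an "appropriate literal".
classical = Statement(doc, URIRef(DCTERMS + "creator"), Literal("Baker, Tom"))

# RDF style: the value is itself a resource named by a URI (hypothetical URI).
rdf_style = Statement(
    doc,
    URIRef(DCTERMS + "creator"),
    URIRef("http://example.org/person/tom-baker"),
)


def value_kind(statement: Statement) -> str:
    """Report whether a statement's value is a literal or a URI."""
    return "literal" if isinstance(statement.value, Literal) else "uri"


print(value_kind(classical))
print(value_kind(rdf_style))
```

The point of the sketch is only that the same property can carry either kind of value, and that a "bridge" between the two views has to account for both.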
If I'm honest, I think we've had limited success in these immediate aims. In creating the DCAM "description set model" we may have achieved the former in theory, but in practice people coming to the DCAM from a "classical Dublin Core" viewpoint found that model complicated, and difficult to reconcile with their own conceptualisations. So as a "community model" I suspect the "buy-in" from that community isn't as high as we might like to imagine! People coming to the Dublin Core vocabularies with some familiarity with the (much simpler) RDF model, on the other hand, were confused by, and/or didn't see the need for, the description set model. And a third (and perhaps larger still) constituency was engaged primarily in the use of XML-based metadata schemas (like MODS), with little or no notion of an abstract syntax distinct from the XML syntax itself.
However, I think the existence of the DCAM has perhaps provided some more positive outcomes in other areas.
First, I think the very existence of the DCAM helped advance discussions around comparing metadata standards from different communities, particularly in the initiatives championed by Mikael Nilsson in comparing Dublin Core and the IEEE Learning Object Metadata standard, by drawing attention to the importance of articulating the "abstract models" in use in standards when making such comparisons and when trying to establish conditions for "interoperability" between applications based on them. (This work is nicely summarised in a paper for the ProLEARN project, "Harmonization of Metadata Standards".)
Second, while implementation of the Description Set Profile specification itself has been limited, it has provided a focus for exploring the question of describing structural patterns and performing structural validation, based not on concrete syntaxes and on e.g. XML schema technologies, but on the abstract syntax. A recent thread on the Library Linked Data Incubator Group mailing list, starting with Mikael Nilsson's post, provides a very interesting discussion of current thinking, and this area will be the focus of the second part of the Pittsburgh meeting.
And the Singapore Framework's separation of "vocabulary" from patterns for, or constraints on, the use of that vocabulary - leaving aside for a moment the actual techniques for realising that distinction - has received some attention as a general basis for metadata schema development (see, for example, the comments by Scott Wilson in his contribution to the recent JISC CETIS meeting on interoperability standards).
Finally, it's probably stating the obvious that any choice of path forward needs to take into account that DCMI, like many similar organisations, finds itself in an environment in which resources, both human and financial, are extremely limited. Many individuals who devoted time and energy to DCMI activities in the past have shifted their energy to other areas - and while I continue to maintain some engagement with DCMI, mainly through the vocabulary management activity of the Usage Board, I include myself in this category. Many of the DCMI "community" mailing lists show little sign of activity, and what few postings there are seem to receive little response. And some organisations which in the past supported staff to work in this area are choosing to focus their resources elsewhere.
Against this background, more than ever, it seems to me, it is important for DCMI not to try to tackle problems in isolation, but rather to (re)align its approaches firmly with those of the Semantic Web community, to capitalise on the momentum - and the availability of tools, expertise and experience (and good old enthusiasm!) - being generated by the wider take-up of the "Linked Data" approach, and to explore solutions to what might appear to be "DC-specific" problems (but probably aren't) within that broader community. The fact that the Architecture meeting in Pittsburgh is a joint one seems like a good first step in this direction.