« SAML attributes vs. entitlements - a quick rule of thumb | Main | A few brief thoughts on iTunesU »

October 19, 2010

The DCMI Abstract Model in 2010

The Dublin Core Metadata Initiative's 2010 conference, DC-2010, takes place this week in Pittsburgh. I won't be attending, but Tom Baker and I have been working on a paper, A review of the DCMI Abstract Model with scenarios for its future, for the meeting of the DCMI Architecture Forum - actually, a joint meeting with the W3C Library Linked Data Incubator Group.

This is a two-part meeting. The first part looks at the position of the DCMI Abstract Model in 2010, five years after it became a DCMI Recommendation, in a new context in which the emergence of the "Linked Data" approach has brought a wider understanding and take-up of the RDF model.

The second part of the meeting looks at the question of what the DCMI community calls "application profiles", descriptions of "structural patterns" within data, and "validation" against such patterns. Within the DCMI context, work in this area has built on the DCAM, in the form of the draft Description Set Profile specification. But, as I've mentioned before, there is interest in this topic within some sectors of the "Linked Data" community.

Our paper tries to outline the historical factors which led to the development of the DCAM, to take stock of the current position, and suggest a number of possible paths forward. The aim is to provide a starting point for discussions at the face-to-face meeting, and the suggestions for ways forward are not intended to be an exhaustive list, but we felt it was important to have some concrete choices on the table:

  1. DCMI carries on developing DCAM as before, including developing the DSP specification and concrete syntaxes based on DCAM
  2. DCMI develops a "DCAM 2" specification (initial rough draft here), simplified and better aligned with RDF, and with a cleaner separation of syntax and semantics, and either:
    1. develops the DSP specification and concrete syntaxes based on DCAM; or
    2. treats "DCAM 2" as a clarification and a transitional step towards promoting the RDF model and RDF abstract syntax
  3. DCMI deprecates the DCAM and henceforth promotes the RDF abstract syntax (and examines the question of "structural constraints" within this framework)
  4. DCMI does nothing to change the statuses of existing DCAM-related specifications

For my own part, in 2010, I do rather tend to look at the DCAM as an artefact "of its time". The DCAM was created during a period when the DCMI community was between two world views: one, which I tend to think of as a "classical view", reflected in Tom's 2000 D-Lib Magazine article "A Grammar of Dublin Core", and based on the use of "appropriate literals" - character strings - as values; and a second based on the RDF model, emphasising the use of URIs as global names and supported by a formal semantics. In developing the DCAM, we tried to do two things:

  • To provide a formalisation of that "classical" view, the "DCMI community" metadata model, if you like: in 2003, DCMI had "a typology of terms" but little consensus on the nature of the data structure(s) in which those terms were referenced.
  • To provide a "bridge" between that "classical" model and the RDF model, through the use of RDF concepts, and the provision of a mapping to the RDF abstract syntax in Expressing Dublin Core metadata using the Resource Description Framework (RDF).

If I'm honest, I think we've had limited success in these immediate aims. In creating the DCAM "description set model" we may have achieved the former in theory, but in practice people coming to the DCAM from a "classical Dublin Core" viewpoint found that model complicated, and difficult to reconcile with their own conceptualisations. So as a "community model" I suspect the "buy-in" from that community isn't as high as we might like to imagine! People coming to the Dublin Core vocabularies with some familiarity with the (much simpler) RDF model, on the other hand, were confused by, and/or didn't see the need for, the description set model. And a third (and perhaps larger still) constituency was engaged primarily in the use of XML-based metadata schemas (like MODS), with little or no notion of an abstract syntax distinct from the XML syntax itself.

However, I think the existence of the DCAM has perhaps provided some more positive outcomes in other areas.

First, I think the very existence of the DCAM helped advance discussions around comparing metadata standards from different communities - particularly in the work championed by Mikael Nilsson comparing Dublin Core and the IEEE Learning Object Metadata standard - by drawing attention to the importance of articulating the "abstract models" in use in standards when making such comparisons and when trying to establish conditions for "interoperability" between applications based on them. (This work is nicely summarised in a paper for the ProLEARN project, Harmonization of Metadata Standards.)

Second, while implementation of the Description Set Profile specification itself has been limited, it has provided a focus for exploring the question of describing structural patterns and performing structural validation, based not on concrete syntaxes and on e.g. XML schema technologies, but on the abstract syntax. A recent thread on the Library Linked Data Incubator Group mailing list, starting with Mikael Nilsson's post, provides a very interesting discussion of current thinking, and this area will be the focus of the second part of the Pittsburgh meeting.
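To make that distinction concrete: validating against the abstract syntax means checking patterns over triples rather than over an XML tree, so the same check applies whatever concrete serialisation the data arrived in. Here is a minimal sketch in Python - the tuple-based graph, the constraint format, and the example URIs are all invented for illustration and are a simplified stand-in for the actual Description Set Profile syntax:

```python
DC = "http://purl.org/dc/terms/"

# The abstract syntax boiled down: a graph is just a collection of
# (subject, predicate, object) triples, independent of any XML shape.
graph = [
    ("http://example.org/doc/1", DC + "title", "A review of the DCMI Abstract Model"),
    ("http://example.org/doc/1", DC + "creator", "Thomas Baker"),
    ("http://example.org/doc/2", DC + "creator", "Pete Johnston"),
]

# A "profile" as (property, min_occurs, max_occurs) cardinality
# constraints applied to each described resource.
profile = [
    (DC + "title", 1, 1),    # exactly one title
    (DC + "creator", 1, 2),  # one or two creators
]

def validate(graph, profile):
    """Return (subject, property, count) for each violated constraint."""
    subjects = {s for s, _, _ in graph}
    errors = []
    for s in subjects:
        for prop, lo, hi in profile:
            n = sum(1 for s2, p, _ in graph if s2 == s and p == prop)
            if not lo <= n <= hi:
                errors.append((s, prop, n))
    return errors

errors = validate(graph, profile)
```

Here doc/2 has no title, so it fails the title constraint while doc/1 passes - and the check never needed to know whether the data came from RDF/XML, Turtle, or anything else.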

And the Singapore Framework's separation of "vocabulary" from patterns for, or constraints on, the use of that vocabulary - leaving aside for a moment the actual techniques for realising that distinction - has received some attention as a general basis for metadata schema development (see, for example, the comments by Scott Wilson in his contribution to the recent JISC CETIS meeting on interoperability standards).

Finally, it's probably stating the obvious that any choice of path forward needs to take into account that DCMI, like many similar organisations, finds itself in an environment in which resources, both human and financial, are extremely limited. Many individuals who devoted time and energy to DCMI activities in the past have shifted their energy to other areas - and while I continue to maintain some engagement with DCMI, mainly through the vocabulary management activity of the Usage Board, I include myself in this category. Many of the DCMI "community" mailing lists show little sign of activity, and what few postings there are seem to receive little response. And some organisations which in the past supported staff to work in this area are choosing to focus their resources elsewhere.

Against this background, more than ever, it seems to me, it is important for DCMI not to try to tackle problems in isolation, but rather to (re)align its approaches firmly with those of the Semantic Web community, to capitalise on the momentum - and the availability of tools, expertise and experience (and good old enthusiasm!) - being generated by the wider take-up of the "Linked Data" approach, and to explore solutions to what might appear to be "DC-specific" problems (but probably aren't) within that broader community. The fact that the Architecture meeting in Pittsburgh is a joint one seems like a good first step in this direction.

Comments

Just in case there is any ambiguity in my position, I am arguing that DCMI should state clearly that:

- the data model for Dublin Core metadata is the RDF triple/graph data model as defined by RDF Concepts and Abstract Syntax http://www.w3.org/TR/rdf-concepts/
- the formal semantics for Dublin Core metadata are those defined by RDF & RDF Schema http://www.w3.org/TR/rdf-mt/ http://www.w3.org/TR/rdf-schema/
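The RDF data model named in the first point reduces to sets of (subject, predicate, object) triples, which makes merging independent descriptions trivial. A tiny illustrative sketch in Python - the example URIs and literal values are hypothetical, not DCMI-published data:

```python
# A Dublin Core description expressed as RDF-style triples: each
# statement is a (subject, predicate, object) tuple, where subjects
# and predicates are URIs and objects are URIs or literal strings.
DC = "http://purl.org/dc/terms/"  # the DCMI terms namespace

triples = {
    ("http://example.org/doc/1", DC + "title", "The DCMI Abstract Model in 2010"),
    ("http://example.org/doc/1", DC + "creator", "http://example.org/person/1"),
    ("http://example.org/doc/1", DC + "date", "2010-10-19"),
}

# Because a graph is just a set of triples, merging two descriptions
# of the same resource is plain set union - no record boundaries to
# reconcile, and duplicate statements collapse automatically.
more = {
    ("http://example.org/doc/1", DC + "subject", "metadata"),
    ("http://example.org/doc/1", DC + "date", "2010-10-19"),  # duplicate
}
graph = triples | more
```

The duplicate date statement disappears in the union, leaving four distinct triples.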

The question of "validation"/"structural constraints" (currently addressed within DCMI in the draft DSP specification) should be explored within the wider Linked Data/Semantic Web community, with the aim of finding solutions based on the RDF graph/triple model.

I don't know whether anything needs to be said about OWL (but note that DCMI's descriptions of its terms do now make use of some OWL constructs).

I would like to see DCMI make this commitment to the RDF standards clear throughout its documentation and tutorial materials, in their use of concepts and terminology, and through the use of "branding" devices such as the logos provided by the W3C http://www.w3.org/2007/10/sw-logos.html

I have no strong opinion on whether DCMI should continue to promote the use of community-specific abstractions based on the RDF triple/graph data model (like "description set", "description", "statement", "value surrogate" etc). I'm slightly "against" on the grounds that their use has caused confusion. But IF those abstractions are useful AND IF they can be well defined in relation to the RDF graph/triple model (i.e both conditions are true), then I would not object to that.

Any solution which ignores the RDF model would not have my support.

And for completeness...

I agree with everything that Pete says above.

I recognise the appeal of the RDF/triple/graph approach to (meta)data representation.

I still don't see why any given metadata standard needs to nail its colours to that particular technological mast....

Paul

Well... firstly, there's a difference between not nailing DCMI colours to the mast and choosing to invent your own language.

The DCAM does nail DCMI firmly to the RDF mast - and always has done, albeit with enough of a 'legacy' world-view to be able to say things like "this is how we interpret this XML or HTML construct in the context of RDF".

As I said on Friday night - the issue around the DCAM is not, "is the DCAM RDF or not?" - it clearly is. The issue is, "do we continue to describe the DCMI view of the world using our own language (i.e. that used in the current DCAM) or do we move to using a language shared with W3C (RDF/Linked Data)?". And on the back of that, do we continue to (not!) have discussions in our own closed world (the DC mailing lists) or do we join forces with others working on RDF/Linked Data where there is much more activity?

That's all the debate is about.

People who want to have a different debate - e.g. wanting a non-RDF-based model - need to start from somewhere other than the DCAM.

The core of DCMI activity is its semantics. Those semantics have to be rooted in something - some model. Nailing colours to the mast is about nailing semantics to the mast. The alternative is leaving our semantics to swash about all over the decks - which is where we have been for the last 15 years.

(Sorry... I've stretched your analogy to breaking point there! :-) ).

@Paul,

Just to add some background to Andy's response, it might be helpful to look at the paper I referred to from the ProLEARN project:

Harmonization of Metadata Standards
http://ariadne.cs.kuleuven.be/lomi/images/5/52/D4.7-prolearn.pdf

It tries to summarise the role of "abstract models" in metadata standards, and how the compatibility or incompatibility of the abstract models used by different metadata standards impacts on interoperability between applications based on them.

It acknowledges that cross-community consensus on abstract models does not yet exist, but makes some pragmatic recommendations:

For abstract models, a consensus has yet to be reached, although the Resource Description Framework (RDF) does provide a framework well founded in Web architecture and a formal semantics. This deliverable still recommends that metadata specifications harmonize their models with the RDF model and, by extension, the semantic web.

This - harmonizing the (previously loosely/informally defined) "abstract model" of the DCMI community with RDF - is the step we tried to take with the DCMI Abstract Model, and we took it because it was felt that RDF/RDF Schema are designed to try to address many of the requirements for metadata identified within that community, mentioned in section 4 of the paper.

The paper also tries to highlight the problems arising from introducing more community-/domain-specific abstract models:

Discourage the introduction of new abstract models into the domain, as this further fragments the community.

If DCMI had not adopted the RDF model, and sought instead to develop an abstract model not based on RDF, we'd be contributing to exactly this problem, and further hindering interoperability.

P.S. Mikael's (currently draft, but hopefully soon to be published) PhD thesis goes into these issues (and other related issues) in more detail, and will be a jolly good read.

P.P.S. I tried to think up some maritime metaphors but failed :)
