
March 02, 2007

What is a Dublin Core Application Profile, really?

The notion that metadata standards are tailored by their implementers to meet the requirements of some particular context and that this contextualisation might be captured in the form of a "profile" was one of the ideas explored within the European Commission-funded DESIRE project back in 1998-2000. It was brought to a broader audience through the widely-cited Ariadne article "Application profiles: mixing and matching metadata schemas" by Rachel Heery and Manjula Patel of UKOLN. For several years now, the Dublin Core Metadata Initiative and DC implementers have used the notion of the DC Application Profile (DCAP) - though so far DCMI hasn't really developed a precise statement of what a DC Application Profile really is! One of the items on the "roadmap" of development work for the DCMI Architecture Forum and the DCMI Usage Board presented at the DC-2006 conference seeks to address this question with the development of a model for a DCAP.

Despite this absence of a formal definition, many DC implementers (and indeed groups within DCMI) have developed specifications recognised as "application profiles". They typically take the form of human-readable documents providing annotated lists of named metadata terms (or permutations of terms) used in DC metadata so as to meet some shared set of goals or requirements within a defined context (e.g. a particular application, some specific set of systems that are exchanging metadata, or some broader domain or community). Further, several initiatives have developed software tools that work with the concept of the DCAP. These include metadata registries that include application profiles as one of the types of resource about which they capture and disclose information; "profile authoring" tools which allow a metadata designer to create a description of their application profile; and "instance authoring" tools which use that description of a profile, e.g. to configure a form for the editing of metadata instances.

The designers/developers of these various tools have had to provide some answer to that question of "what a DC Application Profile is": they have had to develop a model for a DCAP - and then choose a means of representing instances of that model in a machine-processable form. The problem has been, of course, that they have each chosen (at least slightly) different models - and in some cases the same designers/developers used different models over time. I can say this because I was one of them! In my previous life at UKOLN, I contributed to a number of projects which sought to develop prototype metadata registries, building in part on the work of the DESIRE project. One of the central concepts we used was the notion of a DCAP as a set of what in the model for the JISC IE Metadata Schema Registry we called "property usages" (a rather less than elegant compound noun, but the best we could come up with at the time!) - a description of how a named property was deployed or "used" in DC metadata in some application.

Most of this work pre-dated the development, or at least the finalisation, of the DCMI Abstract Model (DCAM). The DCAM tells us that "DC metadata" takes the form of information structures which it calls description sets, which contain descriptions, which in turn contain statements. This, I think, helps elucidate the real purpose/nature of a DCAP, and particularly what is meant by the notion of "using" a property. If the "units" of DC metadata are description sets, then a DCAP is a specification for the construction of some class of description sets - and arguably what we have called a DCAP might be better labelled a "description set profile" (or maybe a "description set template" or "description set pattern"?). And since a description set is made up of multiple descriptions, possibly of different types of resource with different characteristics, a DCAP is also structured to provide information about the descriptions of those different resource types. For example, the Eprints DCAP that we've referred to in some previous posts here specifies how to construct descriptions of what it calls Scholarly Works, Expressions, Manifestations, Copies and Agents. So the DCAP (the "description set profile/template") might be conceptualised as consisting of a set of "description profiles" (templates, patterns). A description is composed of a set of statements, each of which must contain a reference to a property via its URI, and may contain references to metadata terms of other types (vocabulary encoding schemes, syntax encoding schemes), and may contain references to or representations of a value. So the role of what the IEMSR model calls a "property usage" is to provide a specification of how to construct a statement; it is a "statement profile" or "statement template".
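The nesting described above can be sketched as a pair of parallel data structures: one for the metadata itself (description set, description, statement) and one for the profile that constrains it. This is only an illustrative sketch, not a DCMI-defined representation: the class and field names, the `guidance` field and the example.org URIs are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# --- The metadata itself, following the DCAM nesting ---

@dataclass
class Statement:
    property_uri: str                                   # required in every statement
    value_uri: Optional[str] = None                     # a reference to the value
    value_string: Optional[str] = None                  # a representation of the value
    vocabulary_encoding_scheme_uri: Optional[str] = None
    syntax_encoding_scheme_uri: Optional[str] = None


@dataclass
class Description:
    described_resource: str                             # hypothetical identifier
    statements: List[Statement] = field(default_factory=list)


@dataclass
class DescriptionSet:
    descriptions: List[Description] = field(default_factory=list)


# --- The profile: one template level per level of the metadata ---

@dataclass
class StatementTemplate:                                # the "property usage"
    property_uri: str
    guidance: str = ""                                  # assumed human-readable annotation


@dataclass
class DescriptionTemplate:
    resource_type: str                                  # e.g. "ScholarlyWork", "Agent"
    statement_templates: List[StatementTemplate] = field(default_factory=list)


@dataclass
class DescriptionSetProfile:                            # what has been called a DCAP
    description_templates: List[DescriptionTemplate] = field(default_factory=list)


DC = "http://purl.org/dc/elements/1.1/"

# A two-description set: a work and the agent who created it.
example_set = DescriptionSet([
    Description("http://example.org/work/1", [
        Statement(DC + "title", value_string="An example paper"),
        Statement(DC + "creator", value_uri="http://example.org/agent/1"),
    ]),
    Description("http://example.org/agent/1", [
        Statement("http://xmlns.com/foaf/0.1/name", value_string="A. N. Author"),
    ]),
])

# A matching profile with one description template per resource type.
example_profile = DescriptionSetProfile([
    DescriptionTemplate("ScholarlyWork", [
        StatementTemplate(DC + "title", "Give the title as a plain string."),
        StatementTemplate(DC + "creator", "Refer to an Agent by URI."),
    ]),
    DescriptionTemplate("Agent", [
        StatementTemplate("http://xmlns.com/foaf/0.1/name"),
    ]),
])
```

The point of keeping the two hierarchies separate is that the DCAM fixes the shape of the first one, while everything in the second is a matter for the community defining the profile.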

While the DCAM specifies the types of component that make up a description set and the relationships between those components, it does not specify that the statements within a description set should refer to any particular set of terms (any specific set of properties, classes, vocabulary encoding schemes, syntax encoding schemes), and it does not specify when statements should provide explicit references to values or when they should provide representations of values. That information is specific to some system or domain or community, and it is that level of shared interest which is addressed by a DCAP - though of course it may be the case that that domain or community is extremely general and broad, as might be argued is the case for the use of the "Simple DC" DCAP.
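To make that division of labour concrete, here is a minimal sketch of how a set of statement templates might be used to check the statements in one description. It covers only the kinds of constraint mentioned above (which properties may appear, whether a value reference or a value representation is expected, and which vocabulary encoding schemes are acceptable). The class names, the `check_description` function, the message wording and the example.org URIs are hypothetical, not part of any DCMI specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statement:
    property_uri: str
    value_uri: Optional[str] = None
    value_string: Optional[str] = None
    vocabulary_encoding_scheme: Optional[str] = None


@dataclass
class StatementTemplate:
    property_uri: str
    require_value_uri: bool = False               # statement must reference its value
    require_value_string: bool = False            # statement must represent its value
    allowed_schemes: Optional[List[str]] = None   # None means "no constraint"


def check_description(statements: List[Statement],
                      templates: List[StatementTemplate]) -> List[str]:
    """Return a list of human-readable problems; an empty list means a match."""
    problems = []
    known = {t.property_uri for t in templates}
    for s in statements:
        if s.property_uri not in known:
            problems.append(f"property not in profile: {s.property_uri}")
    for t in templates:
        for s in statements:
            if s.property_uri != t.property_uri:
                continue
            if t.require_value_uri and s.value_uri is None:
                problems.append(f"missing value URI: {s.property_uri}")
            if t.require_value_string and s.value_string is None:
                problems.append(f"missing value string: {s.property_uri}")
            if (t.allowed_schemes is not None
                    and s.vocabulary_encoding_scheme not in t.allowed_schemes):
                problems.append(f"scheme not allowed: {s.property_uri}")
    return problems


DC = "http://purl.org/dc/elements/1.1/"
SCHEME = "http://example.org/schemes/subjects"    # hypothetical encoding scheme

templates = [
    StatementTemplate(DC + "title", require_value_string=True),
    StatementTemplate(DC + "subject", require_value_uri=True,
                      allowed_schemes=[SCHEME]),
]

good = [
    Statement(DC + "title", value_string="An example paper"),
    Statement(DC + "subject", value_uri="http://example.org/subjects/metadata",
              vocabulary_encoding_scheme=SCHEME),
]

bad = [
    Statement(DC + "subject", value_string="metadata"),
    Statement("http://example.org/terms/mood", value_string="cheerful"),
]

good_problems = check_description(good, templates)
bad_problems = check_description(bad, templates)
```

Here `good_problems` is empty, while `bad_problems` lists the property that is not in the profile and the two constraints that the `dc:subject` statement fails.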

I think this approach of the DCAP-as-template/pattern dovetails with the approach described by Dan Brickley and extended by Alistair Miles in his Schemarama 2 for the checking of RDF graphs. Dan's post examines the relationship between querying an RDF graph and checking for patterns in the graph, in a fashion more or less analogous to the way Schematron works with patterns in XML trees. Alistair's approach uses the CONSTRUCT feature of the SPARQL RDF query language to generate a "report" based on querying for patterns in a graph, in much the way Schematron uses XPath. And indeed I should acknowledge that much of what I've written here has been influenced by Dan and Alistair's ideas - with the distinction that I've portrayed the DCAP as being defined with reference to the structure of the DC description set rather than the RDF graph.
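The general idea of "query for a pattern, then construct a report from the matches" can be illustrated without any RDF tooling at all. The sketch below is a plain-Python stand-in, not SPARQL and not Schemarama: triples are tuples of strings, pattern terms beginning with "?" are variables, and the report is itself a list of triples. The prefixed names (`ex:ScholarlyWork`, `ex:problem` and so on) and both function names are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def match(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Try to extend the bindings so that the pattern equals the triple."""
    extended = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable: bind it, or check it against an earlier binding.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def query(patterns: List[Triple], graph: List[Triple]) -> List[Bindings]:
    """Return every set of variable bindings that satisfies all the patterns."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in graph
                     if (b2 := match(pattern, t, b)) is not None]
    return solutions


def report_missing_titles(graph: List[Triple]) -> List[Triple]:
    """Construct one report triple for each scholarly work that has no title."""
    report = []
    for b in query([("?w", "rdf:type", "ex:ScholarlyWork")], graph):
        if not query([(b["?w"], "dc:title", "?t")], graph):
            report.append((b["?w"], "ex:problem", "missing dc:title"))
    return report


graph = [
    ("ex:work1", "rdf:type", "ex:ScholarlyWork"),
    ("ex:work1", "dc:title", "An example paper"),
    ("ex:work2", "rdf:type", "ex:ScholarlyWork"),
]

titled = query([("?w", "rdf:type", "ex:ScholarlyWork"),
                ("?w", "dc:title", "?t")], graph)
report = report_missing_titles(graph)
```

With this graph, `titled` holds a single solution for `ex:work1` and `report` holds a single triple flagging `ex:work2`. A check defined against the description set rather than the graph would follow the same shape, with statements in place of triples.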

Taking a step back from this level of detail, it's also worth noting that - as I think the work on the Eprints DCAP illustrates quite clearly - a DCAP-as-template/pattern exists as, and indeed is created as, one component of a set of closely related resources, including a "domain model" or "application model" which specifies the types of "things in the world" (and their relationships) for which the DCAP specifies how to create DC descriptions - in the case of the Eprints DCAP, a specialisation of the FRBR model - and a specification of how description sets are to be encoded using one or more concrete syntaxes (which in many cases may be simply a reference to an existing DCMI specification).

One last point: one of the other ideas I heard mentioned at DC-2006 was that of a "DCAP module", a specification for DC metadata which, rather than describing how to construct a complete description set, describes how to construct some subset of a description set in order to support functions that are generic to many different applications and domains - the work of the DCMI Accessibility Task Group being a good example - so that it can be referenced/re-used by many different DCAPs. Again, I think the template/pattern-oriented approach would be compatible with this notion: a "DCAP module" could be a set of partial description profiles which could be imported into other DCAPs. Hmmm, I imagine there may be some non-trivial issues there to do with precedence to work through... but enough already, as they say.
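As a rough illustration of what "importing" a module might involve, and of where the precedence question bites, here is a sketch in which a profile is just a mapping from resource type to property URI to statement template. Everything here is an assumption: the `import_module` function, the `module_wins` switch, the template fields and the example.org term are invented for the example, and the rule that the importing DCAP wins by default is one arbitrary choice among several.

```python
from typing import Dict, List, Tuple

# resource type -> property URI -> statement template (a plain dict here)
Profile = Dict[str, Dict[str, dict]]


def import_module(dcap: Profile, module: Profile,
                  module_wins: bool = False) -> Tuple[Profile, List[Tuple[str, str]]]:
    """Merge a module's partial description profiles into a DCAP.

    Returns the merged profile and the (resource type, property) pairs where
    the two disagreed. By default the importing DCAP's template is kept.
    """
    merged: Profile = {rtype: dict(templates) for rtype, templates in dcap.items()}
    conflicts: List[Tuple[str, str]] = []
    for rtype, templates in module.items():
        target = merged.setdefault(rtype, {})
        for prop, template in templates.items():
            if prop in target and target[prop] != template:
                conflicts.append((rtype, prop))
                if not module_wins:
                    continue          # keep the DCAP's own template
            target[prop] = template
    return merged, conflicts


DC = "http://purl.org/dc/elements/1.1/"
ACCESS = "http://example.org/terms/accessMode"   # hypothetical accessibility term

dcap: Profile = {
    "Manifestation": {
        DC + "format": {"value": "uri"},
    },
}

module: Profile = {
    "Manifestation": {
        DC + "format": {"value": "string"},      # disagrees with the DCAP
        ACCESS: {"value": "string"},             # new to the DCAP
    },
}

merged, conflicts = import_module(dcap, module)
merged_module_first, _ = import_module(dcap, module, module_wins=True)
```

Either way the conflict on `dc:format` is reported rather than silently resolved, which seems the least that a real mechanism would need to do.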

P.S. I guess I should just add that this represents only my own "thinking out loud", not any DCMI-endorsed view!




I guess I should really have said:

So the role of what the IEMSR model calls a "property usage" is to provide a specification of how to construct a statement; it is an annotated "statement profile" or "statement template".

i.e. include "annotated", as it typically provides some human-readable guidance, not just a set of "structural" constraints on the statement components.




eFoundations is powered by TypePad