More ruminations on compoundness and complexity (and metadata)
This is a somewhat belated post that I started a few days ago, but put to one side while we concentrated on reading through the pile of Eduserv Research Grant proposals.
A couple of weeks ago I attended the workshop on describing complex objects that Andy referred to, and at which he gave a presentation (I was in the happy position of being able to sit in the back row and nod enthusiastically).
The programme featured presentations on three fairly widely used "packaging formats": MPEG-21 DIDL (by Frances Knudson of Los Alamos National Laboratory), METS (by Markus Enders of Goettingen State and University Library) and IMS Content Packaging (by Sheila Macneill of CETIS and University of Strathclyde).
The programme also included a presentation by Wilbert Kraan of CETIS on an IEEE LTSC project called RAMLET (Resource Aggregation Model for Learning, Education and Training), which has developed an ontology that can be used as the basis for mapping between instances of different "packaging formats".
Andy's presentation was the last of the five. Leaving aside the DC-specific aspects, I thought his key point was probably that metadata is at the heart of what we call "content packaging". A package certainly carries metadata that describes specific characteristics of resources so that applications can perform specific functions, but ultimately a key part of a "package" is a set of "statements" about some resources, and more specifically about the relationships between those resources.
So, when I create the content of a <structMap> element in a METS instance or of an <organization> element in an IMS CP instance, I'm describing relationships between resources. (I'm consciously not commenting further on DIDL here, as I'm much less familiar with that specification, and after Frances' presentation I feel I need to go away and read up a bit more before making (probably quite misguided) comments about it!) To take a very simple example, suppose I create a <structMap> like this (a rough outline for illustration purposes only - I don't promise that this is a complete/valid METS XML fragment!):
<structMap>
  <div LABEL="My paper">
    <div LABEL="My section 1">
      <fptr FILEID="file001" />
    </div>
    <div LABEL="My section 2">
      <fptr FILEID="file002" />
    </div>
    <div LABEL="My section 3">
      <fptr FILEID="file003" />
    </div>
  </div>
</structMap>
or an IMS CP <organization> like (caveats as above!):
<organization identifier="org1">
  <title>My paper</title>
  <item identifier="item2" identifierref="file0001">
    <title>My section 1</title>
  </item>
  <item identifier="item3" identifierref="file0002">
    <title>My section 2</title>
  </item>
  <item identifier="item4" identifierref="file0003">
    <title>My section 3</title>
  </item>
</organization>
then in each case I'm "saying" that one resource (titled "My paper") is composed of a sequence of component resources titled "My section 1", "My section 2" and "My section 3". Elsewhere in the METS or IMS CP document I provide URIs of those resources. OK, it's a bit more complicated than that, but for the purposes of this argument, I'll stick to a simple case. And I could "say" exactly the same thing by constructing a Dublin Core metadata description set or an RDF graph, e.g. using the Turtle syntax for RDF:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ex: <http://example.org/terms/> .
_:resource1 dc:title "My paper" ;
    ex:hasOrganisation [ a rdf:Seq ;
        rdf:_1 <http://example.org/docs/1> ;
        rdf:_2 <http://example.org/docs/2> ;
        rdf:_3 <http://example.org/docs/3> ] .
<http://example.org/docs/1> dc:title "My section 1" .
<http://example.org/docs/2> dc:title "My section 2" .
<http://example.org/docs/3> dc:title "My section 3" .
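The correspondence between the two forms is mechanical enough that it can be sketched in a few lines. The Python below is a minimal illustration, not anything defined by METS: it reads the statements off a namespace-free version of the structMap fragment (using the upper-case LABEL and FILEID attribute names METS defines) with the standard library's xml.etree.ElementTree. The blank-node labels such as "_:My_paper" and "_:seq1" are hypothetical, ex:hasOrganisation is the made-up property from the Turtle example, and the sketch stops at FILEID values rather than resolving them to URIs via the METS file section.

```python
import xml.etree.ElementTree as ET

# Namespace-free structMap fragment (illustrative only, not valid METS).
STRUCT_MAP = """
<structMap>
  <div LABEL="My paper">
    <div LABEL="My section 1"><fptr FILEID="file001"/></div>
    <div LABEL="My section 2"><fptr FILEID="file002"/></div>
    <div LABEL="My section 3"><fptr FILEID="file003"/></div>
  </div>
</structMap>
"""


def node_id(div):
    # Use the file pointer as the resource identifier if the div has one;
    # otherwise fall back to a hypothetical blank-node-style label.
    fptr = div.find("fptr")
    if fptr is not None:
        return fptr.get("FILEID")
    return "_:" + div.get("LABEL").replace(" ", "_")


def structmap_to_statements(xml_text):
    """Return (subject, predicate, object) tuples read off nested <div>s."""
    statements = []
    seq_count = 0

    def walk(div):
        nonlocal seq_count
        subject = node_id(div)
        statements.append((subject, "dc:title", div.get("LABEL")))
        children = div.findall("div")
        if children:
            # Child divs become the ordered members of an rdf:Seq,
            # mirroring rdf:_1, rdf:_2, ... in the Turtle example.
            seq_count += 1
            seq = f"_:seq{seq_count}"
            statements.append((subject, "ex:hasOrganisation", seq))
            statements.append((seq, "rdf:type", "rdf:Seq"))
            for position, child in enumerate(children, start=1):
                statements.append((seq, f"rdf:_{position}", node_id(child)))
                walk(child)

    for top in ET.fromstring(xml_text.strip()).findall("div"):
        walk(top)
    return statements


for statement in structmap_to_statements(STRUCT_MAP):
    print(statement)
```

Nothing in the output depends on the XML syntax as such: the same nine statements could equally have been read from the IMS CP organization, which is rather the point.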
And I could probably do something similar using various other metadata specifications that allow me to describe relationships between things. I'm conscious that I'm over-simplifying somewhat, and METS and IMS CP provide other features that go beyond describing relationships, particularly in terms of describing how to embed representations of resources within an instance, but nevertheless I think Andy's point is a good one: a description of relationships between resources is a form of metadata. (Footnote: Sheila doesn't sound completely convinced!)
The other key point emerging from Andy's presentation, which he also highlighted in his earlier post, is that resources are of different types and relationships between resources are of different types, and he proposed a distinction between "compound objects" and "complex objects" on the basis of the different categories of relationship being described.
It seems to me that METS and IMS CP are fundamentally about describing what I think of as structural relationships (Andy's "compound object" case): when I construct a METS structMap or an IMS CP organization, I'm "saying" that resource W has component resources X, Y and Z. Further, I think METS and IMS CP support a specific subset of structural relationships, i.e. they deal essentially with (ordered?) tree structures, where a "parent" resource has as its components a sequence of "child" resources.
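As a rough illustration of that "ordered tree" reading, the sketch below tests whether a set of whole/part statements has that shape: exactly one root, each part belonging to only one whole, and no resource being (directly or indirectly) a part of itself. The dictionary representation and the function name are my own assumptions, not anything defined by METS or IMS CP; ordering is carried by the lists, and the check itself is only about shape. How a particular format copes with statement sets that fail the check (for example by repeating a pointer to the same file) is a separate question.

```python
def is_ordered_tree(parts_of):
    """Return True if the whole/part statements form a single tree.

    parts_of maps each "whole" to the ordered list of its "parts"
    (a hypothetical representation chosen for this sketch).
    """
    parent_of = {}
    for whole, parts in parts_of.items():
        for part in parts:
            if part in parent_of:
                # The same resource is listed as a part more than once.
                return False
            parent_of[part] = whole

    nodes = set(parts_of) | set(parent_of)
    roots = [node for node in nodes if node not in parent_of]
    if len(roots) != 1:
        return False

    # Walk down from the root: every resource must be reached, and none
    # may be reached twice (which would indicate a cycle).
    seen = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            return False
        seen.add(node)
        stack.extend(parts_of.get(node, []))
    return seen == nodes


paper = {"My paper": ["My section 1", "My section 2", "My section 3"]}
print(is_ordered_tree(paper))   # True

shared = {"W": ["X", "Y"], "V": ["Y", "Z"]}
print(is_ordered_tree(shared))  # False: Y is a part of both W and V
```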
And (rather more tentatively!) I'd venture that the types of resources with which METS and IMS CP are concerned are, more or less, what the Web Architecture categorises (albeit somewhat vaguely!) as "information resources", i.e.:
We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as "resources". The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as "information resources."
This document is an example of an information resource. It consists of words and punctuation symbols and graphics and other artifacts that can be encoded, with varying degrees of fidelity, into a sequence of bits. There is nothing about the essential information content of this document that cannot in principle be transfered in a message. In the case of this document, the message payload is the representation of this document.
But Andy went on to consider the example of the ePrints DC Application Profile, which is concerned with the description of resources of several different types, at least some of which (agents, for example) are not "information resources", and with the description of various types of relationship which are not structural, e.g. relationships like is-created-by, is-published-by, and so on. While it is quite possible to describe relationships of these types between resources using statements in a Dublin Core metadata description set or using RDF, it seems to me I cannot describe such relationships using a METS structMap or an IMS CP organization.
The point I'm trying to make here is not that I think Dublin Core is "better" than METS or IMS CP, but rather that, in order to make decisions about which specifications we use in this area, it's important to understand what each of the "packaging formats" allows us to "say" about "things in the world". From this viewpoint, the syntactic structure of, say, a METS XML instance is of less interest than what information such a document allows us to convey about resources and the relationships between them, i.e. what models underpin those formats - not models of the packaging instance itself (which I think is what is described by e.g. the IMS Content Packaging Information Model) but of the resources described or referred to within that instance.
Such considerations will be important in the context of the OAI ORE initiative. For example, if an existing "packaging format" is used to serialise the ORE model, then it becomes critical that we fully understand any model inherent in that format (any built-in assumptions about the nature of the resources referenced or described, and about the nature of the relationships between resources expressed within the format), and that we ensure that any such serialisation accurately reflects the ORE model.
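One way to make "accurately reflects" testable is a round-trip check: serialise a set of statements into the format, read them back, and see what has gone missing. The sketch below is purely illustrative and uses a plain whole-to-parts mapping as a stand-in for a structMap-like format. The predicate names ("hasPart", "creator", "publisher"), the resource names and the STRUCTURAL set are all hypothetical, nothing here is part of the ORE specifications, and ordering of parts is ignored.

```python
# Hypothetical statements mixing structural and non-structural relationships.
STATEMENTS = {
    ("paper", "hasPart", "section1"),
    ("paper", "hasPart", "section2"),
    ("paper", "creator", "agent1"),
    ("paper", "publisher", "agent2"),
}

# Predicates the stand-in "packaging format" can carry.
STRUCTURAL = {"hasPart"}


def to_tree(statements):
    """Serialise into a whole-to-parts mapping; other predicates are dropped."""
    tree = {}
    for subject, predicate, obj in sorted(statements):
        if predicate in STRUCTURAL:
            tree.setdefault(subject, []).append(obj)
    return tree


def from_tree(tree):
    """Read whole/part statements back out of the mapping."""
    return {(whole, "hasPart", part) for whole, parts in tree.items() for part in parts}


def lost_in_round_trip(statements):
    """Statements that do not survive serialisation and re-reading."""
    return statements - from_tree(to_tree(statements))


print(sorted(lost_in_round_trip(STATEMENTS)))
# [('paper', 'creator', 'agent1'), ('paper', 'publisher', 'agent2')]
```

A non-empty result is a signal that the format's own model is narrower than the model being serialised, which is exactly the kind of built-in assumption worth knowing about before choosing it.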