
January 16, 2008

Complexity, compschlexity

I used to think that complex objects were important and that packaging standards were going to be critical for the future of managing learning and/or research objects.  For example, the JISC Information Environment Technical Standards document (wot I wrote) says:

Resources that comprise a collection of items that are packaged together for management or exchange purposes should be packaged using the IMS Content Packaging Specification if they are 'learning objects' (i.e. resources that are primarily intended for use in a learning and teaching context and that have a specific pedagogic aim) or the Metadata Encoding & Transmission Standard (METS).

Now I'm not so sure.  Les Carr, over on RepositoryMan, seems to have reached the same conclusion.

For some time now I've argued that the Web is made up of complex objects anyway (in the sense that almost every Web page you look at is a bundle of text, images and other stuff) but that the Web does well to treat these as loosely-coupled bundles of individual items - items that are to a large extent managed and delivered separately.  In some respects (X)HTML acts like a simple packaging format, using the <img>, <object> and other tags to link together the appropriate bundle of items for a given Web page but leaving the client to decide which bits of the bundle to download and use - a text-only browser will behave differently from a graphical browser for example.
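To make the "(X)HTML as a simple packaging format" point concrete, here is a minimal sketch of a page acting as a loosely-coupled bundle (the file names are hypothetical, purely for illustration):

```html
<!-- A minimal XHTML page acting as a loose 'package': each linked item
     is managed and delivered separately, and the client decides what to fetch. -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>A loosely-coupled bundle</title></head>
  <body>
    <p>Some text, delivered inline with the page itself.</p>
    <!-- An image, fetched (or not) at the client's discretion -->
    <img src="figure1.png" alt="Figure 1: results chart" />
    <!-- Other media, with fallback text for clients that skip it -->
    <object data="video.mp4" type="video/mp4">A short video clip.</object>
  </body>
</html>
```

A text-only browser would render just the text and the alt/fallback text, fetching none of the linked items - the "package" degrades gracefully because the coupling is loose.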

Our attempts at more explicit tightly-coupled complex object standards, in the form of IMS CP and METS for example, have resulted in objects that are useful only in some of the systems, some of the time (between a learning object repository and a learning management system for example) but that are largely unhelpful for people armed only with a bog-standard Web browser.

What do I mean by tightly-coupled?  It's difficult to say precisely!  It may be the wrong term.  One can certainly argue that a METS package in which all the content is linked by reference (as opposed to being carried as part of the package) is not hugely dissimilar to the situation with (X)HTML described above.  But there is one massive difference.  (X)HTML is part of the mainstream Web, other packaging standards are not - when was the last time you found a Web browser that knew what to do with an IMS CP or METS package for example?  So maybe the issue has more to do with the solution being mainstream or not, rather than about how tightly-coupled stuff is?
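For comparison with the (X)HTML case, a by-reference METS package is essentially also a list of pointers. A minimal sketch (element names are from the METS schema; the identifiers and URL are hypothetical):

```xml
<!-- A minimal METS sketch: content is linked by reference via FLocat
     rather than carried inside the package. IDs and the URL are made up. -->
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp>
      <file ID="file1" MIMETYPE="image/png">
        <FLocat LOCTYPE="URL" xlink:href="http://example.org/figure1.png"/>
      </file>
    </fileGrp>
  </fileSec>
  <!-- structMap is mandatory in METS: it declares the object's structure -->
  <structMap>
    <div LABEL="A simple object">
      <fptr FILEID="file1"/>
    </div>
  </structMap>
</mets>
```

Structurally this is not far from a Web page linking to an image - but hand it to a bog-standard Web browser and nothing useful happens, which is exactly the mainstream-vs-non-mainstream point.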

My concern is that repository efforts that first and foremost treat the world as being made up of complex objects that need to be explicitly packaged together using standards like IMS CP or METS in order to be useful may take repositories further away from the mainstream Web than they are currently - which is not a good thing.  IMHO.




In general I'd agree with this - the idea of a special format to develop 'learning objects' doesn't make sense.

However, there are some issues around sharing learning objects. A complex object structure lets you transfer all elements related to a learning object in a single file, which is a useful thing to do - especially if you are moving from closed system to closed system (e.g. from one Bb instance to another?). Also, if you want to describe the structure, specific standards are helpful (although I'm not entirely convinced describing the structure is a useful thing to do - perhaps the structure speaks for itself as long as you preserve it).

However, this clearly is a job for an 'export' function - not something that you want to use to create, manipulate or deliver complex objects.

Thanks Andy for the intriguing post. I guess I have to respond since ORE and Fedora (two projects to which I have rather close connections) are implicated by your and Les’ posts (I’m going to drop this same comment in Les’ blog).

I think it’s important to tease apart the issues here and decide what deserves criticism as “old think” and what really makes sense in the work we do.

- I’ve heard you say several times that “the Web is made up of complex objects anyway”. Indeed that is true and it is what made us old guys so excited many years ago the first time we saw web pages with embedded images in Mosaic - we never went back to gopher. Browsers are one example of client-based construction of complex objects based on information in the file format - in this case xHTML. I think we can all comfortably agree with this model and argue that it is part of what makes the web so interesting, and that we look forward to more “client inferred” complex documents.

- We should note the difference between what Les has posted and your post. Les says "files, files, files", which is way different than your "client aggregated files" argument. I really take issue with Les' file-centric argument since his rich-media argument (file types including Word, PDF, spreadsheet) flies in the face of reuse/refactoring scenarios that we seem to think important.

- I have no argument with criticism of repository-unique packaging formats or notions of compound objects that exist outside the web architecture. Anything that violates the notion of client flexibility that I allude to in my first point should be resisted. I think that you and I have been aligned in criticizing the frequent separation between digital library architecture (whatever that is anymore?) and web architecture. Clearly the latter has emerged as dominant and anything we do in the network information domain must recognize that. I’ve stated this in criticism of earlier OAI-PMH work that failed to synchronize with syndication work (e.g., RSS, ATOM) in the web space.

- My first disagreement with what you say is connected with the identification and resulting “entification” of complex objects. Much better observers of the “social life of information” (John Seely Brown, David Levy, etc.) recognize the importance of information packaging - if I can’t share with you what I am saying or referring to (i.e., the document) then we lose something important in our human fabric. This makes me dissatisfied with the loosey-goosey notion of client-constructed complex documents that the web supports as its only model. Sure, it’s great in many cases - e.g., different presentations for different devices. But sometimes I want to share/preserve/reference/criticize/reuse/etc. a definite “document” (for lack of a better word). For that I need two things: identity and declared boundary. The web with URIs does not supply this. As we all know, http://arxiv.org/abs/0710.2029 is not the identity of a Stephen Hawking paper, it is the id of an xHTML splash page. Nor is http://www.flickr.com/photos/carllagoze/2199741004/ the identity of a picture of me, it is the id of an entry page to my collection. In both cases there is ambiguity about what I am referring to - in both its nature and its extent. That is fine in lots of cases but inappropriate for many use cases, especially scholarship and legal/economic ones (how many times do we see a warning on a link: “you are now leaving the web site of....”?)

- My second disagreement, a corollary of the first, relates to what I’ll call “author intent”. I, as the composer of a document (which is more-and-more by nature compound/complex in our linked world), should be able to imprint in the document what I meant by it (again, its boundary). Les may criticize technologies like Fedora that architecturally imprint the notion of a compound object. But (having already dismissed a totally file-centric approach) I strongly advocate for systems that allow an author to compose an aggregate and imprint that intent at the identity and description level. I strongly agree with the notion that this should not be imprinted at some repository-specific level, but that is orthogonal to the architectural notion. I think Jane Hunter’s recent work http://www.dcc.ac.uk/events/dcc-2007/programme/presentations/Day_Two/SCOPE.ppt is a fabulous demonstration of the utility of this type of author intent in the eScience domain and how to then deposit this author intent in supporting repositories. The preservation of the author intent in these repositories, due to their architectural capabilities, is essential.

- From my perspective (and I think most of my OAI-ORE colleagues share this) our current OAI-ORE work addresses these notions (you probably knew I'd get to this!). Its primary goal is to provide identity and boundary definitions for aggregations. This acknowledges the importance of these concepts, the shortcomings of the file-centric approach that Les seems to advocate, and the importance of doing it in a manner that is completely conforming (hopefully) with the web architecture. Of course, the horse is still in the barn (in the sense that OAI-ORE really isn’t prime time yet), but I think that work such as Jane’s shows the utility of this.
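A rough sketch of the idea - giving an aggregation its own URI (identity) and an explicit list of parts (boundary) in RDF. Since OAI-ORE was still in draft at the time of writing, the namespace, property names and URIs below are illustrative assumptions, not a statement of the final specification:

```xml
<!-- Illustrative sketch only: a resource map that names an aggregation
     (identity) and enumerates its parts (boundary). Vocabulary and URIs
     are assumptions based on draft OAI-ORE discussions. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ore="http://www.openarchives.org/ore/terms/">
  <rdf:Description rdf:about="http://example.org/aggregation/1">
    <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
    <!-- The declared boundary: exactly these resources, no more, no less -->
    <ore:aggregates rdf:resource="http://example.org/paper.pdf"/>
    <ore:aggregates rdf:resource="http://example.org/dataset.csv"/>
  </rdf:Description>
</rdf:RDF>
```

The point of the sketch is that the aggregation URI, unlike a splash-page URL, is intended to identify the compound object itself, with its extent stated explicitly rather than inferred by the client.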

To close, I’ll say it again. I argue strongly that identity and explicit boundary (or containment) are essential to many information applications. While your notion that “the Web is made up of complex objects anyway” is valuable, it simply cannot stand alone as the only implementation of aggregate information. Nor can Les’ “files, files, files” notion.

I welcome more discussion.

I came away from the CRIG unconf thinking similar things, some of which I posted at http://blogs.open.ac.uk/Maths/ajh59/012201.html

The post provides a really cursory overview of 'complex splash pages', such as the splash page for a YouTube movie, and comments on how a sensible URL structure provides a mechanical way of reusing resources embedded in the page.




eFoundations is powered by TypePad