June 06, 2007

Refining ORE

I spent the early part of last week in New York for the second meeting of the OAI ORE Technical Committee, which - thanks to the participation in the project of Google's Rob Tansley - took place at the impressive New York offices of Google in downtown Manhattan. (Edit: Ooh, I just noticed that Tony Hammond of the Nature Publishing Group includes a rather cool photo of the view from the offices in his post about the meeting.) 

Since the first TC meeting in January, the group has held a number of telecons, and the ideas we discussed then have been refined and expanded. That work is reflected in a paper authored by Carl and Herbert, which has now gone through several iterations. The latest version of that paper, "Compound Information Objects: An OAI-ORE Perspective", is now available. See also the announcement by Herbert to the oai-implementers mailing list, in which he invites comments on the document. (Comments on that document to ore at openarchives.org rather than here, please!)

One of the significant steps forward, I think, has been to conceptualise the "descriptions" of "compound objects", and of the relationships between those objects and their component resources, as graphs. Further, drawing on work within the Semantic Web community - particularly the work on "Named Graphs" by Jeremy Carroll and colleagues - we have recognised that those graphs are resources in their own right. They are related to, but distinct from, the resources referenced in the graph, and they can be identified and referred to just like other resources.
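To make that idea a little more concrete, here is a minimal sketch in plain Python. It is not taken from the ORE paper: every URI, the `hasPart` and `creator` terms, and the helper functions are invented for illustration. A named graph is modelled simply as a URI paired with a set of triples, so the graph can itself be the subject of further statements.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs and terms, for illustration only.
BOOK = "http://example.org/book/1"
PAGE1 = "http://example.org/book/1/page/1"
PAGE2 = "http://example.org/book/1/page/2"
GRAPH = "http://example.org/graphs/book-1"
HAS_PART = "http://example.org/terms/hasPart"
CREATOR = "http://example.org/terms/creator"

# A named graph: a URI (the name) paired with a set of triples.
named_graphs = {
    GRAPH: {
        Triple(BOOK, HAS_PART, PAGE1),
        Triple(BOOK, HAS_PART, PAGE2),
    }
}

# Because the graph has its own URI, statements can be made about the graph
# separately from statements about the book it describes.
metadata = {
    Triple(GRAPH, CREATOR, "http://example.org/agents/librarian"),
    Triple(BOOK, CREATOR, "http://example.org/agents/author"),
}


def resources_mentioned(graph_uri):
    """Return every URI appearing as subject or object inside the named graph."""
    triples = named_graphs[graph_uri]
    return {t.subject for t in triples} | {t.obj for t in triples}


def creators_of(uri):
    """Return the creators asserted for a given URI in the metadata set."""
    return {t.obj for t in metadata if t.subject == uri and t.predicate == CREATOR}
```

In this sketch the graph's URI does not appear among the resources the graph mentions, and the graph and the book have different creators, which is the "related to, but distinct from" point in miniature.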

The document emphasises that there are issues still under discussion, and indeed for most of our time at the meeting last week we grappled with the questions raised in Section 7 of that document, particularly those around identity and referencing.

FWIW, I think the main arguments I made at the meeting were:

  • Given the very broad definition/description of what constitutes a "compound object" presented in the opening paragraph of the paper (particularly the example of "a scholarly publication that is aggregation of text and supporting materials such as datasets, software tools, and video recordings of an experiment"), it seems to me there is no fundamental difference between the two scenarios presented in section 7. A "composite" made up of several previously unrelated items, and created through some algorithmic selection process (Case 1), is every bit as much a "compound object" as a digitised book made up of a number of pages (Case 2). In both cases there is an "aggregate" resource which is distinct from, but related to, its "component" resources - and distinct also from a graph which describes the relationships between that composite and its components.
  • (In each case) the relationships between "compound object" and its "component" resources should be explicitly asserted (in one or more graphs).
  • The "compound object", its "component" resources, and the graph(s) describing those resources may be created by different agents at different points in time, given different names/labels/titles, associated with different conditions of use (etc etc), and it may be useful/necessary (particularly when dealing with issues of trust, authority, provenance) to make that information available. 
  • If it is necessary/useful to refer to a resource, then consideration should be given to assigning a URI to that resource (following the Web Architecture Good Practice note "Identify with URIs").
  • If distinct URIs have been assigned to distinct resources, then we must be consistent in our use of those URIs to refer to those resources. If the owner of URI X says that it identifies resource A, then it introduces ambiguity if we then use that same URI to refer to resource B (again, this reflects the principles of the Web Architecture, specifically the constraint "URIs identify a single resource"). (This seems particularly important given the third point above.)
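The last two points can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `UriRegistry` class, the `try_assign` helper and all of the URIs are hypothetical, and nothing like this appears in the ORE paper. The registry records which resource each URI identifies and refuses to let an existing URI be reused for a different resource.

```python
class UriRegistry:
    """Records which resource each URI identifies and rejects conflicting reuse."""

    def __init__(self):
        self._identifies = {}

    def assign(self, uri, resource):
        # Re-asserting the same pairing is harmless; a different resource is a conflict.
        existing = self._identifies.get(uri)
        if existing is not None and existing != resource:
            raise ValueError(f"{uri} already identifies {existing!r}, not {resource!r}")
        self._identifies[uri] = resource

    def lookup(self, uri):
        return self._identifies[uri]


# Three distinct resources, each given its own (hypothetical) URI.
registry = UriRegistry()
registry.assign("http://example.org/book/1", "the digitised book (compound object)")
registry.assign("http://example.org/book/1/page/1", "page 1 (component)")
registry.assign("http://example.org/graphs/book-1", "graph describing the book")


def try_assign(uri, resource):
    """Return True if the assignment is consistent, False if it conflicts."""
    try:
        registry.assign(uri, resource)
        return True
    except ValueError:
        return False


# Using the book's URI to refer to the graph that describes it is rejected.
reused = try_assign("http://example.org/book/1", "graph describing the book")
```

Real systems have no central registry of this kind, of course. The point is only that once the compound object, its components and the describing graph each have a URI, using one of those URIs for another of those resources is detectable as an inconsistency.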

Ah, just the usual stuff I tend to bang on about, I suppose ;-)

As a footnote, I noticed that in his post on Monday, Andy referred to the OAI ORE approach as, potentially at least, "relatively complex". I suppose complexity is always relative, but I still hope (though readers of the preceding list of points may already be concluding that I am skating on thin ice with such an aspiration!) that whatever specifications emerge from the project will turn out to be relatively simple - simpler than the ePrints application profile, certainly - and that they will be firmly rooted in the principles of the Web. At the moment, I think it seems complex in part because we've been working through a process of arriving at a shared conceptualisation, and (as always during such processes?) that has involved a certain amount of (occasionally fraught!) "negotiation" as we tried to understand each other's perceptions and particularly to get to grips with each other's use of terminology.

Also, as Peter Murray from OhioLINK, a fellow member of the TC, mentions in his recent post on the ORE work, the ideas presented in the current paper will almost certainly be further refined and will be subject to some further "repackaging for public consumption" (sorry, that's my paraphrasing, not Peter's words!).

