
February 17, 2009

What is ORE, really? (A personal viewpoint)

This is another post that I've had sitting around in draft for ages, but which some recent discussion has prompted me to dig out and try to finish. Chris Keene commented on my post of some time ago about the publication of OAI ORE specs, asking for some clarification on what it is that OAI ORE provides, "what ORE is", I suppose, and I promised I'd take a stab at answering. I guess I should emphasise that this is my personal view only, but here's my attempt at a response to Chris' questions.

is it a protocol like OAI-PMH or a file standard? I read a primer (somewhat quickly) and it seems to be almost a XML file specification to be read over HTTP, which describes a resource such as a repository? is that right?

I think it's helpful - and maybe why I think it's important will become clearer by the end of this post - to distinguish between the parts of the ORE specifications which are specific to ORE and the parts which provide guidance on how to apply principles and conventions which have been defined outside of the ORE context, are not dependent on the use of the ORE-specific parts of ORE, and are more general in their application. (The distinction I'm making here doesn't quite match the separation ORE itself makes between "Specifications" and "User Guides".)

Some parts of the ORE specifications are "ORE-specific": they define or describe things that aren't defined or described elsewhere. Those things are:

  1. A simple data model for the things ORE calls a Resource Map, an Aggregation and a Proxy. This is defined by the Abstract Data Model document. Here the term "data model" is used in the sense of a "model of (some part of) the world", a "domain model", if you like - though in the ORE case, it is intended to be quite a generally applicable one.
  2. An RDF vocabulary used, in association with terms from some existing RDF vocabularies, for representing instances of that model. This is defined in human-readable form by the Vocabulary document, and in machine-processable form by the RDF/XML "namespace document" http://www.openarchives.org/ore/terms/.
  3. A variant of what I might call - following the terminology used by Alistair Miles - a "Graph Profile", a specification of some structural constraints on an RDF graph which should be met if that graph is to serve as an ORE Resource Map, a set of "triple patterns", if you like, for the triples that make up an ORE Resource Map. This is defined in Section 6 of the Abstract Data Model document.
  4. A set of conventions for representing an ORE Resource Map as an Atom Entry Document, using the Atom Syndication Format. This is defined by the ORE document Resource Map Implementation in Atom.
  5. A set of conventions for disclosing and discovering ORE Resource Maps, defined by the document Resource Map Discovery. Some of these are applications of existing conventions, but as there are some ORE-specific aspects (e.g. the definition of http://www.openarchives.org/ore/html/ as an HTML profile specifying the use of "resourcemap" as an X/HTML link type), I'm including it in this list.

Those are the things I tend to focus on when I try to answer the question "What is ORE, really?"
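
To make the first three items a little more concrete, here is a minimal sketch in Python of the kind of "triple patterns" involved. It uses plain tuples rather than an RDF library, the example.org URIs are made up for illustration, and the check is deliberately simplified: it only tests that a Resource Map describes exactly one Aggregation and that the Aggregation aggregates at least one resource. It is not a full rendering of the constraints in the Abstract Data Model document.

```python
# The ORE namespace URI is the one given in the Vocabulary document.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs, invented for this illustration only.
REM = "http://example.org/eprint/1/rem"
AGG = "http://example.org/eprint/1"

# A tiny graph, written as a set of (subject, predicate, object) tuples.
triples = {
    (REM, RDF_TYPE, ORE + "ResourceMap"),
    (REM, ORE + "describes", AGG),
    (AGG, RDF_TYPE, ORE + "Aggregation"),
    (AGG, ORE + "aggregates", "http://example.org/eprint/1/paper.pdf"),
    (AGG, ORE + "aggregates", "http://example.org/eprint/1/slides.ppt"),
}


def described_aggregations(graph, rem_uri):
    """Return the set of resources the given Resource Map ore:describes."""
    return {o for (s, p, o) in graph if s == rem_uri and p == ORE + "describes"}


def aggregated_resources(graph, agg_uri):
    """Return the set of resources the given Aggregation ore:aggregates."""
    return {o for (s, p, o) in graph if s == agg_uri and p == ORE + "aggregates"}


def looks_like_resource_map(graph, rem_uri):
    """Simplified structural check (an assumption, not the full ORE rules):
    exactly one described Aggregation, with at least one aggregated resource."""
    aggs = described_aggregations(graph, rem_uri)
    if len(aggs) != 1:
        return False
    (agg,) = aggs
    return len(aggregated_resources(graph, agg)) >= 1
```

Calling `looks_like_resource_map(triples, REM)` on the graph above returns `True`; removing the two `ore:aggregates` triples would make it return `False`.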

In addition to those ORE-specific elements, the ORE specifications also provide guidelines for how to make use of various other existing specifications and conventions when deploying the ORE model:

  1. The two documents Resource Map Implementation in RDF/XML and Resource Map Implementation in RDFa describe how to use those two existing syntaxes, defined by W3C Recommendations, to represent Resource Maps.
  2. The document HTTP Implementation describes how to apply the principles and patterns defined by the W3C TAG's httpRange-14 resolution and the Cool URIs for the Semantic Web document.

For the most part, these documents don't really provide new information, at least in the same way those noted above do: instead, they indicate how to apply some existing, more general specifications when making use of the ORE-specific specifications listed above.

That's not to say they aren't useful guidelines: they are, not least because they "contextualise" the general information provided by the more general specifications, and provide ORE-specific examples of their use. The ORE HTTP Implementation document selects from the patterns of the Cool URIs document and provides illustrations of their use for the URIs of Aggregations and Resource Maps.
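
As a rough illustration of one of those patterns, the sketch below shows the "303 redirect" approach from the Cool URIs document applied to an Aggregation: a GET on the Aggregation's URI is answered with 303 See Other pointing at a Resource Map that describes it. The function name, the lookup table and the example.org URIs are all hypothetical, and no real HTTP server is involved; it only models the decision about which status and headers to return.

```python
def respond(request_uri, aggregation_to_rem):
    """Return a (status, headers) pair for a GET on request_uri.

    aggregation_to_rem maps each Aggregation URI to the URI of a
    Resource Map describing it (a hypothetical lookup table).
    """
    if request_uri in aggregation_to_rem:
        # The Aggregation is not itself a document, so redirect
        # the client to a document that describes it.
        return 303, {"Location": aggregation_to_rem[request_uri]}
    if request_uri in aggregation_to_rem.values():
        # The Resource Map is an ordinary document and is served directly.
        return 200, {"Content-Type": "application/rdf+xml"}
    return 404, {}


# Hypothetical URIs, for illustration only.
lookup = {"http://example.org/eprint/1": "http://example.org/eprint/1/rem"}

status, headers = respond("http://example.org/eprint/1", lookup)
```

Nothing in this pattern depends on the thing being described being an ORE Aggregation; the same redirect could point at a description of any resource.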

My main point here is that I think it's important - particularly for audiences who are perhaps encountering some of these more general principles and conventions for the first time in the specific context of ORE - to "decouple" these two aspects, and to make clear that the use of these principles and conventions is not dependent on the ORE-specific parts, and they can - and indeed should - be applied in other contexts too. More on that later.

To answer Chris' specific questions above: no, ORE isn't a protocol; no, it isn't (what I think of as) a "file standard", though it describes the use of some existing formats; and while ORE does deal with the description of things, the things it deals with are what it calls "aggregations", not "repositories", at least as that term is typically used in the OAI context, to refer to a system that supports some functions. The concept of a repository doesn't feature in ORE.

And I'm not sure how it fits in with OAI-PMH does it replace, or improve, or cater for different needs (they both seem to cater for getting an item from one system to another).

I think ORE is largely orthogonal to OAI-PMH. ORE was not designed to "replace" or "improve" OAI-PMH. ORE can be used independently of OAI-PMH, or, as I think the Discovery document illustrates, it can be used in the context of OAI-PMH, i.e. you could expose ORE Resource Maps as metadata records over OAI-PMH.

Having said that, I do think the approaches underpinning ORE provide at least some hints of how the sort of functionality which is currently provided by OAI-PMH in an RPC-like idiom, where a client "harvester" sends protocol-specific requests to a "repository", might be offered using a more "resource-oriented" approach. Here, I'm not using the term "resource-oriented" to highlight a distinction between "resource" and "metadata", but rather to emphasise the notion of treating all the "items of interest" to the application as "resources" in the sense that the Web Architecture uses that term, assigning them URIs, and supporting interaction using the uniform interface defined by the HTTP protocol. And those "items of interest" can include resources which are descriptions of other resources, and resources which are collections of resources - collections based on various criteria. Anyway, it isn't my intention here to embark on specifying an alternative approach to OAI-PMH. :-)

Chris also asked:

And what about things like SWAP and SWORD?

Let's take the case of SWORD first, as it's the one I know less about! :-) I'm not a SWORD/Atompub expert at all but I think ORE is independent of SWORD, but designed to be usable in the context of SWORD, i.e. in principle at least, an ORE Resource Map could form the subject of a SWORD "deposit". Richard Jones ponders three variant approaches, and there is some discussion on the OAI ORE Google Group.

The case of the Scholarly Works Application Profile (SWAP) raises some issues which I think illustrate some of the points I was making above about the wider applicability of some of the conventions used within ORE.

First, I think there are differences in "scope and purpose". SWAP focuses very specifically on the "eprint" and on supporting a more or less well-defined set of operations, particularly operations related to "versioning" and the various types of relationships between resources which one encounters when dealing with those issues; ORE focuses on a rather simpler, more generic concept of "aggregation" and membership of a set. Having said that, the ORE model can also be applied to the case of the eprint, and indeed some of the examples in the specifications and in supporting presentations use examples of applying ORE to eprint resources.

Second, again as noted above, ORE makes use of some general principles and patterns for exposing resources and resource descriptions on the Web. But those principles and patterns are equally applicable in the context of data models other than ORE; what ORE calls a "Resource Map" is a specialised case of an RDF graph, and the HTTP patterns for providing access to a Resource Map are applications of patterns which can be - and are - applied to provide access to data describing resources of any type - including resources of the type described by SWAP. It isn't necessary to make use of the ORE concept of the Aggregation to use those patterns.

Now then, it is true that the SWAP documentation does not make reference to these patterns, but that is probably because of two considerations. First, at the time of its development, the primary context of use considered was that of exposing data over OAI-PMH. Second, although the httpRange-14 resolution had been agreed, it hadn't been as widely disseminated/popularised as it has been subsequently, particularly in the form of the Linked Data tutorial and the Cool URIs document. But as I discussed in a recent post, those same principles and patterns used in ORE can be applied to the FRBR case - and if SWAP were being developed now, I'm sure reference to those approaches would be included. (Well, they would if I had any input to the process!)

Third, picking up on my attempt above to identify what I think are the "core" characteristics of ORE, ORE and SWAP are based on two different "models of the world", both of which can be applied to the case of the eprint. From the perspective of the ORE model, the eprint is viewed as an aggregation made up of a number of component/member resources; with SWAP, the perspective is that of the FRBR model - a Work realised in one or more Expressions, each embodied in one or more Manifestations, each exemplified by one or more Items (possibly with relationships between this Work and other Works, between Expressions of the same or different Works, between Works, Expressions etc and Agents, and so on).

In the FRBR case, although, as in the ORE case, there are multiple related resources involved, there isn't necessarily a notion of "aggregation" involved: a FRBR Work (or indeed any of the FRBR Group 1 entities) may be a composite/aggregate resource, but it isn't necessarily the case. There is nothing in FRBR that treats, say, the set of all the Items which exemplify the Manifestations of the Expressions of a single Work as a single aggregate entity - but FRBR does allow for the expression of whole/part relationships between instances of the various Group 1 entities.

So, I think it is important to remember that the choice to use either ORE or SWAP to model an eprint is just that: a modeling choice, one which enables certain functionality on the basis of the data created. Depending on what we want to achieve with the data, different choices may be appropriate.
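
A small sketch may help show how the two choices differ in practice. The Python below describes the same made-up eprint twice: once as a flat aggregation with members, and once as a FRBR-style Work/Expression/Manifestation/Item chain. The class and field names are my own shorthand for illustration, not terms taken from the ORE or SWAP vocabularies, and the URIs are invented.

```python
from dataclasses import dataclass, field
from typing import List


# --- ORE-style view: an aggregation and its members ---
@dataclass
class Aggregation:
    uri: str
    aggregates: List[str] = field(default_factory=list)


# --- FRBR-style view: Work > Expression > Manifestation > Item ---
@dataclass
class Item:
    uri: str


@dataclass
class Manifestation:
    media_type: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    version: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)


def item_uris(work, version):
    """Return the URIs of all Items for one version (Expression) of a Work."""
    return [
        item.uri
        for expr in work.expressions
        if expr.version == version
        for man in expr.manifestations
        for item in man.items
    ]


# The same hypothetical eprint, modelled both ways.
draft = "http://example.org/eprint/1/draft.pdf"
final = "http://example.org/eprint/1/final.pdf"

as_aggregation = Aggregation("http://example.org/eprint/1", [draft, final])

as_work = Work("An example paper", [
    Expression("draft", [Manifestation("application/pdf", [Item(draft)])]),
    Expression("published", [Manifestation("application/pdf", [Item(final)])]),
])
```

The aggregation answers "what are the members of this thing?" directly, but says nothing about how the two files relate to each other; the FRBR-style structure can answer "which copies are of the published version?" via `item_uris(as_work, "published")`, at the cost of a more elaborate model.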

So to return to Chris' question, it seems to me the core difference between ORE and SWAP is that they offer different models which can be applied to the "eprint". And here, I think I'm revisiting the point that, quite some time ago now, Andy made in terms of contrasting what he called "compound objects" and "complex objects". I must admit I didn't and don't like the term "complex object" - if I describe a set and its members, I understand that the set is the "compound object", but if I describe a document and its three authors, or a FRBR Work, its Expressions, their Manifestations, their Items, and a number of related Agents, which one of them is the "complex object"? - but the point remains a good one: many of the functions we wish to perform rely on our capacity to represent relationships other than relationships of "aggregation" or "composition".

Of course, the ORE concept of the Resource Map does allow for the expression of any other types of relationship, in addition to the required ore:aggregates relationship (and I think using ORE and FRBR together would require some careful analysis, given the nature of whole/part relationships in FRBR); but one can also construct descriptions expressing other types of relationship, and make those descriptions available using the community-agreed conventions of the Cool URIs document, without using ORE.

So, that turned into another rather rambling post, and I'm not sure how much it helps, but that's my take on "what ORE is".
