What is ORE for, really?
Pete has rather nicely answered the question, "What is ORE, really?". In response, I'm tempted to ask a slightly different question, "What is ORE for, really?". In the ORE User Guide - Primer we find a 'Motivating Example' section which lays out some hard-to-reject statements about the importance of aggregations but which doesn't give us many verbs - it doesn't tell us what it is we can expect to be able to do to those aggregations, nor why we might want to. The previous introductory section does propose three sample uses:
Because aggregations are not well-defined on the Web, we are limited in what we can do with them, especially in terms of the services or automated processes that make the Web useful. People who wish to save or print a multiple page document must manually click through each page and invoke the appropriate browser command. Programs that transfer multiple page documents among information systems must rely on the API's of the individual system architectures and their definition of document boundaries. Search engines must use heuristics to group individual Web pages into logical documents so that search results have the proper granularity.
On the face of it these are perfectly valid functional requirements, but I think the underlying point that Pete makes in his post is that ORE, on its own, doesn't meet them. The necessary knowledge that allows one bit of software to say, "ah, these are the pages of a document and I need to print them in this order" or "these are the boundaries of a document" or "it makes sense to group these individual Web pages in this way" based on the data it gets from another bit of software is not captured by ORE. Life is not as simple as saying "here is an aggregation", because the aggregation might not be a set of printable pages from a document, or a set of Web pages, or a coherent set of anything else for that matter, and there is very little in ORE that tells you anything about the relationship(s) between the things in the aggregation or their relationship to the outside world. And if ORE doesn't meet its own functional requirements particularly well, it is even further from the kind of functional requirements we envisaged in the work on SWAP. Requirements like, "show me the latest freely available version of this research paper".
Now, I accept that ORE does provide a way of layering that additional information (which might be in the form of SWAP for example) over the top of the aggregation. On that basis the pertinent questions, or so it seems to me, are "given that we probably need that extra level of information to do anything useful with the aggregation, is the information about the aggregation useful on its own?" and "does SWAP capture the right level of detail and is it realistic to expect real-world systems to handle this level of complexity?".
I think the jury is out on both. (Note: I am certainly not arguing that SWAP is better than ORE - they are sufficiently different for that to be a pointless statement anyway and the bottom line is that I'm not completely sure that I'm convinced by either if I'm absolutely honest.) I would say that in the world of learning objects there is quite a long history of treating things as reasonably unrefined aggregations (usually referred to as 'content packages') and that in that space the usefulness of that approach has been fairly minimal.
Sorry... quick final point re: the last paragraph. Conversely, the Web has very successfully built on the simple aggregation known as the RSS/Atom feed.
Posted by: AndyP | February 19, 2009 at 05:31 PM
I do think that merely providing information about what is aggregated is useful. For example, pure aggregation information would allow Google Scholar, say, to bundle search results that pertain to the same item, e.g. a DSpace item [1]. It is not uncommon to see multiple hits in Scholar, e.g. one for a PDF version and another for the PS version of an arXiv document. And how about objects that not only have textual info (easily indexed by Scholar) but also other stuff such as videos, datasets, etc.? One could imagine indexing strategies for such other stuff that would "inherit" index words from textual resources that are in the same aggregation.
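A minimal sketch of the bundling idea, assuming an indexer already knows which aggregation each hit belongs to. The URIs, the `AGGREGATION_OF` lookup table and the `bundle_hits` function are all hypothetical names invented for illustration; nothing here reflects how Google Scholar or any repository actually works.

```python
from collections import defaultdict

# Hypothetical lookup: resource URI -> URI of the aggregation it belongs to.
# In practice this would have to be derived from harvested aggregation data.
AGGREGATION_OF = {
    "http://example.org/item/1.pdf": "http://example.org/aggregation/1",
    "http://example.org/item/1.ps": "http://example.org/aggregation/1",
    "http://example.org/item/2.pdf": "http://example.org/aggregation/2",
}


def bundle_hits(hits, aggregation_of):
    """Group search hits by aggregation.

    A hit with no known aggregation is kept as a bundle of its own,
    keyed by its own URI.
    """
    bundles = defaultdict(list)
    for uri in hits:
        key = aggregation_of.get(uri, uri)
        bundles[key].append(uri)
    return dict(bundles)


# Hypothetical search results, in ranked order.
hits = [
    "http://example.org/item/1.pdf",
    "http://example.org/item/2.pdf",
    "http://example.org/item/1.ps",
    "http://example.org/other.html",
]

bundled = bundle_hits(hits, AGGREGATION_OF)
for key, members in bundled.items():
    print(key, members)
```

The PDF and PS hits end up in one bundle because they share an aggregation, which is all the information the grouping step needs.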
Also, let's not forget that ORE offers more than just the aggregation piece. In essence, via its various serializations (Atom [2], RDF/XML [3], RDFa [4]), its HTTP Guidelines [5], and its Discovery Guidelines [6], it offers a way for repositories to expose their content in a way that is totally aligned with the Architecture of the World Wide Web, Web 2.0, Semantic Web, and Linked Data principles. I mean, the discovery piece of ORE alone is, IMO, a big step forward from the status quo.
[1] http://public.lanl.gov/herbertv/papers/Repository_Usability.htm
[2] http://www.openarchives.org/ore/1.0/atom
[3] http://www.openarchives.org/ore/1.0/rdfxml
[4] http://www.openarchives.org/ore/1.0/rdfa
[5] http://www.openarchives.org/ore/1.0/http
[6] http://www.openarchives.org/ore/1.0/discovery
Posted by: Herbert Van de Sompel | February 19, 2009 at 11:30 PM
And one more point from me too. Check out this movie to have a glimpse of what can be done with ORE: http://maenad.itee.uq.edu.au/lore/ . You will see the creation of an ORE Aggregation with expressive relationships among Aggregated Resources. Now, this could all be done without ORE, just by creating any old RDF graph. But ORE can provide an interoperable glue for these kinds of objects, in this case a learning object actually. And, of course, ORE gives these Aggregations an HTTP URI identity and as such makes them referenceable on the Web.
Posted by: Herbert Van de Sompel | February 20, 2009 at 04:03 AM
@Herbert,
It seems to me the use case in your first example above (enabling an app to grok that hits on the PDF and the PS are both hits on two different "formats" of the same content) _could_ - I'm not saying "should", just "could" - be addressed using a FRBR-based model, and expressing that data in "any old RDF graph" :-) e.g. a graph accessible using the Cool URI conventions I was thinking through in http://efoundations.typepad.com/efoundations/2009/02/httprange14-cool-uris-frbr.html
Similarly, on the last point about "providing identity" and making things referenceable, that could equally be done using the FRBR concepts of Work or Expression, and the FRBR relationships (or indeed some other, simpler model - I'm not particularly flying the FRBR/SWAP flag here!)
It really depends what we're trying to achieve in each case, I think.
And I guess the other issue that conditions our use of any of these models (ORE, SWAP, whatever) is that while in some cases we may be building new systems surfacing new resources based on a model; in others we're applying it to an existing system, which in all likelihood wasn't based on that model at all, so we're immediately in the business of "retrofitting" a model onto existing sets of resources, which probably reflect - to a greater or lesser degree - a range of different internal "repository models" and community practices.
Posted by: PeteJ | February 20, 2009 at 08:55 AM
Pete, I am going to paraphrase Andy in the days that he still had some sympathy for OAI-PMH, by stating that I am a "theoretical" fan of FRBR. But I am quite skeptical about the creation of an interoperable layer that depends on understanding and deploying FRBR, and - as you say - retrofitting repositories etc to do FRBR.
It feels to me that understanding "aggregations of Web resources" is a tad more straightforward. Hence, I'd rather apply some optional FRBR embellishment to a graph that is based on the quite elementary ORE Aggregation concepts than use FRBR as the starting point.
I think what I am trying to say is that the ORE Aggregation notion is just one level of complexity up from the basic resource, URI, representation notion of the Web Architecture. The way ORE introduces Aggregations almost feels like a natural extension. I don't get that kind of vibe from the introduction of FRBR concepts on the Web. But, obviously, that must have to do with the sensitivity (or lack thereof) of my antennas.
Posted by: Herbert Van de Sompel | February 20, 2009 at 12:29 PM
Hi Herbert,
I do share some of your skepticism about the level of complexity involved in FRBR - really, I do.
And I agree that ORE provides a simpler model than FRBR.
ORE solves those problems where knowing that X, Y and Z are members of a set/aggregation is sufficient to enable me to do something useful.
And the "open" nature of the ORE Resource Map also allows me to say other things about the relationships between the Aggregation or the Aggregated Resources and other things.
Let's leave FRBR to one side, and suppose I just adopt a very simple model where all I'm dealing with are documents, some of which are translations of other documents, and some of which are reviews of other documents.
Using ORE, in addition to saying that X, Y and Z form a set, I can add information in the Resource Map to say, "Y is-a-review-of X" and "Z is-a-translation-of X".
I was arguing that if the information my application needs to deliver some function is that "Y is-a-review-of X" and "Z is-a-translation-of X", then I can provide that information without needing to say that X, Y and Z form a set/aggregation.
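To make that concrete, here is a rough sketch that models the statements as plain (subject, predicate, object) tuples in Python. Only the `ore:aggregates` predicate URI is taken from the ORE vocabulary; the example.org URIs, the two relationship predicates and the helper function are hypothetical, and a real Resource Map would carry further triples that are omitted here.

```python
# Predicate from the ORE vocabulary.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Hypothetical predicates for the simple reviews/translations model.
IS_REVIEW_OF = "http://example.org/terms/isReviewOf"
IS_TRANSLATION_OF = "http://example.org/terms/isTranslationOf"

# Hypothetical resource URIs.
AGG = "http://example.org/aggregation/A"
X = "http://example.org/doc/X"
Y = "http://example.org/doc/Y"
Z = "http://example.org/doc/Z"

# The relationship statements on their own: "any old RDF graph".
relationships = {
    (Y, IS_REVIEW_OF, X),
    (Z, IS_TRANSLATION_OF, X),
}

# The same statements plus set membership, as a Resource Map might carry them.
membership = {(AGG, ORE_AGGREGATES, member) for member in (X, Y, Z)}
resource_map = membership | relationships


def subjects_related_to(graph, predicate, obj):
    """Return, sorted, every subject linked to obj by the given predicate."""
    return sorted(s for (s, p, o) in graph if p == predicate and o == obj)


# An application that only needs reviews and translations of X gets the
# same answers from either graph.
reviews_plain = subjects_related_to(relationships, IS_REVIEW_OF, X)
reviews_ore = subjects_related_to(resource_map, IS_REVIEW_OF, X)
translations_plain = subjects_related_to(relationships, IS_TRANSLATION_OF, X)
translations_ore = subjects_related_to(resource_map, IS_TRANSLATION_OF, X)
print(reviews_plain == reviews_ore, translations_plain == translations_ore)
```

The membership triples only matter to an application that asks what belongs to the aggregation; the review and translation queries never touch them.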
Now that leaves the questions of what model I do use (FRBR, a simple model of translations & reviews, as above, or something else) and what RDF vocabularies I use to provide that information, and, if I'm interested in merging data with that of other data providers, how they compare with the models and vocabularies used by those other data providers.
But it seems to me that, as ORE is silent on those points, those same questions arise if I choose to use ORE.
Posted by: PeteJ | February 26, 2009 at 02:17 PM