httpRange-14, Cool URIs & FRBR
The W3C Technical Architecture Group's resolution of what had become known as "the httpRange-14 question" introduced a distinction between two subsets of resources. The first is the subset for which representations may be served using the HTTP protocol, which the Architecture of the World Wide Web refers to as "information resources". The second, by implication at least, is a disjoint subset of resources which may be identified using the http URI scheme but which are not "representable": no representations of them are provided using the HTTP protocol, though they may be described by "information resources" identified by their own distinct URIs.
A subsequent note by Leo Sauermann and Richard Cyganiak of the W3C Semantic Web Education and Outreach (SWEO) Interest Group, Cool URIs for the Semantic Web, provides an extended discussion of the issue, together with a set of "patterns" for assigning http URIs and for the appropriate responses to HTTP requests using such URIs. This document uses the terms "Web documents" and "real-world objects" to refer to the two classes of resources, noting that the latter class includes "real-world objects like people and cars, and even abstract ideas and non-existing things like a mythical unicorn".
The question raised by this division is where the boundary between the two classes lies. From the viewpoint of the consumer/user of URIs, the point is somewhat moot: they simply need to follow the information provided by the owner of the URI in the form of responses to HTTP requests (or possibly also metadata provided by other parties). Information about the nature of the resource can be provided both by HTTP response codes and by explicit descriptions of the resource. Following the httpRange-14 guideline, if the HTTP response to a GET is 2xx, then the resource identified by the URI is an information resource. I think it's worth emphasising that 2xx is the only class of response code which allows the user to make a "positive" inference about resource type; if the response is 303 See Other, that in itself says nothing about the type of the resource.
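The inference rule described above can be sketched in a few lines of Python. This is only an illustration of the guideline as I've summarised it, not code from any of the documents cited; the names `Inference` and `infer_resource_type` are my own inventions.

```python
from enum import Enum


class Inference(Enum):
    """What a client may conclude from the status code of a GET response."""
    INFORMATION_RESOURCE = "information resource"
    UNKNOWN = "unknown"


def infer_resource_type(status_code: int) -> Inference:
    """Apply the httpRange-14 guideline to the status code of a GET response.

    Only a 2xx response licenses a positive inference. Any other code,
    including 303 See Other, leaves the type of the resource undetermined.
    """
    if 200 <= status_code <= 299:
        return Inference.INFORMATION_RESOURCE
    return Inference.UNKNOWN
```

Note that the function never returns anything like "not an information resource": a 303 tells the client where to look for a description, and it is the description, not the status code, that may say what kind of thing the resource is.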
The URI owner, on the other hand, needs to make the choice, for each resource, whether to provide a representation or not, based on their understanding of the nature of the resources they are exposing on the Web. The Architecture of the World Wide Web document offers the following (somewhat "slippery", to me!) criterion for a resource being an "information resource": "The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message."
I've been trying to think through how this set of conventions should be applied to the case of the Functional Requirements for Bibliographic Records (FRBR) and more specifically to the "FRBR Group 1 Entities", i.e. instances of the classes of Work, Expression, Manifestation and Item which FRBR uses to model the universe of resources described by bibliographic records.
The work on the development of the Scholarly Works Application Profile (SWAP) focused primarily on deployment in the context of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI-PMH provides an RPC-like layer on top of HTTP, and SWAP focuses on how to deliver descriptions of the SWAP/FRBR entities using that RPC layer, rather than the question of how those entities could be represented as Web resources.
I'm starting from the FRBR model here; I'm asking the question, "If I'm exposing on the Web a set of resources based on the FRBR model, are there any general rules for which of these resources are 'representable'?". I'm not trying to address the broader question of whether/how the distinctions made in the Web Architecture reflect, or can be defined in terms of, the FRBR model.
Taking the "easy" cases first, FRBR defines a Work as follows:
The first entity defined in the model is work: a distinct intellectual or artistic creation.
A work is an abstract entity; there is no single material object one can point to as the work. We recognize the work through individual realizations or expressions of the work, but the work itself exists only in the commonality of content between and among the various expressions of the work.
It seems fairly clear from this description that a FRBR Work is a "conceptual resource", like an idea. In the terms of the "Cool URIs" document, it is a "real-world object", albeit an abstract one, and not a "Web document". On this basis, while a FRBR Work may be identified by an http URI, an HTTP server should not return a representation and a 200 status code in response to a GET for that URI. The server may, though, provide access (using one of the patterns provided in the Cool URIs document) to a description of the Work, a "Web document" itself identified by a distinct URI.
A similar argument can, I think, be made for the case of the FRBR Expression:
An expression is the specific intellectual or artistic form that a work takes each time it is "realized." Expression encompasses, for example, the specific words, sentences, paragraphs, etc. that result from the realization of a work in the form of a text, or the particular sounds, phrasing, etc. resulting from the realization of a musical work. The boundaries of the entity expression are defined, however, so as to exclude aspects of physical form, such as typeface and page layout, that are not integral to the intellectual or artistic realization of the work as such.
Again we're dealing with an "abstraction", albeit a more "specific", less "generic" one than a Work. On this basis, like the Work, it falls into the category of "real-world objects". Again, while an Expression may be identified by an http URI and an HTTP server may provide access to a description of an Expression, it should not provide a representation of an Expression.
In considering the other two FRBR Group 1 entity types, Manifestation and Item, it is perhaps easiest to consider the application of FRBR to physical resources and to digital resources separately.
Considering the physical world first, it is perhaps helpful to consider the Item first, as it seems to me it also sheds some light on the nature of the Manifestation. The FRBR definition of Item is very much grounded in the physical:
The entity defined as item is a concrete entity. It is in many instances a single physical object (e.g., a copy of a one-volume monograph, a single audio cassette, etc.). There are instances, however, where the entity defined as item comprises more than one physical object (e.g., a monograph issued as two separately bound volumes, a recording issued on three separate compact discs, etc.).
These Items are the "real-world objects" which traditionally libraries have been concerned with managing (acquiring, storing, maintaining, providing access to, distributing, disposing of). From the perspective of httpRange-14 and the "Cool URIs" document, then, these "real-world objects" may be described by Web documents, but they are not themselves Web documents. So a physical Item may be identified by an http URI, and an HTTP server may provide access to a description of such an Item, but it can't provide a representation of it.
Now take the case of the Manifestation:
The third entity defined in the model is manifestation: the physical embodiment of an expression of a work.
The entity defined as manifestation encompasses a wide range of materials, including manuscripts, books, periodicals, maps, posters, sound recordings, films, video recordings, CD-ROMs, multimedia kits, etc. As an entity, manifestation represents all the physical objects that bear the same characteristics, in respect to both intellectual content and physical form.
Again a Manifestation is dealing with physical form, but furthermore, a Manifestation is still an abstraction: its role in the FRBR model is to capture characteristics that are true of a set of individual Items which "exemplify" that Manifestation (even in the case where a unique Item is the sole exemplar of a Manifestation). Seen in this light, then, I think a Manifestation also falls into the category of things which may be described by one or more Web documents, but is not itself a Web document.
In turning to the context of the digital world, I think it's worth highlighting that although the FRBR specification contains some references to "electronic resources", the coverage of digital resources in the text is very limited, and indeed the introduction acknowledges that "the dynamic nature of entities recorded in digital formats" is one of the areas that require further analysis.
It seems relatively straightforward to transfer the concepts of Work and Expression into the digital sphere, as they are independent of the form in which content is "embodied".
The question of what constitutes a FRBR Item in the digital domain is rather more difficult to pin down, particularly since the FRBR document itself focuses exclusively on the physical in its discussion of the Item. Ingbert Floyd and Allen Renear take on this challenge in their poster, "What Exactly is an Item in the Digital World?" (ASIST, 2007).
In the physical world, they argue, the thing which carries information is the same thing for which information managers typically describe characteristics such as provenance, condition, and access restrictions - the attributes of the Item in FRBR. In the digital world, this is no longer true: information is carried by the physical state of some component of a computer system, something the authors call an instance of "patterned matter and energy" (PME) - but information managers rarely concern themselves with managing such entities and recording their attributes. Entities such as a "file", however, are the focus of management and description - but a digital "file" isn't really the "concrete entity" that FRBR calls an Item. Two approaches to the Item are possible, then: the Item-as-PME approach, which "maintains that a fundamental aspect of being an item is being a concrete physical thing", or the Item-as-"file" approach, which adopts the pragmatic position that "items are the things, whatever their nature (physical, abstract, or metaphorical), which play the role in bibliographic control that FRBR assigns to items".
The question I'm posing here is, I think, a different, and narrower, one from the broader one grappled with by Floyd and Renear: if we are treating a FRBR Item as a Web resource, for the case of an exemplar of a Manifestation in digital format, is that resource an "information resource", for which a representation can be served? From the Web Architecture perspective, it seems to me that for such Items it is the case that "all of their essential characteristics can be conveyed in a message". The Scholarly Works Application Profile takes this approach: the copy of a PDF document available from an institutional repository server, or the copy of an mp3 file constituting an episode of a podcast, is the FRBR Item. These, after all, are the things which "play the role in bibliographic control that FRBR assigns to items".
A further issue here is that FRBR lists "Access Address (URL)" as an attribute of a Manifestation, rather than of an Item, and I'm not sure whether this is compatible with the SWAP approach.
The concept of Manifestation in the digital case seems the most difficult to categorise. On the one hand, as noted above, FRBR states that a Manifestation is an abstraction corresponding to a set of objects with the same characteristics of both form and content. On the other hand, it seems to me that one could argue that for Manifestations in digital form, it is true that "all of their essential characteristics can be conveyed in a message", since the notion of Manifestation encapsulates that of specific intellectual content "embodied" in a specific form. For consistency with the physical case, I guess the former would be best, but I'm not sure on this point.
So those rather lengthy musings might suggest the following (tentative, I hasten to add... I'm mostly just trying to think through my rationale here) approach to identifying and serving representations/descriptions of the FRBR entities, at least using the approach that SWAP takes to the Item:
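|Entity Type|HTTP Behaviour|
|Work|Identify using http URI. Do not provide representation; provide access to a description.|
|Expression|Identify using http URI. Do not provide representation; provide access to a description.|
|Manifestation (physical)|Identify using http URI. Do not provide representation; provide access to a description.|
|Manifestation (digital)|Identify using http URI. Do not provide representation; provide access to a description (tentatively, for consistency with the physical case).|
|Item (physical)|Identify using http URI. Do not provide representation; provide access to a description.|
|Item (digital)|Identify using http URI. Provide representation of Item. (Respond to GET with 200 and representation).|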
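As a rough sketch of how a server might apply the approach summarised above, here is a small Python function that decides the shape of the response to a GET. The entity-type labels, the `plan_response` name and the `/description` URI suffix are all hypothetical, invented for illustration. I've used the 303 redirect pattern from the Cool URIs document, though the hash URI pattern would be an equally valid choice, and I've followed my own tentative preference of treating a digital Manifestation like a physical one.

```python
from typing import NamedTuple, Optional


class PlannedResponse(NamedTuple):
    status: int
    location: Optional[str]  # set only when redirecting to a description


# Entity types for which no representation is served (hypothetical labels).
# Including "manifestation-digital" here is a tentative choice, not a given.
DESCRIBED_ONLY = {
    "work",
    "expression",
    "manifestation-physical",
    "manifestation-digital",
    "item-physical",
}


def plan_response(entity_type: str, uri: str) -> PlannedResponse:
    """Decide how to answer a GET for the URI of a FRBR Group 1 entity."""
    if entity_type == "item-digital":
        # SWAP approach: the digital Item is an information resource.
        return PlannedResponse(200, None)
    if entity_type in DESCRIBED_ONLY:
        # 303 See Other, pointing at a Web document describing the entity.
        # The "/description" suffix is an assumption made for this sketch.
        return PlannedResponse(303, uri.rstrip("/") + "/description")
    raise ValueError(f"unknown entity type: {entity_type!r}")
```

For example, `plan_response("work", "http://example.org/works/1")` yields a 303 with the location `http://example.org/works/1/description`, while a digital Item yields a plain 200.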
One final point. The use of HTTP content negotiation on the Web introduces a dimension which I'm not sure sits very easily within the FRBR model. Using content negotiation, I may decide to expose a single resource on the Web, using a single URI, but configure my server so that, at any point in time, depending on a range of factors (the preferences of the client, the IP address of the client, etc.), it returns different representations of that resource: representations which may vary by (amongst other things) media type (format) or language. From the FRBR perspective, such variations would, I think, result in the creation of different Manifestations (for the media type case) or even different Expressions (for the language case). In the SWAP case, I think the implication is that Item representations should not vary, at least by media type or language.
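To make the content negotiation point concrete, here is a deliberately simplified Python sketch of a server choosing between variants of a single resource on the basis of the client's Accept header. The function names are my own, and the parsing is an assumption-laden shortcut: it honours q-values but ignores wildcards such as `*/*` and any other media type parameters, so it is not a full implementation of HTTP negotiation.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q-value) pairs."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not wanted"
        preferences.append((media_type, quality))
    return preferences


def choose_variant(header: str, available: List[str]) -> Optional[str]:
    """Return the available media type the client prefers most, if any."""
    best, best_quality = None, 0.0
    for media_type, quality in parse_accept(header):
        if media_type in available and quality > best_quality:
            best, best_quality = media_type, quality
    return best
```

The same URI might then yield a PDF for one client and HTML for another. In FRBR terms, each of those media types would, on my reading, correspond to a different Manifestation, which is exactly why a single negotiated resource is hard to line up with a single FRBR Item.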