Via Twitter, initially in a post by Lorcan Dempsey, I came across the work of Herbert Van de Sompel and his comrades from LANL and Old Dominion University on the Memento project:
The project has since been the topic of an article in New Scientist.
The technical details of the Memento approach are probably best summarised in the paper "Memento: Time Travel for the Web", and Herbert has recently made available a presentation which I'll embed here, since it includes some helpful graphics illustrating some of the messaging in detail:
Memento seeks to take advantage of the Web Architecture concept that interactions on the Web are concerned with exchanging representations of resources. And for any single resource, representations may vary - at a single point in time, variant representations may be provided, e.g. in different formats or languages, and over time, variant representations may be provided reflecting changes in the state of the resource. The HTTP protocol incorporates a feature called content negotiation which can be used to determine the most appropriate representation of a resource - typically according to variables such as content type, language, character set or encoding. The innovation that Memento brings to this scenario is the proposition that content negotiation may also be applied to the axis of date-time. i.e. in the same way that a client might express a preference for the language of the representation based on a standard request header, it could also express a preference that the representation should reflect resource state at a specified point in time, using a custom accept header (X-Accept-Datetime).
More specifically, Memento uses a flavour of content negotiation called "transparent content negotiation" where the server provides details of the variant representations available, from which the client can choose. Slides 26-50 in Herbert's presentation above illustrate how this technique might be applied to two different cases: one in which the server to which the initial request is sent is itself capable of providing the set of time-variant representations, and a second in which that server does not have those "archive" capabilities but redirects to (a URI supported by) a second server which does.
This does seem quite an ingenious approach to the problem, and one that potentially has many interesting applications, several of which Herbert alludes to in his presentation.
What I want to focus on here is the technical approach, which did raise a question in my mind. And here I must emphasise that I'm really just trying to articulate a question that I've been trying to formulate and answer for myself: I'm not in a position to say that Memento is getting anything "wrong", just trying to compare the Memento proposition with my understanding of Web architecture and the HTTP protocol, or at least the use of that protocol in accordance with the REST architectural style, and understand whether there are any divergences (and if there are, what the implications are).
In his dissertation in which he defines the REST architectural style, Roy Fielding defines a resource as follows:
More precisely, a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers. A resource can map to the empty set, which allows references to be made to a concept before any realization of that concept exists -- a notion that was foreign to most hypertext systems prior to the Web. Some resources are static in the sense that, when examined at any time after their creation, they always correspond to the same value set. Others have a high degree of variance in their value over time. The only thing that is required to be static for a resource is the semantics of the mapping, since the semantics is what distinguishes one resource from another.
On representations, Fielding says the following, which I think is worth quoting in full. The emphasis in the first and last sentences is mine.
REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.
A representation consists of data, metadata describing the data, and, on occasion, metadata to describe the metadata (usually for the purpose of verifying message integrity). Metadata is in the form of name-value pairs, where the name corresponds to a standard that defines the value's structure and semantics. Response messages may include both representation metadata and resource metadata: information about the resource that is not specific to the supplied representation.
Control data defines the purpose of a message between components, such as the action being requested or the meaning of a response. It is also used to parameterize requests and override the default behavior of some connecting elements. For example, cache behavior can be modified by control data included in the request or response message.
Depending on the message control data, a given representation may indicate the current state of the requested resource, the desired state for the requested resource, or the value of some other resource, such as a representation of the input data within a client's query form, or a representation of some error condition for a response. For example, remote authoring of a resource requires that the author send a representation to the server, thus establishing a value for that resource that can be retrieved by later requests. If the value set of a resource at a given time consists of multiple representations, content negotiation may be used to select the best representation for inclusion in a given message.
So at a point in time t1, the "temporally varying membership function" maps to one set of values, and - in the case of a resource whose representations vary over time - at another point in time t2, it may map to another, different set of values. To take a concrete example, suppose at the start of 2009, I launch a "quote of the day", and I define a single resource that is my "quote of the day", to which I assign the URI http://example.org/qotd/. And I provide variant representations in XHTML and plain text. On 1 January 2009 (time t1), my quote is "From each according to his abilities, to each according to his needs", and I provide variant representations in those two formats, i.e. the set of values for 1 January 2009 is those two documents. On 2 January 2009 (time t2), my quote is "Those who do not move, do not notice their chains", and again I provide variant representations in those two formats, i.e. the set of values for 2 January 2009 (time t2) is two XHTML and plain text documents with different content from those provided at time t1.
So, moving on to that second piece of text I cited, my interpretation of the final sentence as it applies to HTTP (and, as I say, I could be wrong about this) would be that the RESTful use of the HTTP GET method is intended to retrieve a representation of the current state of the resource. It is the value set at that point in time which provides the basis for negotiation. So, in my example here, on 1 January 2009, I offer XHTML and plain text versions of my "From each according to his abilities..." quote via content negotiation, and on 2 January 2009, I offer XHTML and plain text versions of my "Those who do not move..." quotations. i.e. At two different points in time t1 and t2, different (sets of) representations may be provided for a single resource, reflecting the different state of that resource at those two different points in time, but at either of those points in time, the expectation is that each representation of the set available represents the state of the resource at that point in time, and only members of that set are available via content negotiation. So although representations may vary by language, content-type etc, they should be in some sense "equivalent" (Roy Fielding's term) in terms of their representation of the current state of the resource.
I think the Memento approach suggests that on 2 January 2009, I could, using the date-time-based negotiation convention, offer all four of those variants listed above (and on each day into the future, a set which increases in membership as I add new quotes). But it seems to me that is at odds with the REST style, because the Memento approach requires that representations of different states of the resource (i.e. the state of the resource at different points in time) are all made available as representations at a single point in time.
I appreciate that (even if my interpretation is correct, which it may not be) the constraints specified by the REST architectural style are just that: a set of constraints which, if observed, generate certain properties/characteristics in a system. And if some of those constraints are relaxed or ignored, then those properties change. My understanding is not good enough to pinpoint exactly what the implications of this particular point of divergence (if indeed it is one!) would be - though as Herbert notes in hs presentation, it would appear that there would be implications for cacheing.
But as I said, I'm really just trying to raise the questions which have been running around my head and which I haven't really been able to answer to my own satisfaction.
As an aside, I think Memento could probably achieve quite similar results by providing some metadata (or a link to another document providing that metadata) which expressed the relationships between the time-variant resource and all the time-specific variant resources, rather than seeking to manage this via HTTP content negotiation.
Postscript: I notice that, in the time it has taken me to draft this post, Mark Baker has made what I think is a similar point in a couple of messages (first, second) to the W3C public-lod mailing list.
Recent Comments