November 23, 2009

Memento and negotiating on time

Via Twitter, initially in a post by Lorcan Dempsey, I came across the work of Herbert Van de Sompel and his comrades from LANL and Old Dominion University on the Memento project.

The project has since been the topic of an article in New Scientist.

The technical details of the Memento approach are probably best summarised in the paper "Memento: Time Travel for the Web", and Herbert has recently made available a presentation which I'll embed here, since it includes some helpful graphics illustrating some of the messaging in detail:

Memento seeks to take advantage of the Web Architecture concept that interactions on the Web are concerned with exchanging representations of resources. And for any single resource, representations may vary - at a single point in time, variant representations may be provided, e.g. in different formats or languages, and over time, variant representations may be provided reflecting changes in the state of the resource. The HTTP protocol incorporates a feature called content negotiation which can be used to determine the most appropriate representation of a resource - typically according to variables such as content type, language, character set or encoding. The innovation that Memento brings to this scenario is the proposition that content negotiation may also be applied to the axis of date-time: in the same way that a client might express a preference for the language of the representation based on a standard request header, it could also express a preference that the representation should reflect resource state at a specified point in time, using a custom accept header (X-Accept-Datetime).
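
To make that concrete, here is a rough client-side sketch (in Python, not anything from the Memento project itself) of what such a request might look like; the resource URI is invented, and the header name and date format simply follow the examples in the paper rather than any finalised standard:

import urllib.request

uri = "http://example.org/qotd/"   # hypothetical time-varying resource

request = urllib.request.Request(uri)
# Ordinary content negotiation: preferred format and language...
request.add_header("Accept", "application/xhtml+xml")
request.add_header("Accept-Language", "en")
# ...plus the additional axis Memento proposes: a preferred point in time,
# expressed here as an HTTP-date (following the examples in the paper).
request.add_header("X-Accept-Datetime", "Thu, 01 Jan 2009 12:00:00 GMT")

with urllib.request.urlopen(request) as response:
    # A Memento-aware server would respond with (or redirect to) a
    # representation reflecting the resource's state at, or close to, the
    # requested date-time; an ordinary server would simply ignore the
    # unrecognised header and return the current representation.
    print(response.status, response.headers.get("Content-Type"))
    body = response.read()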

More specifically, Memento uses a flavour of content negotiation called "transparent content negotiation" where the server provides details of the variant representations available, from which the client can choose. Slides 26-50 in Herbert's presentation above illustrate how this technique might be applied to two different cases: one in which the server to which the initial request is sent is itself capable of providing the set of time-variant representations, and a second in which that server does not have those "archive" capabilities but redirects to (a URI supported by) a second server which does.
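
And for the second of those cases, here is a very rough server-side sketch (assuming a Flask-style handler; the archive endpoint and URI pattern are invented purely for illustration, and this is my reading of the flow rather than anything specified by Memento):

from flask import Flask, redirect, request

app = Flask(__name__)

# Hypothetical archive endpoint capable of date-time negotiation over the
# set of archived representations of a given original resource.
ARCHIVE = "http://archive.example.org/timegate/"

@app.route("/qotd/")
def quote_of_the_day():
    if request.headers.get("X-Accept-Datetime"):
        # This server keeps no archive of its own, so it hands the
        # time-specific request off to the archive server, which can
        # complete the negotiation (the URI pattern here is invented).
        return redirect(ARCHIVE + "http://example.org/qotd/", code=302)
    # No date-time preference expressed: serve the current representation
    # as usual (negotiation over format, language etc. omitted here).
    return "Current state of the resource", 200, {"Content-Type": "text/plain"}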

This does seem quite an ingenious approach to the problem, and one that potentially has many interesting applications, several of which Herbert alludes to in his presentation.

What I want to focus on here is the technical approach, which did raise a question in my mind. And here I must emphasise that I'm really just trying to articulate a question that I've been trying to formulate and answer for myself: I'm not in a position to say that Memento is getting anything "wrong". I'm simply trying to compare the Memento proposition with my understanding of Web architecture and the HTTP protocol (or at least the use of that protocol in accordance with the REST architectural style), and to understand whether there are any divergences (and, if there are, what the implications might be).

In his dissertation in which he defines the REST architectural style, Roy Fielding defines a resource as follows:

More precisely, a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers. A resource can map to the empty set, which allows references to be made to a concept before any realization of that concept exists -- a notion that was foreign to most hypertext systems prior to the Web. Some resources are static in the sense that, when examined at any time after their creation, they always correspond to the same value set. Others have a high degree of variance in their value over time. The only thing that is required to be static for a resource is the semantics of the mapping, since the semantics is what distinguishes one resource from another.

On representations, Fielding says the following, which I think is worth quoting in full. The emphasis in the first and last sentences is mine.

REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.

A representation consists of data, metadata describing the data, and, on occasion, metadata to describe the metadata (usually for the purpose of verifying message integrity). Metadata is in the form of name-value pairs, where the name corresponds to a standard that defines the value's structure and semantics. Response messages may include both representation metadata and resource metadata: information about the resource that is not specific to the supplied representation.

Control data defines the purpose of a message between components, such as the action being requested or the meaning of a response. It is also used to parameterize requests and override the default behavior of some connecting elements. For example, cache behavior can be modified by control data included in the request or response message.

Depending on the message control data, a given representation may indicate the current state of the requested resource, the desired state for the requested resource, or the value of some other resource, such as a representation of the input data within a client's query form, or a representation of some error condition for a response. For example, remote authoring of a resource requires that the author send a representation to the server, thus establishing a value for that resource that can be retrieved by later requests. If the value set of a resource at a given time consists of multiple representations, content negotiation may be used to select the best representation for inclusion in a given message.

So at a point in time t1, the "temporally varying membership function" maps to one set of values, and - in the case of a resource whose representations vary over time - at another point in time t2, it may map to another, different set of values. To take a concrete example, suppose at the start of 2009 I launch a "quote of the day", and I define a single resource that is my "quote of the day", to which I assign the URI http://example.org/qotd/, and for which I provide variant representations in XHTML and plain text. On 1 January 2009 (time t1), my quote is "From each according to his abilities, to each according to his needs", and I provide variant representations in those two formats, i.e. the set of values for 1 January 2009 is those two documents. On 2 January 2009 (time t2), my quote is "Those who do not move, do not notice their chains", and again I provide variant representations in those two formats, i.e. the set of values for 2 January 2009 is a pair of XHTML and plain text documents with different content from those provided at time t1.
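
Expressed as data, the "temporally varying membership function" for this little example might look something like the following sketch (Python, using the fictitious dates and documents from above):

# The single resource http://example.org/qotd/ maps, at each point in time,
# to a set of "equivalent" representations (here, an XHTML variant and a
# plain text variant of the same quote).
value_sets = {
    "2009-01-01": {   # time t1
        "application/xhtml+xml": "<p>From each according to his abilities, to each according to his needs</p>",
        "text/plain": "From each according to his abilities, to each according to his needs",
    },
    "2009-01-02": {   # time t2
        "application/xhtml+xml": "<p>Those who do not move, do not notice their chains</p>",
        "text/plain": "Those who do not move, do not notice their chains",
    },
}

def current_representation(today, accept="text/plain"):
    # Ordinary content negotiation chooses among the variants that make up
    # the resource's value set *today*, not those of earlier states.
    return value_sets[today].get(accept)

print(current_representation("2009-01-02", "text/plain"))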

So, moving on to that second piece of text I cited, my interpretation of the final sentence as it applies to HTTP (and, as I say, I could be wrong about this) would be that the RESTful use of the HTTP GET method is intended to retrieve a representation of the current state of the resource. It is the value set at that point in time which provides the basis for negotiation. So, in my example here, on 1 January 2009, I offer XHTML and plain text versions of my "From each according to his abilities..." quote via content negotiation, and on 2 January 2009, I offer XHTML and plain text versions of my "Those who do not move..." quote. That is, at two different points in time t1 and t2, different (sets of) representations may be provided for a single resource, reflecting the different states of that resource at those two points in time; but at either of those points in time, the expectation is that each representation in the available set represents the state of the resource at that point in time, and only members of that set are available via content negotiation. So although representations may vary by language, content type etc., they should be in some sense "equivalent" (Roy Fielding's term) in terms of their representation of the current state of the resource.

I think the Memento approach suggests that on 2 January 2009, I could, using the date-time-based negotiation convention, offer all four of those variants listed above (and on each day into the future, a set which increases in membership as I add new quotes). But it seems to me that is at odds with the REST style, because the Memento approach requires that representations of different states of the resource (i.e. the state of the resource at different points in time) are all made available as representations at a single point in time.

I appreciate that (even if my interpretation is correct, which it may not be) the constraints specified by the REST architectural style are just that: a set of constraints which, if observed, generate certain properties/characteristics in a system. And if some of those constraints are relaxed or ignored, then those properties change. My understanding is not good enough to pinpoint exactly what the implications of this particular point of divergence (if indeed it is one!) would be - though as Herbert notes in his presentation, it would appear that there would be implications for caching.

But as I said, I'm really just trying to raise the questions which have been running around my head and which I haven't really been able to answer to my own satisfaction.

As an aside, I think Memento could probably achieve quite similar results by providing some metadata (or a link to another document providing that metadata) which expressed the relationships between the time-variant resource and all the time-specific variant resources, rather than seeking to manage this via HTTP content negotiation.
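
By way of illustration only, such metadata might look something like the following sketch (built with rdflib; the dated URIs are invented, and dcterms:hasVersion/dcterms:issued are just one plausible choice of properties, not anything Memento itself specifies):

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, XSD

g = Graph()
qotd = URIRef("http://example.org/qotd/")

for date in ("2009-01-01", "2009-01-02"):
    # Hypothetical URI for the archived, time-specific variant resource.
    snapshot = URIRef("http://example.org/qotd/" + date)
    g.add((qotd, DCTERMS.hasVersion, snapshot))
    g.add((snapshot, DCTERMS.issued, Literal(date, datatype=XSD.date)))

print(g.serialize(format="turtle"))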

Postscript: I notice that, in the time it has taken me to draft this post, Mark Baker has made what I think is a similar point in a couple of messages (first, second) to the W3C public-lod mailing list.

November 20, 2009

COI guidance on use of RDFa

Via a post from Mark Birbeck, I notice that the UK Central Office of Information has published some guidelines called Structuring information on the Web for re-usability, which include some guidance on the use of RDFa to provide specific types of information, about government consultations and about job vacancies.

This is exciting news: as far as I know, this is the first formal document from UK central government to provide this sort of quite detailed, resource-type-specific guidance with recommendations on the use of particular RDF vocabularies - guidance of the sort that I think will be an essential component in the effective deployment of RDFa and the Linked Data approach. It's also the sort of thing that is of considerable interest to Eduserv, as a developer of Web sites for several government agencies. The document builds directly on the work Mark has been doing in this area, which I mentioned a while ago.

As Mark notes in his post, the document is unequivocal in its expression of the government's commitment to the Linked Data approach:

Government is committed to making its public information and data as widely available as possible. The best way to make structured information available online is to publish it as Linked Data. Linked Data makes the information easier to cut and combine in ways that are relevant to citizens.

Before the announcement of these guidelines, I had recently taken a look at the "argot" for consultations ("argot" is Mark's term for a specification of how a set of terms from multiple RDF vocabularies is used to meet some application requirement; as I noted in that earlier post, I think it is similar to what DCMI calls an "application profile"), and I had intended to submit some comments. I fear it is now somewhat late in the day for me to be doing this, but the release of this document has prompted me to write them up here. My comments are concerned primarily with the section titled "Putting consultations into Linked Data".

The guidelines (correctly, I think) establish a clear distinction between the consultation on the one hand and the Web page describing the consultation on the other, by introducing (in paragraphs 30/31) a fragment identifier for the URI of the consultation (via the about="#this" attribute). The consultation itself is also modelled as a document, an instance of the class foaf:Document, which in turn "has as parts" the actual document(s) on which comment is being sought, and for which a reply can be sent to some agent.
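
To spell that distinction out, the markup is intended to yield something like the following triples (sketched here with rdflib; the URIs are invented, dcterms:hasPart is just my guess at how "has as parts" might be expressed, and foaf:primaryTopic is my own way of relating the page to the consultation - the guidelines may use different properties):

from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS, FOAF, RDF

g = Graph()
# The Web page describing the consultation...
page = URIRef("http://example.gov.uk/consultations/example.html")
# ...and the consultation itself, distinguished by a fragment identifier
# (the about="#this" attribute in the RDFa).
consultation = URIRef("http://example.gov.uk/consultations/example.html#this")

# One way (my own illustration, not from the guidelines) of saying the
# page is about the consultation.
g.add((page, FOAF.primaryTopic, consultation))

g.add((consultation, RDF.type, FOAF.Document))
# The consultation "has as parts" the document(s) on which comment is
# sought; dcterms:hasPart is my guess at a suitable property.
g.add((consultation, DCTERMS.hasPart,
       URIRef("http://example.gov.uk/consultations/example-paper.pdf")))

print(g.serialize(format="turtle"))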

I confess that my initial "instinctive" reaction to this was that it seemed a slightly odd choice, as a "consultation" seemed to me to be more akin to an event or a process, taking place during an interval of time, which had as "inputs" a set of documents on which comments were sought, and (typically at least) resulted in the generation of some other document as a "response". And indeed the page describing the Consultation argot introduces the concept as follows (emphasis added):

A consultation is a process whereby Government departments request comments from interested parties, so as to help the department make better decisions. A consultation will usually be focused on a particular topic, and have an accompanying publication that sets the context, and outlines particular questions on which feedback is requested. Other information will include a definite start and end date during which feedback can be submitted and contact details for the person to submit feedback to.

I admit I find it difficult to square this with the notion of a "document". And I think a "consultation-as-event" (described by a Web page) could probably be modelled quite neatly using the Event Ontology or the similar LODE ontology (with some specialisation of classes and properties if required).

Anyway, I appreciate that this aspect may be something of a "design choice". So for the remainder of the comments here, I'll stick to the actual approach described by the guidelines (consultation as document).

The RDF properties recommended for the description of the consultation are drawn mainly from Dublin Core vocabularies, and more specifically from the "DC Terms" vocabulary.

The first point to note is that, as Andy noted, DCMI recently made some fairly substantive changes to the DC Terms vocabulary, as a result of which the majority of the properties are now the subject of rdfs:range assertions, which indicate whether the value of the property is a literal or a non-literal resource. The guidelines recommend the use of the publisher (paragraphs 32-37), language (paragraphs 38-39), and audience (paragraph 46) properties, all with literal values, e.g.

<span property="dc:publisher" content="Ministry of Justice"></span>

But according to the term descriptions provided by DCMI, the ranges of these properties are the classes dcterms:Agent, dcterms:LinguisticSystem and dcterms:AgentClass respectively. So I think that would require the use of an XHTML-RDFa construct something like the following, introducing a blank node (or a URI for the resource if one is available):

<div rel="dc:publisher"><span property="foaf:name" content="Ministry of Justice"></span></div>
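
For what it's worth, the difference between the two constructs shows up clearly in the triples they are intended to produce; here's a quick sketch with rdflib (the consultation URI is invented, and I'm assuming the dc: prefix in the markup is bound to the DC Terms namespace):

from rdflib import BNode, Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, FOAF

g = Graph()
# Hypothetical URI for the consultation being described.
consultation = URIRef("http://example.gov.uk/consultations/example.html#this")

# The construct recommended in the guidelines yields a literal value:
g.add((consultation, DCTERMS.publisher, Literal("Ministry of Justice")))

# The construct suggested above yields a non-literal value (a blank node,
# or a URI for the agent if one is available) carrying a foaf:name:
publisher = BNode()
g.add((consultation, DCTERMS.publisher, publisher))
g.add((publisher, FOAF.name, Literal("Ministry of Justice")))

print(g.serialize(format="turtle"))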

Second, I wasn't sure about the recommendation for the use of the dcterms:source property (paragraph 37). This is used to "indicate the source of the consultation". For the case where this is a distinct resource (i.e. distinct from the consultation and this Web page describing it), this seems OK, but the guidelines also offer the option of referring to the current document (i.e. the Web page) as the source of the consultation:

<span rel="dc:source" resource=""></span>

DCMI's definition of the property is "A related resource from which the described resource is derived", but it seems to me the Web page is acting as a description of the consultation-as-document, rather than a source of it.

Third, the guidelines recommend the use of some of the DCMI date properties (paragraph 42):

  • dcterms:issued for the publication date of the consultation
  • dcterms:available for the start date for receiving comments
  • dcterms:valid for the closing date ("a date through which the consultation is 'valid'")

I think the use of dcterms:valid here is potentially confusing. DCMI's definition is "Date (often a range) of validity of a resource", so on this basis I think the implication of the recommended usage is that the consultation is "valid" only on that date, which is not what is intended. The recommendations for dcterms:issued and dcterms:available are probably OK - though I do think the event-based approach might have helped make the distinction between dates related to documents and dates related to consultation-as-process rather clearer!

Oh dear, this must read like an awful lot of pedantic nitpicking on my part, but my intent is to try to ensure that widely used vocabularies like those provided by DCMI are used as consistently as possible. As I said at the start, I'm very pleased to see this sort of very practical guidance appearing (and I apologise to Mark for not submitting my comments earlier!).

November 18, 2009

Where does your digital identity want to go today?

This morning I had cause to revisit an identity-related 'design pattern' that I originally worked on during a workshop back in January, in readiness for a follow-up workshop tomorrow.

The pattern is concerned with the way in which personal information can be aggregated, shared and re-used between social networking sites and other tools and the moral and legal rights and responsibilities that go with that kind of activity.

I don't want to write in detail about the pattern here, since it is the topic for the workshop tomorrow and may well change significantly. What I do want to note is that in thinking about this 'aggregating' scenario I realised that there are three key roles in any scenario of this kind:

  • the subject - the person that the personal information is about
  • the creator - the person that has created the personal information
  • the aggregator - the person aggregating personal information from one or more sources into a new tool or service.

In any given instance, an individual might play more than one of these roles.  Indeed, in the original use-case which I provided to kick-start the discussion I played all three roles.  But the important thing is that in the general case, the three are often different people, each having different 'moral' and legal rights and responsibilities and different interests in how the personal information is aggregated and re-used.

To illustrate this, here is a simple, and completely fictitious, case-study:

Amy (the subject) uses Twitter to share updates with both colleagues and friends.  Concerned about cross-over between the two audiences, Amy chooses to use two Twitter accounts, one aimed at professional colleagues and the other aimed at personal friends.  Amy uses Twitter's privacy options to control who sees the tweets from her personal account.

Ben (the creator) is both a friend and colleague of Amy and is thus a follower of both Amy's Twitter accounts. On seeing a personal tweet from Amy that Ben feels would be of wider interest to his professional colleagues, Ben retweets it (thus creating a new piece of personal information about Amy), prefixing the original text with a comment containing the name of Amy's company.

Calvin (the aggregator) works for the same company as Amy and looks after the company intranet.  He decides to use a Twitter search to aggregate any tweets that contain the company name and display them on the intranet so that all staff can see what is being said about the company.

Amy's original 'private' tweet thus appears semi-publicly in front of all staff within the company.

Depending on the nature of the original private tweet, the damage done here is probably minimal but this scenario serves to illustrate the way that personal information (i.e. information that is part of Amy's digital identity) can flow in unexpected ways.

One can imagine lots of similar scenarios arising from unwanted tagged Flickr or Facebook images, re-used del.icio.us links, forwarding of private emails, and so on.

Who, if anyone, is at fault in this scenario?  Perhaps 'fault' is too strong a word?

Well, Amy is probably naive to assume that anything posted anywhere on the Internet is guaranteed to remain private.  Ben clearly should not have retweeted a tweet from Amy that was intended to remain somewhat private but in the general to-and-fro of Twitter exchanges it is probably understandable that it happened. Note that the Web interface to Twitter displays a padlock next to 'private' tweets but this is not a convention used by all Twitter clients. In general therefore, any shared knowledge that some tweets are intended to be treated more confidentially than others has to be maintained between the two people concerned outside of Twitter itself.  Calvin is simply aggregating public information in order to share it more widely within the company and it is thus not clear that he could or should do otherwise.

On that basis, any fault seems to lie with Ben.  Does Amy have any moral grounds for complaint? Against Ben... yes, probably, though as I said, the mistake is understandable in the context of normal Twitter usage.

The point here is to illustrate that currently, while many social networking tools have mechanisms for adjusting privacy settings, these are not foolproof, and the shared knowledge and conventions about the acceptable use of personal information (i.e. digital identity) typically have to be maintained outside of the particular technology in use. Further, the trust required to ensure that things don't go wrong relies on both the goodwill and good practice of all three parties concerned.

November 16, 2009

The future has arrived

About 99% of the way thru Bill Thompson's closing keynote to the CETIS 2009 Conference last week I tweeted:

great technology panorama painted by @billt in closing talk at #cetis09

And it was a great panorama - broad, interesting and entertainingly delivered. It was a good performance and I am hugely in awe of people who can give this kind of presentation. However, what the talk didn't do was move from the "this is where technology has come from, this is where it is now and this is where it is going" kind of stuff to the "and this is what it means for education in the future". Which was a shame because in questioning after his talk Thompson did make some suggestions about the future of print news media (not surprising for someone now at the BBC) and I wanted to hear similar views about the future of teaching, learning and research.

As Oleg Liber pointed out in his question after the talk, universities, and the whole education system around them, are lumbering beasts that will be very slow to change in the face of anything. On that basis, whilst it is interesting to note (for example) that we can now just about store a bit on an atom (meaning that we can potentially store a digital version of all human output on something the weight of a human body), that we can pretty much wire things directly into the human retina, and that Africa will one day overtake 'digital' Britain in the broadband stakes - all interesting propositions in their own right - there comes a "so what?" moment where one is left wondering what it actually all means.

As an aside, and on a more personal note, I suggest that my daughter's experience of university (she started at Sheffield Hallam in September) is not actually going to be very different to my own, 30-odd years ago. Lectures don't seem to have changed much. Project work doesn't seem to have changed much. Going out drinking doesn't seem to have changed much. She did meet all her 'hall' flat-mates via Facebook before she arrived in Sheffield I suppose :-) - something I never had the opportunity to do (actually, I never even got a place in hall). There is a big difference in how it is all paid for of course but the interesting question is how different university will be for her children. If the truth is, "not much", then I'm not sure why we are all bothering.

At one point, just after the bit about storing a digital version of all human output I think, Thompson did throw out the question, "...and what does that mean for copyright law?". He didn't give us an answer. Well, I don't know either to be honest... though it doesn't change the fact that creative people need to be rewarded in some way for their endeavours I guess. But the real point here is that the panorama of technological change that Thompson painted for us, interesting as it was, begs some serious thinking about what the future holds.  Maybe Thompson was right to lay out the panorama and leave the serious thinking to us?

He was surprisingly positive about Linked Data, suggesting that the time is now right for this to have a significant impact.  I won't disagree because I've been making the same point myself in various fora, though I tend not to shout it too loudly because I know that the Semantic Web has a history of not quite making it.  Indeed, the two parallel sessions that I attended during the conference, University API and the Giant Global Graph, both focused quite heavily on the kinds of resources that universities are sitting on (courses, people/expertise, research data, publications, physical facilities, events and so on) that might usefully be exposed to others in some kind of 'open' fashion.  And much of the debate, particularly in the second session (about which there are now some notes), was around whether Linked Data (i.e. RDF) is the best way to do this - a debate that we've also seen played out recently on the uk-government-data-developers Google Group.

The three primary issues seemed to be:

  • Why should we (universities) invest time and money exposing our data in the hope that people will do something useful/interesting/of value with it when we have many other competing demands on our limited resources?
  • Why should we take the trouble to expose RDF when it's arguably easier for both the owner and the consumer of the data to expose something simpler like a CSV file?
  • Why can't the same ends be achieved by offering one or more services (i.e. a set of one or more APIs) rather than the raw data itself?

In the ensuing debate about the why and the how, there was a strong undercurrent of, "two years ago SOA was all the rage, now Linked Data is all the rage... this is just a fashion thing and in two years time there'll be something else".  I'm not sure that we (or at least I) have a well honed argument against this view but, for me at least, it lies somewhere in the fit with resource-orientation, with the way the Web works, with REST, and with the Web Architecture.

On the issue of the length of time it is taking for the Semantic Web to have any kind of mainstream impact, Ian Davis has an interesting post, Make or Break for the Semantic Web?, arguing that this is not unusual for standards track work:

Technology, especially standards track work, takes years to cross the chasm from early adopters (the technology enthusiasts and visionaries) to the early majority (the pragmatists). And when I say years, I mean years. Take CSS for example. I’d characterise CSS as having crossed the chasm and it’s being used by the early majority and making inroads into the late majority. I don’t think anyone would seriously argue that CSS is not here to stay.

According to this semi-official history of CSS the first proposal was in 1994, about 13 years ago. The first version that was recognisably the CSS we use today was CSS1, issued by the W3C in December 1996. This was followed by CSS2 in 1998, the year that also saw the founding of the Web Standards Project. CSS 2.1 is still under development, along with portions of CSS3.

Paul Walk has also written an interesting post, Linked, Open, Semantic?, in which he argues that our discussions around the Semantic Web and Linked Data tend to mix up three memes (open data, linked data and semantics) in rather unhelpful ways. I tend to agree, though I worry that Paul's proposed distinction between Linked Data and the Semantic Web is actually rather fuzzier than we may like.

On balance, I feel a little uncomfortable that I am not able to offer a better argument against the kinds of anti-Linked Data views expressed above. I think I understand the issues (or at least some of them) pretty well but I don't have them to hand in a 'this is why Linked Data is the right way forward' elevator pitch.

Something to work on I guess!

[Image: a slide from Bill Thompson's closing keynote to the CETIS 2009 Conference]

November 12, 2009

Where Next for Digital Identity?

In collaboration with the three 'digital identity' projects that we funded last year, we have organised a day-long event looking at the future of digital identity.  The day will feature an invited talk by Ian Brown of the Oxford Internet Institute, followed by talks from each of the three projects (Steven Warburton, Shirley Williams and Harry Halpin), followed by an afternoon of discussion groups.

The event will be held at the British Library in January next year. Places are limited to 50.

November 05, 2009

Write to Reply

I've noted before how much I like the Write to Reply service, conceived and developed by Tony Hirst and Joss Winn:

A site for commenting on public reports in considerable detail. Texts are broken down into their respective sections for easier consumption. Rather than comment on the text as a whole, you are encouraged to direct comments to specific paragraphs.

On that basis, I am very pleased to announce that we have made available a small amount of funding for the service, initially covering the website hosting costs for the next 6 months but with a commitment to do so in some form for 2 years (whether that be through a continuation of the current hosting arrangement or by moving the content to Eduserv servers or elsewhere).

It strikes me that Write to Reply has already demonstrated its value in various fields, notably in the areas of education and government policy, and I'm sure it will continue to do so. It's one of those ideas that is rather simple and obvious in hindsight, yet very powerful in practice - give people a public space in which they can make comments on important documents and make it social enough that commenting feels more like having a conversation than simply annotating a text.  Good stuff.  Long may it continue.

November 04, 2009

"Simple DC" Revisited

In a recent post I outlined a picture of Simple Dublin Core as a pattern for constructing DC description sets (a Description Set Profile), in which statements referred to one of the fifteen properties of the Dublin Core Metadata Element Set, with a literal value surrogate in which syntax encoding schemes were not permitted.

While I think this is a reasonable reflection of the informal descriptions of "Simple DC" provided in various DCMI documents, this approach does tend to give primacy to a pattern based solely on the use of literal values. It also ignores the important work done more recently by DCMI in "modernising" its vocabularies, emphasising the distinction between literal and other values, and reflecting that in the formal RDFS descriptions of its terms.

In a document presented to a DCMI Usage Board meeting at DC-2007 in Singapore, an alternative, "modern" approach to "Simple DC" was proposed by Mikael Nilsson and Tom Baker. (I don't have a current URI for the particular document, but it is part of the "meeting packet"). That proposal suggested a view of "Simple DC" as a DSP (actually, it proposed a DCAP, but I'm focussing here on the "structural constraints" component) in which the properties referenced are not the "original" fifteen properties of the DCMES, but rather the fifteen new properties added to the "DC Terms" collection as part of that modernisation exercise:

  • A description set must contain exactly one description (Description Template: Minimum occurrence constraint = 1; Maximum occurrence constraint = 1)
  • That description may be of a resource of any type (Description Template: Resource class constraint: none (default))
  • For each statement in that description, the type of value surrogate supported depends on the range of the property:
    • For the following property URIs: dcterms:title, dcterms:identifier, dcterms:date, dcterms:description (Statement Template: Property List constraint):
      • There may be no such statement; there may be many (Statement Template: Minimum occurrence constraint = 0; Maximum occurrence constraint = unbounded)
      • A literal value surrogate is required (Statement Template: Type constraint = literal)
      • Within that literal value surrogate, the use of a syntax encoding scheme URI is not permitted (Statement Template/Literal Value: Syntax Encoding Scheme Constraint = disallowed)
    • For the following property URIs: dcterms:creator, dcterms:contributor, dcterms:publisher, dcterms:type, dcterms:language, dcterms:format, dcterms:source, dcterms:relation, dcterms:subject, dcterms:coverage, dcterms:rights (Statement Template: Property List constraint):
      • There may be no such statement; there may be many (Statement Template: Minimum occurrence constraint = 0; Maximum occurrence constraint = unbounded)
      • A non-literal value surrogate is required (Statement Template: Type constraint = non-literal)
      • Within that non-literal value surrogate
        • the use of a value URI is not permitted (Statement Template/Non-Literal Value: Value URI Constraint = disallowed)
        • the use of a vocabulary encoding scheme URI is not permitted (Statement Template/Non-Literal Value: Vocabulary Encoding Scheme Constraint = disallowed)
        • a single value string is required (Statement Template/Non-Literal Value/Value String: Minimum occurrence constraint = 1; Maximum occurrence constraint = 1) and the use of a syntax encoding scheme URI is not permitted (Statement Template/Non-Literal Value/Value String: Syntax Encoding Scheme Constraint = disallowed)
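
Purely as a reading aid, the constraints above could be captured in a simple data structure along the following lines (this is an ad hoc encoding of my own, in Python, not DCMI's DSP syntax):

# An ad hoc encoding of the "modern Simple DC" constraints listed above,
# purely as a reading aid; this is not DCMI's DSP XML syntax.
LITERAL_PROPERTIES = [
    "dcterms:title", "dcterms:identifier", "dcterms:date", "dcterms:description",
]
NON_LITERAL_PROPERTIES = [
    "dcterms:creator", "dcterms:contributor", "dcterms:publisher",
    "dcterms:type", "dcterms:language", "dcterms:format", "dcterms:source",
    "dcterms:relation", "dcterms:subject", "dcterms:coverage", "dcterms:rights",
]

SIMPLE_DC_DSP = {
    "description_template": {
        "min_occur": 1, "max_occur": 1,      # exactly one description
        "resource_class": None,              # a resource of any type
        "statement_templates": [
            {
                "properties": LITERAL_PROPERTIES,
                "min_occur": 0, "max_occur": None,   # None = unbounded
                "value_surrogate": "literal",
                "syntax_encoding_scheme": "disallowed",
            },
            {
                "properties": NON_LITERAL_PROPERTIES,
                "min_occur": 0, "max_occur": None,
                "value_surrogate": "non-literal",
                "value_uri": "disallowed",
                "vocabulary_encoding_scheme": "disallowed",
                "value_string": {
                    "min_occur": 1, "max_occur": 1,
                    "syntax_encoding_scheme": "disallowed",
                },
            },
        ],
    },
}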

This pattern seeks to combine the simplicity of use of the "traditional" "Simple DC" approach of using only 15 properties with a recognition of the value of using literal and non-literal values as appropriate for each property. However, it is, by definition, slightly more complex than the "all literal values" pattern outlined in the earlier post, and it differs from the patterns described informally in existing DCMI documentation (and I think it would be difficult to argue that it can be represented using formats like the oai_dc XML format, which of course predated the creation of the new properties by several years).

This does not have to be an either/or choice. It may well be that there is a use for both patterns, and if they are clearly named (I don't really care what they are called as long as the names are different!) and documented, there is no reason why two such DSPs should not co-exist.

Having said all that, I'd just re-emphasise that I think both of these patterns are fairly limited in the sort of functionality they can support. It seems to me the notion of "Simple DC" emerged at a time when the emphasis was still very much on the indexing and searching of textual values, and it largely ignores the Web principle of making links between resources. It would be difficult to categorise "Simple DC" - in either of the forms suggested - as a "linked data" friendly approach. I fear a lot of effort has been spent trying to build services on the basis of "Simple DC" when it may have been more appropriate to recognise the inherent limitations of that approach, and to focus instead on richer patterns designed from the outset to support more complex functions.

P.S. I know, I know, I promised a post on "Qualified DC". It's on its way....
