October 25, 2010

A few brief thoughts on iTunesU

The use of iTunesU by UK universities has come up in discussions a couple of times recently, on Brian Kelly's UK Web Focus blog (What Are UK Universities Doing With iTunesU? and iTunes U: an Institutional Perspective) and on the closed ALT-C discussion list. In both cases, as has been the case in previous discussions, my response has been somewhat cautious, an attitude that always seems to be interpreted as outright hostility for some reason.

So, just for the record, I'm not particularly negative about iTunesU and in some respects I am quite positive - if nothing else, I recognise that the adoption of iTunesU is a very powerful motivator for the generation of openly available content and that has got to be a good thing - but a modicum of scepticism is always healthy in my view (particularly where commercial companies are involved) and I do have a couple of specific concerns about the practicalities of how it is used:

  • Firstly, that students who do not own Apple hardware and/or who choose not to use iTunes on the desktop are not disenfranchised in any way (e.g. by having to use a less functional Web interface). In general, the response to this is that they are not and, in the absence of any specific personal experience either way, I have to concede that to be the case.
  • Secondly (and related to the first point), that in an environment where most of the emphasis seems to be on the channel (iTunesU) rather than on the content (the podcasts), confusion isn't introduced as to how material is cited and referred to – i.e. do some lecturers only ever refer to 'finding stuff on iTunesU', while others offer a non-iTunesU Web URL, and others still remember to cite both? I'm interested in whether universities that have adopted iTunesU but also make the material available in other ways have managed to adopt a single way of citing the material on offer.

Both these concerns relate primarily to the use of iTunesU as a distribution channel for teaching and learning content within the institution. They apply much less to its use as an external 'marketing' channel. iTunesU seems to me (based on a gut feel more than on any actual numbers) to be a pretty effective way of delivering OER outside the institution and to have a solid 'marketing' win on the back of that. That said, it would be good to have some real numbers as confirmation (note that I don't just mean numbers of downloads here - I mean conversions into 'actions' (new students, new research opportunities, etc.)). Note that I also don't consider 'marketing' to be a dirty word (in this context) - actually, I guess this kind of marketing is going to become increasingly important to everyone in the HE sector.

There is a wider, largely religious, argument about whether "if you are not paying for it, you aren't the customer, you are part of the product" but HE has been part of the MS product for a long while now and, worse, we have paid for the privilege – so there is nothing particularly new there. It's not an argument that particularly bothers me one way or the other, provided that universities have their eyes open and understand the risks as well as the benefits. In general, I'm sure that they do.

On the other hand, while somebody always owns the channel, some channels seem to me to be more 'open' (I don't really want to use the word 'open' here because it is so emotive but I can't think of a better one) than others. So, for example, I think there are differences in an institution adopting YouTube as a channel as compared with adopting iTunesU as a channel and those differences are largely to do with the fit that YouTube has with the way the majority of the Web works.

May 05, 2010

RDFa for the Eduserv Web site

Another post that I've been intermittently chiselling away at in the draft pile for a while... A few weeks ago, I was asked by Lisa Price, our Website Communications Manager, to make some suggestions as to how Eduserv might make use of the RDFa in XHTML syntax to embed structured data in pages on the Eduserv Web site, which is currently in the process of being redesigned. I admit this is coming mostly from the starting point of wanting to demonstrate the use of the technology rather than from a pressing use case, but OTOH there is growing interest in RDFa amongst some of Eduserv's public sector clients, so a spot of "eating our own dogfood" would be a Good Thing, and furthermore there are signs of gradual but significant adoption of RDFa by some major Web service providers.

It seems to me Eduserv might use RDFa to describe, or make assertions about:

  • (Perhaps rather trivially) Web pages themselves, i.e. reformulating the (fairly limited) "document metadata" we supply as RDFa.
  • (Perhaps rather more interestingly) some of the "things" that Eduserv pages "are about", or that get mentioned in those pages (e.g. persons, organisations, activities, events, topics of interest, etc).

Within that category of data about "things", we need to decide which data it is most useful to expose. We could:

  • look at those classes of data that are processed by tools/services that currently make use of RDFa (typically using specified RDF vocabularies); or
  • focus on data that we know already exists in a "structured" form but is currently presented in X/HTML either only in human-readable form or using microformats (or even new data which isn't currently surfaced at all on the current site)

Another consideration was the question of whether data was covered by existing models and vocabularies or required some analysis and modelling.

To be honest, there's a fairly limited amount of "structured" information on the site currently. There is some data on licence agreements for software and data, currently made available as HTML tables and Excel spreadsheets. While I think some of the more generic elements of this might be captured using a product/service ontology such as Good Relations, the licence-specific aspects would require some additional modelling. For the short term at least, we've taken a somewhat "pragmatic" approach and focused mainly on that first class of data for which there are some identifiable consuming applications, based on the use of specified RDF vocabularies - and more specifically on data that Google and Yahoo make particular reference to in their documentation for creators/publishers of Web pages.
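To illustrate the Good Relations thought above - and this is purely a sketch, not something we've implemented, with all of the URIs and labels below made up - the generic "offering" aspects of a licence agreement might look something like this in Turtle:

@prefix gr:   <http://purl.org/goodrelations/v1#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/eduserv/> .

# The organisation making the offering
ex:eduserv a gr:BusinessEntity ;
  gr:legalName "Eduserv" ;
  gr:offers ex:licence-offering .

# The generic aspects of a licence agreement as an offering; the
# licence-specific terms would still require additional modelling
ex:licence-offering a gr:Offering ;
  rdfs:label "Software licence agreement (illustrative example)" .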

That's not to say there won't be more use of RDFa on the site in the future: at the moment, this is something of a "dipping toes in the water" exercise, I think.

The following is my best effort to summarise Google and Yahoo support for RDFa at the time of writing. Please note that this is something which is evolving - as I was writing up this post, I noticed that the Google guidelines had changed slightly since I sent my initial notes to Lisa. And I'm still not at all sure I've captured the complete picture here, so please do check their current documentation for content providers to get an idea of the current state of play.

Google and RDFa

Google's support for RDFa is part of a larger programme of support for structured data embedded in X/HTML that they call "rich snippets" (announced here), which includes support for RDFa, microformats and microdata. (The latter, I think, is a relatively recent addition).

Google's functionality extends to extracting specified categories of RDFa data from (some) pages it indexes and displaying that data in search result sets (and in Place Pages in Google Maps). It also provides access to the data in its Custom Search platform.

Initially at least, Google required the use of its own RDF vocabularies, which attracted some criticism (see e.g. Ian Davis' response), but it appears to have fairly quietly introduced some support for other RDF vocabularies. "In addition to the Person RDFa format, we have added support for the corresponding fields from the FOAF and vCard vocabularies for all those of you who asked for it." And Martin Hepp has pointed to Google displaying data encoded using the Good Relations product/service ontology.

The nature of the RDFa syntax is such that it is often fairly straightforward to use multiple RDF vocabularies in RDFa e.g. triples using the same subject and object but different predicates can be encoded using a single RDFa attribute with multiple white-space-separated CURIEs - though things do tend to get more messy if the vocabularies are based on different models (e.g. time periods as literals v time periods as resources with properties of their own).
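For example, the FOAF and vCard names mentioned above can be generated from a single element. This is a made-up fragment, not taken from the Eduserv site:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
      version="XHTML+RDFa 1.0">

  <head>
    <title>Example</title>
  </head>

  <body>
    <!-- Two triples from one element: the same subject and object,
         with both the foaf:name and vcard:fn predicates -->
    <p about="http://example.org/person/jbloggs">
      Name: <span property="foaf:name vcard:fn">Jo Bloggs</span>
    </p>
  </body>

</html>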

Google provides specific recommendations to content creators on the embedding of data to describe:

Yahoo and RDFa

Yahoo's support for RDFa is through its SearchMonkey platform. Like Google, it provides a set of "standard" result set enhancements, based on the use of specified RDF vocabularies for a small set of resource types:

In addition, my understanding is that although Yahoo defines some RDF vocabularies of its own, and describes the use of specified vocabularies in the guidelines for the resource types above, it exposes any RDFa data in pages it indexes to developers on its SearchMonkey platform, to allow the building of custom search enhancements. Several existing vocabularies are discussed in the SearchMonkey guide, and the FAQ in Appendix D of that document notes "You may use any RDF or OWL vocabulary".

Linked Data

The decentralised extensibility built into RDF means that a provider can choose to extend what data they expose beyond that specified in the guidelines mentioned above.

In addition, I tried to take into account some other general "good practice" points that have emerged from the work of the Linked Data community, captured in sources such as:

So in the Eduserv case, for example (I hope!), URIs will be assigned to "things" like events, distinct from the pages describing them, with suitable redirects put in place on the HTTP server and suitable triples in the data linking those things and the corresponding pages.
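By way of a sketch (with hypothetical URIs - the real ones are still to be decided), the data might link an event and the page describing it like this, with the HTTP server 303-redirecting requests for the former to the latter:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# The "thing" (an event), distinct from the page describing it;
# a GET on the event URI would be 303-redirected to the page URI
<http://example.org/id/event/symposium2010>
  foaf:isPrimaryTopicOf <http://example.org/doc/event/symposium2010> .

# ...and the inverse link, from the page back to the thing
<http://example.org/doc/event/symposium2010>
  foaf:primaryTopic <http://example.org/id/event/symposium2010> .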

Summary

Anyway, on the basis of the above sources, I tried to construct some suggestions, taking into account both the Google and Yahoo guidelines, for descriptions of people, organisations and events, which I'll post here in the next few entries.

Postscript: Facebook

Even more recently, of course, has come the news of Facebook's announcement at the f8 conference of their Open Graph Protocol. This makes use of RDFa embedded in the head of XHTML pages using meta elements to provide (pretty minimal) metadata "about" things described by those pages (films, songs, people, places, hotels, restaurants etc - see the Facebook page for a full (and, I imagine, growing) list of resource types supported).
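By way of illustration - this fragment is adapted from my reading of the protocol documentation at the time of writing, with illustrative URIs, so do check that documentation itself - a film page might carry markup along these lines:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:og="http://opengraphprotocol.org/schema/">
  <head>
    <title>The Rock (1996)</title>
    <!-- The four properties the protocol lists as required -->
    <meta property="og:title" content="The Rock" />
    <meta property="og:type" content="movie" />
    <meta property="og:url" content="http://example.org/movies/the-rock" />
    <meta property="og:image" content="http://example.org/images/the-rock.jpg" />
  </head>
  <body>
    <p>A page about the film.</p>
  </body>
</html>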

Facebook makes use of the data to drive its "Like" application: a "button" can be embedded in the page to allow a Facebook user to post the data to their Fb account to signal an "I like this" relationship with the thing described. Or as Dare Obasanjo expresses it, an Fb user can add a node for the thing to their Fb social graph, making it into a "social object". This results in the data being displayed at appropriate points in their Fb stream, while the button displays, as a minimum, a count of the "likers" of the resource on the source page itself; logged-in Fb users would, I think, see information about whether any of their "friends" had liked it.

My reporting of these details of the interface is somewhat "second-hand" as I no longer use Facebook - I deleted my account some time ago because I was concerned about their approaches to the privacy of personal information (see these three recent posts by Tony Hirst for some thoughts on the most recent round of changes in that sphere).

Perhaps unsurprisingly given the popularity of Fb and its huge user base, the OGP announcement seems to have attracted a very large amount of attention within a very short period of time, and it may turn out to be a significant milestone for the use of XHTML-embedded metadata in general and of RDFa in particular. The substantial "carrot" of supporting the Fb "Like" application and attracting traffic from Fb users is likely to be the primary driver for many providers to generate this data, and indeed some commentators (see e.g. this BBC article) have gone as far as to suggest that this represents a move by Facebook to challenge Google as the primary filter of resources for people searching and navigating the Web.

However, I also think it is important to distinguish between the data on the one hand and that particular Facebook app on the other. Having this data available, minimal as it may be, also opens up the possibility of other applications by other parties making use of that same data.

And this is true also, of course, for the case of data constructed following the Google and Yahoo guidelines.

April 22, 2010

Document metadata using DC-HTML and using RDFa

In the context of various bits and pieces of work recently (more of which I'll write about in some upcoming posts), I've been finding myself describing how document metadata that can be represented using DCMI's DC-HTML metadata profile, described in Expressing Dublin Core metadata using HTML/XHTML meta and link elements, might also be represented using RDFa. (N.B. Here I'm considering only the current RDFa in XHTML W3C Recommendation, not the newly announced drafts for RDFa 1.1). So I thought I'd quickly list some examples here. Please note: I don't intend this to be a complete tutorial on using RDFa. Far from it; here I focus only on the case of "document metadata" whereas of course RDFa can be used to represent data "about" any resources. And these are really little more than a few rough notes which one day I might reuse somewhere else.

I really just wanted to illustrate that:

  • in terms of its use with the XHTML meta and link elements, RDFa has many similarities to the DC-HTML profile - unsurprisingly, as the RDF model underlies both; and
  • RDFa also provides the power and flexibility to represent data that can not be expressed using the DC-HTML profile.

The main differences between using RDFa in XHTML and using the DC-HTML profile are:

  • RDFa supports the full RDF model, not just the particular subset supported by DC-HTML
  • RDFa introduces some new XML attributes (@about, @property, @resource, @datatype, @typeof)
  • RDFa uses a datatype called CURIE for the abbreviation of URIs; DC-HTML uses a prefixed name convention which is essentially specific to that profile (though it was also adopted by the Embedded RDF profile)
  • Perhaps most significantly, RDFa can be used anywhere in an XHTML document, so the same syntactic conventions can be used both for document metadata and for data ("about" any resources) embedded in the body of the document

I'm presenting these examples following the description set model of the DCMI Abstract Model, and in more or less the same order that the DC-HTML specification presents the same set of concepts.

For each example, I present the data:

  • using DC-Text
  • using Turtle
  • in XHTML using DC-HTML
  • in XHTML+RDFa, using meta and link elements
  • in XHTML+RDFa, using block and inline elements (to illustrate that the same data could be embedded in the body of an XHTML document, rather than only in the head)

As an aside, it is possible to use the DC-HTML profile alongside RDFa in the same document, but I haven't bothered to show that here.

Footnote: Hmmm. Considering that I said to myself at the start of the year that I was rather tired of thinking/writing about syntax, I still seem to be doing an awful lot of it! Will try to write about other things soon....

1. Literal Value Surrogates

See DC-HTML 4.5.1.2.

1.1 Plain Value String

1.1.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:title )
      LiteralValueString ( "My World Cup 2010 Review" )
    )
  )
)

1.1.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
<> dc:title "My World Cup 2010 Review" .

1.1.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <meta name="DC.title" content="My World Cup 2010 Review" />
  </head>

</html>

1.1.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:title" content="My World Cup 2010 Review" />
  </head>

</html>

In this example, it would also be possible to simply add an attribute to the title element, instead of introducing the meta element:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title property="dc:title">My World Cup 2010 Review</title>
  </head>

</html>

1.1.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <h1 property="dc:title">My World Cup 2010 Review</h1>
  </body>
  
</html>

1.2 Plain Value String with Language Tag

1.2.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:title )
      LiteralValueString ( "My World Cup 2010 Review" 
        Language ( en )
      )
    )
  )
)

1.2.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
<> dc:title "My World Cup 2010 Review"@en .

1.2.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <meta name="DC.title" 
      xml:lang="en" content="My World Cup 2010 Review" />
  </head>

</html>

1.2.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:title"
      xml:lang="en" content="My World Cup 2010 Review" />
  </head>

</html>

1.2.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <h1 property="dc:title" xml:lang="en">My World Cup 2010 Review</h1>
  </body>
  
</html>

1.3 Typed Value String

1.3.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:modified )
      LiteralValueString ( "2010-07-04"
        SyntaxEncodingSchemeURI ( xsd:date )
      )
    )
  )
)

1.3.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<> dc:modified "2010-07-04"^^xsd:date .

1.3.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="schema.XSD" href="http://www.w3.org/2001/XMLSchema#" />
    <meta name="DC.modified" 
      scheme="XSD.date" content="2010-07-04" />
  </head>

</html>

1.3.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:modified" 
      datatype="xsd:date" content="2010-07-04" />
  </head>

</html>

1.3.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>Date last modified: 
      <span property="dc:modified"
        datatype="xsd:date">2010-07-04</span>
    </p>
  </body>
  
</html>

2. Non-Literal Value Surrogates

See DC-HTML 4.5.2.2.

2.1 Value URI

2.1.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
    )
  )
)

2.1.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
<> dc:subject ex:2010_FIFA_World_Cup .

2.1.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
  </head>

</html>

2.1.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
  </head>

</html>

2.1.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>About: 
      <a rel="dc:subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>
  </body>
  
</html>

2.2 Value URI with Plain Value String

2.2.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      ValueString ( "2010 FIFA World Cup" )
    )
  )
)

2.2.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup rdf:value "2010 FIFA World Cup" .

2.2.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject"
      href="http://example.org/resource/2010_FIFA_World_Cup"
      title="2010 FIFA World Cup" />
  </head>

</html>

2.2.4 XHTML+RDFa using meta and link

Here the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value" content="2010 FIFA World Cup" />
  </head>

</html>

2.2.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>About: 
      <a rel="dc:subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        <span property="rdf:value">2010 FIFA World Cup</span>
      </a>
    </p>
  </body>
  
</html>

2.3 Value URI with Plain Value String with Language Tag

2.3.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      ValueString ( "2010 FIFA World Cup" 
        Language ( en )
      )
    )
  )
)

2.3.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup rdf:value "2010 FIFA World Cup"@en .

2.3.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject"
      href="http://example.org/resource/2010_FIFA_World_Cup"
      xml:lang="en" title="2010 FIFA World Cup" />
  </head>

</html>

2.3.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value" 
      xml:lang="en" content="2010 FIFA World Cup" />
  </head>

</html>

With RDFa, multiple value strings might be provided, using multiple meta elements (which is not supported in DC-HTML):

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value" 
      xml:lang="en" content="2010 FIFA World Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value" 
      xml:lang="es" content="Copa Mundial de Fútbol de 2010" />
  </head>

</html>

2.3.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>About: 
      <a rel="dc:subject" 
        href="http://example.org/resource/2010_FIFA_World_Cup">
        <span property="rdf:value" 
          xml:lang="en">2010 FIFA World Cup</span>
      </a>
    </p>
  </body>
  
</html>

2.4 Value URI with Typed Value String

2.4.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:language )
      ValueURI ( ex:English )
      ValueString ( "en"
        SyntaxEncodingSchemeURI ( xsd:language )
      )
    )
  )
)

2.4.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<> dc:language ex:English .
ex:English rdf:value "en"^^xsd:language .

2.4.3 XHTML using DC-HTML:

Not supported by DC-HTML.

2.4.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:language"
      href="http://example.org/resource/English" />
    <meta about="[ex:English]"
      property="rdf:value" datatype="xsd:language" content="en" />
  </head>

</html>

2.4.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>Language:
      <a rel="dc:language"
        href="http://example.org/resource/English">
        <span property="rdf:value" 
          datatype="xsd:language" content="en">English</span>
      </a>
    </p>
  </body>

</html>

2.5 Value URI with Vocabulary Encoding Scheme URI

2.5.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      VocabularyEncodingSchemeURI ( ex:MyScheme )
    )
  )
)

2.5.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix dcam: <http://purl.org/dc/dcam/> .
@prefix ex: <http://example.org/resource/> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup dcam:memberOf ex:MyScheme .

2.5.3 XHTML using DC-HTML:

Not supported by DC-HTML.

2.5.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in XHTML using RDFa two link elements are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:dcam="http://purl.org/dc/dcam/"
      xmlns:ex="http://example.org/resource/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <link about="[ex:2010_FIFA_World_Cup]"
      rel="dcam:memberOf"
      href="http://example.org/resource/MyScheme" />
  </head>

</html>

2.5.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:dcam="http://purl.org/dc/dcam/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>About: 
      <a rel="dc:subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        <span rel="dcam:memberOf"
          resource="http://example.org/resource/MyScheme" />
        The 2010 World Cup
      </a>
    </p>
  </body>
  
</html>

April 13, 2010

A small GRDDL (XML, really) gotcha

I've written previously here about DCMI's use of an HTML metadata profile for document metadata, and the use of a GRDDL profile transformation to extract RDF triples from an XHTML document. DCMI has made use of an HTML profile for many years, but providing a "GRDDL-enabled" version is a more recent development - and it is one which I admit I was quietly quite pleased to see put in place, as I felt it illustrated rather neatly how DCMI was trying to implement some of the "follow your nose" principles of Web Architecture.

A little while ago, I noticed that the Web-based tools which I usually use to test GRDDL processing (the W3C GRDDL service and the librdf parser demonstrator) were generating errors when I tried to process documents which reference the profile. I've posted a more detailed account of my investigations to the dc-architecture Jiscmail list, and I won't repeat it all here, but in short it comes down to the use of entity references (&nbsp; and &copy;) in the profile document, which is itself subject to a GRDDL transformation to extract the pointer to the profile transformation.

The problem arises because XHTML defines those entity references in the XHTML DTD, i.e. externally to the document itself, and a non-validating XML processor is not required to read that DTD when parsing the document, with the consequence that it fails to resolve the references - and there's no guarantee that a GRDDL processor will employ a validating parser. There's a more extended discussion of these issues in a post by Lachlan Hunt from 2005 which concludes:

Character entity references can be used in HTML and in XML; but in XML, those other than the 5 predefined entities need to be defined in a DTD (such as with XHTML and MathML). The 5 predefined entities in XML are: &amp;, &lt;, &gt;, &quot; and &apos;. Of these, you should note that &apos; is not defined in HTML. The use of other entities in XML requires a validating parser, which makes them inherently unsafe for use on the web. It is recommended that you stick with the 5 predefined entity references and numeric character references, or use a Unicode encoding.

And the GRDDL specification itself cautions :

Document authors, particularly XHTML document authors, who wish their documents to be unambiguous when used with GRDDL should avoid dependencies on an external DTD subset; specifically:

  • Explicitly include the XHTML namespace declaration in an XHTML document, or an appropriate namespace in an XML document.
  • Avoid use of entity references, except those listed in section 4.6 of the XML specification.
  • And, more generally, follow the rules listed for the standalone document validity constraint.

A note will be added to the DC-HTML profile document to emphasise this point (and the offending references removed).
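The fix itself is straightforward: replace the named entity references with the equivalent numeric character references, which any XML processor can resolve without reading the external DTD. For example (an illustrative fragment):

<!-- entity references defined only in the external XHTML DTD -->
<p>Copyright &copy; 2010.&nbsp;Some rights reserved.</p>

<!-- the same content using numeric character references, which are
     safe for non-validating XML parsers -->
<p>Copyright &#169; 2010.&#160;Some rights reserved.</p>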

I guess I was surprised that no-one else had reported the error, particularly as it potentially affects the processing of all instance documents. The fact that they hadn't does rather lend weight to the suspicion I voiced here a few weeks ago that few implementers are actually making use of the DC-HTML GRDDL profile transformation.

January 22, 2010

On the use of Microsoft SharePoint in UK universities

A while back we decided to fund a study looking at the uptake of SharePoint within UK higher education institutions, an activity undertaken on our behalf by a team from the University of Northumbria led by Julie McLeod.  At the time of the announcement of this work we took some stick about the focus on a single, commercially licensed, piece of software - something I attempted to explain in a blog post back in May last year.  On balance, I still feel we made the right decision to go with such a focused study, and I think the popularity of the event that we ran towards the end of last year confirms that to a certain extent.

I'm very pleased to say that the final report from the study is now available.  As with all the work we fund, the report has been released under a Creative Commons licence so feel free to go ahead and make use of it in whatever way you find helpful.  I think it's a good study that summarises the current state of play very nicely.  The key findings are listed on the project home page so I won't repeat them here.  Instead, I'd like to highlight what the report says about the future:

This research was conducted in the summer and autumn of 2009. Looking ahead to 2010 and beyond the following trends can be anticipated:

  • Beginnings of the adoption of SharePoint 2010
    SharePoint 2010 will become available in the first half of 2010. Most HEIs will wait until a service pack has been issued before they think about upgrading to it, so it will be 2011 before SharePoint 2010 starts to have an impact. SharePoint 2010 will bring improvements to the social computing functionality of My Sites, with Facebook/Twitter style status updates, and with tagging and bookmarking. My Sites are significant in an HE context because they are the part of SharePoint that HEIs consider providing to students as well as staff. We have hitherto seen lacklustre take up of My Sites in HE. Some HEIs implementing SharePoint 2007 have decided not to roll out My Sites at all, others have only provided them to staff, others have made them available to staff and students but decided not to actively promote them. We are likely to see increasing provision and take up of My Sites from those HEIs that move to SharePoint 2010.
  • Fuzzy boundary between SharePoint implementations and Virtual Learning Environments
    There is no prospect, in the near future, of SharePoint challenging Blackboard’s leadership in the market for institutional VLEs for teaching and learning. Most HEIs now have both an institutional VLE, and a SharePoint implementation. Institutional VLEs are accustomed to battling against web hosted applications such as Facebook for the attention of staff and students. They now also face competition internally from SharePoint. Currently SharePoint seems to be being used at the margins of teaching and learning, filling in for areas where VLEs are weaker. HEIs have reported SharePoint’s use for one-off courses and small scale courses; for pieces of work requiring students to collaborate in groups, and for work that cannot fit within the confines of one course. Schools or faculties that do not like their institution’s proprietary VLE have long been able to use an open source VLE (such as Moodle) and build their own VLE in that. Now some schools are using SharePoint and building a school specific VLE in SharePoint. However, SharePoint has a long way to go before it is anything more than marginal to teaching and learning.
  • Increase in average size of SharePoint implementations
    At the point of time in which the research was conducted (summer and autumn of 2009) many of the implementations examined were at an early stage. The boom in SharePoint came in 2008 and 2009, as HEIs started to pick up on SharePoint 2007. We will see the maturation of many implementations which are currently less than a year old. This is likely to bring with it some governance challenges (for example ‘SharePoint sprawl’) which are not apparent when implementations are smaller. It will also increase the percentage of staff and students in HE familiar with SharePoint as a working environment. One HEI reported that some of their academics, unaware that the University was about to deploy SharePoint, have been asking for SharePoint because they have been working with colleagues at other institutions who are using it.
  • Competition from Google Apps for the collaboration space
    SharePoint seems to have competed successfully against other proprietary ECM vendors in the collaboration space (though it faces strong competition from both proprietary and open source systems in the web content management space and the portal space). It seems that the most likely form of new competition in the collaboration space will come in the shape of Google Apps which offers significantly less functionality, but operates on a web hosted subscription model which may appeal to HEIs that want to avoid the complexities of the configuration and management of SharePoint.
  • Formation of at least one Higher Education SharePoint User Group
    It is surprising that there is a lack of Higher Education SharePoint user groups. There are two JISCmail groups (SharePoint-Scotland and YH-SharePoint) but traffic on these two lists is low. The formation of one or more active SharePoint user groups would seem to be essential given the high level of take up in the sector, the complexity of the product, the customisation and configuration challenges it poses, and the range of uses to which it can be put. Such a user group or groups could support the sharing of knowledge across the sector, provide the sector with a voice in relation to both Microsoft and to vendors within the ecosystem around SharePoint, and enable the sector to explore the implications of Microsoft’s increasing dominance within higher education, as domination of the collaboration space is added to its domination of operating systems, e-mail servers, and office productivity software.

On the last point, I am minded to wonder what a user group actually looks like in these days of blogs, Twitter and other social networks. Superficially, it feels to me like a concept rooted firmly in the last century. That's not to say that there isn't value in collectively being able to share our experiences with a particular product, both electronically and face-to-face, nor in being able to represent a collective view to a particular vendor - so there's nothing wrong with the underlying premise. Perhaps it is just the label that feels outdated?

November 23, 2009

Memento and negotiating on time

Via Twitter, initially in a post by Lorcan Dempsey, I came across the work of Herbert Van de Sompel and his comrades from LANL and Old Dominion University on the Memento project:

The project has since been the topic of an article in New Scientist.

The technical details of the Memento approach are probably best summarised in the paper "Memento: Time Travel for the Web", and Herbert has recently made available a presentation which I'll embed here, since it includes some helpful graphics illustrating some of the messaging in detail:

Memento seeks to take advantage of the Web Architecture concept that interactions on the Web are concerned with exchanging representations of resources. And for any single resource, representations may vary - at a single point in time, variant representations may be provided, e.g. in different formats or languages, and over time, variant representations may be provided reflecting changes in the state of the resource. The HTTP protocol incorporates a feature called content negotiation which can be used to determine the most appropriate representation of a resource - typically according to variables such as content type, language, character set or encoding. The innovation that Memento brings to this scenario is the proposition that content negotiation may also be applied to the axis of date-time, i.e. in the same way that a client might express a preference for the language of the representation based on a standard request header, it could also express a preference that the representation should reflect resource state at a specified point in time, using a custom accept header (X-Accept-Datetime).

More specifically, Memento uses a flavour of content negotiation called "transparent content negotiation" where the server provides details of the variant representations available, from which the client can choose. Slides 26-50 in Herbert's presentation above illustrate how this technique might be applied to two different cases: one in which the server to which the initial request is sent is itself capable of providing the set of time-variant representations, and a second in which that server does not have those "archive" capabilities but redirects to (a URI supported by) a second server which does.
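To make that concrete, my (simplified, and quite possibly imperfect) reading of the first case is an exchange along the following lines - the header name is taken from the Memento materials, but the URIs, dates and response details are purely illustrative, and I've omitted the variant list that transparent content negotiation would also convey:

GET /page HTTP/1.1
Host: example.org
X-Accept-Datetime: Thu, 01 Jan 2009 12:00:00 GMT

HTTP/1.1 302 Found
Location: http://example.org/page/20090101120000

GET /page/20090101120000 HTTP/1.1
Host: example.org

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

(the representation of the resource as it stood on 1 January 2009)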

This does seem quite an ingenious approach to the problem, and one that potentially has many interesting applications, several of which Herbert alludes to in his presentation.

What I want to focus on here is the technical approach, which did raise a question in my mind. And here I must emphasise that I'm really just trying to articulate a question that I've been trying to formulate and answer for myself: I'm not in a position to say that Memento is getting anything "wrong", just trying to compare the Memento proposition with my understanding of Web architecture and the HTTP protocol, or at least the use of that protocol in accordance with the REST architectural style, and understand whether there are any divergences (and if there are, what the implications are).

In his dissertation in which he defines the REST architectural style, Roy Fielding defines a resource as follows:

More precisely, a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers. A resource can map to the empty set, which allows references to be made to a concept before any realization of that concept exists -- a notion that was foreign to most hypertext systems prior to the Web. Some resources are static in the sense that, when examined at any time after their creation, they always correspond to the same value set. Others have a high degree of variance in their value over time. The only thing that is required to be static for a resource is the semantics of the mapping, since the semantics is what distinguishes one resource from another.

On representations, Fielding says the following, which I think is worth quoting in full. The emphasis in the first and last sentences is mine.

REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.

A representation consists of data, metadata describing the data, and, on occasion, metadata to describe the metadata (usually for the purpose of verifying message integrity). Metadata is in the form of name-value pairs, where the name corresponds to a standard that defines the value's structure and semantics. Response messages may include both representation metadata and resource metadata: information about the resource that is not specific to the supplied representation.

Control data defines the purpose of a message between components, such as the action being requested or the meaning of a response. It is also used to parameterize requests and override the default behavior of some connecting elements. For example, cache behavior can be modified by control data included in the request or response message.

Depending on the message control data, a given representation may indicate the current state of the requested resource, the desired state for the requested resource, or the value of some other resource, such as a representation of the input data within a client's query form, or a representation of some error condition for a response. For example, remote authoring of a resource requires that the author send a representation to the server, thus establishing a value for that resource that can be retrieved by later requests. If the value set of a resource at a given time consists of multiple representations, content negotiation may be used to select the best representation for inclusion in a given message.

So at a point in time t1, the "temporally varying membership function" maps to one set of values, and - in the case of a resource whose representations vary over time - at another point in time t2, it may map to another, different set of values. To take a concrete example, suppose at the start of 2009, I launch a "quote of the day", and I define a single resource that is my "quote of the day", to which I assign the URI http://example.org/qotd/. And I provide variant representations in XHTML and plain text. On 1 January 2009 (time t1), my quote is "From each according to his abilities, to each according to his needs", and I provide variant representations in those two formats, i.e. the set of values for 1 January 2009 is those two documents. On 2 January 2009 (time t2), my quote is "Those who do not move, do not notice their chains", and again I provide variant representations in those two formats, i.e. the set of values for 2 January 2009 (time t2) is two XHTML and plain text documents with different content from those provided at time t1.

So, moving on to that second piece of text I cited, my interpretation of the final sentence as it applies to HTTP (and, as I say, I could be wrong about this) would be that the RESTful use of the HTTP GET method is intended to retrieve a representation of the current state of the resource. It is the value set at that point in time which provides the basis for negotiation. So, in my example here, on 1 January 2009, I offer XHTML and plain text versions of my "From each according to his abilities..." quote via content negotiation, and on 2 January 2009, I offer XHTML and plain text versions of my "Those who do not move..." quote - i.e. at two different points in time t1 and t2, different (sets of) representations may be provided for a single resource, reflecting the different state of that resource at those two different points in time, but at either of those points in time, the expectation is that each representation of the set available represents the state of the resource at that point in time, and only members of that set are available via content negotiation. So although representations may vary by language, content-type etc, they should be in some sense "equivalent" (Roy Fielding's term) in terms of their representation of the current state of the resource.

I think the Memento approach suggests that on 2 January 2009, I could, using the date-time-based negotiation convention, offer all four of those variants listed above (and on each day into the future, a set which increases in membership as I add new quotes). But it seems to me that is at odds with the REST style, because the Memento approach requires that representations of different states of the resource (i.e. the state of the resource at different points in time) are all made available as representations at a single point in time.

I appreciate that (even if my interpretation is correct, which it may not be) the constraints specified by the REST architectural style are just that: a set of constraints which, if observed, generate certain properties/characteristics in a system. And if some of those constraints are relaxed or ignored, then those properties change. My understanding is not good enough to pinpoint exactly what the implications of this particular point of divergence (if indeed it is one!) would be - though as Herbert notes in his presentation, it would appear that there would be implications for caching.

But as I said, I'm really just trying to raise the questions which have been running around my head and which I haven't really been able to answer to my own satisfaction.

As an aside, I think Memento could probably achieve quite similar results by providing some metadata (or a link to another document providing that metadata) which expressed the relationships between the time-variant resource and all the time-specific variant resources, rather than seeking to manage this via HTTP content negotiation.
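Purely as a sketch (reusing my quote-of-the-day example, with an arbitrary choice of vocabulary and made-up URIs), the time-generic resource might be linked to its time-specific versions along these lines, leaving a client to select the variant for the date-time it is interested in:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The time-variant resource, linked to its time-specific versions
<http://example.org/qotd/>
  dc:hasVersion <http://example.org/qotd/20090101> ,
                <http://example.org/qotd/20090102> .

# Each time-specific resource carries its date
<http://example.org/qotd/20090101> dc:date "2009-01-01"^^xsd:date .
<http://example.org/qotd/20090102> dc:date "2009-01-02"^^xsd:date .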

Postscript: I notice that, in the time it has taken me to draft this post, Mark Baker has made what I think is a similar point in a couple of messages (first, second) to the W3C public-lod mailing list.

October 22, 2009

SharePoint in UK universities event

We've just announced an event (in London on 25 November 2009) based on the work that's been done by Northumbria University (and others) as part of the Investigation into the Uptake and use of Microsoft SharePoint by HEIs study that we funded a while back.

  • Do you want to learn about how and why HEIs are using SharePoint? What worked well, lessons learned?
  • Do you want to hear from some HEIs about their experience of implementing SharePoint?
  • Do you want the opportunity to network and learn about real experiences with SharePoint in HEIs and benchmark yourself?

The event will provide a chance to hear from the project team about their findings, as well as from four university-based case studies (Peter Yeadon, UWE, University of Glasgow, and University of Kent).

Please go to the registration page to sign up - places are limited.

October 14, 2009

Open, social and linked - what do current Web trends tell us about the future of digital libraries?

About a month ago I travelled to Trento in Italy to speak at a Workshop on Advanced Technologies for Digital Libraries organised by the EU-funded CACOA project.

My talk was entitled "Open, social and linked - what do current Web trends tell us about the future of digital libraries?" and I've been holding off blogging about it or sharing my slides because I was hoping to create a slidecast of them. Well... I finally got round to it and here is the result:

Like any 'live' talk, there are bits where I don't get my point across quite as I would have liked but I've left things exactly as they came out when I recorded it. I particularly like my use of "these are all very bog standard... err... standards"! :-)

Towards the end, I refer to David White's 'visitors vs. residents' stuff, about which I note he has just published a video. Nice one.

Anyway... the talk captures a number of threads that I've been thinking and speaking about for the last while. I hope it is of interest.

October 05, 2009

SharePoint in UK universities - literature review

We are currently funding the University of Northumbria to undertake some work for us looking at the uptake of Microsoft SharePoint in UK universities.  As part of this work we have just published a literature review [PDF] by James Lappin and Julie McLeod:

SharePoint 2007 has spread rapidly in the Higher Education (HE) sector, as in most other market sectors. It is an extraordinarily wide-ranging piece of software and it has been put to a wide variety of different uses by different UK Higher Education Institutions (HEIs). This literature review is based upon what HEIs have been willing to say about their implementations in public.

Implementations range from the provision of team sites supporting team collaboration, through the use of SharePoint to support specific functions, to its use as an institutional portal, providing staff and/or students with a single site from which to access key information sources and tools.

By far the most common usage of SharePoint in UK HEIs is for team collaboration. This sees SharePoint team sites replacing, or supplementing, network shared drives as the area in which staff collaborate on documents and share information with each other.

September 16, 2009

Edinburgh publish guidance on research data management

The University of Edinburgh has published some local guidance, Research data management guidance, on the way that research data should be managed, covering How to manage research data and Data sharing and preservation, as well as detailing local training, support and advice options.

One assumes that this kind of thing will become much more common at universities over the next few years.

Having had a very quick look, it feels like the material is more descriptive than prescriptive - which isn't meant as a negative comment, it just reflects the current state of play. The section on Data documentation & metadata, for example, gives advice as simple as:

Have you created a "readme.txt" file to describe the contents of files in a folder? Such a simple act can be invaluable at a later date.

but also provides a link to the UK Data Archive's guidance on Data Documentation and Metadata, which at first sight appears hugely complex. I'm not sure what your average researcher will make of it.
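To make the simple end of that spectrum concrete, here's the sort of thing I take the readme.txt advice to mean - a sketch of my own, with invented folder and file names, rather than anything from the Edinburgh or UKDA guidance:

from pathlib import Path

# An invented data folder and a minimal description of its contents.
folder = Path("experiment-2009-09")
readme = """\
Project:  (project name and funder)
Contact:  (name and email of the responsible researcher)
Created:  (date)
Contents:
  results-raw.csv      - raw instrument output, one row per sample
  results-cleaned.csv  - as above, with outliers removed (see notes.txt)
  notes.txt            - processing steps applied to the raw data
"""
folder.mkdir(exist_ok=True)
(folder / "readme.txt").write_text(readme)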

(In passing, I note that the UKDA seem to be promoting the use of the Data Documentation Initiative standard at what they call the 'catalogue' level, a standard that I've not come across before but one that appears to be rooted firmly outside the world of linked data, which is a shame.)

Similarly, the section on Methods for data sharing lists a wide range of possible options (from "posting on a University website" thru to "depositing in a data repository") without being particularly prescriptive about which is better and why.

(As a second aside, I am continually amazed by this firm distinction in the repository world between 'posting on the website' and 'depositing in a repository' - from the perspective of the researcher, both can, and should, achieve the same aims, i.e. improved management, more chance of persistence and better exposure.)

As we have found with repositories of research publications, it seems to me that research data repositories (the Edinburgh DataShare in this case) need to hide much of this kind of complexity, and do most of the necessary legwork, in order to turn what appears to be a simple and obvious 'content management' workflow (from the point of view of the individual researcher) into a well managed, openly shared, long term resource for the community.

August 06, 2009

The management of website content in UK universities - report available

The final report from the Investigation into the management of website content in higher education institutions (undertaken by SIRC and funded by us) is now available.

We funded the investigation for two reasons: firstly, to help the community (particularly those involved with university 'web teams') to understand itself a little better and secondly, to help us understand the space in order that we can think about tailoring our own content management, web hosting and other services to the needs of UK higher education in line with our charitable mission.  I think/hope we've succeeded on both counts.

So what have we learned?  Well, first off, it's a long report, 58 pages, so it's not easy to summarise in a few words.  At the Care in the community session we ran at IWMW a week or so ago, Simon Bradley from SIRC presented these slides, which give a nice overview:

Trying to look past the raw numbers a little, here are my thoughts...

The management of university Web content (and associated provision of Web applications more generally) continues to mature as an area of professional activity and there is a growing recognition of the value that the Web and the Web team bring to the institution. That said, there appears to be a continued emphasis (particularly amongst senior members of HEIs) on using the Web as a way of “marketing the institution to new audiences” rather than meeting the ‘business’ needs of existing members of the institution (lecturers, students, researchers and other staff). Furthermore, despite the growing recognition of value, there is a perceived mismatch between the expectations placed on the Web team and the level of resources made available to them, leading to significant ‘time pressures’ for many teams.

Web teams need to be as good at writing plain English as they are at writing code and the challenges they face are at least as much ‘managerial’ as they are ‘technical’. Given that many Web teams remain quite small that seems to imply that flexible people with a broad skills-base are quite valuable. Web strategies are seen as very important but need to be adopted institution-wide to be properly effective. Sitting at the cross-roads between new media, ‘old school’ university culture and the more hard-nosed world of marketing and business requires good communication skills. Such positioning makes Web teams crucial to the successful functioning of HEIs but can also leave them vulnerable to the kinds of issues and challenges faced by such large, complex organisations. The wide variety of job titles, job descriptions and organisational positioning for those with responsibility for the management of Web content leads to a somewhat confusing picture across the UK as a whole – something that is indicative of a profession that, while continuing to mature, is still relatively young.

Despite the indicated shift towards greater recognition of the importance of the Web in universities, there remains a need for broad 'cultural shifts' in Higher Education. Attitudes and perspectives are changing, but academics and senior management alike still need to develop a better understanding of the nature of the web as a context for academic practice, as a platform for sharing knowledge, and as an avenue for economic development.

The use of Content Management Systems is widespread, with about half having been deployed since 2006. Major factors in the decision-making process appear to be usability, reliability and scalability, while familiarity and popularity are not deemed to be important. In general, HEIs seem reasonably happy with their choice of CMS, with the majority not currently considering a change. Where change is being considered, technical limitations and changing institutional requirements are cited as the reasons. There seems to be little evidence that the university community takes a particular view ‘for’ or ‘against’ open source or proprietary CMS solutions.

There seems to be a similarly balanced attitude to outsourcing. Whilst valid reasons are presented for developing skills in-house, it is clear that Web teams are willing to consider outsourcing work to external consultants (e.g. where there are skills gaps) not least because it is sometimes the case that senior management seem to be prepared to take more notice of an ‘impartial’ external view. That said, the survey data does not suggest overwhelming satisfaction with the use of external consultants in the work of the Web team.

It is clear that most university Web teams now monitor user behaviour in some way in order to inform the future design of the website. However, most teams indicated that such monitoring is not comprehensive enough.

Web teams seem to be broadly optimistic about the future of Web content management, recognising several key areas as short-term drivers for development (growing use of rich media and social networking and the need for a more personalised website offering for example). However, there appears to be a more cautious response when asked to consider the institution’s ability to keep pace with the current and future rate of technological change, the implication being that more investment in resources is required if universities are to continue to maximise the effectiveness of their use of the Web in the future.

A big thank you to all those people who contributed to the report, either thru the interviews or by completing the Web survey. It is much appreciated. The report is dotted with some pretty pithy comments from those interviewed, some of which I've been tweeting over the last few days. Here are two of my favourites:

I think universities have a habit of going “Yes, the web is the future” without actually giving it any resource because it’s seen as being free and actually it’s a huge entity involving a lot of people, underlying technologies, and it needs managing in the same way as any other resource.

and:

I think there are several opposites that I naturally find myself in the middle of with understanding of both; so mediation is an enormous part of the role because I’ve got quite extreme marketing coming in one ear and the usual academic way of just talking in really quite difficult language and sounding very clever for five pages without any paragraph breaks in another.

Happy reading... I really hope people find this report to be of value. Let us know how you get on with it.

July 31, 2009

Care in the community

I was at UKOLN's Institutional Web Management Workshop 2009 event at the University of Essex earlier this week to run a workshop session with Ed Barker and Simon Bradley (of SIRC) entitled Care in the community... how do you manage your Web content?. The session, and the workshop more generally for that matter, went pretty well I think.  We used our 90 minutes for a mix of presentation - Simon giving a whirlwind tour of the major findings of the Investigation into the management of website content in higher education institutions that they've been undertaking on our behalf - and group discussion.

For the discussion groups we split people randomly into 3 groups to discuss a range of propositions based loosely on the findings of the investigation. The groups were asked to consider each proposition and either to agree with it or to offer an alternative version. They were then asked to write down 3 consequences (issues, actions or conclusions) arising from their agreed proposition.

16 propositions were available, inside sealed envelopes labelled with one of 5 broad topic areas:

  • The Web Team,
  • Institutional Issues,
  • CMS,
  • End Users,
  • The Future.

Of the available propositions, 13 were discussed by the groups in the time available. Note that the propositions were chosen to stimulate discussion. They do not necessarily represent the views of Eduserv or SIRC. Perhaps more importantly, they should not be taken as a direct representation of the findings of the study.

The outputs from the group discussions are now available on Google Docs. The report of the investigation itself will be published on Thursday 6th August.

June 18, 2009

How do you manage yours?

You may recall that we are currently funding an Investigation into the management of website content in higher education institutions, an activity being undertaken on our behalf by SIRC.

As part of this work, SIRC have put together a Web-based survey looking at various issues associated with the management of website content in UK HEIs.

As part of the project we are seeking the assistance of those involved in the management of web content within HEIs in completing an online survey. The following survey has been informed by over 20 hours of in-depth interviews with members of web teams and individuals from Computing Services and Marketing in HEIs more generally.

The survey should take approximately 15-20 minutes to complete. We would be extremely grateful if you could take the time to complete the questionnaire. With your input we hope to be able to provide the most comprehensive picture to date of web content management within UK HEIs – in terms of the structure of content management, the technologies used and the challenges faced by those working in the sector.

After completing all questions in the survey applicable to you, you will be entered in a random draw to win one of four flip video cameras for some instant user generated Web 2.0 content on a University webpage. In order to be eligible for the draw, you must fill out your full name and email address on the last page of the survey so that we may contact you. We will not use your personal information for any other purposes. The survey will close at midnight on Wednesday 8th July, 2009. To be eligible for the draw all surveys must be completed and submitted by that time.

We are interested in the opinions of anyone that is involved in the management of web content within HEIs. Completed surveys will be welcome from different individuals / departments within the same institution. Please feel free to pass on the details of the survey to colleagues / peers who you think might be able to contribute. If you have contacts outside of your institution whom you think might be interested in the research, please feel free to forward this link to them. Please note that the results of the survey will be anonymous.

Two things are worth noting...

Firstly, as a not-for-profit provider of 'content management' services we (Eduserv) are very keen that people understand that this is not just a bit of traditional 'commercial' market research from which only we benefit. We want the community to benefit from this work (as well as us of course!). On that basis, we will make the final report openly available to the community and we have asked SIRC not to provide us with any raw material from the survey that can tie responses back to an individual or institution. That means you can be sure that if/when you fill out this survey (and we hope you will) your privacy and confidentiality are assured.

Secondly, the survey looks like it is going to be quite long at the outset. Please don't panic... parts of the survey get skipped depending on the answers you give and in any case, many of the answers are selected from lists. On that basis, we hope it won't take longer to complete than the estimated 15-20 minutes.

So, if you are in any way involved with the creation or management of website content in a UK HEI, please take time to complete the survey.  Thanks.

PS. Brian Kelly has reminded me that this study will also lead to a workshop session, "Care in the community... how do you manage your Web content?", at the forthcoming IWMW 2009 event (bookings for which close tomorrow).

May 18, 2009

SharePoint study

We're commissioning a study looking at the uptake and use of Microsoft SharePoint by Higher Education Institutions and currently have an ITT available.

This is an unusual study for us - in the sense that it focuses on an individual product - a fact that hasn't gone unnoticed either internally or in the community. When we announced the study on the [email protected] mailing list, David Newman (Queen's University Management School, Belfast) responded with:

What a remarkably narrow research scope. It would be interesting to find out what groupware HEI institutions are using to support particular functions (co-ordinating international research projects, helping students work together in group projects, joint report editing, keeping track of expenses, ...). But just one product from one supplier?

I think David is right to raise this as an issue but there are reasons why we've done things in the way that we have and I think those reasons are worth sharing. Here's a copy of my response to David:

Hi David,
Firstly, I agree with you that this looks to be a rather narrowly scoped piece of work. It is the kind of study that we haven't funded to date and it's something that we didn't fund without a certain amount of internal angst! On that basis, I think it is worth me trying to explain where we are coming from with it.

You should note that this study comes out of our new Research Programme

http://www.eduserv.org.uk/research/

rather than the previous Eduserv Foundation (which has now been wrapped up, except in the sense that we are continuing to support projects that we previously funded under the Foundation). Our previously announced ITT for a study looking at the way Web content is managed by HEIs (currently being undertaken by SIRC)

http://www.eduserv.org.uk/research/studies/wcm2009

came from the same place.

The change from a Foundation to a Research Programme brought with it a subtle, but significant, change of emphasis. Eduserv is a non-profit IT services company. We have a charitable mission to "realise the benefits of ICT for learners and researchers", something we believe we do most effectively thru the services we deliver, e.g. those provided for the education community (particularly HE). Because of that, we felt we would get better 'value' from our research funding (more bang-per-buck if you like) if we tried to align it more closely with the kinds of services we offer. That is what we are trying to do thru the new Research Programme.

Our services to HE currently include OpenAthens and Chest, though we have a desire to improve our Web hosting/development offer within the sector as well (something we currently sell primarily into the public sector). For info... we are also in the final stages of developing a new data centre in Swindon and we hope to use that as the basis for new services to the HE sector in the future.

As a service provider, we sense a significant (and growing) interest in the use of MS SharePoint as the basis for the provision of a fairly wide range of solutions. This is particularly true in the public sector, where we also operate, but also in HE (for example, the HEA are just in the process of initiating a SharePoint project). Please note, I'm not saying this is necessarily a good thing - my personal view is that it is not (though my personal view on all this is largely irrelevant!).

We tried to broaden the scope of the ITT in line with the kind of "groupware" suggestion you make below [above] but ultimately we felt that in doing so it was hard to capture the breadth of things that people are trying to do in SharePoint without ending up with something quite fuzzy and unfocused. On that basis, we reluctantly narrowed in on a specific technology - something we are not used to doing.

Let me be quite clear. We are not looking for a study that says MS SharePoint is the answer to everything (or indeed anything). Nor, that it is the answer to nothing. We are looking to understand what people in HE are doing with SharePoint, what they think works well, what they think is broken, why they have considered but rejected it and so on.

In that sense, it is a piece of market research... pure and simple. However, we believed (perhaps wrongly?) that the community would also be interested in this topic, which is why the findings of the work will be made openly available under a CC licence. The intention is to help both us and the community make better long term deployment decisions and, rightly or wrongly, we felt that decisions about one particular piece of software, i.e. SharePoint, were a significant enough part of that in this particular case to make the study worthwhile.

Hope that helps?

Note, I'm very happy to continue to hear if people think we have gone badly wrong on this because it will help us to spend our money more wisely (i.e. more effectively for the benefit of both us and the community) in the future.

Best,

Andy

April 24, 2009

Investigation into the management of website content in higher education institutions

I'm very pleased to announce that work has now started on a short study looking at the issues around the management of website content in higher education institutions. Full details are available on the website so I won't repeat them here. The work is being undertaken by the Social Issues Research Centre (SIRC) on our behalf and will culminate in an openly available report (released under Creative Commons). We also plan to run an interactive session at the next Institutional Web Management Workshop in Essex in July, tentatively entitled Care in the community... how do you manage your Web content?

March 20, 2009

Unlocking Audio

I spent the first couple of days this week at the British Library in London, attending the Unlocking Audio 2 conference.  I was there primarily to give an invited talk on the second day.

You might notice that I didn't have a great deal to say about audio, other than to note that what strikes me as interesting about the newer ways in which I listen to music online (specifically Blip.fm and Spotify) is that they are both highly social (almost playful) in their approach and that they are very much of the Web (as opposed to just being 'on' the Web).

What do I mean by that last phrase?  Essentially, it's about an attitude.  It's about seeing being mashed as a virtue.  It's about an expectation that your content, URLs and APIs will be picked up by other people and re-used in ways you could never have foreseen.  Or, as Charles Leadbeater put it on the first day of the conference, it's about "being an ingredient".

I went on to talk about the JISC Information Environment (which is surprisingly(?) not that far off its 10th birthday if you count from the initiation of the DNER), using it as an example of digital library thinking more generally and suggesting where I think we have parted company with the mainstream Web (in a generally "not good" way).  I noted that while digital library folks can discuss identifiers forever (if you let them!) we generally don't think a great deal about identity.  And even where we do think about it, the approach is primarily one of, "who are you and what are you allowed to access?", whereas on the social Web identity is at least as much about, "this is me, this is who I know, and this is what I have contributed". 

I think that is a very significant difference - it's a fundamentally different world-view - and it underpins one critical aspect of the difference between, say, Shibboleth and OpenID.  In digital libraries we haven't tended to focus on the social activity that needs to grow around our content and (as I've said in the past) our institutional approach to repositories is a classic example of how this causes 'social networking' issues with our solutions.

I stole a lot of the ideas for this talk, not least Lorcan Dempsey's use of concentration and diffusion.  As an aside... on the first day of the conference, Charles Leadbeater introduced a beach analogy for the 'media' industries, suggesting that in the past the beach was full of a small number of large boulders and that everything had to happen through those.  What the social Web has done is to make the beach into a place where we can all throw our pebbles.  I quite like this analogy.  My one concern is that many of us do our pebble throwing in the context of large, highly concentrated services like Flickr, YouTube, Google and so on.  There are still boulders - just different ones?  Anyway... I ended with Dave White's notions of visitors vs. residents, suggesting that in the cultural heritage sector we have traditionally focused on building services for visitors but that we need to focus more on residents from now on.  I admit that I don't quite know what this means in practice... but it certainly feels to me like the right direction of travel.

I concluded by offering my thoughts on how I would approach something like the JISC IE if I was asked to do so again now.  My gut feeling is that I would try to stay much more mainstream and focus firmly on the basics, by which I mean adopting the principles of linked data (about which there is now a TED talk by Tim Berners-Lee), cool URIs and REST and focusing much more firmly on the social aspects of the environment (OpenID, OAuth, and so on).
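For what it's worth, here's a minimal sketch of the kind of pattern I mean - one cool URI per resource, with content negotiation deciding between a human-readable and a machine-readable view. This is just my illustration of the general idea (the URIs, data and the single Turtle statement are all invented), not a proposal for the JISC IE itself:

from http.server import BaseHTTPRequestHandler, HTTPServer

# One invented resource, identified by a single stable ('cool') URI path.
RESOURCES = {
    "/id/book/1": {"title": "An Example Book"},
}

class CoolURIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        resource = RESOURCES.get(self.path)
        if resource is None:
            self.send_error(404)
            return
        if "text/turtle" in self.headers.get("Accept", ""):
            # Machine-readable view: one (invented) RDF statement in Turtle.
            body = (f'<http://localhost:8000{self.path}> '
                    f'<http://purl.org/dc/terms/title> "{resource["title"]}" .\n')
            content_type = "text/turtle"
        else:
            # Human-readable view: plain HTML, served by default.
            body = f"<html><body><h1>{resource['title']}</h1></body></html>"
            content_type = "text/html"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Vary", "Accept")  # same URI, negotiated representations
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CoolURIHandler).serve_forever()

Ask for the same URI with Accept: text/turtle and you get RDF; point a browser at it and you get HTML - one identifier, multiple representations.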

Prior to giving my talk I attended a session about iTunesU and how it is being implemented at the University of Oxford.  I confess a strong dislike of iTunes (and iTunesU by implication) and it worries me that so many UK universities are seeing it as an appropriate way forward.  Yes, it has a lot of concentration (and the benefits that come from that) but its diffusion capabilities are very limited (i.e. it's a very closed system), resulting in the need to build parallel Web interfaces to the same content.  That feels very messy to me.  That said, it was an interesting session with more potential for debate than time allowed.  If nothing else, the adoption of systems about which people can get religious serves to get people talking/arguing.

Overall then, I thought it was an interesting conference.  I suspect that my contribution wasn't liked by everyone there - but I hope it added usefully to the debate.  My live-blogging notes from the two days are here and here.

March 05, 2009

A National Research Data Service for the UK?

I attended the A National Research Data Service for the UK? meeting at the Royal Society in London last week and my live-blogged notes are available for those who want more detail.  Chris Rusbridge also blogged the day on the Digital Curation Blog - session 1, session 2, session 3 and session 4.  FWIW, I think that Chris's posts are more comprehensive and better than my live-blogged notes.

The day was both interesting and somewhat disappointing...

Interesting primarily because of the obvious political tension in the room (which I characterised on Twitter as a potential bun-fight between librarians and the rest but which in fact is probably better summed up as a lack of shared agreement around centralist (discipline-based) solutions vs. institutional solutions).

Disappointing because the day struck me more as a way of presenting a done-deal than as a real opportunity for debate.

The other thing that I found annoying was the constant parroting of the view that "researchers want to share their data openly" as though this is an obvious position.  The uncomfortable fact is that even the UKRDS report's own figures suggest that less than half (43%) of those surveyed "expressed the need to access other researchers' data" - my assumption therefore is that the proportion currently willing to share their data openly will be much smaller.

Don't take this as a vote against open access, something that I'm very much in favour of.  But, as we've found with eprint archives, a top-down "thou shalt deposit because it is good for you" approach doesn't cut it with researchers - it doesn't result in cultural change.  Much better to look for, and actively support, those areas where open sharing of data occurs naturally within a community or discipline, thus demonstrating its value to others.

That said, a much more fundamental problem facing the provision of collaborative services to the research community is that funding happens nationally but research happens globally (or at least across geographic/funding boundaries) - institutions are largely irrelevant whichever way you look at it [except possibly as an agent of long term preservation - added 6 March 2009].  Resolving that tension seems paramount to me though I have no suggestions as to how it can be done.  It does strike me however that shared discipline-based services come closer to the realities of the research world than do institutional services.

March 03, 2009

Web content management in UK universities

We've decided to fund a study looking at the way in which UK universities manage their Web content. There are two primary reasons for this...

Firstly, we think that sharing knowledge about current practice across the community is likely to be both of interest to people and beneficial in terms of moving things forward. Secondly, we offer content management systems as part of our charitable portfolio of services but we have not, to date, been very successful at convincing HE institutions that our offer is a good one. Consequently, we'd like to understand better what we can offer that is seen to be valuable.

We're undertaking this activity as part of our new Research Programme, which means that all the findings will be openly available to the community (including to our CMS competitors). We think this is a good thing. We're also hoping to use the findings of the study to seed a discussion session at the next Institutional Web Management Workshop.
