RDFa for the Eduserv Web site
Another post that I've been intermittently chiselling away at in the draft pile for a while... A few weeks ago, I was asked by Lisa Price, our Website Communications Manager, to make some suggestions of how Eduserv might make use of the RDFa in XHTML syntax to embed structured data in pages on the Eduserv Web site, which is currently in the process of being redesigned. I admit this is coming mostly from the starting point of wanting to demonstrate the use of the technology rather than from a pressing use case, but OTOH there is a growing interest from RDFa amongst some of Eduserv's public sector clients so a spot of "eating our own dogfood" would be a Good Thing, and furthermore there are signs of a gradual but significant adoption of RDFa by some major Web service providers.
It seems to me Eduserv might use RDFa to describe, or make assertions about:
- (Perhaps rather trivially) Web pages themselves i.e. reformulating the (fairly limited) "document metadata" we supply as RDFa.
- (Perhaps rather more interestingly) some of the "things" that Eduserv pages "are about", or that get mentioned in those pages (e.g. persons, organisations, activities, events, topics of interest, etc).
Within that category of data about "things", we need to decide which data it is most useful to expose. We could:
- look at those classes of data that are processed by tools/services that currently make use of RDFa (typically using specified RDF vocabularies); or
- focus on data that we know already exists in a "structured" form but is currently presented in X/HTML either only in human-readable form or using microformats (or even new data which isn't currently surfaced at all on the current site)
Another consideration was the question of whether data was covered by existing models and vocabularies or required some analysis and modelling.
To be honest, there's a fairly limited amount of "structured" information on the site currently. There is some data on licence agreements for software and data, currently made available as HTML tables and Excel spreadsheets. While I think some of the more generic elements of this might be captured using a product/service ontology such as Good Relations, the license-specific aspects would require some additional modelling. For the short term at least, we've taken a somewhat "pragmatic" approach and focused mainly on that first class of data for which there are some identifiable consuming applications, based on the use of specified RDF vocabularies - and more specifically on data that Google and Yahoo make particular reference to in their documentation for creators/publishers of Web pages.
That's not to say there won't be more use of RDFa on the site in the future: at the moment, this is something of a "dipping toes in the water" exercise, I think.
The following is by best effort to summarize Google and Yahoo support for RDFa at the time of writing. Please note that this is something which is evolving - as I was writing up this post, I just noticed that the Google guidelines have changed slightly since I sent my initial notes to Lisa. And I'm still not at all sure I've captured the complete picture here, so please do check their current documentation for content providers to get an idea of the current state of play.
Google and RDFa
Google's support for RDFa is part of a larger programme of support for structured data embedded in X/HTML that they call "rich snippets" (announced here), which includes support for RDFa, microformats and microdata. (The latter, I think, is a relatively recent addition).
Google functionality extends to extracting specified categories of RDFa data in (some) pages it indexes, and displaying that in search result sets (and in place pages in Google Maps). It also provides access to the data in its Custom Search platform.
Initially at least, Google required the use of its own RDF vocabularies, which attracted some criticism (see e.g. Ian Davis' response), but it appears to have fairly quietly introduced some support for other RDF vocabularies. "In addition to the Person RDFa format, we have added support for the corresponding fields from the FOAF and vCard vocabularies for all those of you who asked for it." And Martin Hepp has pointed to Google displaying data encoded using the Good Relations product/service ontology.
The nature of the RDFa syntax is such that it is often fairly straightforward to use multiple RDF vocabularies in RDFa e.g. triples using the same subject and object but different predicates can be encoded using a single RDFa attribute with multiple white-space-separated CURIEs - though things do tend to get more messy if the vocabularies are based on different models (e.g. time periods as literals v time periods as resources with properties of their own).
Google provides specific recommendations to content creators on the embedding of data to describe:
Yahoo and RDFa
Yahoo's support for RDFa is through its SearchMonkey platform. Like Google, it provides a set of "standard" result set enhancements, based on the use of specified RDF vocabularies for a small set of resource types:
- Local (Businesses/organizations)
- News Items
- Videos (e.g. to allow embedding in search results)
- (Flash) Documents (e.g. to allow embedding in search results)
- Games (e.g. to allow embedding in search results)
In addition, my understanding is that although Yahoo defines some RDF vocabularies of its own, and describes the use of specified vocabularies in the guidelines for the resource types above, it exposes any RDFa data in pages it indexes to developers on its SearchMonkey platform, to allow the building of custom search enhancements. Several existing vocabularies are discussed in the SearchMonkey guide and the FAQ in Appendix D of that document notes "You may use any RDF or OWL vocabulary".
The decentralised extensibility built into RDF means that a provider can choose to extend what data they expose beyond that specified in the guidelines mentioned above.
In addition, I tried to take into account some other general "good practice" points that have emerged from the work of the Linked Data community, captured in sources such as:
- the original How to Publish Linked Data on the Web tutorial by Chris Bizer, Richard Cyganiak and Tom Heath
- Cool URIs for the Semantic Web by Leo Sauermann and Richard Cyganiak
- Linked Data Patterns by Ian Davis and Leigh Dodds
So in the Eduserv case, for example (I hope!) URIs will be assigned to "things" like events, distinct from the pages describing them, with suitable redirects put in place on the HTTP server and syitable triples in the data linking those things and the corresponding pages.
Anyway, on the basis of the above sources, I tried to construct some suggestions, taking into acccount both the Google and Yahoo guidelines, for descriptions of people, organisations and events, which I'll post here in the next few entries.
Even more recently, of course, has come the news of Facebook's announcement at the f8 conference of their Open Graph Protocol. This makes use of RDFa embedded in the headers of XHTML pages using meta elements to provide (pretty minimal) metadata "about" things described by those pages (films, songs, people, places, hotels, restaurants etc - see the Facebook page for a full (and I imagine, growing) list of resource types supported).
Facebook makes use of the data to drive its "Like" application: a "button" can be embedded in the page to allow a Facebook user to post the data to their Fb account to signal an "I like this" relationship with the thing described. Or as Dare Obasanjo expresses it, an Fb user can add a node for the thing to their Fb social graph, making it into a "social object". This results in the data being displayed at appropriate points in their Fb stream, while the button displays, as a minimum, a count of the "likers" of the resource on the source page itself; logged-in Fb users would, I think, see information about whether any of their "friends" had liked it.
My reporting of these details of the interface is somewhat "second-hand" as I no longer use Facebook - I deleted my account some time ago because I was concerned about their approaches to the privacy of personal information (see these three recent posts by Tony Hirst for some thoughts on the most recent round of changes in that sphere).
Perhaps unsurprisingly given the popularity of Fb and its huge user base, the OGP announcement seems to have attracted a very large amount of attention within a very short period of time, and it may turn out to be a significant milestone for the use of XHTML-embedded metadata in general and of RDFa in particular. The substantial "carrot" of supporting the Fb "Like" application and attracting traffic from Fb users is likely to be the primary driver for many providers to generate this data, and indeed some commentators (see e.g. this BBC article) have gone as far as to suggest that this represents a move by Facebook to challenge Google as the primary filter of resources for people searching and navigating the Web.
However, I also think it is important to distinguish between the data on the one hand and that particular Facebook app on the other. Having this data available, minimal as it may be, also opens up the possibility of other applications by other parties making use of that same data.
And this is true also, of course, for the case of data constructed following the Google and Yahoo guidelines.