Google Rich Snippets
As ever, I'm slow off the mark with this, but last week's big news within the Web metadata and Semantic Web communities was Google's announcement of a feature they are calling Rich Snippets. It provides support for parsing structured data within HTML pages - based on a selection of microformats and on RDFa using a specified RDF vocabulary - and for surfacing that data in Google search result sets. In the first instance, at least, only a selected set of sources is being indexed, with the hope of extending participation soon (see the discussion in the O'Reilly interview with Othar Hansson and Guha).
A number of commentators, including Ian Davis, Tom Scott, and Jeni Tennison, have pointed out that Google's support for RDFa, at least as currently described, is somewhat partial, and that its reliance on a centralised Google-owned URI space for terms is at odds with RDF's support for the distributed creation of vocabularies. Indeed, in coining that vocabulary, Google appears to have ignored the existence of some already widely deployed vocabularies.
Nevertheless, it's hard to disagree with Timothy O'Brien's recognition of the huge power that Google wields in this space:
Google is certainly not the first search engine to support RDFa and Microformats, but it certainly has the most influence on the search market. With 72% of the search market, Google has the influence to make people pay attention to RDFa and Microformats.
Or, to put it another way, we may be approaching a period in which, to quote Dries Buytaert of the Drupal project, "structured data is the new search engine optimization" - with, I might add, both the pros and cons that come with that particular emphasis!
One of the challenges to an approach based on consuming structured data from the open Web is, of course, that of dealing with inaccuracies, spammers and "gamers" - see, for example, Cory Doctorow's "metacrap" piece, from back in 2001. But as Jeni Tennison notes towards the end of her post, having Google in a position where they have an interest in tackling this problem must be a good thing for the data web community more widely:
They will now have a stake in answering the difficult questions around trust, confidence, accuracy and time-sensitivity of semantic information.
Google's announcement is also one of the topics discussed in the newly released Semantic Web Gang podcast from Talis. That conversation is well worth a listen, as it covers many of the issues I've mentioned here and more besides. In it, Tom Tague from Thomson Reuters highlights another potential outcome: he expresses optimism that the interest in embedded metadata generated by the Google initiative will also provide an impetus for the development of other tools to consume that data, such as browser plug-ins.
Thinking about activities that I have some involvement in, I think the use of RDFa in general is an area that should be on the radar of the "repositories" community in their efforts to improve access to the outputs of scholarly research.
It's also an area that I think the Dublin Core Metadata Initiative should be engaging with. Embedding metadata in HTML pages with the intent of facilitating the discovery of those pages using search engines was probably one of the primary motivating use cases, at least in the early days of work on DC. Historically, of course, there has been little support from the global search engines for the approach, in large part because of the sort of problems identified by Doctorow. The current DCMI recommendation for doing this makes use of an HTML metadata profile (associated with a GRDDL namespace transformation). Although RDFa is, in one sense, "just another syntax for RDF", it might still be useful for DCMI to produce a short document illustrating the use of RDFa (and perhaps to consider the use of RDFa in its own documents). As far as the use of DCMI's own RDF vocabularies in data exposed to Google is concerned, it remains to be seen whether support for RDF vocabularies other than Google's own will be introduced. (Having said that, it's also worth noting that one of the strengths of RDFa is that the attribute-based syntax is fairly amenable to the use of multiple vocabularies in combination.)
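To make that last point about combining vocabularies a little more concrete, here is a rough sketch in Python. It is not a conformant RDFa processor, just a toy that reads `xmlns:` prefix declarations and `property` attributes from an HTML fragment and expands each prefixed name to a full URI. The sample markup, the person described in it, and the names `PropertyCollector` and `extract_properties` are all made up for illustration. The namespace URI used for Google's vocabulary is my assumption about what the Rich Snippets documentation specifies, so check it before relying on it; the DCMI terms namespace is the real one.

```python
from html.parser import HTMLParser

# Hypothetical fragment mixing two vocabularies on sibling elements.
# The "v" namespace URI is an assumption about Google's vocabulary.
SAMPLE = """
<div xmlns:v="http://rdf.data-vocabulary.org/#"
     xmlns:dcterms="http://purl.org/dc/terms/"
     typeof="v:Person">
  <span property="v:name">Jane Example</span>
  <span property="dcterms:description">Metadata librarian</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Toy collector: records (expanded property URI, text) pairs.

    Simplifications: prefixes are treated as global rather than scoped,
    and only the first non-blank text after a start tag is captured.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> namespace URI
        self.pairs = []       # (property URI, literal text)
        self._pending = None  # property name awaiting its text

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        self._pending = dict(attrs).get("property")

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            prefix, _, local = self._pending.partition(":")
            # Fall back to the unexpanded prefix if it was never declared.
            namespace = self.prefixes.get(prefix, prefix + ":")
            self.pairs.append((namespace + local, text))
            self._pending = None


def extract_properties(html):
    """Return the list of (property URI, text) pairs found in html."""
    collector = PropertyCollector()
    collector.feed(html)
    return collector.pairs


if __name__ == "__main__":
    for uri, text in extract_properties(SAMPLE):
        print(uri, "=", text)
```

The point of the sketch is simply that nothing in the attribute syntax ties a page to a single vocabulary: the two `property` values resolve to terms in different namespaces, and a consumer that only understands one of them can ignore the other.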
Finally, I think this is an area which Eduserv should be tracking carefully with regard to its relevance to the services it provides to the education sector and to the wider public sector in the UK: it seems notable that, as I mentioned a few weeks ago, some of the early deployments of RDFa have been within UK government services.