April 27, 2010

RDFa 1.1 drafts available from W3C

Last week, the W3C RDFa Working Group announced the availability of two new "First Public Working Drafts" which it is circulating for comment:

Ivan Herman, the W3C Semantic Web Activity Lead, and a co-editor of these documents, has provided a very helpful summary of their main features, and particularly of some of the differences they introduce when compared with the current W3C Recommendation for RDFa, "RDFa in XHTML: Syntax and Processing" (a collection of attributes and processing rules for extending XHTML to support RDF). I think the intent is that the new drafts maintain compatibility with the current recommendation, in the sense that all the features used in XHTML+RDFa 1.0 are also present in RDFa 1.1. I should reiterate what Ivan says at the start of his piece: these are drafts and features may change based on feedback received.

Some of the most interesting features in these drafts, at least for data creators, are those which enable a more concise/compact style of RDFa markup. One of the criticisms of the initial version of RDFa, particularly from communities unfamiliar with RDF syntaxes, was the dependency on the use of prefixed names, in the form of CURIEs, two-part names made up of a "prefix" and a "reference", mapped to URIs by associating the prefix with a "base" URI, and concatenating the reference part of the CURIE with that URI. In XHTML the prefix-URI association was made through an XML Namespace Declaration. In particular, arguments against this approach focused on problems of "copy-and-paste", where a document fragment including RDFa markup was extracted from a source document without also copying the in-scope XML Namespace declarations, and as a result the RDF interpretation of the fragment in the context of a different (paste target) document was changed. More generally, there were some concerns that the use of prefixes was difficult to explain and understand, at least when compared with the "unprefixed name" styles typically adopted in approaches like microformats.
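To make the prefix mechanism and the copy-and-paste problem concrete, here is a minimal Python sketch of CURIE expansion. It is an illustration only, not the processing algorithm from the RDFa specifications: the function name expand_curie, and the two dictionaries standing in for the namespace declarations in scope in a source document and in a paste-target document, are my own assumptions.

```python
def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:reference' by concatenating the mapped URI and the reference."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no prefix mapping in scope for {curie!r}")
    return prefixes[prefix] + reference


# Hypothetical in-scope declarations for two different documents.
source_doc = {"dc": "http://purl.org/dc/terms/"}
paste_target = {"dc": "http://purl.org/dc/elements/1.1/"}

# The same markup fragment, interpreted in each context.
print(expand_curie("dc:title", source_doc))    # http://purl.org/dc/terms/title
print(expand_curie("dc:title", paste_target))  # http://purl.org/dc/elements/1.1/title
```

The same string "dc:title" yields two different property URIs, depending solely on which declarations happen to be in scope where the fragment lands.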

The new drafts introduce several mechanisms which can simplify markup for authors.

I should emphasise that my examples below are based on my fairly rapid reading of the drafts, and any errors and misrepresentations are mine!

The @vocab attribute

@vocab is a new RDFa attribute which provides a means for defining a "default" "vocabulary URI" to which the "terms" in attribute values are appended to construct a URI.

Aside: I should note here that I'm using the word "term" in the sense it is used in the RDFa 1.1 draft, where it refers to a datatype for a string used in an attribute value; this differs from usage by e.g. DCMI where "term" typically refers to a property, class, vocabulary encoding scheme or syntax encoding scheme i.e. to the "conceptual resource" identified by a DCMI URI, rather than to a syntactic component. In RDFa 1.1, "terms" have the syntactic constraints of the NCName production in the XML Namespaces specification.

This mechanism provides an alternative to the use of a CURIE (with prefix, reference and namespace declaration) to represent a URI.

Consider an example based on those from my recent post about RDFa (1.0) and document metadata (This is a "hybrid" of the examples 1.1.5, 1.3.5, and 2.1.5 in that post):

XHTML+RDFa 1.0:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <h1 property="dc:title">My World Cup 2010 Review</h1>

    <p>About: 
      <a rel="dc:subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>

    <p>Date last modified: 
      <span property="dc:modified"
        datatype="xsd:date">2010-07-04</span>
    </p>

  </body>
  
</html>

This represents the following three triples (in Turtle):

@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/resource/> .

<> dc:title "My World Cup 2010 Review" .
<> dc:subject ex:2010_FIFA_World_Cup .
<> dc:modified "2010-07-04"^^xsd:date .

XHTML+RDFa 1.1 using @vocab:

Using the @vocab attribute on the body element to set http://purl.org/dc/terms/ as the default vocabulary URI, I could write this as:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.1">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body vocab="http://purl.org/dc/terms/">
    <h1 property="title">My World Cup 2010 Review</h1>

    <p>About: 
      <a rel="subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>

    <p>Date last modified: 
      <span property="modified"
        datatype="xsd:date">2010-07-04</span>
    </p>

  </body>
  
</html>

In that case, where just three properties are referenced, the reduction in the number of characters is minimal, but if several properties from the same vocabulary were referenced, then the saving could be more substantial.

The @vocab approach provides limited help where, as is often the case, terms from multiple RDF vocabularies are used in combination (e.g. the example above continues to use a CURIE for the URI of the XML Schema date datatype), but other features of RDFa 1.1 are useful in those cases.
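As a rough sketch of how the attribute values in the @vocab example might be resolved (my simplification, not the draft's processing rules, and ignoring profiles entirely): a value containing a colon is treated as a CURIE, and anything else is treated as a term and appended to the default vocabulary URI. The function name resolve_value is hypothetical.

```python
from typing import Optional

DC_TERMS = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def resolve_value(value: str, vocab: Optional[str],
                  prefixes: dict[str, str]) -> Optional[str]:
    """Resolve an attribute value to a URI, or return None if nothing applies."""
    if ":" in value:
        # Prefixed name: look the prefix up and concatenate the reference.
        prefix, reference = value.split(":", 1)
        base = prefixes.get(prefix)
        return base + reference if base is not None else None
    # Unprefixed term: append it to the default vocabulary URI, if one is set.
    return vocab + value if vocab is not None else None


# State at the <body vocab="http://purl.org/dc/terms/"> element of the example.
prefixes = {"xsd": XSD}
for value in ("title", "subject", "modified", "xsd:date"):
    print(value, "->", resolve_value(value, DC_TERMS, prefixes))
```

The three terms pick up the Dublin Core vocabulary URI, while the datatype still depends on the xsd prefix declaration, which is the limitation noted above.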

RDFa Profiles and the @profile attribute

Perhaps more powerful than the @vocab attribute is the new RDFa 1.1 feature known as the RDFa profile, and the @profile attribute:

RDFa Profiles are optional external documents that define collections of terms and/or prefix mappings. These documents must be defined in an approved RDFa Host Language (currently XHTML+RDFa [XHTML-RDFA]). They may also be defined in other RDF serializations as well (e.g., RDF/XML [RDF-SYNTAX-GRAMMAR] or Turtle [TURTLE]). RDFa Profiles are referenced via @profile, and can be used by document authors to simplify the task of adding semantic markup.

Let's take each of these two functions - defining terms and defining prefix mappings - in turn.

Defining term mappings in an RDFa profile

An RDFa profile can provide mappings between "terms" and URIs. The following example provides four such "term mappings", for the URIs of three properties from the DC Terms RDF vocabulary and for the URI of one XML Schema datatype:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rdfa="http://www.w3.org/ns/rdfa#"
      version="XHTML+RDFa 1.1">
      
  <head>
    <title>My RDFa Profile for a few DC and XSD terms</title>
  </head>

  <body>

    <h1>My RDFa Profile for a few DC and XSD terms</h1>

    <ul>

      <li typeof="rdfa:TermMapping">
        <span property="rdfa:term">title</span> :
        <span property="rdfa:uri">http://purl.org/dc/terms/title</span>
      </li>

      <li typeof="rdfa:TermMapping">
        <span property="rdfa:term">about</span> :
        <span property="rdfa:uri">http://purl.org/dc/terms/subject</span>
      </li>

      <li typeof="rdfa:TermMapping">
        <span property="rdfa:term">modified</span> :
        <span property="rdfa:uri">http://purl.org/dc/terms/modified</span>
      </li>

      <li typeof="rdfa:TermMapping">
        <span property="rdfa:term">xsddate</span> :
        <span property="rdfa:uri">http://www.w3.org/2001/XMLSchema#date</span>
      </li>

    </ul>

  </body>
  
</html>

Note that - in contrast to the case of CURIE references - the content of the "term" doesn't have to match the trailing characters of the URI; so for example, here I've mapped the term "about" to the URI http://purl.org/dc/terms/subject. So sets of "terms" corresponding to various community-specific or domain-specific lexicons could be mapped to a single set of URIs.

Also a single RDFa profile might provide mappings for URIs from different URI owners - the example above references three DCMI-owned URIs for properties and a W3C-owned URI for a datatype. Conversely, different subsets of URIs owned by a single agency may be referenced in different RDFa profiles.
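Viewed as data, the term mappings in the profile document above amount to a simple lookup table. The Python sketch below transcribes them into a dictionary; the second dictionary, LIBRARY_TERMS, is a purely hypothetical lexicon I have invented to show two sets of terms pointing at the same URIs, and resolve_term is likewise just an illustrative name.

```python
from typing import Optional

# Term mappings transcribed from the example profile above.
PROFILE_TERMS = {
    "title": "http://purl.org/dc/terms/title",
    "about": "http://purl.org/dc/terms/subject",
    "modified": "http://purl.org/dc/terms/modified",
    "xsddate": "http://www.w3.org/2001/XMLSchema#date",
}

# A second, hypothetical lexicon mapped onto the same URIs.
LIBRARY_TERMS = {
    "heading": "http://purl.org/dc/terms/title",
    "topic": "http://purl.org/dc/terms/subject",
}


def resolve_term(term: str, mappings: dict[str, str]) -> Optional[str]:
    """Return the URI mapped to a term, or None if the profile has no mapping."""
    return mappings.get(term)


# Unlike a CURIE reference, the term need not match the tail of the URI.
print(resolve_term("about", PROFILE_TERMS))
print(resolve_term("topic", LIBRARY_TERMS))
```

Both calls return http://purl.org/dc/terms/subject, although neither term appears in that URI.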

If the URI of this RDFa profile is http://example.org/profile/terms/, then I can reference it in an XHTML+RDFa 1.1 document, and make use of the term mappings it defines. So taking the example above again, and now using @profile to reference the profile and its term mappings:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      version="XHTML+RDFa 1.1">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body profile="http://example.org/profile/terms/">
    <h1 property="title">My World Cup 2010 Review</h1>

    <p>About: 
      <a rel="about"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>

    <p>Date last modified: 
      <span property="modified"
        datatype="xsddate">2010-07-04</span>
    </p>

  </body>
  
</html>

The @profile attribute may appear on any XML element, so it is possible that an element with a @profile attribute referencing profile A may contain a child element with a @profile attribute referencing profile B.

  <body profile="http://example.org/profile/a/">
    <h1 property="title">My World Cup 2010 Review</h1>

    <div profile="http://example.org/profile/b/">
 
      <p>About: 
        <a rel="about"
          href="http://example.org/resource/2010_FIFA_World_Cup">
          The 2010 World Cup
        </a>
      </p>

    </div>

  </body>
  

And the value of a single @profile attribute may be a whitespace-separated list of URIs.

  <body profile="http://example.org/profile/a/
    http://example.org/profile/b/">

  </body>

One of the questions I'm not quite sure about is what happens if the same "term" is mapped to different URIs in different profiles. I think, but I'm not 100% sure, only a single mapping is used and a single triple is generated, but I'm not sure about the precedence rules for determining which mapping is to be used.
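The following Python sketch illustrates why the precedence question matters. Neither policy shown is taken from the draft: "first-wins" and "last-wins" are hypothetical rules of my own, the function name merge_profiles is invented, and the conflicting URI in profile B is made up for the example.

```python
def merge_profiles(profiles: list[dict[str, str]], policy: str) -> dict[str, str]:
    """Combine term mappings from several profiles under a given precedence policy."""
    if policy not in ("first-wins", "last-wins"):
        raise ValueError(f"unknown policy: {policy}")
    merged: dict[str, str] = {}
    for mappings in profiles:
        for term, uri in mappings.items():
            # Under "last-wins" a later profile overwrites an earlier mapping;
            # under "first-wins" an existing mapping is left alone.
            if policy == "last-wins" or term not in merged:
                merged[term] = uri
    return merged


profile_a = {"about": "http://purl.org/dc/terms/subject"}
profile_b = {"about": "http://example.org/vocab/about"}  # hypothetical conflict

for policy in ("first-wins", "last-wins"):
    merged = merge_profiles([profile_a, profile_b], policy)
    print(policy, "->", merged["about"])
```

The two policies produce different property URIs for the same markup, so consumers need an explicit rule to agree on the resulting triple.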

As Ivan notes, probably the most common pattern for deploying RDFa profiles will be for the owners/publishers of RDF vocabularies (such as DCMI) to publish profiles for their vocabularies, and for data providers to simply reference those profiles, rather than creating their own.

Defining prefix mappings in an RDFa profile

RDFa 1.1 continues to support the use of XML Namespace Declarations to associate CURIE prefixes with URIs (see my first example above and the use of the XML Schema datatype) but it also introduces other mechanisms for achieving this. One of these is the ability to supply CURIE prefix to URI mappings in RDFa profiles.

The following example provides four such "prefix mappings", for the URIs of three DCMI vocabularies and for the URI of the XML Schema datatype vocabulary:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rdfa="http://www.w3.org/ns/rdfa#"
      version="XHTML+RDFa 1.1">
      
  <head>
    <title>My RDFa Profile for DC and XSD prefixes</title>
  </head>

  <body>

    <h1>My RDFa Profile for DC and XSD prefixes</h1>

    <ul>

      <li typeof="rdfa:PrefixMapping">
        <span property="rdfa:prefix">dc</span> :
        <span property="rdfa:uri">http://purl.org/dc/terms/</span>
      </li>

      <li typeof="rdfa:PrefixMapping">
        <span property="rdfa:prefix">dcam</span> :
        <span property="rdfa:uri">http://purl.org/dc/dcam/</span>
      </li>

      <li typeof="rdfa:PrefixMapping">
        <span property="rdfa:prefix">dcmitype</span> :
        <span property="rdfa:uri">http://purl.org/dc/dcmitype/</span>
      </li>

      <li typeof="rdfa:PrefixMapping">
        <span property="rdfa:prefix">xsd</span> :
        <span property="rdfa:uri">http://www.w3.org/2001/XMLSchema#</span>
      </li>

    </ul>

  </body>
  
</html>

If the URI of this RDFa profile is http://example.org/profile/prefixes/, then I can reference it in an XHTML+RDFa 1.1 document, and make use of the prefix mappings it defines. Taking the example above again, and using @profile to reference this second profile and its prefix mappings:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      version="XHTML+RDFa 1.1">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body profile="http://example.org/profile/prefixes/">
    <h1 property="dc:title">My World Cup 2010 Review</h1>

    <p>About: 
      <a rel="dc:subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>

    <p>Date last modified: 
      <span property="dc:modified"
        datatype="xsd:date">2010-07-04</span>
    </p>

  </body>
  
</html>

As in the case of term mappings, the issue arises of what happens in the case that two profiles provide different prefix-URI mappings for the same prefix. I think the CURIE datatype is based on the notion that at a point in a document, for prefix p, a single prefix-URI mapping is in force for that prefix, so I assume there are precedence rules for establishing which of the profile prefix mappings is to be applied.

Access to profiles and changes to triples?

Although the RDFa 1.1 profile mechanism is a powerful mechanism, it also introduces a new element of complexity for consumers of RDFa. In RDFa 1.0, an XHTML+RDFa document is "self-contained", by which I mean an RDFa processor can construct an interpretation of the document as a set of RDF triples using only the content of the document itself. In RDFa 1.1, however, the interpretation of terms and prefixes may be determined by the term mappings and prefix mappings specified in profiles external to the document containing the RDFa markup.

Consider my last example above. When the processor encounters the @profile attribute it retrieves the profile and obtains a list of prefix-URI mappings to be applied in subsequent processing, and when it encounters the CURIE "dc:title" it generates the URI http://purl.org/dc/terms/title.

But if for some reason, the processor is unable to dereference the URI, and doesn't have a cached copy of the referenced profile, then it does not have those mappings available. In that case, for my example above, when the processor encounters the CURIE "dc:title" it would not have a mapping for the "dc" prefix, and (I think?) would instead (with the new "URI everywhere" rules in force) treat the string "dc:title" as a URI? (See e.g. the section on CURIE and URI Processing)
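A small Python sketch of that scenario follows. It encodes my tentative reading above, that a value with no usable prefix mapping falls back to being treated as a URI, and should not be taken as confirmed processor behaviour; the function name resolve_curie_or_uri is hypothetical, and an empty dictionary stands in for a profile that could not be retrieved.

```python
def resolve_curie_or_uri(value: str, prefixes: dict[str, str]) -> str:
    """Expand a CURIE if its prefix is mapped; otherwise return the value unchanged."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    # Assumed fallback: with no mapping available, the string is read as a URI.
    return value


# Mappings obtained when the profile can be dereferenced.
profile_available = {"dc": "http://purl.org/dc/terms/"}
# No mappings at all when the profile cannot be fetched and is not cached.
profile_unavailable: dict[str, str] = {}

print(resolve_curie_or_uri("dc:title", profile_available))
print(resolve_curie_or_uri("dc:title", profile_unavailable))
```

Under this assumption, the same document yields a triple with the property http://purl.org/dc/terms/title in one case and with the property dc:title in the other, purely as a result of network conditions.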

In the case where two profiles are referenced, and both provide a mapping for the same prefix, then it seems possible that the prefix mapping in force might change depending on the availability of access to the profiles.

I lurk on the RDFa WG list, and I've seen various discussions of how these sorts of issues should be handled - see, for example, this thread on "What happens when you can't dereference a profile document?", though related issues surface in other discussions too. I suspect the current draft is far from the "last word" in this area, and these are the sorts of issues on which the authors are seeking feedback.

Summary

I've focused here only on a few "highlights" of the RDFa 1.1 drafts, and Ivan's post covers a couple more which I won't discuss here (the use of the @prefix attribute to provide CURIE prefix mappings and the ability to use URIs in contexts where previously CURIEs were required), but I hope they give a flavour of the sort of functionality which is being introduced. The examples here are based on my understanding of the current drafts, but I may have made mistakes, so please do check out the drafts rather than relying on my interpretations.

It seems to me the WG is trying hard to address some of the criticisms made of RDFa 1.0, and to provide mechanisms that make the provision of RDFa markup simpler while retaining the power and flexibility of the syntax and ensuring that RDFa 1.0 data remains compatible. In particular, it seems to me the "term mapping" feature of RDFa profiles may be very useful in "shielding" data providers from some of the complexity of name-URI mappings and prefixed names, especially once the owners of commonly used RDF vocabularies start to make such profiles available.

However, such flexibility doesn't come without its own challenges, and it also seems that the profile mechanism in particular introduces some complexity which I imagine will become a focus of some discussion during the comment period for these drafts. Comments on the drafts themselves should be sent to the RDFa Working Group list.

April 22, 2010

Document metadata using DC-HTML and using RDFa

In the context of various bits and pieces of work recently (more of which I'll write about in some upcoming posts), I've been finding myself describing how document metadata that can be represented using DCMI's DC-HTML meta data profile, described in Expressing Dublin Core metadata using HTML/XHTML meta and link elements, might also be represented using RDFa. (N.B. Here I'm considering only the current RDFa in XHTML W3C Recommendation, not the newly announced drafts for RDFa 1.1). So I thought I'd quickly list some examples here. Please note: I don't intend this to be a complete tutorial on using RDFa. Far from it; here I focus only on the case of "document metadata" whereas of course RDFa can be used to represent data "about" any resources. And these are really little more than a few rough notes which one day I might reuse somewhere else.

I really just wanted to illustrate that:

  • in terms of its use with the XHTML meta and link elements, RDFa has many similarities to the DC-HTML profile - unsurprisingly, as the RDF model underlies both; and
  • RDFa also provides the power and flexibility to represent data that cannot be expressed using the DC-HTML profile.

The main differences between using RDFa in XHTML and using the DC-HTML profile are:

  • RDFa supports the full RDF model, not just the particular subset supported by DC-HTML
  • RDFa introduces some new XML attributes (@about, @property, @resource, @datatype, @typeof)
  • RDFa uses a datatype called CURIE for the abbreviation of URIs; DC-HTML uses a prefixed name convention which is essentially specific to that profile (though it was also adopted by the Embedded RDF profile)
  • Perhaps most significantly, RDFa can be used anywhere in an XHTML document, so the same syntactic conventions can be used both for document metadata and for data ("about" any resources) embedded in the body of the document

I'm presenting these examples following the description set model of the DCMI Abstract Model, and in more or less the same order that the DC-HTML specification presents the same set of concepts.

For each example, I present the data:

  • using DC-Text
  • using Turtle
  • in XHTML using DC-HTML
  • in XHTML+RDFa, using meta and link elements
  • in XHTML+RDFa, using block and inline elements (to illustrate that the same data could be embedded in the body of an XHTML document, rather than only in the head)

As an aside, it is possible to use the DC-HTML profile alongside RDFa in the same document, but I haven't bothered to show that here.

Footnote: Hmmm. Considering that I said to myself at the start of the year that I was rather tired of thinking/writing about syntax, I still seem to be doing an awful lot of it! Will try to write about other things soon....

1. Literal Value Surrogates

See DC-HTML 4.5.1.2.

1.1 Plain Value String

1.1.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:title )
      LiteralValueString ( "My World Cup 2010 Review" )
    )
  )
)

1.1.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
<> dc:title "My World Cup 2010 Review" .

1.1.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <meta name="DC.title" content="My World Cup 2010 Review" />
  </head>

</html>

1.1.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:title" content="My World Cup 2010 Review" />
  </head>

</html>

In this example, it would also be possible to simply add an attribute to the title element, instead of introducing the meta element:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title property="dc:title">My World Cup 2010 Review</title>
  </head>

</html>

1.1.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <h1 property="dc:title">My World Cup 2010 Review</h1>
  </body>
  
</html>
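The RDFa variants in 1.1.4 and 1.1.5 are all intended to yield the same triple as the Turtle in 1.1.2. As a rough check, here is a deliberately tiny Python sketch that pulls literal-valued triples out of such documents. It is nowhere near a conforming RDFa processor: it assumes the subject is always the document itself (written "<>" as in Turtle), ignores @about, @datatype and xml:lang, never takes prefix declarations out of scope, and the function name literal_triples is my own. The sample documents omit the XML declaration and DOCTYPE for brevity.

```python
import io
import xml.etree.ElementTree as ET


def literal_triples(xhtml: str) -> list[tuple[str, str, str]]:
    """Collect (subject, property URI, literal) for each element with @property."""
    prefixes: dict[str, str] = {}
    triples: list[tuple[str, str, str]] = []
    source = io.BytesIO(xhtml.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = payload  # an xmlns declaration coming into scope
            prefixes[prefix] = uri
            continue
        curie = payload.get("property")
        if curie is None:
            continue
        prefix, sep, reference = curie.partition(":")
        if not sep or not prefix or prefix not in prefixes:
            continue  # this sketch simply skips values it cannot expand
        # @content is used when present; otherwise the element's text content.
        literal = payload.get("content")
        if literal is None:
            literal = "".join(payload.itertext())
        triples.append(("<>", prefixes[prefix] + reference, literal))
    return triples


HTML_OPEN = ('<html xmlns="http://www.w3.org/1999/xhtml" '
             'xmlns:dc="http://purl.org/dc/terms/">')

META_VARIANT = HTML_OPEN + '''<head><title>My World Cup 2010 Review</title>
<meta property="dc:title" content="My World Cup 2010 Review" /></head></html>'''

TITLE_VARIANT = HTML_OPEN + '''<head>
<title property="dc:title">My World Cup 2010 Review</title></head></html>'''

BODY_VARIANT = HTML_OPEN + '''<head><title>My World Cup 2010 Review</title></head>
<body><h1 property="dc:title">My World Cup 2010 Review</h1></body></html>'''

for doc in (META_VARIANT, TITLE_VARIANT, BODY_VARIANT):
    print(literal_triples(doc))
```

Each of the three documents produces the single triple with subject "<>", property http://purl.org/dc/terms/title and literal "My World Cup 2010 Review".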

1.2 Plain Value String with Language Tag

1.2.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:title )
      LiteralValueString ( "My World Cup 2010 Review" 
        Language ( en )
      )
    )
  )
)

1.2.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
<> dc:title "My World Cup 2010 Review"@en .

1.2.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <meta name="DC.title" 
      xml:lang="en" content="My World Cup 2010 Review" />
  </head>

</html>

1.2.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:title"
      xml:lang="en" content="My World Cup 2010 Review" />
  </head>

</html>

1.2.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <h1 property="dc:title" xml:lang="en">My World Cup 2010 Review</h1>
  </body>
  
</html>

1.3 Typed Value String

1.3.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:modified )
      LiteralValueString ( "2010-07-04"
        SyntaxEncodingSchemeURI ( xsd:date )
      )
    )
  )
)

1.3.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<> dc:modified "2010-07-04"^^xsd:date .

1.3.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="schema.XSD" href="http://www.w3.org/2001/XMLSchema#" />
    <meta name="DC.modified" 
      scheme="XSD.date" content="2010-07-04" />
  </head>

</html>

1.3.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:modified" 
      datatype="xsd:date" content="2010-07-04" />
  </head>

</html>

1.3.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>Date last modified: 
      <span property="dc:modified"
        datatype="xsd:date">2010-07-04</span>
    </p>
  </body>
  
</html>

2. Non-Literal Value Surrogates

See DC-HTML 4.5.2.2.

2.1 Value URI

2.1.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
    )
  )
)

2.1.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
<> dc:subject ex:2010_FIFA_World_Cup .

2.1.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
  </head>

</html>

2.1.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
  </head>

</html>

2.1.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>About: 
      <a rel="dc:subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>
  </body>
  
</html>

2.2 Value URI with Plain Value String

2.2.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      ValueString ( "2010 FIFA World Cup" )
    )
  )
)

2.2.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup rdf:value "2010 FIFA World Cup" .

2.2.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject"
      href="http://example.org/resource/2010_FIFA_World_Cup"
      title="2010 FIFA World Cup" />
  </head>

</html>

2.2.4 XHTML+RDFa using meta and link

Here the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value" content="2010 FIFA World Cup" />
  </head>

</html>

2.2.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>About: 
      <a rel="dc:subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        <span property="rdf:value">2010 FIFA World Cup</span>
      </a>
    </p>
  </body>
  
</html>

2.3 Value URI with Plain Value String with Language Tag

2.3.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      ValueString ( "2010 FIFA World Cup"
        Language ( en )
      )
    )
  )
)

2.3.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup rdf:value "2010 FIFA World Cup"@en .

2.3.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

  <head 
    profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject"
      href="http://example.org/resource/2010_FIFA_World_Cup"
      xml:lang="en" title="2010 FIFA World Cup" />
  </head>

</html>

2.3.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value" 
      xml:lang="en" content="2010 FIFA World Cup" />
  </head>

</html>
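
To make the "two RDF triples" reading concrete, here is a rough Python sketch that pulls the link and meta elements out of the head of the example above and prints the triples they correspond to. It is only an illustration under some loud assumptions: it is not an RDFa processor, it handles only the attributes used in this example, the prefix mappings are copied by hand from the xmlns declarations, and the helper names (expand_curie, head_triples) and the document URI are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumption: prefix mappings copied by hand from the xmlns declarations,
# because ElementTree does not expose them after parsing.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "ex": "http://example.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value"
      xml:lang="en" content="2010 FIFA World Cup" />
  </head>
</html>"""


def expand_curie(curie):
    """Expand 'prefix:reference' (or the safe form '[prefix:reference]') to a URI."""
    prefix, reference = curie.strip("[]").split(":", 1)
    return PREFIXES[prefix] + reference


def head_triples(xhtml, document_uri):
    """Hypothetical helper: list (subject, predicate, object) for link/meta in head."""
    head = ET.fromstring(xhtml).find(XHTML + "head")
    triples = []
    for el in head:
        about = el.get("about")
        # Without an about attribute, the subject is the document itself.
        subject = expand_curie(about) if about else document_uri
        if el.tag == XHTML + "link" and el.get("rel"):
            triples.append((subject, expand_curie(el.get("rel")), el.get("href")))
        elif el.tag == XHTML + "meta" and el.get("property"):
            # Plain literal represented here as a (string, language tag) pair.
            literal = (el.get("content"), el.get(XML_LANG))
            triples.append((subject, expand_curie(el.get("property")), literal))
    return triples


# The document URI below is made up for the purposes of the example.
for triple in head_triples(SAMPLE, "http://example.org/review"):
    print(triple)
```

The link element supplies the dc:subject triple with the document as subject, and the meta element, via its about attribute, supplies the rdf:value triple with the value resource as subject.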

With RDFa, multiple value strings might be provided, using multiple meta elements (which is not supported in DC-HTML):

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value" 
      xml:lang="en" content="2010 FIFA World Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]"
      property="rdf:value" 
      xml:lang="es" content="Copa Mundial de Fútbol de 2010" />
  </head>

</html>

2.3.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>About: 
      <a rel="dc:subject" 
        href="http://example.org/resource/2010_FIFA_World_Cup">
        <span property="rdf:value" 
          xml:lang="en">2010 FIFA World Cup</span>
      </a>
    </p>
  </body>
  
</html>

2.4 Value URI with Typed Value String

2.4.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:language )
      ValueURI ( ex:English )
      ValueString ( "en"
        SyntaxEncodingSchemeURI ( xsd:language )
      )
    )
  )
)

2.4.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<> dc:language ex:English .
ex:English rdf:value "en"^^xsd:language .

2.4.3 XHTML using DC-HTML:

Not supported by DC-HTML.

2.4.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:language"
      href="http://example.org/resource/English" />
    <meta about="[ex:English]"
      property="rdf:value" datatype="xsd:language" content="en" />
  </head>

</html>

2.4.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>Language:
      <a rel="dc:language"
        href="http://example.org/resource/English">
        <span property="rdf:value" 
          datatype="xsd:language" content="en">English</span>
      </a>
    </p>
  </body>

</html>

2.5 Value URI with Vocabulary Encoding Scheme URI

2.5.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      VocabularyEncodingSchemeURI ( ex:MyScheme )
    )
  )
)

2.5.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix dcam: <http://purl.org/dc/dcam/> .
@prefix ex: <http://example.org/resource/> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup dcam:memberOf ex:MyScheme .

2.5.3 XHTML using DC-HTML:

Not supported by DC-HTML.

2.5.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in XHTML using RDFa two link elements are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:dcam="http://purl.org/dc/dcam/"
      xmlns:ex="http://example.org/resource/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject"
      href="http://example.org/resource/2010_FIFA_World_Cup" />
    <link about="[ex:2010_FIFA_World_Cup]"
      rel="dcam:memberOf"
      href="http://example.org/resource/MyScheme" />
  </head>

</html>

2.5.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:dcam="http://purl.org/dc/dcam/"
      version="XHTML+RDFa 1.0">
      
  <head>
    <title>My World Cup 2010 Review</title>
  </head>

  <body>
    <p>About: 
      <a rel="dc:subject"
        href="http://example.org/resource/2010_FIFA_World_Cup">
        <span rel="dcam:memberOf"
          resource="http://example.org/resource/MyScheme" />
        The 2010 World Cup
      </a>
    </p>
  </body>
  
</html>

April 13, 2010

A small GRDDL (XML, really) gotcha

I've written previously here about DCMI's use of an HTML metadata profile for document metadata, and the use of a GRDDL profile transformation to extract RDF triples from an XHTML document. DCMI has made use of an HTML profile for many years, but providing a "GRDDL-enabled" version is a more recent development - and it is one which I admit I was quietly quite pleased to see put in place, as I felt it illustrated rather neatly how DCMI was trying to implement some of the "follow your nose" principles of Web Architecture.

A little while ago, I noticed that the Web-based tools which I usually use to test GRDDL processing (the W3C GRDDL service and the librdf parser demonstrator) were generating errors when I tried to process documents which reference the profile. I've posted a more detailed account of my investigations to the dc-architecture Jiscmail list, and I won't repeat them all here, but in short it comes down to the use of the entity references (&nbsp; and &copy;) in the profile document, which itself is subject to a GRDDL transformation to extract the pointer to the profile transformation.

The problem arises because XHTML defines those entity references in the XHTML DTD, i.e. externally to the document itself, and a non-validating XML processor is not required to read that DTD when parsing the document, with the consequence that it fails to resolve the references - and there's no guarantee that a GRDDL processor will employ a validating parser. There's a more extended discussion of these issues in a post by Lachlan Hunt from 2005 which concludes:

Character entity references can be used in HTML and in XML; but for XML, other than the 5 predefined entities, need to be defined in a DTD (such as with XHTML and MathML). The 5 predefined entities in XML are: &amp;, &lt;, &gt;, &quot; and &apos;. Of these, you should note that &apos; is not defined in HTML. The use of other entities in XML requires a validating parser, which makes them inherently unsafe for use on the web. It is recommended that you stick with the 5 predefined entity references and numeric character references, or use a Unicode encoding.

And the GRDDL specification itself cautions:

Document authors, particularly XHTML document authors, who wish their documents to be unambiguous when used with GRDDL should avoid dependencies on an external DTD subset; specifically:

  • Explicitly include the XHTML namespace declaration in an XHTML document, or an appropriate namespace in an XML document.
  • Avoid use of entity references, except those listed in section 4.6 of the XML specification.
  • And, more generally, follow the rules listed for the standalone document validity constraint.

A note will be added to the DC-HTML profile document to emphasise this point (and the offending references removed).
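
The failure is easy to reproduce with any parser that doesn't read the external DTD. As a minimal sketch, assuming Python's standard-library ElementTree parser as a stand-in for whatever parser a GRDDL processor might use (the parses helper and the tiny test document are hypothetical, not taken from the DC-HTML profile):

```python
import xml.etree.ElementTree as ET

# The DOCTYPE points at the external XHTML DTD, which this parser never fetches.
DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def parses(body_text):
    """Return True if a tiny XHTML document containing body_text parses without error."""
    doc = (
        DOCTYPE
        + '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
        + "<body><p>" + body_text + "</p></body></html>"
    )
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        # Raised for entity references the parser has no definition for.
        return False


print(parses("&copy;&nbsp;2010"))   # entity references defined only in the DTD
print(parses("&#169;&#160;2010"))   # numeric character references
print(parses("&amp; &lt; &gt;"))    # predefined XML entities
```

With this parser I'd expect only the first call to report a failure, which is the behaviour the advice above is guarding against: the numeric character references and the predefined entities don't depend on the DTD being read.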

I guess I was surprised that no-one else had reported the error, particularly as it potentially affects the processing of all instance documents. The fact that they hadn't does rather lend weight to the suspicion that I voiced here a few weeks ago that it may well be that few implementers are actually making use of the DC-HTML GRDDL profile transformation.

April 08, 2010

Linked Data & Web Search Engines

I seem to have fallen into the habit of half-writing posts and then putting them to one side, either because I don't feel entirely happy with them or because I get diverted into other more pressing things. This is one of several that I seem to have accumulated over the last few weeks, and which I've resolved to try to get out there....

A few weekends ago I spotted a brief exchange on Twitter between Andy and our colleague Mike Ellis on the impact of exposing Linked Data on Google search ranking. Their conclusion seemed to be that the impact was minimal. I think I'd question this assessment, and here I'll try to explain why - though in the absence of empirical evidence, I admit this is largely speculation on my part, a "hunch", if you like. I admit I almost hesitate to write this post at all, as I am far from an expert in "search-engine optimisation", and, tbh, I have something of an instinctive reaction against a notion that a high Google search ranking is the "be all and end all" :-) But I recognise it is something that many content providers care about.

In this post, I'm not considering the ways search engines might use the presence of structured data in the documents they index to enhance result sets (or make that data available to developers to provide such enhancements); rather, I'm thinking about the potential impact of the Linked Data approach on ranking.

It is widely recognised that one of the significant factors in Google's page ranking algorithm is the weighting it attaches to the number of links made to the page in question from other pages ("incoming links"). Beyond that, the recommendations of the Google Webmaster guidelines seem to be largely "common sense" principles for providing well-formed X/HTML, enabling access to your document set for Google's crawlers, and not resorting to techniques that attempt to "game" the algorithm.

Let's go back to Berners-Lee's principles for Linked Data:

  1. Use URIs as names for things

  2. Use HTTP URIs so that people can look up those names.

  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

  4. Include links to other URIs so that they can discover more things.

The How to Publish Linked Data on the Web tutorial and the W3C Note on Cool URIs for the Semantic Web elaborate on some of the mechanics of providing Linked Data. Both of these sources make the point that to "provide useful information" means to provide data both in RDF format(s) and in human-readable forms.

So following those guidelines typically means that "exposing Linked Data" results in exposing a whole lot of new Web documents "about" the things featured in the dataset, in both RDF/XML (or another RDF format) and in XHTML/HTML - and indeed the use of XHTML+RDFa could meet both requirements in a single format. So this immediately increases what Leigh Dodds of Talis rather neatly refers to as the "surface area" of my pages which are available for Google to crawl and index.

The second aspect which is significant is that, by definition, Linked Data is about making links: I make links between items described in my own dataset, but I also make ("outgoing") links between those items and items described in other linked datasets made available by other parties elsewhere. And (hopefully!), at least in time, other people exposing Linked Data make ("incoming") links to items in my datasets.

And in the X/HTML pages at least, those are the very same hyperlinks that Google crawls and counts when calculating its PageRank.

The key point, I think, is that my pages are available, not just to other "Linked Data applications", but also for other people to reference, bookmark and make links to just as they do any page on any Web site. This is one of the points I was trying to highlight in my last post when I mentioned the BBC's Linked Data work: the pages generated as part of those initiatives are fairly seamlessly integrated within the collection of documents that make up the BBC Web site. They do not appear as some sort of separate "data area", something just for "client apps that want data", somehow "different from" the news pages or the iPlayer pages; on the contrary, they are linked to by those other pages, and the X/HTML pages are given a "look and feel" that emphasises their membership of this larger aggregation. And human readers of the BBC Web site encounter those pages in the course of routinely navigating the site.

Of course the key to increasing the rank of my pages in Google is whether other people actually make those links to pages I expose, and it may well be that for much of the data surfaced so far, such links are relatively small in number. But the Linked Data approach, and its emphasis on the use of URIs and links, helps me do my bit to make sure my resources are things "of (or in) the Web".

So I'd argue that the Linked Data approach is potentially rather a good fit with what we know of the way Google indexes and ranks pages - precisely because both approaches seek to "work with the grain of the Web". I'd stick my neck out and say that having a page about my event (project, idea, whatever) provides a rather better basis for making that information findable in Google than exposing that description only as the content of a particular row in an Excel spreadsheet, where it is difficult to reference as an individual target resource and where it is (typically at least) not a source of links to other resources.

As I was writing this, I saw a new post appear from Michael Hausenblas, in which he attempts to categorise some common formats and services according to what he calls their "Link Factor" ("the degree of how 'much' they are in the Web"). And more recently, I noticed the appearance of a post titled 10 Reasons Why News Organizations Should Use 'Linked Data' which, in its first two points, highlights the importance of Linked Data's use of hyperlinks and URIs to SEO - and points to the fact that the BBC's Wildlife Finder pages do tend to appear prominently in Google result sets.

Before I get carried away, I should add a few qualifiers, and note some issues which I can imagine may have some negative impact. And I should emphasise this is just my thinking out loud here - I think more work is necessary to examine the actual impact, if any.

  • Redirects: Many of the links in Linked Data are made between "things", rather than between the pages describing the things. And following the "Cool URIs" guidelines, these URIs would either be URIs with fragment identifiers ("hash URIs") or URIs for which an HTTP server responds with a 303 response providing the URI of a document describing the thing. For the first case, I think Google recognises these as links to the document with the URI obtained by stripping the fragment id; for the 303 case, I'm unsure about the impact of the use of the redirect on the ranking for the document which is the final target. (A related issue would be that some sources might cite the URI of the thing and other sources might cite the URI of the document describing the thing).
  • Synonyms: As the Publishing Linked Data tutorial highlights, one of the characteristics of Linked Data is that it often makes use of URI aliases, multiple URIs each referring to the same resource. If some users bookmark/cite URI A and some users bookmark/cite URI B, then that would result in a lower link-based ranking for each of the two pages describing the thing than if all users bookmarked/cited a single URI. To some extent, this is just part of the nature of the Web, and it applies similarly outside the Linked Data context, but the tendency to generate an increasing number of aliases is something which generates continued discussion in the LD community. See, for example, the recent thread on "annotation" on the public-lod mailing list generated in response to Leigh Dodds' and Ian Davis' recent Linked Data Patterns document (which I should add, from my very hasty skim reading so far, seems to provide an excellent combination of thoughtful discussion and clear practical suggestions).
  • "Caching"/"Resurfacing": As we are seeing Linked Data being deployed, we are seeing data aggregated by various agencies and resurfaced on the Web using new URIs. Similarly to the previous point, this may lead to a case where two users cite different URIs, with a corresponding reduction in the number of incoming links to any single document. I also note that Google's guidelines include the admonition: "Don't create multiple pages, subdomains, or domains with substantially duplicate content", which does make me wonder whether such resurfaced content may have a negative impact on ranking.
  • "Good XHTML": While links are important, they aren't the whole story, and attention still needs to be paid to ensuring that HTML pages generated by a Linked Data application follow the sort of general good practice for "good Web pages" described in the Google guidelines (provide well-structured XHTML, use title elements, use alt attributes, don't fill with irrelevant keywords, etc.).
  • Sitemaps: This is probably just a special case of the previous point, but Google emphasises the importance of using sitemaps to provide entry points for its crawlers. Although I'm aware of the Semantic Sitemap extension, I'm not sure whether the use of sitemaps is widespread in Linked Data deployments - though it is the sort of thing I'd expect to see happen as Linked Data moves further out of the preserve of the "research project" and towards more widespread deployment.
  • "Granularity": (I'm unsure whether this is a factor or not: I can imagine it might be, but it's probably not simple to assess exactly what the impact is.) How a provider decides to "chunk up" their descriptive data into documents might have an impact on the "density" of incoming links. If they expose a large number of documents each describing a single specific resource, does that result in each document receiving fewer incoming links than if they expose a smaller number of documents each describing several resources?
  • Integration: Although above I highlighted the BBC as an example of Linked Data being well-integrated into a "traditional" Web site, and so made highly visible to users of that Web site, I suspect this may - at the moment at least - be the exception rather than the rule. However, as with the previous point, this is something I'd expect to become more common.
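
To illustrate the "Redirects" and "Synonyms" points, here is a small Python sketch of the fragment-stripping behaviour I'm guessing at for hash URIs, and of how citations spread across aliases end up as separate, smaller counts. This is speculation expressed as code, not a description of what Google actually does: the URIs are invented, the function names are hypothetical, and the 303 case isn't modelled at all, since that would need a real HTTP request.

```python
from collections import Counter
from urllib.parse import urldefrag


def document_uri(cited_uri):
    """Strip any fragment identifier, giving the URI of the document that would be fetched."""
    return urldefrag(cited_uri).url


def incoming_link_counts(cited_uris):
    """Count citations per document, treating hash URIs as links to their document."""
    return Counter(document_uri(uri) for uri in cited_uris)


# Made-up citations: a hash URI for the thing, the URI of the document
# describing it, and an alias published on another (hypothetical) site.
citations = [
    "http://example.org/doc/world-cup#thing",
    "http://example.org/doc/world-cup",
    "http://example.net/copy/world-cup",
]

counts = incoming_link_counts(citations)
for uri, count in counts.items():
    print(uri, count)
```

Under that assumption the first two citations are credited to the same document, while the alias is counted on its own, which is the dilution effect described above.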

Nevertheless, I still stand by my "hunch" that the LD approach is broadly "good" for ranking. I'm not claiming Linked Data is a panacea for search-engine optimisation, and I admit that some of what I'm suggesting here may be "more potential than actual". But I do believe the approach can make a positive contribution - and that is because both the Google ranking algorithm and Linked Data exploit the URI and the hyperlink: they "work with the grain of the Web".

April 07, 2010

Linked Data business models

The by-line is mine but the content comes from Pete (in a message to the Dublin Core Advisory Board mailing list)... a list of blog posts and other resources that articulate some of the business cases/models around Linked Data:
