September 19, 2011

Things & their conceptualisations: SKOS, foaf:focus & modelling choices

I thought I'd try to write up some thoughts around an issue which I've come across in a few different contexts recently, and which as a shorthand I sometimes think of as "the foaf:focus question". It was prompted mainly by:

  • my work on modelling the Archives Hub data during the LOCAH project, and looking at datasets to which we wanted to make links, such as VIAF;
  • looking at the data model for the recent Linked Open BNB dataset from the British Library, and how some Dublin Core properties were being used, and some email discussions I had around that;
  • a recent message by Dan Brickley to the foaf-dev mailing list, explaining how the design of some FOAF properties was conditioned by the context at the time, and reflecting on how that context had changed, and what the implications of those changes might be.

Rather by chance on Friday evening, just as I was about to try to tie up what had become a rather long and rambling post, I noticed a conversation on Twitter, initiated by John Goodwin (@gothwin), which I think addressed the broader issue which I'd been circling around without quite addressing it: the use of different "modelling styles" and the issues which arise as a result when we try to link or merge data.

After much chopping and changing, the post adopts a very roughly "chronological" approach. The initial parts cover areas and activities that I wasn't directly involved in at the time, so I am providing my own retrospective interpretation based on my reading of the sources rather than a first-hand account, and I apologise in advance for any omissions or misrepresentations.

FOAF and "interests"

The FOAF Project was an initiative launched by a community of Semantic Web enthusiasts back in 2000, which explored the use of the - then newly emerging - Resource Description Framework specification to express information about individuals, their contact details and their interests and projects - the sort of information that was typically presented on a "personal home page" - and also some of the practical considerations in providing and consuming such information on the Web as RDF. The principal "deliverable" of the project is the Friend of a Friend (FOAF) RDF vocabulary, which continues to evolve and is now very widely used.

As Dan Brickley notes in his recent post, when looking at some of the FOAF properties from the perspective of 2011, their design may seem slightly "unwieldy" to the newcomer, and this is in part because their design was shaped by the context of how the Web was being used at the time of their creation, perhaps nine or ten years ago. At that point, as Dan notes, while there were URIs available for Web documents, and a growing recognition of the importance of maintaining the stability of those URIs, the use of http URIs to identify things other than documents was much less widely adopted, and we often lacked stable URIs for those things of other types that we wanted to "talk about".

One of the use cases covered by FOAF is to express the "interests" of an individual - where that "interest" might be an idea, a place, a person, an event or series of events, anything at all. To work around the issue of the availability of a URI of that thing, FOAF adopted a convention of "indirection" in some of its early properties. So, for example, the foaf:interest property expresses a relation, not between agent and idea/place/person (etc), but between agent and document: it says "this agent is interested in the topic of this page - the thing the page is 'about'" - where that might be anything at all. Using this convention, the topic itself is not explicitly referenced, so the question of its URI does not arise.

So, for example, to express an interest in the Napoleonic Wars, one might make use of the URI of the Wikipedia page 'about' that thing, and say:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix person: <http://example.org/id/person/> .

person:fred 
        foaf:interest
          <http://en.wikipedia.org/wiki/Napoleonic_Wars> .

A second property, foaf:topic_interest, does allow for the expression of that "direct" relationship, linking the agent to the thing of interest - which again might be anything at all. (I'm not sure whether these two properties were created at the same time or whether one preceded the other.) Even in the absence of URIs for concepts and people and places, RDF allows for the use of "blank nodes" to refer to such things.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix person: <http://example.org/id/person/> .

person:fred 
        foaf:topic_interest [ rdfs:label "Napoleonic Wars"@en ] .

However, a blank node is limited in its scope as an identifier to the graph within which it is found: if Fred provides a blank node for the notion of the Napoleonic Wars in his graph and Freda provides a blank node for an interest in her graph, I can't tell (from that information alone) whether those two nodes are references to the same thing or to two different things (e.g. the historical event and a book about the event). Again, historically, one solution to this problem was to introduce the URI of a document, together with some inferencing based on OWL:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix person: <http://example.org/id/person/> .

person:fred 
        foaf:topic_interest 
          [ rdfs:label "Napoleonic Wars"@en ;
            foaf:isPrimaryTopicOf 
              <http://en.wikipedia.org/wiki/Napoleonic_Wars> ] .

According to the FOAF documentation for foaf:isPrimaryTopicOf:

The isPrimaryTopicOf property is inverse functional: for any document that is the value of this property, there is at most one thing in the world that is the primary topic of that document. This is useful, as it allows for data merging

i.e. if Freda's graph says:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix person: <http://example.org/id/person/> .

person:freda 
        foaf:topic_interest 
          [ rdfs:label "Napoleonic Wars"@en ;
            foaf:isPrimaryTopicOf 
              <http://en.wikipedia.org/wiki/Napoleonic_Wars> ] .

then my application can conclude that they are indeed both interested in the same thing - though that does depend on that application having some "built-in knowledge" of the OWL inferencing rules (or access to another service which does).
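The merging step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a real OWL reasoner: triples are plain tuples, the blank node labels (_:fred_b1, _:freda_b1) and the function name smush are my own assumptions, and the function makes a single pass, so it does not follow chains of merges.

```python
# Triples are (subject, predicate, object) tuples of strings.
# Blank nodes are written as "_:label"; the labels are hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
IS_PRIMARY_TOPIC_OF = FOAF + "isPrimaryTopicOf"
TOPIC_INTEREST = FOAF + "topic_interest"
WIKI = "http://en.wikipedia.org/wiki/Napoleonic_Wars"
FRED = "http://example.org/id/person/fred"
FREDA = "http://example.org/id/person/freda"

fred_graph = {
    (FRED, TOPIC_INTEREST, "_:fred_b1"),
    ("_:fred_b1", IS_PRIMARY_TOPIC_OF, WIKI),
}
freda_graph = {
    (FREDA, TOPIC_INTEREST, "_:freda_b1"),
    ("_:freda_b1", IS_PRIMARY_TOPIC_OF, WIKI),
}


def smush(triples, inverse_functional):
    """Merge nodes that share a value for an inverse functional property.

    Single pass only: the first subject seen for each (property, value)
    pair becomes the canonical node for every other subject with that pair.
    """
    canonical = {}  # (property, value) -> first subject seen
    rename = {}     # duplicate node -> canonical node
    for s, p, o in sorted(triples):  # sorted for a deterministic choice
        if p in inverse_functional:
            first = canonical.setdefault((p, o), s)
            if first != s:
                rename[s] = first
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in triples}


merged = smush(fred_graph | freda_graph, {IS_PRIMARY_TOPIC_OF})
interests = {s: o for s, p, o in merged if p == TOPIC_INTEREST}
```

After the merge, interests maps both Fred and Freda to the same node, which is the conclusion the application needs to draw. Without the "built-in knowledge" that foaf:isPrimaryTopicOf is inverse functional, the two blank nodes would remain distinct.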

http URIs for Things

The "httpRange-14 resolution" by the W3C Technical Architecture Group, the publication of the W3C Note on Cool URIs for the Semantic Web, the adoption of the principles of Linked Data and the emergence of a large number of datasets based on those principles have, of course, changed the landscape considerably, and the use of http URIs to identify things other than documents has become commonplace - even if there remain concerns about the practical challenges of implementing some of the recommended techniques.

So, now DBpedia assigns a distinct http URI for the thing the Wikipedia page http://en.wikipedia.org/wiki/Napoleonic_Wars "is about", and provides a description of that thing in which it says (amongst other things):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://dbpedia.org/resource/Napoleonic_Wars> 
        rdfs:label "Napoleonic Wars"@en ;
        a <http://dbpedia.org/ontology/Event> ,
          <http://dbpedia.org/ontology/MilitaryConflict> ,
          <http://umbel.org/umbel/rc/Event> ,
          <http://umbel.org/umbel/rc/ConflictEvent> ;
        foaf:page 
          <http://en.wikipedia.org/wiki/Napoleonic_Wars> .

i.e. that thing, the topic of the page, is an event, a military conflict etc.

We could substitute this new DBpedia URI for the blank nodes in our foaf:topic_interest data above:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix person: <http://example.org/id/person/> .

person:fred 
        foaf:topic_interest 
          <http://dbpedia.org/resource/Napoleonic_Wars> .

<http://dbpedia.org/resource/Napoleonic_Wars>
        rdfs:label "Napoleonic Wars"@en ;
        foaf:isPrimaryTopicOf 
          <http://en.wikipedia.org/wiki/Napoleonic_Wars> .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix person: <http://example.org/id/person/> .

person:freda 
        foaf:topic_interest
          <http://dbpedia.org/resource/Napoleonic_Wars> .

<http://dbpedia.org/resource/Napoleonic_Wars>
        rdfs:label "Napoleonic Wars"@en ;
        foaf:isPrimaryTopicOf 
          <http://en.wikipedia.org/wiki/Napoleonic_Wars> .

When those two graphs are merged, the use of the common URI now makes it trivial to determine that Fred and Freda share the same interest.

Concept Schemes, SKOS and Document Metadata

The other factor Dan mentions in his message was the emergence of the Simple Knowledge Organisation System (SKOS) RDF vocabulary, which after a long evolution became a W3C Recommendation in 2009.

SKOS is designed to provide an RDF representation of the various flavours of "knowledge organisation systems" and "controlled vocabularies" which information managers have traditionally used to organise information about various resources (books in libraries, objects in museums etc etc etc).

The core class in SKOS is that of the concept (skos:Concept). Each concept can be labelled with one or more names; documented with notes of various types; grouped into collections; related to other concepts through relationships such as "broader"/"narrower"/"related"; or mapped to other concepts in other collections.

The Library of Congress has published several library thesauri/classification schemes/controlled vocabularies as SKOS RDF data, including the Library of Congress Subject Headings, which includes a concept named "Napoleonic Wars, 1800-1815" with the URI http://id.loc.gov/authorities/subjects/sh85089767 (this is a subset of the actual data provided):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix lcsh: <http://id.loc.gov/authorities/subjects/> .

lcsh:sh85089767
        a skos:Concept ;
        rdfs:label "Napoleonic Wars, 1800-1815"@en ;
        skos:prefLabel "Napoleonic Wars, 1800-1815"@en ;
        skos:altLabel "Napoleonic Wars, 1800-1814"@en ;
        skos:broader lcsh:sh85045703 ;
        skos:narrower lcsh:sh85144863 ;
        skos:inScheme 
          <http://id.loc.gov/authorities/subjects> .

A metadata creator coming from a bibliographic background and providing Dublin Core-based metadata for the Wikipedia page http://en.wikipedia.org/wiki/Napoleonic_Wars might well use this concept URI to provide the "subject" of that page:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix lcsh: <http://id.loc.gov/authorities/subjects/> .

<http://en.wikipedia.org/wiki/Napoleonic_Wars>
        a foaf:Document ;
        rdfs:label "Napoleonic Wars"@en ;
        dcterms:subject lcsh:sh85089767 .

And indeed the concept URI could (I think) also be used with the foaf:topic or foaf:primaryTopic properties:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix lcsh: <http://id.loc.gov/authorities/subjects/> .

<http://en.wikipedia.org/wiki/Napoleonic_Wars>
        a foaf:Document ;
        rdfs:label "Napoleonic Wars"@en ;
        foaf:topic lcsh:sh85089767 ;
        foaf:primaryTopic lcsh:sh85089767 .

Note that all three of these properties (dcterms:subject, foaf:topic, and foaf:primaryTopic) are defined in such a way that they are not limited to being used with concepts. We've seen this above where the DBpedia data includes:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://dbpedia.org/resource/Napoleonic_Wars> 
        foaf:page 
          <http://en.wikipedia.org/wiki/Napoleonic_Wars> .

which (since foaf:page and foaf:topic are inverse properties) implies:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://en.wikipedia.org/wiki/Napoleonic_Wars> 
        foaf:topic 
          <http://dbpedia.org/resource/Napoleonic_Wars> .
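The inference from foaf:page to foaf:topic is simple enough to sketch directly. The following is a minimal illustration, assuming triples held as plain tuples; the function name add_inverses and the INVERSES table are my own, not part of any library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
PAGE = FOAF + "page"
TOPIC = FOAF + "topic"
WIKI = "http://en.wikipedia.org/wiki/Napoleonic_Wars"
DBPEDIA = "http://dbpedia.org/resource/Napoleonic_Wars"

# Each property maps to its inverse, in both directions.
INVERSES = {PAGE: TOPIC, TOPIC: PAGE}


def add_inverses(triples, inverses):
    """Return the triples plus one swapped triple per inverse property use."""
    inferred = {(o, inverses[p], s) for s, p, o in triples if p in inverses}
    return set(triples) | inferred


dbpedia_data = {(DBPEDIA, PAGE, WIKI)}
closed = add_inverses(dbpedia_data, INVERSES)
```

The resulting set contains both the original foaf:page triple and the implied foaf:topic triple running from the Wikipedia page to the DBpedia resource.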

It is also true for the dcterms:subject property. Although traditionally the Dublin Core community has highlighted the use of formal classification schemes like LCSH, and it may well be true that the dcterms:subject property is often used to link things to concepts, it is not limited to taking concepts as values, and one could also link the document to the event using dcterms:subject:

@prefix dcterms: <http://purl.org/dc/terms/> .

<http://en.wikipedia.org/wiki/Napoleonic_Wars> 
        dcterms:subject 
          <http://dbpedia.org/resource/Napoleonic_Wars> .

I should acknowledge here that some in the Dublin Core community might disagree with my last example above, and argue that the values of dcterms:subject should be concepts. I think my position is backed up by the current DCMI documentation, and particularly by the fact that when they assigned ranges to the DCMI Terms properties in 2008, the DCMI Usage Board did not specify a range for the dcterms:subject property i.e. the intention is that dcterms:subject may link to a resource of any type.

I also note in passing that when DCMI created the DCMI Abstract Model in its attempt to reflect the "classical view" of Dublin Core (perhaps best expressed in Tom Baker's "A Grammar of Dublin Core") in an RDF-based model, the notion of a "Vocabulary Encoding Scheme" was defined as a set of things of any type, not specifically as a set of concepts.

Things and their Conceptualisations: foaf:focus

The SKOS approach and the SKOS Concept class introduce a new sort of "indirection" from our "things-in-the-world". As Dan puts it in a message to the W3C public-esw-thes list:

a SKOS "butterflies" concept is a social and technological artifact designed to help interconnect descriptions of butterflies, documents (and data) about butterflies, and people with interest or expertise relating to butterflies. I'm quite consciously avoiding saying what a "butterflies" concept in SKOS "refers to", because theories of reference are hard to choose between. Instead, I prefer to talk about why we bother building SKOS and what we hope can be achieved by it.

So, although both the DBpedia URI (http://dbpedia.org/resource/Napoleonic_Wars) and the Library of Congress URI (http://id.loc.gov/authorities/subjects/sh85089767) may be used in the triple patterns shown above, those two URIs identify two different resources - and both of them are distinct from the Wikipedia page which we cited in the examples back at the very start of this post.

i.e. we now have three separate URIs identifying three separate resources:

  1. a Wikipedia page, a document, created and modified by Wikipedia contributors between 2002 and the present, identified by the URI http://en.wikipedia.org/wiki/Napoleonic_Wars
  2. the Napoleonic Wars as an event taking place between 1800 and 1815, something with a duration in time, which occurred in physical locations, and in which human beings participated, identified by the DBpedia URI http://dbpedia.org/resource/Napoleonic_Wars
  3. a "conceptualisation of" the Napoleonic Wars, "a social and technological artifact designed to help interconnect", an "abstraction" created by the editors of LCSH for the purposes of classifying works; it has "semantic" relationships to other concepts, and is identified by the Library of Congress URI http://id.loc.gov/authorities/subjects/sh85089767

As we've seen, properties like dcterms:subject, foaf:topic/foaf:page, foaf:primaryTopic/foaf:isPrimaryTopicOf provide the vocabulary to express the relationships between the first and second of these resources, and between the first and third. But what about the relationship between the second and third, between the "thing in the world" and its conceptualisation in a classification scheme? Or to make the issue concrete, what happens if, in their "interest" graphs, Fred cites the DBpedia event URI and Freda cites the LCSH concept URI? How do we establish that their interests are indeed related? Can a publisher of an SKOS concept scheme indicate a relationship between a concept and a "conceptualised thing" (person, place, event etc)?

Dan provides a rather neat diagram illustrating the issue, using the example of Ronald Reagan. The arcs Dan labels "it" represent this "missing" (at the time the diagram was drawn) relationship type/property.

The resolution was to create a new property in the FOAF vocabulary, called foaf:focus. (See this page on the FOAF Project Wiki for some of the discussion of its name.)

The FOAF vocabulary specification says of the property:

The focus property relates a conceptualisation of something to the thing itself. Specifically, it is designed for use with W3C's SKOS vocabulary, to help indicate specific individual things (typically people, places, artifacts) that are mentioned in different SKOS schemes (eg. thesauri).

W3C SKOS is based around collections of linked 'concepts', which indicate topics, subject areas and categories. In SKOS, properties of a skos:Concept are properties of the conceptualization (see 2005 discussion for details); for example administrative and record-keeping metadata. Two schemes might have an entry for the same individual; the foaf:focus property can be used to indicate the thing in they world that they both focus on. Many SKOS concepts don't work this way; broad topical areas and subject categories don't typically correspond to some particular entity. However, in cases when they do, it is useful to link both subject-oriented and thing-oriented information via foaf:focus.

It's worth emphasising the point made in the penultimate sentence: not all concepts "have a focus"; some concepts are "just concepts" (poetry, slavery, conscientious objection, anarchism etc etc etc).

Dan summarises how he sees the new property being used in a message to the W3C public-esw-thes list:

The addition of foaf:focus is intended as a modest and pragmatic bridge between SKOS-based descriptions of topics, and other more entity-centric RDF descriptions. When a SKOS Concept stands for a person or agent, FOAF and its extensions are directly applicable; however we expect foaf:focus to also be used with places, events and other identifiable entities that are covered both by SKOS vocabularies as well as by factual datasets like wikipedia/dbpedia and Freebase.

A single "thing in the world" may be "the focus of" multiple concepts: e.g. several different library classification schemes may include concepts for the Napoleonic Wars or Ronald Reagan or Paris. Even within a single scheme, it may be that there are multiple concepts each reflecting different facets or aspects of a single entity.
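To return to the Fred and Freda case: a minimal sketch of how foaf:focus bridges the two "interest" graphs might look like the following. Note the assumption: the foaf:focus triple linking the LCSH concept to the DBpedia event is hypothetical, asserted here purely for illustration, and I am not claiming that either dataset publishes it. The function name things_of_interest is also my own.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
FOCUS = FOAF + "focus"
TOPIC_INTEREST = FOAF + "topic_interest"

EVENT = "http://dbpedia.org/resource/Napoleonic_Wars"
CONCEPT = "http://id.loc.gov/authorities/subjects/sh85089767"
FRED = "http://example.org/id/person/fred"
FREDA = "http://example.org/id/person/freda"

triples = {
    (FRED, TOPIC_INTEREST, EVENT),     # Fred cites the event
    (FREDA, TOPIC_INTEREST, CONCEPT),  # Freda cites the concept
    # Hypothetical bridging triple, for illustration only.
    (CONCEPT, FOCUS, EVENT),
}


def things_of_interest(person, triples):
    """Return a person's interests, replacing any concept by its foaf:focus.

    Interests with no foaf:focus (things, or concepts that are
    "just concepts") are returned unchanged.
    """
    focus = {s: o for s, p, o in triples if p == FOCUS}
    return {focus.get(o, o) for s, p, o in triples
            if s == person and p == TOPIC_INTEREST}


shared = things_of_interest(FRED, triples) & things_of_interest(FREDA, triples)
```

With the bridging triple present, shared contains the DBpedia event URI; remove that triple and the intersection is empty, which is exactly the gap the new property was created to fill.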

VIAF

This aspect of the relationship between conceptualisation and "thing in the world" is illustrated in VIAF, the Virtual International Authority File, a service provided by OCLC. VIAF aggregates library "authority records" from multiple library "name authority files" maintained mainly by national libraries. Each record provides a "preferred form" of the name of a person or corporate entity, and multiple "alternate forms" - though that preferred form may vary from one file to the next. VIAF analyses and collates the aggregated data to establish which records refer to the same person or corporate entity, and presents the results as Linked Data.

Jeff Young of OCLC summarises the VIAF model in a post on the Outgoing blog. The post actually describes the transition between an earlier, slightly more complex model and the current model. For the purposes of this discussion, the thing to look at is the "Example (After)" graphic at the top right of the post (direct link).

Consider Dan's example of Ronald Reagan, identified by the VIAF URI http://viaf.org/viaf/76321889. The RDF description provided shows that there are eleven concepts linked to the person resource by a foaf:focus link. Each of those concepts (I think!) corresponds to a record in an authority file harvested by VIAF. Each concept has a preferred label (the preferred form of the name in that authority file) and may have a number of alternate labels. An abridged version of the VIAF description is below:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> . 

<http://viaf.org/viaf/76321889> a foaf:Person ;
        foaf:name "Reagan, Ronald" ,
                  "Reagan, Ronald W." , 
                  "Reagan, Ronald, 1911-2004" , 
                  "Reagan, Ronald W. 1911-2004" , 
                  "Reagan, Ronald Wilson 1911-2004" , 
                  "Reagan, Elvis 1911-2004" ;
# plus various other names!
        owl:sameAs 
          <http://d-nb.info/gnd/118598724> ,
          <http://dbpedia.org/resource/Ronald_Reagan> , 
          <http://libris.kb.se/resource/auth/237204> , 
          <http://www.idref.fr/027091775/id> .

<http://viaf.org/viaf/sourceID/BNE%7CXX1025345#skos:Concept>
        a skos:Concept ;
        skos:prefLabel "Reagan, Ronald, 1911-2004" ;
        skos:altLabel "Reagan, Elvis 1911-2004" , 
                      "Reagan, Ronald W. 1911-2004", 
                      "Reagan, Ronald Wilson 1911-2004" ;
        skos:inScheme 
          <http://viaf.org/authorityScheme/BNE> ;
        foaf:focus 
          <http://viaf.org/viaf/76321889> .

<http://viaf.org/viaf/sourceID/BNF%7C11921304#skos:Concept>
        a skos:Concept ;
        skos:prefLabel "Reagan, Ronald, 1911-2004" ;
        skos:altLabel "Reagan, Ronald Wilson 1911-2004" ;
        skos:inScheme 
          <http://viaf.org/authorityScheme/BNF> ;
        foaf:focus 
          <http://viaf.org/viaf/76321889> .

# plus nine other concepts

(There really is an alternate label of "Reagan, Elvis 1911-2004" in the actual data!)

Fig1

Note that the owl:sameAs links here are between the VIAF person resource and person resources in external datasets.

LOCAH and index terms

My own first engagement with foaf:focus came during the LOCAH project. In deciding how to represent the content of the Archives Hub EAD documents as RDF, we had to decide how to model the use of "index terms" provided using the EAD <controlaccess> element. Within the Hub EAD documents, those index terms are names of one of the following categories of resource:

  • Concepts
  • Persons
  • Families
  • Organisations
  • Places
  • Genres or Forms
  • Functions

The names are sometimes (but not always) drawn from some sort of "controlled list", which is also named in the data. In other cases, they are constructed using some specified set of rules, again named in the data.

For some of these categories (Concepts, Genres/Forms, Functions), the "thing" named is simply a concept, an abstraction; for others (Persons, Families, Organisations and Places), there is a second "non-abstract" entity "out there in the world". And for this second case, we chose to represent the two distinct things, each with their own distinct URI, linked by a triple using the foaf:focus property.

The LOCAH data model is illustrated in the diagram in this post. The Concept entity type is in the lower centre; directly below are four boxes for the related "conceptualised" types (Person, Family, Organisation and Place), each linked from the Concept by the foaf:focus property.

As in the VIAF case, for a single person/family/organisation/place, there may be multiple distinct "conceptualisations", reflecting the fact that different data providers have referred to the same "thing in the world" by citing entries from different "authority files". The nature of the process by which the LOCAH RDF data is generated - the EAD documents are processed on a "document by document" basis - means that in this case, multiple URIs for the person are generated, and the "reconciliation" of these URIs as co-references to a single entity is performed as a subsequent step.

The British Library Linked Data

The British Library recently announced the release of Linked Open BNB, a new Linked Data dataset covering a subset of the British National Bibliography. The approach taken is described in a post by Richard Wallis of Talis, who worked with the BL as consultants in preparing the dataset.

The data model for the BNB data shows quite extensive use of the Concept-foaf:focus-Thing pattern.

For the "subjects" of Bibliographic Resources, a dcterms:subject link is made to a skos:Concept, reflecting an authority file or classification scheme entry, and which is in turn linked using foaf:focus to a Person, Family, Organisation or Place. In other cases, the Bibliographic Resource is linked directly to the "thing-in-the-world" and a corresponding Concept is also provided, linking to the "thing-in-the-world" using foaf:focus. This is the case for languages, for persons as creators or contributors to the bibliographic resource, and for the Dublin Core "spatial coverage" property.

So for the example http://bnb.data.bl.uk/id/resource/009436036 (again, this is a very stripped-down version of the actual data):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

<http://bnb.data.bl.uk/id/resource/009436036>
        a dcterms:BibliographicResource ;
        dcterms:creator 
          <http://bnb.data.bl.uk/id/person/KingGRD%28GeoffreyRD%29> ; 
        dcterms:subject
          <http://bnb.data.bl.uk/id/concept/place/lcsh/Aden%28Yemen%29> ;
        dcterms:spatial
          <http://bnb.data.bl.uk/id/place/Aden%28Yemen%29> ;
        dcterms:language 
          <http://lexvo.org/id/iso639-3/eng> .

<http://bnb.data.bl.uk/id/concept/place/lcsh/Aden%28Yemen%29> 
        a skos:Concept ;
        foaf:focus 
          <http://bnb.data.bl.uk/id/place/Aden%28Yemen%29> .

<http://bnb.data.bl.uk/id/place/Aden%28Yemen%29>        
        a dcterms:Location, wgs84_pos:SpatialThing .

In this case, the subject-concept and the spatial coverage-place happen to be linked to each other, but the point I wanted to illustrate was that the object of the dcterms:subject triple is the URI of a concept, and is the subject of an "outgoing" foaf:focus link to a "thing", but the object of the dcterms:spatial triple is the URI of a location/place, and is the object of an "incoming" foaf:focus link from a concept.

The two cases are perhaps best illustrated using the graph representation:

Fig2

Authority Files, Concepts, foaf:focus and Dublin Core

As discussed above, the dcterms:subject property is defined in such a way that it can be used to link to a thing of any type, although there may be a preference amongst some implementers to use dcterms:subject to link only to concepts.

For the other four properties I highlighted in the BL model, DCMI specifies an rdfs:range for the properties:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

dcterms:creator rdfs:range dcterms:Agent .
dcterms:contributor rdfs:range dcterms:Agent .
dcterms:spatial rdfs:range dcterms:Location .
dcterms:language rdfs:range dcterms:LinguisticSystem .

Those three classes (dcterms:Agent, dcterms:LinguisticSystem, dcterms:Location) are described using RDFS, and although there is no formal statement that they are disjoint from the class skos:Concept, I think their human-readable descriptions, taken together with those of the properties, carry a fairly strong suggestion that instances of these classes are the "things in the world" rather than their conceptualisations - certainly for the first three cases at least. A dcterms:Agent is "A resource that acts or has the power to act", which a concept cannot; a dcterms:Location is "A spatial region or named place", which again seems distinct from a concept.

The case of dcterms:LinguisticSystem, "A system of signs, symbols, sounds, gestures, or rules used in communication", seems a bit less clear, as this is a "conceptual thing" but I think one can argue that the actual linguistic system practiced by a community of language speakers is a distinct concept from the "conceptualisation" created within a classification scheme.

And this is, I think, reflected in the patterns used in the BL data model.

As I noted earlier, the Library of Congress has published SKOS representations of a number of controlled vocabularies, and these include vocabularies for languages, countries and geographic areas.

In each case, the entries/members of the vocabularies are modelled as instances of skos:Concept. Following the argument I just constructed above, then, to use these vocabularies with the dcterms:spatial and dcterms:language properties, strictly speaking, one should adopt the patterns used in the BL model, where the concept URI is not the direct object, but is linked to a thing (location, linguistic system) URI by a foaf:focus link.
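The consequence of ignoring this can be made concrete with a small sketch of RDFS range inference: if a property has an rdfs:range, every object of that property is inferred to be an instance of the range class. The bibliographic resource URI below is hypothetical, and apply_ranges and types_of are my own illustrative names; the range declarations are those listed earlier.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS = "http://purl.org/dc/terms/"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
LOC_SPANISH = "http://id.loc.gov/vocabulary/iso639-2/spa"
BOOK = "http://example.org/id/resource/1"  # hypothetical resource

# rdfs:range declarations for the four DCMI properties discussed above.
RANGES = {
    DCTERMS + "creator": DCTERMS + "Agent",
    DCTERMS + "contributor": DCTERMS + "Agent",
    DCTERMS + "spatial": DCTERMS + "Location",
    DCTERMS + "language": DCTERMS + "LinguisticSystem",
}


def apply_ranges(triples, ranges):
    """Add an rdf:type triple for each object of a property with a range."""
    inferred = {(o, RDF_TYPE, ranges[p]) for s, p, o in triples if p in ranges}
    return set(triples) | inferred


def types_of(node, triples):
    """Return every class the node is stated or inferred to belong to."""
    return {o for s, p, o in triples if s == node and p == RDF_TYPE}


data = {
    (LOC_SPANISH, RDF_TYPE, SKOS_CONCEPT),
    # The concept URI used directly as the object of dcterms:language.
    (BOOK, DCTERMS + "language", LOC_SPANISH),
}
spanish_types = types_of(LOC_SPANISH, apply_ranges(data, RANGES))
```

Here spanish_types contains both skos:Concept and dcterms:LinguisticSystem. Since the two classes are not formally declared disjoint this is not a logical contradiction, but it does conflate the conceptualisation with the thing, which is what the Concept-foaf:focus-Thing pattern avoids.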

Finally, I'd draw attention to lexvo.org, which also provides Linked Data representations for ISO639-3 languages and ISO 3166-1 / UN M.49 geographical regions. In contrast to the Library of Congress SKOS representations, lexvo.org models "the things themselves", the languages and geographical regions, e.g. (again, a subset of the actual data)

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lvont: <http://lexvo.org/ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://lexvo.org/id/iso639-3/spa>
        a lvont:Language ;
        rdfs:label "Spanish"@en ;
        lvont:usedIn 
          <http://lexvo.org/id/iso3166/ES> ;
        owl:sameAs 
          <http://dbpedia.org/resource/Spanish_language> .
        
<http://lexvo.org/id/iso3166/ES>
        a lvont:GeographicRegion ;
        rdfs:label "Spain"@en ;
        lvont:memberOf 
          <http://lexvo.org/id/un_m49/039> ;
        owl:sameAs 
          <http://sws.geonames.org/2510769> .
        
<http://lexvo.org/id/un_m49/039>
        a lvont:GeographicRegion ;
        rdfs:label "Southern Europe"@en ;
        lvont:hasMember 
          <http://lexvo.org/id/iso3166/ES> ;
        lvont:memberOf 
          <http://lexvo.org/id/un_m49/150> .
 
<http://lexvo.org/id/un_m49/150>
        a lvont:GeographicRegion ;
        rdfs:label "Europe"@en ;
        lvont:hasMember 
          <http://lexvo.org/id/un_m49/039> ;
        lvont:memberOf 
          <http://lexvo.org/id/un_m49/001> .

In the BL data one finds lexvo.org language URIs used as objects of dcterms:language (which we do in the LOCAH data too).

The lexvo.org language URIs are the subjects of properties such as lvont:usedIn linking language to place where it is used or spoken and of owl:sameAs triples linking to language URIs in other datasets. And the geographic region URIs are the subjects of properties such as lvont:memberOf linking one region to another region of which it is part and of owl:sameAs triples linking to place URIs in other datasets.

Compare this with the relationships between the SKOS-based "conceptualisations of languages" and "conceptualisations of geographic areas" in the Library of Congress dataset (again, a subset of the actual data):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://id.loc.gov/vocabulary/iso639-2/spa>
        a skos:Concept ;
        skos:prefLabel "Spanish | Castilian"@en ;
        skos:altLabel "Spanish"@en ,
                      "Castilian"@en ;
        skos:note "Bibliographic Code"@en ;
        skos:exactMatch 
          <http://id.loc.gov/vocabulary/languages/spa> ,
          <http://id.loc.gov/vocabulary/iso639-1/es> ;
        skos:inScheme 
          <http://id.loc.gov/vocabulary/iso639-2> .
                        
<http://id.loc.gov/vocabulary/countries/sp>
        a skos:Concept ;
        skos:prefLabel "Spain"@en ;
        skos:altLabel "Balearic Islands"@en ,
                      "Canary Islands"@en ;
        skos:exactMatch 
          <http://id.loc.gov/vocabulary/geographicAreas/e-sp> ;
        skos:broadMatch 
          <http://id.loc.gov/vocabulary/geographicAreas/e> ;
        skos:inScheme 
          <http://id.loc.gov/vocabulary/countries> .
        
<http://id.loc.gov/vocabulary/geographicAreas/e-sp>
        a skos:Concept ;        
        skos:prefLabel "Spain"@en ;
        skos:exactMatch 
          <http://id.loc.gov/vocabulary/countries/sp> ;
        skos:broader 
          <http://id.loc.gov/vocabulary/geographicAreas/e> ;
        skos:inScheme 
          <http://id.loc.gov/vocabulary/geographicAreas> .

<http://id.loc.gov/vocabulary/geographicAreas/e>
        a skos:Concept ;        
        skos:prefLabel "Europe"@en ;
        skos:narrower 
          <http://id.loc.gov/vocabulary/geographicAreas/e-sp> ;
        skos:narrowMatch 
          <http://id.loc.gov/vocabulary/countries/sp> ;
        skos:inScheme 
          <http://id.loc.gov/vocabulary/geographicAreas> .
        

Here the types of relationships involved are the SKOS "semantic relations" and "mapping properties" between concepts: e.g. Spain-as-concept has-broader-concept Europe-as-concept, and so on.

Conclusions

If anyone has read this far, I can imagine it is not without some rolling of eyes at the pedantic distinctions I seem to be unpicking!

Given my understanding of SKOS, FOAF and Dublin Core, I do think the designers of the BL data have done a sterling job in trying to "get it right", carefully observing the way terms have been described by their owners and seeking to use those terms in ways that are consistent with those descriptions.

At the same time, I admit I can well imagine that to many Dublin Core implementers who see the Dublin Core properties as providing a relatively simple approach, this will seem over-complicated.

And I rather expect that we will see uses of the dcterms:language and dcterms:spatial properties which simply link directly to the concept:

@prefix dcterms: <http://purl.org/dc/terms/> .

<http://example.org/doc/1234>
        a dcterms:BibliographicResource ;
        dcterms:spatial 
          <http://id.loc.gov/vocabulary/countries/sp> ;
        dcterms:language 
          <http://id.loc.gov/vocabulary/iso639-2/spa> .

I think this is particularly likely for dcterms:spatial, as it is perceived as "similar" to dcterms:subject: amongst other things, it covers "aboutness" in the case where the topic is a place. It is all the more likely since, as in the BL example above, the concept URI may be used with dcterms:subject in the very same graph.

Returning to the recent post by Dan which in part prompted me to start this post, he makes a similar point, focusing on the FOAF properties with which I introduced the post. He recognises that "in some contexts having careful names for all these inter-relations is very important" but suggests

We should consider making foaf:interest vaguer, such that any of those three are acceptable. Publishers aren't going to go looking up obscure distinctions between foaf:interest and foaf:topic_interest and that other pattern using SKOS ... they just want to mark that this is a relationship between someone and a URI characterising one of their interests.

So I suggest we become more tolerant about the URIs acceptable as values of foaf:interest

(See the message in full for Dan's examples.)

Part of me is tempted to suggest that similar reasoning might be applied to the Dublin Core Terms properties, i.e. that consideration might be given to "relaxing" the range of dcterms:spatial to allow for the use of either the place or the concept, to resolve the dilemma that, with the current design, the patterns are different for dcterms:subject and dcterms:spatial. But where do we stop? Do we also relax dcterms:language? I really don't know. And I think I'd be quite uneasy making the suggestion for the "agent" properties.

The fundamental issue here is that the thesaurus-based approach introduces a layer of abstraction - Dan's "social and technological artifact[s] designed to help interconnect descriptions" - and that is reflected in SKOS's notion of the skos:Concept. Outside the "knowledge organisation" community, however, many RDFS/OWL modellers "model the world directly": they consider the language as system used by a community of speakers (as modelled by lexvo.org), the place as thing in space (as modelled by Geonames), and so on. Constructs such as foaf:focus help us bridge the two approaches - but not without some complexity.
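As a rough sketch of that bridge, the following relates the Library of Congress concepts used above to the "things" described by lexvo.org and Geonames. The two foaf:focus triples are assumptions made for illustration rather than statements published by the Library of Congress, and the lexvo.org language URI is assumed to follow the pattern of the region URIs shown earlier:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Assumed bridging triples: each concept "has as its focus" a thing.
<http://id.loc.gov/vocabulary/countries/sp>
        a skos:Concept ;
        foaf:focus 
          <http://sws.geonames.org/2510769> .

<http://id.loc.gov/vocabulary/iso639-2/spa>
        a skos:Concept ;
        foaf:focus 
          <http://lexvo.org/id/iso639-3/spa> .

# A document can then point at the concept as its subject,
# and at the things as its spatial coverage and language.
<http://example.org/doc/1234>
        a dcterms:BibliographicResource ;
        dcterms:subject 
          <http://id.loc.gov/vocabulary/countries/sp> ;
        dcterms:spatial 
          <http://sws.geonames.org/2510769> ;
        dcterms:language 
          <http://lexvo.org/id/iso639-3/spa> .

The cost is visible even in this small graph: a consumer has to know to follow foaf:focus in order to get from the subject heading to the place or the language itself.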

On Friday evening, I noticed a (slightly provocative!) comment on Twitter from John Goodwin (@gothwin) which I think is related:

coming to the conclusion that #SKOS is being waaay over used #linkeddata

It attracted some sympathetic responses, e.g. from Rob Styles (@mmmmmmrob):

@gothwin @juansequeda if you're not publishing an existing subject heading scheme on the cheap then SKOS is the wrong tool.

@johnlsheridan @juansequeda @gothwin Paris is not "narrower than" France, it's "capital of". That's the problem.

which I think echoes my examples above of Spain and Europe.
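To make the contrast concrete, here is a small sketch of the two styles side by side. All of the example.org URIs, the ex:City class and the ex:capitalOf property are hypothetical names invented for illustration, not terms from any published vocabulary:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/ns#> .

# Thesaurus style: says only that one concept is "broader" than another.
<http://example.org/concept/paris>
        a skos:Concept ;
        skos:prefLabel "Paris"@en ;
        skos:broader 
          <http://example.org/concept/france> .

# Domain-model style: names the actual relationship between the places.
<http://example.org/place/paris>
        a ex:City ;
        ex:capitalOf 
          <http://example.org/place/france> .

The first description is adequate for navigating a subject heading scheme, but only the second says what the relationship between the two places actually is.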

And from Leigh Dodds (@ldodds):

@gothwin that's all it was designed for: converting one specific class of datasets into RDF. It's just been mistaken for a modelling tool

@juansequeda @gothwin @kendall only use #skos if you're CONVERTING a taxonomy. Otherwise, just model the domain

These comments struck home with me, and I think I may have made exactly this sort of mistake in some other work I've been doing recently, and I need to revisit it, to check whether what I've done is really necessary or useful. As John suggests, I think sometimes I reach for SKOS as a tool for something that has a "controlled list" feel to it without thinking hard enough about whether it is really the appropriate tool for the task at hand.

Having said that, I do also think we will need to manage the "mixed" cases - particularly in datasets coming from communities such as libraries where the use of thesauri is commonplace, and a growing number are available as SKOS - where we end up needing the sort of bridge which foaf:focus provides, so some of this complexity may be unavoidable.

In any case, I think it's an area where some guidance/best practice notes - with lots of examples! - would be very helpful.

Comments

Pete,

You probably know all this already, but I've been trying to make sense of RDF and its relation to Topic Maps [1], and your article made me think a little more about it.

RDF is resource focused while topic maps are subject focused. A topic map can exist without any "occurrences" of the topics which it names and identifies existing. It's a way to model concepts, not things (though you can talk about things, it is kind of the opposite of reification because you make the thing a topic). Topics in topic maps are abstract, and using SKOS to identify them seems to me to be very appropriate: Paris as an abstract concept, France as an abstract concept, and "capital city" as an abstract concept, with associations used to model relations between topics. So I could say "Paris is the capital city of France" but never point to a thing. "Things" are instances of the topic and are called occurrences (there are internal and external occurrences). Relationships between occurrences can also be modeled using associations.

Anyway, I'm far from expert on RDF or Topic Maps (or anything else for that matter) but, as you pointed out in your article, there sometimes needs to be a way to talk about both resources and concepts. I know people have tried to express topic maps in RDF and vice versa without much success, but I am coming to believe that they are complementary and would like to find a way to make them work together.


[1] http://www.topicmaps.org/xtm/

Dan Brickley's diagram of Ronald Reagan caused a brief debate here on whether "US Presidents" should be modeled as a subclass of "Person" or as an instance of foaf:Group. The confusion and disagreement evaporated when we remembered that class names should be singular rather than plural. In one sense this is a nitpick but in another sense it illustrates how words (and even a single letter as in this case) can significantly alter the model.

@Jeff: In theory the words that build URI references should not matter in RDF at all. I suppose there is something wrong with the theory ;-) See also my recent blog article http://jakoblog.de/2011/09/21/modeling-is-difficult/
