« Term-based thesauri and SKOS (Part 3): Change over time (i) | Main | UMF pilot cloud infrastructure - size matters? »

March 10, 2011

Term-based thesauri and SKOS (Part 4): Change over time (ii)

This is the fourth in a series of posts (previously: part 1, part 2, part 3) on making a thesaurus available as linked data using the SKOS and SKOS-XL RDF vocabularies. In the previous post, I examined some of the ways the thesaurus can change over time, and problems that arose with my proposed mapping to RDF. Here I'll outline one potential solution to those problems.

The last three cases I described in the previous post, where an existing preferred term loses that status and is "relegated" to a non-preferred term, all present a problem for my suggested simple mapping, because the URI for a concept disappears from the generated RDF graph - and this creates a conflict with the principles of URI stability and reliability I advocated at the start of that post.

My first thoughts on a solution circled around generating concept URIs, not just for the preferred term, but also for all the non-preferred terms, and using owl:sameAs (or skos:exactMatch?) to indicate that the concept URIs derived from the terms associated with a single preferred term were synonyms, i.e. each of them identified the same concept. That way the change from preferred term to non-preferred term would not result in the loss of a concept URI. But the proliferation of URIs here feels fundamentally flawed - the problem is not one that is solved by having multiple URIs for a single concept; the issue is the persistence of a single URI. Introducing the multiple URIs also seems like a recipe for a lot of practical difficulties in managing the impact of changes on external applications, particularly if URIs which were once synonyms cease to be so.

After some searching, I found a couple of useful pages on the W3C wiki: some notes on versioning (which as far as I know didn't make it into the final SKOS specifications) and particularly this page on "Concept Evolution" in SKOS. The latter is rather more a collection of pointers than the concrete set of examples and guidelines I was hoping for, but one of those pointers is to a thread on the W3C public-esw-thes mailing list, starting with this message from Rob Tice, which I think describes (in his point 2) exactly the situation I'm dealing with in the problem cases in the previous post:

How should we identify and manage change between revisions of concept schemes as this 'seems' to result in imprecision. e.g. a concept 'a' is currently in thes 'A' and only has a preferred label. A new revision of thes 'A' is published and what was concept 'a' is now a non preferred concept and thus becomes simply a non preferred label for a new concept 'b'.

It seems to me that this operation loses some of the semantic meaning of the change as all references to the concept id of 'concept a' would be lost as it now is only a non preferred label of a different concept with a different id (concept 'b').

The suggested approach emerging from that discussion has two elements:

  1. A notion that a concept can be marked as "deprecated" (using e.g. a "status" property with a value of "deprecated" or a "deprecated" property with Boolean (yes/no) values) or as being "valid" or "applicable" only for a specified bounded period of time (see the messages from Johan De Smedt and from Margarita Sini)
  2. Such a "deprecated" concept can be the subject of a "replaced by" relationship linking it to the "preferred term" concept (see the message from Michael Panzer)

The application of these two elements in combination is illustrated in this example by Joachim Neubert (again, I think, addressing the same scenario).

I wasn't aware of the owl:deprecated property before, but as far as I can tell, it would be appropriate for this case.

Joachim's message highlights the question of what to do about skos:prefLabel/skosxl:prefLabel or skos:altLabel/skosxl:altLabel properties for the deprecated concept. In the term-based thesaurus, the term has become a non-preferred term for another term: in the SKOS model, the term is now the alternate label for a different concept, and the preferred label for no concept. So on that basis, I'm inclined to follow Joachim's suggestion that the deprecated concept should be the subject of neither skos:prefLabel/skosxl:prefLabel nor skos:altLabel/skosxl:altLabel properties, though it could, as Joachim's example shows, retain an rdfs:label property. And similarly it is no longer the subject or object of semantic relationships.

I did wonder about the option of introducing a set of properties, parallel to the SKOS ones, to indicate those former relationships, e.g. ex:hadPrefLabel, ex:hadAltLabel, ex:hadRelated, ex:hadBroader, ex:hadNarrower, essentially as documentation. But I'm really not sure how useful this is: the semantic relationships in which those other target concepts are involved may themselves change. And I suppose in principle (though it seems unlikely in practice) a single concept may itself go through several status changes (e.g. from active to deprecated to active to deprecated) and accrue various different "former" relationships in the course of that. If this level of information is required, then I think it probably has to be provided using some other approach - like the use of a set of date-stamped graphs/documents that reflect the state of a concept at a point in time.

So applying Joachim's approach to Case 8 from the examples in the previous post, where the current preferred term "Political violence" is to become a non-preferred term for "Collective violence", we end up with the concept con:C2 as a "deprecated" concept with a Dublin Core dcterms:isReplacedBy relationship to concept con:C6 (and the inverse from con:C6 to con:C2):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix con: <http://example.org/id/concept/polthes/> .
@prefix term: <http://example.org/id/term/polthes/> .

con:C2 a skos:Concept;
       rdfs:label "Political violence"@en;
       owl:deprecated "true"^^xsd:boolean;
       dcterms:isReplacedBy con:C6 .
       
term:T2 a skosxl:Label;
        skosxl:literalForm "Political violence"@en.

con:C6 a skos:Concept;
       skos:prefLabel "Collective violence"@en;
       skosxl:prefLabel term:T6;
       skos:altLabel "Political violence"@en;
       skosxl:altLabel term:T2; 
       dcterms:replaces con:C2 .       

term:T6 a skosxl:Label;
        skosxl:literalForm "Collective violence"@en.

Using this approach then, the full output graph for Case 8 would be as follows (the highlighting indicates the difference between this graph and that for Case 8 in the previous post):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix con: <http://example.org/id/concept/polthes/> .
@prefix term: <http://example.org/id/term/polthes/> .

term:T1 a skosxl:Label;
        skosxl:literalForm "Civil violence"@en.
        
con:C6 a skos:Concept;
       skos:prefLabel "Collective violence"@en;
       skosxl:prefLabel term:T6;
       skos:altLabel "Political violence"@en;
       skosxl:altLabel term:T2;
       skos:altLabel "Civil violence"@en;
       skosxl:altLabel term:T1;
       skos:altLabel "Violent protest"@en;
       skosxl:altLabel term:T5;
       skos:broader con:C4;
       skos:narrower con:C3; 
       dcterms:replaces con:C2 .       

term:T6 a skosxl:Label;
        skosxl:literalForm "Collective violence"@en.

con:C2 a skos:Concept;
       rdfs:label "Political violence"@en;
       owl:deprecated "true"^^xsd:boolean;
       dcterms:isReplacedBy con:C6 .
       
term:T2 a skosxl:Label;
        skosxl:literalForm "Political violence"@en.

con:C3 a skos:Concept;
       skos:prefLabel "Terrorism"@en;
       skosxl:prefLabel term:T3;
       skos:broader con:C6 .

term:T3 a skosxl:Label;
        skosxl:literalForm "Terrorism"@en.
       
con:C4 a skos:Concept;
       skos:prefLabel "Violence"@en;
       skosxl:prefLabel term:C4;
       skos:narrower con:C6 .

term:T4 a skosxl:Label;
        skosxl:literalForm "Violence"@en.

term:T5 a skosxl:Label;
        skosxl:literalForm "Violent protest"@en.

Now our graph retains the URI con:C2 and provides a description of that resource as a "deprecated concept".

And for Case 9 (again the highlighting indicates the difference from the initial graph for Case 9):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix con: <http://example.org/id/concept/polthes/> .
@prefix term: <http://example.org/id/term/polthes/> .

term:T1 a skosxl:Label;
        skosxl:literalForm "Civil violence"@en.

con:C1 a skos:Concept;
       skos:prefLabel "Civil violence"@en;
       skosxl:prefLabel term:T1;
       skos:altLabel "Political violence"@en;
       skosxl:altLabel term:T2;
       skos:altLabel "Violent protest"@en;
       skosxl:altLabel term:T5;
       skos:broader con:C4;
       skos:narrower con:C3; 
       dcterms:replaces con:C2 .       
        
con:C6 a skos:Concept;
       skos:prefLabel "Collective violence"@en;
       skosxl:prefLabel term:T6.

term:T6 a skosxl:Label;
        skosxl:literalForm "Collective violence"@en.

con:C2 a skos:Concept;
       rdfs:label "Political violence"@en;
       owl:deprecated "true"^^xsd:boolean;
       dcterms:isReplacedBy con:C1 .

term:T2 a skosxl:Label;
        skosxl:literalForm "Political violence"@en.

con:C3 a skos:Concept;
       skos:prefLabel "Terrorism"@en;
       skosxl:prefLabel term:T3;
       skos:broader con:C1 .

term:T3 a skosxl:Label;
        skosxl:literalForm "Terrorism"@en.
       
con:C4 a skos:Concept;
       skos:prefLabel "Violence"@en;
       skosxl:prefLabel term:C4;
       skos:narrower con:C1 .

term:T4 a skosxl:Label;
        skosxl:literalForm "Violence"@en.

term:T5 a skosxl:Label;
        skosxl:literalForm "Violent protest"@en.

In the (unlikely?) event that the (previously preferred) non-preferred term is once again restored to the state of preferred term, then the concept con:C2 loses its deprecated status and the dcterms:isReplacedBy relationship, and acquires skos:prefLabel/skos:altLabel properties as normal.

Generating these graphs does, however, imply a change to the process of generating the RDF representation. As I noted at the start of the previous post, my first cut at this was based on being able to process a snaphot of the thesaurus "stand-alone" without knowledge of previous versions. But the capacity to detect deprecated concepts depends on knowledge of the current state of the thesaurus, i.e. when the transformation process encounters a non-preferred term x, it needs to behave differently depending on whether:

  1. concept con:Cx exists in the current thesaurus dataset (as either an "active" or "deprecated" concept), in which case a "deprecated concept" con:Cx should be output, as well as term:Tx (as alternate label for some other concept, con:Cy); or
  2. concept con:Cx does not exist in the current thesaurus dataset, in which case only term:Tx (as alternate label for a concept con:Cy) is required

I think that test has to be made against the current RDF thesaurus dataset rather than using the previous XML snapshot in time, as the "deprecation" may have taken place several snapshots ago.

I have to admit this does make the transformation process rather more complicated than I had hoped. The only way alternative would be if it is somehow possible to distinguish the "deprecation" case from the "static" non-preferred term case from the input data alone, but as far as I know this isn't possible.

Summary

The previous post highlighted that for one particular category of change, where an existing preferred term is "relegated" to the status of a non-preferred term, the results of the suggested simple mapping into SKOS had problematic consequences.

Based on some investigation of how others approach similar scenarios (and here I should note I'm very grateful to the contributors to the wiki page on concept evolution and to those discussions linked from it, as I was struggling to see clearly how to deal with these scenarios), I've sketched above an approach to representing a concept which has been "deprecated", or is no longer applicable, and is replaced by another concept. I'm sure it isn't the only way of addressing the problem, but it seems a reasonable one to try.

I think this creates new challenges for implementing this approach in the transformation process and I need to work on that to test it, but I think it is achievable. But I would also be very grateful for any comments, particularly if there are gaping holes in this which I haven't spotted!

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d8345203ba69e20147e2eda4e5970b

Listed below are links to weblogs that reference Term-based thesauri and SKOS (Part 4): Change over time (ii):

Comments

The comments to this entry are closed.

About

Search

Loading
eFoundations is powered by TypePad