
July 20, 2009

On names

There was a brief exchange of messages on the jisc-repositories mailing list a couple of weeks ago concerning the naming of authors in institutional repositories. When I say naming, I really mean identifying, because a name, as in a string of characters, doesn't guarantee any kind of uniqueness - even locally, let alone globally.

The thread started from a question about how to deal with the situation where one author writes under multiple names (is that a common scenario in academic writing?) but moved on to a more general discussion about how one might assign identifiers to people.

I quite liked Les Carr's suggestion:

Surely the appropriate way to go forward is for repositories to start by locally choosing a scheme for identifying individuals (I suggest coining a URI that is grounded in some aspect of the institution's processes). If we can export consistently referenced individuals, then global services can worry about "equivalence mechanisms" to collect together all the various forms of reference that.

This is the approach taken by the Resist Knowledgebase, which is the foundation for the (just started) dotAC JISC Rapid Innovation project.

(Note: I'm assuming that when Les wrote 'URI' he really meant 'http URI').
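To make the "coin a URI locally" idea concrete, here is a minimal sketch of how a repository might mint an http URI from something its own processes already guarantee to be unique, such as a staff number. The base URI, the function name and the staff-number format are all assumptions made for illustration; none of them come from Les's message or from any particular repository platform.

```python
from urllib.parse import quote

# Hypothetical base URI for one institution's repository (an assumption).
BASE = "http://repository.example.ac.uk/id/person/"


def mint_person_uri(local_id: str) -> str:
    """Coin an http URI for a person from a locally unique identifier."""
    # Key the URI on a stable local identifier rather than on the name
    # string, because names are neither unique nor stable.
    return BASE + quote(local_id.strip(), safe="")


# Two authors who share a name still get distinct identifiers.
smith_a = mint_person_uri("00421")
smith_b = mint_person_uri("00977")
```

The point of the sketch is only that the identifier is assigned at source, by the body that actually knows who the person is.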

Two other pieces of current work seem relevant and were mentioned in the discussion. The first is the JISC-funded Names project, which is working on a pilot Names Authority Service. The second is the RLG Networking Names report. I might be misunderstanding the nature of these bits of work but both seem to me to be advocating rather centralised, registry-like, approaches. For example, both talk about centrally assigning identifiers to people.

As an aside, I'm constantly amazed by how many digital library initiatives end up looking and feeling like registries. It seems to be the DL way... metadata registries, metadata schema registries, service registries, collection registries. You name it and someone in a digital library will have built a registry for it.

My favoured view is that the Web is the registry. Assign identifiers at source, then aggregate appropriately if you need to work across stuff (as Les suggests above). The <sameAs> service is a nice example of this:

The Web of Data has many equivalent URIs. This service helps you to find co-references between different data sets.

As Hugh Glaser says in a discussion about the service:

Our strong view is that the solution to the problem of having all these URIs is not to generate another one. And I would say that with services of this type around, there is no reason.
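The "assign at source, aggregate afterwards" pattern can be sketched in a few lines. The code below takes pairwise "these two URIs identify the same thing" assertions and groups them into bundles of equivalent URIs using a union-find structure. This is only an illustration of the general idea; it is not a description of how the <sameAs> service is actually implemented, and the function name and all example URIs are hypothetical.

```python
from collections import defaultdict


def coreference_bundles(same_as_pairs):
    """Group URIs into sets of equivalents from pairwise sameAs assertions."""
    parent = {}

    def find(uri):
        # Return the representative URI for the bundle containing `uri`.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two bundles

    bundles = defaultdict(set)
    for uri in parent:
        bundles[find(uri)].add(uri)
    return list(bundles.values())


# Hypothetical identifiers minted independently by three repositories.
pairs = [
    ("http://repo-a.example/id/person/00421", "http://repo-b.example/people/jsmith"),
    ("http://repo-b.example/people/jsmith", "http://repo-c.example/authors/9"),
]
bundles = coreference_bundles(pairs)
```

Note that no new identifier is generated anywhere: the aggregator only records which existing URIs belong together.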

In thinking about some of the issues here I had cause to go back and re-read a really interesting interview by Martin Fenner with Geoffrey Bilder of CrossRef (from earlier this year). Regular readers will know that I'm not the world's biggest fan of the DOI (on which CrossRef is based), partly for technical reasons and partly on governance grounds, but let's set that aside for the moment. In describing CrossRef's "Contributor ID" project, Geoff makes the point that:

... “distributed” begets “centralized”. For every distributed service created, we’ve then had to create a centralized service to make it useable again (ICANN, Google, Pirate Bay, CrossRef, DOAJ, ticTocs, WorldCat, etc.). This gets us back to square one and makes me think the real issue is - how do you make the centralized system that eventually emerges accountable?

I think this is a fair point but I also think there is a very significant architectural difference between a centralised service that aggregates identifiers and other information from a distributed base of services (in order to provide some useful centralised function, for example) and a centralised service that assigns identifiers which it then pushes out into the wider landscape. It seems to me that only the former makes sense in the context of the Web.


Comments

The People Australia program (http://wiki.nla.gov.au/display/peau) at the National Library of Australia (NLA) harvests records about people and organisations using OAI-PMH and matches these to identities. All identities are assigned a public, persistent identifier and the identifiers/urls for ingested records are stored.

We have implemented SRU and OpenSearch interfaces to support searching and an OAI interface for the harvesting of the data we aggregate. Humans can search for and view these records in the NLA’s new integrated discovery service at http://sbdsproto.nla.gov.au/sbdp-ui/island/people.

So, People Australia is a centralised identity resolution service that also disseminates identifiers for other services to use.

With a little XSLT magic, rdf+xml documents can be created from our records that are similar to search results from http://sameas.org. An example for the Dame Nellie Melba (http://nla.gov.au/nla.party-505278) People Australia record is available at https://wiki.nla.gov.au/display/peau/Examples.
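For readers who have not seen this kind of output, the sketch below builds a small rdf+xml document that links one identity URI to its equivalents with owl:sameAs. It uses Python's standard library rather than XSLT, and it is not the People Australia transformation itself. The Melba URI is the one given above; the function name and the equivalent URI are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def same_as_rdfxml(subject_uri, equivalent_uris):
    """Serialise owl:sameAs links for one identity as an RDF/XML string."""
    # Use the conventional prefixes in the serialised output.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("owl", OWL_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject_uri}
    )
    for uri in equivalent_uris:
        ET.SubElement(
            description, f"{{{OWL_NS}}}sameAs", {f"{{{RDF_NS}}}resource": uri}
        )
    return ET.tostring(root, encoding="unicode")


doc = same_as_rdfxml(
    "http://nla.gov.au/nla.party-505278",
    ["http://contributor.example.org/people/melba-nellie"],  # hypothetical
)
```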

It is early days but we have already harvested around 200,000 names from a number of sources of names including the Libraries Australia Name Authority File, the Music Australia Party Database, the Australian Women’s Register and Australia Dancing. Over the coming months we will be increasing the number of contributors and greatly increasing the number of identities accessible via the service.

In terms of names and institutional repositories, we are working with the NicNames project (http://nicnamesproject.blogspot.com/) here in Australia to help identify researchers. A recent blog post by Peter Sefton (Australian Digital Futures Institute, University of Southern Queensland) describes how: http://bit.ly/IvRxz.

The results of this work will allow institutional repositories to manage identity across institutions. It will also benefit users who will be able to discover resources across institutional repositories created by a researcher along with works by the researcher that are recorded elsewhere (eg in Libraries Australia).

I'm not sure whether 'centralised' is quite the point. What does it mean to be centralised in a distributed world? I think we need to be a bit more careful with terminology.

There are questions about comprehensiveness, authority, compatibility and exclusivity that go alongside the idea of 'centralisation'.

If the Names project built something that was intended to be comprehensive and authoritative, was not compatible with other possible authorities, and was built in such a way that it was seen as the exclusive source for name authorities, I'd worry.

However, if Names ends up minting (http) URIs for names that can be used alongside other authorities (e.g. Institutions as suggested by Les), then this seems ultimately de-centralised.


Although I find Geoff Bilder's point about the inevitability of centralisation interesting, I think there are some issues with it as well. Worldcat does not achieve 'centralisation' in the same way as Google does - and I'd argue that the differences are important and mean that closer analysis of what 'centralised' means in each context is needed to really understand what is going on.


eFoundations is powered by TypePad