There was a brief exchange of messages on the jisc-repositories mailing list a couple of weeks ago concerning the naming of authors in institutional repositories. When I say naming, I really mean identifying, because a name, as a string of characters, doesn't guarantee any kind of uniqueness - even locally, let alone globally.
The thread started from a question about how to deal with the situation where one author writes under multiple names (is that a common scenario in academic writing?) but moved on to a more general discussion about how one might assign identifiers to people.
I quite liked Les Carr's suggestion:
Surely the appropriate way to go forward is for repositories to start by locally choosing a scheme for identifying individuals (I suggest coining a URI that is grounded in some aspect of the institution's processes). If we can export consistently referenced individuals, then global services can worry about "equivalence mechanisms" to collect together all the various forms of reference.
This is the approach taken by the Resist Knowledgebase, which is the foundation for the (just started) dotAC JISC Rapid Innovation project.
(Note: I'm assuming that when Les wrote 'URI' he really meant 'http URI').
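To make that concrete, here's a minimal sketch of what "coining a URI grounded in some aspect of the institution's processes" might look like: a repository mints http URIs for its authors from something it already controls (a staff number, say) and publishes them as data. The domain, staff number and names below are invented for illustration, and I've used Python with rdflib simply because it's a convenient way to show the triples.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

# Hypothetical base for locally coined author URIs -- the domain is made up.
PERSON = Namespace("http://repository.example.ac.uk/id/person/")

def person_uri(staff_number):
    """Mint a stable local URI from an institutional staff number."""
    return PERSON[staff_number]

g = Graph()
author = person_uri("s1234567")  # imaginary staff number
g.add((author, RDF.type, FOAF.Person))
g.add((author, FOAF.name, Literal("J. Bloggs")))      # name as it appears on one paper
g.add((author, FOAF.name, Literal("Joanna Bloggs")))  # name as it appears on another

print(g.serialize(format="turtle"))
```

The point is only that the identifier is grounded in something the institution already manages, so the repository can expose "consistently referenced individuals" without waiting for anyone else to hand out identifiers.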
Two other pieces of current work seem relevant and were mentioned in the discussion. Firstly, the JISC-funded Names project, which is working on a pilot Names Authority Service. Secondly, the RLG Networking Names report. I might be misunderstanding the nature of these bits of work but both seem to me to be advocating rather centralised, registry-like, approaches. For example, both talk about centrally assigning identifiers to people.
As an aside, I'm constantly amazed by how many digital library initiatives end up looking and feeling like registries. It seems to be the DL way... metadata registries, metadata schema registries, service registries, collection registries. You name it and someone in a digital library will have built a registry for it.
My favoured view is that the Web is the registry. Assign identifiers at source, then aggregate appropriately if you need to work across stuff (as Les suggests above). The <sameAs> service is a nice example of this:
The Web of Data has many equivalent URIs. This service helps you to find co-references between different data sets.
As Hugh Glaser says in a discussion about the service:
Our strong view is that the solution to the problem of having all these URIs is not to generate another one. And I would say that with services of this type around, there is no reason.
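For what it's worth, the "equivalence mechanism" half of that picture is also easy to sketch. The snippet below asks a co-reference service for the other URIs it bundles together with a locally coined author URI. I'm assuming a sameAs.org-style JSON lookup that takes a uri parameter and returns bundles of duplicate URIs - the real interface should be checked against the service's own documentation - and the author URI is the hypothetical one from the earlier sketch.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed sameAs.org-style endpoint: a JSON lookup keyed on a 'uri' parameter.
# The exact URL and response shape are assumptions, not a documented contract.
SERVICE = "http://sameas.org/json"

def equivalent_uris(uri):
    """Return the URIs a co-reference service holds to be equivalent to `uri`."""
    with urlopen(SERVICE + "?" + urlencode({"uri": uri})) as response:
        bundles = json.load(response)
    # Assumed response: a list of bundles, each listing its 'duplicates'.
    return [u for bundle in bundles for u in bundle.get("duplicates", [])]

for u in equivalent_uris("http://repository.example.ac.uk/id/person/s1234567"):
    print(u)
```

Nothing in that pattern requires a new identifier to be minted centrally; the service just records that several existing URIs refer to the same person.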
In thinking about some of the issues here I had cause to go back and re-read a really interesting interview by Martin Fenner with Geoffrey Bilder of CrossRef (from earlier this year). Regular readers will know that I'm not the world's biggest fan of the DOI (on which CrossRef is based), partly for technical reasons and partly on governance grounds, but let's set that aside for the moment. In describing CrossRef's "Contributor ID" project, Geoff makes the point that:
... “distributed” begets “centralized”. For every distributed service created, we’ve then had to create a centralized service to make it useable again (ICANN, Google, Pirate Bay, CrossRef, DOAJ, ticTocs, WorldCat, etc.). This gets us back to square one and makes me think the real issue is - how do you make the centralized system that eventually emerges accountable?
I think this is a fair point, but I also think there is a very significant architectural difference between a centralised service that aggregates identifiers and other information from a distributed base of services (in order to provide some useful centralised function, for example) vs. a centralised service that assigns identifiers and then pushes them out into the wider landscape. It seems to me that only the former makes sense in the context of the Web.