LCCN Permalinks and the info URI scheme
Another post that has been on the back burner for a few days... Via catalogablog, I noticed recently that the Library of Congress has announced the availability of what it calls LCCN Permalinks, a set of URIs using the http URI scheme which act as globally scoped identifiers for bibliographic records in the Library of Congress Online Catalog and for which the LoC makes a commitment of persistence.
I tend to think of two aspects of persistence, following the distinction that the Web Architecture makes between identification and interaction. From reading the FAQ, I think that persistence in the LCCN Permalink case covers both of these aspects. So the LoC commits to the persistence of the identifiers as names by, for example, keeping ownership of the domain name and managing the (human, organisational) processes for assigning URIs within that space so that, once assigned, a single URI will continue to identify the same record (i.e. they observe the WebArch principles of avoiding collisions). And they also commit to serving consistent representations of the resources identified by those URIs (i.e. they observe the WebArch principles of providing representations and doing so consistently and predictably over time).
So for example, the URI http://lccn.loc.gov/2003556443 is a persistent identifier of a metadata record describing an online exhibit called "1492: an ongoing voyage". And in addition, for each URI of this form, a further three URIs are coined to identify that same metadata record presented in different formats: http://lccn.loc.gov/2003556443/marcxml (MARCXML), http://lccn.loc.gov/2003556443/mods (MODS), http://lccn.loc.gov/2003556443/dc (SRW DC XML). So in terms of the Web Architecture, we have four distinct, but related, resources here. And indeed the fact that they are related is reflected in the hypertext links in the HTML document served as a representation of the first resource, along the lines of the TAG finding, On Linking Alternative Representations To Enable Discovery And Publishing. It would be even nicer if that HTML document indicated the nature of the "generic resource"-"specific resource" relationship between those resources. But, really, it would be churlish to complain! :-) We now have a set of URIs which have the (attractive) characteristics that, first, they serve as globally scoped persistent names and, second, they are amenable to lookup using a widely used network protocol which is supported by tools on my desktop and by libraries for every common programming platform. Good stuff.
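To make the pattern concrete, here is a minimal Python sketch of how the four URIs relate to a single LCCN. The function and constant names are my own invention for illustration, and the sketch assumes the LCCN passed in is already in its normalized form; it only builds the strings and does not dereference anything.

```python
# Base of the LCCN Permalink URI space, as used in the examples above.
PERMALINK_BASE = "http://lccn.loc.gov/"

# Suffixes for the three format-specific resources mentioned above.
FORMAT_SUFFIXES = {
    "marcxml": "MARCXML",
    "mods": "MODS",
    "dc": "SRW DC XML",
}


def permalink_uris(lccn):
    """Return the generic permalink plus the three format-specific URIs.

    Hypothetical helper: assumes `lccn` is already a normalized LCCN.
    """
    generic = PERMALINK_BASE + lccn
    uris = {"generic": generic}
    for suffix in FORMAT_SUFFIXES:
        uris[suffix] = generic + "/" + suffix
    return uris


uris = permalink_uris("2003556443")
for name, uri in uris.items():
    print(name, uri)
```

The point is simply that one name yields four distinct but related resources: the "generic" record and three "specific" renderings of it.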
However, it is interesting to note that this - or at least the first aspect, the provision of persistent names - was the intent behind the provision of the "lccn" namespace within the info URI scheme. According to the entry for the "lccn" namespace in the info URI registry:
The LCCN namespace consists of identifiers, one corresponding to every assigned LCCN (Library of Congress Control Number). Any LCCN may have various forms which all normalize to a single canonical form; only normalized values are included in the LCCN namespace.
An LCCN is an identifier assigned by the Library of Congress for a metadata record (e.g., bibliographic record, authority record).
Compare (from the first two questions of the LCCN Permalink FAQ):
1. What are LCCN Permalinks?
LCCN Permalinks are persistent URLs for bibliographic records in the Library of Congress Online Catalog. These links are constructed using the record's LCCN (or Library of Congress Control Number), an identifier assigned by the Library of Congress to bibliographic and authority records.
2. How can I use LCCN Permalinks?
LCCN Permalinks offer an easy way to cite and link to bibliographic records in the Library of Congress Online Catalog. You can use an LCCN Permalink anywhere you need to reference an LC bibliographic record in emails, blogs, databases, web pages, digital files, etc.
The issue with URIs in the info URI scheme, of course, is that while they provide globally scoped, persistent names, the info URI scheme is not mapped to a network protocol to enable the lookup of those names. I understand that for info URIs, "per-namespace methods may exist as declared by the relevant Namespace Authorities", but "[a]pplications wishing to tap into this functionalitiesy (sic) must consult the INFO Registry on a per-namespace basis." (Both quotes are from the info URI scheme FAQ.)
The creation of LCCN Permalinks seems to endorse Berners-Lee's basic principle (that I mentioned in my post on Linked Data) that it is helpful for the users/consumers of a URI not only to have a globally-scoped name, but also to be able to look up those names - using an almost ubiquitous network protocol - and obtain some useful information. LoC have supplemented the use of a URI scheme that only supported the former with the use of a scheme which facilitates both the former and the latter. And with a recent post by Stu Weibel in mind, I'd just add that (a) the use of an http URI does not constitute an absolute requirement that the owner also serve representations - the http URIs I coin can be used quite effectively as names alone without my ever configuring my HTTP server to provide representations for those URIs (and if the LoC HTTP server disappears, an LCCN Permalink still works as a name); and (b) the serving of representations for http URIs is not - in principle, at least - limited to the use of the HTTP protocol (see "Serve using any protocol" in the draft finding of the W3C TAG URI Schemes and Web Protocols).
Further, the persistence in LCCN Permalinks is a consequence of LoC's policy commitment to ensuring that persistence (in both aspects I outlined above): it is primarily a socio-economic, organisational consideration, not a technical one, and that applies regardless of the URI scheme chosen.
Indeed, it seems to me the creation of LCCN Permalinks suggests that there wasn't really much of a requirement for the creation of the "lccn" info URI namespace. And the co-existence of these two sets of URIs now means that consumers are faced with managing the use of two parallel sets of global identifiers - two sets provided by the same agency - for a single set of resources (i.e. URI aliases). Certainly, this can be managed, using, e.g. the capability provided by the owl:sameAs property to state that two URIs identify the same resource. But it does seem to me that it adds an avoidable overhead, with - in this case - little (no?) appreciable benefit. (Compare the case that I mentioned, also in the post on Linked Data, of URI aliases provided by different agencies, where the use of two URIs enables the provision of different descriptions of a single resource, and so does bring something additional to the table.)
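For what it's worth, here is a rough sketch of what stating that equivalence might look like, written as a small Python function that emits one N-Triples line. The function name is hypothetical, and it assumes that the info URI takes the form info:lccn/ followed by the normalized LCCN and that the same LCCN has a corresponding Permalink (i.e. it identifies a bibliographic record).

```python
# The owl:sameAs property, written out as a full URI for N-Triples.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(lccn):
    """Build an N-Triples statement asserting the two aliases co-refer.

    Hypothetical helper: assumes `lccn` is a normalized LCCN for which
    both an info URI and an LCCN Permalink exist.
    """
    info_uri = "info:lccn/" + lccn
    http_uri = "http://lccn.loc.gov/" + lccn
    return "<%s> <%s> <%s> ." % (info_uri, OWL_SAME_AS, http_uri)


triple = same_as_triple("2003556443")
print(triple)
```

Every consumer that merges data using both forms needs one such statement (or the equivalent logic) per record, which is the overhead I have in mind.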
Given the (commendable) strong commitment to persistence expressed by LoC for LCCN Permalinks, it seems to me that anyone using the URIs in the info URI "lccn" namespace could switch to citing the corresponding LCCN Permalink instead - though if only a proportion of the community makes the change, that still leaves services which work across the Web and which merge data from the two camps having to work with the two aliases.
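As a sketch of what a merging service might do in the meantime, the following Python snippet rewrites any info URI in the "lccn" namespace to its assumed Permalink equivalent before de-duplicating. The names are mine, the example.org URI is a placeholder, and the one-to-one mapping is an assumption: it holds only where the LCCN is normalized and identifies a record that the Permalink service covers.

```python
INFO_LCCN_PREFIX = "info:lccn/"
PERMALINK_BASE = "http://lccn.loc.gov/"


def to_permalink(uri):
    """Rewrite an info:lccn/ URI to the assumed LCCN Permalink form.

    Hypothetical helper: URIs outside the "lccn" namespace are
    returned unchanged.
    """
    if uri.startswith(INFO_LCCN_PREFIX):
        return PERMALINK_BASE + uri[len(INFO_LCCN_PREFIX):]
    return uri


# Citations gathered from two "camps", plus an unrelated placeholder URI.
cited = [
    "info:lccn/2003556443",
    "http://lccn.loc.gov/2003556443",
    "http://example.org/other",
]

# After rewriting, the two aliases collapse into a single identifier.
merged = sorted({to_permalink(u) for u in cited})
print(merged)
```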
Interestingly, the use of the http URI scheme in association with a domain which was supported by some organisational commitment is exactly the sort of suggestion made by several observers as a viable alternative to the info URI scheme when it was first being proposed. See for example a message by Patrick Stickler to the W3C URI and RDF Interest Group mailing lists (in October 2003!) which uses the LCCN case as an example.
Anyway, all in all, this is a very positive and exciting development. I look forward to the implementation of similar conventions using the http URI scheme by the owners of other info URI namespaces :-)