
March 04, 2008

LCCN Permalinks and the info URI scheme

Another post that has been on the back burner for a few days. Via catalogablog, I noticed recently that the Library of Congress announced the availability of what it calls LCCN Permalinks, a set of URIs using the http URI scheme which act as globally scoped identifiers for bibliographic records in the Library of Congress Online Catalog and for which the LoC makes a commitment of persistence.

I tend to think of two aspects of persistence, following the distinction that the Web Architecture makes between identification and interaction. From reading the FAQ, I think that persistence in the LCCN Permalink case covers both of these aspects. So the LoC commits to the persistence of the identifiers as names by, for example, keeping ownership of the domain name and managing the (human, organisational) processes for assigning URIs within that space so that, once assigned, a single URI will continue to identify the same record (i.e. they observe the WebArch principles of avoiding collisions). And they also commit to serving consistent representations of the resources identified by those URIs (i.e. they observe the WebArch principles of providing representations and doing so consistently and predictably over time).

So for example, the URI http://lccn.loc.gov/2003556443 is a persistent identifier of a metadata record describing an online exhibit called "1492: an ongoing voyage". And in addition, for each URI of this form, a further three URIs are coined to identify that same metadata record presented in different formats: http://lccn.loc.gov/2003556443/marcxml (MARCXML), http://lccn.loc.gov/2003556443/mods (MODS), http://lccn.loc.gov/2003556443/dc (SRW DC XML). So in terms of the Web Architecture, we have four distinct, but related, resources here. And indeed the fact that they are related is reflected in the hypertext links in the HTML document served as a representation of the first resource, along the lines of the TAG finding, On Linking Alternative Representations To Enable Discovery And Publishing. It would be even nicer if that HTML document indicated the nature of the "generic resource"-"specific resource" relationship between those resources. But, really, it would be churlish to complain! :-) We now have a set of URIs which have the (attractive) characteristics that, first, they serve as globally scoped persistent names and, second, they are amenable to lookup using a widely used network protocol which is supported by tools on my desktop and by libraries for every common programming platform. Good stuff.
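To make the pattern concrete, here is a minimal Python sketch of how the four related URIs can be derived from a single LCCN. The function name and the dictionary keys are my own, chosen for illustration; only the URI patterns themselves come from the examples above.

```python
# Base of the LCCN Permalink URI space, as seen in the examples above.
PERMALINK_BASE = "http://lccn.loc.gov/"

# Path suffixes for the three format-specific resources mentioned above.
FORMAT_SUFFIXES = ("marcxml", "mods", "dc")


def permalink_uris(lccn):
    """Return the generic permalink plus one URI per format-specific resource.

    The helper and its 'generic' key are hypothetical names for illustration.
    """
    generic = PERMALINK_BASE + lccn
    uris = {"generic": generic}
    for suffix in FORMAT_SUFFIXES:
        uris[suffix] = generic + "/" + suffix
    return uris


for name, uri in permalink_uris("2003556443").items():
    print(name, uri)
```

The point of the sketch is only that the four URIs are distinct names that share a predictable structure; nothing here looks anything up over the network.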

However, it is interesting to note that this - or at least the first aspect, the provision of persistent names - was the intent behind the provision of the "lccn" namespace within the info URI scheme. According to the entry for the "lccn" namespace in the info URI registry:

The LCCN namespace consists of identifiers, one corresponding to every assigned LCCN (Library of Congress Control Number). Any LCCN may have various forms which all normalize to a single canonical form; only normalized values are included in the LCCN namespace.

An LCCN is an identifier assigned by the Library of Congress for a metadata record (e.g., bibliographic record, authority record).

Compare (from the first two questions of the LCCN Permalink FAQ)

1. What are LCCN Permalinks?

LCCN Permalinks are persistent URLs for bibliographic records in the Library of Congress Online Catalog. These links are constructed using the record's LCCN (or Library of Congress Control Number), an identifier assigned by the Library of Congress to bibliographic and authority records.

2. How can I use LCCN Permalinks?

LCCN Permalinks offer an easy way to cite and link to bibliographic records in the Library of Congress Online Catalog. You can use an LCCN Permalink anywhere you need to reference an LC bibliographic record in emails, blogs, databases, web pages, digital files, etc.
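The registry entry's remark that LCCNs "normalize to a single canonical form" can be sketched in a few lines of Python. The steps below follow my understanding of the normalization procedure published for the "lccn" namespace (strip blanks, drop anything from a forward slash onwards, remove a hyphen and zero-pad the digits after it to six places); treat it as an approximation rather than the authoritative algorithm. The function names are my own.

```python
import re


def normalize_lccn(raw):
    """Normalize an LCCN string (a sketch of the procedure as I understand it)."""
    # Step 1: remove all blanks.
    value = raw.replace(" ", "")
    # Step 2: drop a forward slash and everything to its right.
    value = value.split("/", 1)[0]
    # Step 3: remove a hyphen, left-padding the part after it to six digits.
    if "-" in value:
        prefix, serial = value.split("-", 1)
        if not re.fullmatch(r"\d{1,6}", serial):
            raise ValueError("unexpected serial part in LCCN: %r" % raw)
        value = prefix + serial.zfill(6)
    return value


def info_uri(raw):
    """Name-only identifier in the info URI 'lccn' namespace."""
    return "info:lccn/" + normalize_lccn(raw)


def permalink(raw):
    """Corresponding LCCN Permalink in the http URI scheme."""
    return "http://lccn.loc.gov/" + normalize_lccn(raw)


print(info_uri("2003556443"))
print(permalink("2003556443"))
```

Both URIs are built from the same normalized value, which is what makes the two sets of identifiers parallel.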

The issue with URIs in the info: URI scheme, of course, is that while they provide globally scoped, persistent names, the info URI scheme is not mapped to a network protocol to enable the lookup of those names. I understand that for info URIs, "per-namespace methods may exist as declared by the relevant Namespace Authorities", but "[a]pplications wishing to tap into this functionalitiesy (sic) must consult the INFO Registry on a per-namespace basis." (both quotes from the info URI scheme FAQ.)

The creation of LCCN Permalinks seems to endorse Berners-Lee's basic principle (that I mentioned in my post on Linked Data) that it is helpful for the users/consumers of a URI not only to have a globally-scoped name, but also to be able to look up those names - using an almost ubiquitous network protocol - and obtain some useful information. LoC have supplemented the use of a URI scheme that only supported the former with the use of a scheme which facilitates both the former and the latter. And with a recent post by Stu Weibel in mind, I'd just add that (a) the use of an http URI does not constitute an absolute requirement that the owner also serve representations - the http URIs I coin can be used quite effectively as names alone without my ever configuring my HTTP server to provide representations for those URIs (and if the LoC HTTP server disappears, an LCCN Permalink still works as a name); and (b) the serving of representations for http URIs is not - in principle, at least - limited to the use of the HTTP protocol (see "Serve using any protocol" in the draft finding of the W3C TAG URI Schemes and Web Protocols).

Further, the persistence in LCCN Permalinks is a consequence of LoC's policy commitment to ensuring that persistence (in both aspects I outlined above): it is primarily a socio-economic, organisational consideration, not a technical one, and that applies regardless of the URI scheme chosen.

Indeed, it seems to me the creation of LCCN Permalinks suggests that there wasn't really much of a requirement for the creation of the "lccn" info URI namespace. And the co-existence of these two sets of URIs now means that consumers are faced with managing the use of two parallel sets of global identifiers - two sets provided by the same agency - for a single set of resources (i.e. URI aliases). Certainly, this can be managed, using, e.g. the capability provided by the owl:sameAs property to state that two URIs identify the same resource. But it does seem to me that it adds an avoidable overhead, with - in this case - little (no?) appreciable benefit. (Compare the case that I mentioned, also in the post on Linked Data, of URI aliases provided by different agencies, where the use of two URIs enables the provision of different descriptions of a single resource, and so does bring something additional to the table.)
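As a rough illustration of that overhead, here is a small Python sketch which, assuming (as I do above) that the two URIs really do identify the same record, emits an owl:sameAs statement in N-Triples syntax and rewrites one alias to the other before data is merged. The helper names are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
INFO_PREFIX = "info:lccn/"
HTTP_PREFIX = "http://lccn.loc.gov/"


def same_as_triple(lccn):
    """N-Triples statement asserting that the two aliases identify one resource."""
    return "<%s%s> <%s> <%s%s> ." % (INFO_PREFIX, lccn, OWL_SAME_AS, HTTP_PREFIX, lccn)


def to_permalink(uri):
    """Rewrite an info:lccn alias to its http counterpart; leave other URIs alone."""
    if uri.startswith(INFO_PREFIX):
        return HTTP_PREFIX + uri[len(INFO_PREFIX):]
    return uri


print(same_as_triple("2003556443"))
print(to_permalink("info:lccn/2003556443"))
```

Every consumer that merges data from both camps needs either the explicit statements or an equivalent rewriting step, which is the avoidable cost I have in mind.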

Given the (commendable) strong commitment to persistence expressed by LoC for LCCN Permalinks, it seems to me that anyone using the URIs in the info URI "lccn" namespace could switch to citing the corresponding LCCN Permalink instead - though if only a proportion of the community makes the change, that still leaves services which work across the Web and which merge data from the two camps having to work with the two aliases.

Interestingly, the use of the http URI scheme in association with a domain which was supported by some organisational commitment is exactly the sort of suggestion made by several observers as a viable alternative to the info URI scheme when it was first being proposed. See for example a message by Patrick Stickler to the W3C URI and RDF Interest Group mailing lists (in October 2003!) which uses the LCCN case as an example.

Anyway, all in all, this is a very positive and exciting development. I look forward to the implementation of similar conventions using the http URI scheme by the owners of other info URI namespaces :-)

Comments

Can you clarify exactly what is identified by these URIs?

Seems to me it's "an HTML document containing links to various metadata records about some (unspecified) resource".

Where is the persistent URI for the resource itself?

Hi Mikael,

I think LCCNs identify metadata records. I agree an identifier for the resource described by the record would be good (and probably of more widespread usefulness) but I don't think that's what LCCNs provide.

What if LCCNs actually identify books? Or rather 'manifestations' in FRBR ('editions' in vernacular)---that is, the set of all books that are more or less the exact same book (with "more or less" being defined through community consensus, of course).

What are the implications if we just start assuming that they DO? Not just identify a metadata record, but in fact identify a manifestation (which that metadata record describes). I certainly already have software that acts as if this is the case. To be sure, they are not 'canonical' identifiers for that manifestation, in the sense that there may indeed be many other identifiers from various other systems identifying that same manifestation--and in some cases (largely mistakes, if they exist, I'd think) there may even be more than one LCCN identifying that same manifestation.

So what happens if we say that LCCNs DO identify 'the resource the record represents'? What makes you think they do not? What are the negative consequences of assuming/saying/acting-as-if/declaring that they do? Like I said, I do already have software that acts as if this is so (since we are rather identifier-starved in this domain, I'll take what I can get). It works.

Jonathan,

Apologies, in my reply to Mikael, I meant to say "LCCN Permalinks identify metadata records" (though my reading of http://www.loc.gov/marc/lccn.html is that that is true also for LCCNs, though they aren't URIs)

For LCCN Permalinks, I was going only on what the Library of Congress said in the LCCN Permalink FAQ:

- "LCCN Permalinks are persistent URLs for bibliographic records in the Library of Congress Online Catalog",
- "LCCN Permalinks offer an easy way to cite and link to bibliographic records in the Library of Congress Online Catalog",
- "You can use an LCCN Permalink anywhere you need to reference an LC bibliographic record".

I can't see any way to interpret those sentences as saying the URI identifies the thing described by the metadata record. (And the LCCN Permalink isn't used in the dc:identifier field of the SRW DC record, which I would have expected to see if it did identify the thing described.)

But they aren't my URIs and I could be wrong!

What would be the consequences of assuming they do identify the thing described by the metadata record?

Well, I guess the problem would arise if LoC uses the URIs to make assertions about the metadata record (type=metadata record, created last month by Fred Cataloguer), and then I use the same URI to make assertions about the thing described by the metadata record (type=online exhibition, created in 1996). If an application merges those two data sources, then there is confusion about what is identified, because I've created a URI collision.
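A toy Python sketch of that merge may help. The data is the hypothetical example above (Fred Cataloguer and so on), the property names are made up, and treating "more than one value for type" as the sign of a collision is a deliberate simplification.

```python
from collections import defaultdict


def merge(*sources):
    """Merge (subject, property, value) statements from several sources by subject URI."""
    merged = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for subject, prop, value in source:
            merged[subject][prop].add(value)
    return merged


uri = "http://lccn.loc.gov/2003556443"

# Hypothetical assertions by LoC about the metadata record.
loc_data = [(uri, "type", "metadata record"), (uri, "creator", "Fred Cataloguer")]

# Hypothetical assertions by me about the thing the record describes.
my_data = [(uri, "type", "online exhibition"), (uri, "created", "1996")]

merged = merge(loc_data, my_data)

# After merging, the single URI carries two incompatible types.
conflicting = sorted(merged[uri]["type"])
print(conflicting)
```

The application doing the merge has no way of telling which statements were meant for the record and which for the exhibition.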

I think the principle is that it's for the URI owner to say what their URI identifies (though, yes, I guess there is a social dimension to that and if enough people use it to identify something else then that does carry some weight too!)

I think one way forward which avoids the collision problem, but still builds on the availability of the LCCN Permalinks to stave off "starvation" would be to use the convention suggested by

http://thing-described-by.org/

i.e. accepting that the LCCN Permalink URI does identify the metadata record, then for the metadata record identified by the URI

http://lccn.loc.gov/2003556443

the thing described by the record is identified by the URI

http://thing-described-by.org?http://lccn.loc.gov/2003556443
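In code, the convention amounts to simple string prefixing. This is only a sketch with hypothetical helper names; the prefix is taken from the example URI above.

```python
TDB_PREFIX = "http://thing-described-by.org?"


def thing_described_by(record_uri):
    """URI for the thing described by the record that record_uri identifies."""
    return TDB_PREFIX + record_uri


def describing_record(thing_uri):
    """Recover the record URI from a thing-described-by URI."""
    if not thing_uri.startswith(TDB_PREFIX):
        raise ValueError("not a thing-described-by URI: %r" % thing_uri)
    return thing_uri[len(TDB_PREFIX):]


record = "http://lccn.loc.gov/2003556443"
thing = thing_described_by(record)
print(thing)
print(describing_record(thing))
```

Because the record and the thing it describes now have different URIs, assertions about one no longer collide with assertions about the other.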

The following comment is posted on behalf of Ray Denenberg, Library of Congress:

I want to address the issue raised about the use of the info: URI scheme for LCCNs.

The LCCN 2003556443 is an LC identifier for a bibliographic record. The identifier "info:lccn/2003556443" is a URI that identifies that bibliographic record. It is argued that the info: namespace, info:lccn, is redundant because the URI http://lccn.loc.gov/2003556443 also identifies the same bibliographic record. There are a few things to point out though.

1. It doesn't. http://lccn.loc.gov/2003556443 does not identify the bibliographic record, it identifies a particular representation of that record.

There are four representations available:

http://lccn.loc.gov/2003556443/marcxml : identifies the MARCXML representation.

http://lccn.loc.gov/2003556443/mods : identifies the MODS representation.

http://lccn.loc.gov/2003556443/dc : identifies the Dublin Core representation.

http://lccn.loc.gov/2003556443 : identifies the human-display representation.

"info:lccn/2003556443" does not identify any particular representation of the record. It is an abstract identifier for the record.

2. LCCN Permalinks are for LC database records only.
http://lccn.loc.gov/ is a valid URL only if the record identified by the LCCN is accessible via the LC Online Catalog. info:lccn/ is a valid URI for any LCCN.

As stated in the LC Permalink FAQ, there are specific classes of records for which LCCNs have been assigned that are not found in the LC Online Catalog but which can be accessed in other databases. Examples:

(A) Blocks of LCCNs are assigned by LC to CONSER serial records cataloged in OCLC's WorldCat and distributed by LC's Cataloging Distribution Service. These CONSER records can be accessed online through WorldCat and may also exist in the local catalogs of CONSER participants.

The record assigned the LCCN 2006243115, for example, was cataloged as a CONSER contribution by NLM. It does not reside in the LC Online Catalog, but can be seen with its LCCN identifier both in WorldCat's subscription-accessed catalog and in NLM's LocatorPlus: http://locatorplus.gov/cgi-bin/Pwebrecon.cgi?db=local&v4=1&Search_Arg=2006243115&Search_Code=GKEY&CNT=25 . Since LCCN 2006243115 is not accessible in the LC Online Catalog, http://lccn.loc.gov/2006243115 is not a valid URL and will not retrieve a representation of the bibliographic record identified by this LCCN. However, the URI info:lccn/2006243115 is a valid URI and does identify the bibliographic record. No claim is made by the existence of this URI that any particular representation of it can be retrieved, nor that a record exists at any particular location.

(B) "Preassigned card numbers" are a class of LCCNs assigned to records for prepublication books by American publishers who are not part of the Library's Cataloging-in-Publication (CIP) program. These LCCNs are usually printed on publications, but the Library does not automatically create records for these items. The LCCN, however, is generally included in cataloging records created by other institutions.

LCCN 2006900006, for example, was assigned as part of the Library's PCN program. You can find this number included in the item's bibliographic record found in the catalog of the University of North Carolina: http://webcat.lib.unc.edu/search?/i2006900006/i2006900006/1%2C1%2C1%2CB/marc&FF=i2006900006

http://lccn.loc.gov/2006900006 is not a valid URL, but info:lccn/2006900006 identifies the conceptual bibliographic record, although no claim is made by the existence of this URI that such a record has been created. And no specific location is distinguished by the URI, for example if the record exists somewhere in addition to UNC.

3. There are LCCNs for older records that currently exist only as paper catalog cards, which are intended to be added to the LC Online Catalog at some future, undetermined date. For these, similarly, network-retrievability is a moot point. Yet the info URI still identifies them.

The Library of Congress began LCCN assignment in 1898. There are, therefore, several thousand older records, which currently exist only as printed catalog cards and have not yet been converted to machine-readable form. For these records, network-retrievability will be moot for an undetermined period of time. Yet the info:lccn URI still identifies these records.

For example, LCCN 75905704 (Autobiography of a Marathi journalist) is a record created by the Library in 1975. (The paper catalog card may, at some future date, be converted to an online record in the LC Online Catalog.) The URI info:lccn/75905704 identifies the current paper record. No claim is made by this URI as to the existence of a machine-readable representation.

Ray Denenberg
