The Nature of OAI, identifiers and linked data
Records are available in two flavours - simple Dublin Core (as mandated by the protocol) and Prism Aggregator Message (PAM), a format that Nature also use to enhance their RSS feeds. (Thanks to Scott Wilson and TicTocs for the Jopml listing).
Taking a quick look at their simple DC records (example) and their PAM records (example) I can't help but think that they've made a mistake in placing a doi: URI rather than an http: URI in the dc:identifier field.
Why does this matter?
Imagine you are a common-or-garden OAI aggregator. You visit the Nature OAI-PMH interface and you request some records. You don't understand the PAM format so you ask for simple DC. So far, so good. You harvest the requested records. Wanting to present a clickable link to your end-users, you look to the dc:identifier field only to find a doi: URI:
If you understand the doi: URI scheme you are fine because you'll know how to convert it to something useful:
But if not, you are scuppered! You'll just have to present the doi: URI to the end-user and let them work it out for themselves :-(
Much better for Nature to put the http: URI form in dc:identifier. That way, any software that doesn't understand DOIs can simply present the http: URI as a clickable link (just like any other URL). Any software that does understand DOIs, and that desperately wants to work with the doi: URI form, can do the conversion for itself trivially.
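The conversion mentioned above can be sketched in a few lines of Python. This is only an illustration, not anything Nature or the DOI system prescribes. The function name `doi_to_http` and the example DOI `10.1000/xyz123` are made up for the purpose. It assumes that the non-HTTP forms in question are `doi:` and `info:doi/`, and that `http://dx.doi.org/` is the resolver you want to link through.

```python
# Assumption: dx.doi.org is the resolver to link through; swap in another if preferred.
RESOLVER = "http://dx.doi.org/"

# Assumption: these are the non-HTTP DOI forms an aggregator is likely to meet.
NON_HTTP_PREFIXES = ("doi:", "info:doi/")


def doi_to_http(identifier: str, resolver: str = RESOLVER) -> str:
    """Return an http: URI for a doi: or info:doi/ identifier.

    Anything else (including an identifier that is already an http: URI)
    is passed through unchanged. No percent-encoding is attempted, so DOIs
    containing characters that are unsafe in URIs would need extra care.
    """
    lowered = identifier.lower()
    for prefix in NON_HTTP_PREFIXES:
        if lowered.startswith(prefix):
            # Strip the scheme-specific prefix and append the bare DOI to the resolver.
            return resolver + identifier[len(prefix):]
    return identifier


print(doi_to_http("doi:10.1000/xyz123"))  # http://dx.doi.org/10.1000/xyz123
```

The point is that going from the doi: form to the http: form requires knowing about DOIs in the first place, whereas software that is handed the http: form needs no special knowledge at all.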
Of course, Nature could simply repeat the dc:identifier field and offer both the http: URI form and the doi: URI form side-by-side. Unfortunately, this would run counter to the W3C recommendation not to mint multiple URIs for the same resource (section 2.3.1 of the Architecture of the World Wide Web):
A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource.
On balance I see no value (indeed, I see some harm) in surfacing the non-HTTP forms of DOI:
both of which appear in the PAM record (somewhat redundantly?).
The http: URI form
is sufficient. There is no technical reason why it should be perceived as a second-class form of the identifier (e.g. on persistence grounds).
I'm not suggesting that Nature gives up its use of DOIs - far from it. I'm suggesting only that they present a single, useful and usable variant of each DOI, i.e. the http: URI form, whenever they surface them on the Web, rather than providing a mix of the three different forms currently in use.
This would be very much in line with recommended good practice for linked data:
- Use URIs as names for things.
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information.
- Include links to other URIs, so that they can discover more things.