Responding to my previous post, Freedom, Google-juice and institutional mandates, Chris Rusbridge used one of his Ariadne articles as an illustrative example.
By way of, err... reward, I want to take a quick look (in what I'm going to broadly call 'usability' terms) at the way in which that article is handled by the Edinburgh Research Archive (ERA). Note that I'm treating the ERA as an example here - I don't consider it to be significantly different to other institutional repositories and, on that basis, I assume that most of what I am going to say will also apply to other repository implementations.
Much of this is basic Web 101 stuff...
The original Ariadne article is at http://www.ariadne.ac.uk/issue46/rusbridge/ - an HTML document containing embedded links to related material in the References section (internally linked from the relevant passage in the text). The version deposited into ERA is a 9-page PDF snapshot of the original article. I assume that PDF has been used for preservation reasons, though I'm not sure. The hypertext links in the original HTML continue to work in the PDF version.
So far, so good. I would naturally tend to assume that the HTML version is more machine-readable than the PDF version and, on that basis, 'better', though I admit that I can't provide solid evidence to back up that statement.
The repository 'jump-off' page for the article is at http://www.era.lib.ed.ac.uk/handle/1842/1476 though the page itself tells us (in a human-readable way) that we should use http://hdl.handle.net/1842/1476 for citation purposes.
So we already have 4 URLs for this article (the original Ariadne page, the PDF file, the jump-off page and the handle) and no explicit machine-readable knowledge that they all identify the same resource. Further, the URLs that 15 years of using a Web browser lead me to use most naturally (those of the jump-off page, the original Ariadne article or the PDF file) are not the one that the page asks me to use for citation purposes. So, in Web usability terms, I would most naturally bookmark (e.g. using del.icio.us) the wrong URL for this article. And where different scholars bookmark different URLs, services like del.icio.us are unlikely to be able to tell that they refer to the same thing (recent experience of Google Scholar notwithstanding).
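What might "explicit machine-readable knowledge" look like? One widely used convention is a `<link rel="canonical">` element in the page head. The sketch below assumes (hypothetically; ERA does not do this) that the jump-off page declares the handle URL this way, and shows how a bookmarking service could then collapse different URLs onto one key. The `JUMP_OFF_HEAD` markup and the `canonical_url` helper are illustrative inventions.

```python
from html.parser import HTMLParser

# Hypothetical <head> for the jump-off page, declaring the citation URL.
JUMP_OFF_HEAD = """<head>
<link rel="canonical" href="http://hdl.handle.net/1842/1476">
</head>"""


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel can hold several space-separated values, so split it
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attrs.get("href")


def canonical_url(page_url, html_text):
    """Return the declared canonical URL, falling back to the page's own URL."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonical or page_url


# A bookmarking service could group bookmarks by canonical URL.
bookmarks = {}
for url in ["http://www.era.lib.ed.ac.uk/handle/1842/1476",
            "http://hdl.handle.net/1842/1476"]:
    # Both URLs serve the same jump-off page, so both yield the same key.
    key = canonical_url(url, JUMP_OFF_HEAD)
    bookmarks.setdefault(key, []).append(url)
```

With the declaration in place, both URLs end up under a single `http://hdl.handle.net/1842/1476` entry. Without it, the service has nothing to go on but the raw URLs.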
OK, so now let's look more closely at the jump-off page...
Firstly, what is the page title (as contained in the HTML <title> tag)? Something useful like "Excuse Me... Some Digital Preservation Fallacies?". No, it's "Edinburgh Research Archive : Item 1842/1476". Nice!? Again, if I bookmark this page in del.icio.us, that is the label that is going to appear next to the URL, unless I manually edit it.
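Generating a better title costs almost nothing, because the repository already knows the item's title. Here is a minimal sketch. The `page_title` function is hypothetical, and putting the item title first is my own choice: it means the useful part survives when a bookmark list or browser tab truncates the label.

```python
import html


def page_title(item_title, repository="Edinburgh Research Archive"):
    """Build a <title> element that leads with the item's own title."""
    # Escape both parts so characters like & or < cannot break the markup
    return "<title>{} : {}</title>".format(
        html.escape(item_title), html.escape(repository))


print(page_title("Excuse Me... Some Digital Preservation Fallacies?"))
```

This prints `<title>Excuse Me... Some Digital Preservation Fallacies? : Edinburgh Research Archive</title>`, which is the label del.icio.us would then pick up.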
Secondly, what other metadata and/or micro-formats are embedded into this page? All that nice rich Dublin Core metadata that is buried away inside the repository? Nah. Nothing. A big fat zilch. Not even any <meta name="keywords" ...> stuff. I mean, come on. The information is there on the page right in front of me... it's just not been marked up using even the most basic of HTML tags. Most university Web site managers would get shot for turning out this kind of rubbish HTML.
Note that I'm not asking for embedded Dublin Core metadata here - I'm asking for useful information to be embedded in useful (and machine-readable) ways where there are widely accepted conventions for how to do that.
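As a rough illustration, the sketch below turns metadata the repository already holds into plain HTML `<meta>` elements (author, description, keywords). These are ordinary HTML conventions, not Dublin Core. The `record` dict is a hypothetical stand-in for the repository's internal record, and the keywords in the example are made up for illustration.

```python
import html


def head_meta(record):
    """Render basic <meta> elements from a (hypothetical) item record dict."""
    lines = []
    if record.get("authors"):
        lines.append('<meta name="author" content="{}">'.format(
            html.escape("; ".join(record["authors"]))))
    if record.get("abstract"):
        lines.append('<meta name="description" content="{}">'.format(
            html.escape(record["abstract"])))
    if record.get("keywords"):
        lines.append('<meta name="keywords" content="{}">'.format(
            html.escape(", ".join(record["keywords"]))))
    # Fields that are missing from the record are simply skipped
    return "\n".join(lines)


# Illustrative record: the keywords here are invented, not taken from ERA.
print(head_meta({"authors": ["Rusbridge, Chris"],
                 "keywords": ["digital preservation", "fallacies"]}))
```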
So, let's look at those human-readable keywords again. Why aren't they hyperlinked to all other entries in ERA that use those keywords (in the way that Flickr and most other systems do with tags)? Yes, the institutional repository architecture means that we'd only get to see other stuff in ERA, which isn't all that useful, I'll grant you, but it would be better than nothing.
Similarly, what about linking the author's name to all other entries by that author? Ditto with the publisher's name. Let's encourage a bit of browsing here, shall we? This is supposed to be about resource discovery after all!
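Mechanically, this is just a matter of wrapping each displayed value in a link to a browse query. In the sketch below, `BROWSE_BASE` and its `type`/`value` query parameters are assumptions made up for illustration. I don't know ERA's actual browse URL scheme, and it may well differ.

```python
import html
from urllib.parse import urlencode

# Hypothetical browse endpoint; ERA's real URL scheme may differ.
BROWSE_BASE = "http://www.era.lib.ed.ac.uk/browse"


def browse_link(field, value):
    """Link a displayed value (keyword, author, publisher) to all items sharing it."""
    # urlencode percent-encodes the value; html.escape then turns & into &amp;
    href = BROWSE_BASE + "?" + urlencode({"type": field, "value": value})
    return '<a href="{}">{}</a>'.format(html.escape(href), html.escape(value))


print(browse_link("author", "Rusbridge, Chris"))
```

The same function would serve keywords (`browse_link("subject", ...)`) and publishers, so every value on the page becomes a way into related material.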
So finally, let's look at the links on the page. There at the bottom is a link labelled 'View/Open', which takes me to the PDF file - phew, the thing I'm actually looking for! Not the most obvious spot on the page, but I got there in the end. Unfortunately, I assume that every single other item in ERA uses exactly the same link text for its PDF (or other format) files. Link text is supposed to indicate what is being linked to - it's a kind of metadata, for goodness' sake.
And then, right at the bottom of the page, there's a button marked "Show full item record". I have no idea what that is but I'll click on it anyway. Oh, it's what other services call "Additional information". But why use an HTML form button to hide a plain old hypertext link? Strange or what?
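Both link problems are easy to fix when the page is generated. The sketch below builds link text from what the repository knows about the file, and renders the full record as an ordinary hyperlink instead of a form button. The function names, the example PDF URL and the `?mode=full` query string for the full-record view are all assumptions for illustration.

```python
import html


def bitstream_link(href, item_title, fmt, pages=None):
    """Link text that says what is being linked to, not just 'View/Open'."""
    detail = fmt if pages is None else "{}, {} pages".format(fmt, pages)
    text = "{} ({})".format(item_title, detail)
    return '<a href="{}">{}</a>'.format(html.escape(href), html.escape(text))


def full_record_link(item_url):
    """A plain hypertext link to the full record instead of a form button."""
    # '?mode=full' is an assumed query string for the full-record view
    return '<a href="{}">Full item record</a>'.format(
        html.escape(item_url + "?mode=full"))


# The PDF URL below is a placeholder, not the real ERA file location.
print(bitstream_link("http://example.org/fallacies.pdf",
                     "Excuse Me... Some Digital Preservation Fallacies?",
                     "PDF", pages=9))
print(full_record_link("http://www.era.lib.ed.ac.uk/handle/1842/1476"))
```

A plain link can be bookmarked, opened in a new tab and followed by crawlers. None of that is true of a form button.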
OK, I apologise... I've lapsed into sarcasm for effect. But the fact remains that repository jump-off pages are potentially some of the most important Web pages exposed by universities (this is core research business after all) yet they are nearly always some of the worst examples of HTML to be found on the academic Web. I can draw no other conclusion than that the Web is seen as tangential in this space.
I've taken 10 minutes to look at these pages... I don't doubt that there are issues that I've missed. Clearly, if one took time to look around at different repositories one would find examples that were both better and worse (I'm honestly not picking on ERA here... it just happened to come to hand). But in general, this stuff is atrocious - we can and should do better.