Some (more) thoughts on repositories
I attended a meeting of the JISC Repositories and Preservation Advisory Group (RPAG) in London a couple of weeks ago. Part of my reason for attending was to respond (semi-formally) to the proposals being put forward by Rachel Heery in her update to the original Repositories Roadmap that we jointly authored back in April 2006.
It would be unfair (and inappropriate) for me to share any of the detail in my comments since the update isn't yet public (and I suppose may never be made so). So other than saying that I think that, generally speaking, the update is a step in the right direction, what I want to do here is rehearse the points I made which are applicable to the repositories landscape as I see it more generally. To be honest, I only had 5 minutes in which to make my comments in the meeting, so there wasn't a lot of room for detail in any case!
Broadly speaking, I think three points are worth making. (With the exception of the first, these will come as no surprise to regular readers of this blog.)
Metrics
There may well be some disagreement about this but it seems to me that the collection of material we are trying to put into institutional repositories of scholarly research publications is a reasonably well understood and measurable corpus. It strikes me as odd therefore that the metrics we tend to use to measure progress in this space are very general and uninformative: numbers of institutions with a repository, for example, or numbers of papers with full text. We set targets for ourselves like, "a high percentage of newly published UK scholarly output [will be] made available on an open access basis" (a direct quote from the original roadmap). We don't set targets like, "80% of newly published UK peer-reviewed research papers will be made available on an open access basis" - a more useful and concrete objective.
As a result, we have little or no real way of knowing if we are actually making significant progress towards our goals. We get a vague feel for what is happening but it is difficult to determine if we are really succeeding.
Clearly, I am ignoring learning object repositories and repositories of research data here because those areas are significantly harder, probably impossible, to measure in percentage terms. In passing, I suggest that the issues around learning object repositories, certainly the softer issues like what motivates people to deposit, are so totally different from those around research repositories that it makes no sense to consider them in the same space anyway.
Even if the total number of published UK peer-reviewed research papers is indeed hard to determine, it seems to me that we ought to be able to reach some kind of suitable agreement about how we would estimate it for the purposes of repository metrics. Or we could base our measurements on some agreed sub-set of all scholarly output - the peer-reviewed research papers submitted to the current RAE (or forthcoming REF) for example.
A glass half empty view of the world says that by giving ourselves concrete objectives we are setting ourselves up for failure. Maybe... though I prefer the glass half full view that we are setting ourselves up for success. Whatever... failure isn't really failure - it's just a convenient way of partitioning off those activities that aren't worth pursuing (for whatever reason) so that other things can be focused on more fully. Without concrete metrics it is much harder to make those kinds of decisions.
The other issue around metrics is that if the goal is open access (which I think it is), as opposed to full repositories (which are just a means to an end), then our metrics should be couched in terms of that goal. (Note that, for me at least, open access implies both good management and long-term preservation, and that repositories are only one way of achieving that.)
The bottom-line question is, "what does success in the repository space actually look like?". My worry is that we are scared of the answers. Perhaps the real problem here is that 'failure' isn't an option?
Executive summary: our success metrics around research publications should be based on a percentage of the newly published peer-reviewed literature (or some suitable subset thereof) being made available on an open access basis (irrespective of how that is achieved).
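To make the arithmetic concrete, here is a rough sketch of how such a percentage might be calculated once an agreed list of newly published papers exists. Everything in it is an assumption for illustration: the function name, the identifiers and the two lists are hypothetical, and the 80% figure is simply the example target quoted above.

```python
def open_access_share(published_ids, open_access_ids):
    """Return the percentage of the agreed corpus that is openly available."""
    published = set(published_ids)
    if not published:
        raise ValueError("the agreed corpus of published papers is empty")
    # Only count open access copies of papers that are in the agreed corpus.
    available = published & set(open_access_ids)
    return 100.0 * len(available) / len(published)


# Hypothetical identifiers, for illustration only.
published = ["paper-1", "paper-2", "paper-3", "paper-4", "paper-5"]
open_access = ["paper-2", "paper-4", "paper-5", "paper-9"]

share = open_access_share(published, open_access)
target = 80.0  # the example target quoted above, not an agreed figure
print(f"{share:.0f}% open access; target met: {share >= target}")
```

Note that the calculation says nothing about where the open access copy lives, which is the point: the metric is about the literature being available, irrespective of how that is achieved.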
Emphasis on individuals
Across the board we are seeing a growing emphasis on the individual, on user-centricity and on personalisation (in its widest sense). Personal Learning Environments, Personal Research Environments and the suite of 'open stack' standards around OpenID are good examples of this trend. Yet in the repository space we still tend to focus most on institutional wants and needs. I've characterised this in the past in terms of us needing to acknowledge and play to the real-world social networks adopted by researchers. As long as our emphasis remains on the institution we are unlikely to bring much change to individual research practice.
Executive summary: we need to put the needs of individuals before the needs of institutions in terms of how we think about reaching open access nirvana.
Fit with the Web
I have written and spoken a lot about this in the past and don't want to simply rehash old arguments. That said, I think three things are worth emphasising:
Global discipline-based repositories are more successful at attracting content than institutional repositories. I can say that with only minimal fear of contradiction because our metrics are so poor - see above :-). This is no surprise. It's exactly what I'd expect to see. Successful services on the Web tend to be globally concentrated (as that term is defined by Lorcan Dempsey) because social networks tend not to follow regional or organisational boundaries any more.
Executive summary: we need to work out how to take advantage of global concentration more fully in the repository space.
Take three guiding documents - the Web Architecture itself, REST, and the principles of linked data. Apply liberally to the content you have at hand - repository content in our case. Sit back and relax.
Executive summary: we need to treat repositories more like Web sites and less like repositories.
On the Web, the discovery of textual material is based on full-text indexing and link analysis. In repositories, it is based on metadata and pre-Web forms of citation. One approach works, the other doesn't. (Hint: I no longer believe in metadata as it is currently used in repositories). Why the difference? Because repositories of research publications are library-centric and the library world is paper-centric - oh, and there's the minor issue of a few hundred years of inertia to overcome. That's the only explanation I can give anyway. (And yes, since you ask... I was part of the recent movement that got us into this mess!).
Executive summary: we need to 1) make sure that repository content is exposed to mainstream Web search engines in Web-friendly formats and 2) make academic citation more Web-friendly so that people can discover repository content using everyday tools like Google.
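As one possible illustration of the first of those, a repository could publish a sitemap listing the Web pages for its papers, so that search engine crawlers can find them. This is only a sketch of one approach, not something prescribed here: the sitemap format itself is the public sitemaps.org protocol, but the function name, the example.org URLs and the dates are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(items):
    """Build a sitemap XML string from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in items:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical repository pages, for illustration only.
items = [
    ("https://repository.example.org/papers/1", "2008-11-01"),
    ("https://repository.example.org/papers/2", "2008-11-15"),
]

sitemap_xml = build_sitemap(items)
print(sitemap_xml)
```

A sitemap only helps crawlers find the pages; whether the content behind those URLs is in a Web-friendly format is a separate question.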
Simple huh?! No, thought not...
I realise that most of what I say above has been written (by me) on previous occasions in this blog. I also strongly suspect that variants of this blog entry will continue to appear here for some time to come.