
October 25, 2006

The R word

I've said this before and I'll probably say it again... in the context of the general move towards making scholarly research papers available on an open access basis, repositories are just a means to an end.  They are one way of making such material available on an open access basis, but they are not the only way.  A repository is just a content management system by another name - though, admittedly, one where the content of interest is a collection of scholarly papers.

The important point, at least as far as open access is concerned, is not that such papers are deposited into a repository but that they are made freely available on the Web.

It surprises me therefore that the 'repository' word tends to feature quite prominently in semi-legalistic documents such as the position statements from the UK research councils and the new JISC/SURF model agreement for authors.  Take the model agreement (announced earlier today) as an example.  It says:

The Author retains all other rights with respect to the Article not granted to the Publisher and in particular he can exercise the following rights:


To upload the Article or to grant to the Author’s own institution (or another appropriate organisation) the authorisation to upload the Article, immediately from the date of publication of the journal in which the Article is published (unless that the Author and the Publisher have agreed in writing to a short embargo period, with a maximum of six (6) months):
a) onto the institution’s closed network (e.g. intranet system); and/or
b) onto publicly accessible institutional and/or centrally organised repositories (such as PubMed Central and other PubMed Central International repositories), provided that a link is inserted to the Article on the Publisher’s website.

Why does this document refer specifically to the publisher's 'website' and the institution's 'repository'?  It could just as easily have referred to the publisher's 'content management system' and the institution's 'website'.  Who cares?  It's not as if there's a legal definition of what a repository is anyway.

What the author needs is the right to make their research freely available on the public Web, not specifically the right to deposit it into a repository.

Note that this is not to take issue with the overall thrust of the model agreement, which in general I agree with.  What worries me is whether we are fixated, in a somewhat unhealthy way, on the word 'repository'.




And what's to stop it disappearing from the open Web as quickly as it appeared there?

I don't think focusing on intelligently-curated CMSes (which is essentially what an IR is) is a bad thing. I mean, you can make a print book "freely available" by leaving it on a park bench in the rain, but how much does that help, even in the short term?

As for definitions and suchlike, you probably want to take a look at the NARA/RLG draft trusted-repository audit document, if you haven't already.

The thing that stops stuff "disappearing from the open Web as quickly as it appeared there" is the management of that content. And, yes, right now, repositories in the form we typically know them are a good way of managing that content.

But let's suppose that in five years' time we see an alternative, better way of managing our research output - let's say that (very hypothetically!) peer-to-peer technologies are on the rise and have developed sufficiently to support the proper management of academic research materials. At that point it may not make much sense to have a licence that specifically refers to 'deposit in a repository'.

I'm not particularly arguing against the current use of repositories. I'm suggesting that in terms of policy it is better to focus on the end rather than the means.

I agree with Andy. I said something similar in a different context a few weeks back:


The systems we call Repositories bundle together tools that enable me to:

- manage the assignment of URIs to resources within some collection, so that those URIs have some predictable degree of persistence (in the sense that a single URI isn't going to be assigned to one resource today, and to a different resource in three weeks' time or two years' time)
- manage the process of serving consistent representations of those resources
- manage the process of disseminating descriptions of those resources

(And probably some other functions too.)
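To make the point that these are functions rather than a product, here is a minimal sketch of the three of them in a few lines of Python. Everything in it (the `ManagedCollection` class, its method names, the `example.org` base URI) is hypothetical and invented for illustration; it is not how any particular repository system works, and it says nothing about the organisational commitment that actually makes URIs persist.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedCollection:
    """Hypothetical stand-in for 'the functions a repository bundles'."""

    base_uri: str
    # uri -> (content bytes, media type); never overwritten once assigned
    resources: dict = field(default_factory=dict, init=False, repr=False)
    # uri -> simple descriptive record
    descriptions: dict = field(default_factory=dict, init=False, repr=False)

    def assign_uri(self, local_id, content, media_type, title):
        """Function 1: assign a URI, and refuse ever to reassign it."""
        uri = f"{self.base_uri.rstrip('/')}/{local_id}"
        if uri in self.resources:
            raise ValueError(f"{uri} is already assigned and will not be reused")
        self.resources[uri] = (content, media_type)
        self.descriptions[uri] = {"uri": uri, "title": title, "format": media_type}
        return uri

    def represent(self, uri):
        """Function 2: serve the same representation for the same URI."""
        return self.resources[uri]

    def describe_all(self):
        """Function 3: disseminate descriptions of the resources."""
        return [self.descriptions[uri] for uri in sorted(self.descriptions)]


collection = ManagedCollection("http://example.org/papers/")
paper_uri = collection.assign_uri(
    "r-word", b"<p>The R word</p>", "text/html", "The R word"
)
```

Nothing here depends on the thing being called a Repository; the same three functions could sit behind a CMS, a hand-edited Web page, or something not yet invented.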

I'm not arguing that having the systems we call Repositories to help me perform those functions is not useful - far from it! - but what matters is the functions, and the commitment of an agency to delivering those functions over time, not the fact that an agency happens to be using a system called a Repository to perform them.

In a previous life, I argued that the list of my publications that I maintained (as an XHTML page which I edited using a text editor), together with the set of digital objects hyperlinked from that page (for which I assigned URIs in that subset of my employer's URI-space which was delegated to me) and an RSS feed which I generated from the HTML, was, functionally, my personal "repository".
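For anyone wondering how little machinery that involves, the sketch below shows one way an RSS 2.0 feed might be derived from a well-formed XHTML list of publications. It is an illustration only, not the script I actually used: the function name `rss_from_xhtml`, the sample page and the `example.org` URIs are all made up, and it assumes the page parses as XML and that every link uses an absolute URI.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical hand-edited publications page (well-formed XHTML).
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Publications</title></head>
  <body>
    <ul>
      <li><a href="http://example.org/~me/papers/one.pdf">Paper one</a></li>
      <li><a href="http://example.org/~me/papers/two.pdf">Paper two</a></li>
    </ul>
  </body>
</html>"""


def rss_from_xhtml(xhtml_text, feed_title, feed_link):
    """Build an RSS 2.0 document with one item per hyperlink in the page."""
    page = ET.fromstring(xhtml_text)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Items linked from {feed_link}"
    for anchor in page.iter(f"{XHTML_NS}a"):
        href = anchor.get("href")
        if not href:
            continue  # skip named anchors that do not link anywhere
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = "".join(anchor.itertext()).strip()
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


feed_xml = rss_from_xhtml(
    SAMPLE_PAGE, "My publications", "http://example.org/~me/pubs.html"
)
```

The page remains the single thing that gets edited by hand; the feed is just another description of the same set of resources.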

And there are plenty of URIs that were assigned in the past by systems we call Repositories for which I receive HTTP 404 status codes if I attempt to de-reference them now, because the providers abandoned their use of those systems and made no ongoing commitment to those URIs independently of the particular technological system which generated them.
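Checking this for a list of URIs takes very little code. The following is a rough sketch using Python's standard `urllib`; the function names `dereference_status` and `broken_uris` are my own invention, and it assumes the server answers HEAD requests sensibly, which not every server does.

```python
import urllib.error
import urllib.request


def dereference_status(uri, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status code received when de-referencing `uri`.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is what we are after
        return err.code


def broken_uris(uris, status_of=dereference_status):
    """Return the URIs that now answer with 404 Not Found."""
    return [uri for uri in uris if status_of(uri) == 404]
```

A 200 today is of course no promise about next year; that part comes from the provider's commitment, not from the check.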

Is the set of documents made available by W3C on its Web site "in a Repository"? Or are they "on the W3C Web site"? How do I know? Does it matter? I have no idea what tools are used to drive the W3C Web site today, and/or how those tools might change in the future. It is the policy statement of organisational commitment at


(and my experience of using W3C-owned URIs over a period of time, I suppose!) which encourages me to trust that those URIs are assigned in a managed way and that representations will be made available consistently over some reasonable period of time, not the particular set of technical mechanisms that are used at any point in time to implement that policy.

Okay, I see where you're coming from now, and I agree -- authors should be free to Do The Right Thing with their own content, whatever that means at the time. Let me try to explain where this weird language may have arisen.

Publishers eyed the creation of disciplinary repositories with dread -- and institutional repositories with disinterest. They banked on scholars' identification with their disciplines over their institutions. As the proprietor of an IR myself, I can't say they were wrong!

As for placement on the open Web, well, that just wasn't on for a lot of them.

So this over-specific language arose because it was the best librarians thought we could get away with, not because it was ideal. With some of the movement e.g. in MIT and the UCal system toward general author rights-retention, perhaps the over-specific language can loosen up a bit; I don't know.

But it didn't come about because of a fixation on IRs, exactly.

Whilst information practitioners and system developers do tend to focus on ‘products’ such as repository software systems, it is the functions and services that are important. I have sometimes portrayed repositories as a ‘set of policies’, drawing on Cliff Lynch's definition of repositories, where he emphasises the significance of ‘services’ rather than a particular software product or type of content. Lynch: 'a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.' (Lynch, 2003 http://www.arl.org/newsltr/226/ir.html).

Just as the ‘library’ implies and includes a range of business functions, so the ‘repository’ is a quite useful focus for a range of information management issues both within and beyond institutions.




eFoundations is powered by TypePad