February 26, 2008

Preserving the ABC of scholarly communication

Somewhat belatedly, I've been re-reading Lorcan Dempsey's post from October last year, Quotes of the day (and other days?): persistent academic discourse, in which he ponders the role of academic blogs in scholarly discourse and the apparent lack of engagement by institutions in thinking about their preservation.

I like Grainne Conole's characterisation of the place of blogging in scholarly communication:

  • Academic paper: reporting of findings against a particular narrative, grounded in the literature and related work; style – formal, academic-speak
  • Conference presentation: awareness raising of the work, posing questions and issues about the work, style – entertaining, visual, informal
  • Blogging – snippets of the work, reflecting on particular issues, style – short, informal, reflective

(even though it would have been better in alphabetical order! :-) ) and I'm tempted to wonder whether and how this characterisation will change over the next few years, as blogging continues to grow in importance as a communication medium.

Lorcan ends with:

Universities and university libraries are recognizing that they have some responsibility to the curation of the intellectual outputs of their academics and students. So far, this has not generally extended to thinking about blogs. What, if anything, should the Open University or Harvard be doing to make sure that this valuable discourse is available to future readers as part of the scholarly record?

As I argued in my most recent post about repositories, I suspect that most academics would currently expect to host their blogs outside their institution. (Note that I'm hypothesising here, since I haven't asked any real academics this question. However, the breadth and depth of external blog services seem so overwhelming that it would be hard for institutions to try to compel their academics to use an institutional blogging service, IMHO.) This leaves institutions (or anyone else for that matter) that want to curate the blogging component of their intellectual output with a problem. Somehow, they have to aggregate their part of the externally held scholarly record into an internal form, such that they can curate it.

I don't see this as an impossible task - though clearly, there is a challenge here in terms of both technology and policy.

In the context of the debate about institutional repositories, my personal opinion is that this situation waters down the argument that repositories have to be institutional because that is the only way in which the scholarly record can be preserved.  Sorry, I just don't buy it.


I don't get what is so hard about pulling the blogs down from various external services. All nicely packaged (APP), unlike Learning Objects, which are all over the place. Clean APIs on blog services, and a time- or event-based pull of the item from the blog service down to a versioned IR? The only stumbling block seems to be AuthN/AuthZ, with lecturers who launch a new blog and have no desire to let the institution know. So maybe policy issues? I'm sure lecturers would like to have a back-up copy in case Wordpress is bought up by AOL and clean-erased? Who else will do the preserving (ergo future surfacing of scholarly content) other than the IR?
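
To make the "pull into a versioned IR" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the blog is assumed to expose a standard Atom feed, the feed text has already been fetched by some scheduled job, and the "repository" is just an in-memory dictionary. The names parse_entries, archive and SAMPLE_FEED are hypothetical and not part of any blog service or repository API. It also only reads plain-text or escaped-HTML content; inline XHTML content would need extra handling.

```python
import hashlib
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 XML namespace

# Hypothetical feed, standing in for what a scheduled pull might fetch.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>An academic blog</title>
  <entry>
    <id>tag:example.org,2008:post-1</id>
    <updated>2008-02-26T10:00:00Z</updated>
    <title>Preserving blogs</title>
    <content type="text">First draft</content>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Extract id, updated, title and content from each Atom entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        entries.append({
            "id": entry.findtext(ATOM + "id", default=""),
            "updated": entry.findtext(ATOM + "updated", default=""),
            "title": entry.findtext(ATOM + "title", default=""),
            "content": entry.findtext(ATOM + "content", default=""),
        })
    return entries


def archive(store, entries):
    """Append a new version of an entry only when its text has changed.

    `store` maps entry id -> list of versions (oldest first).
    Returns the number of versions added in this run.
    """
    added = 0
    for e in entries:
        text = e["title"] + "\n" + e["content"]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        versions = store.setdefault(e["id"], [])
        if not versions or versions[-1]["digest"] != digest:
            versions.append({**e, "digest": digest})
            added += 1
    return added


if __name__ == "__main__":
    store = {}
    print(archive(store, parse_entries(SAMPLE_FEED)))  # first pull
    print(archive(store, parse_entries(SAMPLE_FEED)))  # unchanged re-pull
```

Comparing a content hash rather than trusting the feed's updated timestamp is a design choice in this sketch, so that a re-pull of unchanged posts adds nothing. None of this touches the harder AuthN/AuthZ and policy questions.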

Who has made the argument you cite, Mr. Powell, and in what context? I smell a potential straw man. I run an IR, and I am completely baffled by the idea that I am the sole preserver of the digital scholarly record.

That said, I may be uniquely situated, in some contexts. I suspect you may have caught wind of a rights dilemma surrounding disciplinary repositories: namely, that some publishers permit IR but not DR deposit. A frustrating situation, but hardly cause for me to ignore or scorn the work of my DR colleagues.

As for blogs, I've looked into the question, and one of the nastier problems is one of third-party designs, which are copyrightable independently of blog content. I once approached SixApart with regard to an academic blog hosted on TypePad. They affirmed to me that yes, they owned the design, and no, they would not permit its archival. As I didn't own the blog, I didn't have any obvious way to bypass this restriction, except to ask the blog owner to switch designs, and you can imagine how well that would have gone over.

I hope that the Atom Publishing Protocol plus SWORD, a little elbow grease, and a blog design intentionally given to the public domain (which I would be happy to work on myself!) could help solve this problem.

I would not argue that IRs are the only (or best) way that the digital scholarly record of academic papers can be preserved over the long term. Is anyone proposing this? I think initiatives such as JSTOR are much better positioned to do that, with one of its aims being "to preserve a record of scholarship for posterity." National libraries, OCLC, and co-operative ventures such as LOCKSS are also taking a role. In my view IRs are meeting the short to medium term requirements of authors, institutions and end users, i.e. readers.

I would be interested to know more about the requirement for preservation of conference presentations and blogs. Is there a requirement? OK there might be long term interest in the output of particular individuals, and in a sample of what is being produced ... but surely the bulk of output in these categories is only of interest in the short term??

Re: do repositories have a preservation role? Yes, I think that some are arguing for it. For example, in response to my previous post, Chris Rusbridge said, "I am not sure this is quite enough. My friends in Informatics ask why they should bother putting their papers in a repository when they already have them on their web sites. The difference is permanence." Perhaps I am misunderstanding?

My personal view is that we would be better off separating our functional requirements around open access and scholarly communication from those around preservation. The solutions may be different IMHO.

Re: why curate blog output? If we are seeing anecdotal evidence of the growing importance of blogging to scholarly communication (which I think we are), then it seems to me that we need to recognise a similar growth in the importance of preserving that material as part of the scholarly record. I don't see how one can simply dismiss it as having only short term value.

I'm in 100% agreement that blogging is maturing into a method of scholarly communication but it remains one of those ephemeral item types that, for all the reasons mentioned above, are hard to curate.

But why not tie up published scholarly communication, data and informal output like blogs via the institution if it's possible? If we're not talking about institutional collection and preservation (for this, or any other item), what are we talking about?

a: Subject based curation? Because Learned Societies and subject based organisations, across disciplines, have the funds, capacity and inclination to do this?

b: Third party curation? Are blog hosts or publishers going to do this? If so, will we be buying back access to the blogs produced by the scholars at our institutions - sound familiar?

c: Individuals linking up and preserving their work? I think this relates to the work that NCUACS do cataloguing the journals and papers of scientists, usually posthumously - which tells you it's unlikely to be the author curating their own work...

I'd say that most academics understand institutional requirements - they may not like them, but you can't please everyone. If we need to curate blog output, and figure out how to do this technically, I don't see that connecting it to the body of research via the institution is wrong.

Just to clarify, which I obviously didn't manage to do first or second time around(!), I have no particular problem with the institution being the home of preservation activity - which is not to say that it might not happen elsewhere as well or instead.

One of the problems with repositories (as I see it) is that the 'surfacing content on the Web' function and the 'preservation' function have tended to become conflated in a single service - the repository! We need to unbundle those two functions completely and then ask how each function is best performed.

The 'surfacing content on the Web' function is the short term imperative for open access. Given that this function is (IMHO) tightly coupled with the social networking aspects of scholarly communication, I'm arguing that we should therefore ask ourselves whether the institution is the best home for it.

Frankly, I don't much care where preservation takes place but as Rachel Heery argued, there are certainly possibilities outside the institution.

