October 14, 2009

Open, social and linked - what do current Web trends tell us about the future of digital libraries?

About a month ago I travelled to Trento in Italy to speak at a Workshop on Advanced Technologies for Digital Libraries organised by the EU-funded CACOA project.

My talk was entitled "Open, social and linked - what do current Web trends tell us about the future of digital libraries?" and I've been holding off blogging about it or sharing my slides because I was hoping to create a slidecast of them. Well... I finally got round to it and here is the result:

Like any 'live' talk, there are bits where I don't get my point across quite as I would have liked but I've left things exactly as they came out when I recorded it. I particularly like my use of "these are all very bog standard... err... standards"! :-)

Towards the end, I refer to David White's 'visitors vs. residents' stuff, about which I note he has just published a video. Nice one.

Anyway... the talk captures a number of threads that I've been thinking and speaking about for the last while. I hope it is of interest.

September 16, 2009

Edinburgh publish guidance on research data management

The University of Edinburgh has published some local guidance about the way that research data should be managed, Research data management guidance, covering How to manage research data and Data sharing and preservation, as well as detailing local training, support and advice options.

One assumes that this kind of thing will become much more common at universities over the next few years.

Having had a very quick look, it feels like the material is more descriptive than prescriptive - which isn't meant as a negative comment, it just reflects the current state of play. The section on Data documentation & metadata, for example, gives advice as simple as:

Have you created a "readme.txt" file to describe the contents of files in a folder? Such a simple act can be invaluable at a later date.

but also provides a link to the UK Data Archive's guidance on Data Documentation and Metadata, which at first sight appears hugely complex. I'm not sure what your average researcher will make of it.
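To make the "readme.txt" advice quoted above concrete, here is a minimal Python sketch of one way a researcher might generate such a file for a folder. It is purely illustrative and not part of the Edinburgh guidance: the function names (`build_readme`, `write_readme`) and the one-line-per-file layout are assumptions.

```python
from pathlib import Path


def build_readme(folder: Path, descriptions: dict[str, str]) -> str:
    """Return readme text listing each file in `folder` with a short description."""
    lines = [f"Contents of {folder.name}", ""]
    for path in sorted(folder.iterdir()):
        # Skip sub-folders and the readme itself.
        if path.is_file() and path.name != "readme.txt":
            note = descriptions.get(path.name, "(no description yet)")
            lines.append(f"{path.name}: {note}")
    return "\n".join(lines) + "\n"


def write_readme(folder: Path, descriptions: dict[str, str]) -> Path:
    """Write readme.txt into `folder` (hypothetical layout) and return its path."""
    target = folder / "readme.txt"
    target.write_text(build_readme(folder, descriptions), encoding="utf-8")
    return target
```

Calling `write_readme(Path("my-data"), {"survey.csv": "raw survey responses"})` would list every file in the (hypothetical) `my-data` folder, flagging any that still lack a description.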

(In passing, I note that the UKDA seem to be promoting the use of the Data Documentation Initiative standard at what they call the 'catalogue' level, a standard that I've not come across before but one that appears to be rooted firmly outside the world of linked data, which is a shame.)

Similarly, the section on Methods for data sharing lists a wide range of possible options (from "posting on a University website" thru to "depositing in a data repository") without being particularly prescriptive about which is better and why.

(As a second aside, I am continually amazed by this firm distinction in the repository world between 'posting on the website' and 'depositing in a repository' - from the perspective of the researcher, both can, and should, achieve the same aims, i.e. improved management, more chance of persistence and better exposure.)

As we have found with repositories of research publications, it seems to me that research data repositories (the Edinburgh DataShare in this case) need to hide much of this kind of complexity, and do most of the necessary legwork, in order to turn what appears to be a simple and obvious 'content management' workflow (from the point of view of the individual researcher) into a well managed, openly shared, long term resource for the community.

August 20, 2009

What researchers think about data preservation and access

There's an interesting report in the current issue of Ariadne by Neil Beagrie, Robert Beagrie and Ian Rowlands, Research Data Preservation and Access: The Views of Researchers, fleshing out some of the data behind the UKRDS Report, which I blogged about a while back.

I have a minor quibble with the way the data has been presented in the report, in that it's not overly clear how the 179 respondents represented in Figure 1 have been split across the three broad areas (Sciences, Social Sciences, and Arts and Humanities) that appear in subsequent figures. One is left wondering how significant the number of responses in each of the 3 areas was?  I would have preferred to see Figure 1 organised in such a way that the 'departments and faculties' were grouped more obviously into the broad areas.

That aside, I think the report is well worth reading.  I'll just highlight what the authors perceive to be the emerging themes:

  • It is clear that different disciplines have different requirements and approaches to research data.
  • Current provision of facilities to encourage and ensure that researchers have data stores where they can deposit their valuable data for safe-keeping and for sharing, as appropriate, varies from discipline to discipline.
  • Local data management and preservation activity is very important with most data being held locally.
  • Expectations about the rate of increase in research data generated indicate not only higher data volumes but also an increase in different types of data and data generated by disciplines that have not until recently been producing volumes of digital output.
  • Significant gaps and areas of need remain to be addressed.

The Findings of the Scoping Study and Research Data Management Workshop (undertaken at the University of Oxford and part of the work that informed the Ariadne article) provides an indication of the "top requirements for services to help [researchers] manage data more effectively":

  • Advice on practical issues related to managing data across their life cycle. This help would range from assistance in producing a data management/sharing plan; advice on best formats for data creation and options for storing and sharing data securely; to guidance on publishing and preserving these research data.
  • A secure and user-friendly solution that allows storage of large volume of data and sharing of these in a controlled fashion way allowing fine grain access control mechanisms.
  • A sustainable infrastructure that allows publication and long-term preservation of research data for those disciplines not currently served by domain specific services such as the UK Data Archive, NERC Data Centres, European Bioinformatics Institute and others.
  • Funding that could help address some of the departmental challenges to manage the research data that are being produced.

Pretty high level stuff so nothing particularly surprising there. It seems to me that some work drilling down into each of these areas might be quite useful.

June 04, 2009

Open government data

A piece in today's Guardian, UK set to follow successful US data method, took me to a request for feedback by Richard Stirling on the Cabinet Office blog on the issues around creating the UK equivalent of the US data.gov service.

The blog entry is a few days old so apologies if you've already seen it, but one advantage of coming late to the discussion is that there is an interesting thread of comments on the post, in particular around the value/complexity of RDF vs. other data encoding formats.

Worth a look, and worth taking the time to comment if you can. I think this is a useful and interesting development and something to be encouraged.

April 07, 2009

OKCon 2009

While I probably do spend longer than is healthy in front of a PC on a typical weekend, I have to admit to a fairly high level of resistance to attending "work-related" events at weekends, especially if travel is involved. My Saturdays are for friends, footy, films, & music, possibly accompanied by beer, ideally in some combination.

But (in the absence of any proper football) I temporarily suspended the SafFFFM rule the weekend before last and attended the Open Knowledge Conference, held at UCL. The programme was a mix of themed presentation sessions and an "Open Spaces" session based on contributions from attendees.

The morning session featured three presentations from people working in the development/aid sector. Mark Charmer talked about AKVO, and its mission to facilitate connections between funders and projects in the area of water and sanitation, and to streamline reporting by projects (through support for submissions of updates by SMS). Vinay Gupta described the use of wiki technology to build Appropedia, a collection of articles on "appropriate technology" and related aid/development issues, including project histories and detailed "how-to"-style information. The third session was a collaboration between Karin Christiansen, on the Publish What You Fund campaign to promote greater access to information about aid, and Simon Parrish on the work of Aidinfo to develop standards for the sharing of such information.

One recurring theme in these presentations was that of valuable information - from records of practical project experience "on the ground" to records of funding by global agencies - being "locked away" from, or at least only partially accessible to, the parties who would most benefit from it. The other fascinating (to me, at least) element was the emphasis on the growing ubiquity of mobile technology: while I'm accustomed to this in the UK, I was still quite taken aback by the claim (I think, by Mark) that in the near future there will be large sections of the world's population who have access to a mobile phone, but not to a toilet.

The main part of the day was dedicated to the "Open Spaces" session of short presentations. Initially, IIRC, these had been programmed as two parallel sessions in which the speakers were allocated 10 minutes each. On the day, the decision was taken to merge them into a single session with (nearly 20, I think?) speakers delivering very short "lightning" talks. We were offered the opportunity to vote on this, I hasten to add, and at the time avoiding missing out on contributions had seemed like a Good Idea, if time permitted. But with hindsight, I'm not sure it was the right choice: it led to a situation in which speakers had to deliver their content in less time than they had anticipated (and some adjusted better than others), there was little time for discussion, and the pace and diversity of the contributions, some slightly technical, but mostly focusing more on social/cultural aspects, did make it rather difficult for me to identify common threads.

The next slot was dedicated to the relationship between Open Data and Linked Data and the Semantic Web, with short, largely non-technical, presentations by Tom Scott of the BBC, Jeni Tennison, and Leigh Dodds of Talis. Maybe it was just because I was familiar with the topic, but it felt to me that this part of the day worked well, and the cohesive theme enabled speakers to build on each other's contributions.

I thought Tom's presentation of the BBC's work on linked data was one of the best I've seen on that topic: he managed to cover a range of technical topics in very accessible terms, all in fifteen minutes. (I see Tom has posted his slides and notes on his weblog.) Jeni described her work with RDFa on the London Gazette. Leigh pursued an aquatic metaphor for RDF - triple as recombinant molecule - and semantic web applications, and also announced the launch of a Talis data hosting scheme which they are calling the Talis Connected Commons, under which public domain datasets of up to 50 million triples can be hosted for free on the Talis Platform. (I noticed this also got an enthusiastic write-up on Read Write Web.)

Although I quite enjoyed the linked data talks, it's probably true to say that - Leigh's announcement aside - they didn't really introduce me to anything I didn't know already - but there again, I probably wasn't the primary target audience.

The day ended with a presentation by David Bollier, author of Viral Spiral, on the "sharing economy". Unfortunately, things were over-running slightly at that point, and I only caught the first few minutes before I had to leave for my train home - which was a pity as I think that session probably did consolidate some of the issues related to business models which had been touched on in some of the short talks.

Overall, I suppose I came away feeling the event might have benefited from a slightly tighter focus, maybe building around the content of the two themed sessions. Having said that, I recognise that the call for contributions had been explicitly very "open", and the event did attract a very mixed audience, many probably with quite different expectations from my own! :-)

March 20, 2009

Unlocking Audio

I spent the first couple of days this week at the British Library in London, attending the Unlocking Audio 2 conference.  I was there primarily to give an invited talk on the second day.

You might notice that I didn't have a great deal to say about audio, other than to note that what strikes me as interesting about the newer ways in which I listen to music online (specifically Blip.fm and Spotify) is that they are both highly social (almost playful) in their approach and that they are very much of the Web (as opposed to just being 'on' the Web).

What do I mean by that last phrase?  Essentially, it's about an attitude.  It's about seeing being mashed as a virtue.  It's about an expectation that your content, URLs and APIs will be picked up by other people and re-used in ways you could never have foreseen.  Or, as Charles Leadbeater put it on the first day of the conference, it's about "being an ingredient".

I went on to talk about the JISC Information Environment (which is surprisingly(?) not that far off its 10th birthday if you count from the initiation of the DNER), using it as an example of digital library thinking more generally and suggesting where I think we have parted company with the mainstream Web (in a generally "not good" way).  I noted that while digital library folks can discuss identifiers forever (if you let them!) we generally don't think a great deal about identity.  And even where we do think about it, the approach is primarily one of, "who are you and what are you allowed to access?", whereas on the social Web identity is at least as much about, "this is me, this is who I know, and this is what I have contributed". 

I think that is a very significant difference - it's a fundamentally different world-view - and it underpins one critical aspect of the difference between, say, Shibboleth and OpenID.  In digital libraries we haven't tended to focus on the social activity that needs to grow around our content and (as I've said in the past) our institutional approach to repositories is a classic example of how this causes 'social networking' issues with our solutions.

I stole a lot of the ideas for this talk, not least Lorcan Dempsey's use of concentration and diffusion.  As an aside... on the first day of the conference, Charles Leadbeater introduced a beach analogy for the 'media' industries, suggesting that in the past the beach was full of a small number of large boulders and that everything had to happen through those.  What the social Web has done is to make the beach into a place where we can all throw our pebbles.  I quite like this analogy.  My one concern is that many of us do our pebble throwing in the context of large, highly concentrated services like Flickr, YouTube, Google and so on.  There are still boulders - just different ones?  Anyway... I ended with Dave White's notions of visitors vs. residents, suggesting that in the cultural heritage sector we have traditionally focused on building services for visitors but that we need to focus more on residents from now on.  I admit that I don't quite know what this means in practice... but it certainly feels to me like the right direction of travel.

I concluded by offering my thoughts on how I would approach something like the JISC IE if I was asked to do so again now.  My gut feeling is that I would try to stay much more mainstream and focus firmly on the basics, by which I mean adopting the principles of linked data (about which there is now a TED talk by Tim Berners-Lee), cool URIs and REST and focusing much more firmly on the social aspects of the environment (OpenID, OAuth, and so on).

Prior to giving my talk I attended a session about iTunesU and how it is being implemented at the University of Oxford.  I confess a strong dislike of iTunes (and iTunesU by implication) and it worries me that so many UK universities are seeing it as an appropriate way forward.  Yes, it has a lot of concentration (and the benefits that come from that) but its diffusion capabilities are very limited (i.e. it's a very closed system), resulting in the need to build parallel Web interfaces to the same content.  That feels very messy to me.  That said, it was an interesting session with more potential for debate than time allowed.  If nothing else, the adoption of systems about which people can get religious serves to get people talking/arguing.

Overall then, I thought it was an interesting conference.  I suspect that my contribution wasn't liked by everyone there - but I hope it added usefully to the debate.  My live-blogging notes from the two days are here and here.

March 05, 2009

A National Research Data Service for the UK?

I attended the A National Research Data Service for the UK? meeting at the Royal Society in London last week and my live-blogged notes are available for those who want more detail.  Chris Rusbridge also blogged the day on the Digital Curation Blog - session 1, session 2, session 3 and session 4.  FWIW, I think that Chris's posts are more comprehensive and better than my live-blogged notes.

The day was both interesting and somewhat disappointing...

Interesting primarily because of the obvious political tension in the room (which I characterised on Twitter as a potential bun-fight between librarians and the rest but which in fact is probably better summed up as a lack of shared agreement around centralist (discipline-based) solutions vs. institutional solutions).

Disappointing because the day struck me more as a way of presenting a done-deal than as a real opportunity for debate.

The other thing that I found annoying was the constant parroting of the view that "researchers want to share their data openly" as though this is an obvious position.  The uncomfortable fact is that even the UKRDS report's own figures suggest that less than half (43%) of those surveyed "expressed the need to access other researchers' data" - my assumption therefore is that the proportion currently willing to share their data openly will be much smaller.

Don't take this as a vote against open access, something that I'm very much in favour of.  But, as we've found with eprint archives, a top-down "thou shalt deposit because it is good for you" approach doesn't cut it with researchers - it doesn't result in cultural change.  Much better to look for, and actively support, those areas where open sharing of data occurs naturally within a community or discipline, thus demonstrating its value to others.

That said, a much more fundamental problem facing the provision of collaborative services to the research community is that funding happens nationally but research happens globally (or at least across geographic/funding boundaries) - institutions are largely irrelevant whichever way you look at it [except possibly as an agent of long term preservation - added 6 March 2009].  Resolving that tension seems paramount to me though I have no suggestions as to how it can be done.  It does strike me however that shared discipline-based services come closer to the realities of the research world than do institutional services.

February 10, 2009

Freedom, Google-juice and institutional mandates

[Note: This entry was originally posted on the 9th Feb 2009 but has been updated in light of comments.]

An interesting thread has emerged on the American Scientist Open Access Forum based on the assertion that in Germany "freedom of research forbids mandating on university level" (i.e. that a mandate to deposit all research papers in an institutional repository (IR) would not be possible legally).  Now, I'm not familiar with the background to this assertion and I don't understand the legal basis on which it is made.  But it did cause me to think about why there might be an issue related to academic freedom caused by IR deposit mandates by funders or other bodies.

In responding to the assertion, Bernard Rentier says:

No researcher would complain (and consider it an infringement upon his/ her academic freedom to publish) if we mandated them to deposit reprints at the local library. It would be just another duty like they have many others. It would not be terribly useful, needless to say, but it would not cause an uproar. Qualitatively, nothing changes. Quantitatively, readership explodes.

Quite right. Except that the Web isn't like a library so the analogy isn't a good one.

If we ignore the rarefied, and largely useless, world of resource discovery based on the OAI-PMH and instead consider the real world of full-text indexing, link analysis and, well... yes, Google then there is a direct and negative impact of mandating a particular place of deposit. For every additional place that a research paper surfaces on the Web there is a likely reduction in the Google-juice associated with each instance caused by an overall diffusion of inbound links.

So, for example, every researcher who would naturally choose to surface their paper on the Web in a location other than their IR (because they have a vibrant central (discipline-based) repository (CR) for example) but who is forced by mandate to deposit a second copy in their local IR will probably see a negative impact on the Google-juice associated with their chosen location.

Now, I wouldn't argue that this is an issue of academic freedom per se, and I agree with Bernard Rentier (earlier in his response) that the freedom to "decide where to publish is perfectly safe" (in the traditional academic sense of the word 'publish'). However, under any modern understanding of 'to publish' (i.e. one that includes 'making available on the Web'), there is a compromise going on here.

The problem is that we continue to think about repositories as if they were 'part of a library', rather than as a 'true part of the fabric of the Web', a mindset that encourages us to try (and fail) to redefine the way the Web works (through the introduction of things like the OAI-PMH for example) and that leads us to write mandates that use words like 'deposit in a repository' (often without even defining what is meant by 'repository') rather than 'make openly available on the Web'.

In doing so I think we do ourselves, and the long term future of open access, a disservice.

Addendum (10 Feb 2009): In light of the comments so far (see below) I confess that I stand partially corrected.  It is clear that Google is able to join together multiple copies of research papers.  I'd love to know the heuristics they use to do this and I'd love to know how successful those heuristics are in the general case.  Nonetheless, on the basis that they are doing it, and on the assumption that in doing so they also combine the Google juice associated with each copy, I accept that my "dispersion of Google-juice" argument above is somewhat weakened.

There are other considerations however, not least the fact that the Web Architecture explicitly argues against URI aliases:

Good practice: Avoiding URI aliases
A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource.

The reasons given align very closely to the ones I gave above, though couched in more generic language:

Although there are benefits (such as naming flexibility) to URI aliases, there are also costs. URI aliases are harmful when they divide the Web of related resources. A corollary of Metcalfe's Principle (the "network effect") is that the value of a given resource can be measured by the number and value of other resources in its network neighborhood, that is, the resources that link to it.

The problem with aliases is that if half of the neighborhood points to one URI for a given resource, and the other half points to a second, different URI for that same resource, the neighborhood is divided. Not only is the aliased resource undervalued because of this split, the entire neighborhood of resources loses value because of the missing second-order relationships that should have existed among the referring resources by virtue of their references to the aliased resource.
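The "divided neighbourhood" effect described in that passage can be illustrated with a toy link count. This is only a sketch: the URIs, the referring pages and the `inbound_counts` helper are all hypothetical, and it makes no claim about how Google or any other search engine actually combines copies.

```python
from collections import Counter

# Hypothetical inbound links: (referring page, URI it links to).
links = [
    ("blog-a", "http://repository.example.ac.uk/1234"),
    ("blog-b", "http://repository.example.ac.uk/1234"),
    ("wiki-c", "http://subject.example.org/abs/5678"),
    ("list-d", "http://subject.example.org/abs/5678"),
]

# Hypothetical alias table: both URIs identify the same paper.
canonical = {
    "http://repository.example.ac.uk/1234": "http://subject.example.org/abs/5678",
}


def inbound_counts(links, aliases=None):
    """Count inbound links per URI, optionally folding aliases into one URI."""
    aliases = aliases or {}
    return Counter(aliases.get(target, target) for _, target in links)


# Without alias knowledge, each URI is credited with only half the links.
split = inbound_counts(links)

# With the alias table, all four links are credited to a single URI.
merged = inbound_counts(links, canonical)
```

In the `split` case each copy appears to have two inbound links; only when something supplies the alias table does the paper get credit for all four, which is the join that a search engine would have to perform by heuristics.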

Now, I think that some of the discussions around linked data are pushing at the boundaries of this guidance, particularly in the area of non-information resources.  Nonetheless, I think this is an area in which we have to tread carefully.  I stand by my original statement that we do not treat scholarly papers as though they are part of the fabric of the Web - we do not link between them in the way we link between other Web pages.  In almost all respects we treat them as bits of paper that happen to have been digitised and the culprits are PDF, the OAI-PMH, an over-emphasis on preservation and a collective lack of imagination about the potential transformative effect of the Web on scholarly communication.  We are tinkering at the edges and the result is a mess.

February 06, 2009

Open orienteering

It seems to me that there is now quite a general acceptance of what the 'open access' movement is trying to achieve. I know that not everyone buys into that particular world-view but, for those of us that do, we know where we are headed and most of us will probably recognise it when we get there. Here, for example, is Yishay Mor writing to the open-science mailing list:

I would argue that there's a general principle to consider here. I hold that any data collected by public money should be made freely available to the public, for any use that contributes to the public good. Strikes me as a no-brainer, but of course - we have a long way to go.

A fairly straight-forward articulation of the open access position and a goal that I would thoroughly endorse.

The problem is that we don't always agree as a community about how best to get there.

I've been watching two debates flow past today, both showing some evidence of lack of consensus in the map reading department, though one much more long-standing than the other. Firstly, the old chestnut about the relative merits of central repositories vs. institutional repositories (initiated in part by Bernard Rentier's blog post, Institutional, thematic or centralised repositories?) but continued on various repository-related mailing lists (you know the ones!). Secondly, a newer debate about whether formal licences or community norms provide the best way to encourage the open sharing of research data by scientists and others, a debate which I tried to sum up in the following tweet:

@yishaym summary of open data debate... OD is good & needs to be encouraged - how best to do that? 1 licences (as per CC) or 2 social norms

It's great what can be done with 140 characters.

I'm more involved in the first than the second and therefore tend to feel more aggrieved at lack of what I consider to be sensible progress. In particular, I find the recurring refrain that we can join stuff back together using the OAI-PMH and therefore everything is going to be OK both tiresome and laughable.

If there's a problem here, and perhaps there isn't, then it is that the arguments and debates are taking place between people who ultimately want the same thing. I'm reminded of Monty Python's Life of Brian:

Brian: Excuse me. Are you the Judean People's Front?
Reg: Fuck off! We're the People's Front of Judea

It's like we all share the same religion but we disagree about which way to face while we are praying. Now, clearly, some level of debate is good. The point at which it becomes not good is when it blocks progress which is why, generally speaking, having made my repository-related architectural concerns known a while back, I try and resist the temptation to reiterate them too often.

Cameron Neylon has a nice summary of the licensing vs. norms debate on his blog. It's longer and more thoughtful than my tweet! This is a newer debate and I therefore feel more positive that it is able to go somewhere. My initial reaction was that a licensing approach is the most sensible way forward but having read through the discussion I'm no longer so sure.

So what's my point? I'm not sure really... but if I wake up in 4 years' time and the debate about licensing vs. norms is still raging, as has pretty much happened with the discussion around CRs vs. IRs, I'll be very disappointed.

January 30, 2009

Digital Britain - the future isn't open apparently

The Digital Britain Interim Report was released yesterday:

a plan to secure Britain’s place at the forefront of the global digital economy. The interim report contains more than 20 recommendations, including specific proposals on:

  • next generation networks
  • universal access to broadband
  • the creation of a second public service provider of scale
  • the modernisation of wireless radio spectrum holdings
  • a digital future for radio
  • a new deal for digital content rights
  • enhancing the digital delivery of public services

I haven't read the full report, much of which is about greater roll-out of broadband connectivity, but I have taken a look through section 3, entitled Digital Content [PDF], which is the part that interests me most.

Here's a Wordle of just that section:

[Image: Wordle of section 3, Digital Content, of the Digital Britain Interim Report]

And a few word counts:

  • open (1) (but not in the context of 'open content')
  • unlawful (12)
  • rights / rightsholders (37)
  • copyright (15)
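For anyone wanting to run a similar tally over a piece of text, a rough Python sketch follows. It is not necessarily how the counts above were produced: the `count_terms` helper is hypothetical, it matches whole words only (so 'rights' and 'rightsholders' are counted separately), and the sample sentence is made up rather than taken from the report.

```python
import re
from collections import Counter


def count_terms(text: str, terms: list[str]) -> dict[str, int]:
    """Count case-insensitive whole-word occurrences of each term in `text`."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {term: words[term] for term in terms}


# Invented sample text, standing in for the section being analysed.
sample = "Copyright law protects rights. Rightsholders want unlawful copying stopped."
counts = count_terms(sample, ["open", "unlawful", "rights", "rightsholders", "copyright"])
```

Note that a count like this says nothing about context, which is why the 'open' figure above needs its caveat.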

I'll leave you to draw your own conclusions.... suffice to say, I would have preferred to see at least some discussion about the benefits that open digital content can bring to the economy.

January 22, 2009

Why can't I find a library book in my search engine?

There's a story in today's Guardian, Why you can't find a library book in your search engine, (seen online but I assume that it is also in the paper version) covering the ongoing situation around the licensing of OCLC WorldCat catalog records.  Rob Styles provides some of the background to this, OCLC, Record Usage, Copyright, Contracts and the Law, though, as he notes, he works for Talis which is one of the commercial organisations that stands to benefit from a change in OCLC's approach.

I don't want to comment in too much detail on this story since I freely admit to not having properly done my homework, but I will note that my default position on this kind of issue is that we (yes, all of us) are better off in those cases where data is able to be made available on an 'open' rather than 'proprietary' basis and I think this view of the world definitely applies in this case.

The Guardian story is somewhat simplistic, IMHO, not on the question of 'open' vs. 'closed' but on how easy it would be for such data, assuming that it was to be made openly available, to get into search engines (by which I assume the article really means Google?) in a meaningful way.  Flooding the Web with multiple copies of metadata about multiple copies of books is non-trivial to get right (just think of the issues around sensibly assigning 'http' URIs to this kind of stuff for example) such that link counting, ranking of books vs. other Web resources, and providing access to appropriate copies can be done sensibly.  There has to be some point of 'concentration' (to use Lorcan Dempsey's term) around which such things can happen - whether that is provided by Google, Amazon, Open Library, OCLC, Talis, the Library of Congress or someone else.  Too many points of concentration and you have a problem... or so it seems to me.

December 18, 2008

JISC IE and e-Research Call briefing day

I attended the briefing day for the JISC's Information Environment and e-Research Call in London on Monday and my live-blogged notes are available on eFoundations LiveWire for anyone that is interested in my take on what was said.

Quite an interesting day overall but I was slightly surprised at the lack of name badges and a printed delegate list, especially given that this event brought together people from two previously separate areas of activity. Oh well, a delegate list is promised at some point.  I also sensed a certain lack of buzz around the event - I mean there's almost £11m being made available here, yet nobody seemed that excited about it, at least in comparison with the OER meeting held as part of the CETIS conference a few weeks back.  At that meeting there seemed to be a real sense that the money being made available was going to result in a real change of mindset within the community.  I accept that this is essentially second-phase money, building on top of what has gone before, but surely it should be generating a significant sense of momentum or something... shouldn't it?

A couple of people asked me why I was attending given that Eduserv isn't entitled to bid directly for this money and now that we're more commonly associated with giving grant money away rather than bidding for it ourselves.

The short answer is that this call is in an area that is of growing interest to Eduserv, not least because of the development effort we are putting into our new data centre capability.  It's also about us becoming better engaged with the community in this area.  So... what could we offer as part of a project team? Three things really: 

  • Firstly, we'd be very interested in talking to people about sustainable hosting models for services and content in the context of this call.
  • Secondly, software development effort, particularly around integration with Web 2.0 services.
  • Thirdly, significant expertise in both Semantic Web technologies (e.g. RDF, Dublin Core and ORE) and identity standards (e.g. Shibboleth and OpenID).

If you are interested in talking any of this thru further, please get in touch.

September 30, 2008

Open Science

Via Richard Akerman on Science Library Pad I note that a presentation made to a British Library Board awayday (on 23rd Sept), The Future of Research (Science and Technology), by Carole Goble is now available on Slideshare:

The presentation looks at the way in which scientific and technology-related research is changing, particularly thru the use of the Web to support open, data-driven research - essentially enabling a more immediate, transparent and repeatable approach to science.

The ideas around open science are interesting.  Coincidentally, a few Eduserv bods met with Cameron Neylon yesterday and he talked us thru some of the work going on around blog-driven open labbooks and the like.  Good stuff.  Whatever one thinks about the success or otherwise of institutional repositories as an agent of change in scholarly communication there seems little doubt that the 'open' movement is where things are headed because it is such a strong enabler of collaboration and communication.

Slide 24 of the presentation above introduces the notion that open "methods are scientific commodities".  Obvious really, but something I hadn't really thought about.  I note that there seem to be some potential overlaps here with the approaches to sharing pedagogy between lecturers/teachers enabled by standards such as Learning Design - "pedagogies as learning commodities" perhaps? - though I remain somewhat worried about how complex these kinds of things can get in terms of mark-up languages.

The presentation ends with some thoughts about the impact that this new user-centric (scientist-centric) world of personal research environments has on libraries:

  • We don’t come to the library, it comes to us.
  • We don’t use just one library or one source.
  • We don’t use just one tool!
  • Library services embedded in our toolkits, workbenches, browsers, authoring tools.

I find the closing scenario (slide 67) somewhat contrived:

Prior to leaving home Paul, a Manchester graduate student, syncs his iPhone with the latest papers, delivered overnight by the library via a news syndication feed. On the bus he reviews the stream, selecting a paper close to his interest in HIV-1 proteases. The data shows apparent anomalies with his own work, and the method, an automated script, looks suspect. Being on-line he notices that a colleague in Madrid has also discovered the same paper through a blog discussion and they Instant Message, annotating the results together. By the time the bus stops he has recomputed the results, proven the anomaly, made a rebuttal in the form of a pubcast to the Journal Editor, sent it to the journal and annotated the article with a comment and the pubcast. [Based on an original idea by Phil Bourne]

If nothing else, it is missing any reference to Twitter (see the MarsPhoenix Twitter feed for example) and Second Life! :-).  That said, there is no doubt that the times they are a'changing.

My advice?  You'd better start swimming or you'll sink like a stone :-)

July 18, 2008

Does metadata matter?

This is a 30 minute slidecast (using 130 slides), based on a seminar I gave to Eduserv staff yesterday lunchtime.  It tries to cover a broad sweep of history from library cataloguing, thru the Dublin Core, Web search engines, IEEE LOM, the Semantic Web, arXiv, institutional repositories and more.

It's not comprehensive - so it will probably be easy to pick holes in if you so choose - but how could it be in 30 minutes?!

The focus is ultimately on why Eduserv should be interested in 'metadata' (and surrounding areas), to a certain extent trying to justify why the Foundation continues to have a significant interest in this area.  To be honest, it's probably weakest in its conclusions about whether, or why, Eduserv should retain that interest in the context of the charitable services that we might offer to the higher education community.

Nonetheless, I hope it is of interest (and value) to people.  I'd be interested to know what you think.

As an aside, I found that the Slideshare slidecast editing facility was mostly pretty good (this is the first time I've used it), but that it seemed to struggle a little with the very large number of slides and the quickness of some of the transitions.

July 04, 2008

Cory Doctorow on open licences

The Guardian's Tech Weekly podcast from Wednesday this week contains a brief but interesting interview with Cory Doctorow (about 21 minutes into the podcast if you want to jump straight to it).  In it he talks about his 3 key reasons for adopting open licences for his books.  Speaking about the work he produces he says:

  • Firstly, artistically it doesn't seem like a plausible 21st century piece of art if it is not intended to be copied.  There's something anachronistic in doing otherwise - "it's like making horse shoes or something".
  • Secondly, morally we are not going to be able to stop people copying and remixing work anyway, and our attempts at doing so to date have resulted in horrible things happening like spying on people, kicking them off the Internet, or suing old ladies or very young people for all their money.  Further, like most of us, he was an avid copier when he was part of the "time rich, cash poor" demographic - "I never would have had a single romantic episode if it wasn't for the mix tape".  If he was 17 again he'd be copying and remixing stuff so it seems hypocritical to try to stop it happening to his own stuff.
  • Finally, financially the fundamental problem isn't piracy, it's obscurity.  The people who don't buy his books fail to do so because they've never heard of them, not because the books are openly available online.

He then goes on to talk about his desire to give practical help to those people who "get" the open access argument but need help in making it happen effectively.  And towards the end he touches on the issue of people illegally selling CC licenced Flickr images on eBay, which I blogged about a while back.

Worth a listen if you have time.

June 23, 2008

Creative borrowing?

Charles Arthur, What's the right way to talk about copyright stuff? (Guardian Technology, June 23 2008), asks, "How do you describe it when someone sells a photograph they don't have a right to?".

He is referring to the fact that "some people who had put photos on Flickr under a Creative Commons non-commercial licence found that they were being sold on eBay by someone who was claiming the rights to them".

Simple question?

Apparently not - and, judging by the responses to the article, using words like 'theft' and 'steal' clearly rubs some people up the wrong way... "The photographs in question simply are not being stolen. They're being copied. No thieves in existence there, but copiers. Illegal copiers I'm sure...".

I have to confess that when I used to talk to my own children about illegally downloading music on the Web I tended to use the analogy of shoplifting, on the basis that each time they downloaded a track they would be denying a shop (and the artist) a sale.  So what was their typical reaction?  Basically, they thought I was completely mad (and they probably weren't wrong - either in the specific or general case!).

As ParkyDR says in a comment on the article: "Nothing has been taken, the original owner still has the photo, in this case even copying it was ok (CC licenced), the license was broken when the photo was sold".

Well, yes...  but it still feels a lot like theft in many ways?  As Nickminers says, "if you make money from a photo that was taken by somebody else, you have effectively stolen the money that the copyright holder should have earned from the sale".

Dvdhldn suggests using the phrase "copyright theft" as a compromise between "theft" and "illegal copying" which sounds reasonable to me.  Whatever... the key point here is that I don't think many people would disagree that selling someone else's CC-BY-NC images without permission is wrong.  The issue is only with what words we should use to label the activity.

So here's a less clear cut scenario...  Brian Kelly tweeted the other day about a new competitor service to Slideshare called AuthorSTREAM.  The new service looks interesting and offers some functionality not currently present in Slideshare, though I have to say I feel slightly uncomfortable about how far the new service has gone to make itself look and feel like the original.

But the service itself isn't the issue.  Brian also noticed that someone had taken copies of a large number of old UKOLN Powerpoint presentations and uploaded them to the AuthorSTREAM site.  I took a look for my own presentations and sure enough, a few of those uploaded were mine.

Hmmm... that's a little annoying.  Or is it?  No, perhaps not - there's no attempt at passing these presentations off as being by someone else so perhaps it is just good visibility.  On the other hand, I know of at least one case where the continued availability of old, technically out of date, material on the Web does more harm than good and I'd prefer to be in control of when I publish my own crap thank you very much.

So, it's not clear cut by any means...

I also noticed that one of my more recent presentations has been made available, uploaded by someone called 'Breezy' and labeled on AuthorSTREAM as andy powell presentation, though the original on Slideshare is called The Repository Roadmap - are we heading in the right direction?  This is more frustrating in a way.  The presentation is already available on the Web in a very accessible form and someone else uploading it to a different service just waters down the Google juice of the original.  That's downright unhelpful, at least from my perspective.  If Breezy had asked, I'd have said no and asked him or her to link to the original.

Now, I must stress that Breezy has done nothing legally wrong here.  The original presentation is made available under a CC-BY licence (at least that was what I intended, though I've just noticed that in fact, on this occasion, I forgot to add a CC licence until just now!).  So in some sense, I am explicitly encouraging Breezy to do what he or she has done through my use of open licences.

But supposing Breezy had taken all of my presentations from Slideshare and replicated them all on AuthorSTREAM.  Would that have been OK?  Again, according to the individual licence on each presentation Breezy would have done nothing wrong - at least, not legally.  But morally... that seems like a different kettle of fish?  At least from my point of view.

It's frustrating because what I really want is a licence that says, "you can take this content, unbundle it, and use the parts to create a new derivative work but you can't simply copy the whole work and republish it on the Web unchanged" and more fundamentally, "you can do stuff with the individual resources that I make available but you can't take everything I've ever created and make it all available at a new location on the Web wholesale".

The bottom line is that there's a difference between making a new, derivative work and simply copying stuff.

Enough said... at the end of the day Creative Commons licences are the best we've got for making content openly available on the Web and in those few cases where things go a bit wrong I can either learn to live with it or try to resolve the situation with a simple email.

June 16, 2008

Web 2.0 and repositories - have we got our repository architecture right?

For the record... this is the presentation I gave at the Talis Xiphos meeting last week, though to be honest, with around 1000 Slideshare views in the first couple of days (presumably thanks to a blog entry by Lorcan Dempsey and it being 'featured' by the Slideshare team) I guess that most people who want to see it will have done so already:

Some of my more recent presentations have followed the trend towards a more "picture-rich, text-poor" style of presentation slides.  For this presentation, I went back towards a more text-centric approach - largely because that makes the presentation much more useful to those people who only get to 'see' it on Slideshare and it leads to a more useful slideshow transcript (as generated automatically by Slideshare).

As always, I had good intentions around turning it into a slidecast but it hasn't happened yet, and may never happen to be honest.  If it does, you'll be the first to know ;-) ...

After I'd finished the talk on the day there was some time for Q&A.  Carsten Ulrich (one of the other speakers) asked the opening question, saying something along the lines of, "Thanks for the presentation - I didn't understand a word you were saying until slide 11".  Well, it got a good laugh :-).  But the point was a serious one... Carsten admitted that he had never really understood the point of services like arXiv until I said it was about "making content available on the Web".

OK, it's a sample of one... but this endorses the point I was making in the early part of the talk - that the language we use around repositories simply does not make sense to ordinary people and that we need to try harder to speak their language.

May 16, 2008

Teach online to compete...

An article in Tuesday's Education Guardian, Teach online to compete, British universities told, caught my eye - not least because it appears to say very little about teaching online.  Rather, it talks about making course materials available online, which is, after all, very different.  To be fair, Carol Comer, academic development advisor (eLearning) at the University of Chester, does make this point towards the end of the article.

The report on which the story is based is "a paper for the latest edition of ppr, the publication of influential thinktank the Institute for Public Policy Research".  I'm not sure if the paper is currently finished - it doesn't really look finished to be honest - the fonts seem to be all over the shop but perhaps I'm being too picky.  Or perhaps the Guardian have got sight of it a little early?

The report suggests that the UK should:

  • establish a centralised online hub of diverse British open courseware offerings at www.ocw.ac.uk, presented in easily-readable formats and accessible to teachers, students and citizens alike
  • establish the right and subsequent capacity for non-students and non-graduates to take the same exam as do face-to-face students, through the provision of open access exam sessions
  • pass an Open Access Act through Parliament, establishing a new class of Open degree, achieved solely using open courseware
  • conduct a high-profile public information campaign, promoting the opportunities afforded open courseware and open access examinations and degrees, targeted at adult learners, excluded minorities and students at pre-university age

OK, I confess that I found the report quite long and I didn't quite get to the end (err, make that beyond halfway).  I'm as big a fan of open access as the next person, probably more so, so I don't have a problem with the suggestion that we should be making more courseware openly available.  I'm just not convinced that anyone could get themselves up to degree level simply by downloading / reading / watching / listening to a load of open access courseware - no matter how good it is.  The report makes reference to MIT's OpenCourseWare and the OU's OpenLearn initiatives.  Call me a cynic, but I've always suspected that MIT makes its courseware available online, not for the greater good of humanity but so that more students will enroll at MIT?  OK, I'm adopting an intentionally extreme position here and I'm sure people at MIT do have the best of intentions - but I think it is also the case that they don't see the giving away of courseware as in any way harmful to their current business models.  The OU's OpenLearn initiative (treated somewhat unfairly by the parts of the report I read) is slightly different in any case since the OU is by definition a distance-based institution - or so it seems to me.

So, I should probably stop at this point - having not read the report fully.  If you think I've been very unfair when you read the report yourself, let me know by way of a comment.

April 21, 2008

Jorum to move to open access

The JISC have announced that Jorum, the national learning object repository hosted and run jointly by MIMAS and EDINA, is to move to an 'open access' model.

This is good news, though one is tempted to wonder why it has taken so long!  I've argued for a while now that using a relatively closed licensing model and forcing registration before use would more or less stop the service in its tracks.

Through the development of JorumOpen, lecturers and teachers will be able to share materials under the Creative Commons licence framework: this makes sharing easier, granting users greater rights for use and re-use of online content and easier to understand. Importantly, it does not require prior registration. As a result availability is global as well as across UK universities and colleges. JorumOpen will run alongside a 'members only' facility, JorumEducationUK, that will support sharing of material just within the UK educational sector; this will be available only to registered users and contributors, as is currently the case.

Is the addition of JorumOpen enough to turn the service around?  I'm not sure to be honest.  It might be, though I'm not fully convinced that the notion of learning objects, as relatively complex packages of other objects, is compelling and/or simple enough to really succeed.  Can something like Jorum really take on the likes of Slideshare, Flickr and YouTube?

April 14, 2008

Open Repositories 2008

I spent a large part of the week before last (Tuesday, Wednesday & Friday) at the Open Repositories 2008 conference at the University of Southampton.

There were around 400 delegates there, I think, which I guess is an indicator of the considerable current level of interest around the R-word. Interestingly, if I recall conference chair Les Carr's introductory summary of stats correctly, nearly a quarter of these had described themselves as "developers", so the repository sphere has become a locus for debate around technical issues, as well as the strategic, policy and organisational aspects. The JISC Common Repository Interfaces Group (CRIG) had a visible presence at the conference, thanks to the efforts of David Flanders and his comrades, centred largely around the "Repository Challenge" competition (won by Dave Tarrant, Ben O’Steen and Tim Brody with their "Mining with ORE" entry).

The higher than anticipated number of people did make for some rather crowded sessions at times. There was a long queue for registration, though that was compensated for by the fact that I came away from that process with exactly two small pieces of paper: a name badge inside an envelope on which were printed the login details for the wireless network. (With hindsight, I could probably have done with a one page schedule of what was on in which location - there probably was one which I missed picking up!) Conference bags (in a rather neat "vertical" style which my fashion-spotting companions reliably informed me was a "man bag") were available, but optional. (I was almost tempted, as I do sport such an accessory at weekends, and it was black rather than dayglo orange, but decided to resist on the grounds that there was a high probability of it ending up in the hotel wastepaper bin as I packed up to leave.) Nul points, however, to those advertisers who thought it was a good idea to litter every desktop surface in the crowded lecture theatre with their glossy propaganda, with the result that a good proportion of it ended up on the floor as (newly manbagged-up) delegates squeezed their way to their seats.

The opening keynote was by Peter Murray-Rust of the Unilever Centre for Molecular Informatics, University of Cambridge. With some technical glitches to contend with (which must have been quite daunting in the circumstances - Peter has posted a quick note on his view of the experience: "I have no idea what I said" :-)), Peter delivered a somewhat "non-linear" but always engaging and entertaining overview of the role of repositories for scientific data. He noted the very real problem that while ever increasing quantities of data are being generated, very little of it is being successfully captured, stored and made accessible to others. Peter emphasised that any attempt to capture this data effectively must fit in with the existing working practices of scientists, and must be perceived as supporting the primary aims of the scientist, rather than introducing new tasks which might be regarded as tangential to those aims. And the practices of those scientists may, in at least some areas of scientific research, be highly "locally focused" i.e. the scientists see their "allegiances" as primarily to a small team with whom data is shared - at least in the first instance, an approach categorised as "long tail science" (a term attributed to Peter's colleague Jim Downing). Peter supported his discussion with examples drawn from several different e-Chemistry projects and initiatives, including the impressive OSCAR-3 text mining software (which extracts descriptions of chemical compounds from documents).

Most of the remainder of the Tuesday and Wednesday I spent in paper sessions. The presentation I enjoyed most was probably the one by Jane Hunter from the University of Queensland on the work of the HarvANA project on a distributed approach to annotation and tagging of resources from the Picture Australia collection (in the first instance at least - at the end, Jane whipped through a series of examples of applying the same techniques to other resources). Jane covered a model for annotation and tagging based on the W3C Annotea model, a technical architecture for gathering and merging distributed annotations/taggings (using OAI-PMH to harvest from targets at quite short time intervals, though those intervals could be extended if preferred/required), browser-based plug-in tools to perform annotation/tagging, and also touched on the relationships between tagging and formally-defined ontologies. The HarvANA retrieval system currently uses an ontology to enhance tag-based retrieval - "ontology-based or ontology-directed folksonomy" - but the tags provided could also contribute to the development/refinement of that ontology, "folksonomy-directed ontology". Although it was in many ways a repository-centric approach and Jane focused on the use of existing, long-established technologies, she also succeeded in placing repositories firmly in the context of the Web: as systems which enable us to expose collections of resources (and collections of descriptions of those resources), which then enter the Web of relationships with other resources managed and exposed by other systems - here, the collections of annotations exposed by the Annotea servers, but potentially other collections too.
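
For anyone unfamiliar with what "harvesting at short intervals" looks like in OAI-PMH terms, here is a minimal sketch. It is not HarvANA's code: the base URL, the identifiers and the sample response are all made up for illustration, and only the protocol details (the ListRecords verb, the metadataPrefix and from arguments, the OAI-PMH 2.0 XML namespace) come from the OAI-PMH specification. The idea is simply that each harvest asks for whatever has changed since the previous one.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, since, metadata_prefix="oai_dc"):
    """Build a ListRecords request for records changed since `since` (a UTC datestamp)."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": since,
    })
    return f"{base_url}?{query}"


def record_identifiers(response_xml):
    """Return the identifier found in each record header of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(f"{OAI_NS}identifier")]


# Hypothetical response, trimmed to the record headers only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:annotations.example.org:1</identifier>
      <datestamp>2008-04-01T10:00:00Z</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:annotations.example.org:2</identifier>
      <datestamp>2008-04-01T10:05:00Z</datestamp>
    </header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Hypothetical annotation server; a real harvester would fetch this URL on a timer.
    print(list_records_url("http://annotations.example.org/oai", "2008-04-01T00:00:00Z"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens for long result lists, which this sketch leaves out.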

At Wednesday lunch time (once I managed to find the room!), I contributed to a short "birds of a feather" session co-ordinated by Rosemary Russell of UKOLN and Julie Allinson of the University of York on behalf of the Dublin Core Scholarly Communications Community. We focused mainly on the Scholarly Works Application Profile and its adoption of a FRBR-based model, and talked around the extension of that approach to other resource types which is under consideration in a number of sibling projects currently being funded by JISC. (Rather frustratingly for me, this meeting clashed with another BoF session on Linked Data which I would really have liked to attend!)

I should also mention the tremendously entertaining presentation by Johan Bollen of the Los Alamos National Laboratory on the research into usage metrics carried out by the MESUR project. Yes, I know, "tremendously entertaining" and "usage statistics" aren't the sort of phrases I expect to see used in close proximity either. Johan's base premise was, I think, that seeking to illustrate impact through blunt "popularity" measures was inadequate, and he drew a distinction between citation - the resources which people announce in public that they have read - and usage - the actual resources they have downloaded. Based on a huge dataset of usage statistics provided by a range of popular publishers and aggregators, he explored a variety of other metrics, comparing the (surprisingly similar) rankings of journals obtained via several of these metrics with the rankings provided by the citation-based Thomson impact factor. I'm not remotely qualified to comment on the appropriateness of Johan's choice of algorithms, but the fact that Johan kept a large audience engaged at the end of a very long day was a tribute to his skill as a presenter. (Though I'd still take issue with the Britney (popular but insubstantial?)/Big Star (low-selling but highly influential/lauded by the cognoscenti) opposition: nothing by Big Star can compare with the strutting majesty of "Toxic". No, not even "September Gurls".)

On the Friday, I attended the OAI ORE Information Day, but I'll make that the subject of a separate post.

All in all - give or take a few technical hiccups - it was a successful conference, I think (and thanks to Les and his team for their hard work) - perhaps more so in terms of the "networking" that took place around the formal sessions, and the general "buzz" there seemed to be around the place, than because of any ground-breaking presentations.

And yet, and yet... at the end of the week I did come away from some of the sessions with my niggling misgivings about the "repository-centric" nature of much of the activity I heard described slightly reinforced. Yes, I know: what did I expect to hear at a conference called "Open Repositories"?! :-) But I did feel an awful lot of the emphasis was on how "repository systems" communicate with each other (or how some other app communicates with one repository system and then with another repository system) e.g. how can I "get something out" of your repository system and "put it into" my repository system, and so on. It seems to me that - at the technical level at least - we need to focus less on seeing repository systems as "specific" and "different" from other Web applications, and focus more on commonalities. Rather than concentrating on repository interfaces we should ensure that repository systems implement the uniform interface defined by the RESTful use of the HTTP protocol. And then we can shift our focus to our data, and to

  • the models or ontologies (like FRBR and the CIDOC Conceptual Reference Model, or even basic one-object-is-made-available-in-multiple-formats models) which condition/determine the sets of resources we expose on the Web, and see the use of those models as choices we make rather than something "technologically determined" ("that's just what insert-name-of-repository-software-app-of-choice does");
  • the practical implementation of formalisms like RDF which underpin the structure of our representations describing instances of the entities defined by those models, through the adoption of conventions such as those advocated by the Linked Data community
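
As a rough illustration of those two points together, here is a minimal sketch of the "one object made available in multiple formats" model written out as RDF triples in N-Triples syntax. All of the example.org URIs are hypothetical, and the choice of Dublin Core properties (dcterms:title, dcterms:hasFormat and the older dc:format) is my assumption for the sake of the example rather than anything prescribed by a particular repository system or profile.

```python
DCTERMS = "http://purl.org/dc/terms/"
DC = "http://purl.org/dc/elements/1.1/"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe(work_uri, title, formats):
    """Return triples for one object and its representations.

    `formats` maps each representation URI to its media type.
    """
    triples = [(uri(work_uri), uri(DCTERMS + "title"), literal(title))]
    for rep_uri, media_type in formats.items():
        triples.append((uri(work_uri), uri(DCTERMS + "hasFormat"), uri(rep_uri)))
        triples.append((uri(rep_uri), uri(DC + "format"), literal(media_type)))
    return triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical object with two representations.
    print(to_ntriples(describe(
        "http://example.org/work/1",
        "A report",
        {
            "http://example.org/work/1.pdf": "application/pdf",
            "http://example.org/work/1.html": "text/html",
        },
    )))
```

The point of writing it this way is that the model (which things get URIs, and how they relate) is a visible choice in the data, not a side effect of whichever software happens to be serving it.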

In this world, the focus shifts to "Open (Managed) Collections" (or even "Open Linked Collections"), collections of documents, datasets, images, of whatever resources we choose to model and expose to the world. And as a consumer of those resources  I (and, perhaps more to the point, my client applications) really don't need to know whether the system that manages and exposes those collections is a "repository" or a "content management system" or something else (or if the provider changes that system from one day to the next): they apply the same principles to interactions with those resources as they do to any other set of resources on the Web.
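
From the client's side, that uniformity might look something like the following minimal sketch. Both hosts are hypothetical, and the default media type is just one plausible choice. The request is only prepared, not sent, so the snippet runs without a network connection; handing the result to urllib.request.urlopen would actually fetch it. What matters is that the same function serves for a "repository" and a "content management system" alike: a URI, a GET, and an Accept header saying which representation is wanted.

```python
from urllib.request import Request


def describe_request(resource_uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) a GET asking for one representation of a resource."""
    return Request(resource_uri, headers={"Accept": media_type}, method="GET")


if __name__ == "__main__":
    # Hypothetical URIs: the client code is identical whatever system sits behind them.
    for target in (
        "http://repository.example.org/items/42",
        "http://cms.example.org/pages/about",
    ):
        req = describe_request(target)
        print(req.get_method(), req.full_url, req.get_header("Accept"))
```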

April 10, 2008

Powerhouse becomes first museum to join the Commons on Flickr

Via Seb Chan I note that the Powerhouse Museum in Sydney, Australia is the first museum to join the Commons on Flickr:

In the tradition of ’slow food’ we have decided to do a slow release of content with an initial 200 historic images of Sydney and surrounds available through the Commons on Flickr and a promise of another 50 new fresh images each week! These initial images are drawn from the Tyrrell Collection. Representing some of the most significant examples of early Australian photography, the Tyrrell Collection is a series of glass plate negatives by Charles Kerry (1857-1928) and Henry King (1855-1923), two of Sydney’s principal photographic studios at the time.

A follow-up post discusses the apparent impact this move is having.

Good stuff.  Which UK museum is going to be the first to do this I wonder?

[Image: A modern Australian shearer.  Glass plate negative.  Tyrrell Photographic Collection, Powerhouse Museum.]

March 29, 2008

Open cultural heritage

JISC have announced five new digitisation projects, funded jointly with the US’s National Endowment for the Humanities (NEH).

Looking at the announcement text, I am slightly worried about the licences under which the resulting digitised resources will be made available. Yes, I know I bang on about this all the time but we seem to have a well ingrained habit in this country (the UK more so than the US I think) of publicly funding digitisation projects which result in resources being freely available on the Web, but not being open.  I, for one, would feel reassured if such things were made more explicit.

Now, the word open is used in multiple ways, so I should explain.  I'm using it here as in open content (from Wikipedia):

[Open content is] any kind of creative work published in a format that explicitly allows copying and modifying of its information by anyone, not exclusively by a closed organization, firm or individual.

This usually implies the use of an explicit open content licence, such as those provided by Creative Commons.  Free content, on the other hand, is typically available only for viewing by the end-user, with copyright and/or other restrictions typically limiting other usage to 'personal educational' use at best.

Based on the minimal information provided about the five projects, only one explicitly mentions the use of Creative Commons, one mentions the development of open source software and one talks about results being freely available (though as mentioned above, being free and being open are two different things).

Why does this matter?  Well, it seems to me that whenever possible (and I accept that there may be situations in which it is not possible) publicly funded digitisation of our cultural heritage should result in resources that can be re-purposed freely by other people.  That means, for example, that any lecturer or teacher who wants to take the digitised cultural heritage resource and build it into a learning object in their VLE, or an exhibit in Second Life, or whatever, can do so freely, without needing to contact the content provider.

Open content is what makes the Web truly mashable, and we should look to the cultural heritage sector for our richest and most valued mashable content.  Free content is not sufficient.

There is probably a useful debate to be had around whether the cultural resources produced by publicly funded digitisation should be able to be re-used in commercial activities as well as non-profit ones.  My personal view is that anything that adds value is fair game, including commercial activities, but I accept that there are other views on this issue.  Whatever, re-use for non-profit purposes is an absolute minimum.

To conclude... I really hope that I'm wasting blog space here, and that the conditions of funding in this case mandated that the resulting resources be made open rather than just free.  And further, that such a condition is already (or rapidly becomes) the norm for publicly funded digitisation of our cultural heritage everywhere.  I'm keeping my fingers crossed.

March 17, 2008

Hiding Magna Carta on the Web

The BL have made a digitised copy of the Magna Carta available on the Web:

Magna Carta is one of the most celebrated documents in history. Examine the British Library's copy close-up, translate it into English, hear what our curator says about it, and explore a timeline.

So says the introductory blurb.

Well... if it's so "celebrated" and important can someone please explain why the digitised version has been hidden behind a Shockwave viewer that makes it pretty much impossible to do anything other than browse it on the BL's Web site?  Yes, there is a simple version, which does not require a browser plugin, but the copyright statement and complete lack of CC licence (or anything remotely like it) makes it clear that re-use wasn't high on the BL's agenda.

Shame on them.

Come on BL, you can spend our money better than this!

February 13, 2008

Repositories thru the looking glass

I spent last week in Melbourne, Australia at the VALA 2008 Conference - my first trip over to Australia and one that I thoroughly enjoyed.  Many thanks to all those locals and non-locals that made me feel so welcome.

I was there, first and foremost, to deliver the opening keynote, using it as a useful opportunity to think and speak about repositories (useful to me at least - you'll have to ask others that were present as to whether it was useful for anyone else).

It strikes me that repositories are of interest not just to those librarians in the academic sector who have direct responsibility for the development and delivery of repository services.  Rather they represent a microcosm of the wider library landscape - a useful case study in the way the Web is evolving, particularly as manifest through Web 2.0 and social networking, and what impact those changes have on the future of libraries, their spaces and their services.

My keynote attempted to touch on many of the issues in this area - issues around the future of metadata standards and library cataloguing practice, issues around ownership, authority and responsibility, issues around the impact of user-generated content, issues around Web 2.0, the Web architecture and the Semantic Web, issues around individual vs. institutional vs. national vs. international approaches to service provision.

In speaking first I allowed myself the luxury of being a little provocative and, as far as I can tell from subsequent discussion, that approach was well received.  Almost inevitably, I was probably a little too technical for some of the audience.  I'm a techie at heart and a firm believer that it is not possible to form a coherent strategic view in this area without having a good understanding of the underlying technology.  But perhaps I am also a little too keen to inflict my world-view on others. My apologies to anyone who felt lost or confused.

I won't repeat my whole presentation here.  My slides are available from Slideshare and a written paper will become available on the VALA Web site as soon as I get round to sending it to the conference organisers!

I can sum up my talk in three fairly simple bullet points:

  • Firstly, that our current preoccupation with the building and filling of 'repositories' (particularly 'institutional repositories') rather than the act of surfacing scholarly material on the Web means that we are focusing on the means rather than the end (open access).  Worse, we are doing so using language that is not intuitive to the very scholars whose practice we want to influence.
  • Secondly, that our focus on the 'institution' as the home of repository services is not aligned with the social networks used by scholars, meaning that we will find it very difficult to build tools that are compelling to those people we want to use them.  As a result, we resort to mandates and other forms of coercion in recognition that we have not, so far, built services that people actually want to use.  We have promoted the needs of institutions over the needs of individuals.  Instead, we need to focus on building and/or using global scholarly social networks based on global repository services.  Somewhat oddly, ArXiv (a social repository that predates the Web let alone Web 2.0) provides us with a good model, especially when combined with features from more recent Web 2.0 services such as Slideshare.
  • Finally, that the 'service oriented' approaches that we have tended to adopt in standards like the OAI-PMH, SRW/SRU and OpenURL sit uncomfortably with the 'resource oriented' approach of the Web architecture and the Semantic Web.  We need to recognise the importance of REST as an architectural style and adopt a 'resource oriented' approach at the technical level when building services.

I'm pretty sure that this last point caused some confusion and is something that Pete or I need to return to in future blog entries.  Suffice to say at this point that adopting a 'resource oriented' approach at the technical level does not mean that one is not interested in 'services' at the business or function level.
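One way to picture the contrast in that last bullet, offered as a rough sketch rather than anything taken from the talk itself: with OAI-PMH, everything goes through a single service endpoint and the thing you want is named in the query parameters, whereas in a resource oriented approach the item has its own 'http' URI and you simply GET it, asking for the representation you prefer. The host name, item URI and function names below are made up for illustration; only the OAI-PMH parameter names (verb, identifier, metadataPrefix) come from that spec.

```python
from urllib.parse import urlencode

# Hypothetical addresses, used purely for illustration.
BASE = "http://repository.example.org/oai"        # single service endpoint
ITEM = "http://repository.example.org/item/123"   # URI of the item itself


def oai_pmh_get_record(identifier, metadata_prefix="oai_dc"):
    """Service oriented: call the GetRecord 'verb' on the endpoint."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return BASE + "?" + query


def resource_request(uri, media_type):
    """Resource oriented: describe a plain GET on the item's own URI."""
    return {"method": "GET", "url": uri, "headers": {"Accept": media_type}}


print(oai_pmh_get_record("oai:example.org:123"))
print(resource_request(ITEM, "application/json"))
```

In the second case nothing digital library specific is needed on the client side: any browser, crawler or script that speaks HTTP can fetch the item.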

[Image: artwork outside the State Library of Victoria]

December 04, 2007

Socialising our Applications

In addition to the travels that Andy mentioned, we've also been grappling with the disruption caused by a relocation to a different office, so I seem to have accumulated a number of half-written posts which I'll try to find the time to get out this week.

For now, a brief pointer to a nice post by Roo Reynolds in which he compares the character and functionality of the UK government's Hansard Web site (which provides access to the "official" "edited verbatim report of proceedings" in the two houses of the UK Parliament) and two independent sites, TheyWorkForYou.com and The Public Whip, which take advantage of the availability of that data to provide more "social" functionality around the same information:

While the text is the same, the simple addition of some additional markup, links and photos brings it to life. The addition of user comments turns the whole thing into a social application, allowing us to discuss what our MPs and Lords are shouting across their respective aisles at each other every day.

In addition, Roo highlights the importance of underpinning such applications with an entity-/object-based approach - what I would probably call a resource-oriented approach:

Social software designers talk about the 'atoms', (or objects, or entities) of an application. For example, YouTube’s atoms include videos (of course) but also comments, playlists and users. Flickr’s atoms include photos, comments, users, groups and notes. TheyWorkForYou’s atoms are speeches and comments. Don’t get the impression that ‘speech’ necessarily means a long speech. It could be a question, an interruption, an answer or a statement. Sometimes even standing up to speak is enough to get an entry in Hansard.

In his discussion of The Public Whip, Roo emphasises that such entities include people and also 'abstract resources' such as 'divisions' and 'policies'. I guess I might add that such entities aren't necessarily 'atomic' in the traditional sense of that word, indicating something 'indivisible': a collection or list of other entities/resources can also be an entity/resource in its own right, and indeed such entities are visible in those services.

But it's a good post, highlighting very simply and clearly the value of open data and what the "social" dimension can bring to an application. 

November 16, 2007

Use of open content licences by cultural heritage organisations - report now available

The study that Jordan Hatcher has been working on for us is now available.  The report looks at the current usage of Creative Commons and other open content licences by cultural heritage organisations in the UK.

Note that this report, and the survey on which it is based, only reflects those individuals that participated (107 respondents in all), and does not purport to represent the entire sector.  That said, it mildly surprises me that about half of those completing the survey hadn't heard of Creative Commons or Creative Archive licences.  It also struck me as interesting to note that only about half the respondents have "an in-house legal department or designated person that deals with copyright issues" and that a similar proportion do not have "a copyright policy publicly stated on its website".

I've argued before that it is too hard to re-use cultural heritage content in the UK for anything other than personal educational use (particularly in comparison with the US).  Moving towards making copyright and licensing terms explicit would be a big step in the right direction.

November 13, 2007

Strangling creativity

I've mentioned the TED talks before on this blog and I think it is true to say that all the ones I've watched in the series have been excellent.  The recently announced talk by Lawrence Lessig, How creativity is being strangled by the law, is no exception:

The Net's most adored lawyer brings together John Philip Sousa, celestial copyrights, and the "ASCAP cartel" to build a case for creative freedom. He pins down the key shortcomings of our dusty, pre-digital intellectual property laws, and reveals how bad laws beget bad code. Then, in an homage to cutting-edge artistry, he throws in some of the most hilarious remixes you've ever seen.

This presentation works on a number of levels - it is thought-provoking, inspirational and very funny and is given using a presentational style that makes it a joy to watch.  Well worth the 30 minutes or so that it will take to view it.

Meanwhile, over on the Guardian Unlimited Technology blog, Cory Doctorow pokes fun at the National Portrait Gallery, Warhol is turning in his grave, by highlighting the irony of putting on an exhibition of pop art, an art movement that to a large extent celebrated "nicking the work of others, without permission, and transforming it to make statements and evoke emotions never countenanced by the original creators", in an environment adorned with copyright-induced restrictions.

Does this show - paid for with public money, with some works that are themselves owned by public institutions - seek to inspire us to become 21st century pop artists, armed with cameraphones, websites and mixers, or is it supposed to inform us that our chance has passed and we'd best settle for a life as information serfs who can't even make free use of what our eyes see and our ears hear? 

October 03, 2007

Flipping open access

Peter Suber has an interesting article in the current SPARC Open Access Newsletter, issue #114, in which he discusses an idea originally put forward by Mark Rowse (previously CEO of Ingenta) for how current toll access journals can become open access journals by 'flipping' their consortia subscriptions for readers to consortia subscriptions for authors.

Peter's analysis starts from some rather simplistic assumptions about the penetration of consortia subscription models in the US but quickly moves to firmer ground, assessing both the likely viability of 'flipped' business models and some of the potential benefits such an approach might bring to readers, authors, institutions, publishers and research in general.

I don't know how new these ideas will be to those of you steeped in the political discussions around open access, but I found it an interesting read - one made better by its acknowledgment that the sustainability of publisher services is an important consideration in the move towards OA.

September 28, 2007

Are we digitising into silos?

I certainly hope not!  But reading, or possibly mis-reading, between the lines of the BBC report on the British Library's recent announcement that it is working with Microsoft to digitise 100,000 19th Century books leaves me a little worried. The phrasing of:

digitised publications will be accessible in two ways - initially through Microsoft's Live Search Books and then via the Library's website

makes it sound like these texts are not going to start popping up in Google search results any time soon.

I hope I'm wrong?  Please shout if you know better!

September 27, 2007

Open content licences survey - update

A quick reminder that we are still seeking responses to our survey about the use of open content licences in the UK cultural heritage sector.  We hope to close the survey at the end of this month (or soon after).

We've had about 70 responses so far, which is great, but if you haven't responded and you have something you'd like to share with us, please fill in the form asap.  Thanks.

August 13, 2007

How open is The European Library?

I note that the terms of use of The European Library state:

Copying of individual articles is governed by international copyright law. Users may print off or make single copies of web pages for personal use. Users may also save web pages other than individual articles electronically for personal use. Electronic dissemination or mailing of articles is not permitted, without prior permission from the Conference of European of National Librarians and/or the National Library concerned.

Seems a shame.  Surely some of the material found through the TEL portal could be made available on a more open basis?

As someone that would like to build experimental virtual exhibitions of European cultural heritage materials in Second Life, I'm scuppered at the first hurdle - I can't easily work out what is available for re-use.  Worse in fact - it looks like nothing is available for re-use!

As I've noted before, the US seems way ahead of us in terms of making digitised cultural heritage material openly available.

 

August 06, 2007

Open, online journals != PDF ?

I note that Volume 2, Number 1 of the International Journal of Digital Curation (IJDC) has been announced with a healthy looking list of peer-reviewed articles.  Good stuff.

I mention this partly because I helped set up the technical infrastructure for the journal using Open Journal Systems, an open source journal management and publishing system developed by the Public Knowledge Project, while I was still at UKOLN - so I have a certain fondness for it.

Odd though, for a journal that is only ever (as far as I know) intended to be published online, to offer the articles using PDF rather than HTML.  Doing so prevents any use of lightweight 'semantic' markup within the articles, such as microformats, and tends to make re-use of the content less easy.

In short, choosing to use PDF rather than HTML tends to make the content less open than it otherwise could be.  That feels wrong to me, especially for an open access journal!  One could just about justify this approach for a journal destined to be published both on paper and online (though even in that case I think it would be wrong) but surely not for an online-only 'open' publication?

August 03, 2007

Use of open content licences by cultural heritage organisations

The survey part of the study into the use of CC and other open content licences by UK cultural heritage organisations is now available.

If you have responsibility in this area, please consider filling in the survey.  Cultural heritage organisations include museums, galleries, libraries, and archives, as well as radio and television broadcasters, and film and video organisations. Even if you do not fall into one of these groups, but conduct cultural heritage activities, you are invited to take the survey.

We anticipate that completing it will take less than 10 minutes.  By completing the survey you will have a chance to win one of three iPod Shuffles, pre-filled with Creative Commons licensed material.

July 12, 2007

Journal articles, metadata formats and woes

In a post on his Digital Library Technology Jester weblog, Peter Murray of OhioLINK points to an XML format developed by the Directory of Open Access Journals (DOAJ) for representing descriptions of journal articles.

First, I think I'd qualify Peter's point that

Prior to this addition the only scheme available was Dublin Core, which as a metadata schema for describing article content is woefully inadequate. (Dublin Core, of course, was never designed to handle the complexity of the description of an average article.)

I think the reference here to "Dublin Core" is really to the specific "DC application profile" (or description set profile, as we are starting to refer to these things) commonly known as "Simple DC", i.e. the use of (only) the 15 properties of the Dublin Core Metadata Element Set with literal values, for which the oai_dc XML format defined by the OAI-PMH spec provides a serialisation. On that basis, I'd be inclined to agree that the Simple DC profile is not the tool for the task at hand: the Simple DC profile is intended to support simple, general descriptions of a wide range of resources, and it doesn't in itself offer the "expressiveness" that may be required to support all the requirements of individual communities, or more detailed description specific to particular resource types.
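To make the "expressiveness" point a little more concrete, here is a minimal sketch of what a Simple DC description of a journal article looks like when serialised as oai_dc. The two namespace URIs and the 15 property names are those of the OAI-PMH spec and the DCMES; the function name and the article details are made up for illustration. Note that there is nowhere to put the journal title, volume, issue and page range other than squashing them into a single literal (here, the value of dc:source).

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# The 15 properties of the Dublin Core Metadata Element Set.
DCMES = ("title", "creator", "subject", "description", "publisher",
         "contributor", "date", "type", "format", "identifier",
         "source", "language", "relation", "coverage", "rights")


def simple_dc(description):
    """Serialise {property: [literal, ...]} as an oai_dc:dc element."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element("{%s}dc" % OAI_DC)
    for name, values in description.items():
        if name not in DCMES:
            # Simple DC has no way to say anything beyond the 15 properties.
            raise ValueError("not a DCMES property: " + name)
        for value in values:
            child = ET.SubElement(root, "{%s}%s" % (DC, name))
            child.text = value  # literal values only
    return root


# Made-up article details, for illustration only.
record = simple_dc({
    "title": ["An example article"],
    "creator": ["Doe, Jane"],
    "source": ["Example Journal 2(1), pp. 10-20"],
})
print(ET.tostring(record, encoding="unicode"))
```

Anything richer, such as a separately identified journal or a distinct volume and issue, has to be recovered by parsing that string, which is exactly the kind of thing a more specific profile is meant to avoid.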

However, the framework provided by the DCMI Abstract Model provides the sort of extensibility which enables communities to develop other profiles to meet those requirements for richer, more specific descriptions.

I guess DCMI still has its work cut out to try to convey the message that "Dublin Core" doesn't begin and end with the DCMES.

But perhaps more specifically pertinent to the topic of the DOAJ format is the fact that the work carried out last year on the ePrints DC Application Profile, led by Andy and Julie Allinson of UKOLN, applied exactly this approach for the area of scholarly works, including journal articles. From the outset, the initiative recognised that the Simple DC profile was insufficient to meet the requirements which had been articulated, and shifted their focus to the development of a new profile, based on applying a subset of the FRBR entity-relational model to the "eprint" domain.

I haven't yet compared the DOAJ format and the ePrints DCAP closely enough to say whether the latter would support the representation of all the information represented by the former. I guess it's quite likely that the two initiatives were simply not aware of each other's efforts. Or it may be that the DOAJ folks felt that the ePrints DCAP was more complex than they needed for the task at hand.

But it does seem a pity that we seem to have ended up with two specs, developed at almost the same time, and applying to pretty much the same "space", leaving implementers harvesting data from multiple providers with the probability of needing to work across both.

(Hmmm, it occurs to me that a quick spot of GRDDL-ing might make that less painful than it appears... Watch this space.)

July 03, 2007

A brief history of OA

Stevan Harnad has posted a nice summary of the key milestones in the development of the Open Access movement to the American Scientist Open Access Forum.

Towards the end he says:

The OA way of the present and future is for researchers to deposit their articles in their own Institutional Repositories.

Is this the one true OA way?  I'm not convinced.  Let's focus on what is important, the 'open' and the 'access' - and let the way of the future determine itself based on what actually helps to achieve those aims.

June 22, 2007

Precedings

I note that Nature have announced Precedings:

Nature Precedings is a place for researchers to share pre-publication research, unpublished manuscripts, presentations, posters, white papers, technical papers, supplementary findings, and other scientific documents. Submissions are screened by our professional curation team for relevance and quality, but are not subjected to peer review. We welcome high-quality contributions from biology, medicine (except clinical trials), chemistry and the earth sciences.

Interesting.  As one might expect, blog reaction is mixed... for example, the positive reception by David Weinberger draws some negative comment from those on the institutional repository side of the fence, who argue that repositories (despite the fact that they are largely empty!) already do all of this.

The announcement of Precedings echoes almost exactly the point I was trying to make in my talk at the JISC Repositories Conference and in subsequent posts - we need to stop thinking institutionally and instead develop or use naturally compelling services, such as Precedings, that position researchers directly in a globally social context.

Of course, it remains to be seen whether Nature have got Precedings right, but I think it is an interesting development that deserves close attention as it grows.

June 21, 2007

Obsessive-compulsive disorder?

In a posting to the American Scientist Open Access Forum Sally Morris notes:

It's one of the curious things about the 'Open Access movement' that uptake by the academics themselves (for whose benefit it is supposed to be) depends on compulsion.

I made a similar point, though I suspect for completely different reasons, in my recent posting about repositories:

Yes, we can acknowledge our failure to put services in place that people find intuitively compelling to use by trying to force their use thru institutional or national mandates?  But wouldn't it be nicer to build services that people actually came to willingly?

Stevan Harnad, in his response to Sally, notes that:

But if "compulsion" is indeed the right word for mandating self-archiving, I wonder whether Sally was ever curious about why publication itself had to be mandated by researchers' institutions and funders ("publish or perish"), despite its substantial benefits to researchers?

Touché.

I don't consider myself a real researcher [tm] so I probably shouldn't comment but I've always assumed that "publish or perish" resulted at least as much from social pressure as from policy pressure.  Self-archiving should be the same - it should be the expected norm because it is the obvious and intuitive thing for researchers to do to gain impact.

June 14, 2007

Creative Commons, open licences and cultural heritage

We have agreed to fund Jordan Hatcher, formerly a Research Associate at the AHRC Research Centre for studies in Intellectual Property and Technology Law, to undertake a study into how open content licences are currently being used by cultural organisations in the UK.  Get in touch with Ed Barker if you want to know more.

April 18, 2007

Slideshare gets real

A while back I blogged about Slideshare being an example of 'fake' sharing...

I'm pleased to say that it has got 'real' and now offers the ability to download the PPT or PDF file for each presentation, as well as making the slides available thru the embedded display facility.  Nice.

Note that you have to manually enable this feature for any existing presentations in Slideshare - but doing so doesn't mean uploading the presentation again, just selecting a new tick box on the 'edit' page.

March 31, 2007

JISC, Scribd and scholarly repositories

Tony Hirst asks "why doesn't JISC fund the equivalent of Scribd for the academic community?" in a post on the OUseful blog, to which one is tempted to reply "why would they when such things already exist out on the Web?"

Of course, in reality there are good reasons why they might, partly because of the specific requirements of scholarly documents (as opposed to just any old documents) and partly because of assurances about persistence of services, quality assurance, and so on.

I'm minded to ask a different question.  One that I've asked before on a number of occasions, not least in the context of the current ORE project, which is "why don't scholarly repositories look more like Scribd?".  Why do we continue to develop and use digital library specific solutions, rather than simply making sure that our repositories integrate tightly with the main fabric of the Web (read Web 2.0)?

What does that mean?  Essentially it means assigning 'http' URIs to everything of interest, using the HTTP protocol and content negotiation to serve appropriate representations of that stuff, using sitemaps to steer crawlers to the important information, and using JSON to surface stuff flexibly in other places.
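As a rough illustration of the content negotiation part, here is a minimal sketch of how a repository might choose which representation of an item to serve, based on the Accept header a client sends. The list of available formats, the function names and the item itself are all assumptions made up for this example, and the matching is deliberately simplified (it honours q-values and wildcards but ignores the finer precedence rules of the HTTP spec).

```python
import html
import json

# Hypothetical set of representations a repository holds for one item.
AVAILABLE = ["text/html", "application/json", "application/pdf"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # The sort is stable, so equal q-values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header, available=AVAILABLE):
    """Pick the first available media type the client accepts, else None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return None


def represent(item, media_type):
    """Render a (made-up) item dict in the negotiated format."""
    if media_type == "application/json":
        return json.dumps(item, sort_keys=True)
    if media_type == "text/html":
        return "<h1>{}</h1>".format(html.escape(item["title"]))
    return None  # e.g. PDF would be served from a stored file
```

The point is that the same URI serves a browser, a crawler and a script: a browser asking for text/html gets a page, while a script asking for application/json gets data it can surface somewhere else.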

By the way, Tony also asks whether there is any sort of cross-search of UK repositories available, to which the answer is that JISC are funding Intute to develop such a thing (a development of the previous ePrints UK project I think).  And there are the global equivalents such as OAIster.

February 14, 2007

Donations to Creative Commons and Wikimedia

One of the great things about working at the Eduserv Foundation is our ability to give money to projects and activities that we feel are of benefit to the education community.  I am therefore very pleased to announce that we have given $10,000(US) each to Creative Commons and the Wikimedia Foundation (press release, PDF).

Creative Commons has fundamentally changed, and continues to change, our attitude to the way content is created and shared. By 'our' I am primarily thinking about the UK education community, though clearly the impact of CC is much wider than that - part of CC's attraction is that the underlying principles are understandable and applicable globally. CC has liberated us from thinking first and foremost about protecting and restricting content and has given us the ability to focus on sharing, which is fundamental to both learning and research. Sure, we have a long way to go in fully realising the benefits of CC, that's why it is important for organisations like the Eduserv Foundation to continue to support CC, but it seems to me that in a very real sense CC has changed the landscape in which we operate. The basis of the community's discussion about content is fundamentally different now because people come to the table with CC as a viable option.

In a similar way, our donation to the Wikimedia Foundation recognises the growing importance of the suite of activities they undertake, notably Wikipedia, in the context of learning and teaching. This is not just because Wikipedia has become such a valuable resource to teachers and learners in its own right, but because it has demonstrated the real potential of the Web; the potential for building very significant and valuable encyclopedic resources collaboratively, using a highly distributed knowledge base, in ways that were unimaginable to most of us even 3 or 4 years ago. I fully expect Wikipedia and their other offerings to continue to grow in importance within our community over the coming years.

November 22, 2006

The power of open access to data

This may be old news to some readers, but it was new to me, and so stunning that I felt the need to share it here.  Via David Recordon's shared items in Google Reader and a post in the ConnectID blog I discovered the TED talks and in particular this presentation by Hans Rosling (from Feb 2006 I think).  It's a fascinating talk that uses some very nice graphics to de-bunk some of the myths about developing nations.

Towards the end Hans makes the point that this kind of analysis is only really possible by unlocking UN statistical data in ways that make it more openly available for use on the Web - data that has hitherto been locked away in closed databases with hard to use or non-existent APIs.  Hans talks a little about the need to search across this data, whereas my view is that it is the ability to re-use the data that is critical to the kind of analysis demonstrated in the presentation.  But that's a minor point.

Amazing stuff... and it makes me wonder if this kind of analysis could usefully be combined with the data that underpins OCLC's environmental scan to plot similar trends in provision of library and museum services and education.
