
October 27, 2006

Pushing an OpenDOAR

The OpenDOAR directory of open access repositories has announced a new search service based on Google's Custom Search Engine facility.  Good stuff - though for me it raises several questions of policy and implementation...

Firstly, the announcement for the service states:

It is well known that a simple full-text search of the whole web will turn up thousands upon thousands of junk results, with the valuable nuggets of information often being lost in the sheer number of results.

Well, err... OK.  This is not an uncommon statement, but simply repeating it over and over again doesn't necessarily make it true!  It doesn't reflect my informal impressions of the way that Web searching works today.  So I thought I'd do a little experiment, to try and compare results from the new OpenDOAR search service with results from a bog standard Google search.

Note that this isn't in the least bit scientific and it only compares known item searching - which may not be all that helpful - but bear with me while I give you my results and try to draw some conclusions.  (Then you can say "what a waste of time"!).

So, what I did was to browse my way thru the list of eprints.org repositories, selecting 10 research papers as randomly as I was able.  I then used the title of the paper to construct a known-item search against both the new OpenDOAR search interface and Google itself, noting down where the paper came in the search results and how many results there were overall.  Note that I counted the abstract page for the paper (as served by the repository in which the paper is held) as a hit.

My results were as follows:

Recent work on French rural history
Google 1 (out of ~8,000,000 results)
OpenDOAR 1 (out of 115 results)

Towards absorbing outer boundaries in general relativity
Google 5 (out of 69,000) (http://arxiv.org/abs/gr-qc/0608051 copy number 1)
OpenDOAR not in first 10 results (out of 86) (http://arxiv.org/abs/gr-qc/0608051 copy number 1)

More to life than Google – a journey for PhD students
Google 3 (out of 219,000) (http://magpie.lboro.ac.uk/dspace/handle/2134/676 copy number 1)
OpenDOAR not in first 10 (http://magpie.lboro.ac.uk/dspace/handle/2134/676 copy number 1)

Google 1 (out of 23)
OpenDOAR 1 (out of 2)

Pulse rates in the songs of trilling field crickets (Orthoptera: Gryllidae: Gryllus).
Google 5 (out of 158) (http://tjwalker.ifas.ufl.edu/AE93p565.pdf copy number 1)
OpenDOAR 1 (out of 4)

Te mana o te reo me ngā tikanga: Power and politics of the language
Google 1 (out of 540)
OpenDOAR 1 (out of 8)

Surface Projection Method for Visualizing Volumetric Data
Google 1 (out of 730,000)
OpenDOAR 1 (out of 193)

The Social Construction of Economic Man: The Genesis, Spread, Impact and Institutionalisation of Economic Ideas
Google 1 (out of 10,200)
OpenDOAR 1 (out of 38)

The Adoption of Activity-Based Costing in Thailand
Google 1 (out of 30,100)
OpenDOAR 1 (out of 30)

Connecteurs de conséquence et portée sémantique
Google 3 (out of 18,800) (pweb.ens-lsh.fr/jjayez/clf19.pdf copy number 1)
OpenDOAR 1 (out of 34)

As I say, I'm not in the least bit proud of this experiment, but it only took 30 minutes or so to carry out, so if you tell me how flawed it is I won't get too upset.

What these results say to me is that, for known item searching at least, there is little evidence that Google is losing our research nuggets within large result sets.  What Google is doing is pushing the nuggets to the top of the list.  In fact, in some cases at least, I suspect one could argue that the vanilla Google search is surrounding those nuggets with valuable non-repository resources that are missed in the OpenDOAR repository-only search engine.
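For anyone who wants to check my tallying, here is a minimal sketch that counts the ranks listed above.  The short labels and the function name are mine, the fourth entry is the one listed without a title, and None stands for "not in the first 10 results".

```python
# Ranks transcribed from the results listed above as
# (label, Google rank, OpenDOAR rank); None = not in the first 10 results.
RESULTS = [
    ("French rural history", 1, 1),
    ("Absorbing outer boundaries", 5, None),
    ("More to life than Google", 3, None),
    ("(untitled entry)", 1, 1),
    ("Trilling field crickets", 5, 1),
    ("Te mana o te reo", 1, 1),
    ("Surface Projection Method", 1, 1),
    ("Social Construction of Economic Man", 1, 1),
    ("Activity-Based Costing in Thailand", 1, 1),
    ("Connecteurs de consequence", 3, 1),
]


def hits_within(results, column, cutoff):
    """Count the papers whose rank in the given column is <= cutoff."""
    return sum(
        1 for row in results
        if row[column] is not None and row[column] <= cutoff
    )


for name, column in (("Google", 1), ("OpenDOAR", 2)):
    first = hits_within(RESULTS, column, 1)
    top_ten = hits_within(RESULTS, column, 10)
    print(f"{name}: {first} at position 1, {top_ten} in the first 10")
```

With ten papers this is a tally rather than a statistic, so I wouldn't read much into the difference between the two columns.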

For me, this exercise raises three interesting questions:

  1. Are repositories successfully exposing the full-text of articles (the PDF file or whatever) to Google rather than (or as well as) the abstract page?  If not, then they should be.  I think there is some evidence from these results that some repositories are only exposing the abstract page, not the full-text.  For a full-text search engine, this is less than optimal.  My suspicion is that the way that Google uses the OAI-PMH to steer its Web crawling is actually working against us here and that we either need to work with Google to improve the way this works, or bite the bullet and ask repository software developers to support Google sitemaps in order to improve the way that Google indexes our repositories.
  2. Are we consistent in the way we create hypertext links between research papers in repositories?  If not, then we should be.  In the context of Google searches, linking is important because each link to a paper increases its Google-juice, which helps to push that paper towards the top of Google's search results.  Researchers currently have the option of linking either direct to the full-text (or one of several full-texts) or to the abstract page.  This choice ultimately results in a lowering of the Google-juice assigned to both the paper and the abstract page - potentially pushing both further down the list of Google search results.  The situation is made worse by the use of OpenURLs, which do nothing for the Google-juice of the resource that they identify, in effect working against the way the Web works.  If we could agree on a consistent way of linking to materials in repositories, we would stand to improve the visibility of our high-quality research outputs in search engines like Google.
  3. What is the role of metadata in a full-text indexing world?  What the mini-experiment above and all my other experience says to me is that full-text indexing clearly works.  In terms of basic resource discovery, we're much better off exposing the full-text of research papers to search engines for indexing, than we are exposing metadata about those papers.  Is metadata therefore useless?  No.  We need metadata to support the delivery of other bibliographic services.  In particular we need metadata to capture those attributes that are useful for searching, ranking and linking but that can't reliably be derived from the resource itself.  I'm thinking here primarily of the status of the paper and of the relationships between the paper and other things - the relationships between papers and people and organisations, the relationships between different versions, between different translations, between different formats and between different copies of a paper.  These are the kinds of relationships that we have been trying to capture in our work on the DC Eprints Application Profile.  It is these relationships that are important in the metadata, much more so than the traditional description and keywords kind of metadata.
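To make the sitemap suggestion in the first question a little more concrete, here is a minimal sketch of a repository listing both the abstract page and the full-text file for each paper.  The URLs and the build_sitemap function are hypothetical, and I'm assuming the sitemaps.org 0.9 namespace; this isn't how any particular repository package does it.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records):
    """Build sitemap XML from (abstract_url, fulltext_url) pairs.

    Both URLs are listed, so a crawler is pointed at the full text
    as well as the abstract page.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for abstract_url, fulltext_url in records:
        for location in (abstract_url, fulltext_url):
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = location
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical repository URLs, for illustration only.
records = [
    ("http://repository.example.org/123/",
     "http://repository.example.org/123/01/paper.pdf"),
]
sitemap_xml = build_sitemap(records)
print(sitemap_xml)
```

Whether listing both URLs helps or simply splits the Google-juice between them is, of course, exactly the issue raised in the second question.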

Overall, what I conclude from this (once again) is that it is not the act of depositing a paper in a repository that is important for open access, but the act of surfacing the paper on the Web - the repository is just a means to an end in that respect.  More fundamentally, I conclude that the way we configure, run and use repositories has to fit in with the way the Web works - not work against it or around it!  First and foremost, our 'resource discovery' efforts should centre on exposing the full text of research papers in repositories to search engines like Google and on developing Web-friendly and consistent approaches to creating hypertext links between research papers.

October 26, 2006

e-Portfolios: an overview of JISC activities

A short JISC briefing paper entitled e-Portfolios: an overview of JISC activities has been made available by Lisa Gray and Sarah Davies (apparently since the 5th of October, though I only saw it announced yesterday - hence this post).

The paper gives a useful overview of JISC-funded activities in this area (though a few more links to project Web sites would have been nice).  In particular, the paper lays out three main functions of e-portfolios as enablers of:

  • presentation - providing supporting material for admission to study or job, induction, appraisal or assessment;
  • transition - supporting learners as they move between and across institutions and sectors; and
  • learning - enabling personal and reflective learning, guiding and developing learning (both formal and informal) over time in education, training and employment.

October 25, 2006

The R word

I've said this before and I'll probably say it again... in the context of the general move towards making scholarly research papers available on an open access basis, repositories are just a means to an end.  They are one way of making such material available on an open access basis, but they are not the only way.  A repository is just a content management system by another name - though, admittedly, one where the content of interest is a collection of scholarly papers.

The important point, at least as far as open access is concerned, is not that such papers are deposited into a repository but that they are made freely available on the Web.

It surprises me therefore that the 'repository' word tends to feature quite prominently in semi-legalistic documents such as the position statements from the UK research councils and the new JISC/SURF model agreement for authors.  Take the model agreement (announced earlier today) as an example.  It says:

The Author retains all other rights with respect to the Article not granted to the Publisher and in particular he can exercise the following rights:


To upload the Article or to grant to the Author’s own institution (or another appropriate organisation) the authorisation to upload the Article, immediately from the date of publication of the journal in which the Article is published (unless that the Author and the Publisher have agreed in writing to a short embargo period, with a maximum of six (6) months):
a) onto the institution’s closed network (e.g. intranet system); and/or
b) onto publicly accessible institutional and/or centrally organised repositories (such
as PubMed Central and other PubMed Central International repositories), provided
that a link is inserted to the Article on the Publisher’s website.

Why does this document refer specifically to the publisher's 'website' and the institution's 'repository'?  It could just as easily have referred to the publisher's 'content management system' and the institution's 'website'.  Who cares?  It's not like there's a legal definition of what a repository is anyway?

What the author needs is the right to make their research freely available on the public Web, not specifically the right to deposit it into a repository.

Note that this is not to take issue with the overall thrust of the model agreement, which is something that in general I agree with.  What worries me is whether we are fixated, in a somewhat unhealthy way, with the word 'repository'.

Real vs. fake sharing

There was an interesting post in the O'Reilly Radar recently, commenting on the notion of real sharing vs. fake sharing, an idea first proposed in Lawrence Lessig's blog and one that I would characterise (somewhat simplistically) as being the difference between making stuff available for re-use vs. making stuff available only to read or view.

By way of example, I always do my best to make my Powerpoint slides available on the Web  - though not necessarily in a particularly timely fashion! :-)  Often, this is done as part of the record of the meeting at which the slides were presented.  It also gives people who were unable to attend the meeting a chance to see what was being said.  More fundamentally though, it is about allowing people to re-use the material in those presentations in whatever way they see fit.

Now, I'll happily accept that I haven't been very explicit about this more fundamental intent - and it could easily be argued that I should have made an appropriate Creative Commons licence clearly visible on the opening slide of each presentation or somesuch.  Nonetheless, sharing my slides with re-use in mind was certainly part of the plan.  (Whether anyone actually sees any value in re-using those slides is, err... a different matter!)

Readers of this blog will know that we have recently been experimenting a little with Slideshare, as an alternative way of making our Eduserv Foundation Powerpoint slides available.  However, one of the issues with this service is that it doesn't provide a way of downloading the original PPT or Open Office file.  Yes, we can use the service to provide a nice view of our slides.  Yes, those slides can be embedded into someone else's Web site.  But the slides themselves can't easily be re-used.  It's an example of fake sharing I think.

As a result, it currently seems better to stick with a local Eduserv Web page for each of our presentations (such as the one for my recent CETIS presentation), containing both a link to the PPT file and an embedded version of the Slideshare viewer.

It would also be nice if Slideshare offered this functionality directly and looking around at various blogs, there are at least some hints that it might be included in future versions.  I'm keeping my fingers crossed...

October 24, 2006

Videoconferencing in education

The latest eLearning for Education Bulletin from eMedia (a largely marketing-type email newsletter, but one which is mildly useful for seeing what is currently attracting attention) carries a short item about using the Marratech videoconferencing system in education.  It notes that Marratech have been awarded a contract to supply 3000 Scottish schools and 800k users with their conferencing and collaboration solutions.

Apart from wondering what 3000 Scottish schools are actually going to do with their shiny new videoconferencing tool, the item caught my eye because we use Marratech here occasionally for the Dublin Core / IEEE  LTSC Taskforce meetings (courtesy of the system installed at the Royal Institute of Technology in Sweden where Mikael Nilsson is based).  It's a nice system, very easy to use, with a freely downloadable dedicated client, embedded shared whiteboard and chat facility - pretty much as you'd expect.

In the past, we also played briefly with the Open University's FlashMeeting - a browser-based videoconferencing tool with the unusual feature of having to electronically raise your hand before speaking.  It sounds odd, but actually is very effective as a way of structuring the meeting once you get used to it, particularly where there are a lot of participants.

I have wondered whether people in higher and further education might benefit from centralised provision of one or other of these tools (or their equivalent) in order that there is a lightweight  and free-at-the-point-of-use shared on-line space for videoconferencing meetings in the way that there is a shared space for mailing lists (JISCMail). Politically though I suspect it would be hard to argue for this given the research community's emphasis on the Access Grid to date, a system that has singularly failed to reach the desktop in any real way as far as I know.

Some time ago I wrote a short piece for Ariadne looking at the use of VRVS - an open source videoconferencing tool that can also interface to Access Grid rooms. At the time, Brian Kelly at UKOLN asked me why I was promoting a tool with such a poor user-interface.  Looking back, he was absolutely right - as a tool it doesn't compare at all with the equivalent commercial offerings in terms of usability.  (Worried about being unfair to VRVS in writing this blog item, I went back to it today to see if things had improved but I'm afraid to say they haven't - for example, VRVS still features terms like Mbone, VIC and RAT very prominently in its user-interface).

Of course, one can do a lot with the likes of Skype and MSN these days but I still think there are scenarios where Marratech (or tools like it) bring significant benefits.

October 21, 2006

Identity management and Learning 2.0

David Recordon of Verisign has uploaded his Enabling Digital Identity slides (PDF) for the 2006 DC PHP Conference to the OpenID Web site.  The slides are largely about OpenID, but there's some contextual information about the wider identity landscape which is very helpful - making the audio of David's presentation available as well would be even better (hint, hint!).

I realise that I've mentioned OpenID in a couple of recent posts, so it probably looks like I'm pushing it as a solution.  The truth is that I don't think I understand the issues well enough to really comment yet.  But I do think it is an interesting development - and it's interesting because it helps to highlight some aspects of the transition that the UK academic community is currently making in terms of access and identity management.

Think back 10 or 12 years or whatever it was.  The university community had a pressing need for single sign-on (in some shape or form) to the growing number of external information resources being made available on-line.  The chosen solution was Athens.  Now, I defy anyone to argue that Athens has not been hugely beneficial to the UK higher and further education community and beyond in the intervening years as an enabler of access to content.  And it remains so.  Yet it was a completely centralised solution.  At the time, there was probably no real architectural alternative for practical reasons.  As a community, the JISC took centralised responsibility for our access and identity management needs (at least as far as access to external bibliographic and related resources was concerned) and sub-contracted the delivery of a solution to Eduserv (or one of its previous incarnations) in the form of Athens.

Over the last few years we have seen the beginnings of a more distributed approach by enabling integration between Athens and institutional identity solutions (LDAP directories or whatever) in the form of AthensDA.  More fundamentally, we are now seeing concerted attempts to move UK educational communities to a federated access management approach, through the use of SAML-based technologies (Shibboleth) and the UK Access Management Federation for Education and Research.  In short, we are seeing a transition from centralised to federated, with UK academic institutions taking responsibility for the delivery of their members' public identities.

But what of 5 or 10 years' time?  Well, it seems to me that strategically the current federated situation is just a stepping stone on the road to a completely devolved identity landscape.  On-line identity will become an individual's responsibility and we will see a corresponding shift from a federated to an individualised approach.  Expecting students to turn up at a university in 10 years' time and be told that they've got to use the identity provided by their institution will be as anachronistic as expecting today's students to turn up and use their institutional email account.  (OK, universities might still just about get away with mandating use of their email accounts - but surely not for much longer?).  Students (and staff) will expect (perhaps even demand?) to be able to use the same on-line identity that they use for everything else - the one they used at school and elsewhere before going to university and the one that they will use afterwards.  Academic institutions will be users of identity services, but they will not be identity providers.

This was brought home to me very clearly in the paper that Scott Wilson mentioned in his blog some time back.  It's a paper that looks at the notion of personal learning environments and is one that is well worth reading.  The paper considers the use of Web 2.0 social tools as a way of supporting learning in what has come to be known as Learning 2.0.  As I read that paper I couldn't help but ask myself, "How does Shibboleth help enable this kind of environment?".  I'm not sure that it does?

Image: Flowers in a gite in Pontivy, France (post-processed using something or other by Stan, my youngest son). [July 2006]

October 18, 2006

Item banks and access control

At the CETIS SIG meeting about item banks last week there was some discussion about access control.  Despite fairly widespread agreement that item banks are just repositories (as these terms tend to get used in the context of JISC discussions at least), it became clear that one of the defining factors of item banks is that access typically needs to be controlled more tightly than is normally the case with open access repositories.  This is particularly true of item banks that are used as part of summative assessment - clearly it is important to know that people won't have had sight of questions until the actual exam.

To complicate matters there is therefore a need for both access control and timed release of questions - but let's leave that aside for the moment and focus on how access should be controlled.

In my slides about the relationship between the JISC Information Environment and item banks, I included a single bullet point that somewhat simplistically said, "if you need to control access, use Shibboleth".  This is in line with current JISC guidance - but on questioning I realised that I was struggling to explain the details of what it meant in practice.  Having now thought about it some more, and taken advice from a few colleagues at Eduserv, I've had a go at trying to clarify what I really meant in the text below.

The following text provides practical guidance on how to implement access control in front of an item bank (or any other kind of digital repository for that matter):

Where the item bank is only intended to be used by members of a single institution, use whatever single sign-on mechanism is in use within that institution (e.g. LDAP-based authentication).  Having said that, it is probably worth noting that intra-item banks (only used within a single institution) are likely to become extra-item banks over time (shared between a collaborating group of institutions) or even inter-item banks (shared openly) because of the changing nature of collaboration between institutions in education.  It may therefore make sense to implement an item bank with sharing in mind (see below), even if short term usage remains closed.

Where the item bank is intended to be used by members of more than one institution (for example, where a group of institutions are collaborating on a single item bank), a 3-step Shibboleth approach should be adopted:

1) Implement a Shibboleth Service Provider (SP) in front of your item bank application software.  An SP is a deployment of SAML software that validates attributes (assertions) issued by Shibboleth Identity Providers (IdP) and uses them to create a security context that assists in the enforcement of access control based on those attributes.  Typically the SP is embedded as an Apache module (or equivalent for other Web server platforms).

2) Join the UK Access Management Federation for Education and Research (which in practice means joining the SDSS federation at the moment - though the transition from SDSS to the UK Federation will apparently be seamless).

3) Configure access control policies in the item bank software, based on the attributes passed via the SP.

This will allow members of any Shibbolised institution (i.e. any institution that has implemented a Shibboleth IdP) to gain access to your item bank (if your policies allow it).  Unfortunately, in the short term that doesn't mean much - there aren't many Shibbolised institutions in the UK yet, though that will presumably change over time.  Luckily, it will also allow access to your item bank by any existing Athens users (by virtue of the Athens to Shibboleth gateway that is intended to go into beta service later this month).
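Step 3 above is the part that lives in your own item bank software, so here is a minimal sketch of what an attribute-based policy check might look like.  Everything in it is an assumption made for illustration: the "affiliation" attribute name, the semicolon-separated multi-value format, the example.ac.uk scopes and the may_access function.  Check what your SP is actually configured to pass through before relying on any of them.

```python
# Hypothetical policy: only staff at the two collaborating institutions
# may see the items.  The scopes are invented for illustration.
POLICY = {
    "allowed_affiliations": {
        "staff@uni-a.example.ac.uk",
        "staff@uni-b.example.ac.uk",
    },
}


def may_access(attributes, policy):
    """Decide access from the attributes handed over by the SP.

    `attributes` is assumed to be a mapping of attribute name to a
    string, with multiple values separated by semicolons.
    """
    raw = attributes.get("affiliation", "")
    held = {value.strip() for value in raw.split(";") if value.strip()}
    # Grant access if the user holds at least one allowed affiliation.
    return bool(held & policy["allowed_affiliations"])


staff = {"affiliation": "member@uni-a.example.ac.uk;staff@uni-a.example.ac.uk"}
student = {"affiliation": "student@uni-b.example.ac.uk"}
print(may_access(staff, POLICY), may_access(student, POLICY))
```

The point of keeping the check this small is that the SP has already done the hard work of validating the assertions; the item bank only has to decide what the attributes mean for its own content.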

There are various options for implementing the SP and IdP code and the MATU Web site provides additional information.  It's worth remembering that adopting Shibboleth follows current JISC recommendations, but it isn't the only access and identity management kid on the block.  Various players are moving into this area - with OpenID looking interesting, particularly from the point of view of Web 2.0.

Eduserv have recently announced a new software toolkit called Atacama, currently available as a beta release (note that Atacama is only currently available to existing Athens sites and Data Service Providers (DSPs)).  One advantage of the Atacama approach is that Eduserv will use it to provide support for multiple identity and access management architectures.  Atacama contains a Shibboleth-compatible module allowing it to interact with any Shibboleth IdPs.  In addition it also supports OpenID, geoIP and SAML2, allowing easy plug-in of different authN/authZ modules as required.  What is not yet clear is whether Atacama will be released on an Open Source basis.

October 16, 2006

Using games in education

I've just got round to reading the final report of Teaching with Games - a one year FutureLab project supported by Electronic Arts, Microsoft, Take-Two and ISFE.  Questions over whether it is appropriate for the games industry to directly support this kind of research aside, it seems to me that the key findings listed in the executive summary are worth seeing written down somewhere - though I don't suppose that any of them are particularly surprising or controversial?  It is at least reassuring to note:

Using games in a meaningful way within lessons depended far more on the effective use of existing teaching skills than it did on the development of any new, game-related skills.

And there was me thinking that I might become a great teacher just by being a whizz at Rollercoaster Tycoon! :-)

The Eduserv Foundation funds some work in this area - a project by Diane Carr, Eduserv Research Fellow at the Institute of Education, called Digital technology: learning and 'game formats'. Computer games, motivation, and gender in educational contexts. Diane's research aims to investigate the actual and potential benefits of computer game formats in educational settings, looking at issues like:

Why are computer games so captivating? Are computer games intrinsically motivating? What of 'bad' games? What of the relationship between learning-to-play, structure, and content in games? How do context and gender shape players' motivation?

For those wanting to get a good overview of this area, the slides presented by John Kirriemuir at the Ticer (digital library) Summer School (warning: large PPT file) earlier this year are pretty good.  There is a particular focus on the use of games in libraries, but actually the presentation is of more general interest I would say - I found it quite a useful way of getting up to speed.

October 13, 2006

e-Portfolio and access management

(Found via the OpenID mailing list).

For those with an interest in e-portfolios and access management there are snippets of news coming out of the ePortfolio 2006 event in Oxford - including reports of a demonstration of Elgg, Moodle and Drupal working together using OpenID, which sounds interesting (though I couldn't find much in the way of technical detail yet).

More generally, various of the presentations from the conference are now available, including some commentary from delegates and various video interviews.

IBM SOA videos on YouTube

I learned yesterday that three IBM videos introducing the notion of Service Oriented Architecture are available on YouTube (though they've been there for a while).

I wouldn't get too excited - not that there's anything particularly wrong with them.  I watched the first and now know that SOA is like my wardrobe - allowing me to pick and mix my clothes at will.  Unfortunately I tend to throw my clothes on the floor most of the time - which probably means I've got Wardrobe 2.0 ?

October 12, 2006

Our collective carbon footprint

I mentioned in one of my early postings to this blog that I had some reservations about the amount of travel that I do in order to attend meetings, and the resulting size of my personal carbon footprint.

Having just come back from DC-2006 in Mexico and now sitting in Glasgow airport awaiting my return flight to Bath following the CETIS Assessment and Metadata and Repositories SIG meeting, I'm feeling particularly guilty!  We are used to the notion of cost / benefit analysis - though I'm not sure how well we apply it to the kinds of digital library and elearning meetings that we attend.  Doing such an analysis is difficult, since many of the benefits of face to face meetings are relatively intangible, and often only bear fruit sometime downstream of the actual meeting.  But I'm wondering whether we, as a community, also need to factor in the "carbon cost" of our activities?  One obvious way of reducing our impact on the planet would be to make more use of virtual meetings than we do currently.

Two coincidental happenings occurred to spark this particular post.  Firstly, a seminar at the International Centre for the Environment at the University of Bath earlier this week entitled "The Case for Personal Carbon Allowances" by Dr Tina Fawcett from the University of Oxford.   A personal carbon allowance essentially gives each individual a quota of "carbon points", charging them for any over-use, but allowing them to sell on any part of their quota that they don't use to someone else.  It's an alternative form of taxation - intended to hit hardest those who contribute most to the problem.  An interesting idea, though I'm not sure how well it would work in practice and unfortunately I missed the seminar because of prior commitments.

Secondly, an article in last Saturday's UK Guardian questioning the value of carbon offsetting as companies jump on the bandwagon of becoming "carbon neutral".  The paper version of the Guardian report includes some indications of typical carbon emissions.  Running a car for a year (average mileage) produces 3 tonnes of carbon for example.  A return flight from London to Sydney produces 5.6 tonnes.

Using the Terrapass emissions calculator shows that my two DCMI trips so far this year (Seattle, US and Colima, Mexico) have produced over 8 tonnes of CO2 - that's equivalent to running my car for nearly 3 years!  Frightening.
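For the record, the sums behind that comparison are below.  This is only a sketch using the figures quoted above, and it assumes that the Guardian's "tonnes of carbon" and Terrapass's "tonnes of CO2" are measuring the same thing, which may well not be the case.

```python
# Figures quoted in the text above.
CAR_TONNES_PER_YEAR = 3.0    # Guardian: a car at average mileage
SYDNEY_RETURN_TONNES = 5.6   # Guardian: London to Sydney return flight
TRIPS_TONNES = 8.0           # Terrapass: "over 8 tonnes", so a lower bound

car_years = TRIPS_TONNES / CAR_TONNES_PER_YEAR
sydney_returns = TRIPS_TONNES / SYDNEY_RETURN_TONNES

print(f"{car_years:.1f} car-years, {sydney_returns:.1f} London-Sydney returns")
```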

I'm not sure that I'm ready to draw any hasty conclusions yet - hey, I enjoy travelling as much as the next person!  But it certainly gives pause for thought.

Image: View over Greenland [May 2006]

October 11, 2006

The importance of being open

A copy of October's EDINA Newsline floated onto my desk today and I happened to read their update on the progress of JORUM - JORUM delivers valuable resources.

Chicken trussing aside, it is hard not to be tempted to make simplistic comparisons between JORUM and Slideshare and the other social tools - all of which seem to gather momentum at frightening speed.  I know I'm not comparing like with like and I know I'm being unfair in a way - but I wonder how the 1200 resources deposited into JORUM over the last 11 or so months compare with the rate of presentations being deposited into Slideshare currently (even while it is still in beta)?  I briefly tried looking for some statistics about the rate of takeup of Slideshare and failed - but looking at the turnover of new presentations on the homepage indicates a pretty healthy pattern of usage.

So, what are we doing wrong?  If anything?

I was preparing my slides for the CETIS Metadata SIG meeting about Item Banks earlier today, which caused me to stop and think a little about the similarities and differences between the JISC Information Environment and Web 2.0.  It seems to me that we got a lot right with the JISC IE (and when I say we, I really mean Lorcan Dempsey and Robin Murray and various others who did a lot of the early thinking before I got involved).  I spent a lot of time around the turn of the century evangelising the importance of machine to machine interfaces and being able to glue things together across the network - at a time when many people were only really interested in getting everyone to visit their Web site or portal.

The JISC IE encouraged an open approach but, looking again at the Web 2.0 Meme Map, what it failed to do was successfully encourage participation in the way that Web 2.0 social tools manage to do.  Why?  I don't know.  Perhaps I simply didn't do a very good job, or perhaps the world wasn't ready for that way of thinking?  Paul Miller does a great job of talking up Library 2.0 (and Library 2.0 is the logical conclusion of where the JISC IE was going I think) but I don't really know if it is having a real impact on your average library service even now?

To a certain extent I think we fell foul of being too rigid in the use of a particular set of standards - some of which are not very RESTful in their approach.  I end my Item Bank talk with two slides.  The first gives my view of how Item Banks should be delivered in terms of the JISC IE technical architecture; the second does the same in terms of Web 2.0 (or my interpretation of it).  There are some similarities, but also some differences.  I wonder what, if anything, we should learn from that?

More generally, I also wonder if we don't always 'trust our users' and value their 'right to remix' - both of which it seems to me are key principles in what makes Web 2.0 a success.  Does JORUM's current registration process indicate a trust in the end-user?  Slideshare warns me not to upload copyright material, but leaves it at that - perhaps they've got better lawyers than JORUM!?

Image: slide taken from How to tell the Birds from the Flowers by Robert Williams Wood.

October 09, 2006


Doh!  As soon as I go and buy an Eduserv Foundation Flickr account in order to experiment with uploading copies of all our Powerpoint slides (see some of my previous blog entries for examples), someone goes and invents a new service dedicated to that task - Slideshare.  Looks interesting, though it is still in beta and getting an account is by invite only at the moment.  Fortunately, it seems to be possible to invite yourself!

I haven't played with it much yet (here is a link to the only set of slides that I've uploaded so far), but the interface looks pretty slick and it certainly seems to beat exporting every slide as a JPEG and then bulk uploading them all to Flickr as I was doing before.

October 08, 2006


I have been an enthusiastic user of the del.icio.us social bookmarking service for quite a long time (my first entry was on 30 July 2004, apparently). I have looked at other similar systems over the years but there's something about the clarity and simplicity of del.icio.us that appeals to me, and my del.icio.us collection is indispensable to me. I use del.icio.us almost every day, to add new entries and to retrieve references from my own collection, but also occasionally to browse the collections of colleagues and friends who I know share similar interests.

One of the most common criteria by which I found myself wanting to retrieve entries was by author: I wanted to find bookmarks in my collection for items that were created by Tim Berners-Lee or Roy Fielding. In my early posts to del.icio.us, I captured this using a "structured tag" approach in which I used a tag of the form "creator_FamilynameFirstname" to capture this information. So items by the two authors above were tagged with the tags "creator_Berners-LeeTim" and "creator_FieldingRoy".

Some time later, I came across GeoTagging, a set of conventions for using tags to add geographical identification metadata, usually the latitude and longitude of a location associated with the described resource, so that the resource can be found using a location-based search or the data otherwise processed using location-based services. Geotagging incorporates the use of a single geotagged tag, to signal that the convention is being applied in the current set of tags, and a set of structured tags of the form geo:xyz=nnnnnn, which serve as attribute-value pairs, where geo:xyz is the "qualified name" of an attribute defined by the Geotagging specification and nnnnnn is the attribute value provided by the person creating the entry.

So, for example the set of tags

geotagged geo:lat=51.4989 geo:lon=-0.1786

indicates firstly that the Geotagging convention is being applied and secondly that the resource described has some association with the place at latitude 51.4989 and longitude -0.1786. (Recently Flickr implemented a system whereby users can apply geotags to their images by selecting a location on a map, rather than having to determine latitude and longitude by some other means and enter the tags by hand.)
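To make the convention concrete, here is a minimal sketch of how a program might read such a tag set. The function name `parse_geotags` is hypothetical, and the sketch assumes tags arrive as a single space-separated string; it is an illustration, not part of the Geotagging specification or of any service's API.

```python
def parse_geotags(tags):
    """Return (lat, lon) if the tag string follows the Geotagging convention, else None.

    Assumes tags are space-separated, as in the example above.
    """
    tag_list = tags.split()
    # The plain "geotagged" tag signals that the convention is in use.
    if "geotagged" not in tag_list:
        return None
    values = {}
    for tag in tag_list:
        # Structured tags are attribute-value pairs of the form name=value.
        name, sep, value = tag.partition("=")
        if sep and name in ("geo:lat", "geo:lon"):
            values[name] = float(value)
    if "geo:lat" in values and "geo:lon" in values:
        return values["geo:lat"], values["geo:lon"]
    return None


print(parse_geotags("geotagged geo:lat=51.4989 geo:lon=-0.1786"))
# prints (51.4989, -0.1786)
```

Requiring the `geotagged` marker before reading the `geo:` tags mirrors the convention described above: without it, a tag that merely looks like `geo:lat=...` is not treated as a coordinate.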

So based on the Geotagging approach, I switched to using a convention I've informally called "dctagging" where I apply a tag dctagged and then use structured tags of the form dc:xyz=sssss, where each such tag represents (using the terminology of the DCMI Abstract Model) a statement using a property from the Dublin Core metadata vocabularies, with the property URI represented by the "qualified name" dc:xyz (actually, I use names of the form dcterms:xyz as well) and the value string represented by the sssss part of the tag. So for items created by Tim Berners-Lee, I use the tag combination

dctagged dc:creator=Berners-LeeTim

which enables me to retrieve bookmarks in my collection for items created by Berners-Lee. Obviously this relies on a shared convention for the construction of "value strings" (the sssss part of the tag), and it would be more difficult to achieve that across the collections of multiple users.
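The retrieval described here can be sketched in a few lines. Everything below is illustrative: the function names, the `BOOKMARKS` dictionary and the example.org URLs are hypothetical, and the sketch assumes each bookmark's tags are held as one space-separated string.

```python
def parse_dctags(tags):
    """Return (qualified name, value string) pairs from a "dctagged" tag string."""
    tag_list = tags.split()
    # The plain "dctagged" tag signals that the convention is in use.
    if "dctagged" not in tag_list:
        return []
    statements = []
    for tag in tag_list:
        name, sep, value = tag.partition("=")
        prefix, colon, local = name.partition(":")
        # Accept only dc:xyz=sssss and dcterms:xyz=sssss tags.
        if sep and colon and prefix in ("dc", "dcterms") and local and value:
            statements.append((name, value))
    return statements


def items_by_creator(bookmarks, value_string):
    """Return the URLs whose tags include dc:creator=<value_string>."""
    return [
        url
        for url, tags in bookmarks.items()
        if ("dc:creator", value_string) in parse_dctags(tags)
    ]


# Hypothetical collection: URL -> space-separated tags.
BOOKMARKS = {
    "http://example.org/item-1": "dctagged dc:creator=Berners-LeeTim web",
    "http://example.org/item-2": "dctagged dc:creator=FieldingRoy rest",
    "http://example.org/item-3": "metadata",
}

print(items_by_creator(BOOKMARKS, "Berners-LeeTim"))
# prints ['http://example.org/item-1']
```

The match is an exact string comparison, which is why the shared convention for value strings matters: "Berners-LeeTim" and "BernersLeeTim" would be treated as different creators.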

I have made a few uses of other DCMI properties e.g.

dctagged dc:publisher=DCMI

but for my own purposes of retrieval, I've tended to make use mainly of the DC "creator" and "contributor" properties. Fortunately del.icio.us incorporates a global tag replacement feature where you can replace all the instances of one tag in your collection with instances of one or more other tags, so it was relatively easy to convert my existing data to the new conventions - though it does have to be done on a tag by tag basis through a form, so I still haven't converted all my existing tags.
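The mapping from the old convention to the new one is mechanical, as this small sketch shows. It is not how the del.icio.us replacement form works; `convert_legacy_tags` is a hypothetical helper that only illustrates which old tag becomes which new tags.

```python
LEGACY_PREFIX = "creator_"


def convert_legacy_tags(tags):
    """Rewrite creator_FamilynameFirstname tags as dc:creator=FamilynameFirstname.

    Adds the "dctagged" marker when at least one tag was converted.
    """
    converted = []
    changed = False
    for tag in tags.split():
        if tag.startswith(LEGACY_PREFIX) and len(tag) > len(LEGACY_PREFIX):
            converted.append("dc:creator=" + tag[len(LEGACY_PREFIX):])
            changed = True
        else:
            converted.append(tag)
    if changed and "dctagged" not in converted:
        converted.insert(0, "dctagged")
    return " ".join(converted)


print(convert_legacy_tags("web creator_FieldingRoy rest"))
# prints dctagged web dc:creator=FieldingRoy rest
```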

The next step, I suppose, would be to produce an algorithm to extract this DC metadata from one of the XML formats exposed by del.icio.us and to make use of it as DC metadata description sets - but that is a job for some rainy Sunday afternoon back in the UK, not a hotel room in Mexico City!
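A first pass at that extraction might look like the sketch below. The XML shape is an assumption made for illustration (a `posts` element containing `post` elements with `href` and space-separated `tag` attributes); the real export format should be checked before relying on it. The "description" produced here is just a list of (property name, value string) pairs per URL, a deliberate simplification of a DCMI Abstract Model description set.

```python
import xml.etree.ElementTree as ET

# Assumed, illustrative XML shape; element and attribute names are not verified
# against any real del.icio.us export.
SAMPLE_XML = """<posts user="example">
  <post href="http://example.org/paper-a" description="Paper A"
        tag="dctagged dc:creator=Berners-LeeTim dc:publisher=DCMI web"/>
  <post href="http://example.org/paper-b" description="Paper B"
        tag="rest architecture"/>
</posts>"""


def extract_descriptions(xml_text):
    """Map each dctagged bookmark URL to its list of (property, value) statements."""
    descriptions = {}
    for post in ET.fromstring(xml_text).iter("post"):
        tags = post.get("tag", "").split()
        # Skip bookmarks that do not declare the dctagging convention.
        if "dctagged" not in tags:
            continue
        statements = []
        for tag in tags:
            name, sep, value = tag.partition("=")
            if sep and value and name.startswith(("dc:", "dcterms:")):
                statements.append((name, value))
        descriptions[post.get("href")] = statements
    return descriptions


for url, statements in extract_descriptions(SAMPLE_XML).items():
    print(url, statements)
```

Turning the qualified names into full property URIs, and the value strings into something other than opaque strings, would be the remaining work for that rainy Sunday.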

On openness

I had started to compose this post four days ago and a few hundred miles away, on the morning of the first day of DC-2006, the conference of the Dublin Core Metadata Initiative, held in Manzanillo, Mexico, on which Andy has already reported. But a combination of the customary whirl of conference activity (somehow in between giving presentations and joining in subsequent discussions, there didn't seem to be much time to write about what was going on) and the intermittent access to the wireless network (the rumour was that the bandwidth wasn't sufficient to meet demand once the Skypesters got going) means that I find myself finishing it only now.

Our outward journey from the UK was unexpectedly extended (there's an opportunity here for a Top 5 Most Confusing Airports list meme, but I think that would be another post - suffice to say that Benito Juarez Airport, Mexico City is right up there on my list), and before the conference proper got under way with its mix of presentations of papers and topic-focussed "special sessions" and meetings of working groups, I had already spent three days in meetings, first of the DCMI Usage Board (as a guest), and then of the DCMI Advisory Board (as a member). So it was a long and at times demanding week, but also, I think, a rewarding and interesting one.

Like Andy, I was pleased to see so many references to the DCMI Abstract Model at the conference, and probably for the first time, I got a sense that it has become "embedded" in the activity of DCMI; awareness and understanding of the DCAM now extends beyond the constituencies of the DCMI Architecture Working Group (who were most closely involved in its development) and the Usage Board (who were probably its first "users"), and several working groups and implementers were citing it as their reference point.

The other topic I found myself pondering both before and during the conference was the question of how DCMI operates in a fast-changing world: how it manages to demonstrate its relevance in new contexts (the world of Google and "Web 2.0" and podcasts and YouTube is a long way from the world of the emerging Web of "document-like objects" in which Dublin Core itself was born, and yet metadata is at the heart of many of these new systems and services), how it engages with new communities, and indeed how it might encourage members of these communities to become active participants in the work of the DCMI.

Many individuals who were driving forces behind DCMI in the early years of its development have moved on to other areas of interest. Certainly it is true that new participants have come forward and joined working groups and task forces, many of them dedicating large amounts of time and energy to the work of the DCMI. But it seems that they are, for the most part, "people like us" - librarians and other information management professionals. Not that I'm complaining, y'understand - some of my very best friends are librarians and archivists - honest! But how do we reach the people working in areas of social bookmarking or microformats or designing metadata specs for Flickr? And telling them about Dublin Core and explaining its potential usefulness to them is only part of the challenge. How do we provide opportunities for them to feed their insights and experiences into DCMI? We need to make sure our processes and communities are open and accessible as well as our specifications. That is not to say that we abandon any of the rigour with which DCMI has approached the development of its specifications - far from it: there is much I see that concerns me in the bewildering array of ad hoc APIs and incompatible and less than clearly articulated data models that characterise at least some areas of the "Web 2.0" landscape - but there is undeniably a great dynamism and inventiveness there, which, it seems to me, we need to tap into if we are to continue to develop.

Perhaps another facet of this question - and one which becomes very obvious at conferences held around the globe - is what we do to make those processes and communities open to individuals whose first language is not English, but that is probably a topic for another post.

October 06, 2006

Big crashing sounds - I can hear you

It's been a good DC conference this year and I thought I'd better write at least one blog entry while I'm here in Mexico.

The week started badly: a technical fault with our plane in Paris meant a missed connection in Mexico City, an unexpected night in the capital and the need to Skype into the first day of the DC Usage Board meeting. I was travelling with Pete Johnston, who was attending the UB meeting as a guest in order to hear our somewhat negative deliberations on the proposed Collection Description Application Profile. The Skype connection worked just about OK – though network problems and a nasty echo at our end made for a less than ideal situation.  (The title of this post is based on Tom Baker's first words to us when we managed to get connected - which Windows subsequently decided to use as the title of our Skype 'chat' window).

But things got better from then on – in a hotel complex that was both conducive to getting some work done (though a more reliable wireless network would have been a bonus) and somewhat scary in the way it (literally) locked out the real Mexico.

Personal highlights for me included the DC Architecture working group meeting and the ePrints Special Session – both of which, it seemed to me, gave a significant endorsement of the DCMI Abstract Model, and the direction in which it is going. Tom Baker has now finalised a joint roadmap for the UB and the Architecture WG which should see new DCMI Recommendations for the Abstract Model, the Namespace Policy and the XML and RDF encoding guidelines by Easter 2007, following comment periods some time after Christmas. Several plenary papers also made positive reference to the abstract model – so I really think we're on to something with it.

At the Advisory Board meeting on Monday, I questioned DCMI's current standardisation activities on the grounds that they no longer give the right message about Dublin Core. I must admit that I felt awkward in raising this issue since I'm well aware that people have put significant effort into the standards activity within DCMI – but getting DC's unique selling proposition right is key to our success in the future and any standardisation activity is part of that.

In short, it seems to me that DC isn't about 15 metadata elements – or even the slowly evolving list of 40 or 50 metadata terms that we have now approved. Rather, DCMI is the framework provided by the Abstract Model – a framework that supports a wide variety of metadata descriptions, using properties selected from anywhere that is convenient, and encompassing description sets that comprise descriptions of whatever set of entities is important to the task at hand. Rather oddly, we are now in a position where a DC description is a DC description even if it doesn't use a single DC metadata term!

If we keep presenting DC as a flat list of elements that can only be used to describe single entity resources then it’s not surprising that people, like the librarians at LANL, will see it as not being expressive enough for their needs and will turn elsewhere. There is no real excuse for people reaching this kind of conclusion, other than DCMI's inability to promote the real strengths it has to offer.

Don't get me wrong, ten years ago reaching consensus about 15 (or 13 as it originally was) metadata elements deemed to be important for resource discovery on the Web was an inspired move – and paved the way for everything that has happened since.  But we no longer live in the world of ten years ago. In a resource discovery world typified by full-text indexing, hypertext link analysis and powerful user-behaviour monitoring, 15 elements are both too simple to support the richer functionality required in some contexts, and too complex for the general purpose case.

The Usage Board decided some time ago that the original 15 DC properties will be replicated in the DCTERMS namespace. For me, this is much more than a technical convenience. It is a clear way of stating that these are simply 15 metadata terms, just like any others. There is nothing special about them.

The most important thing about DCMI is the framework provided by the Abstract Model – that is what we should promote as the key brand of Dublin Core. And if we feel the need to turn to ISO or IETF to make DC into a standard, then it is the Abstract Model that we should focus on, not 15 somewhat fuzzy metadata properties on a ten year old conference tee-shirt.

Image: Sunrise over Manzanillo, taken from my hotel balcony. [October 2006]


