
September 28, 2006

DC-2006 Eprints Special Session slides

The slides for the special session about using DC to describe scholarly publications that Julie Allinson and I are presenting at DC-2006 in Mexico are now available in Flickr for anyone that's interested.

What I like about this work is not just that it makes good use of FRBR but also that it demonstrates the expressive power of the DC Abstract Model.  Well, I guess I would say that wouldn't I!  It'll be interesting to see what others think when we present this work in the session.

What came home to me very clearly while working on these slides is that work on any metadata application profile needs to start with an 'application model' - the set of entities that are going to be described and the key relationships between those entities.

Starting with the 15 elements in the DCMES and asking "How can I extend this?" is absolutely the wrong way to begin!

September 25, 2006

DC-2006 Basic Syntax tutorial

I'm giving a basic syntax tutorial on the second day of DC-2006 (I've put the slides on Flickr, just in case anyone is interested in what I'm going to be saying).  It's more or less the same tutorial that I have given at the last 3 or 4 DC conferences - though, of course, the content evolves slightly each year as DCMI Recommendations get updated (a process that sometimes feels like wading thru treacle - so I normally don't have to make too many changes to my slides from one year to the next).

My DC tutorials tend to follow a set pattern - I start with an overview of the DCMI Abstract Model, followed by details about how to encode DC in HTML and XML, finally touching on issues around expressing DC using RDF.  I also try to keep some slides in reserve about how DC metadata is used in the OAI-PMH and RSS 1.0, to use as case studies, just in case there's time at the end - but there never is.

This year I feel a bit frustrated, because the guidelines for encoding DC in XML and RDF are very much in a state of flux and will be the topic of discussions elsewhere in the conference.  I don't want to mislead, by giving a tutorial based on how we used to do DC! :-)  Therefore, this year, I'm going to drop any real discussion about how to represent DC using XML and RDF, which should allow me enough time to focus a little bit on the OAI-PMH and RSS case studies.

But life is never that simple!  The way in which DC is used in the OAI-PMH and RSS doesn't really fit very well with the way our thinking has now evolved.  In particular, the use of DC in RSS 1.0 is a prime example of using the dc:creator, dc:contributor and dc:publisher properties with literal values, whereas our thinking now leans towards making the range of these properties the class of all Agents - i.e. modelling the agent as a resource on which other properties can be hung.

However, making this change for the current DC properties would break the semantics of any existing RSS 1.0 metadata (and any other existing RDF metadata that uses the same convention).

Luckily, we are also thinking about replicating the 15 DCMES properties (the original DC elements) in the DCTERMS namespace.  One possible way forward, first suggested by Mikael Nilsson, is to only assign explicit ranges to the properties in the DCTERMS namespace, leaving the current DCMES elements in a state of blissful fuzziness.
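To make the difference between the two conventions concrete, here is a minimal, library-free sketch. It is illustrative only: the example.org URIs, the names, and the use of FOAF properties to describe the agent are assumptions, and the use of a creator property in the DCTERMS namespace simply follows the suggestion described above rather than any published DCMI recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain string value."""
    text: str


@dataclass(frozen=True)
class Resource:
    """Something identified by a URI, which can itself be described."""
    uri: str


DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"  # assumption: FOAF used to describe the agent

# Hypothetical feed item and agent URIs.
item = Resource("http://example.org/feed/item/1")
agent = Resource("http://example.org/people/jane")

# RSS 1.0 style: the value of dc:creator is just a string.
literal_style = [
    (item, DC + "creator", Literal("Jane Example")),
]

# Agent-as-resource style: the value is a resource, so further
# properties can be hung on it.
resource_style = [
    (item, DCTERMS + "creator", agent),
    (agent, FOAF + "name", Literal("Jane Example")),
    (agent, FOAF + "mbox", Resource("mailto:jane@example.org")),
]


def properties_of(subject, statements):
    """Return the properties used in statements about the given subject."""
    return sorted(p for s, p, _ in statements if s == subject)


# Nothing can be said about the creator when it is only a literal.
print(properties_of(agent, literal_style))
# With the agent modelled as a resource, it has its own description.
print(properties_of(agent, resource_style))
```

In the literal style there is nowhere to attach an email address or affiliation to the creator; in the resource style those are just further statements about the agent.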

The situation with DC in OAI-PMH is a little different.  Any software that processes oai_dc XML records (the DC XML record format exposed using the OAI-PMH) has to have an understanding of DC hard-coded into it anyway.  So even if the guidelines for representing DC in XML move away from our current DC XML conventions, the fact that we can make an unambiguous mapping from oai_dc to the DCMI Abstract Model means that this format will remain a useful exchange format.
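As a rough illustration of that mapping, the sketch below reads an oai_dc record and turns each child element into a (property URI, value string) statement about the described resource. The sample record and the function name are made up for the example, and this is a simplification rather than the normative mapping (it ignores xml:lang, for instance).

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical oai_dc record; the content is invented for illustration.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example eprint</dc:title>
  <dc:creator>Example, Jane</dc:creator>
  <dc:creator>Sample, John</dc:creator>
  <dc:identifier>http://example.org/eprints/1</dc:identifier>
</oai_dc:dc>"""


def oai_dc_to_statements(xml_text):
    """Map each DC child element to a (property URI, value string) pair."""
    root = ET.fromstring(xml_text)
    statements = []
    for child in root:
        # ElementTree spells qualified names as "{namespace}localname".
        if not child.tag.startswith("{" + DC_NS + "}"):
            continue  # skip anything that is not one of the DC elements
        local_name = child.tag.split("}", 1)[1]
        value = (child.text or "").strip()
        if value:
            statements.append((DC_NS + local_name, value))
    return statements


for prop, value in oai_dc_to_statements(SAMPLE):
    print(prop, "->", value)
```

The point is that the knowledge of DC lives in the code (the namespace check and the element-to-property rule), which is why the format stays usable even if the general XML guidelines change.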

DC-2006 Special session - ePrints Application Profile

At 2.00pm on Weds Oct 4th there is a DC-2006 special session about using Dublin Core to describe eprints (i.e. scientific or scholarly research texts, for example peer-reviewed journal articles, preprints, working papers, theses, book chapters, reports, etc.).

This session will be largely based on some JISC-funded work that Julie Allinson (UKOLN, University of Bath) and I have been doing to develop a Dublin Core application profile for describing scholarly publications.

This work uses a combination of FRBR and the DCMI Abstract Model to create a description set for an eprint that is much richer than the traditional flat descriptions normally associated with Dublin Core.  The intention is to capture some of the relationships between works, expressions, manifestations, copies and agents.

"DC metadata tends to be thought of as only being capable of describing flat, single-entity, constructs - a Web page, a document, an image, etc. However, the DCMI Abstract Model introduces the notion of a description set, a group of related descriptions, which allows it to be used to capture metadata about more complex sets of entities, using models like the one described here.

DCMI is currently developing a revised set of encoding guidelines for XML and RDF/XML, which will allow these more complex, multi-description, description set constructs to be encoded and shared between software applications."

This session may therefore be of interest, not just to people working with eprints, but to those who want to see how the DCMI Abstract Model can be used to support rich resource description and how FRBR and DC can be used in combination.
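For anyone who finds it easier to think in code, a description set of this kind can be sketched as a handful of linked descriptions. Everything here is a hypothetical placeholder: the "ex:" identifiers, the entity type labels and the relationship names (hasExpression, hasManifestation, hasCopy) are invented for the example and are not the terms used in the application profile itself.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    """One description: a set of statements about a single resource."""
    resource_id: str
    entity_type: str
    statements: list = field(default_factory=list)  # (property, value) pairs

    def add(self, prop, value):
        self.statements.append((prop, value))
        return self  # returned so that calls can be chained


# A description set: several related descriptions rather than one flat record.
description_set = [
    Description("ex:work/1", "Work")
        .add("title", "An example paper")
        .add("creator", "ex:agent/jane")
        .add("hasExpression", "ex:expression/1"),
    Description("ex:expression/1", "Expression")
        .add("version", "Author's final draft")
        .add("hasManifestation", "ex:manifestation/1"),
    Description("ex:manifestation/1", "Manifestation")
        .add("format", "application/pdf")
        .add("hasCopy", "ex:copy/1"),
    Description("ex:copy/1", "Copy")
        .add("location", "http://example.org/eprints/1/paper.pdf"),
    Description("ex:agent/jane", "Agent")
        .add("name", "Jane Example"),
]


def related(descriptions, resource_id, prop):
    """Follow a property from one description to the descriptions it points at."""
    by_id = {d.resource_id: d for d in descriptions}
    source = by_id[resource_id]
    return [by_id[v] for p, v in source.statements if p == prop and v in by_id]


for d in related(description_set, "ex:work/1", "hasExpression"):
    print(d.entity_type, d.resource_id)
```

A traditional flat DC record would have to squash all five of these descriptions into one, losing the distinction between, say, the date of the work and the date of a particular copy.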

This session will be part seminar, part discussion.  As well as sharing information about the work we have done so far, we also hope to have a discussion about forming a DCMI working group to take this work forward.  We will start with a presentation about the work:

  • Background, rationale and functional requirements
  • The model
  • The application profile and vocabularies
  • Dumb-down issues

The session will be led by Andy Powell (Eduserv Foundation) and Julie Allinson (UKOLN, University of Bath).

Road to Manzanillo

The Eduserv Foundation is pleased to sponsor this year's Dublin Core conference in Mexico, not least because this year's theme, "metadata for knowledge and learning", fits very well with our areas of interest.

We're looking forward to healthy debate, both at the conference itself and at the satellite meetings of the DC Usage Board and the DC Advisory Board, particularly around updates to the DCMI Abstract Model and the guidelines for encoding DC metadata in RDF and XML, the most recent versions of which can all be found in the DC Architecture Working Group Wiki.

Both Pete and I will be at the conference, presenting papers, giving tutorials (Basic Syntax) and running or contributing to various working group and special sessions (DC Collections, DC Architecture, Eprints special session, DCMI/LOM special session and the Education reports from the field special session).  As always, it looks like it is going to be a busy week.

We should also gratefully acknowledge the support of DCMI and JISC, both of whom have helped fund our travel costs for the conference.

September 20, 2006

Repository software evaluation

The OARINZ Project (Research Repositories in New Zealand) have published a Technical Evaluation of Selected Open Source Repository Systems - essentially a side-by-side comparison of Eprints, DSpace and Fedora.  I confess to not having read it in detail yet (I skipped straight to the conclusions :-) ) but it certainly looks like an interesting comparison.

On the face of it, it doesn't appear to go to the level of detail I suggested might be useful in my Notes about possible technical criteria for evaluating institutional repository (IR) software.  On the other hand, it probably takes a more realistic approach - when I wrote those notes I had the luxury of not actually having to undertake a real evaluation or make a real decision!

September 19, 2006

New URI schemes - just say no

A little thread has just emerged on the W3C URI mailing list, the conclusion of which (so far) can be summed up more or less as:

  • use http URIs to identify stuff, and
  • make it possible to dereference those http URIs to useful representations of the thing that is being identified.

These are sentiments that I very much agree with.  I've given presentations and written on the subject in the reasonably recent past (To name: persistently: ay, there's the rub, Persistently identifying website content and Guidelines for assigning identifiers to metadata terms), reaching much the same conclusion.

In his presentation about Public Resource Identifiers (linked from one of the messages in the thread), Steve Pepper suggests that the use of http URIs as identifiers is:

No longer subject to paralysing controversy

Yeah, right!  While I agree with most of his presentation, that particular statement doesn't tally with my experience - perhaps it's true in some alternative Utopian W3C reality?  After my presentation about using http URIs at the DCC Workshop in Glasgow, at least two people suggested that I was a "creativity stifling Luddite" (or nicer words to that effect!) for saying that "the only good long term identifier is a good short term identifier" and "the best short term identifier is the http URI".

Well, perhaps they were right... but I still don't feel like I've ever seen a convincing argument as to why, in the general case, we need to invent new URI schemes rather than simply make creative use of the existing http URI scheme.

The W3C draft document, URNs, Namespaces and Registries, lays out some of the reasons why people choose to develop new URI schemes, and offers counter arguments as to why they should think carefully before doing so.  Again, I very much agree with the general thrust of this document.  Inevitably there's a certain kind of comfort in inventing one's own solution to problems, rather than re-using what is already on the table, and I'm probably as guilty as the next person of doing so in other contexts.  But every time we do it, we need to be very clear that the benefits outweigh the costs of adoption and the possible damage done to interoperability.

September 18, 2006

40-year-old virgin

OK, so I'm a blog virgin... or pretty much at least, having made only one real blog entry so far.

I should of course rapidly point out that the 40-year-old bit in the title doesn't refer to my age unfortunately (though I am still, err, 40 something), but is a reflection on the fact that having not entered the blogging fray until now probably makes me the blogging equivalent of a 40-year-old virgin in the more usual sense.

But it is a real relief to have made a start and I hope that we can turn the Foundation blog into a useful resource over the next few years.

Some years ago, I remember discussing the then relatively new blogging phenomenon with Lorcan Dempsey.  I was trying to argue that blogging would kill email discussion lists, and as a result kill discussion in the community - because all the interesting people would stop having real [tm] discussions on lists, in favour of spending time writing their blogs.

At the time I used to hold up the CETIS SIG mailing lists as exemplars of everything that was healthy about email lists - and secretly I have long admired CETIS for the success that they've had at engaging with the community thru their mailing lists and f2f meetings.  But unfortunately, even the CETIS lists now seem to be going the way of other email discussions - few postings and only sporadic debate about issues (technical or otherwise).  Perhaps there are no burning issues any more :-)

The same seems to be true of the Dublin Core Working Group mailing lists.  The DC-2006 conference is happening in Mexico pretty soon and as chair of the Architecture working group I'm very conscious that there has been relatively little on-list discussion in the year since the last meeting - and that many attendees may well have seen or heard almost nothing in that time because a lot of the interesting stuff now seems to happen elsewhere within a relatively small group of people.  The Architecture working group isn't unusual in this respect.  All in all, it seems to me that this lack of wider engagement is one of the major issues that needs to be considered by the DCMI trustees and Advisory Board members when they meet in Mexico.

Whether or not I was right about the cause (and I accept that I probably wasn't), it does feel to me that many of our email discussion lists have stopped being the fertile space they used to be.  What worries me is whether they have been replaced by anything as productive.

September 16, 2006

Item banks as repositories

At the last CETIS Metadata meeting in Bath there was a presentation by the UKCDR project (to a somewhat embarrassingly small crowd) about item banks - a concept that was fairly new to me at the time, but one which I understood to simply mean a repository containing "items", where each item is a question and its associated information conforming to the QTI spec.

Towards the end, the presenters asked for ideas about what standards and machine-interfaces an item bank should support.  I volunteered an answer based largely on my work on the JISC Information Environment technical architecture.  As a result, I've been asked to present a longer version of my answer at the next meeting of the SIG in Glasgow.  The moral is probably "keep your head down unless you want to get asked to do stuff"!  Actually, I'm very happy to be asked, and I like Glasgow as a city - though I have to confess to a growing concern about my personal carbon footprint and flying the length of the country for a one day meeting probably isn't doing the world any favours, but perhaps I should leave that as the topic of another post.

Anyway, I digress... to get back to the main point, my answer at the time can be summed up pretty simply.  In short, an item bank is a repository and therefore it should support the same interfaces and standards that any other repository is expected to support - the OAI-PMH for harvesting item metadata, SRU for searching metadata and/or full-text, RSS for disseminating news feeds and other lists of items, HTTP for getting the items themselves, OpenURLs for any links to bibliographic resources and cool URIs as identifiers for everything that needs to be identified.
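To give a flavour of what two of those interfaces look like from a client's point of view, here is a small sketch that builds an OAI-PMH harvesting request and an SRU search request. The base URLs and the CQL query are hypothetical; the parameter names (verb, metadataPrefix, from, operation, version, query, maximumRecords) are taken from the OAI-PMH 2.0 and SRU 1.1 specifications as I understand them.

```python
from urllib.parse import urlencode


def oai_pmh_list_records(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for harvesting metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed since this date
    return base_url + "?" + urlencode(params)


def sru_search(base_url, cql_query, maximum_records=10):
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical item bank endpoints.
print(oai_pmh_list_records("http://itembank.example.org/oai", from_date="2006-09-01"))
print(sru_search("http://itembank.example.org/sru", 'dc.title = "fractions"'))
```

Nothing in either request is specific to item banks, which is really the point: a harvester or search client written for any other repository should work unchanged.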

Of course, life isn't necessarily that simple and I presume that there will have to be discussions about what kind of metadata (e.g. DC and/or UK LOM Core) should be exposed thru the OAI-PMH and SRU interfaces.  And, as with any repository, there are no widespread agreements in place yet about what API a repository should support to allow content to be deposited into it.

Furthermore, I guess it's also worth thinking about making sure that appropriate parts of the repository get into Google quickly by creating and registering a Google sitemap.  And last but not least, if access control is required, particularly outside of a single institution, then adding support for Shibboleth is probably the way to go.
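Generating a sitemap is not much work either. The sketch below writes one out from a list of item URLs; the URLs and dates are invented, and I've assumed the sitemaps.org 0.9 namespace, so check which schema version your target search engine expects.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for a list of (url, last_modified_or_None) pairs."""
    ET.register_namespace("", SITEMAP_NS)  # serialise with a default namespace
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = loc
        if lastmod:
            ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical repository item pages.
print(build_sitemap([
    ("http://itembank.example.org/items/1", "2006-09-16"),
    ("http://itembank.example.org/items/2", None),
]))
```

The list of URLs would normally come straight out of the repository's own database, so the sitemap can be regenerated whenever items are added.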

Overall, the list of things to support isn't that daunting, or so it seems to me.  More importantly, I would argue that this list can be applied to any 'repository', whatever the complexity or nature of the resources it contains.

Image: Looking up at the dome of the Museum of Glass in Tacoma.  Stu Weibel was good enough to spend some time showing me round the area south of Seattle last time I was over for a Dublin Core Usage Board meeting.  [May 2006]

September 11, 2006


Welcome to eFoundations, the Eduserv Foundation blog.

This is where we'll share our news, thoughts and general views on issues related to our areas of activity:

  • metadata, repositories and open access;
  • access and identity management;
  • service architectures;
  • effective elearning.

Andy Powell and Pete Johnston


