
January 30, 2008

Metadata Standards Harmonization

Mikael Nilsson announced earlier this week the availability of a document produced by the EC-funded ProLearn project, with the title Harmonization of Metadata Standards, edited by Mikael with contributions from Ambjörn Naeve, Erik Duval, David Massart & myself (though I have to admit my own direct input to this paper was quite limited!).

The document analyses a number of metadata standards and seeks to elucidate the principles and frameworks which underpin those standards, and to highlight that it is the differences and incompatibilities in those principles and frameworks which ultimately create obstacles to the development of systems working across multiple standards. Until we meet the challenge of addressing these contradictions, by "harmonizing" our metadata standards, the effective exchange of metadata instances between systems based on different standards will always be fraught with difficulty.

The paper concludes with a "manifesto" of concrete points of action for the harmonization of metadata standards generally, with specific reference to the case of the IEEE Learning Object Metadata (LOM) standard and Dublin Core, in five areas:

  • Identification: The use of URIs as globally scoped identifiers for metadata terms.
  • Abstract Models: The synchronisation of standards at the level of their abstract models, rather than through (complex, lossy) mapping between instances of different, often incompatible, abstract models.
  • Vocabulary Models: Closely related to the previous point (since the type of metadata term to be described is determined by features of the abstract model), the alignment of the ways "element vocabularies" are described, with a recommendation to use RDF Schema. (I think I would have liked to see a bit more qualification/elaboration of this point, and more emphasis on the dependency on an RDF-compatible abstract model: the solution isn't, IMHO, as straightforward as producing an RDFS property description corresponding to each "element" of a vocabulary which was constructed for use in the context of a tree-based model - my old "hobby horse" that a "LOM data element" is quite a different sort of thing from a "Dublin Core element".)
  • Application Profile Models: A shared understanding of what constitutes a metadata application profile.
  • Metadata formats: Syntaxes must be grounded in the abstract model(s): it is the model which drives the representation in a concrete syntax.
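To make the first and third points a little more concrete, here is a minimal sketch (using plain Python tuples rather than a real RDF toolkit) of what it means to identify a metadata term with a globally scoped URI and describe it using RDF Schema. The dcterms:audience property and its label/comment are taken from the DCMI vocabulary; the `describe` helper is my own illustration.

```python
# Namespace URIs for RDF, RDF Schema and the DCMI terms vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DCTERMS = "http://purl.org/dc/terms/"

# (subject, predicate, object) triples describing one metadata term.
# The term itself is identified by a URI, and the description uses RDFS.
triples = [
    (DCTERMS + "audience", RDF + "type", RDF + "Property"),
    (DCTERMS + "audience", RDFS + "label", "Audience"),
    (DCTERMS + "audience", RDFS + "comment",
     "A class of agents for whom the resource is intended or useful."),
]

def describe(term_uri, graph):
    """Collect every statement made about a term, keyed by predicate."""
    return {p: o for s, p, o in graph if s == term_uri}

description = describe(DCTERMS + "audience", triples)
print(description[RDFS + "label"])   # -> Audience
```

Because every term is a URI, two standards that both describe their "elements" this way can be compared property by property, which is exactly the kind of alignment the manifesto is after.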

The paper reprises and refines some of the themes that have been addressed in earlier papers (e.g. a paper at DC-2006 on metadata frameworks and a book chapter written around the same time), but I think it provides a nice distillation of those ideas, brings in some of the current context (including the sort of informal, "subjective" metadata surfaced in many "Web 2.0" contexts and Erik's recent work on "attention metadata"), and extends them to guidance to standards developers on some practical steps for action.

The paper concludes - and here I can hear the characteristically resilient and upbeat voice of Mikael, who is always keen to point out to me that the glass I see as half-empty is in fact half-full! :-) - as follows:

Together, these two initiatives [the IEEE LOM/Dublin Core harmonization effort and the Resource Discovery & Access (RDA) work in the library community], both of which include important contributions from ProLEARN members, demonstrate important progress towards harmonization of several important metadata domains – generic metadata using Dublin Core, educational metadata, and library metadata, as well as a widening from the all-digital domain to the domain of physical artefacts (books).

Harmonizing metadata specifications in the way outlined in this document seems an overwhelming task, but the steady flow of important developments still makes the future seem bright.

Learning Materials & FRBR

JISC is currently funding a study, conducted by Phil Barker of JISC CETIS, to survey the requirements for a metadata application profile for learning materials held by digital repositories. Yesterday Phil posted an update on work to date, including a pointer to a (draft) document titled Learning Materials Application Profile Pre-draft Domain Model which 'suggests a "straw man" domain model for use during the project which, hopefully, will prove useful in the analysis of the metadata requirements'.

The document outlines two models: the first is of the operations applied to a learning object (based on the OAIS model) and the second is a (very outline) entity-relationship model for a learning resource - which is based on a subset of the Functional Requirements for Bibliographic Records (FRBR) model. As far as I can recall, this is the first time I've seen the FRBR model applied to the learning object space - though of course at least some of the resources which are considered "learning resources" are also described as bibliographic resources, and I think at least some, if not many, of the functions to be supported by "learning object metadata" are analogous to those to be supported by bibliographic metadata.

I do have some quibbles with the model in the current draft. Without a fuller description of the functions to be supported, it's difficult to assess whether it meets those requirements - though I recognise that, as I think the opening comment I cited above indicates, there's an element of "chicken and egg" involved in this process: you need to have at least an outline set of entity types before you can start talking about operations on instances of those types. Clearly a FRBR-based approach should facilitate interoperability between learning object repositories and systems based on FRBR or on FRBR-derivatives like the Eprints/Scholarly Works Application Profile (SWAP). I have to admit the way "Context" is modelled at present doesn't look quite right to me, and I'm not sure about the approach of collapsing the concepts of an individual agent and a class of agents into a single "Agent" entity type in the model. (For me the distinguishing characteristic of what the SWAP calls an "Agent" is that, while it encompasses both individuals and groups, an "Agent" is something which acts as a unit, and I'm not sure that applies in the same way to the intended audience for a resource.) The other aspect I was wondering about is the potential requirement to model whole-part relationships, which, AFAICT, are excluded from the current draft version. FRBR supports a range of variant whole-part relations between instances of the principal FRBR entity types, although in the case of the SWAP, I don't think any of them were used.
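For readers unfamiliar with FRBR, here is a toy sketch of what its Group 1 entities and a whole-part relation might look like applied to learning materials. The names, attributes and example resources below are my own illustration, not the project's draft model.

```python
from dataclasses import dataclass, field
from typing import List

# FRBR Group 1 entities: a Work is realised through Expressions, an
# Expression is embodied in Manifestations. (The fourth entity, Item,
# is omitted here for brevity.)
@dataclass
class Manifestation:
    fmt: str  # e.g. "print", "PDF", "SCORM package"

@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str
    parts: List["Work"] = field(default_factory=list)  # whole-part relation
    expressions: List[Expression] = field(default_factory=list)

# A course (the whole) containing a unit (the part):
course = Work(title="Introductory statistics course")
unit = Work(title="Unit 3: regression")
course.parts.append(unit)

expr = Expression(language="en")
expr.manifestations.append(Manifestation(fmt="PDF"))
unit.expressions.append(expr)

print(len(course.parts))   # -> 1
```

The `parts` relation is the kind of whole-part modelling that, as noted above, the current draft excludes - yet for composite learning materials it seems likely to be needed.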

But I'm getting ahead of myself here really - and probably ending up sounding more negative than I intend! I think it's a positive development to see members of the "learning metadata community" exploring - critically - the usefulness of a model emerging from the library community. I need to read the draft more carefully and formulate my thoughts more coherently, but I'll be trying to send some comments to Phil.

January 28, 2008

Eduserv Foundation grants call 2008

I'm very pleased to announce that our 2008 call for research projects is now available.  This year we are looking for projects in three areas:

  • online identity
  • the open social graph
  • always-on Internet access and mobile computing.

In each case we are interested in research and/or development projects that move forward the education community’s understanding of the impact that emerging technologies will have on the way learning and research are undertaken.  Proposals may focus on the technical, social and/or political issues in these areas, inside and/or outside formal institutional settings (lecture theatres, research labs, campus open spaces, libraries, museums, Internet cafes, etc.).

Proposals that combine two or more of these areas are particularly welcome.

Please see the text of the call for full details.  Note that proposals that include a software development component are encouraged, though this is not a requirement.

Important: projects must be based at a UK academic institution (i.e. that's where the money must be going in the first instance).  However, subcontracting work to an external organisation or individual is acceptable.

January 25, 2008

eFoundations and comments

For info...

You may recall that I enabled comment moderation on this blog just after Christmas in an attempt to control comment spam.  On reflection, we're slightly concerned that moderation may hinder people's willingness to add comments and there's not a lot of spam getting through anyway.  On that basis I've disabled it again.

Apologies for any confusion.

Comments welcome! :-)

Why federated access management?

In my bunfight post I re-iterated my belief that the move to Shibboleth is the right one for the UK education community.  In his follow-up comment, Owen Stephens questioned this view, suggesting instead that "implementing Shibboleth to allow access to 'library' type resources is putting in a technical solution to a problem that didn't seem to exist...".

I tend to disagree, though I can certainly understand where Owen is coming from.

In her blog, Nicole Harris puts forward the JISC's rationale for moving us down this road:

  • Improve the business decisions made by institutions in relation to identity, access and resource management.
  • Increase the commercial choice to institutions in relation to identity and access management technologies.
  • Reduce the impact and cost of vendor lock-in within the JISC community.
  • Embed knowledge within the community, rather than within any one organisation.
  • Place the principles of the JISC Information Environment at the core of the implementation of access management within its community.
  • Move towards a single sign-on environment for UK Further and Higher Education institutions across internal, external, and collaborative resources.

I mainly agree, though I think it's worth looking at each of the points in more detail.

Improve the business decisions made by institutions in relation to identity, access and resource management

I suppose this is true, though I'm not overly clear why business decisions should necessarily get better as a result of the transition.  I suppose the overall thinking is that this move pushes responsibility for identity management back into the institutions, where they can choose whether they implement in-house or outsource to a third-party such as Eduserv.  While this move works against some of the benefit of a 'shared service' approach, it hopefully won't destroy it completely.

Furthermore, won't the loss of some management information - currently provided by a centralised Athens service but unavailable under a distributed federated model - actually make some business decisions harder?  However, I'm assuming that as a community we will find ways round such problems in due course.

Increase the commercial choice to institutions in relation to identity and access management technologies.

It seems to me that this is the killer reason.  Nobody likes a closed, proprietary solution and moving to an open playing field has got to be beneficial to the community in the long term.

Reduce the impact and cost of vendor lock-in within the JISC community.

I understand the point, though I think the use of 'vendor lock-in' is somewhat unfair, at least in its connotation (and kinda typical of the flak Eduserv seems to have to take).  I never heard anyone complain of being locked in to Oxfam (but, yes, before anyone shouts... I understand the situation is different :-) ). As to 'cost', I'm not in a position to judge.  Is anyone?  What are the costs of this transition, overall?  What will the ongoing costs be, overall?  I have no clue as to whether overall costs across the whole community will go up or down.  That doesn't make me think the transition is a bad idea because I think there will be other benefits - but I wouldn't claim it as a reason for doing it.  Overall, I think that saying "reduce the JISC community's dependency on a single supplier" would have been more honest (and a good reason for making the change).

Embed knowledge within the community, rather than within any one organisation.

I don't strongly disagree with this as an argument in favour of the transition, though I think it is an interesting one to make in the context of the government's 'shared service' agenda.

Place the principles of the JISC Information Environment at the core of the implementation of access management within its community.

Other than the argument about using open standards rather than proprietary ones I don't really get this.  As one-time architect of the JISC IE it doesn't strike me that 'architecturally' there is anything particularly more JISC IE-like about Shibboleth as opposed to Athens.  In fact, one could probably argue that Athens is one of the few things in the JISC IE's notion of 'shared infrastructure' that has delivered anything of lasting value!?

Move towards a single sign-on environment for UK Further and Higher Education institutions across internal, external, and collaborative resources.

Well yes, OK.  Fair point.  Though, as I've noted here before, if, by 'external', one means the full range of Web 2.0 and other services that learners and researchers make increasing use of, then Shibboleth doesn't help in the slightest with single sign-on since it has almost no currency outside the education sector.

Overall then, I disagree with the way much of the rationale is presented, but I concur with the resulting direction.

January 24, 2008

XRI and OpenID

A post by Drummond Reed to the openid-general mailing list reminded me that the Extensible Resource Identifier (XRI) now features in OpenID 2.0.  I've never really understood why we needed XRIs and reading the opening of the syntax specification left me not a whole lot wiser, especially since the opening sentence:

Extensible Resource Identifiers (XRIs) provide a standard means of abstractly identifying a resource independent of any particular concrete representation of that resource—or, in the case of a completely abstract resource, of any representation at all

could equally be applied to URIs (or IRIs).

I subsequently came across the XRI and OpenID page in the inames wiki.  This goes some way towards explaining why XRIs are of interest, at least in the context of OpenID, including the following:

Why is this so important? If you as an individual begin using a domain-name based URL as your OpenID at websites across the net, and at some point in the future you lose that domain name to someone else (it expires and is not renewed, you lose it in a domain name dispute, you pass away), whoever the new registrant is now completely controls your OpenID identity. Ironically that happens because it's exactly how OpenID is designed to operate: the credentials for proving ownership of an identifier are now tied to resolution of the identifier itself, and not to the sites at which it is used.

XRI infrastructure prevents this form of identity misappropriation by automatically mapping every i-name to a synonymous persistent i-number (a non-reassignable XRI in which each subsegment starts with a !). OpenID relying parties store this i-number, rather than the i-name, as the identifier of the user.

Another key feature of XRIs is that the entire resolution infrastructure supports HTTPS, so all XRIs can automatically use HTTPS resolution without it needing to be explicitly specified. (For technical reasons, OpenID URLs must have https:// typed explicitly by the user in order to use HTTPS resolution from the start.)

It'll be interesting to see if XRIs get widely adopted within the OpenID world.  I haven't noticed it happening yet - though, to be honest, I haven't looked very hard and it is early days anyway.  I'm guessing that the power and simplicity of http and https URIs will take some overcoming.
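The i-name/i-number argument quoted above can be sketched in a few lines. This is a hypothetical illustration only: the identifiers and the `resolve()` mapping below are invented, standing in for real XRI resolution.

```python
# Toy "resolution" table: an i-name maps to a synonymous, persistent
# i-number (invented values for illustration).
registry = {"=alice": "=!A1B2.C3D4"}

def resolve(i_name):
    """Stand-in for XRI resolution: map an i-name to its i-number."""
    return registry[i_name]

accounts = {}

def register(i_name):
    # The relying party stores the non-reassignable i-number, not the
    # reassignable i-name, as the account key.
    accounts[resolve(i_name)] = {"i_name": i_name}

register("=alice")

# Later, "=alice" expires and is re-registered by someone else:
registry["=alice"] = "=!9F8E.7D6C"

# The new registrant resolves to a different i-number, so they do NOT
# inherit the original user's account.
print(resolve("=alice") in accounts)   # -> False
```

With a plain domain-name-based OpenID URL there is no second, persistent layer: whoever controls the name controls the account.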

OAI ORE specification roll-out meetings

The OAI ORE project is co-ordinating two open meetings to introduce the (forthcoming) beta versions of the set of specifications which the project has developed to describe aggregations of resources. (The current alpha versions are available at http://www.openarchives.org/ore/0.1/.)

The first meeting is in the USA on 3 March 2008 at Johns Hopkins University, Baltimore, MD. (Press release)

The second meeting is in the UK on 4 April 2008 at the University of Southampton in conjunction with the Open Repositories 2008 conference. (Press release)

Please note that, in both cases, spaces are limited and registration is required.

January 22, 2008

Bunfight at the Athens/Shibboleth gateway

Those of you on the UK academic jisc-shibboleth@jiscmail.ac.uk list will have seen by now that the competing press releases, concerning the non-funding of the Athens/Shibboleth gateways, from us (Eduserv) and JISC have been released.

What's all the noise about?  Well UK academia is in transition from Athens (the proprietary access management service run by Eduserv on behalf of JISC for the last 10 years or so) to Shibboleth (the open, standards-based identity and access management framework, currently popular in academia).  This is absolutely the right transition to make but, of course, there are implementation issues that come with the details of the transition.  One such detail surrounds the provision of the so-called Athens/Shibboleth gateways.

What are the gateways?  Essentially, they provide a bridge between the old and new worlds.  Why are they necessary?  Because at this stage it looks highly likely that not all service providers will have switched to Shibboleth by the time the Athens funding ceases (later this summer).  Any institution that has made the switch to Shibboleth but that needs to continue to access resources from service providers that remain Athens-only will need to do so via an Athens/Shibboleth gateway.

Unfortunately, discussions between the JISC and Eduserv about how much money is available for Eduserv to run those gateways have broken down - leading to a situation where both parties have issued press releases explaining their view of the situation.  This has led one commentator on the mailing list to ask:

Am I the only one that feels like they are witnessing a school playground argument?

I can certainly sympathise with that point of view!

People are clearly witnessing a disagreement.  Whether it is of the school playground variety is a different matter.  I tend to disagree at this stage, though I'm sure something can be arranged if necessary :-)

Also clearly, the disagreement has to do with cost vs. value.  In short, the cost at which we felt it was viable to offer the gateways was in excess of the value the JISC chose to put on them - there was a disagreement about price, pure and simple.  Who was right and who was wrong in that disagreement is another matter of course, as is the issue of whether some middle ground could have been reached.  I'm not aware that we refused to negotiate (I could be wrong) but in any case, from what I've seen, the gap between the two sides was such that I doubt any negotiated agreement could have been reached even if a longer period of negotiation had been allowed.

Clearly, this is unfortunate for the community.  We did not take our side of the decision making lightly, at least as far as I understand it.  I'm sure the JISC would say the same.  I'm equally sure that both sides of the argument feel like they are taking the 'best' decision in the circumstances.  Sometimes things just don't work out.

The bottom line, from our perspective as an educational charity, is that we have to take not-for-profit business decisions around those services that we believe to be of value to the community in order to ensure the ongoing viability of both our services and the charity overall - we can't simply provide services at well below our own costs.

Note: I tried to post part of this response directly to the mailing list but got the following response:

Your message dated Tue, 22 Jan 2008 17:19:14 -0000 with subject "Re: LA (mixed messages)" has been submitted to the moderator of the JISC-SHIBBOLETH list.

Call me paranoid, but I don't recall the list being moderated in the past.  Perhaps it always has been?

UK Daily Telegraph does OpenID

I'd better stop announcing these soon since there is clearly going to be a landslide of services offering OpenID provision.  The latest announcement is from the UK Daily Telegraph (found on TechCrunch by Pete). 

Is it uncharitable of me to suggest that if the Telegraph are doing this then OpenID has really hit the mainstream consciousness?  Probably... especially since they've beaten us to it :-)

January 18, 2008

The copy-and-paste generation

Information skills seem to have been in the news of late (e.g. see the item entitled White bread? from a few days ago).  The debate, in the UK at least, is now fuelled by reports from the BBC that "more than half of teachers believe internet plagiarism is a serious problem among sixth-form students" (based on a survey undertaken by The Association of Teachers and Lecturers).  Hey... not only is the Internet full of unreliable information but some of those duffers are cutting-and-pasting it into their essays without even removing the Web advertising material! :-)

More seriously, we've tried to make a small contribution to this area through the Eduserv Foundation, funding five information literacy projects last year.  Of these, we are currently providing continuation funding to two of the original recipients - John Crawford at Glasgow Caledonian University, who is working on The Scottish Information Literacy Project: working with partners to create an information literate Scotland project and Netskills, who are developing information skills and plagiarism awareness materials and workshops for the UK schools sector.

I think that the questions around plagiarism are really interesting.  Ignoring those who simply want to cheat or save themselves some effort, learning how to form and express our own opinions based on the writings of others, how to assess arguments, and how to express agreement with existing views without simply copying them word for word is really important.  Looking at my own children (who I encourage to use Wikipedia for their homework by the way, but who I also encourage to read books and other information sources) I know they find these skills very difficult to grasp.  I'm not convinced the curriculum, even at A level, helps much, or as much as it could.

As a parent I can say "you can't cut-and-paste that, you've got to read it and then put it in your own words" but the reaction is mixed.  Superficially, they tend to respond with, "why, what's wrong with those words - they say what I want to say!".  Well, yes... but...

To put it somewhat crassly, if I can mashup music why can't I mashup text?  I wonder if there is a genuine difference in mindset here?

Even relatively simple skills like knowing how and when to quote and cite others' work don't necessarily come naturally.  Primary schools, in my limited experience, are quite good at getting younger children to remember to say where they got their information from, in project-based homework for example.  But I'm not sure how well that limited grounding gets built on in secondary school?

The answers in this area, it seems to me, have to focus on what is most effective for learning outcomes.  Unfortunately, I'm not really in a position to judge that - other than in a man in the street kind of way.  On that basis I encourage my kids not to copy-and-paste too much or too often and hope for the best!

January 17, 2008

Updates to DCMI metadata vocabularies

On Monday, the Dublin Core Metadata Initiative announced a significant update to the descriptions of the DCMI vocabularies, reflected in the RDFS term descriptions available in the DCMI "namespace documents". There are two main changes:

  • the categorisation of what were previously called "encoding schemes" as either Vocabulary Encoding Schemes or Syntax Encoding Schemes;
  • the introduction of assertions of domain (rdfs:domain) and range (rdfs:range) relationships for those DCMI properties with URIs in the http://purl.org/dc/terms/ namespace

Given the wide variation in the existing use of the fifteen properties of the Dublin Core Metadata Element Set (with URIs in the http://purl.org/dc/elements/1.1/ namespace), it was decided that making range assertions for these properties might introduce problems for existing applications. So no domain/range assertions are made for those properties; instead, fifteen new, like-named properties have been defined, with URIs in the http://purl.org/dc/terms/ namespace, and these new properties are the subject of domain/range assertions.
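To make the "like-named properties in two namespaces" point concrete, here is a small illustration. The two namespace URIs and the fifteen DCMES element names are real; the pairing code is just a sketch.

```python
# The legacy DCMES namespace (no domain/range assertions made) and the
# dcterms namespace (whose like-named properties do carry them).
DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"
DC_TERMS = "http://purl.org/dc/terms/"

# The fifteen properties of the Dublin Core Metadata Element Set.
DCMES = ["title", "creator", "subject", "description", "publisher",
         "contributor", "date", "type", "format", "identifier",
         "source", "language", "relation", "coverage", "rights"]

# Each element now has a like-named counterpart in the dcterms namespace:
pairs = [(DC_ELEMENTS + name, DC_TERMS + name) for name in DCMES]

old_title, new_title = pairs[0]
print(old_title)   # -> http://purl.org/dc/elements/1.1/title
print(new_title)   # -> http://purl.org/dc/terms/title
```

Applications that want the stronger semantics can use the dcterms URIs, while existing data using the 1.1 URIs remains valid and undisturbed.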

This set of changes represents a considerable step forward in aligning DCMI's descriptions of its own metadata terms with the vocabulary model defined by the DCMI Abstract Model. It is the culmination of a good deal of effort by the DCMI Usage Board and the DCMI Architecture Forum, and in particular by Tom Baker, the chair of the Usage Board, who has done the lion's share of the work in preparing this set of documents and in ensuring that all changes have been documented in accordance with the UB's procedures.

A full description of the changes can be found in the document, Revisions to DCMI Metadata Terms.

In addition, and to complement these changes, the document Expressing Dublin Core metadata using the Resource Description Framework (RDF) is now a DCMI Recommendation.

Flickr Commons

Via a tweet by @briankelly I discovered Flickr Commons, a collaboration between the Library of Congress and Flickr to "give you a taste of the hidden treasures in the huge Library of Congress collection" and to demonstrate "how your input of a tag or two can make the collection even richer".  There are more formal announcements here and here.

Brian's initial tweet generated a mini Twitter discussion (something that some people say Twitter isn't supposed to be used for though I tend to disagree).  The general consensus seemed to be that using the resources and tools of the private sector to widen access to public collections makes perfect sense, provided ownership of the data is retained - i.e. in this case it is OK because Flickr isn't Facebook! :-)  There are certainly some very, very obvious benefits in terms of visibility of content, size of audience, quality of user experience, and so on.

On that basis alone, this is a very interesting development and one that I'm sure many parts of the cultural heritage sector will be keeping a close eye on.  Congratulations to the Library of Congress and Flickr for getting their fingers out and doing something to bring these worlds together!  I'm guessing that the two collections that have been made available via Flickr so far are part of the American Memory collection - I haven't checked.  I'm also guessing that, like much of that collection, these images are effectively in the public domain?

As I've said before, what is frustrating for those of us in the UK about this development is that it is much harder to see this kind of thing happening here, where so many of our cultural collections are locked behind restrictive 'personal', 'educational' use licences.

It'll be fascinating to see what kinds of tags people add.  The Flickr policy statement - "Any Flickr member is able to add tags or comment on these collections. If you're a dork about it, shame on you. This is for the good of humanity, dude!!" - is short and to the point.  Like it!

I took a quick browse around the 1930s-40s in Color collection/set.  Here's a nice image (see right), now tagged with 'bandana', a word not in the original catalogue record as far as I can tell.  From there it is possible to navigate to other images in the collection with the same tag - there are three at the time of writing.  OK, so this isn't an earth-shattering example of user-generated content but you get the idea, and bandana researchers all over the world might well be hugely grateful to have three more resources at their disposal! :-)

It will also be interesting to see the kind of comments that people leave.  Hopefully we'll get beyond the use of 'wow' and 'awesome'!  Wouldn't it be great to see comments from the people (or their families or colleagues) in the photos?

Final thought... we've been making the point here for a while that Flickr is a repository and that the Flickr experience is a useful benchmark when we think about how repositories should look and feel - I think this kind of development makes that even more obvious.

January 16, 2008

Complexity, compschlexity

I used to think that complex objects were important and that packaging standards were going to be critical for the future of managing learning and/or research objects.  For example, the JISC Information Environment Technical Standards document (wot I wrote) says:

Resources that comprise a collection of items that are packaged together for management or exchange purposes should be packaged using the IMS Content Packaging Specification if they are 'learning objects' (i.e. resources that are primarily intended for use in a learning and teaching context and that have a specific pedagogic aim) or the Metadata Encoding & Transmission Standard (METS).

Now I'm not so sure.  Les Carr, over on RepositoryMan, seems to have reached the same conclusion.

For some time now I've argued that the Web is made up of complex objects anyway (in the sense that almost every Web page you look at is a bundle of text, images and other stuff) but that the Web does well to treat these as loosely-coupled bundles of individual items - items that are to a large extent managed and delivered separately.  In some respects (X)HTML acts like a simple packaging format, using the <img>, <object> and other tags to link together the appropriate bundle of items for a given Web page but leaving the client to decide which bits of the bundle to download and use - a text-only browser will behave differently from a graphical browser for example.
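The "loosely-coupled bundle" point above can be demonstrated with nothing more than Python's standard library: parsing a page yields a manifest of referenced items, and it is entirely up to the client which of them to fetch (a text-only browser might skip the lot). The page markup below is invented for illustration.

```python
from html.parser import HTMLParser

# Treat an (X)HTML page as a simple package manifest: collect the
# individually-managed items it links together via <img> and <object>.
class ItemLister(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.items.append(attrs["src"])
        elif tag == "object" and "data" in attrs:
            self.items.append(attrs["data"])

page = ('<html><body><p>Hello</p><img src="logo.png"/>'
        '<object data="movie.swf"></object></body></html>')

parser = ItemLister()
parser.feed(page)
print(parser.items)   # -> ['logo.png', 'movie.swf']
```

Nothing obliges the client to dereference every entry in that list, which is precisely the flexibility that tightly-coupled packaging formats give up.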

Our attempts at more explicit tightly-coupled complex object standards, in the form of IMS CP and METS for example, have resulted in objects that are useful only in some of the systems, some of the time (between a learning object repository and a learning management system for example) but that are largely unhelpful for people armed only with a bog-standard Web browser.

What do I mean by tightly-coupled?  It's difficult to say precisely!  It may be the wrong term.  One can certainly argue that a METS package in which all the content is linked by reference (as opposed to being carried as part of the package) is not hugely dissimilar to the situation with (X)HTML described above.  But there is one massive difference.  (X)HTML is part of the mainstream Web, other packaging standards are not - when was the last time you found a Web browser that knew what to do with an IMS CP or METS package, for example?  So maybe the issue has more to do with the solution being mainstream or not, rather than about how tightly-coupled stuff is?

My concern is that repository efforts that first and foremost treat the world as being made up of complex objects that need to be explicitly packaged together using standards like IMS CP or METS in order to be useful may take repositories further away from the mainstream Web than they are currently - which is not a good thing.  IMHO.

Generation G

As Paul Walk notes, coincidence is a wonderful thing.  In this case, the coincidence is the JISC's publication of a report entitled "Information Behaviour of the Researcher of the Future" (PDF only) following hot on the heels of the debate around whether Google and the Internet should be blamed for students' lack of critical skills when evaluating online resources.

The report, in part, analyses the myths and realities around the 'Google Generation', though it actually goes much further than this, providing a very valuable overview of how researchers of the future (those currently in their school or pre-school years) might reasonably be expected to "access and interact with digital resources in five to ten years' time".  Overall, the report seems to indicate that there is little evidence to suggest that there is much generational impact on our information skills and research behaviours:

Whether or not our young people really have lower levels of traditional information skills than before, we are simply not in a position to know. However, the stakes are much higher now in an educational setting where 'self-directed learning' is the norm. We urgently need to find out.


Our overall conclusion is that much writing on the topic of this report overestimates the impact of ICTs on the young and underestimates its effect on older generations. A much greater sense of balance is needed.

Or as the JISC press release puts it:

research-behaviour traits that are commonly associated with younger users – impatience in search and navigation, and zero tolerance for any delay in satisfying their information needs – are now the norm for all age-groups, from younger pupils and undergraduates through to professors

The message is pretty clear.  Information skills are increasingly important and teaching them at university level appears to be shutting the stable door after the horse has bolted.  There is some evidence that to be effective, information skills need to be developed during the formative school years.  Interesting, to me as a parent at least, is the evidence from the US that indicates that when "the top and bottom quartiles of students - as defined by their information literacy skills - are compared, it emerges that the top quartile report a much higher incidence of exposure to basic library skills from their parents, in the school library, classroom or public library in their earlier years".

The report ends by enumerating sets of implications for information experts, research libraries, policy makers, and ultimately all of us.  Well worth reading.

January 15, 2008

On naming metadata standards

In the UK we often say that a service or product "does exactly what it says on the tin" which, as Wikipedia explains, stems from  a set of UK TV adverts for Ronseal woodstains that have run since 1994 and which means that it is obvious from the label what something is going to do for you.

The Dublin Core Metadata Initiative (DCMI) seems to insist on using non-obvious names for some of its standards, the latest being the Singapore Framework (or The Singapore Framework for Dublin Core Application Profiles to give it its full title - though I'm sure that the abbreviated form will get used more often).

This is not the first time that place names have worked their way into DCMI terminology - the others being the Dublin Core itself (obviously!) and the Warwick Framework (which I'll leave as an exercise for those of you not in the know to find out about - though I will note that the middle 'w' in Warwick was often mispronounced).  These names are great for those in the know - especially those who attended the original meeting from which the name emerged.  But I'm not sure they help the rest of us much?

Might it not be better to say exactly what a standard does on the tin?

I suppose one could argue that DCMI have done that in the longer form of the name - though as I mention above, it seems to me that the short form is likely to get used more often.  Furthermore, slightly quirky names, arguably, help make something distinctive and allow an "in the know" community to form around the name.  This probably happened with the use of Dublin in Dublin Core, particularly in the early days.

I'm less clear that it is helpful at this stage, not least because it is hard to imagine a community forming around the notion of a framework for metadata application profiles! :-)  But, hey, you never know!?

White bread?

Via Emma Place and The Times Online I note that:

Google is "white bread for the mind", and the internet is producing a generation of students who survive on a diet of unreliable information, a professor of media studies will claim this week.

Good grief.  Emma is right to say that this is an important issue and I completely agree that "Internet research skills should be actively taught as a formal part of the university curriculum. Students may well be savvy when it comes to using new Internet technologies, but they need help and guidance on finding and using Web resources that are appropriate for academic work" but the debate isn't helped much by sound bites.

Blaming the Internet for "a generation of students who survive on a diet of unreliable information" is a bit like blaming paper for the Daily Star.  How about blaming an education system that hasn't kept up with the times?

The Internet, Google and Wikipedia are tools - no more, no less.  Let's help people understand how to use them effectively.

I forgot I was a luddite

I'm just in the process of deleting my MyBlogLog account, largely because I get fed up of my image appearing on other people's blogs.  I don't quite know why but I find it somehow disconcerting to have my blog reading habits made so obviously public!

Anyway, MyBlogLog asks for a reason why the account is being closed, offering "I forgot I was a luddite" as one of the options.

Yup, that'll be the one.  Lol.

January 14, 2008

Blastfeed - a small case-study in API persistence

Blastfeed is a service that I've used over the last 6 months or so to build aggregate channels from a set of RSS feeds.  For example, I've been using it to build a single RSS feed of all my favourite Second Life blogs which I can then embed into the sidebar of ArtsPLace SL.

Blastfeed isn't the only option for doing this, Yahoo Pipes would have been an obvious alternative, but it was quick and easy to use and, up until now, was also free.  Recently I got this by email:

Blastfeed has been running smoothly (almost no glitch besides one last November, sorry again) since its debut a little over a year ago. We hope here at 2or3things that you've enjoyed using Blastfeed.

Hence at this stage we feel it's no longer necessary to keep Blastfeed in a beta mode. We have also decided to focus the service onto corporate applications, while letting the opportunity to web users to subscribe for a fee to unlimited usage.

In line with the above we shall discontinue the free service from February 15th 2008 on.

Should you wish to continue using Blastfeed after that date, please contact us by return email stating your Blastfeed username and email and we'll make a quick and fair proposal. However we'll bind the proposal to the number of potential subscribers.

I'm not complaining.  Blastfeed have never promised to remain free forever and until recently they still badged themselves as a beta service.  But this does serve as a timely reminder (for me at least) to take steps to mitigate this kind of thing happening.

The "API" to the aggregated feed service(s) that I've built using Blastfeed is effectively an HTTP GET request against the feed URI, with RSS XML returned as a result.  With the demise of the free service, I can recreate the aggregated feed somewhere else easily enough - but doing so will change the URI and hence the API that I've built for myself.  I'll have to remember all the places that I've used my API (e.g. in the ArtsPLace SL sidebar) and update them with the new URI.

With hindsight, what I should have done was to make the API more persistent by using a PURL redirect rather than the native Blastfeed URI.  That way, I could have changed the technology that I use to create the feed (e.g. replace Blastfeed by Yahoo Pipes) without changing the API and without having to update anything else.
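The value of that extra level of indirection can be shown with a toy resolver. All the URIs below are made up for illustration; the real PURL service does this via an HTTP redirect, but the logic is just a lookup that only the feed owner ever has to change.

```python
# Sketch of PURL-style indirection: clients always use the persistent URI;
# only the redirect table changes when the backend service changes.
# All URIs here are invented for illustration.
REDIRECTS = {
    "http://purl.example.org/my-sl-feed":
        "http://www.blastfeed.com/feeds/12345.xml",  # hypothetical Blastfeed URI
}

def resolve(uri):
    """Follow one level of redirection, as an HTTP redirect from a PURL server would."""
    return REDIRECTS.get(uri, uri)

# Swapping Blastfeed for, say, Yahoo Pipes is then a one-line change at the
# resolver - nothing that *uses* the persistent URI needs updating:
REDIRECTS["http://purl.example.org/my-sl-feed"] = \
    "http://pipes.yahoo.com/pipes/pipe.run?_id=abc&_render=rss"  # hypothetical

print(resolve("http://purl.example.org/my-sl-feed"))
```

Everything embedded in a sidebar keeps pointing at the persistent URI; the indirection absorbs the change of technology.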

Oh well... live and learn!

Following your nose

I think one of the most helpful principles which I've picked up on from following various discussions around the topic of the Web architecture is that sometimes described as "following your nose".

I'm not sure there's a concise one-document summary of the principle anywhere (or at least I struggle to find one with Google). (Edit: It looks as if this draft W3C TAG finding by Noah Mendelsohn is an attempt to provide one. And I should emphasise that it is very much a draft, as it clearly indicates that it is incomplete.) The principle has been highlighted in a number of recent presentations related to the GRDDL specification (see e.g. Dan Connolly, Practical Semantic Web Deployment with Microformats and GRDDL), but I think it's important to emphasise that it is in no way specific to that context. Rather, it is a general principle of the Web, and indeed it arises from some of the central constraints of the REST architectural style: that messages should be self-descriptive. Or as Mark Baker phrases it in a presentation from 2004, "The meaning of a message is fully grounded in public specification, or the Web itself". Each message which forms a representation of resource state should carry information which indicates - in Web-friendly ways - the conventions used in that message that are important for its interpretation.

Mark's presentation in turn references Tim Berners-Lee's keynote presentation from the World Wide Web Conference of 2002, in which he emphasises that working on the Web involves "a serious commitment to common meanings", and traces the application of the principle to bitstreams on the Web, illustrating the role of the chain of unambiguous references to various public specifications in the interpretation of messages on the Web.

And the "follow your nose" approach is not an "optional extra"; on the contrary, it is fundamentally necessary in order to support the highly devolved, loosely coupled nature of interaction on the Web. As the Mendelsohn draft puts it, "Web architecture dictates that any user agent may at any time issue a GET and attempt to interpret representations for any HTTP resource." It is not sufficient to rely on an expectation of some additional pre-coordination between provider and consumer, some private agreement on the use of specialised conventions, in order to enable interpretation.

The use of URIs as names - and in particular URIs that can be dereferenced using the HTTP protocol - is a critical enabling factor in the "follow your nose" approach. Just at the level of naming/identification, the use of a URI provides for disambiguation in the global context in a way a plain character string can not. But further, when a server provides a URI in a representation, the client can in turn seek to dereference that URI to try to obtain more information about the resource identified by that URI, provided by its owner. That information may take the form of a human-readable document, but it may also provide information to enable further processing of the original representation. (Aside: This was another note that I started writing before Christmas, and I noticed that in the meantime Ed Summers has posted a draft of a forthcoming Information Standards Quarterly (available from NISO) article titled "Following your nose to the Web of Data", in which he explores this further.)

Over the last couple of years there has been a good deal of interest in embedding structured data in X/HTML documents, not only in the case of GRDDL but also that of microformats. One of the important "hooks" for establishing this "chain of meaning" in this context is the use of HTML's meta data profile feature and the profile attribute of the HTML head element. And indeed a page on the microformats wiki notes that "it is ACCEPTED that each microformat should have a profile URI". For each microformat, an HTML meta data profile provides more information about the interpretation of that microformat, either for a human reader or an application or both. (In practice, unfortunately, as Dave Orchard notes, many microformats implementers - even where a profile has been defined by the microformat creator - ignore the recommendation to use the profile URI in their HTML instances.)

There are examples of conventions used within the digital library and e-learning communities (and indeed more broadly), at least some of them enshrined in de facto or de jure standards, which ignore, or at least do not adhere as closely as perhaps they should to, the "self-describing messages" principle. In many cases, I guess that can be put down to a case of "if we'd known then what we know now...", but I'd like to think we now recognise the need to ground our future specifications firmly in the Web. I'll mention a couple of examples where I think a relatively minor change could make a substantial step towards addressing the problem.

The OpenID Authentication specification (Version 1.1, Version 2), for example, makes use of a number of simple tokens which are used as application-specific link types (e.g. openid.server, openid.delegate, openid2.provider, openid2.local_id) in link elements in the headers of HTML documents to represent relationships between resources. These tokens are defined by the OpenID specification, and are supplementary to the "built-in" link types defined by the HTML specification itself. While the intent in the HTML spec is indeed that the list is extensible, AFAICT, the OpenID specification ignores the advice of the HTML specification to use a meta data profile and the profile attribute to provide access to documentation of those extensions:

Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types.

So currently an agent processing an HTML document and encountering one of these OpenID link types can not "follow its nose" to obtain further information; OpenID relies on the client having prior knowledge of the OpenID link types and the set of character strings that represent those types. Dan Connolly provides an example of how the provision of such a profile, and the use of the HTML profile attribute to reference that profile, would ground OpenID more firmly in the Web.
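To make the difference concrete, here is a rough sketch of the kind of extraction a consumer performs, run against a page head that *does* cite a profile as the HTML spec advises. The profile URI and endpoint URLs are hypothetical; the point is that a "follow your nose" agent can dereference `p.profile` to learn what `openid.server` and friends mean, rather than needing that knowledge built in.

```python
# Sketch: extracting link types and the (hypothetical) profile URI that
# documents them from an HTML head, in the "self-describing" style the
# HTML spec recommends. All URLs are invented for illustration.
from html.parser import HTMLParser

class LinkTypes(HTMLParser):
    def __init__(self):
        super().__init__()
        self.profile = None  # where the extension link types are documented
        self.links = {}      # rel -> href

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.profile = a.get("profile")
        elif tag == "link" and "rel" in a and "href" in a:
            self.links[a["rel"]] = a["href"]

doc = """<html>
<head profile="http://openid.example.org/profile">
<link rel="openid.server" href="https://idp.example.org/server">
<link rel="openid.delegate" href="https://example.org/me">
</head><body></body></html>"""

p = LinkTypes()
p.feed(doc)
print(p.profile, p.links["openid.server"])
```

Without the profile attribute, the `self.links` dictionary is just a bag of opaque strings; with it, an agent has a URI it can dereference to find out what those strings mean.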

Similarly, in the current alpha drafts of the OAI ORE specifications, there is a proposed set of conventions for embedding data in HTML documents. However, this too relies on the consumer of the document having advance built-in knowledge of those specific conventions and of specific character strings used as HTML attribute values. I'd suggest that for the ORE case, what is required is:

  1. Clarification of what relationships we wish to assert, and what RDF triples are required to make those assertions, including any additional terms required. (The core ORE data model is based on RDF, so this should be relatively straightforward to do)
  2. Development of a convention for representing those triples in HTML which is firmly grounded in the Web and compatible with the "self-describing message"/"follow your nose" principle, either (a) by defining an ORE-specific microformat with its own associated profile URI and using that profile to enable GRDDL-based extraction of those triples from an HTML instance; or (b) by adopting the use of (a small subset of) RDFa. (My slight concern about the latter is that RDFa still seems to be work-in-progress - but OTOH so is ORE, and as long as that is made clear in our documentation that may not be an issue.)
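Step 1 above amounts to writing the assertions down explicitly. A rough sketch, with illustrative URIs and term names only (not the actual ORE vocabulary), might look like this:

```python
# Sketch of step 1: the assertions a resource map might make about an
# aggregation, written as plain RDF-style (subject, predicate, object)
# triples. URIs and term names here are invented for illustration.
AGG = "http://example.org/aggregation/1"

triples = [
    (AGG, "rdf:type",       "ore:Aggregation"),
    (AGG, "ore:aggregates", "http://example.org/article.pdf"),
    (AGG, "ore:aggregates", "http://example.org/dataset.csv"),
    (AGG, "dc:creator",     "Jane Doe"),
]

# Once the model is this explicit, step 2 reduces to picking a Web-grounded
# serialisation (a profiled microformat + GRDDL, or RDFa) for these triples.
aggregated = [o for s, p, o in triples if p == "ore:aggregates"]
print(aggregated)
```

Whichever serialisation is chosen, a consumer that extracts it should end up with exactly this set of triples, no more and no less.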

I'm currently in Washington, D.C. for a meeting of the OAI ORE Technical Committee over the next two days, so I guess I'll get to have these discussions in a few hours time :-) Having said that, the combination of time zone adjustment (which I seem to find ever harder these days) and a slightly noisy hotel room means that I've managed only about five hours sleep for two nights running, so at this rate, far from resolutely fighting the corner of Web architecture, I'll probably be dozing over my laptop by lunchtime.

January 09, 2008

DC-2008 Call for Papers

In more rather belated news from the pre-Yuletide period, DCMI has issued the Call for Papers for the DC-2008 conference, to be held from Monday 22 through Friday 26 September 2008 at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) in Berlin, Germany. The focus of the conference is on metadata in the context of social software, and "metadata challenges, solutions, and innovation in initiatives and activities underlying semantic and social applications", and the submission deadline for papers, reports and posters is 30 March 2008. See the Website for full details of the theme and the call.

As an aside, I must draw the attention of all beer-loving metadata wonks (my favourite bits of the festive season involved a bottle of Schlenkerla Rauchbier Urbock and a bottle of St Feuillien Cuvée de Noël) to the inclusion on the conference Website of a comprehensive list of Berlin breweries and bars prepared by Traugott Koch. At at least one DCMI conference in the past, I've witnessed Traugott's crestfallen disappointment at the beers on offer in the locality, but it looks as though Berlin will present no such problems.

Flickr - yet another OpenID provider

Via Dan Brickley and Simon Willison I note that Flickr looks set to support OpenID.  This has got to be good news for the future of OpenID, though I'm left wondering what the consequences are of a user-centric identity management landscape in which the balance between the number of 'providers' and the number of 'consumers' seems set to be weighted so heavily in favour of the providers.

Simon Willison appears to see no reason to be concerned about this, saying:

OpenID is good for more than just authentication. The OpenID protocol allows a user to assert ownership of a URL. This can be used for SSO-style authentication, but it can also be used to prove ownership of a specific account to some other service, a concept I’ve been calling identity projection.


A common misconception about OpenID is that it’s only really useful if users stick to using one identity. I’d be happy to see every one of my online profiles acting as an OpenID, not for SSO authentication (I’ll pick one “primary” OpenID to use for that) but so that I can selectively cross-pollinate some of my profiles to new services.

I must admit I struggle to get my head around the possibilities of the technology here.  I worry slightly about the statement that an OpenID "can also be used to prove ownership of a specific account" since that feels like quite a big leap from "asserting ownership of a URL".  Nonetheless, I can see that there are interesting possibilities here.

What worries me is that if the "selective cross-pollination" hinted at above is based on OAuth (which I assume it must be) then we need a better balance of 'consumers' and 'providers' than we appear to have currently?

Graphs, networks, terminology, etc (or, hey, hey, I'm a monkey)

I realise I'm several months behind in even mentioning this debate, which has already generated its fair share of commentary around the Web. Yesterday in the office we were discussing the term "social graph", which has been used by people like Mark Zuckerberg of Facebook and Brad Fitzpatrick (LiveJournal/Google) & David Recordon (Six Apart) to refer to the set of information about people and the relationships between them that is generated within, and used by, social networking services, and also the (rather scathing) criticism of their use of that term by Dave Winer. Winer's criticism is firstly that the term "graph" is - to the non-mathematician at least - most widely understood as referring to a bar or line graph or pie chart, rather than to the mathematical construct made up of nodes and edges, and secondly that the term "social network" is already in widespread use for the very same thing which Zuckerberg, Fitzpatrick & Recordon refer to as a "social graph".

Given my enthusiasm for a certain graph-based data model, it probably comes as no surprise that I'm quite fond of the term "social graph", partly because it seems to me it is the most accurate term for the thing we're talking about here, and partly because I think it enables us to make a distinction between (what to me are) two different things.

The two terms "graph" and "network" are both used in several different ways. Following Dave Winer, the term "graph" is used to refer to (amongst other things):

  1. a visual representation of the variation of one variable in comparison with that of one or more other variables
  2. a mathematical concept of a set of nodes connected by links called edges
  3. a data structure based on that mathematical concept

The term "network" is also used in several ways, including:

  1. an interconnected system of things (inanimate objects or people)
  2. a specialised type of graph (the mathematical concept)

So while I'd agree that a graph (sense 2) is a more general form of a network (sense 2), it seems to me that in the context of social networking applications, the term "network" is typically used in sense 1, i.e. to refer to the set of people - or probably more accurately, "personas" or "identities" rather than biological individuals - and their relationships. And (it seems to me) the term "graph" should really be applied in sense 3, that of a data structure - and indeed I think this is how the term is used in the Fitzpatrick/Recordon article.

And on this basis, there is a clear difference between a network and a graph. The things in a network are people/personas and relationships between them. A graph is a model (or representation, or description, etc) of that network; a graph contains only things which "refer to" those things in the network. Or as Beau puts it quite neatly in a comment on a post by Robert Scoble:

Fundamentally, the network is the sum of our living, changing relationships. The graph is a representation of the relationships. Network is to person as graph is to snapshot.

In this sense, social software applications operate on (generate, analyse, expose, extend, merge etc) graphs, not on networks. And indeed a single network might be modelled/represented/described in many forms other than a graph.
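The distinction can be made concrete with a small sketch: the *network* is the people and their living relationships; the *graph* is a data structure whose nodes and edges merely refer to them, and which is what software actually operates on. Names here are invented for illustration.

```python
# Sketch of the network/graph distinction. The "network" is represented here
# only by a list of relationships between personas (names are made up);
# the "graph" is the data structure a social application builds from it.
network_relationships = [("andy", "pete"), ("pete", "mikael")]

def as_graph(edges):
    """Build an adjacency-list graph: nodes refer to personas, edges to relationships."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

# "Network is to person as graph is to snapshot" - the graph is a snapshot
# of the network at the moment it was built.
snapshot = as_graph(network_relationships)
print(sorted(snapshot["pete"]))
```

When the underlying relationships change, the network has changed but this graph has not - which is precisely why it is a representation rather than the thing itself.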

When I met Andy for the first time (which, IIRC, was at a workshop on the RSLP Collection Description Schema back in 2000), he became part of my social network (though I'm not sure I registered in his, as it was ages before he replied to my garbled emails about Encoded Archival Description... ;-)). But I didn't at that time model (represent, describe, etc) my social network as a graph (or use a software application which did that on my behalf), so I don't think that relationship was reflected in a social graph - at that time.

At the risk of muddying the waters further, I might even prefer the suggestion of RonM made in the second comment on a post by Sam Gustin at Portfolio.com that the best term might be "social network graph" i.e. a graph of (describing, representing) a social network. But I suspect we'd be in a small minority on that :-)

Ultimately, as various commentators have acknowledged, our terminology is itself socially constructed, and it will be community usage that determines how these terms are used.

On one last point, I guess I'm slightly sceptical about the tendency to focus heavily on the social graph. While one could in theory construct a single graph which represented/described all the relationships between all the individuals (or personas) in the world, I'm not sure that any of our existing applications actually operate on this graph (well, maybe these sort of folks have stuff which does!). Rather, it seems to me that what we are really dealing with, and will always be dealing with, are many different (sub-)networks, represented/described by many different graphs, each a subgraph of that hypothetical universal social graph. The graph of relationships which I manage in Facebook is different from the graph I manage (albeit in a somewhat desultory fashion of late) in LinkedIn, and I have to admit my Orkut graph has pretty much withered on the vine. In some cases, yes, those differences may be a result of practical, technical obstacles to my ability to share graphs between applications, and I agree that is a problem that must be addressed, but in other cases the differences are the result of conscious choices on my part to construct or disclose different graphs, representing different networks, in different contexts.

Anyway, if the net result of all this is that I sound like a monkey, so be it! [I was going to insert a link to a suitable audio sample here, but I can't find a suitably licensed one... I'm sure there must be one, but as I can't spend more time looking just now, you'll just have to imagine a sort of screechy gabber here.]

January 02, 2008

Rethinking the Digital Divide

The 2008 Association for Learning Technology Conference, Rethinking the Digital Divide, will be in Leeds between 9 and 11 September 2008.  Keynote speakers will include: David Cavallo, Chief Learning Architect for One Laptop per Child, and Head of the Future of Learning Research Group at MIT Media Lab; Dr Itiel Dror, Senior Lecturer in Cognitive Neuroscience at the University of Southampton; and Hans Rosling, Professor of International Health, Karolinska Institute, Sweden, and Director of the Gapminder Foundation.

The closing date for submissions of full research papers for publication in the peer-reviewed Proceedings of ALT-C 2008, and abstracts for demonstrations, posters, short papers, symposia and workshops is 29 February 2008.

The conference will focus on the following dimensions of learning:

Global or local - for example: What are the dichotomies between global and local interests in, applications of and resources for learning technology? How can experience in the developing world inform the developed world, and vice-versa? Will content and services be provided by country-based organisations or by global players?

Institutional or individual - for example: How can the tensions between personal and institutional networks, and between formal and informal content, be resolved?

Pedagogy or technology - for example: How do we prevent technology and the enthusiasms of developers from skewing things away from the needs of learners? Are pedagogic problems prompting new ways of using technology? Are learners’ holistic experiences of learning technologies shifting the emphasis away from ‘pedagogy’ and into learner-centred technology?

Access or exclusion - for example: How can learning technology enable access rather than cause exclusion? If digital access is improving quickly for those with least, do widening gaps between rich and poor matter, and if yes, what needs to be done?

Open or proprietary - for example: Can a balance be struck, or will the future be open source (and/or open access)?

Private or public - for example: What are the respective roles of the private and public sectors in the provision of content and services for learning? Is the privacy of electronic data still under threat? Are there ongoing problems with identity, surveillance and etiquette regarding private/public personae in social software?

For the learner or by the learner - for example: How can technology empower learners and help them take ownership of their learning? How can it help to negotiate between conflicting demands and respond to multiple voices?

Cool URIs for the Semantic Web

Cool URIs are a regular feature on this blog so it was with some interest that I read the W3C working draft announced just before Christmas, Cool URIs for the Semantic Web.

It is perhaps worth noting that this document is a

First Public Working Draft of an intended W3C Interest Group Note giving a tutorial explaining decisions of the TAG for newcomers to Semantic Web technologies

which I take to mean that it is still very much under development.  I've sent my detailed comments to the appropriate list but I just wanted to note a couple of things here.

Firstly, it strikes me that this document doesn't really say a great deal about cool URIs as such.  Rather it talks about the use of URIs in the context of the Semantic Web and, in particular, how URIs for what it calls variously 'real-world objects' or 'non-information resources' should be constructed and dereferenced.
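The dereferencing pattern the draft recommends for such URIs can be sketched roughly as follows. A GET on a 'real-world object' URI can't return the thing itself (its essential characteristics can't be conveyed in a message), so the server answers 303 See Other, redirecting to a Web document *about* the thing. The URIs below are invented for illustration.

```python
# Sketch of the 303-redirect pattern for 'real-world object' URIs.
# A GET on the thing's URI redirects to a document describing it;
# a GET on the document's URI returns the document itself.
# All URIs here are invented for illustration.
DESCRIBED_BY = {
    "http://example.org/id/alice": "http://example.org/doc/alice",  # a person
}

def get(uri):
    """Return (status, location_or_body) for a toy HTTP GET."""
    if uri in DESCRIBED_BY:
        # The resource is a real-world object: redirect to a document about it.
        return (303, DESCRIBED_BY[uri])
    # Otherwise treat it as an ordinary Web document.
    return (200, "<html>a Web document about Alice</html>")

status, location = get("http://example.org/id/alice")
print(status, location)
```

The 303 response is the signal to the client that the URI it asked about identifies something other than the document it is about to receive.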

Secondly, and more importantly it seems to me, the document highlights some current problems with terminology in this area.  Back in 2004, the Architecture of the World Wide Web, Volume One introduced the notion of 'information resources' as being resources for which

all of their essential characteristics can be conveyed in a message.

Given this terminology, it seems logical that those things which are not 'information resources' should be termed 'non-information resources' (though the Architecture document itself doesn't use that term).  Unfortunately, the term 'non-information resource' isn't exactly snappy at the best of times, especially when one considers that the printed version of a Web page (which is an information resource) is itself a non-information resource (by virtue of being a physical object).

To try and bypass some of this terminological soup the new working draft introduces the use of 'Web document' and 'real-world object' instead (though not always in a particularly consistent way).  One can understand why... though I'm not totally convinced that the result is any clearer.  Is the conceptual notion of the colour red a 'real-world object' as far as your average man or woman in the street is concerned?

I'm reminded of the early days of discussions around the Dublin Core, circa 1998 I guess, where the term 'document-like object' was introduced as a short-hand for those classes of things that it made sense to describe using the Dublin Core metadata elements.  The term never really took off and, in some cases at least, only served to confuse things further than they were already confused.

The W3C seems to have a similar problem here... distinguishing those things that are, in some sense, on the Web from everything else.  In our work on the DCMI Abstract Model, we often partitioned the world into three basic classes - 'digital resources', 'physical resources' and 'conceptual resources' - of which the first, IMHO, shares a lot of similarities with the W3C's 'information resources'?

Welcome back...

Happy new year and all that...

I left work before Christmas with good intentions for blogging during the break - I even had several posts set up and ready to go - but things failed to materialise I'm afraid.  Too much going on at home I guess... which is a good thing.

Anyway, we're just about back in the office, so normal service should soon be restored.  For info... I've made a few cosmetic changes to the eFoundations Web site, adding a list of the most recent comments to the right-hand sidebar and disabling comments on all postings older than one month.  This latter change should help to reduce the odd bit of spam that used to get thru.  For the same reason, I've also experimentally turned on comment moderation.  As a result, comments will take slightly longer to appear.  We'll see how this change works out and may revert if we turn out not to be responsive enough in doing the moderation.

Here's to a fine and dandy 2008!


