
November 30, 2006

Metadata and relationships

Sorry... two posts prompted by Scott Wilson in quick succession, but I couldn't help myself.  Scott writes about metadata here, in response to Wayne Hodgins' post on the Future of Metadata, and I have to say I agree with every word.

I'm not sure that I buy Wayne's emphasis on automated metadata generation.  In many cases, if I can automatically generate a bit of metadata and give it to you then the chances are that you can just as easily generate it yourself.  I know that isn't always the case, because in some cases my context, capability and/or knowledge might allow me to generate something that you can't, but I bet it is true in a lot of cases.

More fundamentally, and I think as Scott is arguing to a certain extent, why generate a bit of metadata for someone else to index at all, when they can just as easily index the object itself?  OK, at the moment that really only works for textual resources... but surely that will change over time?

However, it seems to me that one of the key things we can still usefully capture in metadata is the relationships between stuff.  This is a version of that... this is derived from that... and so on.  And, often, it is an intellectual exercise to work out how two things are related.  Sure, on the Web we can index the full text of everything and we know that things are related to other things in a general sense (courtesy of the hypertext link), and we can build very powerful systems on the back of that knowledge (e.g. Google).  But we don't know anything in detail about how things are related to each other - and that is very limiting for some applications.

Making architectural diagrams more understandable

Scott Wilson proposes a mashup stencil, essentially a set of icons for making UML 2.0 diagrams more understandable to the man or woman in the street.  Great idea...  I just wish they were available on something other than a Mac!

If I had more time, and access to the icons in some form that I could use on a PC, I'd consider doing one or more diagrams of the JISC Information Environment using these icons.

November 28, 2006

Why an abstract model for Dublin Core metadata?

In a comment on my earlier post about the DCMI Abstract Model, Jonathan Rochkind asked for some clarification on the motivation for developing the DCAM, and the problems it was designed to address.

(I should emphasise that this represents only my personal view, and I'm not speaking on behalf of the co-authors of the DCAM or of DCMI.)

In my previous post, I said that there were two primary aspects to the DCMI Abstract Model:

  • it describes an abstract information structure, the description set, in terms of the components which make up that structure (description, resource URI, statement etc) and the relationships between those components (a description set contains one or more descriptions; a description contains one or more statements; and so on)
  • it describes how an instance of that structure is to be interpreted, in terms of "what it says" about "things in the world" (each statement in a description "says" that the described resource is related in the way specified by (the resource identified by) the property URI to another resource; and so on)

As part of that second aspect, the DCAM also describes the types of "metadata terms" that are referenced in description sets (property, class, vocabulary encoding scheme, syntax encoding scheme) and the relationships that can exist between terms of those types (subproperty or element refinement, subclass).
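Purely as an illustration of that nesting (the class and field names below are my own invention, not DCMI's), the description set structure might be sketched as:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Statement:
    # Each statement relates the described resource, via a property
    # (identified by URI), to some value.
    property_uri: str
    value: str

@dataclass
class Description:
    # A description is about one resource (optionally identified by a
    # resource URI) and contains one or more statements.
    resource_uri: Optional[str] = None
    statements: list = field(default_factory=list)

@dataclass
class DescriptionSet:
    # A description set contains one or more descriptions.
    descriptions: list = field(default_factory=list)

# A minimal example: one description containing one statement.
ds = DescriptionSet(descriptions=[
    Description(
        resource_uri="http://example.org/doc/1",
        statements=[Statement("http://purl.org/dc/elements/1.1/title",
                              "An example document")],
    )
])
```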

I guess the first thing to say is that, before the development of the document we call the DCMI Abstract Model (which took place from about mid-2003, with the final document made a DCMI Recommendation in early 2005), there already was an "abstract model" for Dublin Core metadata. DCMI had already embraced the notion that what some piece of DC metadata "said" was distinct from the (multiple) ways in which it might be "encoded": "Dublin Core metadata" could be encoded in multiple digital formats, and, conversely, instances of two different formats could be interpreted as encodings of the same metadata. Underlying this was an assumption - perhaps not as fully or clearly stated as it might have been - that there was some "abstraction" of "Dublin Core metadata" which was independent of any of those "concrete" syntaxes. Of the pre-DCAM documents published by DCMI, the one which comes closest to capturing fully what that abstraction was is probably the Usage Board's DCMI Grammatical Principles. I think it's reasonable to argue that the DCAM, first and foremost, consolidates, rationalises and makes explicit information which already existed (and also provides a more formal representation of it, through the use of UML).

However, that view is a slightly rose-tinted one of a somewhat muddier reality. It is perhaps more accurate to say that there were several such descriptions of "what Dublin Core metadata was", and those descriptions were not always completely consistent with each other. They often differed at least in their use of terminology, if not in the concepts they described. Some were in documents published by DCMI (e.g. the DCMI Grammatical Principles and the descriptions of "abstract models" included in Guidelines for implementing Dublin Core in XML), others in documents published elsewhere (e.g. Tom Baker's, "A Grammar of Dublin Core" in Dlib). Some were published early in the development of DC, others were more recent. With the emergence of the W3C's Resource Description Framework (RDF), DCMI published documents describing the use of RDF for DC metadata, and the use of DC often featured in examples in documents about RDF published by other parties. The terminology and concepts of RDF entered the lexicon of (at least a subset of) the Dublin Core community. While this brought the considerable benefits of aligning Dublin Core with a more formally defined model (and enabling the use of software tools that implemented that model), it also raised new questions: was DCMI's notion of element really the same as RDF's concept of property? Was DCMI's notion of vocabulary encoding scheme the same as RDF Schema's notion of class?

The consequence of this was that while there was, broadly at least, a shared conceptualisation of what DC metadata was, the devil was in the detail: there were sometimes subtle but significant differences between those different descriptions of the DC "abstract model". Implementers ended up with slightly different notions of what DC metadata was, and those differences were sometimes reflected in the software applications that were developed (e.g. the developers of two different applications might take different approaches to implementing the concept of element refinement). If data was processed within a single application (or a set of applications based on the same conceptualisation), then no inconsistencies were visible; but if data was transferred to an application based on a different conceptualisation then the applications might behave differently. As Stu Weibel puts it bluntly, "we don’t even have broad interoperability across Dublin Core systems, let alone with other metadata communities".

On that last note, I think the importance of clarifying these questions really came home to me when I started engaging in conversations involving people coming from different metadata communities, examining questions of interoperability between systems based on different metadata specifications. Those different metadata specifications had their own "abstract models" (even if they weren't always clearly identified as such), their own specifications of information structures and how those structures are interpreted. Often, those different communities apply names to the concepts within their models which are similar to, or the same as, the names used by the DCMI community for a rather different concept. The term "element" was one such example. It quickly became clear that, in our desire to find commonality rather than difference, we risked falling victim to the tendency to see what in my French O-Level class we called "false friends", making assumptions that superficially similar names in different languages must refer to the same concept.

So, to return to Jonathan's question of what problems we had before we developed the DCAM, I'd suggest they include the following:

  1. While we ("the DC community") did have a broadly shared understanding of "what DC metadata was", there were some areas where understandings and perceptions differed, and in the worst cases those differences resulted in two DC metadata applications behaving in different ways.
  2. There were multiple encodings for DC metadata, some defined by DCMI, some defined by other parties. Sometimes it was difficult or impossible to establish whether an instance of one such encoding represented the same information as an instance of another, which severely limited interoperability between systems using them.
  3. There was some confusion between features associated with certain digital formats or syntaxes that were used to represent DC metadata (e.g. the use of XML Namespaces to qualify the names of XML elements and attributes) and features associated with the abstract information structure (e.g. the use of URIs to identify metadata terms and other resources)
  4. We had begun to develop the concept of the DC application profile (DCAP) as a specification of how DC metadata was constructed in the context of some domain or application, typically describing the set of terms used and providing guidance about how the terms were used in that context, but beyond that general notion, there were different perceptions and understandings of the nature of a DCAP, and particularly of the types of terms that a DCAP might refer to
  5. Closely related to the previous point, there was some confusion about whether and how terms should be "declared" for use in DC metadata
  6. There was some confusion about the relationship between Dublin Core and RDF
  7. Our capacity to engage in dialogue about interoperability between systems based on different metadata specifications suffered because of a lack of clarity about our own abstract model and those of the other communities, and from misunderstandings arising from our use of terminology
  8. More broadly, there were varying perceptions of "what DC metadata was" (a set of fifteen elements, "metadata's lowest common denominator", something that appears in the <meta> element of HTML pages, an XML format used by the OAI Protocol for Metadata Harvesting, and so on)

Has the publication of the DCAM solved all these problems? Well, no. Not yet anyway ;-) (And indeed in the meeting of the DCMI Architecture Working Group at DC-2006, we discussed the requirements to make some changes to the DCAM based on feedback received from various sources!) But having the DCAM as a formal specification has, I think, put us on a better footing to be able to start addressing them.

The DCAM gives us a single point of reference for talking about the nature of DC metadata. When we use a term like "element" in conversation, we can point to the DCAM as the source for what we mean by that term. Perhaps more fundamentally we have a description of what our digital formats are "saying". We have a clear description of what information structure we are seeking to represent when we are developing a format for the representation of DC metadata. And conversely, given a format which claims to be a format for representing DC metadata, we can analyse that format in terms of the DCAM and ask whether it serves the purpose of describing how to represent a description set.

We don't yet have a formal description of what a DC application profile is, but establishing that the information structure we are interested in is the description set and that a description set contains references to particular types of metadata terms provides the foundations for doing so. And conversely, given a document that claims to be a DCAP, we can ask whether it specifies how to create a DC description set, and whether the terms it refers to are terms of the type described by the DCAM. Similarly, we have a better understanding of the nature of the terms used in DC metadata, and on that basis we are in a better position to provide guidance to implementers on how to specify and describe any additional terms they may require.

The draft document Expressing DC metadata using the Resource Description Framework seeks to clarify the relationship between Dublin Core and RDF by describing how concepts defined by the DCAM correspond to concepts defined by RDF. I think the presence of the DCAM has facilitated our dialogues with other metadata communities: it enables us to explain our own approaches, but perhaps more importantly it helps us to reflect on aspects of those other communities' approaches that perhaps have not been explicit in the past. While in the short term, it may be that this highlights differences rather than similarities, that is a vital part of the process of working towards interoperability. Finally, as our paper at DC-2006, "Towards an interoperability framework for metadata standards" [Preprint], suggested, I think this process is helping us to develop a better understanding of the nature of metadata standards and the challenges of interoperability between standards.

November 27, 2006

Repositories and Web 2.0

[Editorial note: I've updated the title and content of this item in response to comments that correctly pointed out that I was over-emphasising the importance of Flash in Web 2.0 service user-interfaces.]

At a couple of meetings recently the relationship between digital repositories as we currently know them in the education sector and Web 2.0 has been discussed.  This happened first at the CETIS Metadata and Digital Repositories SIG meeting in Glasgow that looked at Item Banks, then again at the eBank/R4L/Spectra meeting in London.

In both cases, I found myself asking "What would a Web 2.0 repository look like?".  At the Glasgow meeting there was an interesting discussion about the desirability of separating back-end functionality from the front-end user-interface.  From a purist point of view, this is very much the approach to take - and it's an argument I would have made myself until recently.  Let the repository worry about managing the content and let someone (or something) else build the user-interface based on a set of machine-oriented APIs.

Yet what we see in Web 2.0 services is not such a clean separation.  What has become the norm is a default user-interface, typically built using AJAX techniques though often using other technologies such as Flash, that is closely integrated into the back-end content of the Web 2.0 service.  For example, both Flickr and SlideShare follow this model.  Of course, the services also expose an API of some kind (the minimal API being persistent URIs to content and various kinds of RSS feeds) - allowing other services to integrate ("mash") the content and other people to develop their own user-interfaces.  But in some cases at least, the public API isn't rich enough to allow me to build my own version of the default user-interface.
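Even that minimal API goes a long way: a consumer needs nothing more than a generic feed parser to start integrating content from such a service.  Purely as an illustrative sketch (the feed fragment and URLs below are invented, not taken from any real service):

```python
import xml.etree.ElementTree as ET

# An invented fragment of the kind of RSS 2.0 feed a Web 2.0 service
# might expose over its content.
rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Recent items</title>
    <item>
      <title>A photo</title>
      <link>http://example.org/photos/1</link>
    </item>
    <item>
      <title>A slide deck</title>
      <link>http://example.org/slides/2</link>
    </item>
  </channel>
</rss>"""

def items(feed_xml):
    """Return (title, link) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(i.findtext("title"), i.findtext("link"))
            for i in root.iter("item")]
```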

More recently, there has been a little thread on the UK jisc-repositories@jiscmail.ac.uk list about the mashability of digital repositories.  However, it struck me that most of that discussion centred on the repository as the locus of mashing - i.e. external stuff is mashed into the repository user-interface, based on metadata held in repository records.  There seemed to be little discussion about the mashability of the repository content itself - i.e. where resources held in repositories can easily be integrated into external services.

One of the significant hurdles to making repository content more mashable is the way that identifiers are assigned to repository content.  Firstly, there is currently little coherence in the way that identifiers are assigned to research publications in repositories.  This is one of the things we set out to address in the work on the Eprints Application Profile.  Secondly, the 'oai' URIs typically assigned to metadata 'items' in the repository are not Web-friendly and do not dereference (i.e. are not resolvable) in any real sense, without every application developer having to hardcode knowledge about how to dereference them.  To make matters worse, the whole notion of what an 'item' is in the OAI-PMH is quite difficult conceptually, especially for those new to the protocol.

Digital repositories would be significantly more usable in the context of Web 2.0 if they used 'http' URIs throughout, and if those URIs were assigned in a more coherent fashion across the range of repositories being developed.
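To illustrate the dereferencing problem: given an 'oai' URI, an application can only fetch the corresponding record if it already knows that repository's OAI-PMH base URL - knowledge that has to be hardcoded, repository by repository.  A rough sketch of what that hardcoding looks like (the repository name and base URL are invented):

```python
from urllib.parse import quote

# An 'oai' URI identifies an item but says nothing about where to
# resolve it; the application must carry its own table of base URLs.
BASE_URLS = {  # hardcoded, repository by repository (invented example)
    "eprints.example.ac.uk": "http://eprints.example.ac.uk/cgi/oai2",
}

def dereference_oai(oai_uri, metadata_prefix="oai_dc"):
    """Turn an 'oai' item URI into an OAI-PMH GetRecord request URL."""
    _scheme, repo, _local = oai_uri.split(":", 2)
    base = BASE_URLS[repo]  # fails for any repository we don't know about
    return ("%s?verb=GetRecord&metadataPrefix=%s&identifier=%s"
            % (base, metadata_prefix, quote(oai_uri, safe="")))

url = dereference_oai("oai:eprints.example.ac.uk:1234")
```

An 'http' URI, by contrast, needs no such table: it dereferences as-is, with nothing more than an ordinary HTTP client.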

Reflections on 10 years of the institutional Web


I began writing this post when we first started the blog back in September - but then it got lost somehow and I never finished it off or published it.  More recently I was interviewed by phone as part of a study evaluating the UK eLib Programme (and in particular the impact of the MODELS project) and it reminded me about the conclusions I draw below.

I was asked earlier in the year to give the closing talk at the UK Institutional Web Managers Workshop, looking back over the past 10 years of the institutional Web.

I have to confess that I felt pretty intimidated by the scope of the talk, though I hope that the result was at least a little interesting.  I certainly found it useful to spend some time thinking about what had been happening over that kind of timeframe.

I suppose I reached two broad conclusions...

My first point was that there hasn't been enough engagement during that time between the UK digital library community (i.e. the community that grew out of the e-Lib Programme, the UK's first significant suite of digital library projects) and the UK Web manager community.  I don't mean that in a "they've been ignoring us" kind of way.  What I mean is that we need to continually remind ourselves that digital library activities (at least in the context of UK higher and further education) need to be firmly grounded in the realities of education service delivery - not least so that they remain relevant and practical.  A good example is metadata development and research (a topic close to my heart) which one could argue has had very limited impact on the issues that institutional Web managers care about.  As the provision of an institution's Web site became an increasingly marketing-oriented activity, I think we lost a lot of the links between the webmaster, library and elearning communities that would otherwise have been very beneficial.  Having listened to a lot of very interesting debates on the UK webmaster forums over the years, it has always frustrated me that there seems to have been very little engagement by that community in elearning or eresearch issues as such - and the community is poorer for it.

My second point was that we are in danger of losing our own digital history - or at least that associated with the development of institutional Web sites in the UK.  For a community that generally accepts the importance of digital preservation, it seems to me that we are more talk than action!?  Looking back to the early days of institutional Web sites, seeing what those sites looked like and what kinds of discussions went on at that time is surprisingly difficult.  Early mailing list archives got lost in the transition from mailbase.ac.uk to jiscmail.ac.uk, the Internet Archive didn't start until too late to capture some of the stuff, and so on.  Don't get me wrong, there is some interesting stuff still around - I showed some of Brian Kelly's early Web evangelism slides that he was using in 1995 during my talk and noted that they could more or less still be used today.  I also had some stuff in my personal email archives, but to my shame, even some of that got lost when I moved from UKOLN to the Eduserv Foundation :-(

So my major recommendation from the talk was that someone needs to capture some of this stuff as soon as possible, because, if we don't, then a significant part of the history of how university Web sites in the UK grew up and the outcomes of the eLib programme will be lost forever.

It used to be said that eLib took a "let a 1000 flowers bloom" kind of approach.  Undoubtedly true.  Yet somewhere along the line we may not have cultivated those flowers properly and, even if we did, we are in danger of losing our record of how pretty they looked.

Image: Cloud reflections at Nisqually National Wildlife Refuge, Washington, US. [May 2006]

November 26, 2006

Library 2.0 in Wikipedia


I've just discovered that the Wikipedia entry for Library 2.0 is up for deletion (as Paul Miller notes here).  How crazy is that!?  I don't understand the process at work here but the reasons for deletion seem minimal to me - other than perhaps some level of personal dislike of the '2.0' suffix.

Sure, at some point in the future the entry will have to be updated to say 'A library-related term that went out of fashion sometime during 20xx'.  But who is prepared to say what '20xx' will be and who cares anyway?  By that time, the debate around Library 2.0 will have left its mark on us - which is reason enough to retain the entry, or so it seems to me.

Image: SL Library 2.0 tee-shirt.  Want one?  Available in four colours (and free) - IM Art Fossett for details.  (Yes, I am guilty of blatantly trivialising the debate! :-) ).  [November 2006]

November 22, 2006

The power of open access to data

This may be old news to some readers, but it was new to me, and so stunning that I felt the need to share it here.  Via David Recordon's shared items in Google Reader and a post in the ConnectID blog I discovered the TED talks and in particular this presentation by Hans Rosling (from Feb 2006 I think).  It's a fascinating talk that uses some very nice graphics to debunk some of the myths about developing nations.

Towards the end Hans makes the point that this kind of analysis is only really possible by unlocking UN statistical data in ways that make it more openly available for use on the Web - data that has hitherto been locked away in closed databases with hard to use or non-existent APIs.  Hans talks a little about the need to search across this data, whereas my view is that it is the ability to re-use the data that is critical to the kind of analysis demonstrated in the presentation.  But that's a minor point.

Amazing stuff... and it makes me wonder if this kind of analysis could usefully be combined with the data that underpins OCLC's environmental scan to plot similar trends in provision of library and museum services and education.

November 20, 2006

From Salford Quays to synthetic worlds

Last week I spent a couple of days at The Lowry arts centre on Salford Quays, in Greater Manchester, where I was attending the third annual JISC CETIS Conference, which had the theme "Linking formal and informal learning". I had attended one of the previous two conferences; they do have a strong practical focus, and one of the primary aims is to try to identify areas of work on which CETIS and, by extension, JISC, might focus in the short- to medium- term.

Salford Quays is one of those redevelopments of once industrial but subsequently decayed waterfront areas that seem to have appeared in many UK cities over the last fifteen or twenty years or so. By day the area seems to attract a modest but steady stream of visitors to venues such as the Lowry and the Imperial War Museum North (and of course to Old Trafford, just down the road/canal), but although there are blocks of residential accommodation, the area still has something of an unpopulated feel. The three-quarters of an hour or so I spent on a damp Monday night wandering the deserted walkways of anonymous glass-fronted offices and silent red-brick apartment blocks in search of a (veggie) pie and a (decent) pint brought to mind my previous weekend's tentative forays (prompted by Andy's obsession, er, enthusiasm) into the world of Second Life (where I go under the guise of Peregrine Juneau, just in case I bump into you (my flying is still rather erratic, so I tend to bump a fair bit)).

So it was timely that games and virtual worlds featured quite prominently in the programme for the conference. One of the opening keynote presentations was by Ernest Adams, a computer games designer, on the topic of "The Philosophical Roots of Games Design". Ernest presented an entertaining analysis of the culture and "philosophy" of the games industry, in which he sought to position games development in relation to several oppositions (deductive/inductive philosophy, classical/romantic thought, sciences/humanities). Taking in Foucault, Zen and the Art of Motorcycle Maintenance and (inevitably, and very amusingly) that bane of the viewer of long-haul in-flight TV, The Matrix, he argued that games designers are striving to achieve "romantic" ends ("immersion", compelling narrative) using "classical" means (engineering technology): they may aspire to delivering an experience akin to that achieved in literature, but in practice struggle to realise more than the most banal narratives. And indeed many designers seem content to prolong that imbalance, relying ever more on technological innovation and "flash" to compensate for an absence of romantic imagination. The nature of the form means that a strong engineering component will always be necessary, but the industry needs to bridge those oppositions and redress the balance.

Following this, I attended a workshop session on "Identity, games and synthetic worlds", facilitated by Paul Hollins. (See notes from the session by Scott Wilson). It started with a summary by Sara de Freitas of her forthcoming report, Learning in Immersive Worlds: a review of game-based learning, which was commissioned by JISC. (AFAIK, the full report is not yet available, but I came across an article by Sara in the ALT Newsletter in which she gives a brief overview.) Sara noted that games clearly do have the potential to engage and motivate learners, but that teachers face difficulties in selecting games appropriate for use in education, not only in terms of discovering games and familiarising themselves with their content, but also because of the absence of a framework for evaluating their usefulness. She concluded that to be effective in teaching and learning the use of games needs to be embedded within educational practice, and that more research is required to establish the effectiveness of games in this context. Games development can be expensive and greater collaboration between the educational and games development communities would be helpful; and any widespread deployment of games for learning and teaching will depend on the development of a sustainable model of funding to support development and deployment.

Ernest echoed some of Sara's notes of caution regarding the use of COTS (commercial, off-the-shelf games - it took me a while to work that out, the first time I saw it!) in teaching and learning, emphasising that the ethos of commercial games development meant that there was a tendency to "sacrifice verisimilitude on the altar of fun" (i.e. privilege entertainment value and "playability" - and ultimately, I assume, profitability - over any attempt at an accurate representation of some part of the physical world). Also, in the large majority of cases at least, commercially-developed games were not based on any systematic deployment of pedagogical principles - other than, perhaps, that of "sink or swim"/"trial and error" (which, it was pointed out, can still be a useful approach in some cases!). If games are to serve pedagogic purposes, they should be designed with learning in mind, but it has proved difficult to engage the developer community on those terms.

However, one of the points that emerged during discussion was that even with such limitations, some COTS do still become a focus of learning, particularly through the discussions which take place in the communities which form around the game. Such activity may take place within the game (through "chat" functions) or perhaps more often may be situated "outside the game", for example in online fora created to support various aspects of game play itself. Generally, however, there seemed to be consensus around the point that games themselves illustrate rather than teach.

The point was also made that COTS tend to be relatively large and complex, and often require a considerable investment of time on the part of the player to complete some task within the game. It was suggested that consideration might be given to funding the development of some smaller, more fine-grained, single-purpose "gamelets". This approach might also be more economically viable than aiming at the development of larger more complex games.

Unfortunately, given the limitations of time (and a less than conducive room layout), the workshop didn't offer opportunities to observe or experience at first hand the games which were mentioned, though the idea of holding a follow-up "show-and-tell"-style event (a "games bash"?) was suggested.

Hmm. I'm conscious that this is something of a non-committal, trip-report sort of post. That's probably because I feel like an observer looking in on the still rather unfamiliar world of computer games. I'm not a "gamer", by any stretch of the imagination - the early video games passed me by, and although I've made the occasional desultory attempt to guide virtual incarnations of my beloved Sunderland Football Club to imaginary Champions League success in football games, and some time last year my curiosity was sufficiently piqued that I bought a copy of World of Warcraft, I've never really "engaged" sufficiently to feel the urge to invest much time and energy (or disposable income!) in playing them.

Oh, well, maybe the time has come and the Second Life environment will prove to be my gateway experience....

November 16, 2006

The "social" in social tagging

One of the sessions I regret missing (prior commitments beckoned) at DC-2006 was a "special session" on "Social Networks: Tagging and Dublin Core metadata" facilitated by Liddy Nevile. From what I gathered from chatting with a few people who did attend, looking at some of the presentation materials posted and observing some of the email exchanges afterwards, it seems to have been one of the more successful meetings to have taken place during the conference. As a result of the meeting, a proposal was made for the formation of a "community" - a forum for open discussion - within DCMI to explore the relationships between the "traditional" approaches to metadata creation and use and the more informal, community-oriented approaches to metadata that have emerged, particularly through the use of "tagging" in the context of various social software systems. (Details of the DCMI Social Tagging Community are now available.)

One of the first points of discussion was the name of this community and in particular whether that name should refer explicitly to "social tagging", rather than simply "tagging". This post is a slightly extended version of a message I sent to the DCMI Social Tagging mailing list on that subject.

Liddy posted a short piece to stimulate discussion, and drew a distinction between the creation of metadata by "trained cataloguers" (using the example of MARC records created by librarians) and the creation of metadata by "ordinary people", non-experts without training in cataloguing practices (using the example of Dublin Core), and suggested a similarity between the latter and "tagging".

I think this does highlight one facet of "tagging" which is important: the simplicity of assigning tags. To tag a resource, I choose whatever tags I wish to associate with the selected resource. I don't have to worry about whether I'm using the "right" tag, and I don't have to scrutinise DCMI's documentation to unravel the mystery of when I should use "Bristol" as a value string with the dc:subject property and when I should use it as a value string with the dc:coverage property (well, unless I adopt some hare-brained convention for constructing tags which itself relies on that distinction!)

(As an aside, while I'd agree that in some cases DC metadata has been created by people who are not trained cataloguers, and creating DC metadata is generally simpler/less complex than creating MARC metadata, I'd qualify that by saying that this begs the question of what we mean by "DC metadata". If we mean metadata created using the description model described by the DCMI Abstract Model, then the degree of complexity involved in the creation of instances varies depending on the particular DC application profile being used. A DC application profile may be relatively complex and effective metadata creation may well require some degree of familiarisation/training.)

However, I'm not sure that simplicity is the distinction which people are seeking to capture through the inclusion of the "social" adjective. I think the intent is not to indicate the simplicity of the "tagging" operation, or the level of expertise required, but rather to emphasise that the operation takes place in the context of a communal or collaborative system. The levels of training or expertise of the taggers within the community, and whether there are variations in those levels or not, are a characteristic of the community, rather than of tagging itself. And while most of what we now refer to as tagging does take place within such communal systems, I don't think that is necessarily the case: tagging is often social but it may not be.

In theory, I could engage in "tagging" within a system in which I was the only user. I could run a del.icio.us clone on my laptop, accessible only to me on my user account on that machine, and I could post entries and "tag" resources within that system.  In this scenario, I'm certainly performing the "tagging" operation. But there is no communal or collaborative aspect to that operation. I'm not sharing my collection of entries (including my tags) with anyone else, and I'm not looking at entries (including tags) from the collections of other individuals. No-one else is analysing or using my tags and modifying their tagging behaviour based on that experience, and I'm not analysing or using anyone else's tags, and modifying my tagging behaviour based on that experience. Retrieval within the system is based only on my own tag set. The "description" of a resource constructed in this way is created through my input alone. This is "tagging", certainly, and it may be very useful to me for my personal information management, but it's not "social tagging".

If I was doing a similar thing in a system that was hosted on our organisational intranet, and a few colleagues were also posting entries to their own collections and applying their own tags to resources, and we were browsing each other's collections and using each other's tags, then a communal element is introduced. Even if there are only a handful of contributors, there is now a social dimension to the operation. And typically social tagging software makes this explicit both at the point of tagging (by, for example, offering the tagger suggestions for tags based on the tags used by other contributors) and at the point of retrieval/use (browsing the community's tag set as well as my own tag set). We are users of the tags we ourselves assign, and of the tags created by others within the community. And community tagging behaviour conditions our experience of retrieval and may condition the tags we choose.  And perhaps more interestingly, these multiple taggings by different contributors mean that there is now a "description" of a resource created through the aggregated input of multiple contributors. This is now a form of what I would call "social tagging". In this particular scenario, the "community" is small and probably quite homogeneous in terms of aims, experience/interests, use of terminology, and so on (and probably more broadly of culture and language). Even within such a small group, different members might "engage" with that "community" to a greater or lesser degree - we might choose to try to converge on a shared set of tags across the community (e.g. I might tag something as "semweb", discover later that my colleagues use a tag "semanticWeb" and then edit/replace my tag accordingly) or we might choose to ignore each other's tagging conventions! - but there is a social dimension both to the tagging operation and to the subsequent use of those tags in retrieval.

And of course that scenario extends to the more familiar open, global, Web-based systems (like del.icio.us and so on) where the community of participants is large and heterogeneous, with wide variations in aims, experiences, culture, language etc. They are all potential users of my tags and I'm a potential user of theirs. In this community, yes, both experts and non-experts are contributors.  And the process results in "descriptions" which are the result of combining the inputs of many - in some cases hundreds or thousands of - different contributors. These descriptions are social, communal constructs, rather than the creation of any one individual participant.
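The contrast between the personal and communal scenarios above can be sketched in code: each contributor keeps their own tag set, but a social system derives an aggregated, weighted "description" that no single participant authored. (The usernames, tags and URL below are invented purely for illustration.)

```python
from collections import Counter

# Each contributor's private tag assignments, keyed by resource.
# On its own, any one of these is "tagging" but not "social tagging".
personal_tags = {
    "andy":  {"http://example.org/paper": {"semweb", "metadata"}},
    "pete":  {"http://example.org/paper": {"semanticWeb", "metadata"}},
    "liddy": {"http://example.org/paper": {"tagging", "metadata"}},
}

def communal_description(resource):
    """Aggregate every contributor's tags for a resource into one
    weighted description - the communal construct no one user owns."""
    counts = Counter()
    for tags_by_resource in personal_tags.values():
        counts.update(tags_by_resource.get(resource, set()))
    return counts

desc = communal_description("http://example.org/paper")
```

Here "metadata" is reinforced by all three contributors, while the divergent "semweb"/"semanticWeb" tags surface exactly the kind of convention a community might (or might not) choose to converge on.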

In these scenarios I'm not making any assumptions about the "ordinariness" (or otherwise! ;-) ) of the participants or about their levels of expertise or training. A community of "trained cataloguers" might engage in social tagging and "ordinary people" might "tag" in ways which are not "social".

Turning briefly to Dublin Core metadata, I'm not convinced that DC metadata creation has typically taken place within this sort of communal/collaborative context. And in that sense I'd probably say that DC metadata creation has typically not been a "social" process, and the "descriptions" created are not "social" constructs, or at least not in the way that those generated by tagging within a system like del.icio.us are. And we (the DCMI community) can probably learn from examining those explicitly social aspects of those processes and systems.

November 15, 2006

When outsourcing makes sense

The Eduserv Athens team announced a beta release of their Athens to UK Federation Gateway today:

This is the latest element in the Shibboleth/Athens Interoperability programme agreed with the JISC.  This beta Gateway enables organisations using the Athens service as an outsourced identity provider (IdP) to access resources registered with the UK Access Management Federation for Education and Research.

I did some work a while back with Richard Dunning of the Eduserv MATU team to try and brainstorm a SWOT analysis of the different options available to institutions* as they think about moving to federated access management within the UK Access Management Federation for Education and Research.

My personal view is that there are some compelling arguments in favour of institutions outsourcing their identity management provision.  These are the ones that we came up with:

  • Largely predictable levels of support
  • Largely predictable ongoing costs
  • Negotiation with UKAMF handled by supplier
  • Negotiation with Service Providers handled by supplier
  • Upgrades and support for new AIM technologies handled by supplier
  • Bug-fixes handled by supplier
  • Membership of a supplier community or user-group

There are probably others...

Now, some readers may be thinking that, as an employee of Eduserv, I'm no longer impartial or neutral on this issue and you might be right, though I'd like to think that I have always given impartial advice as far as I was able, and that I will continue to do so.

The neutral in me is saying, give serious consideration to outsourcing your IdP service provision before deciding to develop or buy in an in-house solution.  I've argued previously that I see the current UK transition to federated access management as a stepping stone to personal identity management.  It doesn't make sense to me that as a community we choose to replicate the engineering necessary to undertake these kinds of changes in every single institution.

The company-man in me is saying, if you go down an outsourced IdP route in whatever form, consider Eduserv as a supplier, not least based on a significant track record of service delivery in this area.

(*) Note that we hope to make the full version of the SWOT analysis available on the MATU Web site in due course.

8 out of 10 surveys aren't worth the paper they are written on

There's been widespread reporting about the recent BPI survey into attitudes on copyright protection in recorded music for songwriters and performers (e.g. BBC and Observer).  Amongst other things, the survey appears to suggest that:

62 per cent of those polled believe British artists should receive the same copyright protection as their US counterparts

and that

Just under 70 per cent of 18- to 29-year-olds hold that view, the highest of any age group surveyed.

I've had a quick look round and can't find any evidence of how the survey questions were phrased or what context they were put into, which to my mind means that it is pretty much impossible to draw anything more substantial from the survey than a few juicy headlines.

I did stumble across the BPI's Five reasons to support British music: one of the UK's most valuable industries during my search, which, it seems to me, presents a rather unconvincing case.

On the other side of the debate, the British Library IP Manifesto sensibly urges caution:

The copyright term for sound recordings should not be extended without empirical evidence of the benefits and due consideration of the needs of society as a whole.

I found it interesting to note that the BPI lists being a "unique cultural asset" as the first reason for copyright extension, concluding that

... recordings are the cultural heirlooms of Britain. They should remain in possession of their original owners.

whereas the BL argues that longer copyright makes it difficult or impossible to preserve recordings for the future.

November 14, 2006

Children's Book Week (US)

Lorcan writes about children's book week here and lists two of his favourites.  His second choice, Peepo, resonates particularly strongly for me.  It was also a favourite of my daughter (Daisy) when we first started reading to her at bed-time.  These kinds of books get read so many times over and over again that they get engraved on your heart or something - I could probably still recite many of the words from memory now (15 or 16 years later!).

Another personal favourite of mine (for older children) is Private Peaceful by Michael Morpurgo.  I read this to my boys (Wilf and Stan) when they were about 11 and 8.  It's a powerful book, and the act of reading out loud really strengthens the emotion in it.

Nothing beats Swallows and Amazons though! :-)

Note: Yes, I know that Children's Book Week in the UK ran from 2-8 October this year... but I missed that, so am taking the opportunity second time round!

Same issues, different life

I confess... I find Second Life fascinating and possibly somewhat addictive.  Is there some kind of regular meeting I can attend to help me get over it!?  Oh, right... that's called RL? :-)

The combination of objects and scripting is very powerful and gives us the opportunity of considering SL as a new kind of user-interface to content, services and collaboration.  For example, see this elevator script posting from Jeff Barr's blog (though I confess that when I tried to teleport to the elevator in SL, things failed for some reason).

It's also interesting to note that SL is similar enough to RL that business model and related issues that we are all familiar with in the latter are beginning to arise in the former.  There's a post about Copyright issues in the Official Linden Blog which has generated a lengthy and stimulating discussion.

As we see more and more libraries, universities (and other educational institutions) and content providers moving into SL, it seems to me that we may well need to replicate some or all of the current licence brokering and access management functionality that we see in RL into SL?

November 10, 2006

Acrobat and eLearning

Herbert Van de Sompel pointed me at Ted Padova’s AUC Blog and in particular at this post - Creating Courseware with Adobe Acrobat and Adobe Captivate.

I've never been much of a PDF fan myself, but the post is interesting just to see what is possible.  It shows how

...a faculty member might use Adobe Captivate to instruct students on how to use a PDF file enabled with Adobe Reader usage rights for a lesson in basic Russian. Because Acrobat supports rich media it’s a perfect tool for a language lesson where an instructor’s audio can be heard; and with the PDF enabled with Reader usage rights, the students can record audio responses, take a test, and email the PDF back to the instructor.

What I don't quite get is how this might engage with a set of external services, but I'm guessing that the content of the PDF file could be dynamically generated from resources held in item banks and/or other learning object repositories and that student responses could be handled by some sort of 'assessment' service.  Comments on the viability of this would be welcome.

I guess this positions Acrobat as a potentially useful client-side tool in the Personal Learning Environment (PLE) toolkit?

Second Life, first time

I made my first real trip into Second Life (SL) this evening.  Art Fossett, my SL alter ego, at your service!

Quite an interesting experience.  I spent a few hours or so there in total.  I was a bit lost and lonely at first, looking around the SL Library and ICT Library on Info Island.  I had to go and buy a new pair of virtual trainers to keep myself busy - something I would never do in RL!


But then I met up with TalisPaul Fossil (a.k.a. Paul Miller of Talis) who showed me round Talis Cybrary City - a growing area for library related activities.  Here's a couple of pictures, one of TalisPaul and me chatting inside the Cybrary and one of me next to the fountain outside.  Note that you don't have to face each other to chat in SL - though I guess that doing so is polite - and that Paul is literally chat'ing (i.e. typing) in the picture which is why it looks like he is doing a Tommy Cooper impression!

Being in a library area was nice - all the other librarians that I met there were very friendly.

Spot the Dublin Core tee-shirt - yours for only L$20 :-)

November 09, 2006

Models, models everywhere

Lorcan Dempsey examines the nature of metadata specifications used within the libraries, museums and archives communities. One of the distinctions Lorcan makes is between what he calls "conceptual models" and "abstract models".

One of the comments on Lorcan's post requested some clarification on the difference between these two different types/classes of model. I started to write a comment on Lorcan's post, but it got quite long so I decided to write something here instead.

The first thing to say is that I recognise that our use of terminology needs some refinement here! After all, Wordnet tells me that a concept is "an abstract or general idea inferred or derived from specific instance", so I'm not sure that the adjectives "conceptual" and "abstract" do much to help to explain the difference. And I think any model is probably by definition abstract or conceptual (Wordnet again: "a hypothetical description of a complex entity or process").

But leaving aside the adjectives for a moment, I think the key point is that we are talking about models of two different things (or sets of things).

What Lorcan calls a "conceptual model" (and what Andy Powell calls here (and following slides) an "application model") is a model of "the world" (or some part of the world) that is of interest to some community. That "interest" is usually determined by the requirement to develop a software application to provide some set of functions. So this sort of model describes what information is needed to deliver those functions. Typically it specifies what types of thing are being described, the types of relationship that exist between those different types of thing, and the various attributes of those things. Such a model is always an "abstraction", a theoretical construct: we often introduce entities in our models which have no visible analogues in the physical world (the Work, Expression and Manifestation entity types in the FRBR model being a good example) and we make choices about how much of the complexity of the "real world" we need to include in our model, based on what particular set of functions we are seeking to deliver (to sell me CDs, Amazon is interested in my home address and credit card number but not my hair colour or National Insurance number).

An "abstract model" like the DCMI Abstract Model, on the other hand, is a model of a class of information structures. This sort of model specifies not the set of things to be described in a metadata application, but rather the nature of the descriptions themselves. So, the DCMI Abstract Model describes an information structure which it calls a DC metadata description set. It specifies the component parts which make up an instance of that information structure (description, resource URI, statement etc) and the relationships between those components (a description set contains one or more descriptions; a description contains one or more statements; and so on). And it also specifies how an instance of that information structure is to be interpreted (a statement in a description says that the resource identified by the resource URI is related in the way specified by the resource identified by the property URI to the resource identified by the value URI (!)).

But the DCMI Abstract Model doesn't say anything about the types of resources that can be described in DC metadata description sets, or the attributes of those resources, or the relationships between them. The DCMI Abstract Model can be used with an infinite number of "application models" ("conceptual models" in Lorcan's post); and conversely a single "application model" could be implemented using several different metadata "abstract models".

I'm probably glossing over some more complex issues here, but I think the key difference is that these two classes of model are models of different things: the former specifies what types of things are being described, and the latter specifies the nature of the descriptions themselves.
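The structural distinction above can be sketched in code. The following is a rough, purely illustrative Python model of the information structure described in the post - not DCMI's normative specification - showing a description set containing descriptions, each of which holds statements about one resource. The example URIs are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """One statement: the described resource is related, via the
    property, to the value resource."""
    property_uri: str
    value_uri: str

@dataclass
class Description:
    """A description of a single resource: a resource URI plus one
    or more statements about it."""
    resource_uri: str
    statements: list = field(default_factory=list)

@dataclass
class DescriptionSet:
    """The top-level structure: one or more descriptions."""
    descriptions: list = field(default_factory=list)

# Note that nothing here constrains WHAT is described - any number
# of "application models" could be expressed in this one structure.
desc = Description(
    resource_uri="http://example.org/doc",
    statements=[Statement("http://purl.org/dc/elements/1.1/creator",
                          "http://example.org/people/someone")],
)
ds = DescriptionSet(descriptions=[desc])
```

The point of the sketch is that the classes say nothing about resource types, attributes or relationships between described things; they only fix the shape of the descriptions themselves.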

November 08, 2006

Intute - new Internet tutorials released

I've always had a soft spot for the Virtual Training Suite of Internet information literacy tutorials.  In my view this was one of the best things to come out of the JISC-funded Resource Discovery Network (now Intute) though, to be honest, I preferred the VTS's previous name - Internet Detective.  It's good therefore to see the service announcing a programme of new and updated tutorials:

Intute has released a number of new Internet tutorials for the Arts and Humanities, in the Virtual Training Suite this term.  The following tutorials have been completely updated and revised:

Internet Archaeologist
By Stuart Jeffrey, Archaeology Data Service/AHDS Archaeology, University of York.

Internet for Historians
By staff from the Institute of Historical Research, University of London.

Internet for Modern Languages
By Dr. Shoshannah Holdom, University of Oxford

Internet for Performing Arts
By Jez Conolly, Drama Subject Librarian, University of Bristol

Internet for Religious Studies
By Dr. Meriel Patrick, Oxford University

Internet Philosopher
By Kathy Behrendt, D.Phil, University of Oxford

This is the first stage of a major programme of change to update and revise all the tutorials in the Virtual Training Suite over the coming year.  A national network of expert authors is being commissioned to re-write the content of each tutorial to bring it in line with recent Internet developments and to ensure the tutorials continue to offer authoritative and timely advice on Internet research for over 60 subjects.

The recommended lists of key Internet resources are being completely updated; there is new advice on Internet searching, with improved interactive exercises; and a new section called "Success Stories" in each tutorial to illustrate how the Internet can be used effectively to support education and research.

Good stuff!

Building a Web 2.0 school Web site

In my spare time I'm chair of governors at a local primary school - Newbridge Primary School.  Being a school governor is very worthwhile (I'd recommend it to any parent) particularly for anyone (like me) who is interested in seeing how ICT gets used in practice in the UK schools sector.

Newbridge Primary was only formed at the start of this school year.  It was created from the amalgamation of two existing schools, an infant and a junior school, that were previously on the same site.  One of the consequences of the amalgamation was that the existing schools lost their Web sites.  As a new school we were offered  a new domain name and some space on a South West Grid for Learning server - space, but not a lot else as it turned out.  No content management system or anything and only ASP for server-side scripting.

For various reasons, I ended up taking on the task of trying to put together a site for the new school.  Once I sat down and thought about it, I decided I should put my money (or at least my effort) where my mouth was - using external Web 2.0 services to manage the more dynamic parts of the site.  This decision was partly pragmatic.  With only ASP to play with on the server, a language I didn't know at all and one that I don't have a particular desire to learn, I wasn't really in any position to build anything complex from scratch myself.

In design terms, the resulting site is pretty basic - I wanted it that way - not least because as a new school we are still going through the process of choosing a logo and so on.  My only criteria really were to stick to the school colour, red, to use standards (XHTML and CSS) throughout, and to make the resulting site as accessible as possible (though I've probably made some mistakes in this area).  The site is a mash-up of content pulled from Google Calendar (calendar entries), Flickr (images), Blogger (blogs),  Del.icio.us (links) and Google Maps (maps).

How does it work?  Well, all the images on the site are pulled in from the school's Flickr account - the idea being that any teacher will be able to upload and tag images as and when they want.  Links to external sites are managed using Del.icio.us - again with the idea that teachers will add and tag sites that they use or want the children to use.  Entries in Del.icio.us are tagged by educational level (using the tags 'ukel1', 'ukel2' and 'ukel3' taken from the proposed list of UK Educational Levels) and curriculum area.  The school calendar is managed on-line using Google calendar.  The lists of school newsletters and other news items are managed in a blog.  Finally, each class has been given a blog of its own - the idea being that children will be able to write their own entries on their class blog.

The server-side and client code needed to make all this hang together is surprisingly light.  A Javascript object here or there to pull in the Google and Flickr stuff.  A simple ASP script and XSL transformation to process RSS feeds from Blogger and Del.icio.us into XHTML.  Not a lot else.

It's interesting though that pulling this stuff together is surprisingly simple, yet very powerful.  Six months ago I was wondering whether the Eduserv Foundation should look into the possibility of funding work around content management systems for schools.  Now I'm thinking that's the last thing a school needs - hiding all their content inside their own system.  Much better to manage and build their content out there on the Web, where it is widely accessible and, more importantly, where staff and pupils can more easily add to the content in whatever way they see fit.

Imagine if every school in the UK managed its links using Del.icio.us, using a consistent tagging convention for curriculum areas and educational level.  Now that would be very powerful.

I have to confess that all of this is very experimental at this stage.  The site is very new and we haven't even tried to get staff or children up to speed with the tools listed above.  That will come in due course.  It'll be interesting to see how well the site works in practice.  There are still significant gaps in site content at the moment.  And, of course, none of this is really about learning... yet.  But you've got to start somewhere and I'm hopeful that using these kinds of services in this kind of way will help to inspire at least some of the teaching staff to experiment more with these tools in their real learning and teaching activities.

Image: Henry VIII by a year 3 child at Newbridge Primary School, UK [October 2006]

Using Web 2.0 tools in education - a practical guide

Brian Benzinger at Solution Watch offers a three part guide to using Web 2.0 tools in education - Back to School with the Class of Web 2.0.  This looks to be an excellent and thought provoking practical guide to many of the Web 2.0 and related tools that are available on-line.

  • Part 1 offers a guide to available tools.
  • Part 2 covers on-line office applications.
  • Part 3 shows some real use-cases.

Well worth a look.

November 07, 2006

Web 2.0 and friendship

Slightly strange title but bear with me while I get to the point...

Lorcan Dempsey noted the Web 2.0 piece in the UK Guardian Weekend supplement on Saturday (A bigger bang).

Overall, it makes for a worthwhile read and there's some interesting quotes from some of the key players.  However, it ends slightly oddly (for me) with a discussion about whether social software is devaluing our notions of friendship:

Sit someone at a computer screen and let it sink in that they are fully, definitively alone; then watch what happens. They will reach out for other people; but only part of the way. They will have "friends", which are not the same thing as friends, and a lively online life, which is not the same thing as a social life; they will feel more connected, but they will be just as alone. Everybody sitting at a computer screen is alone. Everybody sitting at a computer screen is at the centre of the world. Everybody sitting at a computer screen, increasingly, wants everything to be all about them. This is our first glimpse of what people who grow up with the net will want from the net. One of the cleverest things about MySpace is the name.

Coincidentally, I read the article shortly after re-watching Rob Reiner's Stand by Me on DVD - being stuck at home currently with a bout of labyrinthitis, which I wouldn't recommend to anyone by the way (the illness, not the film)!  The film ends with the narrator typing:

I never had any friends later on like the ones I had when I was twelve.  Jesus, does anyone?

Not exactly an up-beat ending!

It struck me that our perception about the impact of technology on society, and particularly on young people, is difficult to separate from our own changing circumstances.  Certainly, there is no evidence from my own children that technology is making it difficult for them to make real friendships.  Quite the opposite in fact.  They are very adept and comfortable in using on-line social tools (and all sorts of other technology for that matter) as part of their real-world social lives.


