
September 29, 2009

The Google Book Settlement

The JISC have made a summary of the proposed Google Book Settlement available for comment on Writetoreply (a service that I really like by the way), along with a series of questions that might usefully be considered by interested parties. Thanks to Naomi Korn and Rachel Bruce for their work on this.

Not knowing a great deal about the proposed settlement, I didn't really feel able to comment, but in an effort to get up to speed I decided to put together a short set of PowerPoint slides summarising my take on the issues, based largely on the JISC text.

Here's what I came up with:

Of course, my timing isn't ideal because the proposed review meeting on the 7th October has now been replaced with a 'status update' meeting [PDF] that will "decide how to proceed with the case as expeditiously as possible". Ongoing discussion between Google and the US Department of Justice looks likely to result in changes to the proposed settlement before it gets to the review stage.

Nonetheless, I think it's useful to understand the issues that have led up to any revised settlement and in any case, it was a nice excuse to put together a set of slides using CC images of books from Flickr!

Mobile learning

At the beginning of last week I attended the CILIP MmIT (Multimedia Information & Technology Group) Annual Conference for 2009 on the topic of "Mobile Learning: What Exactly is it?" (I can't give you a link to the event because as far as I can tell there isn't one :-( ).

It wasn't a bad event actually and there were some pretty good speakers. My live-blogged notes are now available, though you should note that the wireless network was pretty flakey (somewhat ironic for a mobile learning event huh?) which means that there are some big(ish) gaps in the coverage.

There were places where I wanted more depth from the speakers but given the introductory nature of the event I think it was probably pitched about right overall.

Two thoughts came to me as the day progressed...

Firstly, it was clear that most of the projects being shown on the day were based either on hardware handed out to people specifically for the particular project or on lowest common denominator standards (i.e. SMS) that work on everybody's existing personal mobile devices. The former is clearly problematic both in terms of sustainability and because people have to deal with an additional device. The latter (typically) results in less functionality being offered. At one point I asked if there was any evidence that projects were moving towards developing for specific devices, in particular for the iPhone, on the basis that doing so would allow for significantly more functionality to be delivered to the end-user.

I don't think I got a clear answer on this, though I suspect that the speaker made the assumption that I thought developing for the iPhone was a good thing (on the basis I was holding one at the time). In fact, I'm not sure I have a good feel for what is good and bad in this area - I can see advantages in keeping things simple and inclusive but I can also see that experimenting with the newest technologies allows us to try things that wouldn't be possible otherwise.

Coincidentally, a similar debate surfaced on the website-info-mgt@jiscmail.ac.uk mailing list a few days later, flowing from the announcement of the University of Central Lancashire freshers' iPhone application. In the discussion, I asked if we knew enough about the mobile devices that new freshers are bringing with them to university in order that we can make sensible decisions about which mobile device capabilities to target. In a world of limited development resources, there's not much point in developing an iPhone app if only a handful of your intended audience can afford to own one (unless you are explicitly doing it to experiment with what is possible). Brian Kelly has since picked up this theme, We Need Evidence – But What If We Don’t Like The Findings?, though focusing more on operating systems generally rather than mobile devices specifically.

Quite a few sites came back to me with stats (Brian shows some of them). I particularly like the Student IT Services Survey 2009 (PDF) undertaken by Information Services at the University of Bristol which isn't limited to freshers but which asks a whole range of useful questions. Overall, and based on the limited evidence available to date, I suggest that the iPhone and iPod Touch have fairly low penetration in the student market thus far.

It strikes me that, given a generally rising interest in mobile technology, 'everyware', ubiquitous computing, and so on for learning and research, some sort of longitudinal study of what students are bringing with them to university might not be a bad thing?

Secondly, my other thought... was that Dave White's visitors vs. residents stuff is highly pertinent to this space. Actually, for what it's worth, I don't go to any conference these days without realising that Dave's thinking in this area is highly relevant! It seems to me that many of our uses of mobile technologies are aimed at visitors - they are aimed at people who have a job to get done. Yet the really interesting thing about mobile technology is not how 'we' (the university) can use it to reach 'them' (the learner or researcher) but how they are using it to reach each other (as part of their everyday use of technology). The interesting thing is how residents are using it to live their lives online.

We need to see ourselves primarily as enablers in this space - not as direct providers of services.

September 22, 2009

VoCamp Bristol

At the end of the week before last, I spent a couple of days (well, a day and a half as I left early on Friday) at the VoCamp Bristol meeting, at ILRT, at the University of Bristol.

To quote the VoCamp wiki:

VoCamp is a series of informal events where people can spend some dedicated time creating lightweight vocabularies/ontologies for the Semantic Web/Web of Data. The emphasis of the events is not on creating the perfect ontology in a particular domain, but on creating vocabs that are good enough for people to start using for publishing data on the Web.

I admit that I went into the event slightly unprepared, as I didn't have any firm ideas about any specific vocabulary I wanted to work on, but happy to join in with anyone who was working on anything of interest. Some of the outputs of the various groups are listed on the wiki page.

As well as work on specific vocabularies, the opening discussions highlighted an interest in a small set of more general issues, which included the expression of "structural constraints" and "validation"; broader questions of collecting and interpreting vocabulary usage; representing RDF data using JSON; and the features available in OWL 2. Friday morning was set aside for those topics, which meant I had an opportunity to talk a little bit about the work being done within the Dublin Core Metadata Initiative on "Description Set Profiles", which I've mentioned in some recent posts here. I did hastily knock up a few slides, mainly as an aide memoire to make sure I mentioned various bits and pieces:

There was a useful discussion around various different approaches for representing such patterns of constraints at the level of the RDF graph, either based on query patterns, or on the use of OWL (with a "closed-world" assumption that the "world" in question is the graph at hand). Some of the new features in OWL 2, such as capabilities for expressing restrictions on datatypes seem to make it quite an attractive candidate for this sort of task.

I was asked about whether we had considered the use of OWL in the DCMI context. IIRC, we decided against it mostly because we wanted an approach that built explicitly on the description model of the DCMI Abstract Model (i.e. talked in terms of "descriptions" and "statements" and patterns of use of those particular constructs), though I think the "open-world" considerations were also an issue (See this piece for a discussion of some of the "gotchas" that can arise).

Having said that, it would seem a good idea to explore to what extent the constraint types permitted by the DSP model might be mapped into other form(s) of expressing constraints which might be adopted.

All in all, it was a very enjoyable couple of days: a fairly low-key, thoughtful, gentle sort of gathering - no "pitches", no prizes, no "dragons" in their "dens", or other cod-"bizniz" memes :-) - just an opportunity for people to chat and work together on topics that interested them. Thank you to Tom & Damian & Libby for doing the organisation (and introducing me to a very nice Chinese restaurant in Bristol on the Thursday night!)

September 16, 2009

Edinburgh publish guidance on research data management

The University of Edinburgh has published some local guidance about the way that research data should be managed, Research data management guidance, covering How to manage research data and Data sharing and preservation, as well as detailing local training, support and advice options.

One assumes that this kind of thing will become much more common at universities over the next few years.

Having had a very quick look, it feels like the material is more descriptive than prescriptive - which isn't meant as a negative comment, it just reflects the current state of play. The section on Data documentation & metadata for example, gives advice as simple as:

Have you created a "readme.txt" file to describe the contents of files in a folder? Such a simple act can be invaluable at a later date.

but also provides a link to the UK Data Archive's guidance on Data Documentation and Metadata, which at first sight appears hugely complex. I'm not sure what your average researcher will make of it?

(In passing, I note that the UKDA seem to be promoting the use of the Data Documentation Initiative standard at what they call the 'catalogue' level, a standard that I've not come across before but one that appears to be rooted firmly outside the world of linked data, which is a shame.)

Similarly, the section on Methods for data sharing lists a wide range of possible options (from "posting on a University website" thru to "depositing in a data repository") without being particularly prescriptive about which is better and why.

(As a second aside, I am continually amazed by this firm distinction in the repository world between 'posting on the website' and 'depositing in a repository' - from the perspective of the researcher, both can, and should, achieve the same aims, i.e. improved management, more chance of persistence and better exposure.)

As we have found with repositories of research publications, it seems to me that research data repositories (the Edinburgh DataShare in this case) need to hide much of this kind of complexity, and do most of the necessary legwork, in order to turn what appears to be a simple and obvious 'content management' workflow (from the point of view of the individual researcher) into a well managed, openly shared, long term resource for the community.

September 14, 2009

Flocking behaviour - why Twitter is for starlings, not buzzards

Byrdes of on kynde and color flok and flye allwayes together.

William Turner, 1545

Brian Kelly has posted a light analysis of Twitter usage around the ALT-C 2009 conference in Manchester last week. He notes that there were "over 4,300 tweets published in a week" using the (conference-endorsed) #altc2009 hashtag (summary), and a further "128 tweets [...] from 51 contributors" using the alternative (but not endorsed) #altc09 hashtag (summary). Pretty impressive I think.

Looking at the summaries for the two hashtags I note that @HallyMk1 was by far the highest user of the 'wrong' tag - 41 tweets - making him one of the more prolific individual tweeters at the conference I suspect.

The trouble is, in my experience at least, using a Twitter search for a particular hashtag has become the most common way to keep up to date with what is going on at a given event. On that basis, if you don't tweet using the generally agreed tag you are effectively invisible to much of the conference audience - in short, you aren't part of the conversation in the way you are if you use the same tag as everyone else.

Tags emerge naturally as part of the early 'flocking behaviour' in the run up to an event (with and without the help of conference organisers). I would argue that in general it pays to go with the flow, even if you have good reason for thinking an alternative hashtag would have been a better choice (because it is shorter for example). As I noted to @HallyMk1 on Twitter this morning, to do otherwise makes you "either a slow learner or very stubborn" :-)

September 03, 2009

More on identity and access management...

I seem to be on a mini-roll of posts related to identity and access management at the moment...  so, while I'm at it, a couple of quick (and largely unrelated) things.

Firstly, the JISC call 08/09: Access & Identity Management is currently out and, while I don't know that we are actively seeking partners, if any institutions are interested in working with us around OpenAthens (I guess I'm thinking primarily of the Innovation part of the call here) then I'm sure that there will be people here who would be happy to talk to you.

Secondly, Johannes Ernst has a short post, On Identity Business Models or Lack Thereof, which, while not directly relevant to the education space, is certainly interesting and notes various categories of model that might usefully help frame our wider thinking.

Internet Identity Workshop

I note that the ninth Internet Identity Workshop (IIW IX) is taking place on November 3-5 (Tuesday to Thursday) in Mountain View California at the Computer History Museum.

I mention this primarily because it looks like an excellent event - take a quick look at the breadth of topics discussed at the last meeting for example - but is one that is a long way away from those of us in the UK. Is there space to have this kind of 'identity' meeting in Europe - or does such a thing already exist?

September 02, 2009

Publisher Interface Study - final report

Back in June I reported on a meeting that I had attended as part of the JISC-funded Service Provider Interface Study. The final report from that project is now available, in a commentable form, and feedback is requested.

The report makes two key recommendations:

Recommendation 1 - A brand should be created for academic federated access. For this brand to be successful, it needs widespread adoption worldwide. The brand should include a short name and a logo; these need not mean anything but simply provide a familiar point of reference.

Recommendation 2 - A "style guide" should be created for publishers to follow around implementing discovery using the brand created.

These seem sensible to me and certainly in line with my suggestion that there needs to be much "greater consistency to the way that SAML-based sign-on is presented to the end-user".  Note that the brand refers to 'academic federated access' generally, rather than to the UK Access Management Federation for Education and Research in particular - i.e. it needs to work across federations (possibly based on differing technologies?) - a non-trivial task to say the least (but one that is probably worth aiming for).

As a result of this study the JISC intends to:

  1. carry out a full public consultation on the findings of the report;
  2. instigate an international competition for the design of a federated log-in brand;
  3. develop full brand guidelines for publishers and other service providers;
  4. develop an easy-install tool and guide for embedded WAYFs (Where are You From services).

I would hope that service providers themselves get heavily involved in these activities.  And for the last... I think the jQuery demo, provided in the previous post, is indicative of one direction such an "easy-install" tool could take.

Addendum: Johannes Ernst has an interesting post, Information Cards Have the NASCAR Problem, Too, which notes that OpenID and Information Cards, both of which have globally identifiable logos, also suffer from the multiple brand problem (roughly equivalent to the multiple federation/multiple institution issue in SAML-based federations). He ends with:

What about we drop the NASCAR argument in the OpenID vs. information cards discussion, and figure out how to solve the common issue instead?

a principle that I think we might usefully expand to include our own SAML world if at all possible.

September 01, 2009

Experiments with DSP and Schematron

There has been some discussion recently around DCMI's draft Description Set Profile specification, both on the dc-architecture Jiscmail list and, briefly, on Twitter.

From my perspective, the DSP specification is one of the most interesting recent technical developments made by DCMI. For me, it provides the long-needed piece of the jigsaw that enables us to construct a coherent picture of what a "DC application profile" is. What do these tabular lists of "terms", or combinations of terms, that have typically appeared in these documents people call "DC application profiles" actually "say"? What does "use dc:subject with vocabulary encoding schemes S and T" actually "mean"? How can we formalise this information?

To recap, the DSP specification takes the approach that what is at issue here is a set of "structural constraints" on the information structure that the DCMI Abstract Model calls a "description set". The DCAM itself defines the basic structure (a "description set" contains "descriptions"; a "description" contains "statements"; a "statement" contains a "property URI", an optional "value URI" and "vocabulary encoding scheme URI", and so on). But that's where the DCAM stops: it doesn't say anything about any particular set of property URIs or vocabulary encoding scheme URIs; it doesn't specify whether, in the particular set of description sets I'm creating, plain literals should be in English or Spanish. This is where the DSP spec comes in. The DSP model allows us to say, "I want to apply a more specific set of requirements: a description of a book must provide a title (i.e. must include a statement with property URI http://purl.org/dc/terms/title) and must include exactly two subject terms from LCSH (i.e. must include two statements with property URI http://purl.org/dc/terms/subject and vocabulary encoding scheme URI http://purl.org/dc/terms/LCSH), or a description of a person is optional, but if included it must provide a name (i.e. must include a statement with property URI http://xmlns.com/foaf/0.1/name)".
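As a rough illustration of the kind of requirement just described, here is a minimal Python sketch that treats a statement template as a property URI plus minimum and maximum occurrence counts, and checks the statements of one description against a list of such templates. The class and function names (`Statement`, `StatementTemplate`, `mismatches`) are hypothetical and are not taken from the DSP specification, and the sketch deliberately ignores most of the model (literal constraints, value class constraints, and so on).

```python
from dataclasses import dataclass
from typing import Optional

DC_TITLE = "http://purl.org/dc/terms/title"
DC_SUBJECT = "http://purl.org/dc/terms/subject"
LCSH = "http://purl.org/dc/terms/LCSH"


@dataclass(frozen=True)
class Statement:
    property_uri: str
    value_uri: Optional[str] = None
    ves_uri: Optional[str] = None  # vocabulary encoding scheme URI


@dataclass(frozen=True)
class StatementTemplate:
    # Hypothetical, much-simplified stand-in for a DSP Statement Template.
    property_uri: str
    min_occurs: int = 0
    max_occurs: Optional[int] = None  # None means "no upper limit"
    ves_uri: Optional[str] = None     # if set, only statements using this scheme count


def mismatches(statements, templates):
    """Return human-readable mismatches (an empty list if every template is met)."""
    problems = []
    for t in templates:
        count = sum(
            1
            for s in statements
            if s.property_uri == t.property_uri
            and (t.ves_uri is None or s.ves_uri == t.ves_uri)
        )
        if count < t.min_occurs:
            problems.append(f"{t.property_uri}: found {count}, need at least {t.min_occurs}")
        if t.max_occurs is not None and count > t.max_occurs:
            problems.append(f"{t.property_uri}: found {count}, allowed at most {t.max_occurs}")
    return problems


# "A book must provide a title and exactly two subject terms from LCSH."
BOOK_TEMPLATES = [
    StatementTemplate(DC_TITLE, min_occurs=1),
    StatementTemplate(DC_SUBJECT, min_occurs=2, max_occurs=2, ves_uri=LCSH),
]

# A description with a title but only one LCSH subject (example.org URI is made up).
book = [
    Statement(DC_TITLE),
    Statement(DC_SUBJECT, value_uri="http://example.org/subject/1", ves_uri=LCSH),
]

for problem in mismatches(book, BOOK_TEMPLATES):
    print(problem)
```

Adding a second LCSH subject statement to `book` would make `mismatches` return an empty list.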

To express these constraints, the spec defines a model of "Description Templates", in turn containing sets of "Statement Templates". A set of such templates provides a set of "patterns", if you like, to which some set of actual "instance" description sets can "match" or "conform". The specification also defines both an XML syntax and an RDF vocabulary for representing such a set of constraints.

As an aside, it's also worth noting that a single description set may be matched against multiple profiles, depending on the context (or indeed against none: there is no absolute requirement that a description set matches any DSP at all). The same description set may be tested against a fairly permissive set of constraints in one context, and a "tighter" set of constraints in another: the same description set may match the former, and fail to match the latter. To paraphrase James Clark's comments on XML schema, "validity" should be treated not as a property of a description set but as a relationship between a description set and a description set profile.
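The point that "validity" is a relationship rather than a property can be shown with a very small Python sketch. Everything here is an assumption made for illustration: a description is reduced to a list of the property URIs its statements use, and a "profile" to a dictionary mapping a property URI to a (minimum, maximum) occurrence pair. The same description matches the permissive profile and fails the stricter one.

```python
from collections import Counter

DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"


def matches(property_uris, profile):
    """True if every property in the profile occurs within its (min, max) bounds.

    `profile` maps a property URI to (min, max); a max of None means unbounded.
    """
    counts = Counter(property_uris)
    return all(
        lo <= counts[uri] and (hi is None or counts[uri] <= hi)
        for uri, (lo, hi) in profile.items()
    )


# One description: a title and three creators.
description = [DC_TITLE, DC_CREATOR, DC_CREATOR, DC_CREATOR]

permissive = {DC_TITLE: (1, None)}               # just needs a title
strict = {DC_TITLE: (1, 1), DC_CREATOR: (0, 2)}  # at most two creators

print(matches(description, permissive))  # True
print(matches(description, strict))      # False
```

Nothing about `description` changes between the two calls; only the profile it is tested against does.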

The current draft is very much just that, a draft on which feedback is being gathered. Are the current constraints fully/clearly specified? Is the processing algorithm complete/unambiguous? Are the current constraint types the ones typically required? Are there other constraint types which would be useful? And it is almost certain that there will be changes made in a future version, but nevertheless, it seems to me it is a very solid first step, and it's very encouraging to see that implementers are starting to test out the current model in earnest.

One of the questions that I've been asked in discussions is that of how the DSP model relates to XML schema languages.

A description set might be represented in many different concrete formats, including XML formats. XML schema languages (and here I'm using that term in a generic sense to refer to the family of technologies, not specifically to W3C XML Schema, one particular XML schema language) allow you to express a set of structural constraints on an XML document.

An XML format which is designed to serialise the description set structure provides a mapping between the components in that structure and some set of components in an XML document (XML elements and attributes, their names and their content and values).

And so, for such an XML format, it should be possible to map a DSP - a set of structural constraints on a description set - into a corresponding set of constraints on an instance of that XML format. I say "should" because there are a number of factors to be taken into consideration here:

  • The current draft DSP model includes some constraints which are not strictly structural. For example, the model allows a Statement Template to include a "Sub-property Constraint" (6.4.2), which allows a DSP to "say" things like "This statement template applies to a statement referring to any subproperty of the property dc:contributor". A processor attempting to determine whether or not a particular statement referring to some property ex:property matches such a constraint requires information about that property external to the description set itself in order to know whether the DSP requirement is met or not.
  • Whether all the constraints can be reflected in an XML schema depends on the characteristics of the XML format and on the features of the XML schema language. Different XML schema languages have different capabilities when it comes to expressing structural constraints, and, for a single XML format, one schema language may be able to express constraints which another can not. So for the case of mapping the DSP constraints into an XML schema, it may be that, depending on the nature of the XML format, one XML schema language is capable of capturing more of the constraints on the XML document than another.
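To make the first of those points concrete, here is a small Python sketch of why a sub-property test cannot be answered from the description set alone. The `SUBPROPERTY_OF` mapping stands in for the external knowledge a processor would need (for example, from an RDF schema); it is hand-written here, and the `http://example.org/terms/illustrator` property is entirely hypothetical.

```python
DC_CONTRIBUTOR = "http://purl.org/dc/terms/contributor"
EX_ILLUSTRATOR = "http://example.org/terms/illustrator"  # hypothetical property

# External knowledge that is NOT present in the description set itself.
SUBPROPERTY_OF = {EX_ILLUSTRATOR: DC_CONTRIBUTOR}


def is_subproperty(prop, ancestor, subproperty_of):
    """Follow sub-property links upwards from `prop`, looking for `ancestor`.

    A property is treated as a sub-property of itself. The `seen` set guards
    against cycles in the mapping.
    """
    seen = set()
    while prop is not None and prop not in seen:
        if prop == ancestor:
            return True
        seen.add(prop)
        prop = subproperty_of.get(prop)
    return False


# With the external mapping, the statement's property can be matched...
print(is_subproperty(EX_ILLUSTRATOR, DC_CONTRIBUTOR, SUBPROPERTY_OF))  # True
# ...but with only the description set to go on (no mapping), it cannot.
print(is_subproperty(EX_ILLUSTRATOR, DC_CONTRIBUTOR, {}))              # False
```

An XML schema working purely on the structure of the instance document is in the position of the second call.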

Anyway, to try to illustrate one possible application of the DSP model, I've spent some time recently playing around with XSLT and Schematron to try to create an XSLT transformation which:

  • takes as input an instance of the DSP-XML format described in the current draft (version of 2008-03-31) i.e. a representation of a DSP in XML; and
  • provides as output a Schematron schema containing a corresponding set of patterns expressing constraints on an instance of the XML format described in the proposed recommendation for the XML format known as DC-DS XML (version of 2008-09-01).

I should emphasise that I'm very much a newcomer to Schematron, my XSLT is a bit rusty, I haven't tested what I've done exhaustively, and I've worked on this on and off over a few days and haven't done a great deal to tidy up the results. So I'm sure there are more elegant and efficient ways of achieving this, but, FWIW, I've put where I've got to on a page on the DCMI Architecture Forum wiki.

The transform is dsp2sch-dcds.xsl

To illustrate its use, I created a short DSP-XML document and a few DC-DS XML instances.

bookdsp.xml is a DSP-XML representation of a short example DSP. It's loosely based on the book-person example that Tom Baker and Karen Coyle used in their recently published Guidelines for Dublin Core Application Profiles, but I've tweaked and extended it to include a broader range of constraints.

Running the transform against that DSP generates a Schematron schema: dsp-dcds.xml.

The page on the wiki lists a few example DC-DS XML instances, and the results of validating those instances against this Schematron schema. So for example, book4.xml is a DC-DS XML instance which conforms to the syntactic rules of the format, but fails to match some of the constraints of the Book DSP (the DSP allows the "book" description to have only two statements using the dc:creator property, and the example has three; and the DSP allows only two "person" descriptions, and the example has three). The result of validation using the Schematron schema is the document valbook4.xml. (The Schematron processor outputs an XML format called Schematron Validation Report Language (SVRL), which is a bit verbose, but fairly self-explanatory; it could be post-processed into a more human-readable format).

The approach taken is, roughly, that the transform generates:

  • a Schematron pattern with a rule with context dcds:descriptionSet, which, for each Description Template, tests for the number of child dcds:description elements that satisfy that Description Template's Resource Class Membership Constraint (more on this below), using a corresponding XPath predicate. e.g. from the bookdsp example dcds:description[dcds:statement[@dcds:propertyURI='http://www.w3.org/1999/02/22-rdf-syntax-ns#type' and (@dcds:valueURI='http://purl.org/dc/dcmitype/Collection')]]
  • for each DSP Description Template, a Schematron pattern with a rule with context dcds:description[the resource class membership predicate above], which tests the Standalone Constraints, and then, for each Statement Template, tests for the number of child dcds:statement elements that satisfy the Statement Template's Property Constraint, using a corresponding XPath predicate. e.g. from the bookdsp example dcds:statement[@dcds:propertyURI='http://purl.org/dc/terms/title']
  • for each DSP Statement Template that specifies a Type Constraint, a Schematron pattern with a rule with context dcds:description[the resource class membership predicate above]/dcds:statement[the property predicate above], which tests for the various other (Literal or Non-Literal) constraints specified within the Statement Template.
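The XPath predicates quoted in the list above have a regular shape, which the following Python sketch reproduces as plain string construction. This is only an illustration of the shape of the generated expressions: the actual transform is written in XSLT, the function names are hypothetical, and joining several class URIs with "or" is an assumption on my part. Note that it does no escaping of quote characters in the URIs.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def class_membership_xpath(class_uris):
    """XPath selecting descriptions typed with any of the given class URIs."""
    # Assumption: multiple permitted classes are combined with "or".
    alternatives = " or ".join(f"@dcds:valueURI='{uri}'" for uri in class_uris)
    return (
        "dcds:description[dcds:statement["
        f"@dcds:propertyURI='{RDF_TYPE}' and ({alternatives})]]"
    )


def property_xpath(property_uri):
    """XPath selecting statements that refer to the given property URI."""
    return f"dcds:statement[@dcds:propertyURI='{property_uri}']"


print(class_membership_xpath(["http://purl.org/dc/dcmitype/Collection"]))
print(property_xpath("http://purl.org/dc/terms/title"))
```

A Schematron rule would then use the first expression as (part of) its context and wrap the second in a `count(...)` test against the minimum and maximum occurrence values taken from the Statement Template.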

A few thoughts and notes are in order.

  1. The transform is specific to the version of the DSP-XML format specified in the current draft, and to the current version of the DC-DS XML format. If either of these changes then the transform will require modification. Another transform could be written to generate patterns for another XML format, e.g. for RDF/XML (or maybe more easily, a defined "profile" of RDF/XML) or even for the use of the DC-HTML profile for embedding data in XHTML meta/link elements (subject to the limitations of that profile in terms of which aspects of the DCAM description model are supported).
  2. It assumes that the DSP XML instance is valid, and that the DC-DS XML instance is valid, in the sense that it conforms to the basic syntactic rules of that format. (I've got some additional general, DSP-independent Schematron patterns for DC-DS XML, which in theory could be "included" in the generated schema, but I haven't managed to get that to work correctly yet.)
  3. The output from the current version includes a lot of "informational" reporting ("this description contains three statements" etc), as well as actual error messages for mismatches with the DSP constraints. Mostly this was to help me debug the transform and get my head round how Schematron was working, but it makes the output rather verbose. I've left it in for now, but I might remove or reduce it in a subsequent version.
  4. What I've come up with currently implements only a subset of the model in the DSP draft. In particular, I've ignored constraints that go beyond the structural and require checking beyond the syntactic level (like the Subproperty Constraint I mentioned above). And for some other constraints, I've adopted a "strictly structural" interpretation: this is the case for the Description Template Resource Class Membership Constraint (5.5), which I interpreted as "the description should contain a statement referring to the rdf:type property with a value URI matching one of the listed URIs", and for the Statement Template Value Class Membership Constraint (6.6.2), which I interpreted as "there should be a description of the value containing a statement referring to the rdf:type property with a value URI matching one of the listed URIs". i.e. I haven't allowed for the possibility that an application might derive information about the resource type from property semantics (e.g. from inferencing based on RDF schema range/domain).
  5. Finally, the handling of literals is somewhat simplistic. In particular, I haven't given any thought to the handling of XML literals, but even leaving that aside it probably needs some additional character escaping.

Anyway, I intend this not as any sort of "authorised" tool, nor as "the finished article", but as a fairly rough first stab at an example of the sort of XML-schema-based functionality that I think can be built starting from the DSP model, and as a contribution to the ongoing discussion of the current working draft.


