In these times of financial constraints and environmental concerns, attending meetings remotely (particularly those held overseas) is becoming increasingly common. Such was the case, for me, at 7pm UK time on Friday night last week... I should have been eating pizza in front of the TV with my family but instead was messing about with Skype, my house phone, IRC and Twitter in an attempt to join the joint meeting of the DC Architecture Forum and the W3C Library Linked Data Incubator Group (described in Pete's recent post, The DCMI Abstract Model in 2010).
The meeting started with Tom Baker summarising the history and current state of the DCMI Abstract Model (the DCAM) - a tad long perhaps but basically a sound introduction and overview. Unfortunately, my Skype connection dropped a couple of times during his talk (I have no idea why) and I resorted to using my house phone instead - using the W3C bridge in Bristol. This proved more stable but some combination of my phone and the microphone positioning in the meeting meant that sound, particularly from speakers in the audience, was rather poor.
By the time we got to the meat of the discussion about the future of the DCAM I was struggling to keep up :-(. I made a rather garbled contribution, trying to summarise my views on the possible ways forward but emphasising that all the interesting possibilities had the same end-game - that DCMI would stop using the language of its own world-view, the DCAM, and would instead work within the more widely accepted language of the RDF model and Linked Data - and that the options were really about how best we get there, rather than about where we want to go.
Unfortunately, this is a view that is open to some confusion because the DCAM itself uses the RDF model. So when I say that we should stop using the DCAM and start using RDF and Linked Data its not like saying that we should stop using model A and start using model B. Rather, it's a case of carrying on with the current model (i.e. the RDF model) but documenting it and talking about it using the same language as everyone else, thus joining forces with more active communities elsewhere rather than silo'ing ourselves on the DC mailing lists by having a separate way of talking.
So, anyway, I don't know how well I put my point of view across - one of the problems of being remote is that the only feedback you get is from the person taking minutes, in this case in the W3C IRC channel:
18:50:56 [andypowe11] ok, i'd like to speak at some point
18:52:34 [markva] andypowe11: options 2b, 3 and 4: all work to RDF, which is where we want to get to
18:52:55 [markva] ... which of these is better to get to that end game, wrt time available
18:53:23 [markva] ... 4 seems not ideal, but less effort
18:54:01 [markva] ... lean to 3; 2b has political value by taking along community; but 3 better given time
Stu Weibel spoke after me - a rather animated contribution (or so it seemed from afar). No problem with that... DCMI could probably do with a bit more animation to be honest. I understood him to be saying that we should adopt the Web model and that Linked Data offered us a useful chance to re-align ourselves with other Web activity. As I say, I was struggling to hear properly, so I may have mis-understood him completely. I glanced at the IRC minutes:
18:54:56 [markva] Stu Weibel: frustrated; no productive outcomes all these years
18:55:10 [markva] ... adopt Web as the model
18:55:37 [markva] ... nobody understands DCAM
18:56:03 [markva] ... W3C published architecture document after actual implementation
18:56:45 [markva] ... revive effort: develop reference software; easily drop in data, generate linked data
I responded positively, trying to make it clear that I was struggling to hear and that I may have mis-interpreted him but noting the reference to 'linked data', which I'd heard as 'Linked Data':
18:57:12 [markva] andypowe11: support Stu
The minute is factually correct - I did support Stu - but in an 'economical with the truth' kind of way because I only really supported what I thought I'd heard him say - and quite possibly not what he actually said! With hindsight, I wonder if the minute-taker's use of 'linked data' (lower-case) actually reflected some subtlety in what Stu said that I didn't really pick up on at the time. If nothing else, this exchange highlights for me the potential problems caused by those who want to distinguish 'linked data' (lower-case) from 'Linked Data' (upper-case) - there is no mixed-case in conversation, particularly not where it is carried out over a bad phone connection.
So anyway... the meeting moved on to other things and, feeling somewhat frustrated by the whole experience, I dropped off the call and found my cold pizza.
My point here is not about DCMI at all, though I still have no real idea whether I was right or wrong to agree with Stu. My gut feeling is that I probably agreed with some of what he said and disagreed with the rest - and the lesson, for me, is that I should be more careful before opening my mouth! My point is really about the practical difficulties of engaging in quite challenging intellectual debates in the un-even environment of a hybrid meeting where some people are f2f in the same room and others are remote. Or, to mis-quote William Gibson:
The future of hybrid events is here, it's just not evenly distributed yet.
(Note: none of this is intended to be critical of the minute-taker for the meeting who actually seems to have done a fantastic job of capturing a complex exchange of views in what must have been a difficult environment).
The use of iTunesU by UK universities has come up in discussions a couple of times recently, on Brian Kelly's UK Web Focus blog (What Are UK Universities Doing With iTunesU? and iTunes U: an Institutional Perspective) and on the closed ALT-C discussion list. In both cases, as has been the case in previous discussions, my response has been somewhat cautious, an attitude that always seems to be interpreted as outright hostility for some reason.
So, just for the record, I'm not particularly negative about iTunesU and in some respects I am quite positive - if nothing else, I recognise that the adoption of iTunesU is a very powerful motivator for the generation of openly available content and that has got to be a good thing - but a modicum of scepticism is always healthy in my view (particularly where commercial companies are involved) and I do have a couple of specific concerns about the practicalities of how it is used:
Firstly that students who do not own Apple hardware and/or who choose not to use iTunes on the desktop are not disenfranchised in any way (e.g. by having to use a less functional Web interface). In general, the response to this is that they are not and, in the absence of any specific personal experience either way, I have to concede that to be the case.
Secondly (and related to the first point), that in an environment where most of the emphasis seems to be on the channel (iTunesU) rather than on the content (the podcasts), that confusion isn't introduced as to how material is cited and referred to – i.e. do some lecturers only ever refer to 'finding stuff on iTunesU', while others offer a non-iTunesU Web URL, and others still remember to cite both? I'm interested in whether universities who have adopted iTunesU but who also make the material available in other ways have managed to adopt a single way of citing the material that is on offer?
Both these concerns relate primarily to the use of iTunesU as a distribution channel for teaching and learning content within the institution. They apply much less to its use as an external 'marketing' channel. iTunesU seems to me (based on a gut feel more than on any actual numbers) to be a pretty effective way of delivering OER outside the institution and to have a solid 'marketing win on the back of that. That said, it would be good to have some real numbers as confirmation (note that I don't just mean numbers of downloads here - I mean conversions into 'actions' (new students, new research opps, etc.)). Note that I also don't consider 'marketing' to be a dirty word (in this context) - actually, I guess this kind of marketing is going to become increasingly important to everyone in the HE sector.
There is a wider, largely religious, argument about whether "if you are not paying for it, you aren't the customer, you are part of the product" but HE has been part of the MS product for a long while now and, worse, we have paid for the privilege – so there is nothing particularly new there. It's not an argument that particularly bothers me one way or the other, provided that universities have their eyes open and understand the risks as well as the benefits. In general, I'm sure that they do.
On the other hand, while somebody always owns the channel, some channels seem to me to be more 'open' (I don't really want to use the word 'open' here because it is so emotive but I can't think of a better one) than others. So, for example, I think there are differences in an institution adopting YouTube as a channel as compared with adopting iTunesU as a channel and those differences are largely to do with the fit that YouTube has with the way the majority of the Web works.
This is a two-part meeting, the first part looking at the position of the DCMI Abstract Model in 2010, five years on from its becoming a DCMI Recommendation, from the perspective of a new context in which the emergence of the "Linked Data" approach has brought a wider understanding and take-up of the RDF model.
The second part of the meeting looks at the question of what the DCMI community calls "application profiles", descriptions of "structural patterns" within data, and "validation" against such patterns. Within the DCMI context, work in this area has built on the DCAM, in the form of the draft Description Set Profile specification. But, as I've mentioned before, there is interest in this topic within some sectors of the "Linked Data" community.
Our paper tries to outline the historical factors which led to the development of the DCAM, to take stock of the current position, and suggest a number of possible paths forward. The aim is to provide a starting point for discussions at the face-to-face meeting, and the suggestions for ways forward are not intended to be an exhaustive list, but we felt it was important to have some concrete choices on the table:
DCMI carries on developing DCAM as before, including developing the DSP specification and concrete syntaxes based on DCAM
DCMI develops a "DCAM 2" specification (initial rough draft here), simplified and better aligned with RDF, and with a cleaner sepration of syntax and semantics, and either:
develops the DSP specification and concrete syntaxes based on DCAM; or
treats "DCAM 2" as a clarification and a transitional step towards promoting the RDF model and RDF abstract syntax
DCMI deprecates the DCAM and henceforth promotes the RDF abstract syntax (and examines the question of "structural constraints" within this framework)
DCMI does nothing to change the statuses of existing DCAM-related specifications
For my own part, in 2010, I do rather tend to look at the DCAM as an artefact "of its time". The DCAM was created during a period when the DCMI community was between two world views, one, which I tend to think of as a "classical view", reflected in Tom's "A Grammar of Dublin Core" 2000 article for Dlib, and based on the use of "appropriate literals" - character strings - as values, and a second based on the RDF model, emphasising the use of URIs as global names and supported by a formal semantics. In developing the DCAM, we tried to do two things:
To provide a formalisation of that "classical" view, the "DCMI community" metadata model, if you like: in 2003, DCMI had "a typology of terms" but little consensus on the nature of the data structure(s) in which those terms were referenced.
If I'm honest, I think we've had limited success in these immediate aims. In creating the DCAM "description set model" we may have achieved the former in theory, but in practice people coming to the DCAM from a "classical Dublin Core" viewpoint found that model complicated, and difficult to reconcile with their own conceptualisations. So as a "community model" I suspect the "buy-in" from that community isn't as high as we might like to imagine! People coming to the Dublin Core vocabularies with some familiarity with the (much simpler) RDF model, on the other hand, were confused by, and/or didn't see the need for, the description set model. And a third (and perhaps larger still) constituency was engaged primarily in the use of XML-based metadata schemas (like MODS), with little or no notion of an abstract syntax distinct from the XML syntax itself.
However, I think the existence of the DCAM has perhaps provided some more positive outcomes in other areas.
First, I think the very existence of the DCAM helped advance discussions around comparing metadata standards from different communities, particularly in the initiatives championed by Mikael Nilsson in comparing Dublin Core and the IEEE Learning Object Metadata standard, by drawing attention to the importance of articulating the "abstract models" in use in standards when making such comparisons and when trying to establish conditions for "interoperability" between applications based on them. (This work is nicely summarised in a paper for the ProLEARN project Harmonization of Metadata Standards).
Second, while implementation of the Description Set Profile specification itself has been limited, it has provided a focus for exploring the question of describing structural patterns and performing structural validation, based not on concrete syntaxes and on e.g. XML schema technologies, but on the abstract syntax. A recent thread on the Library Linked Data Incubator Group mailing list, starting with Mikael Nilsson's post, provides a very interesting discussion of current thinking, and this area will be the focus of the second part of the Pittsburgh meeting.
Finally, it's probably stating the obvious that any choice of path forward needs to take into account that DCMI, like many similar organisations, finds itself in an environment in which resources, both human and financial, are extremely limited. Many individuals who devoted time and energy to DCMI activities in the past have shifted their energy to other areas - and while I continue to maintain some engagement with DCMI, mainly through the vocabulary management activity of the Usage Board, I include myself in this category. Many of the DCMI "community" mailing lists show little sign of activity, and what few postings there are seem to receive little response. And some organisations which in the past supported staff to work in this area are choosing to focus their resources elsewhere.
Against this background, more than ever, it seems to me, it is important for DCMI not to try to tackle problems in isolation, but rather to (re)align its approaches firmly with those of the Semantic Web community, to capitalise on the momentum - and the availability of tools, expertise and experience (and good old enthusiasm!) - being generated by the wider take-up of the "Linked Data" approach, and to explore solutions to what might appear to be "DC-specific" problems (but probably aren't) within that broader community. The fact that the Architecture meeting in Pittsburgh is a joint one seems like a good first step in this direction.
One specific issue that came up during discussions at the FAM10 conference (see my previous post) was about the use of 'attributes' vs 'entitlements' in the SAML messages passed from Identity Providers to Service Providers'. For the purposes of this discussion:
an attribute is some property of the individual - eye colour, age, sex and staff category being examples;
an entitlement is an indication of something that the person is allowed to do once they have been authenticated.
(Note: in practice, both attributes and entitlements (as used here) are carried as SAML attributes - the difference lies only in their semantics).
In most use-cases it is possible to use either attributes or entitlements to achieve a particular task. For example, individuals with a staff category of 'librarian' (an attribute) may be inferred by the Service Provider to be allowed to order new books within, say, a library management system - anyone with that attribute is allowed to do so. Alternatively, a 'bookOrdering' entitlement may be used - only people with that entitlement are allowed to order new books, irrespective of whether they are a librarian or not.
So, the question arose, when does one use an attribute and when does one use an entitlement?
In the discussion, I proposed a rule of thumb for making that decision, as follows:
Where you specifically want to control access to some resource or function, and particularly where such a requirement exists across multiple Service Providers, use an entitlement. Where you want to record a property of an individual, particularly where that property issued across multiple Identity Providers, and where different Service Providers may take different actions based on that property (e.g. one system may use the property to configure the user interface, another may use it to control access) use an attribute.
As mentioned previously, I spoke at the FAM10 conference in Cardiff last week, standing in for another speaker who couldn't make it and using material crowdsourced from my previous post, Key trends in education - a crowdsource request, to inform some of what I was talking about. The slides and video from my talk follow:
As it turns out, describing the key trends is much easier than thinking about their impact on federated access management - I suppose I should have spotted this in advance - so the tail end of the talk gets rather weak and wishy-washy. And you may disagree with my interpretation of the key trends anyway. But in case it is useful, here's a summary of what I talked about. Thanks to those of you who contributed comments on my previous post.
By way of preface, it seems to me that the core working assumptions of the UK Federation have been with us for a long time - like, at least 10 years or so - essentially going back to the days of the centrally-funded Athens service. Yet over those 10 years the Internet has changed in almost every respect. Ignoring the question of whether those working assumptions still make sense today, I think it certainly makes sense to ask ourselves about what is coming down the line and whether our assumptions are likely to still make sense over the next 5 years or so. Furthermore, I would argue that federated access management as we see it today in education, i.e. as manifested thru our use of SAML, shows a rather uncomfortable fit with the wider (social) web that we see growing up around us.
And so... to the trends...
The most obvious trend is the current financial climate, which won't be with us for ever of course, but which is likely to cause various changes while it lasts and where the consequences of those changes, university funding for example, may well be with us much longer than the current crisis. In terms of access management, one impact of the current belt-tightening is that making a proper 'business case' for various kinds of activities, both within institutions and nationally, will likely become much more important. In my talk, I noted that submissions to the UCISA Award for Excellence (which we sponsor) often carry no information about staff costs, despite an explicit request in the instructions to entrants to indicate both costs and benefits. My point is not that institutions are necessarily making the wrong decisions currently but that the basis for those decisions, in terms of cost/benefit analysis, will probably have to become somewhat more rigorous than has been the case to date. Ditto for the provision of national solutions like the UK Federation.
More generally, one might argue that growing financial pressure will encourage HE institutions into behaving more and more like 'enterprises'. My personal view is that this will be pretty strongly resisted, by academics at least, but it may have some impact on how institutions think about themselves.
Secondly, there is the related trend towards outsourcing and shared services, with the outsourcing of email and other apps to Google being the most obvious example. Currently that is happening most commonly with student email but I see no reason why it won't spread to staff email as well in due course. At the point that an institution has outsourced all its email to Google, can one assume that it has also outsourced at least part of its 'identity' infrastructure as well? So, for example, at the moment we typically see SAML call-backs being used to integrate Google mail back into institutional 'identity' and 'access management' systems (you sign into Google using your institutional account) but one could imagine this flipping around such that access to internal systems is controlled via Google - a 'log in with Google' button on the VLE for example. Eric Sachs, of Google, has recently written about OpenID in the Enterprise SaaS market, endorsing this view of Google as an outsourced identity provider.
Thirdly, there is the whole issue of student expectations. I didn't want to talk to this in detail but it seems obvious that an increasingly 'open' mashed and mashable experience is now the norm for all of us - and that will apply as much to the educational content we use and make available as it does to everything else. Further, the mashable experience is at least as much about being able to carry our identities relatively seamlessly across services as it is about the content. Again, it seems unclear to me that SAML fits well into this kind of world.
There are two other areas where our expectations and reality show something of a mis-match. Firstly, our tightly controlled, somewhat rigid approach to access management and security are at odds with the rather fuzzy (or at least fuzzilly interpretted) licences negotiated by Eduserv and JISC Collections for the external content to which we have access. And secondly, our over-arching sense of the need for user privacy (the need to prevent publishers from cross-referencing accesses to different resources by the same user for example) are holding back the development of personalised services and run somewhat counter to the kinds of things we see happening in mainstream services.
Fourthly, there's the whole growth of mobile - the use of smart-phones, mobile handsets, iPhones, iPads and the rest of it - and the extent to which our access management infrastructure works (or not) in that kind of 'app'-based environment.
Then there is the 'open' agenda, which carries various aspects to it - open source, open access, open science, and open educational resources. It seems to me that the open access movement cuts right to the heart of the primary use-case for federated access management, i.e. controlling access to published scholarly literature. But, less directly, the open science movement, in part, pushes researchers towards the use of more open 'social' web services for their scholarly communication where SAML is not typically the primary mechanism used to control access.
Similarly, the emerging personal learning environment (PLE) meme (a favorite of educational conferences currently), where lecturers and students work around their institutional VLE by choosing to use a mix of external social web services (Flickr, Blogger, Twitter, etc.) again encourages the use of external services that are not impacted by our choices around the identity and access management infrastructure and over which we have little or no control. I was somewhat sceptical about the reality of the PLE idea until recently. My son started at the City of Bath College - his letter of introduction suggested that he created himself a Google Docs account so that he could do his work there and submit it using email or Facebook. I doubt this is college policy but it was a genuine example of the PLE in practice so perhaps my scepticism is misplaced.
We also have the changing nature of the relationship between students and institutions - an increasingly mobile and transitory student body, growing disaggregation between the delivery of learning and accreditation, a push towards overseas students (largely for financial reasons), and increasing collaboration between institutions (both for teaching and research) - all of which have an impact on how students see their relationship with the institution (or institutions) with whom they have to deal. Will the notion of a mandated 3 or 4 year institutional email account still make sense for all (or even most) students in 5 or 10 years time?
In a similar way, there's the changing customer base for publishers of academic content to deal with. At the Eduserv Symposium last year, for example, David Smith of CABI described how they now find that having exposed much of their content for discovery via Google they have to deal with accesses from individuals who are not affiliated with any institution but who are willing to pay for access to specific papers. Their access management infrastructure has to cope with a growing range of access methods that sit outside the 'educational' space. What impact does this have on their incentives for conforming to education-only norms?
And finally there's the issue of usability, and particularly the 'where are you from' discovery problem. Our traditional approach to this kind of problem is to build a portal and try and control how the user gets to stuff, such that we can generate 'special' URLs that get them to their chosen content in such a way that they can be directed back to us seemlessly in order to login. I hate portals, at least insofar as they have become an architectural solution, so the less said the better. As I said in my talk, WAYFless URLs are an abomination in architectural terms, saved only by the fact that they work currently. In my presentation I played up the alternative usability work that the Kantara ULX group have been doing in this area, which it seems to me is significantly better than what has gone before. But I learned at the conference that Shibboleth and the UK WAYF service have both also been doing work in this area - so that is good. My worry though is that this will remain an unsolvable problem, given the architecture we are presented with. (I hope I'm wrong but that is my worry). As a counterpoint, in the more... err... mainstream world we are seeing a move towards what I call the 'First Bus' solution (on the basis that in many UK cities you only see buses run by the First Group (despite the fact that bus companies are supposed to operate in a free market)) where you only see buttons to log in using Google, Facebook and one or two others.
I'm not suggesting that this is the right solution - just noting that it is one strategy for dealing with an otherwise difficult usability problem.
Note that we are also seeing some consolidation around technology as well - notably OpenID and OAuth - though often in ways that hides it from public view (e.g. hidden behind a 'login with google' or 'login with facebook' button).
Which essentially brings me to my concluding screen - you know, the one where I talk about all the implications of the trends above - which is where I have less to say than I should! Here's the text more-or-less copy-and-pasted from my final slide:
‘education’ is a relatively small fish in a big pond (and therefore can't expect to drive the agenda)
mainstream approaches will win (in the end) - ignoring the difficult question of defining what is mainstream
the current financial climate will have an effect somewhere
HE institutions are probably becoming more enterprise-like but they are still not totally like commercial organisations and they tend to occupy an uncomfortable space between the ‘enterprise’ and the ‘social web’ driven by different business needs (c.f. the finance system vs PLEs and open science)
the relationships between students (and staff) and institutions are changing
In his opening talk at FAM10 the day before, David Harrison had urged the audience to become leaders in the area of federated access management. In a sense I want the same. But I also want us, as a community, to become followers - to accept that things happen outside our control and to stop fighting against them the whole time.
Unfortunately, that's a harder rallying call to make!
Your comments on any/all of the above are very much welcomed.
A blog about the Web, cloud infrastructure, linked data, big data, open access, digital libraries, metadata, learning, research, government, online identity, access management and anything else that takes our fancy by Pete Johnston and Andy Powell.