« October 2010 | Main | December 2010 »

November 29, 2010

Still here & some news from LOCAH

Contrary to appearances, I haven't completely abandoned eFoundations, but recently I've mostly been working on the JISC-funded LOCAH project which I mentioned here a while ago, and my recent scribblings have mostly been over on the project blog.

LOCAH is working on making available some data from the Archives Hub (a collection of archival finding aids, i.e. metadata about archival collections and their constituent items) and from Copac (a "union catalogue" of bibliographic metadata from major research & specialist libraries) as linked data.

So far, I've mostly been working with the EAD data, with Jane Stevenson and Bethan Ruddock from the Archives Hub. I've posted a few pieces on the LOCAH blog, on the high-level architecture/workflow, on the model for the archival description data (also here), and most recently on the URI patterns we're using for the archival data.

I've got an implementation of this as an XSLT transform that reads an EAD XML document and outputs RDF/XML, and have uploaded the results of applying that to a small subset of data to a Talis Platform instance. We're still ironing out some glitches but there'll be information about that on the project blog coming in the not too distant future.
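For concreteness, here's a rough Python sketch of the kind of transformation involved. The project itself does this with XSLT, and the EAD fragment, URI pattern and output vocabulary below are all invented for illustration; it's just to give a flavour of what "EAD in, RDF out" means:

```python
import xml.etree.ElementTree as ET

# A tiny, invented EAD fragment -- real Archives Hub records are far richer.
EAD = """<ead>
  <eadheader><eadid>gb0000-example</eadid></eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>Papers of an Example Person</unittitle>
      <unitdate>1900-1950</unitdate>
    </did>
  </archdesc>
</ead>"""

# Hypothetical URI pattern for the finding aid; the patterns LOCAH is
# actually using are described on the project blog.
BASE = "http://example.org/id/findingaid/"
DCT = "http://purl.org/dc/terms/"

def ead_to_ntriples(ead_xml):
    """Pull a couple of fields out of an EAD document and emit
    N-Triples -- a stand-in for the XSLT-to-RDF/XML step."""
    root = ET.fromstring(ead_xml)
    eadid = root.findtext("eadheader/eadid")
    title = root.findtext("archdesc/did/unittitle")
    subject = "<%s%s>" % (BASE, eadid)
    triples = ['%s <%stitle> "%s" .' % (subject, DCT, title)]
    return "\n".join(triples)

print(ead_to_ntriples(EAD))
```

The real work, of course, is in deciding on the model and the URI patterns; once those are settled, the transform itself is comparatively mechanical.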

On a personal note, I'm quite enjoying the project. It gives me a chance to sit down and try to actually apply some of the principles that I read about and talk about, and I'm working through some of the challenges of "real world" data, with all its variability and quirks. I worked in special collections and archives for a few years back in the 1990s, when the institutions where I was working were really just starting to explore the potential of the Web, so it's interesting to see how things have changed (or not! :-)), and to see the impact of and interest in some current technological (and other) trends within those communities. It also gives me a concrete incentive to explore the use of tools (like the Talis Platform) that I've been aware of but have only really tinkered with: my efforts in that space inevitably bring me face to face with the limited scope of my development skills, though it's also nice to find that the availability of a growing range of tools has enabled me to get some results even with my rather stumbling efforts.

It's also an opportunity for me to discuss the "linked data" approach with the archivists and librarians within the project - in very concrete ways based on actual data - and to try to answer their questions and to understand what aspects are perceived as difficult or complex - or just different from existing approaches and practices.

So while some of my work necessarily involves me getting my head down and analysing input data or hacking away at XSLT or prodding datasets with SPARQL queries, I've been doing my best to discuss the principles behind what I'm doing with Jane and Bethan as I go along, and they in turn have reflected on some of the challenges as they perceive them in posts like Jane's here.

One of the project's tasks is to:

Explore and report on the opportunities and barriers in making content structured and exposed on the Web for discovery and use. Such opportunities and barriers may coalesce around licensing implications, trust, provenance, sustainability and usability.

I think we're trying to take a broad view of this aspect of the project, so that it extends not just to the "technical" challenges in cranking out data and how we address them, but also incorporates some of these "softer" elements of how we, as individuals with backgrounds in different "communities", with different practices and experiences and perspectives, share our ideas, get to grips with some of the concepts and terminology and so on. Where are the "pain points" that cause confusion in this particular context? Which means of explaining or illustrating things work, and which don't? What (if any!) is the value of the "linked data" approach for this sort of data? How is that best demonstrated? What are the implications, if any, for information management practices within this community? It may not be the case that SPARQL becomes a required element of archivists' training any time soon, but having these conversations, and reflecting on them, is, I think, an important part of the LOCAH experience.

November 26, 2010

Digital by default

This week saw publication of Martha Lane Fox's report, and the associated government response, about the future of the UK government's web presence, Directgov 2010 and beyond: Revolution not evolution:

The report, and the Government’s initial response, argues for a Channel Shift that will increasingly see public services provided digitally ‘by default’.

Hardly an earth-shattering opener! I'm stifling a yawn as I write.

Delving deeper, the report itself makes quite a lot of sense to me, though I'm not really in a position to comment on its practicality. What I did find odd was the wording of the key recommendations, which I found to be rather opaque and confused:

  1. Make Directgov the government front end for all departments’ transactional online services to citizens and businesses, with the teeth to mandate cross government solutions, set standards and force departments to improve citizens’ experience of key transactions.
  2. Make Directgov a wholesaler as well as the retail shop front for government services & content by mandating the development and opening up of Application Programme Interfaces (APIs) to third parties.
  3. Change the model of government online publishing, by putting a new central team in Cabinet Office in absolute control of the overall user experience across all digital channels, commissioning all government online information from other departments.
  4. Appoint a new CEO for Digital in the Cabinet Office with absolute authority over the user experience across all government online services (websites and APIs) and the power to direct all government online spending.

As has been noted elsewhere, it took a comment by Tom Loosemore on a post by Steph Gray to clarify the real intent of recommendation 3:

The *last* thing that needs to happen is for all online publishing to be centralised into one humungous, inflexible, inefficient central team doing everything from nuts to bolts from a bunker somewhere deep in Cabinet Office.

The review doesn’t recommend that. Trust me! It does, as you spotted, point towards a model which is closer to the BBC – a federated commissioning approach, where ‘commissioning’ is more akin to the hands-off commissioning of a TV series, rather than micro-commissioning as per a newspaper editor. Equally, it recommends consistent, high-quality shared UI / design / functionality/serving. Crucially, it recommends universal user metrics driving improvement (or removal) when content can be seen to be underperforming.

So recommendation 3 really appears to mean "federated, and relatively hands-off, commissioning for content that is served at a single domain name". If so, why not simply say that?

One thing I still don't get is why there is such a divide in thinking between "transactional online services" and "information" services. Given the commerce-oriented language used in recommendation 2, this seems somewhat odd. When I use Amazon, I don't go to one place to carry out a transaction and another place to find information about products and services - I just go to the Amazon website. So the starting point for recommendation 1 feels broken from the outset (to me). It is recovered, in part, by the explanation about the real intent of recommendation 3 (which simply draws everything together in one place) but why start from that point at all? It just strikes me as overly confusing.

As I say, this is not a criticism of what is being suggested - just of how it has been suggested.

Tools for sharing - Posterous

There is latent value to others in what we are reading. I say latent because, often, knowledge about what we are reading is either not shared at all or is shared in ways that don't necessarily have much obvious impact. Value also comes at different levels. In some cases, reading something will result in a blog post in response. In others, an "I am reading X" tweet suffices. Indeed, some people seem to make almost exclusive use of Twitter for this purpose - and it's arguably quite effective. And then there's the middle ground of stuff where you want to make a comment on what you are reading but you don't have the time or inclination to write a blog post and the 140 character limit of Twitter is too limiting to get your point across.

With that middle ground in mind, I've been playing with Posterous, channelled thru my personal hosting at aggregate.andypowe11.net. Nothing unusual in that I know... but it's taking me a while to figure out where the correct balance between Twitter, Posterous and this blog lies. Ditto the balance between personal and corporate. Oh, and then there's also Del.icio.us to think about just to keep things interesting!

My plan, such as it is, is to use Posterous as a place to lodge things that will eventually become full-blown blog posts. Hence the name - Aggregate is what you need to make eFoundations... get it! To date, that hasn't happened - the act of writing a one line comment for Posterous has been sufficient to get the thing out of my system.

We'll see... it may come to nothing.

November 09, 2010

Is (a debate about) 303 really necessary?

At the tail end of last week Ian Davis of Talis wrote a blog post entitled, Is 303 Really Necessary?, which gave various reasons why the widely adopted (within the Linked Data community) 303-redirect pattern for moving from the URI for a non-Information Resource (NIR) to the URI for an Information Resource (IR) was somewhat less than optimal and offering an alternative approach based on the simpler, and more direct, use of a 200 response. For more information about IRs, NIRs and the use of 303 in this context see the Web Architecture section of How to Publish Linked Data on the Web.

Since then the public-lod@w3.org mailing list has gone close to ballistic with discussions about everything from the proposal itself to what happens to the URI you have assigned to China (a NIR) if/when the Chinese border changes. One thing the Linked Data community isn't short of is the ability to discuss how much RDF you can get on the head of a pin ad infinitum.

But as John Sheridan pointed out in a post to the mailing list, there are down-sides to such a discussion, coming at a time when adoption of Linked Data is slowly growing.

debating fundamental issues like this is very destabilising for those of us looking to expand the LOD community and introduce new people and organisations to Linked Data. To outsiders, it makes LOD seem like it is not ready for adoption and use - which is deadly. This is at best the 11th hour for making such a change in approach (perhaps even 5 minutes to midnight?).

I totally agree. Unfortunately the '200 cat' is now well and truly out of the 'NIR bag' and I don't suppose any of us can put it back in again so from that perspective it's already too late.

As to the proposal itself, I think Ian makes some good points and, from my slightly uninformed perspective, I don't see too much wrong with what he is suggesting. I agree with him that being in the (current) situation where clients can infer semantics based on HTTP response codes is less than ideal. I would prefer to see all semantics carried explicitly in the various representations being served on the Web. In that sense, the debate about 303 vs 200 becomes one that is solely about the best mechanism for getting to a representation, rather than being directly about semantics. I also sense that some people are assuming that Ian is proposing that the current 303 mechanism needs to be deprecated at some point in the future, in favour of his 200 proposal. That wasn't my interpretation. I assume that both mechanisms can sit alongside each other quite happily - or, at least, I would hope that to be the case.
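To make the distinction concrete, here's a minimal Python sketch of the two dereferencing patterns. The `/id/`.../`/doc/` URI convention and the example vocabulary are invented for illustration; this isn't anyone's actual server code:

```python
def respond(path, use_303=True):
    """Sketch of the two patterns for dereferencing the URI of a
    non-information resource: the classic 303 redirect versus the
    direct 200 response Ian Davis proposes."""
    if path.startswith("/id/"):
        # Hypothetical convention: /id/x identifies a thing (NIR),
        # /doc/x identifies a document describing it (IR).
        doc = path.replace("/id/", "/doc/", 1)
        if use_303:
            # Classic pattern: the thing is not a document, so the
            # server redirects the client to a document about it.
            return (303, {"Location": doc}, "")
        # Davis-style pattern: answer directly with a description whose
        # subject is the /id/ URI; the semantics are carried in the
        # RDF itself rather than inferred from the status code.
        return (200, {"Content-Type": "text/turtle"},
                "<%s> a <http://example.org/Thing> ." % path)
    # Ordinary information resource: serve it as normal.
    return (200, {"Content-Type": "text/html"}, "<html>...</html>")

status, headers, body = respond("/id/china")          # 303 -> /doc/china
status2, headers2, body2 = respond("/id/china", use_303=False)  # 200 + RDF
```

Seen this way, the two mechanisms really are just alternative routes to a representation, and there's no obvious reason they couldn't coexist.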

Student perspectives on technology in UK universities

Lawrie Phipps of the JISC has written a nice response to the recommendations of the report to HEFCE by the NUS (the UK National Union of Students), Student perspectives on technology – demand, perceptions and training needs [PDF], which makes a number of recommendations around ICT strategy, staff training and so on. Lawrie's contention is that the:

challenge arising from this report is not how to use more technology, nor how to integrate it into practice. The challenge is articulating our existing practice in ways that act as both an exemplar to students (and Support their own digital literacy), and enhance our practice by sharing the exemplary work that is already there.

From my perspective, the difference between "you're not using ICT effectively" and "we are using ICT effectively but nobody recognises that we're using ICT effectively" is somewhat moot. I prefer to see the report in terms of its findings not in terms of its recommendations (which, it seems to me, are really for universities to make anyway).

The point is that where the report indicates fairly fundamental issues, such as student "dissatisfaction that the type of technology used in HE is increasingly outdated" and that a "lack of staff engagement with the Virtual Learning Environment (VLE)" is frustrating for students, we either have to show those things not to be the case (I don't know, maybe they aren't) or acknowledge that whatever it is we are currently doing isn't working well enough? It seems hard to do the former in light of this report?

As a result, I'd tend to read the combination of the report and Lawrie's response as saying, "there are problems with the way ICT is being used to support teaching and learning in universities but we're already doing most of what the report recommends and therefore we need to do something else". Would that be unfair?

As an aside, I was struck by one of the themes highlighted by the report:

Participants expressed concerns over “surface learning” whereby a student only learns the bare minimum to meet module requirements – this behaviour was thought to be encouraged by ICT: students can easily skim-read material online, focusing on key terms rather than a broader base of understanding.

It seems harsh, to me, to lay the blame for this at the door of ICT. If there's a problem with "surface learning" (again, I can only go with what the report says here) then it presumably might have other causes... the pedagogic approaches and/or assessment strategies in use for example?

Me? I love skim-reading! I thought it was a key skill? I got about 10 paragraphs past that point in the report and stopped reading! Surface learning FTW :-)

November 03, 2010

Google support for GoodRelations

Google have announced support for the GoodRelations vocabulary for product and price information in Web pages, Product properties: GoodRelations and hProduct. This is primarily of interest to ecommerce sites but is more generally interesting because it is likely to lead to a significant rise in the amount of RDF flowing around the Web. It therefore potentially represents a significant step forward for the adoption of the Semantic Web and Linked Data.

Martin Hepp, the inventor of the GoodRelations vocabulary, has written about this development, Semantic SEO for Google with GoodRelations and RDFa, suggesting a slightly modified form of markup which is compatible with that adopted by Google but that is also "understood by ALL RDFa-aware search engines, shopping comparison sites, and mobile services".
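For those who haven't seen GoodRelations markup before, an RDFa fragment for a product offer looks something like the following. The product name and price are invented, and this is only a minimal illustration of the general shape of the vocabulary, not the specific Google-compatible pattern Hepp recommends (see his post for that):

```html
<div xmlns:gr="http://purl.org/goodrelations/v1#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     typeof="gr:Offering">
  <span property="gr:name">Example Widget</span>
  <div rel="gr:hasPriceSpecification">
    <div typeof="gr:UnitPriceSpecification">
      <span property="gr:hasCurrency" content="GBP"></span>
      <span property="gr:hasCurrencyValue" datatype="xsd:float"
            content="9.99">&#163;9.99</span>
    </div>
  </div>
</div>
```

The point being that the same human-readable page carries machine-readable product and price data that any RDFa-aware consumer can extract.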

November 02, 2010

Google OAuth, OpenID and federated login research

In recent meetings on access management and single sign-on I've mentioned the usability work being done by the Kantara ULX Working Group and suggested that it represents real progress in terms of how the relatively complex 'federated login' experience should be presented to the end-user.

Eric Sachs of Google has written up some research that they've been doing in the same space - research that includes a significant mocked-up ecommerce website and videos covering the kinds of 'login' scenarios that they've been thinking about.

I think this represents a really interesting piece of work, especially if some of it is made available as open source code (as the post suggests might happen).

The website at openidsamplestore.com was built to demonstrate how a website that already allows users to login can help those users (and new users) leverage OpenID to login.  This provides a number of advantages for website owners such as:

  • Higher signup rates for new users and higher return/login rates by existing users
  • Lower customer support costs for handling problems with accounts
  • Improved account security by leveraging the security features and scale of large identity providers like Yahoo, Google, Microsoft, AOL, etc.

Users obviously also benefit from the improved user experience that can be achieved with OpenID.

The advantages outlined here seem, at first glance, to be most appropriate to e-commerce sites but I think they apply much more widely - to academic publishers, educational service providers, government websites, health websites and so on.

It'll be interesting to see how this work develops and whether the fact that it is being undertaken by Google means that it gains more traction and acceptance than might be the case with the Kantara work.

FleSSR public cloud infrastructure update

I wrote a brief update for the FleSSR project blog yesterday, covering some work we did last week at our (relatively new) Swindon Data Centre to build the initial infrastructure for the project's public cloud. I won't repeat any of that here but would just like to note that the FAS 3140 SAN (Storage Area Network) cluster that we are being loaned by NetApp via Q Associates for the duration of the project, of which we'll use about 10 Tbytes for FleSSR, will be up and running over the next couple of days, meaning that this infrastructure will be substantial enough for some real testing.

As an aside, when Eduserv's new Swindon Data Centre originally opened all staff were encouraged to go over from Bath to have a look round. I didn't bother because "what's the point of looking round a shed?" - it wasn't one of my more popular in-house comments :-)

As it happens, I was quite wrong... the Data Centre is actually quite impressive, not just because of the available space (which is much bigger than I was expecting) but also the quality of the one 'vault' that has been built so far and the associated infrastructure. It looks (to my eyes) like a great resource... now we've just got to get it used by our primary communities - education, government and health. I'm hopeful that FleSSR represents a small step towards what will eventually become a well-valued community resource.



eFoundations is powered by TypePad