Influence, connections and outputs
Martin Weller wrote an interesting blog post on Friday, Connections versus Outputs - interesting in the sense that I strongly disagreed with it - that discussed a system for assessing an individual's "prominence in the online community of their particular topic" by measuring their influence, betweenness and hubness (essentially their 'connectedness' to others in that community). Martin had used the system to assess the prominence of people and organisations working in the area of 'distance learning', suggesting that it might form a useful basis for further work looking at metrics for the new forms of scholarly communication that are enabled by the social Web. The algorithm adopted by the system was not available for discussion so one was left reacting to the results it generated.
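Since the system's algorithm wasn't published, it may help to sketch what measures like 'hubness' typically look like. The standard formulation is Kleinberg's HITS algorithm, which gives each node in a link graph a 'hub' score (it links to good authorities) and an 'authority' score (good hubs link to it). The minimal sketch below is my own illustration on an invented toy graph, not a reconstruction of the system Martin used:

```python
def hits(edges, iterations=50):
    """Compute HITS hub and authority scores for a directed link graph.

    edges: list of (source, target) pairs.
    Returns two dicts (hub scores, authority scores), each normalised
    so the values sum to 1.
    """
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of nodes linking in.
        auth = {n: sum(hub[s] for s, t in edges if t == n) for n in nodes}
        # Hub score: sum of authority scores of nodes linked to.
        hub = {n: sum(auth[t] for s, t in edges if s == n) for n in nodes}
        # Normalise so the values don't grow without bound.
        hub_total = sum(hub.values()) or 1.0
        auth_total = sum(auth.values()) or 1.0
        hub = {n: v / hub_total for n, v in hub.items()}
        auth = {n: v / auth_total for n, v in auth.items()}
    return hub, auth

# A made-up 'who links to whom' graph -- the names are placeholders.
edges = [("blogA", "blogB"), ("blogA", "blogC"), ("blogB", "blogC"),
         ("blogD", "blogA"), ("blogD", "blogB")]
hub, auth = hits(edges)
```

Note what this measures: purely the shape of the link graph. Nothing in it knows what any of the blogs are actually about, which is exactly the problem discussed below.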
I reacted somewhat negatively, largely on the basis that the system ranked Brian Kelly's UK Web Focus blog 6th most influential in that particular subject area. This is not a criticism of Brian (who is clearly influential in other areas), but the fact remains that Brian's blog contains only three posts where the phrase 'distance learning' appears, two of which are in comments left by other people and one of which is in a guest post - hardly indicative of someone who is highly influential in that particular subject area?
Why does Brian's blog appear in the list? Probably because he is very well connected to people who do write about distance learning. Unfortunately, that connectedness is not sufficient, on its own, to draw conclusions about his level of influence on that particular topic, so the whole process breaks down very quickly.
My concern is that if we present these kinds of rather poor metrics at all seriously as a counterpoint to more traditional (though still flawed) metrics like the REF, we will ultimately harm, rather than advance, any discussion around the assessment of scholarly communication in the age of the social Web.
To cut a long story short (you can see my fuller comments on the original post) I ended by suggesting that if we really want to develop "some sensible measure of scholarly impact on the social Web" then we have to step back and consider three questions:
- what do we want to measure?
- what can we measure?
- how can we bring these two things close enough together to create something useful?
To try and answer my own questions I'll start with the first. I suggest that we want to measure two aspects of 'impact':
- the credibility of what an individual has to say on a topic,
- and the engagement of an individual within their 'subject' community and their ability to expose their work to particular audiences.
These two are clearly related, at least in the sense that someone's level of engagement in a community (their connectedness, if you like) increases the exposure of their work but is also indicative of the credibility they have within that community.
Having said that, my gut feeling is that credibility, at least for the purposes of scholarly communication, can only really be measured by some kind of peer-review (i.e. human) process. Of course, on the Web, we are now very used to inferring credibility based on the weighted number of inbound links that a resource receives, not least in the form of Google's PageRank algorithm. This works well enough for mainstream Web searching but I wouldn't want it used, at least not at any trivial level, to assess scholarly credibility or impact. Why not? Well, a few things immediately spring to mind...
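As an aside, the inbound-link idea itself is easy to sketch. The toy power-iteration PageRank below is my own minimal illustration (invented page names, none of Google's real machinery): each page's rank is shared out among the pages it links to, iterated until it settles.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Minimal PageRank by power iteration over a directed link graph.

    edges: list of (source, target) pairs. Dangling nodes (no outgoing
    links) redistribute their rank evenly, as in the standard
    formulation. A toy sketch, not Google's implementation.
    """
    nodes = {n for e in edges for n in e}
    out_links = {n: [t for s, t in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                # Share this node's rank among the pages it links to.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank across every node.
                share = damping * rank[n] / len(nodes)
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical link graph: three pages all linking to 'popular'.
edges = [("a", "popular"), ("b", "popular"),
         ("c", "popular"), ("popular", "a")]
rank = pagerank(edges)
```

Again, note that a link is a link: the algorithm has no idea *why* anyone linked to 'popular'.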
Firstly, a link is typically just a link at the moment, whether it's a hyperlink between two resources or the link between people in a social network. The link carries no additional semantics. If paper A critiques paper B then we don't want the link between them to result in paper B being measured as having more credibility/impact than it otherwise would have done had the critique not been written. (This is also true of traditional citations between journal articles of course, except that peer review mechanisms stop (most of) the real dross from ever seeing the light of day. On the Web, everything is there to be cited.)
Secondly, if we just consider blogging for a moment, the way a blog is written will have a big impact on how people react to it and that, in turn, might affect how we measure it. Blogs written in a more 'tabloid' style for example might well result in more commenting or inbound links than those written in a more academic style. We presumably don't want to end up measuring scholarly impact as though we were measuring newspaper circulation?
Thirdly, any metrics that we choose to use in the future will ultimately influence the way that scholarly communication happens. Take blog comments for example. A comment is typically not a first class Web object - comments don't have URIs for example. One can therefore make the argument that writing a comment on someone else's blog post is less easily measurable than writing a new blog post that cites the original. One might therefore expect to see less commenting and more blog-post writing (under a given set of metrics). While this isn't necessarily a bad thing, it seems to me that our behaviour should be driven by what works best for 'scholarly communication' not by what can be most easily 'measured'.
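Returning to the first point for a moment, one could imagine links that do carry semantics: citations typed by intent, in the spirit of vocabularies like CiTO (the Citation Typing Ontology), so that a critique is distinguishable from an endorsement. The sketch below is entirely hypothetical — the papers and link types are invented for illustration:

```python
# Hypothetical typed citations: (source, target, type). Real vocabularies
# such as CiTO define similar citation types; these names are my own.
citations = [
    ("paperA", "paperB", "critiques"),
    ("paperC", "paperB", "supports"),
    ("paperD", "paperB", "supports"),
    ("paperD", "paperA", "cites"),
]

def naive_link_count(citations, paper):
    """A plain inbound-link count: every citation boosts the target,
    whatever its intent -- the problem described above."""
    return sum(1 for s, t, _ in citations if t == paper)

def supportive_link_count(citations, paper):
    """Count only citations whose type signals endorsement."""
    return sum(1 for s, t, kind in citations
               if t == paper and kind == "supports")

# Under a naive count, paperB gains credit from the critique as well;
# once link types are taken into account, it doesn't.
```

This doesn't solve the measurement problem — people would still have to type their links honestly — but it shows the kind of semantics a plain hyperlink currently lacks.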
As I said in my first comment on Martin's post, "connectedness is cheap". On that basis, we have to be very careful before using any metrics that are wholly or largely based on measures of connectedness. The point is that the things we can measure easily (to return to the second part of my question above) are likely to be highly spammable (i.e. they can be gamed, either intentionally or by accident). Yes, OK... all measures are spammable, but some are more spammable than others! If we want to start assessing academics in terms of their engagement and output as part of the social Web then I think we need to start by answering my questions above rather than by showcasing rather poor examples of what can be automated now, except as a way of saying, "look, this is hard"!
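Just how cheap connectedness is can be shown in a few lines. The sketch below is an invented toy network with a crude degree-based 'influence' score, inflated simply by creating sock-puppet accounts that link to the target:

```python
# Toy illustration of why "connectedness is cheap": a degree-based
# influence score is trivially gamed. All names here are invented.

def degree_score(edges, node):
    """Influence as the number of distinct accounts linked to `node`
    in either direction -- a crude connectedness measure."""
    neighbours = ({t for s, t in edges if s == node} |
                  {s for s, t in edges if t == node})
    return len(neighbours)

edges = [("alice", "bob"), ("carol", "bob")]
honest = degree_score(edges, "bob")   # bob's genuine connections: 2

# Game the metric: ten sock puppets follow bob, and bob follows back.
for i in range(10):
    puppet = f"puppet{i}"
    edges.append((puppet, "bob"))
    edges.append(("bob", puppet))

gamed = degree_score(edges, "bob")    # now looks far better connected
```

Real systems weight links rather than just counting them, but the underlying point stands: link-shaped evidence is easy to manufacture, which is why metrics built largely on connectedness need such careful handling.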