Google, Social Graphs, privacy & the Web
This has already received a fair amount of coverage elsewhere (Techcrunch, Danny Ayers, Read-Write Web (1), Joshua Porter (1), Read-Write Web (2), Joshua Porter (2), to pick just a few) but I thought it was worth providing a quick pointer. Last week Google announced the availability of what they are calling their Social Graph API.
The YouTube video by Brad Fitzpatrick provides a good overview.
This is a Google-provided service which offers a (service-specific) query interface to a dataset that is generated by crawling data publicly available on the Web in the form of:
- data embedded in HTML pages using the XFN microformat
- RDF data using the Friend of a Friend (FOAF) vocabulary
- "other publicly declared connections" (I'm still a bit vague about what this includes)
Result sets are returned in the form of JSON documents.
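To make the idea of a JSON result set a little more concrete, here is a minimal Python sketch of reading one. The response structure, the field names (`nodes`, `nodes_referenced`, `types`) and the example URLs are all assumptions made for illustration; they are not taken from Google's documentation, so treat this as a sketch rather than a description of the actual API.

```python
import json

# Hypothetical response: the structure, field names and URLs below are
# illustrative assumptions, not a documented format.
SAMPLE_RESPONSE = """
{
  "nodes": {
    "http://example.org/petej": {
      "nodes_referenced": {
        "http://example.org/xyz": {"types": ["friend"]},
        "http://example.net/petej-photos": {"types": ["me"]}
      }
    }
  }
}
"""


def outgoing_links(response_text, relation):
    """Return sorted (source, target) pairs whose declared types include `relation`."""
    data = json.loads(response_text)
    pairs = []
    for source, node in data.get("nodes", {}).items():
        for target, info in node.get("nodes_referenced", {}).items():
            if relation in info.get("types", []):
                pairs.append((source, target))
    return sorted(pairs)


if __name__ == "__main__":
    # List the "friend" links declared by each source page in the sample.
    for source, target in outgoing_links(SAMPLE_RESPONSE, "friend"):
        print(source, "declares friend", target)
```

The point of the sketch is only that each link is a claim made by one page about another: the relation types here mimic XFN-style values such as "friend" and "me", and nothing in the result says whether the target agrees.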
On the technical side, I have seen a few critical comments (see the discussion on the Semantic Web Interest Group IRC channel) about how well the API respects Web architecture principles: for example, the conflation of (URIs for) people and (URIs for) documents (see the draft W3C TAG finding Dereferencing HTTP URIs), and what looks like the introduction of an unnecessary new URI scheme (see the draft W3C TAG finding URNs, Namespaces and Registries). Some concerns have also been voiced about introducing a dependency on a centralised Google-provided service, though of course the data is created and held independently, and other providers could aggregate that data and offer similar services, even using the same interface. Whether they would be able to do so as effectively as Google can, given Google's own experience in this area, and/or attract the user base which a Google service inevitably will, remains to be seen. And of course there are the usual issues of spamming and trust, and the significance of reciprocation: who says "PeteJ is friends with XYZ" and what does XYZ have to say about that?
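The reciprocation question can be illustrated with a small Python sketch. It assumes the aggregated data has already been reduced to directed (claimant, subject) pairs; the function name and the sample identifiers are hypothetical, and this is just one simple way an aggregator might separate mutual claims from one-sided ones, not anything Google has described.

```python
def split_by_reciprocation(claims):
    """Partition directed (claimant, subject) claims into reciprocated and one-sided sets.

    A claim (a, b) counts as reciprocated only if (b, a) is also present.
    """
    claim_set = set(claims)
    reciprocated = {(a, b) for (a, b) in claim_set if (b, a) in claim_set}
    one_sided = claim_set - reciprocated
    return reciprocated, one_sided


if __name__ == "__main__":
    # Hypothetical claims: PeteJ and XYZ name each other, while a spammer
    # names PeteJ without any claim coming back.
    claims = [
        ("PeteJ", "XYZ"),
        ("XYZ", "PeteJ"),
        ("Spammer", "PeteJ"),
    ]
    mutual, unconfirmed = split_by_reciprocation(claims)
    print("reciprocated:", sorted(mutual))
    print("one-sided:", sorted(unconfirmed))
```

A one-sided claim is not necessarily false, of course, and a reciprocated one is not necessarily trustworthy; the sketch only shows that the distinction is cheap to compute once the claims are aggregated.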
Overall, however, I think the approach of such a high-profile provider exposing data gathered from distributed, devolved, openly available sources on the Web, rather than from the database of a single social networking service, is being seen as a significant development.
I am quite excited about this in a positive manner. I do have great trepidation as this is exactly the tool social engineering hackers have been hoping for and working toward.
The Google SocialGraph API is exposing everybody who has not thought through their privacy or exposing of their connections.
And in particular, a post by Danah Boyd encourages us to reflect on the social, political and ethical implications of aggregating this data and facilitating access to that aggregation in this way, and reminds us that as individuals we live within a set of power relationships which mean that some are more vulnerable than others to the use of such technologies:
Being socially exposed is AOK when you hold a lot of privilege, when people cannot hold meaningful power over you, or when you can route around such efforts. Such is the life of most of the tech geeks living in Silicon Valley. But I spend all of my time with teenagers, one of the most vulnerable populations because of their lack of agency (let alone rights). Teens are notorious for self-exposure, but they want to do so in a controlled fashion. Self-exposure is critical for the coming of age process - it's how we get a sense of who we are, how others perceive us, and how we fit into the world. We exposure during that time period in order to understand where the edges are. But we don't expose to be put at true risk. Forced exposure puts this population at a much greater risk, if only because their content is always taken out of context. Failure to expose them is not a matter of security through obscurity... it's about only being visible in context.
Even if - as Google take pains to emphasise is the case - the individual data sources are already "public", the merging of data sources and the change of context in which information is presented can be significant.
The opposing view is perhaps most vividly expressed in Tim O'Reilly's comment:
The counter-argument is that all this data is available anyway, and that by making it more visible, we raise people's awareness and ultimately their behavior. I'm in the latter camp. It's a lot like the evolutionary value of pain. Search creates feedback loops that allow us to learn from and modify our behavior. A false sense of security helps bad actors more than tools that make information more visible.
One of my tests for whether a Web 2.0 innovation is "good", despite the potential for abuse, is whether it makes us smarter.
I left this post half-finished at this point last night feeling very uneasy with what I perceived as an undertone of almost Darwinian "ruthlessness" in the O'Reilly position, but at the same time struggling to articulate an alternative that I was really convinced of.
So I was delighted this morning when, on opening up my Bloglines feeds, I found an excellent post by Dan Brickley which I think reflects some of the ambivalence I was feeling ("The end of privacy by obscurity should not mean the death of privacy. Privacy is not dead, and we will not get over it. But it does need to be understood in the context of the public record"). Really, I can only recommend that you read the post in full, because I think it's a very sensitive, measured contribution to the debate, based on Dan's direct experience, over several years working on FOAF, of the issues arising from the deployment of these technologies.
And, far from sitting on the fence, Dan concludes with very practical recommendations for action:
- Best practice codes for those who expose, and those who aggregate, social Web data
- Improved media literacy education for those who are unwittingly exposing too much of themselves online
- Technology development around decentralised, non-public record communication and community tools (eg. via Jabber/XMPP)
Google's announcement of this API has certainly brought both the technical and the social issues to the attention of a wider audience, and sparked some important debate, and perhaps that in itself is a significant contribution in an area where the landscape suddenly seems to be shifting very quickly indeed.
And if I can unashamedly take the opportunity to make another plug for the activities of the Foundation, I'm sure there's plenty of food for thought here for anyone considering a proposal to the current Eduserv Research Grants call :-)