Acemoglu & company on privacy externalities – is there really harm in inference?



Former Economic Advisor to the Secretary of State, Department of Business, Energy and Industrial Strategy


Acemoglu and co-authors have this really impressive theory paper out on data externalities and why these mean that data markets will probably be inefficient, why there will be too much data shared, and why people who care a lot about privacy might end up demonstrating a willingness to sell data at a low price.

For all its dazzle, I have a problem with the notion of privacy that lies at the heart of the analysis, and this post tries to explain why I think their notion of privacy is insufficiently specific to address the policy problems we face around privacy and competition. The reason I think it’s worth insisting on this is that I worry about our making our case against GAFA power a privacy case. There is great social good to be had from the responsible use of digital trails, and the more we talk up privacy harms, the harder we make it to use data for good.


The basic externality that Acemoglu & co tease out is this. When I reveal information about myself, the recipient can then use that information to make predictions about other people. If the recipient can ascertain that some other person is, in a statistical sense, very similar to me, then my revelation adds to the recipient’s inferred knowledge of the other person.
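
To make the mechanism concrete, here is a minimal numerical sketch. It is a toy version under assumptions of my own for illustration, not a reproduction of the paper's model: two people, A and B, whose "types" are jointly normal with unit variance and correlation `rho`, where A shares a noisy signal of their own type with noise variance `noise_var`. The function names are hypothetical. Under those assumptions, the standard Gaussian conditioning formula gives the amount by which the recipient's uncertainty about B falls when only A has shared anything.

```python
def leaked_information(rho: float, noise_var: float) -> float:
    """Reduction in the recipient's mean squared error about B's type
    after observing A's noisy signal s_A = x_A + noise.

    Assumes (for illustration only) that x_A and x_B are jointly normal,
    each with variance 1 and correlation rho, and that the noise is
    independent with variance noise_var.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    if noise_var < 0.0:
        raise ValueError("noise_var must be non-negative")
    # Cov(x_B, s_A) = rho and Var(s_A) = 1 + noise_var, so the variance
    # explained by the signal is rho^2 / (1 + noise_var).
    return rho ** 2 / (1.0 + noise_var)


def posterior_variance(rho: float, noise_var: float) -> float:
    """Recipient's remaining uncertainty about B (prior variance is 1)."""
    return 1.0 - leaked_information(rho, noise_var)


if __name__ == "__main__":
    # B has shared nothing; only A's data is observed.
    for rho in (0.0, 0.5, 0.8, 1.0):
        print(rho, leaked_information(rho, noise_var=1.0))
```

With `rho = 0.8` and `noise_var = 1.0`, the recipient's uncertainty about B falls by roughly 0.32 out of a prior variance of 1, although B has revealed nothing. With `rho = 0` the leak is zero, which is the sense in which the externality depends on statistical similarity.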

There is no doubt that this is an interesting externality. But I am not sure it is a good general model for a breach of privacy. Let me try out a few examples. Remember the Ashley Madison scandal? Ashley Madison was a website that facilitated extra-marital affairs. The platform was hacked and the names of all those who’d signed up were published online. This is a pretty clear breach of privacy. People signed up to a service they assumed to be confidential and found that information that might be thought prejudicial, even shameful, was then made public. Indeed, there were the immensely sad cases of suicides apparently linked to the breach of privacy.

The Acemoglu & co externality, however, does not apply here. Rather, to continue with this example, imagine the following. Say that, back when it was still going, Ashley Madison develops a marketing campaign and analyses its existing users to identify the characteristics of people likely to sign up. It discovers that middle-aged men who read certain Reddit groups are good targets. Now, I go to one of these Reddit groups, and Reddit has identified me as a middle-aged man. So Ashley Madison infers that I’m a good lead and advertises to me. In this case, their current user base, in the terminology of Acemoglu & co, has leaked information that allows an inference to be made about me.

Has my privacy been violated because the inference has been made? Certainly not in the way that the privacy of actual users was violated by the hack. One can plausibly lead to suicide, the other does not. There is an important sense in which the most valued type of privacy – the information whose distribution I most want to control – concerns facts about me, not inferences about me. “You are an adulterer” is very different from “you have the profile of an adulterer”. In Orwell’s 1984, what’s really chilling is that the all-seeing two-way telescreen is inescapable. And remember the scene when the senior Party official is shown to have that ultimate of luxuries – the ability to turn off the surveillance for 30 minutes?

So is there anything creepy about having our characteristics inferred? In other words, Acemoglu & co rely for the applicability of their results on the notion that there are people who care a lot about being the object of an inference. Is this plausible?

Acemoglu & co offer this example to motivate their concept:

“Some of the issues we emphasize are highlighted by the Cambridge Analytica scandal. The company acquired the private information of millions of individuals from data shared by 270,000 Facebook users who voluntarily downloaded an app for mapping their personality traits, called “This is your digital life”. The app accessed users’ news feed, timeline, posts and messages, and revealed information about other Facebook users. Cambridge Analytica was ultimately able to infer valuable information about more than 50 million Facebook users, which it deployed for designing personalized political messages and advertising in the Brexit referendum and 2016 US presidential election.” 

The Facebook ads feature that reflects the Acemoglu & co externality is the creation of “lookalike audiences” – you give Facebook a list of users (a “custom audience”) and ask Facebook to create a much larger target set of similar users. That is the heart of Facebook’s micro-targeting capabilities.

Now, I certainly agree that the large-scale political manipulation that micro-targeting has allowed has been very bad for democratic politics. But I think that this might be a question of electoral law and how we organise democracy rather than a more general question about the creepiness of inference.

So take the example away from the charged area of democracy and consider it for “unproblematic” goods. In the old days, I would buy a computer magazine in the newsagent, and that would be stuffed full of computer ads. This was micro-targeted advertising, where the inference was that if I bought that sort of magazine, I’d be a good lead. I claim that this sort of inference is not creepy, and that it is analogous to almost all the inference that the online ad industry performs.

There are more problematic goods. Say I’m a recovering gambling addict. I really want to resist temptation. But casual inference by Facebook will put gambling ads in front of me. General advertising law recognises that there are problematic categories of goods – especially addictive ones – and that these need regulation. And micro-targeting probably means that these regulations need to be updated. Again, I think these are special cases, rather than a generalised creepiness around inference.

Judith DeCew provides a very useful analysis of privacy that can help to illuminate what the harm related to the Acemoglu & co externality might be. She describes two broad notions of privacy, which she calls “tort” and “constitutional”. Tort privacy arises where information about you can be used against you – kompromat is the obvious extreme case. This sort of privacy leads us to want to exercise some control over who receives information about us and for what uses. The second she characterises as being able to make certain decisions “without being unduly influenced” by any particular pressures – this is the sense of privacy as a necessary condition to liberty.

The Acemoglu & co externality could be implicated in infringements of tort privacy – for example, I might be identified as part of a class that can be charged a higher price by a price discriminator because of others’ data leaks. The externality is certainly a necessary condition for the harm to arise; but I would still deny that the inference actually constitutes the harm. The problem is the dominance that allows the price discrimination.

Why does the distinction matter? Because inference can produce good outcomes, and we can think of good ways to solve the problem of exploitation through personalisation that do not restrict access to inference (which is one of the solutions that Acemoglu & co propose).

The Acemoglu & co externality could also be implicated in infringements of constitutional privacy – perhaps the best example is the voting one. I want to decide who to vote for in a considered way, not pushed this way or that by a process I do not understand. But guarding against this – cultivating my autonomy, as it were – is a complicated matter. Being treated as the “average voter” by mass media is not a magic, neutral point – that pushes me around just as much as being micro-targeted might do. And so is the harm the inference, or is the harm the “undue influence”? Regulation – and habits – to avoid the latter are complex and, it seems to me, are the real challenge.

In both types of privacy infringement, I worry that the Acemoglu & co externality diverts us from the hard problems. Being categorised by inference can harm me, but it can also be very beneficial. And living in mass society would be impossible without it. I think that we have to get under the surface of the harms caused by inference to design proper regulatory interventions, rather than rely on too broad a notion of privacy externalities. Relying on such a notion risks excluding all the good we can do with the new inferential power in large-scale data trails.