Facebook’s Threat to the NYU Ad Observatory is an Attack on Ethical Data Collection
Late last week Facebook sent a legal threat to the NYU Ad Observatory, a research project that collects and studies political ads on Facebook. The timing of this threat could not be worse. The Ad Observatory is one of the best sources available for understanding how political advertisements are being deployed on social media. But the threat is also a threat to a critical form of ethical data collection, and an attempt by one of the most powerful and controversial websites in the world to cut off scrutiny of its practices.
Facebook calls what the Ad Observatory does “scraping,” but that’s not what this is. The data source, the “Ad Observer,” is a browser plugin. Researchers often use plugins like these to collect information for data science and algorithmic accountability projects. It isn’t a script serially visiting websites on its own, independent of a human at a browser. It is a small piece of software, installed by the data subject, that extracts data from the local copy of a webpage that is inherently made when a person loads a website onto a computer.
All data collection techniques are capable of good and bad uses, but collecting data through a browser plugin has qualities that make it preferable to scraping. A plugin more naturally gives an opportunity for data subject consent, or revocation of consent. You can, as the Ad Observer does, explain to the data subject what you plan to do before you do it, and give the data subject an unambiguous way to manifest their consent. As the Clearview AI case is making painfully obvious, scraping data off of sites is rarely done with this level of informed user choice.
On the technical side, collecting data in this way places no greater load on the site than a normal browser does. It doesn’t get access at any deeper layer. The research project is not going to crash a site by accidentally overloading it with requests. It simply works off the data the user already gets. In the parlance of cases interpreting the Computer Fraud and Abuse Act, this tool does not change the level of access that a user gets to Facebook; it enables a subsequent use of data that Facebook is already allowing them to access.
Of course, like scraping, the technique is capable of abuse. It’s a program running on a data subject’s computer, so it inherently lets you do a lot of things. One could use a browser plugin to try to gather personal information about the data subject or their social circles, as some commercial plugins have done. That is likely why Facebook is risking further PR backlash by going after these researchers. As Alex Stamos initially noted and Mike Masnick has elaborated on, Facebook may feel pressured to argue that it is compelled to try to shut this down under its recently revised FTC consent decree, which came about after the Cambridge Analytica scandal, when a researcher lied to Facebook and exfiltrated massive amounts of user data.
But this is not Cambridge Analytica and most researchers are not Aleksandr Kogan. When designed and deployed as it is here, a tool like the Ad Observatory is a sound and ethical way for researchers to shed light on critically important social activity. The professors by all accounts are professionals who follow ethical rules on human subjects research and are mindful about their methods, and Facebook knows this.
And perhaps most importantly for algorithmic accountability, and unlike most company-sponsored research, researchers using tools like these are not constrained or beholden to the platform. They act independently because their access is permissionless. The platform is in no position to exert influence on the research or silence any of the results. Facebook may defend its actions on the grounds of user privacy, but the real threat is in the surrender of control of scrutiny.
This technique of data collection is becoming more common, perhaps in part due to how well one can map good research ethics onto these tools. The students in my law clinic have advised numerous research projects that have adopted this technique. A best practice for designing a research plugin would include making the source code available for inspection, and even without it there are numerous ways to verify what data these plugins collect.
Facebook appears to assert the legal right to stop this research because it is a violation of its terms of service, which is effectively a breach of contract claim. It is important to note, though, that Facebook also uses more serious laws like the Computer Fraud and Abuse Act to stop data exfiltration on its site. It continues to do so even as courts trend away from applying the CFAA to public websites, in part thanks to a federal appellate court case from 2016 that said Facebook is not a “public” website because users log in to see its contents. This gives Facebook a power that is denied to other major platforms like Twitter, Google, or LinkedIn. A CFAA claim in this case would be especially dubious, as the plugin merely works off of the local copy of a page that a Facebook user has already downloaded.
There are also reasons to refuse to enforce this contract. One court in New York recently noted that non-negotiated contracts that attempt to control copying of data off of websites should be preempted by federal copyright law, a law that allows many forms of copying to balance the rights of authors with the copying needed to create new expressions. As a matter of consumer protection, courts are empowered to declare contract terms like these unconscionable and thus unenforceable, especially when, as here, they are asserted to stop scrutiny of the company itself. This is in essence the concern that led Congress to enact the Consumer Review Fairness Act in 2016, which prohibits companies from restricting via contract a consumer’s ability to write a negative assessment or review of the company’s services.
Facebook should not be allowed to stop this critical examination of its website, nor should we assume that the actions of the NYU Ad Observatory are in any way nefarious or inappropriate. The technique they have chosen effectively balances data subject autonomy and the public interest in examining how Facebook places political advertisements. The law should defend it as an acceptable research technique, especially when it provides us a rare insight into how political advertisements are influencing voters in the most important election of our lifetime.