Bad Data “For Good”: How Data Brokers Try to Hide Behind Academic Research

3 months 2 weeks ago

When data broker SafeGraph got caught selling location information on Planned Parenthood visitors, it had a public relations trick up its sleeve. After the company agreed to remove family planning center data from its platforms in response to public outcry, CEO Auren Hoffman tried to flip the narrative: he claimed that his company’s harvesting and sharing of sensitive data was, in fact, an engine for beneficial research on abortion access. He even argued that SafeGraph’s post-scandal removal of the clinic data was the real problem: “Once we decided to take it down, we had hundreds of researchers complain to us about…taking that data away from them.” Of course, when pressed, Hoffman could not name any individual researchers or institutions.

SafeGraph is not alone among location data brokers in trying to “research wash” its privacy-invasive business model and data through academic work. Other shady actors like Veraset, Cuebiq, Spectus, and X-Mode also operate so-called “data for good” programs with academics, and have seized on the pandemic to expand them. These data brokers provide location data to academic researchers across disciplines, with resulting publications appearing in peer-reviewed venues as prestigious as Nature and the Proceedings of the National Academy of Sciences. These companies’ data is so widely used in human mobility research (from epidemic forecasting and emergency response to urban planning and business development) that the literature has progressed to meta-studies comparing, for example, Spectus, X-Mode, and Veraset datasets.

Data brokers variously claim to be bringing “transparency” to tech or “democratizing access to data.” But these data sharing programs are nothing more than data brokers’ attempts to control the narrative around their unpopular and non-consensual business practices. Critical academic research must not become reliant on profit-driven data pipelines that endanger the safety, privacy, and economic opportunities of millions of people without any meaningful consent. 

Data Brokers Do Not Provide Opt-In, Anonymous Data

Location data brokers do not come close to meeting human subjects research standards. This starts with the fact that meaningful opt-in consent is consistently missing from their business practices. In fact, Google concluded that SafeGraph’s practices were so out of line that it banned any apps using the company’s code from its Play Store, and both Apple and Google banned X-Mode from their respective app stores. 

Data brokers frequently argue that the data they collect is “opt-in” because a user has agreed to share it with an app—even though the overwhelming majority of users have no idea that it’s being sold on the side to data brokers who in turn sell to businesses, governments, and others. Technically, it is true that users have to opt in to sharing location data with, say, a weather app before it will give them localized forecasts. But no reasonable person believes that this constitutes blanket consent for the laundry list of data sharing, selling, and analysis that any number of shadowy third parties are conducting in the background. 

On top of being collected and shared without consent, the data feeding into data brokers’ products can easily be linked to identifiable people. The companies claim their data is anonymized, but there’s simply no such thing as anonymous location data. Information about where a person has been is itself enough to re-identify them: one widely cited study from 2013 found that researchers could uniquely characterize 50% of people using only two randomly chosen time and location data points. Data brokers today collect sensitive user data from a wide variety of sources, including hidden tracking in the background of mobile apps. While techniques vary and are often hidden behind layers of non-disclosure agreements (or NDAs), the resulting raw data they collect and process is based on sensitive, individual location traces.

Aggregating location data can sometimes preserve individual privacy, given appropriate parameters that take into account the number of people represented in the data set and its granularity. But no privacy-preserving aggregation protocols can justify the initial collection of location data from people without their voluntary, meaningful opt-in consent, especially when that location data is then exploited for profit and PR spin. 
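As a rough illustration of what “appropriate parameters” could mean in practice, the sketch below counts distinct devices per grid cell and suppresses any cell with too few of them. It is a minimal sketch under stated assumptions, not any data broker’s actual method: the function name `aggregate_counts`, the `cell_size_deg` granularity, and the `min_devices` threshold are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def aggregate_counts(pings, cell_size_deg=0.01, min_devices=10):
    """Count distinct devices per grid cell, dropping sparse cells.

    pings: iterable of (device_id, latitude, longitude) tuples.
    cell_size_deg: grid granularity in degrees (hypothetical parameter).
    min_devices: cells seen by fewer distinct devices are suppressed
        (hypothetical threshold).
    """
    devices_by_cell = defaultdict(set)
    for device_id, lat, lon in pings:
        # Snap each ping to a coarse grid cell; the raw coordinates
        # are not kept in the output.
        cell = (math.floor(lat / cell_size_deg),
                math.floor(lon / cell_size_deg))
        # A set ensures a device is counted once per cell,
        # however many pings it sent from there.
        devices_by_cell[cell].add(device_id)

    # Publish only cells that meet the minimum-device threshold.
    return {
        cell: len(devices)
        for cell, devices in devices_by_cell.items()
        if len(devices) >= min_devices
    }


if __name__ == "__main__":
    sample = [
        ("a", 37.7749, -122.4194),
        ("a", 37.7749, -122.4194),  # repeat ping from the same device
        ("b", 37.7741, -122.4199),
        ("c", 37.7745, -122.4191),
        ("d", 37.8044, -122.2712),  # lone device in its own cell
    ]
    print(aggregate_counts(sample, cell_size_deg=0.01, min_devices=2))
```

Both parameters matter: a finer grid or a lower threshold leaves fewer people in each published cell, which makes individuals easier to single out. A threshold like this also says nothing about how the underlying pings were collected in the first place.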

Data brokers’ products are notoriously easy to re-identify, especially when combined with other data sets. And combining datasets is exactly what some academic studies are doing. Published studies have combined data broker location datasets with Census data, real-time Google Maps traffic estimates, and local household surveys and state Department of Transportation data. While researchers appear to be simply building the most reliable and comprehensive possible datasets for their work, this kind of merging is also the first step someone would take if they wanted to re-identify the data. 


Data brokers are not good sources of information about data brokers, and researchers should be suspicious of any claims they make about the data they provide. As Cracked Labs researcher Wolfie Christl puts it, what data brokers have to offer is “potentially flawed, biased, untrustworthy, or even fraudulent.” 

Some researchers incorrectly describe the data they receive from data brokers. For example, one paper describes SafeGraph data as “anonymized human mobility data” or “foot traffic data from opt-in smartphone GPS tracking.” Another describes Spectus as providing “anonymous, privacy-compliant location data” with an “ironclad privacy framework.” Again, this location data is not opt-in, not anonymized, and not privacy-compliant.

Other researchers make internally contradictory claims about location data. One Nature paper characterizes Veraset’s location data as achieving the impossible feat of being both “fine-grained” and “anonymous.” This paper further states it used such specific data points as “anonymized device IDs” and “the timestamps, and precise geographical coordinates of dwelling points” where a device spends more than 5 minutes. Such fine-grained data cannot be anonymous. 

A Veraset Data Access Agreement obtained by EFF includes a Publicity Clause, giving Veraset control over how its partners may disclose Veraset’s involvement in publications. This includes Veraset’s prerogative to approve language or remain anonymous as the data source. While the Veraset Agreement we’ve seen was with a municipal government, its suggested language appears in multiple academic publications, which suggests a similar agreement may be in play with academics.

A similar pattern appears in papers using X-Mode data: some use nearly verbatim language to describe the company. They even claim its NDA is a good thing for privacy and security, stating: “All researchers processed and analyzed the data under a non-disclosure agreement and were obligated to not share data further and not to attempt to re-identify data.” But those same NDAs prevent academics, journalists, and others in civil society from understanding data brokers’ business practices, or identifying the web of data aggregators, ad tech exchanges, and mobile apps that their data stores are built on.

All of this should be a red flag for Institutional Review Boards, which review proposed human subjects research and need visibility into whether and how data brokers and their partners actually obtain consent from users. Likewise, academics themselves need to be able to confirm the integrity and provenance of the data on which their work relies.

From Insurance Against Bad Press to Accountable Transparency

Data sharing programs with academics are only the tip of the iceberg. To paper over the dangerous role they play in the online data ecosystem, data brokers forge relationships not only with academic institutions and researchers, but also with government authorities, journalists and reporters, and non-profit organizations. 

The question of how to balance data transparency with user privacy is not a new one, and it can’t be left to the Verasets and X-Modes of the world to answer. Academic data sharing programs will continue to function as disingenuous PR operations until companies are subjected to data privacy and transparency requirements. While SafeGraph claims its data could pave the way for impactful research in abortion access, the fact remains that the very same data puts actual abortion seekers, providers, and advocates in danger, especially in the wake of Dobbs. The sensitive data that location data brokers deal in should only be collected and used with specific, informed consent, and subjects must have the right to withdraw that consent at any time. No such consent currently exists.

We need comprehensive federal consumer data privacy legislation to enforce these standards, with a private right of action to empower ordinary people to bring their own lawsuits against data brokers who violate their privacy rights. Moreover, we must pull back the NDAs to allow research investigating these data brokers themselves: their business practices, their partners, how their data can be abused, and how to protect the people whom data brokers are putting in harm’s way.

Gennie Gebhart

General Monitoring is not the Answer to the Problem of Online Harms

3 months 2 weeks ago

Even if you think that online intermediaries should be more proactive in detecting, deprioritizing, or removing certain user speech, requiring intermediaries to review all content before publication (often called “general monitoring” or “upload filtering”) raises serious human rights concerns, both for freedom of expression and for privacy.

General monitoring is problematic both when it is directly required by law and when, though not required, it is effectively mandatory because the legal risks of not doing it are so great. Specifically, these indirect requirements incentivize platforms to proactively monitor user behaviors, filter and check user content, and remove or locally filter anything that is controversial, objectionable, or potentially illegal to avoid legal responsibility. This inevitably leads to over-censorship of online content as platforms seek to avoid liability for failing to act “reasonably” or to remove user content they “should have known” was harmful.

Whether directly mandated or strongly incentivized, general monitoring is bad for human rights and for users. 

  • As the scale of online content is so vast, general monitoring commonly uses automated decision-making tools that reflect the dataset’s biases and lead to harmful profiling.
  • These automated upload filters are prone to error, are notoriously inaccurate, and tend to overblock legally protected expressions.
  • Upload filters also contravene the foundational human rights principles of proportionality and necessity by subjecting users to automated and often arbitrary decision-making.
  • The active observation of all files uploaded by users has a chilling effect on freedom of speech and access to information by limiting the content users can post and engage with online.
  • A platform reviewing every user post also undermines users’ privacy rights by providing companies, and thus potentially government agencies, with abundant data about users. This is particularly threatening to anonymous speakers.
  • Pre-screening can lead to enforcement overreach, fishing expeditions (undue evidence exploration), and data retention.
  • General monitoring undermines the freedom to conduct business, adds compliance costs, and undermines alternative platform governance models.
  • Monitoring technologies are even less effective at small platforms, which don’t have the resources to develop sophisticated filter tools. General monitoring thus cements the gatekeeper role of a few powerful platforms and further marginalizes alternative platform governance models.

We have previously expressed concern about governments employing more aggressive and heavy-handed approaches to intermediary regulation, with policymakers across the globe calling on platforms to remove allegedly legal but ‘undesirable’ or ‘harmful’ content from their sites, while also expecting platforms to detect and remove illegal content. In doing so, states fail to protect fundamental freedom of expression rights and fall short of their obligations to ensure a free online environment with no undue restrictions on legal content, whilst also restricting the rights of users to share and receive impartial and unfiltered information. This has a chilling effect on the individual right to free speech wherein users change their behavior and abstain from communicating freely if they know they are being actively observed—leading to a pernicious culture of self-censorship.

In one of the more recent policy developments on intermediary liability, the European Union approved the Digital Services Act (DSA). The DSA rejects takedown deadlines that would have suppressed legal, valuable, and benign speech. EFF helped to ensure that the final language steered clear of intrusive filter obligations. By contrast, the draft UK Online Safety Bill raises serious concerns around freedom of expression by imposing a duty of care on online platforms to tackle illegal and otherwise harmful content and to minimize the presence of certain content types. Intrusive scanning of user content will be unavoidable if this bill becomes law.

So how do we protect user rights to privacy and free speech whilst also ensuring illegal content can be detected and removed? EFF and other NGOs have developed the Manila Principles, which emphasize that intermediaries shouldn’t be held liable for user speech unless the content in question has been fully adjudicated as illegal and a court has validly ordered its removal. It should be up to independent, impartial, and autonomous judicial authorities to determine that the material at issue is unlawful. Elevating courts to adjudicate content removal means liability is no longer based on the inaccurate and heavy-handed decisions of platforms. This would also ensure that takedown orders are limited to the specific piece of illegal content as decided by a court or similar authority. 

EFF has also previously urged that regulators ensure online intermediaries continue to benefit from exemptions on liability for third-party content, and any additional obligations must not curtail free expression and consumer innovation. To restrict content, these rules must be provided by laws; be precise, clear, and accessible; and must follow due process and respect the principle that independent judicial authorities should assess content and decide on its restriction. Decisively, intermediaries should not be held liable if they choose not to remove content based on a mere notification by users. 

Regulators must take more effective voluntary actions against harmful content and adopt moderation frameworks that are consistent with human rights to make the internet free and limit the power of government agencies in flagging and removing potentially illegal content.

Paige Collings

EFF & ACLU Brief: SFPD Violated Surveillance Law by Spying on Protests for Black Lives

3 months 3 weeks ago
Police used private network of 300 surveillance cameras to spy on George Floyd protests in 2020, plaintiffs tell appeals court.

SAN FRANCISCO–San Francisco police violated the city’s surveillance technology law by tapping into a private surveillance camera network to spy on demonstrators protesting the 2020 police murder of George Floyd, the Electronic Frontier Foundation (EFF) and American Civil Liberties Union Foundation of Northern California (ACLU) told a state appeals court in a brief filed Monday.

San Francisco’s ordinance requires police to get the Board of Supervisors’ permission before acquiring or borrowing surveillance technology. But the San Francisco Police Department (SFPD) obtained no such permission before officers monitored the Union Square Business Improvement District’s network of more than 300 cameras for eight days during the protests.

EFF and the ACLU represent three Black and Latinx activists who organized and participated in the protests and say the police’s illegal spying chills their willingness and ability to attend or organize future demonstrations.

Their brief asks the California Court of Appeal First Appellate District to overturn a San Francisco Superior Court judge’s ruling in the city’s favor. The lower court erroneously found that because SFPD had monitored a few business district cameras once before for a 24-hour period during the 2019 Pride Parade, it was fine for them–under a “grace period” subsection of the ordinance–to use the entire 300-camera network for eight days during the 2020 George Floyd protests without the Board’s permission.

“While thousands of people were peacefully protesting police abuses, the SFPD violated the law by unlawfully surveilling them,” said EFF Staff Attorney Saira Hussain. “The lower court’s erroneous interpretation of the ordinance would allow the SFPD to create a vast new spying program based on one prior use of a surveillance technology. This is exactly the kind of lack of transparency that the San Francisco supervisors were trying to prevent when they passed the ordinance.”

Since the initial lawsuit was filed, SFPD repeatedly has attempted to change the law that governs how police can access third-party cameras. Earlier this year, Mayor Breed introduced and then withdrew a ballot measure to gut the city’s surveillance technology ordinance. Now, supervisors are considering a separate proposal that would grant SFPD sweeping access to the thousands of private surveillance cameras located within the city. Opposition to the proposal has been strong, with hundreds of residents speaking out alongside a coalition of community and civil rights organizations.

“First the SFPD broke the law to spy on racial justice protesters. Now they’re trying to change the law to give themselves expansive new surveillance powers that endanger our rights and safety,” said Nicole Ozer, Technology and Civil Liberties Director at the ACLU of Northern California. “San Francisco is in an uproar because residents know that to protect activists, abortion seekers, and the city’s diverse communities, we need to pass even stronger privacy laws, not allow more dangerous surveillance.”

The case is Williams v. San Francisco, Appellate Case No. A165040.

For the appellate brief:

For more on this case: 

For more on police spying technology:

Contact: Saira Hussain, Staff Attorney
Josh Richman
EFF's Deeplinks Blog: Noteworthy news from around the internet