EFF to Department of Homeland Security: No Social Media Surveillance of Immigrants

2 months 1 week ago

EFF submitted comments to the Department of Homeland Security (DHS) and its subcomponent U.S. Citizenship and Immigration Services (USCIS), urging them to abandon a proposal to collect social media identifiers on forms for immigration benefits. This collection would mark yet a further expansion of the government’s efforts to subject immigrants to social media surveillance, invading their privacy and chilling their free speech and associational rights for fear of being denied key immigration benefits.

Specifically, the proposed rule would require applicants to disclose their social media identifiers on nine immigration forms, including applications for permanent residency and naturalization, impacting more than 3.5 million people annually. USCIS’s purported reason for this collection is to assist with identity verification, as well as vetting and national security screening, to comply with Executive Order 14161. USCIS separately announced that it would look for “antisemitic activity” on social media as grounds for denying immigration benefits, which appears to be related to the proposed rule, although it is not expressly included in it.

Additionally, a day after the proposed rule was published, Axios reported that the State Department, the Department of Justice, and DHS confirmed a joint effort called “Catch and Revoke” that uses AI tools to review student visa holders’ social media accounts for speech related to “pro-Hamas” sentiment or “antisemitic activity.”

If the proposed rule sounds familiar, it’s because this is not the first time the government has proposed the collection of social media identifiers to monitor noncitizens. In 2019, for example, the State Department implemented a policy requiring visa and visa waiver applicants to the United States to disclose the identifiers they used on some 20 social media platforms over the last five years—affecting over 14.7 million people annually. EFF joined a large contingent of civil and human rights organizations in objecting to that collection. That policy is now the subject of ongoing litigation in Doc Society v. Blinken, a case brought by two documentary film organizations, who argue that the rule affects the expressive and associational rights of their members by impeding their ability to collaborate and engage with filmmakers around the world. EFF filed two amicus briefs in that case.

What distinguishes this proposed rule from the State Department’s existing program is that most, if not all, of the noncitizens who would be affected currently reside lawfully in the United States, allowing them to benefit from constitutional protections.

In our comments, we explained that surveillance of even public-facing social media can implicate privacy interests by aggregating a wealth of information about both an applicant for immigration benefits and the people in their networks, including U.S. citizens. This is because of the quantity and quality of information available on social media, and because of its inherently interconnected nature.

We also argued that the proposed rule appears to allow for the collection and consideration of First Amendment-protected speech, including core political speech and anonymous and pseudonymous speech. This inevitably leads to a chilling effect because applicants for immigration benefits will have to choose between potentially forgoing key benefits and self-censoring to avoid government scrutiny. To help ensure that a naturalization application is not rejected, for example, an applicant may avoid speaking out on social media about American foreign policy or expressing views on other political topics that the federal government may consider controversial—even when other Americans are free to do so.

We urge DHS and USCIS to abandon this dangerous proposal.

Saira Hussain

EFF to Court: Young People Have First Amendment Rights

2 months 1 week ago

Utah cannot stifle young people’s First Amendment rights to use social media to speak about politics, create art, discuss religion, or to hear from other users discussing those topics, EFF argued in a brief filed this week.

EFF filed the brief in NetChoice v. Brown, a constitutional challenge to the Utah Minor Protection in Social Media Act. The law prohibits young people from speaking to anyone on social media outside of the users with whom they are connected or those users’ connections. It also requires social media services to make young people’s accounts invisible to anyone outside of that same subgroup of users. The law requires parents to consent before minors can change those default restrictions.

To implement these restrictions, the law requires a social media service to verify every user’s age so that it knows whether to apply those speech-restricting settings.

The law therefore burdens the First Amendment rights of both young people and adults, the friend-of-the-court brief argued. The ACLU, Freedom to Read Foundation, LGBT Technology Institute, TechFreedom, and Woodhull Freedom Foundation joined EFF on the brief.

Utah, like many states across the country, has sought to significantly restrict young people’s ability to use social media. But “Minors enjoy the same First Amendment right as adults to access and engage in protected speech on social media,” the brief argues. As the brief details, minors use social media to express political opinions, create art, practice religion, and find community.

Utah cannot impose such a severe restriction on minors’ ability to speak and to hear from others on social media without violating the First Amendment. “Utah has effectively blocked minors from being able to speak to their communities and the larger world, frustrating the full exercise of their First Amendment rights,” the brief argues.

Moreover, the law “also violates the First Amendment rights of all social media users—minors and adults alike—because it requires every user to prove their age, and compromise their anonymity and privacy, before using social media.”

Requiring internet users to provide their ID or other proof of their age could block people from accessing lawful speech if they don’t have the right form of ID, the brief argues. And requiring users to identify themselves infringes on people’s right to be anonymous online. That may deter people from joining certain social media services or speaking on certain topics, as people often rely on anonymity to avoid retaliation for their speech.

Finally, requiring users to provide sensitive personal information increases their risk of future privacy and security invasions, the brief argues.

Aaron Mackey

[Constitution Rally] 38,000 Raise Their Voices = Eiichi Furukawa

2 months 1 week ago
On Constitution Memorial Day, the previous day's rain had cleared and Tokyo enjoyed fine May weather. Citizens' groups working to protect the Constitution once again held the Constitution Rally at Ariake Disaster Prevention Park, and citizens carrying flags and placards filled the green lawn. At the rally, Terumi Tanaka (pictured), a representative of Nihon Hidankyo (the Japan Confederation of A- and H-Bomb Sufferers Organizations), which received the Nobel Peace Prize last year, took the stage and said, "I believe Hidankyo received the award as an expression of the hope that, with the danger of nuclear war rising around the world in recent years, we will fulfill that role once more." He continued, "I hope all of you will carry on the efforts we have made so far, toward a world with neither nuclear weapons nor war..
JCJ

Keeping the Web Up Under the Weight of AI Crawlers

2 months 1 week ago

If you run a site on the open web, chances are you've noticed a big increase in traffic over the past few months, whether or not your site has been getting more viewers, and you're not alone. Operators everywhere have observed a drastic increase in automated traffic—bots—and in most cases attribute much or all of this new traffic to AI companies.

Background

AI—in particular, Large Language Models (LLMs) and generative AI (genAI)—relies on compiling as much information as possible from relevant sources (e.g., "texts written in English" or "photographs") in order to build a functional and persuasive model that users will later interact with. While AI companies in part distinguish themselves by what data their models are trained on, possibly the greatest source of information—one freely available to all of us—is the open web.

To gather up all that data, companies and researchers use automated programs called scrapers (sometimes referred to by the more general term "bots") to "crawl" the links between webpages, saving the kinds of information they're tasked with collecting as they go. Scrapers are tools with a long, and often beneficial, history: services like search engines, the Internet Archive, and all kinds of scientific research rely on them.

When scrapers are not deployed thoughtfully, however, they can contribute to higher hosting costs, lower performance, and even site outages, particularly when site operators are facing many of them at the same time. In the long run, this may lead some sites to shut down rather than continue bearing the brunt of it.

For-profit AI companies must ensure they do not poison the well of the open web they rely on in a short-sighted rush for training data.

Bots: Read the Room

There are existing best practices that those who use scrapers should follow. When bots and their operators ignore these guideposts, it signals to site operators, sometimes explicitly, that they can or should cut off the bots' access or impede their performance, and in the worst case the traffic may take a site down for all users. Some companies appear to follow these practices most of the time, but we see increasing reports and evidence of new bots that don't.

First, where possible, scrapers should follow instructions given in a site's robots.txt file, whether those are to back off to a certain crawling rate, exclude certain paths, or not to crawl the site at all.
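To make this concrete, here is a minimal sketch, using only Python's standard library, of a crawler that consults robots.txt before fetching a page and honors a declared crawl delay. The bot name, contact URL, and target site are hypothetical placeholders.

```python
# Minimal sketch: check robots.txt before crawling and honor any Crawl-delay.
# "ExampleBot" and example.com are placeholders, not a real bot or site.
import time
import urllib.robotparser
import urllib.request

USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"

robots = urllib.robotparser.RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()  # fetch and parse the site's robots.txt

url = "https://example.com/some/page"
if robots.can_fetch(USER_AGENT, url):
    # Respect the site's requested crawl rate, defaulting to a modest pause.
    time.sleep(robots.crawl_delay(USER_AGENT) or 1.0)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        body = response.read()
else:
    # The operator has asked bots like this one not to crawl this path.
    body = None
```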

Second, bots should send their requests with a clearly labeled User Agent string which indicates their operator, their purpose, and a means of contact.
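As an illustration, a well-labeled User Agent might look something like the following sketch; the bot name, organization, and contact details are hypothetical.

```python
# Hypothetical example of a descriptive User Agent string: it names the
# operator, states the purpose, and gives two ways to get in touch.
HEADERS = {
    "User-Agent": (
        "ExampleResearchBot/2.1 "
        "(corpus refresh for example-lab.org; "
        "+https://example-lab.org/crawler; mailto:crawler-admin@example-lab.org)"
    )
}
# This dictionary can be passed as request headers, e.g. via
# urllib.request.Request(url, headers=HEADERS).
```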

Third, those running scrapers should provide a process for site operators to request back-offs, rate caps, and exclusions, and to report problematic behavior, using the contact information or response forms linked from the User Agent string.

Mitigations for Site Operators

Of course, if you're running a website dealing with a flood of crawling traffic, waiting for those bots to change their behavior for the better might not be realistic. Here are a few suggested, if imperfect, mitigations based in part on our own sometimes frustrating experiences.

First, use a caching layer. In most cases a Content Delivery Network (CDN) or an "edge platform" (essentially a newer iteration of a CDN) can provide this for you, and some services offer a free tier for non-commercial users. There are also a number of great projects if you prefer to self-host. Some of the tools we've used for caching include varnish, memcached, and redis.
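As one illustration of the application-level approach, here is a minimal sketch of caching rendered pages in redis with the redis-py client; render_page() is a hypothetical stand-in for an expensive, database-backed render, and the five-minute lifetime is arbitrary.

```python
# Minimal sketch: serve repeat requests (human or bot) from a redis cache
# instead of re-rendering the page from the database every time.
import redis

cache = redis.Redis(host="localhost", port=6379)

def get_page(slug):
    key = f"page:{slug}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no database work at all
    html = render_page(slug)  # hypothetical expensive, database-backed render
    cache.setex(key, 300, html)  # keep the rendered page for five minutes
    return html
```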

Second, convert to static content to prevent resource-intensive database reads. In some cases this may reduce the need for caching.
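A minimal sketch of that idea, assuming hypothetical load_posts() and render_post() helpers standing in for your data layer and templates: pages are rendered once at publish time and written to disk, so crawler traffic never reaches the database.

```python
# Minimal sketch: pre-render every page to a static file at publish time.
from pathlib import Path

OUTPUT_DIR = Path("public")

def publish_all():
    OUTPUT_DIR.mkdir(exist_ok=True)
    for post in load_posts():      # hypothetical: read posts from the database once
        html = render_post(post)   # hypothetical: render the post template to HTML
        (OUTPUT_DIR / f"{post.slug}.html").write_text(html, encoding="utf-8")
    # The resulting files can be served directly by a web server or CDN,
    # with no database reads per request.
```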

Third, use targeted rate limiting to slow down bots without taking your whole site down. But know this can get difficult when scrapers try to disguise themselves with misleading User Agent strings or by spreading a fleet of crawlers out across many IP addresses.
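For example, a token-bucket limiter keyed on client IP, like the sketch below, lets normal readers through while slowing a single aggressive crawler; the rate and burst values are illustrative, and as noted, IP-based keys are easy for distributed crawler fleets to evade.

```python
# Minimal sketch: per-IP token-bucket rate limiting, as might run in
# application middleware or a reverse proxy. Thresholds are illustrative.
import time
from collections import defaultdict

RATE = 1.0    # tokens refilled per second
BURST = 10.0  # maximum bucket size (allows short bursts)

_buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow_request(client_ip):
    bucket = _buckets[client_ip]
    now = time.monotonic()
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False  # caller should respond with HTTP 429 and a Retry-After header
```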

Other mitigations such as client-side validation (e.g. CAPTCHAs or proof-of-work) and fingerprinting carry privacy and usability trade-offs, and we warn against deploying them without careful forethought.

Where Do We Go From Here?

To reiterate, whatever one's opinion of these particular AI tools, scraping itself is not the problem. Automated access is a fundamental technique of archivists, computer scientists, and everyday users that we hope is here to stay—as long as it can be done non-destructively. However, we realize that not all implementers will follow our suggestions for bots above, and that our mitigations are both technically advanced and incomplete.

Because we see so many bots operating for the same purpose at the same time, it seems there's an opportunity here to provide these automated data consumers with tailored data providers, removing the need for every AI company to scrape every website, seemingly, every day.

And on the operators' end, we hope to see more web-hosting and framework technology that is built with an awareness of these issues from day one, perhaps building in responses like just-in-time static content generation or dedicated endpoints for crawlers.

Starchy Grant