Impotence
Content moderation at scale and the limits of AI against an adversary that adapts. The flip side of the hype: what the technology is genuinely bad at.
“There are people in Russia whose job it is to try to exploit our systems and other internet systems. So this is an arms race.” — Mark Zuckerberg, testifying before the U.S. Senate, April 10, 2018
Mark Zuckerberg put on a suit. People who knew him remarked on it, because he did not wear suits. For years his uniform had been a gray T-shirt, a small rebellion against the idea that running one of the largest companies on earth required a costume. But the man who walked into Hart Senate Office Building room 216 on the afternoon of April 10, 2018, flanked by aides and met by a wall of photographers crouched on the floor in front of his table, wore a dark suit and a blue tie, and he sat on a black leather cushion that his staff had brought so that the cameras would not catch the founder of Facebook dwarfed by the senators’ dais.
Forty-four senators sat in a horseshoe above him. It was a joint hearing, the Judiciary and Commerce committees together, an unusual arrangement that meant nearly half the Senate got a turn. The occasion was Cambridge Analytica, the political consultancy that had harvested the Facebook data of tens of millions of users without their knowledge, but the data scandal was only the doorway. Behind it stood the larger thing the senators wanted to talk about, the thing that had been gnawing at Washington since November 2016: Russian operatives had used Facebook to interfere in an American presidential election, and Facebook had not noticed until it was over.
Over the next five hours, and again the following day before a House committee, Zuckerberg was asked some version of the same question dozens of ways. How would Facebook stop hate speech? How would it stop foreign propaganda? How would it stop the next Cambridge Analytica, the next fake account, the next coordinated lie dressed up as a grassroots movement? And to a remarkable degree, his answer was the same. Artificial intelligence would handle it. Not today, but soon. He used a specific horizon, again and again, with the confidence of a man reading off an internal roadmap: five to ten years. Over a five-to-ten-year period, he told the senators, Facebook would have AI tools that could get at the nuances of language well enough to identify hate speech automatically, at scale, in the moment it was posted.
It was a striking promise to make under oath, and it landed because it fit the moment. This was 2018. AlphaGo had beaten Lee Sedol two years earlier. Neural networks were reading checks, captioning photographs, translating languages, recognizing faces. The public had been told, by these same companies, that AI was on the verge of doing everything. So when Zuckerberg said AI would solve the hardest problem his company faced, the senators mostly nodded. A few pressed. Most did not. The promise sounded like the future arriving on schedule.
It was not. It was, in fact, almost exactly backward. At the precise moment the industry was selling artificial intelligence as something close to omniscient, the technology was failing, badly and publicly, at the one open-ended task the platforms needed most. And the reason it was failing said something the hype had carefully avoided: the machines were superb at problems that held still, and helpless against an opponent that moved.
To understand why, it helps to be specific about what content moderation actually is, because the phrase makes it sound clerical, a matter of sorting. It is not. A classifier is a pattern-matcher. You show it a few million examples of a thing, it learns the statistical shape of that thing, and then it can flag new instances that match the shape. This works extraordinarily well when the thing has a stable shape. Nudity has a stable shape; skin and anatomy look like skin and anatomy whether the photo was taken in 2015 or 2025. Known terrorist propaganda has a stable shape, because there is a finite library of beheading videos and ISIS recruitment imagery, and once you have hashed them you can catch every re-upload. By 2018 Facebook could report that its automated systems flagged the overwhelming majority of the terrorist content and graphic nudity it removed before any human reported it, often well above ninety percent. Those were real achievements, and they were the numbers Zuckerberg liked to cite.
Hate speech had no stable shape. Hate speech is a category of meaning rather than a category of image, and meaning lives in context. The same string of words can be an attack, a quotation of an attack, a joke about an attack, a victim describing an attack, or an activist condemning one. A slur reclaimed by the community it once targeted is not the same utterance as the slur hurled by an outsider, and no amount of matching on the words alone can tell the two apart. Sarcasm inverts meaning entirely. Dog whistles encode it. A classifier trained on a million labeled examples of hatred learns the shape of last year’s hatred, which is exactly the shape this year’s haters have already learned to avoid. Zuckerberg admitted as much, in the careful language he used when he was being honest rather than reassuring: hate speech, he told the committee, was linguistically nuanced, and it was much harder for an AI system to understand than something like nudity.
Then there was the part the demos never showed, the part that turned a hard problem into something closer to an impossible one. Nudity does not study Facebook’s detectors and adapt. A human adversary does.
This was the asymmetry at the center of the whole enterprise, and Zuckerberg, to his credit, named it. Facebook’s situation was not like teaching a network to recognize cats, where the cats are indifferent to whether they are recognized. It was a contest against people who were paid to win it. He described it, more than once, as an arms race. There were people, he said, whose full-time job was to find the weaknesses in Facebook’s systems and exploit them, and as soon as Facebook closed one gap they would find another. Every defense that worked taught the attacker exactly what to change. A classifier that learned to flag a particular Russian propaganda template was, the moment it deployed, a free tutorial for the next template. The machine learned from history. The adversary learned from the machine. In that race the adversary had the structural advantage, because the defender had to be right about everything and the attacker had to be right about one thing.
The adversary in question was not hypothetical. Two months before Zuckerberg’s testimony, a federal grand jury had indicted thirteen Russians and three Russian organizations, the lead defendant being a St. Petersburg operation called the Internet Research Agency, a building full of paid workers who manufactured American political conflict for a living. They did not hack anything. They did something the platforms had no defense against: they pretended to be Americans. They created Facebook pages for fictitious activist groups on both the left and the right, “Blacktivist” and “Heart of Texas” and “Being Patriotic,” grew them to hundreds of thousands of followers with content indistinguishable from the genuine article, and then used them to stage real-world events, to amplify division, to push voters toward staying home. They bought some ads in rubles, which a competent system should have noticed, and spread the rest of their content organically, which no system could. Facebook’s own count, delivered to Congress in the fall of 2017 after months of the company insisting the Russian footprint was trivial, was that content from these accounts had reached an estimated 126 million Americans. That was not a number any classifier produced. It was a number the company arrived at, slowly and reluctantly, after the fact.
The 126 million was the measure of the failure, and the failure was instructive precisely because the IRA’s content had not, for the most part, broken any rule a machine could check. A meme of Jesus arm-wrestling Satan over the 2016 election is not hate speech. A page celebrating Texas secession is not terrorist propaganda. The posts looked like the ordinary, sincere, sometimes ugly speech of real American partisans, because they were engineered to. The thing that made them dangerous was not in the content at all. It was in the provenance, the coordination, the fact that a single building in Russia was operating thousands of supposedly independent American voices in concert. And a system that judges each post on its own merits is constitutionally blind to coordination. The harm was a property of the network, not the node, and Facebook’s tools looked at nodes.
The same blindness defined the more mundane war that never made the news, the one against the fake accounts and the spammers, which Facebook fought every day at a volume that dwarfed the Russian operation. The company would eventually disclose that it was disabling well over a billion fake accounts per quarter, most of them killed by automated systems within minutes of creation. That sounded like victory until you noticed what it implied: a billion attempts every quarter meant an opponent that never stopped probing, that registered accounts in bulk precisely to learn which patterns of registration tripped the alarm and which slipped through. The detectors and the account farms were locked in a loop, each retraining the other. Every signal Facebook learned to trust, an IP range, a browser fingerprint, a burst of friend requests, became a signal the spammers learned to fake, until the honest behavior and the fraudulent behavior converged and the machine could no longer tell them apart without, once again, a human looking. The numbers Facebook published as proof of strength were also, read the other way, a confession that the problem was permanent. You do not report blocking a billion of something that is going away.
So the company did what the demos never mentioned. It hired people. Tens of thousands of them.
The “AI will solve it” story always carried, just beneath the surface, a quiet dependence on human labor that the marketing was designed to obscure. In late 2017, even as Zuckerberg was describing the coming age of automated moderation, Facebook announced it would roughly double its safety and security workforce to twenty thousand people, a large share of them content reviewers. By 2018 and 2019 the company’s moderation operation, mostly outsourced to contractors at firms like Cognizant and Accenture in places like Phoenix, Tampa, Manila, and Dublin, had grown to something on the order of fifteen thousand reviewers. These were the people actually holding the line. They worked from quotas and decision trees, granted seconds per item, paid a fraction of an engineer’s salary, and they spent their shifts looking at the worst things human beings post: the beheadings, the child abuse, the suicides, the animal torture, the torrent of cruelty that the celebrated automated filters had failed to catch or had punted upward as too ambiguous to call. Reporting that emerged over 2018 and 2019, most prominently a series of investigations into Cognizant’s American sites, described moderators developing symptoms of post-traumatic stress, suffering panic attacks, leaning on dark humor to cope and, in some cases, coming to believe the conspiracy theories they were paid to review. The AI that was supposed to spare humanity this work instead defined the job: the machine handled the easy, stable cases, and shoveled everything genuinely hard onto people.
This was the shape of the thing the hype had inverted. The public had been told the humans were a temporary bridge to full automation. In practice the automation was the bridge, a crude first pass, and the humans were the system, the part that could actually read context, detect sarcasm, recognize that a slur was a quotation, understand that this account and that account and forty others were the same operation wearing different masks. The labor did not shrink as the AI improved. It grew, because the platform grew, and because every new language and new country and new genre of abuse opened a frontier no classifier had ever seen.
The languages mattered more than almost anyone in Washington understood. Facebook’s automated tools worked least badly in English, where the training data was richest and the engineers most numerous. In much of the rest of the world they barely worked at all. In Myanmar, where Facebook had become the de facto internet for tens of millions of people, the platform’s role in the 2017 campaign of violence against the Rohingya, a campaign the United Nations would describe as bearing the hallmarks of genocide, became a case study in exactly this gap. Anti-Rohingya hate speech and incitement spread for years across Burmese-language Facebook, and the company had, by its own later admission, almost no automated detection in Burmese and a vanishingly small number of moderators who could read the language. A human-rights assessment Facebook itself commissioned, published in 2018, concluded the company had not done enough. The AI that was meant to police the world’s speech could, in practice, barely read most of it. The long tail of human language was not a corner case. It was most of the world.
Inside Facebook, not everyone told the story the way Zuckerberg told it to the Senate. The company’s chief technology officer, Mike Schroepfer, the engineer responsible for actually building the systems the CEO kept promising, spoke about them in a different register. Schroepfer was a true believer in AI; he had helped pour the company’s resources into it. But when he talked about the hardest problems, the ones Zuckerberg waved at with the five-to-ten-year horizon, his honesty kept breaking through. In a 2019 profile he allowed that there were problems his field might never fully solve, that hate speech and misinformation were not the kind of thing you could declare finished, that the work was open-ended in a way the public framing refused to admit. At one point, talking about the scale of the harm and the limits of his tools, he grew emotional enough to have to stop. It was a strange and revealing moment: the man building the solution, in tears, conceding that the solution as advertised did not exist.
The gap between Schroepfer’s candor and Zuckerberg’s confidence was not really a disagreement between two men. It was the gap that ran through the entire field in these years, the distance between what the technology could demonstrate in a controlled setting and what it could do against the world as it actually was. A neural network that scored 95 percent on a fixed benchmark looked like a machine that had solved the problem. But a benchmark is a frozen photograph of a problem, and moderation was not frozen. It was a living adversary with a research budget. The benchmark did not fight back. The Internet Research Agency did.
There was a deeper lesson here about what these systems were and were not, and the people who had built them understood it better than the marketing departments who sold them. Deep learning was, at its core, a spectacular technology for interpolation, for recognizing variations on patterns it had already seen many times. Within the distribution of its training data it was often superhuman. Step outside that distribution, into the novel, the adversarial, the deliberately unfamiliar, and it had no footing, because it had never possessed understanding in the first place, only a very high-resolution memory of the past. Cats do not invent new ways of being cats to fool the cat detector. Checks do not rewrite their own digits. But a propagandist studies the filter, a spammer mutates the message, a harasser invents the slang that this month’s classifier has never read. The thing the machines were worst at was the thing that learned.
That asymmetry would not be confined to content moderation. It was a property of the technology, and it would surface again and again as these systems were pushed out of the lab and into contact with people who had reasons to defeat them. But in 2018 it had a face, and the face was Mark Zuckerberg’s, sitting on a borrowed cushion in a borrowed suit, promising a room full of senators that a problem his own chief technologist could not promise to solve would be solved by a machine, in five to ten years, you have his word on it.
The five-to-ten-year horizon is worth holding onto, because the clock can be checked. By the time it ran out, the chatbots had arrived and the conversation had moved on, and the question of whether AI could finally moderate human speech at planetary scale had quietly disappeared from the headlines, unanswered, the way unsolved problems do when a louder one arrives. The platforms still leaned on the humans. The adversaries still adapted. The machines still held the easy cases and surrendered the hard ones.
What the senators never quite asked, and what Zuckerberg never quite said, was the more fundamental question underneath all of it. The issue ran deeper than whether the machines could keep up with the haters and the propagandists. It was whether they could be said to understand anything at all, or whether they were only, however brilliantly, matching patterns they had been shown. It was a question the field had been arguing about quietly for years, in language far drier than a Senate hearing, among the people who had built the systems and the dissenters who kept insisting, to nearly everyone’s annoyance, that the emperor had less on than the demos suggested. That argument, over what these machines were and what they were not, was the one the field had been postponing, and it was about to be had in the open.