Neuron Makers
Part VI · Chapter 26

The Schism

A dispute over safety, control, and the Microsoft deal splits OpenAI; the Amodeis and a dozen colleagues leave to build a lab premised on the fear that the technology could kill everyone. → Why the field's safety conscience became its own company.

“An AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems.” — Anthropic, describing itself at its Series A, May 2021

The grievance that produced Anthropic did not start with a chatbot. It started with a corporate restructuring most of the world never noticed.

In March 2019, OpenAI announced that it was no longer simply a nonprofit. It had created a new entity, OpenAI LP, a so-called capped-profit company sitting underneath the original nonprofit board. Investors could now put money in and earn a return, though the return was capped, reportedly at one hundred times the original investment for the earliest backers. Anything above the cap would flow back to the nonprofit, in service of the mission. The logic was straightforward and, to the people who ran the place, unavoidable: training the next generation of models would cost billions, and billions do not come from donations. Four months later, in July 2019, Microsoft put in a billion dollars and became OpenAI’s exclusive cloud provider. The lab founded as a counterweight to corporate AI now had a corporate patron.

For most of OpenAI’s senior researchers, this was the price of staying at the frontier. For one cluster of them, it was the moment the place stopped being the thing they had joined. The cluster’s center of gravity was a thin, intense physicist named Dario Amodei.

Amodei had none of the public profile of a Sam Altman or an Elon Musk. He was a scientist’s scientist, the kind of person other researchers deferred to. He had done a PhD in biophysics at Princeton, studying the electrical behavior of neural circuits in actual brains, then worked at Baidu under Andrew Ng on speech recognition, then at Google Brain, before joining OpenAI in 2016. By 2019 he was the lab’s vice president of research, and he had a claim on the work that mattered most. He had led GPT-2 and GPT-3, the two models that turned OpenAI from a sprawling research shop into a company with a product worth selling. He had co-authored the scaling-laws papers, the empirical results this book has already described. If the modern era had a thesis statement, Amodei had helped write it: intelligence was, to a first approximation, something you could buy by the gigawatt.

Three years earlier, in 2016, Amodei had also co-authored “Concrete Problems in AI Safety,” the safety paper this book examined earlier, written with the interpretability researcher Chris Olah and several others. The paper’s quiet argument was that safety was an engineering discipline, not a philosophical afterthought to be bolted on at the end, and that the only way to practice it was on real, capable systems, because toy models did not exhibit the failures you needed to study.

That conviction is what made the 2019 turn feel, to Amodei and the people around him, like a slow betrayal. If you believed safety and capability could not be separated, then the people building the most capable systems had a special obligation, and a special set of incentives that could pull them off course. A profit cap, a cloud exclusivity deal, a partner whose own product roadmap now depended on shipping: each was reasonable in isolation, but together they pointed in a direction that worried him. The lab was optimizing for the race. He thought it should be optimizing for getting the race right.

By late 2020 the argument inside OpenAI had become unwinnable, and Amodei did what the disaffected rarely manage to do. He left, and he did not leave alone.

What walked out the door over those weeks was a functioning organization, not a disgruntled individual. Daniela Amodei, Dario’s sister, who had run operations and was building OpenAI’s safety and policy teams, left with him. So did Tom Brown, the lead author of the GPT-3 paper, the person whose name sat first on the most important machine-learning result of the era. So did Jared Kaplan, a Johns Hopkins theoretical physicist who had helped derive the scaling laws; Sam McCandlish, another scaling-laws researcher; Chris Olah, the interpretability pioneer who had spent years trying to read what was actually happening inside a neural network; and Jack Clark, a former Bloomberg journalist who had become OpenAI’s policy director and wrote a widely read newsletter on the state of the field. Seven founders in all, among them the people who had built GPT-2 and GPT-3 and who understood scaling better than almost anyone alive, leaving together to start something premised on the idea that the thing they were best at building might be the most dangerous object humans had ever made.

They incorporated in early 2021 in Delaware as a Public Benefit Corporation: neither a nonprofit nor an ordinary company, but a structure that legally obligated the directors to weigh a public mission alongside shareholder returns. Having watched OpenAI’s governance bend under commercial pressure, they later bolted on a second safeguard, a Long-Term Benefit Trust that held a special class of stock and was empowered to elect a growing share of the board. Its trustees were drawn from outside the investor base, among them the RAND chief executive Jason Matheny, the Clinton Health Access Initiative’s Neil Buddy Shah, and the alignment researcher Paul Christiano. The wager was that no later round of fundraising could fully capture a company whose ultimate directors answered to a mission rather than to capital.

They called it Anthropic, and the name was a pointed choice. The anthropic principle in cosmology is the observation that the universe’s physical constants appear fine-tuned to permit observers like us; turned toward AI, it gestured at the question of who the observers would be, and whether the systems they built would leave room for them. The mission, as the company stated it, was to build “reliable, interpretable, and steerable AI systems.”

On May 28, 2021, Anthropic announced a $124 million Series A led by Jaan Tallinn, the Estonian engineer who had co-written the original Skype and then spent his fortune funding work on existential risk. Dustin Moskovitz, a Facebook co-founder, and Eric Schmidt, the former Google chief executive, were among the backers. The press release described an AI safety and research company. It did not describe a product, because there was not one yet.

The Amodei exodus was the largest single defection, but it was not the only one. The same anxiety was pulling other people out of OpenAI and into adjacent orbits. Paul Christiano, who had helped invent the very technique, reinforcement learning from human feedback, that made ChatGPT possible, left in 2021 to found the Alignment Research Center, a nonprofit that would become best known for testing whether powerful models were safe to deploy at all. The intellectual climate had a prominent voice, too: Stuart Russell, the Berkeley computer scientist whose critique of the field’s foundations this book has already laid out, had published Human Compatible in 2019. When a scientist of Russell’s stature said the field’s foundations were unsound, the people leaving OpenAI could feel less like cranks and more like early movers.

This is the part of the story that its critics would later use against it. A group of researchers concludes that a company has become too commercial, too entangled with a Big Tech patron, too willing to ship; they leave to build a purer alternative; and within four years that alternative is itself a frontier lab racing to ship, financed by Big Tech patrons, valued in the hundreds of billions. The schism looks, from a distance, like a story about people who left and then became the thing they left.

That reading is not wrong, exactly. But it misses what the founders actually believed, which was stranger and more specific than hypocrisy.

Amodei’s argument ran roughly as follows. Powerful AI was coming whether or not safety-minded people built it. The failures worth worrying about appeared only in the largest systems, so meaningful safety research could not be done from the sidelines, on small models, in a university. The safety-minded therefore had to be at the frontier, building the most capable models in the world, both to study real risks on real systems and to set a competitive standard that forced everyone else to follow. He had a phrase for it; Anthropic would later spell it out in a public document, it became the company’s whole theory of itself, and the next chapter turns on it. The argument has a precise shape and an obvious vulnerability, and both are worth taking slowly; what matters here is what it justified.

It justified, with a clear conscience, doing exactly what the founders had faulted OpenAI for doing without enough care: racing to build the most capable models in the world. And it contained a tension they did not try to hide: to make safety competitive, you had to be competitive, which meant raising enormous amounts of money, which meant taking it from the same kinds of patrons whose influence had soured them on OpenAI.

The money came, and the first enormous check came from the orbit of FTX. In April 2022, Anthropic raised a $580 million Series B, the bulk of it from Sam Bankman-Fried, the founder of the crypto exchange FTX, and his circle, including his colleague Caroline Ellison. Bankman-Fried, then celebrated as the philanthropist-prince of effective altruism, reportedly put in around $500 million for a stake of roughly 7.8 percent. (Sources differ on whether the $500 million figure describes his own contribution or most of the round, and the exact split remains unconfirmed.) It was the kind of investment that looked visionary for about seven months, until FTX collapsed in November 2022 amid one of the largest frauds in financial history and Bankman-Fried was on his way to prison. Anthropic’s safety lab had been substantially funded by a man who turned out to be a criminal, a fact its critics would not let it forget, and a fact made more awkward by the effective-altruism milieu that connected the founders, the investor, and much of the alignment community besides.

The collapse did not sink Anthropic; if anything, it accelerated the company’s pivot toward more conventional, deeper-pocketed patrons. In December 2022, weeks after FTX imploded, Anthropic’s researchers posted the paper that would define their product: a method for training a model to behave by giving it a written set of principles, a “constitution,” and having it critique and revise its own answers against them instead of relying entirely on contractors to label what was harmful. A few months later they shipped the model built on it, named for no one in particular: Claude. Within two years, Amazon and Google, two of the largest technology companies on earth, would be underwriting the lab to the tune of billions in exchange for cloud commitments, and a company that had launched with no product would be valued in the tens of billions and climbing. How a lab built out of fear of corporate AI ended up financed by exactly the kind of corporate patrons it had left, and whether the safety thesis survived the contact, is the subject of the next chapter. The setup was now in place. What no one had yet tested was whether the premise held.

In November 2023, the founders’ conviction would be tested in the most pointed way imaginable. When OpenAI’s board fired Sam Altman, setting off five days of crisis told in full later in this book, the board reportedly approached Amodei about replacing Altman as chief executive and possibly merging the two companies. He said no. Offered the keys to the place he had left, he chose to keep building the alternative. Anthropic was not a stop on the way back to OpenAI; it was the thing they had left to build.

The structural critique that would shadow the company for years was already legible at the founding, before there was a product or a valuation to attach it to. A lab founded as the purer alternative would, by the logic of its own argument, have to raise the same enormous sums from the same kinds of patrons, and so risked ending up looking like the thing it left. Amodei’s answer never changed. Safety could not be practiced from outside the arena, and a lab that throttled itself out of the competition would simply hand the future to people who would not throttle anything. The two readings did not disagree about the facts; both sides accepted them. They disagreed about whether the founders’ intentions, and the commitments they were about to write down, would make the difference between conviction and cover story.

That was the question Anthropic was founded to answer, and it could only be answered by building. The founding had asked something harder than whether the company would survive. It had asked whether a lab assembled out of the conviction that the technology might be the most dangerous object humans had ever made could go and build that technology, at the frontier, faster than almost anyone, and stay the thing it set out to be. The answer would not come in a press release. It would come model by model, deal by deal, in the years that followed, when the safety lab started to win.