Part VI · Chapter 27

Race to the Top

Anthropic tries to prove a safety-first lab can also be a frontier lab, shipping Claude and Constitutional AI while taking billions from Google and Amazon. → The bet that safety and capability cannot be separated.

“A ‘race to the top,’ … in which different industry players are incentivized to improve, rather than weaken, their models’ safeguards and their overall safety posture.” — Anthropic, “Core Views on AI Safety,” March 8, 2023

On the morning of March 14, 2023, the people who cared about large language models had exactly one thing on their minds, and it was not Claude. OpenAI was releasing GPT-4, and for a few hours that day, with its live demo and its bar-exam scores and its breathless press, GPT-4 was the only machine in the world.

Anthropic released Claude the same week, and almost nobody noticed.

That was, more or less, the plan. There was no live demo. There was a blog post, written in the flat, careful register of a company that had decided in advance not to oversell. Claude was available to a short list of partners and through an API to developers who applied for access. The post described two versions, Claude and a faster, cheaper variant called Claude Instant, and it spent as much space on what the model should not do as on what it could. The company that built it had existed publicly for less than two years, had a few dozen employees, and was named after a principle in cosmology. If you were not paying close attention, you could be forgiven for thinking that the AI race had two serious entrants, OpenAI and Google, and that everyone else was scenery.

The people who ran Anthropic were paying very close attention, and they had a thesis about all of this that they were willing to state out loud. The thesis was that the quiet launch, far from being a sign of weakness, was the entire point.

Dario Amodei had spent years arguing, to anyone who would sit still for it, that the technology he was helping to build was genuinely dangerous and getting more so on a predictable schedule. He was no doomsayer by temperament. He was a physicist by training, soft-spoken, prone to long pauses, the kind of person who answered a simple question by first defining his terms. What made him unusual was that he believed the danger and the capability were the same curve. The thing that made the models more useful, namely more scale, more data, more compute, was the same thing that made them harder to control. You could not get the benefit without buying the risk. Most people in the field treated safety as a brake you pressed when the car got going too fast. Amodei treated it as a property of the engine.

This was the conviction that had pulled him and his sister Daniela and a clutch of OpenAI’s best researchers out the door at the end of 2020, and it was the conviction Anthropic was built around. By the time Claude shipped, that founding drama was already two years in the past. The interesting question in March 2023 was no longer why they had left. It was whether a company organized around caution could survive in an industry organized around speed.

The honest answer, which Anthropic’s leadership understood better than its critics, was that caution by itself could not survive at all. A lab that only said no would be a think tank, not a competitor, and a think tank had no leverage over the companies actually shipping models. To matter, you had to be at the frontier. To be at the frontier, you had to spend enormous sums on compute and talent. To raise those sums, you had to convince investors there was money in it. And the moment you took the money, you were in the race whether you liked the word or not.

Amodei had a name for the way out of this box, and he used it constantly. He called it the race to the top.

The phrase was doing a lot of work, and it is worth slowing down on it, because it is the hinge the whole chapter turns on. The conventional fear about AI competition was that it produced a race to the bottom: each lab, terrified of falling behind, cut corners on testing and safety, and the lab willing to be most reckless set the pace for everyone else. Amodei accepted the premise that competition was inevitable. What he rejected was the direction. If you could make safety a thing labs competed on, so that being demonstrably more careful became a source of advantage rather than a tax on speed, then the same competitive pressure that drove the race to the bottom could be turned to drive a race to the top. A lab that pioneered a safety technique, or a transparency standard, or a deployment norm, would force its rivals to match it or look careless by comparison. You did not slow the race. You changed what winning meant.

It was an elegant argument, and it had an obvious vulnerability, which critics pointed out immediately and never stopped pointing out. If safety was a competitive advantage, then “safety” had become a marketing position, and a company has every incentive to talk up its marketing positions. How could anyone tell the difference between a lab that was genuinely more careful and a lab that had simply discovered that careful was a good look? Anthropic’s whole existence sat inside that ambiguity. The same facts could be read two ways. The careful launch was either principled restraint or savvy differentiation. Constitutional AI was either a real advance in alignment or a clever brand. The answer, uncomfortably, was usually both at once, and the company’s leaders mostly declined to pretend otherwise.

Start with the technical move, because it came first and because it is the cleanest example of the doubleness. In December 2022, weeks after ChatGPT had detonated the public conversation, Anthropic’s researchers published a paper with a title that read like a manifesto: “Constitutional AI: Harmlessness from AI Feedback.” It described a method for training a model to be helpful and harmless without the thing everyone else relied on, which was an army of human contractors reading model outputs and rating them.

To understand why that mattered, you have to understand what the contractors were for. The standard recipe for turning a raw language model into a usable assistant was reinforcement learning from human feedback, or RLHF: the human-ranking method this book described earlier, and the technique that had made ChatGPT feel like it had manners. It worked. It was also slow, expensive, inconsistent, and grim. The contractors were often reading the worst things a model could generate, hour after hour, so that the rest of us would never see them. And the values the model absorbed were buried inside a pile of individual judgments that no one could inspect or audit. If you wanted to know why a model refused a request, the honest answer was that ten thousand contractors had collectively leaned that way, and good luck reconstructing it.

Constitutional AI replaced most of the humans with a document. The researchers wrote down a set of principles, a “constitution,” drawn from sources like the Universal Declaration of Human Rights and a handful of plainly worded rules about being helpful and avoiding harm. Then they had the model critique and revise its own outputs against those principles. The model would generate a response, then evaluate whether it violated the constitution, then rewrite it to comply, over and over, generating its own training data. Human feedback didn’t vanish, but the laborious safety-rating step was handed to the AI itself. The acronym was almost a joke at the field’s expense: RLHF became RLAIF, reinforcement learning from AI feedback.
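For readers who think more easily in code, the critique-and-revise loop can be sketched in a few lines of Python. This is a toy under loud assumptions, not Anthropic's implementation: the two principles are illustrative paraphrases rather than text from the real constitution, toy_model is a hypothetical stand-in for a real language model, and the loop walks through every principle in order where a real pipeline might sample them. It covers only the self-revision step that produces training data, not the later reinforcement-learning stage.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a language model call: prompt text in, reply text out.
Model = Callable[[str], str]

# Illustrative principles only; not quoted from any real constitution.
CONSTITUTION: List[str] = [
    "Prefer the response that is most helpful to the user.",
    "Prefer the response that avoids assisting with harm.",
]


@dataclass
class TrainingPair:
    """One self-generated example: the prompt and the final revised answer."""
    prompt: str
    revised_response: str


def critique_and_revise(model: Model, prompt: str,
                        principles: List[str], rounds: int = 1) -> TrainingPair:
    """Draft a response, then critique and rewrite it against each principle."""
    response = model(prompt)
    for _ in range(rounds):
        for principle in principles:
            critique = model(
                f"CRITIQUE\nPrinciple: {principle}\nResponse: {response}"
            )
            response = model(
                f"REVISE\nCritique: {critique}\nResponse: {response}"
            )
    return TrainingPair(prompt=prompt, revised_response=response)


def toy_model(prompt: str) -> str:
    """Deterministic fake model so the sketch runs without any external service."""
    if prompt.startswith("CRITIQUE"):
        # Flag any draft that still carries the UNSAFE marker.
        return "flagged" if "UNSAFE" in prompt else "ok"
    if prompt.startswith("REVISE"):
        # Return the draft with the offending marker stripped out.
        response = prompt.split("Response: ", 1)[1]
        return response.replace("UNSAFE ", "")
    # Any other prompt is treated as a user request and gets a flawed first draft.
    return "UNSAFE draft answer"


if __name__ == "__main__":
    pair = critique_and_revise(toy_model, "How do I do X?", CONSTITUTION)
    print(pair.revised_response)
```

The pairs that come out of a loop like this are what the model is then fine-tuned on, which is the sense in which it generates its own training data.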

You can see why this was, simultaneously, a genuine research contribution and a perfect piece of positioning. As research, it addressed a real problem, the brittleness and opacity and human cost of the standard pipeline, and it produced a model whose values you could at least point at, because they were written down. As positioning, it was almost too good. Anthropic’s competitors had safety practices buried in internal documents and contractor guidelines. Anthropic had a constitution, a word that summoned founding fathers and rule of law and the deliberate, sober business of writing down what you stand for. When Claude declined to do something, the company could gesture at a text. The substance and the story were the same object.

The skeptics had a fair point that the constitution was not as neutral as the framing implied. Somebody, after all, chose the principles, and those somebodies were a small group of mostly American engineers with particular ideas about harm. Anthropic, to its credit, mostly conceded this. It later ran experiments asking the public to help write a constitution, an acknowledgment that “whose values” was a real question and not a solved one. But the deeper point held. Constitutional AI made Claude’s behavior legible in a way that its rivals’ models were not, and legibility, it turned out, was something you could compete on. That was the race to the top in miniature: a safety technique that doubled as a moat.

The person whose job was to say all this in public was Jack Clark, and watching how he said it tells you something about how seriously the company took the framing. Clark had been OpenAI’s policy lead before he left with the Amodeis; he was British, wry, a former journalist, and he wrote a newsletter called Import AI that thousands of people in the field read on Sunday nights. Clark’s role was to translate Anthropic’s posture into the language of governments and the public, and the phrase he and Dario kept returning to was the race to the top. The idea, as Clark framed it, was that a frontier lab had a peculiar kind of power: not just to build models but to set norms. If Anthropic published a safety standard and held itself to it, the standard became a fact in the world that other labs had to reckon with. Reporters would ask OpenAI and Google why they hadn’t matched it. Regulators would ask the same. You could, in effect, regulate the industry from inside it, by being the company that volunteered for the strictest rules and dared everyone else to follow.

In September 2023, Anthropic put a concrete version of this on the table. It published something it called a Responsible Scaling Policy, and the document was a genuine novelty in an industry that had mostly avoided binding itself to anything. The policy laid out a ladder of what it called AI Safety Levels, ASL-1 through ASL-4 and beyond, borrowed in spirit from the biosafety levels that govern laboratories handling dangerous pathogens. The lower rungs covered models that posed no serious catastrophic risk. The higher rungs covered models that might meaningfully help someone build a weapon, or that might begin to act with dangerous autonomy. The crucial commitment was conditional: as a model’s capabilities crossed defined thresholds, Anthropic would impose correspondingly stricter security and deployment safeguards, and if it could not meet the safeguards for a given level, it would not deploy the model. The company was promising, in writing, to stop.

It was the race-to-the-top theory made operational. A voluntary commitment, published, specific enough to be embarrassing to violate, designed to pressure rivals into publishing their own. Within roughly a year, OpenAI had its own version, a Preparedness Framework, and Google DeepMind had a Frontier Safety Framework. Whether they would have written those documents without Anthropic’s example is impossible to know. But the sequence was the sequence, and it was exactly what Amodei had predicted: set the standard, and watch the others scramble to match it.

There was, of course, a catch, and it was the catch that hung over everything Anthropic did. A self-imposed commitment is only as strong as the company’s willingness to honor it when it costs something. The Responsible Scaling Policy was not a law. No regulator enforced it. If Anthropic ever found itself one safeguard short of deploying a model it badly wanted to ship, a model its rivals were about to beat it to market with, the only thing standing between the policy and the bottom line was the company’s own resolve. Critics noted that Anthropic later revised the policy, softening some commitments and clarifying others, and read the revisions as evidence that the bindingness was always somewhat aspirational. Anthropic read them as a maturing framework. Both could be true. The honest position, which the company’s more candid researchers would admit in private, was that nobody would know how strong the commitment was until the day it actually hurt to keep it, and that day had not yet clearly come.

While the policy people were building the safety scaffolding, the research org was doing the thing that gave all of it credibility, which was building models that were genuinely good. This is the part of the story that the safety framing can obscure. Anthropic was not a safety organization that happened to ship a product. It had become, increasingly, one of the best applied-AI shops in the world, staffed by people who had written the foundational papers on why scaling worked.

Jared Kaplan was a case in point. A theoretical physicist who had spent his career on quantum gravity and cosmology before turning to machine learning, Kaplan was a co-author of the scaling laws, the result this book has already described as the intellectual engine of the whole era, the one that meant progress was, to a first approximation, a matter of buying more. Kaplan became Anthropic’s chief science officer, and the through-line from the scaling laws to Claude was not metaphorical. The same people who had quantified the curve were now riding it.

Tom Brown had led the engineering on GPT-3, the model that proved the curve held at unprecedented scale. Sam McCandlish had worked on the scaling laws too. And Chris Olah was running an effort that was, in a sense, the purest expression of the company’s stated values, and also the one least likely to ever show up on a balance sheet. Olah worked on interpretability: the attempt to actually understand what was happening inside a trained neural network, to open the black box and read the circuits. It was slow, unglamorous, basic-science work, the kind a company racing to ship would normally starve. Anthropic funded it generously, and Olah’s teams produced a stream of results that gradually made the inside of a large model less of a mystery: identifying features, tracing how concepts were represented, eventually mapping millions of interpretable patterns inside a production model. You could be cynical and call interpretability a recruiting tool, a way to attract the researchers who cared about the deep questions. You could also notice that it was the one part of the operation hardest to explain as a marketing move, because the market did not, in any obvious way, pay for it. It was the closest thing the company had to proof that it meant what it said.

The models kept coming, and they kept getting better, and the gap between Anthropic and the leaders kept shrinking. Claude 2 arrived in July 2023, with a longer memory and a public chat interface that finally let ordinary people, not just developers, talk to it. The real inflection came in March 2024, when Anthropic released the Claude 3 family: three models named, in ascending order of size and capability, Haiku, Sonnet, and Opus. The poetry-form names were a small tell about the company’s self-image, literary where OpenAI’s version numbers were industrial. On the standard battery of benchmarks, Opus was, by Anthropic’s measurements, competitive with or ahead of the best OpenAI had shipped. For the first time, the careful lab had stopped catching up. On some axes it was in front.

Then, in June 2024, came the model that changed Anthropic’s commercial fortunes more than any safety argument ever had. It was called Claude 3.5 Sonnet, and it was very good at writing code.

This deserves a moment, because it is where the abstract bet about safety-as-strategy collided with a concrete fact about the market. Anthropic had not set out to be the coders’ model. But software engineers, it turned out, were the most demanding and least sentimental users of these systems. They did not care about a model’s vibe or its constitution. They cared whether it could write a function that ran, refactor a tangled file, find the bug. And Claude 3.5 Sonnet could do those things better than anything else available, by a margin developers could feel. Word spread the way it spreads among engineers, which is to say through other engineers, fast and without marketing. A class of startups was being built on top of these models, tools that let you describe software and have it written, and a striking number of them quietly defaulted to Claude. The safety-first lab had backed into the most lucrative beachhead in the industry almost by accident, and it pressed the advantage hard. Later versions leaned further into coding and into agentic tasks, the model not just answering questions but taking actions, using tools, operating a computer.

Here the tension the company had lived with from the start became impossible to wave away. Anthropic was now racing in the most ordinary sense of the word: shipping models faster, chasing benchmarks, fighting for developers and enterprise contracts and market share against OpenAI and Google with everything it had. The careful rollout that had defined the Claude launch in March 2023 looked, by 2025, like a phase the company had outgrown. New models shipped on a cadence that would have been unthinkable for a lab whose brand was restraint. The same Dario Amodei who warned in long, careful essays about the existential stakes of the technology was also running a company whose commercial success depended on putting ever more capable versions of that technology into ever more hands as quickly as it responsibly could. He did not see these as contradictory. He argued, consistently, that the safest path ran through being a leading lab, because a leading lab had a seat at the table and a marginal lab did not. You could not change the race from the sidelines. But the argument required holding two ideas that pulled in opposite directions, and observers were entitled to wonder whether anyone, however sincere, could hold them indefinitely without one quietly winning.

For one group, the argument had already settled the question, and they voted with their feet. Anthropic was becoming the place the worried went when they decided they could no longer do safety work where they were. The pattern reached its most public expression in May 2024, when OpenAI’s Superalignment team, the group it had created the year before with a promise to spend a fifth of its compute on controlling systems smarter than humans, came apart. Its two leads both resigned. Ilya Sutskever left to start a lab whose entire premise was that it would build nothing commercial until safety was solved, the mirror image of Anthropic’s bet. Jan Leike, his co-lead, was blunter; on his way out he wrote that at OpenAI “safety culture and processes have taken a backseat to shiny products.” Within weeks he was at Anthropic. The race to the top did not only set standards that rivals had to match. It pulled their safety researchers across the street, which was the strategy working exactly as advertised, and also, depending on your reading, the strategy advertising itself.

The clearest place to watch the strain was the money, because the money was enormous and it came from exactly the kind of actors a safety purist might have been wary of. To stay at the frontier, Anthropic needed compute on a scale only the largest companies on earth could provide, and the companies providing it were Amazon and Google, the same hyperscalers whose own AI ambitions made them, in some sense, rivals. In September 2023, Amazon committed to invest up to four billion dollars in Anthropic, taking a minority stake and becoming a primary cloud provider; in November 2024 it roughly doubled that commitment, bringing the reported total to around eight billion. Google had taken a stake earlier, reported at around three hundred million dollars for roughly a ten percent share, and later committed to invest up to around two billion more. The structure of these deals was its own kind of story. Much of the cash flowed back out as payments for the very cloud computing the cash was meant to buy, a circular arrangement that a later generation of skeptics would scrutinize closely. But the headline was unambiguous. The safety lab was being underwritten, to the tune of billions, by two of the most powerful companies in the world.

The valuation followed the funding up a steep line, and the clearest measure of how far it had traveled was the fate of the FTX stake. When Sam Bankman-Fried’s exchange collapsed in late 2022, the stake it had bought for roughly half a billion dollars in Anthropic’s Series B became a bankruptcy asset, and in March 2024 the estate agreed to sell most of it for around $884 million, more than recovering the original bet for the exchange’s defrauded creditors. The lab whose most awkward early backer had turned out to be a criminal had appreciated enough that unwinding him was a windfall. From there the line went vertical. A Series E in March 2025 raised $3.5 billion at a $61.5 billion valuation; a Series F that September raised $13 billion at $183 billion, with run-rate revenue reported to have climbed from about $1 billion at the start of the year to more than $5 billion by late summer. The exact figures are slippery, since private valuations are negotiated rather than observed, but the shape is not in doubt. In roughly four years a company founded on the premise that its product was dangerous had become one of the most valuable startups on the planet, and its danger thesis had become, awkwardly and undeniably, part of the pitch. Investors were not buying safety out of conscience. They were buying a leading frontier lab whose safety reputation happened to be a durable commercial asset, the kind enterprise customers and governments found reassuring. The race to the top had a market, and the market liked it.

Anthropic had built a governance structure meant to keep all this money from capturing the mission. It was a public-benefit corporation, legally permitted to weigh its stated purpose against shareholder returns, and it had created something called the Long-Term Benefit Trust, a body holding a special class of shares with the eventual power to elect board members, populated by people chosen for their commitment to the mission rather than their stake in the upside. On paper it was an ingenious answer to the question that had broken OpenAI’s own nonprofit-with-a-company structure: how do you take billions in capital without letting the capital decide what the company is for? In practice, the trust’s powers were contingent and could be amended under certain conditions, and the same uncomfortable truth applied to it that applied to the Responsible Scaling Policy. A governance safeguard is only tested when it has to overrule the money, and that test had not yet arrived. Until it did, the trust was a promise, well-constructed and sincerely meant and entirely unproven.

So: did the bet work? By the obvious measures, spectacularly. Anthropic in 2025 was one of a handful of frontier labs, with a model family that engineers reached for first, revenue growing at a rate that startups dream about, and a safety reputation that functioned as both identity and competitive advantage. Amodei had argued that the most cautious lab could also be one of the best, and on the evidence he was right. Safety had not been a tax. It had been, at least so far, a strategy that paid.

But the brief that launched the company asked a harder question than whether it would succeed, and that question does not resolve as cleanly. The premise had been that the technology was dangerous enough to demand a different kind of company. The test of the premise was never going to be whether the different company won. It was whether the company stayed different once it started winning. Constitutional AI was real and also a brand. The Responsible Scaling Policy was a genuine constraint and also a revisable document. The interpretability research was a true expression of values and also a recruiting magnet. The funding was a necessary evil and also, increasingly, just funding. Each of these could be held as a contradiction or as a synthesis, and the people running Anthropic mostly insisted it was a synthesis: that you could be sincere and strategic at once, that doing well and doing good were not opposed but, if you were clever about it, the same.

Maybe. The thing about a race, even a race to the top, is that it does not slow down to let you check your principles. By the time Claude was competing for developers and Anthropic’s danger thesis had become part of its pitch, the technology was no longer the private concern of the people building it. A hundred million people were using these systems. The arguments that had been conducted in research papers and founders’ essays were about to be taken up by people with subpoena power, by Nobel committees, and by electorates. Anthropic had bet that the safety-minded should be at the frontier when the world arrived to take the measure of what they had built. The world was arriving now. And the first to turn and face it was not a company at all, but the seventy-five-year-old man who had started the whole thing, walking out of Google to say what he could not say from inside it.