Part IX · Chapter 42

The Race to the Present

As of May 2026, with Anthropic's Claude Opus 4.8 the newest model on the board, five labs and a Chinese challenger race toward AGI with systems that reason, act, and improve — and no one agrees how close they are. → Where the story stands today, and the question now handed to the reader.

“We are past the event horizon; the takeoff has started.” — Sam Altman, The Gentle Singularity, June 10, 2025

On Thursday, May 28, 2026, Anthropic had what amounted to the best single day any company in the history of the field had ever had, and the strange thing about it was how little of the day was about the thing the company actually made.

That morning it shipped Claude Opus 4.8, the newest model on the board. By lunchtime it had announced a $65 billion fundraising round, a Series H, that valued the five-year-old startup at roughly $965 billion. That valuation, the press noted within the hour, edged past OpenAI’s $852 billion and made Anthropic, on paper, the most valuable AI startup on earth. The company added, almost in passing, that its run-rate revenue had crossed $47 billion that month, up from something like $14 billion in February. And it teased, for the second time in two months, the wide release of a more powerful model it had given the name Mythos, a model so capable that the company had so far refused to let anyone outside a small group of vetted partners use it.

One day, four superlatives: the newest model, the highest valuation, the steepest revenue curve, and a model held back because it was too dangerous to ship. Dario Amodei, who a decade earlier had been a biophysicist worrying in print about accidents in machine-learning systems, now ran the company that had just passed the company he had left. He did not behave like a man who thought the race was over. None of them did. That was the tell. By the spring of 2026, every person at the front of this field was both winning and losing at the same time, because the lead had become something you could hold for about a month.

The crown changed hands on roughly a monthly clock, and it had been doing so for nearly a year. The pattern was easy to miss if you read only the announcements, because each one was written as though its author had settled the matter. Read them in sequence and a rhythm emerged, like a metronome that no one was conducting.

It started in earnest in August 2025. On August 7, OpenAI released GPT-5, a single model fronted by a router that decided, in real time, whether a given question deserved a fast answer or a slow one. It became the default for every ChatGPT user overnight. It scored 74.9 percent on SWE-bench Verified, the benchmark that had quietly become the one that mattered. SWE-bench was not a quiz. It was a set of real bugs pulled from real open-source software repositories, and a model’s score was the percentage it could actually fix, its patch applied and the project’s own tests run to confirm the fix held. A year earlier the best models had hovered around 40 percent. The whole industry had reorganized its bragging around that one number, because coding had turned out to be the first thing these systems could clearly, salably do.

GPT-5 reset the bar and drew an odd backlash. Users discovered that the router had taken away the menu of models they had grown attached to, and that the warm, slightly sycophantic personality of the older GPT-4o was gone. People who had spent two years talking to a particular voice did not enjoy waking up to a different one. It was a small thing and a revealing one. The product had become personal in a way its makers had not planned for.

Then, on November 18, 2025, Google shipped Gemini 3, and for the first time in a long time the lead changed in a way no one could spin. Gemini 3 topped the public leaderboards. It posted a GPQA Diamond score above 90 percent, on a test of graduate-level science questions written to be Google-proof, the kind a PhD in the relevant field could mostly answer and an educated outsider mostly could not. It came with a slow, deliberate mode that Google called Deep Think, and Google did the thing only Google could do. It rolled the model straight into Search, into the product more than a billion people already used, on day one. For a decade Google had been the lab that flinched, the company that held the more capable models and declined to ship them. This time it did not flinch.

The reaction inside OpenAI was the clearest evidence of how the race actually felt from the inside. On December 2, 2025, Sam Altman declared a company-wide “code red” of his own, the highest internal priority, three years after Google’s, and froze work on advertising, on shopping features, on the agentic projects the company had been pushing, to refocus everyone on the quality and speed of ChatGPT itself. The leader had been forced to react. Nine days later, on December 11, OpenAI shipped GPT-5.2 to answer Gemini 3, and the metronome ticked on. By the new year Altman was telling staff that Gemini’s impact on ChatGPT’s numbers had been smaller than feared and that he expected to exit code red, but the reallocation had a cost. In January 2026 a string of senior researchers left, among them Jerry Tworek, a vice president of research, who departed after seven years with his repeated appeals for more compute denied.

And through all of it, Anthropic kept shipping at a cadence that no one else came close to matching. On November 24, 2025, six days after Gemini 3, it released Claude Opus 4.5, the first model anyone had pushed past 80 percent on SWE-bench Verified, at 80.9 percent. Then it did something that looked, at first, like a mistake. It cut the price by two-thirds, from fifteen dollars per million input tokens down to five, and from seventy-five down to twenty-five for output. A company does not cut the price of its best product by that much unless it believes the product is no longer scarce. The number said that coding, at least the kind of coding that fills most working days, was solved well enough to sell cheaply, and that the margin would come from volume. From there the releases came in a blur that compressed a normal industry’s roadmap into a season. Opus 4.6 on February 5, 2026, with a million-token context window. Sonnet 4.6 on February 17, the first time the company’s mid-tier model was preferred by its own evaluators over the previous generation’s flagship. Opus 4.7 on April 16, which landed badly, with users complaining it refused too much and talked too much and stumbled when handed tools. And then Opus 4.8 on May 28, just forty-two days after 4.7, pitched not on a headline benchmark but on something quieter.

Opus 4.8’s headline was honesty. The company said it was roughly a quarter as likely as 4.7 to let a flaw in its own code slip past unflagged, and it published a figure of 84 percent on a benchmark called Online-Mind2Web, which measured whether a model could carry out real tasks across real websites without a human guiding each click. Eighty-four percent did not mean the agents were reliable in the way a calculator is reliable. It meant they had crossed from demo into something a business could actually deploy and mostly trust, which was a different and more consequential threshold. The pitch had shifted. For three years the labs had competed on whether a model could do a hard thing once. Now they were competing on whether it would do an ordinary thing every time, and admit when it could not.

That shift, from raw capability to reliability, was the real story underneath the monthly coronations, and it had a darker companion. The same months that produced the honesty pitch also produced the first frontier model that its own maker refused to release.

On April 7, 2026, Anthropic announced Claude Mythos Preview. The model’s existence had already leaked, surfaced in late March through a data breach that Fortune reported, which described a “step change” in capability. The April announcement made the reason for the secrecy specific. Mythos was a general-purpose model, but pointed at software it turned into something else. Anthropic’s red team had set it loose on real systems and watched it autonomously find and exploit a root vulnerability in FreeBSD that had sat undiscovered in the operating system’s code for seventeen years, a flaw eventually catalogued as CVE-2026-4747. That was not the alarming part. The alarming part was that it then found thousands of others, zero-day vulnerabilities across every major operating system and web browser, more than ninety-nine percent of them still unpatched at the time the company disclosed them. A zero-day is a flaw the defender does not yet know exists, which means there is zero days’ warning before it can be used. Mythos was producing them faster than the world could fix them.

So Anthropic did not ship it. Instead it gated the model through something it called Project Glasswing, a controlled program of fewer than fifty organizations: operators of critical infrastructure, security researchers, government bodies. The company’s own reporting placed the model’s cyber capability at or near the threshold its safety framework defined as ASL-3, the level reserved for models that could meaningfully help a malicious actor cause large-scale harm. For years the field’s safety argument had been an argument about hypotheticals, about a superintelligence that did not exist and might never. Here was a concrete artifact, shipped but not shipped: a model good enough that releasing it would have been, functionally, handing a weapon to anyone with an API key. By late May, the European Union was reportedly pressing Washington to intensify talks over how, and whether, such models should ever be released at all. The abstract debate that had run through this whole story, going back to Bostrom’s book and Musk’s warnings a decade earlier, had finally produced a thing you could point at.

The contest was not only American. The Chinese challenger that had detonated the markets in January 2025 was still on the board, and still close. Liang Wenfeng’s DeepSeek had kept shipping, its V3 line iterating into a V3.2 by the autumn of 2025 and a more capable successor in the works, its models freely downloadable and a fraction of the price of the closed American ones. On the benchmarks that mattered most, the best Chinese open models trailed the best closed American ones, but the gap was measured in months, not years, and it had not widened. The export controls that were supposed to slow China had instead taught it efficiency. Each new American flagship bought the closed labs a few months of clear daylight before a Chinese model arrived that was nearly as good and far cheaper, and then the cycle repeated. The lesson the American labs had absorbed on DeepSeek Day, that their lead was a matter of months rather than a structural moat, had held.

Which raised the question that the whole accumulating spectacle was supposed to be answering, and that the people building the systems could not answer in agreement: how close was any of this to the thing they all said they were building?

They had been making predictions for years, and the predictions had not converged. Leopold Aschenbrenner, a former OpenAI researcher who now ran an AGI-focused investment fund, had published a hundred-and-sixty-page treatise in June 2024 called Situational Awareness, arguing that artificial general intelligence by 2027 was “strikingly plausible” and that the real story was a coming national-security scramble between the United States and China for the decisive technology of the century. In April 2025 a group led by Daniel Kokotajlo, another OpenAI alumnus, one who had forfeited a fortune in equity rather than sign a non-disparagement clause, published AI 2027, a month-by-month scenario that ran from the present to a superhuman intelligence and offered the reader two endings, a “race” and a “slowdown,” depending on choices not yet made. The authors of AI 2027 later softened their own timelines, which was itself worth noticing. The people closest to the work kept revising toward more time, not less.

Amodei occupied both ends of the argument at once, which made him the most honest and the most exposed of the forecasters. In October 2024 he had published Machines of Loving Grace, an essay imagining what powerful AI could do for human health and prosperity, a “compressed 21st century” of medical and scientific progress folded into a decade. Seven months later, on May 28, 2025, he told Axios something far bleaker. AI could wipe out as much as half of all entry-level white-collar jobs and push unemployment to ten or twenty percent within one to five years, he said, and the industry was, in his word, “sugarcoating” it. The safety CEO had become the bluntest jobs Cassandra in the business. By May 2026 he had begun to soften that too, leaning on the economist’s Jevons paradox, the observation that making something cheaper often increases rather than decreases the total demand for it, and the labor that goes with it. Altman, for his part, had planted his flag in June 2025 with The Gentle Singularity, declaring that humanity was “past the event horizon” and the takeoff had begun, an apocalyptic claim delivered in a tone as warm as a fireside chat, complete with an oddly precise estimate that a single ChatGPT query used about a third of a watt-hour of electricity.

Against all of them stood Demis Hassabis, who had been saying the same measured thing for years and kept saying it: that human-level AI was perhaps five to ten years out, that it would arrive around 2030, that it would resemble the Industrial Revolution run at ten times the scale and ten times the speed. Hassabis had won a Nobel Prize in 2024 for using these systems to predict how proteins fold. He had as much standing as anyone to claim the finish line was near, and he declined to.

Underneath the rhetoric, away from the essays and the interviews, there was one place where the effect was already a measurement rather than a forecast, and it happened to be the field’s own discipline. The Stanford Digital Economy Lab, in a study its authors titled “Canaries in the Coal Mine,” found that employment for software developers between the ages of twenty-two and twenty-five had fallen by roughly twenty percent from a peak in late 2022, the same window in which ChatGPT had arrived and coding assistants had become standard. Older, more experienced developers had not seen the same drop. The signal showed up first, and most clearly, at the bottom of the ladder, among the people whose work most resembled the work the machines had just learned to do. Microsoft said a meaningful share of its code was now AI-generated. Salesforce said it had hired no new engineers in a fiscal year, citing the productivity of its tools. The men who built these systems kept telling audiences the technology would take their jobs, and for once the prediction was not pure salesmanship. In the one corner of the economy where the data was good and the exposure was high, something was already moving.

So that was the board, on the last Saturday of May 2026. Five American labs and a Chinese one, trading a lead that no one could hold. A center of gravity that had migrated from the size of a model to the way it reasoned and the things it could do on its own. Valuations that had detached from any frame a previous generation of investors would have recognized, a startup worth nearly a trillion dollars, an established giant filing to go public at the same scale, a chip company underwriting its own customers in a web of deals that one famous short-seller likened to Enron and that the labs called vertical integration. A model withheld because it was a weapon. A labor signal flickering in the one place its makers could not look away from. And, threaded through all of it, a question the field had been asking itself since Frank Rosenblatt stood up a refrigerator-sized machine in a government building in 1958 and told reporters it would someday walk and talk and reproduce itself.

Rosenblatt’s machine could tell a card marked on the left from a card marked on the right. It took sixty-eight years, two long winters when the whole idea was left for dead, a handful of stubborn people who kept working when no one was watching, two consumer graphics cards running hot in a Toronto bedroom, a Beatles-titled paper out of Google, and more capital than had ever been concentrated on a single technology, to get from that machine to a model that could find a seventeen-year-old flaw in an operating system that thousands of trained engineers had read and missed. The distance from Rosenblatt’s perceptron to Mythos is the distance this book has traveled.

What it could not tell you, what no one inside the story could agree on as they raced past one another on a monthly clock, was how much distance was left. Aschenbrenner said the finish line was 2027. Hassabis said 2030, and meant it as caution. Kokotajlo and his co-authors had looked hard at the same evidence and then quietly stepped back from their own dates. The people with the most information, the most compute, and the most at stake were the ones least able to say. They had built a thing that could write, reason, code, browse, and find the cracks in the world’s software, and they could not tell whether they were a year from the goal they had named or ten, or whether the goal as they had named it was even the right way to describe what they were making. That uncertainty is not a failure of the people in this book. It is the honest condition of standing at the front of something while it is still moving. The man who could not sit down, the one who auctioned three people from a Lake Tahoe hotel room when this story began, helped set in motion a machine its own makers now withhold from the world and cannot fully measure. Where it goes next is not yet written, and the people who would tell you they know are the ones worth trusting least.