Neuron Makers
Part IX · Chapter 40

DeepSeek Day

On a single January day in 2025, an efficient Chinese open model erases roughly a trillion dollars of US market value and puts the entire export-control strategy on trial. → The day the West's assumptions about its lead were challenged in public.

“In the face of disruptive technologies, the moat created by closed source is temporary.” — Liang Wenfeng, 36Kr / “Waves” interview, July 2024

On the morning of Monday, January 27, 2025, the most valuable company on earth began to come apart on the screens of every trading desk in New York, and almost no one selling it could have told you, the week before, the name of the thing that was tearing it down.

The thing was a phone app. Over the weekend it had climbed to the top of Apple’s free App Store in the United States, past ChatGPT, past every game and social network and banking tool, and it was topping the charts in more than a hundred countries at once. It was made by a company in Hangzhou that had never raised a dollar of venture capital, had no sales team, and had been formally incorporated for eighteen months. When the opening bell rang, Nvidia, whose chips were supposed to be the chokepoint of the entire artificial-intelligence boom, the one component no rival could do without, fell and kept falling. By the close it was down nearly 17 percent, settling around $118.58 a share. The drop erased roughly $589 billion of market value in a single session. It was the largest one-day loss by any company in the history of American markets, more than double Nvidia’s own previous record, the roughly $279 billion it had shed in a single day five months earlier. Broadcom fell about 17 percent. Across the technology-heavy Nasdaq, close to a trillion dollars in value evaporated before the day was out.

Marc Andreessen had set the tone the night before. The venture capitalist, an early and loud believer in the American AI buildout, posted to X that the new Chinese model was “AI’s Sputnik moment.” He called it “one of the most amazing and impressive breakthroughs I’ve ever seen” and “a profound gift to the world.” Sputnik was the right register for the panic, if not for the facts. In 1957 a beeping Soviet satellite had told Americans that a country they had written off as a backward imitator could, in one stroke, put hardware over their heads. The story that ran on the cable tickers and in the group chats that Monday was a compressed version of the same fear: a tiny, scrappy Chinese lab had built a model as good as the best America had, for $5.6 million, on hardware it was not even supposed to have, and had given it away for free.

Every load-bearing word of that sentence was contestable. The lab was not tiny. The $5.6 million did not mean what people thought it meant. “As good as the best” was true on some tests and not on others. And the giveaway was strategy, not charity. The gap between the story and the facts is the actual subject of what came to be called, half in awe and half in mock-horror, DeepSeek Day. But the panic was not wrong about everything. The efficiency was real. The model was real. And the deeper thing the moment exposed about the previous two years of American policy was the most real of all.

To understand any of it you have to start not with a model but with the man behind it, and with a fact about his chips that almost no one selling Nvidia that morning had bothered to learn.

The man was Liang Wenfeng, the reclusive quant who ran the Hangzhou hedge fund High-Flyer and had spun his AI research arm out into a separate company, DeepSeek, in July 2023. The fact about the chips was this: as the previous chapter recounted, the cluster DeepSeek trained on had been bought legally, before the door closed. High-Flyer had stockpiled its Nvidia silicon ahead of each round of export controls. It bought its A100s before the October 2022 rules and its H800s before the tighter rules that followed a year later sealed those parts off from any Chinese buyer.

The mechanics of those controls, the two technical thresholds, the cut-down A800 and H800 parts Nvidia engineered to slip under them, and the eventual closure of that loophole, are Chapter 39’s to tell. What matters here is the residue: High-Flyer had its A100s, and it had stockpiled H800s, all of it legal at the moment of purchase, sitting in data centers while the rest of China’s labs scrambled. Because High-Flyer owned and funded DeepSeek outright, Liang answered to no board, no growth targets, no quarterly story. He could treat money as solved and worry only about the problem. The one constraint he could not buy his way past was the chips, and he had said so himself, in the only substantial interview he gave before the world learned his name. He had also said, in that same summer-2024 conversation with 36Kr’s “Waves,” why he intended to give his work away rather than guard it: against a genuinely disruptive technology, the moat that closed source bought you was only temporary. A closed lead, in his reading, was a lead you were renting; an open one was a position you owned. It was a conviction he was about to test in public.

While Wall Street paid no attention, DeepSeek built, in public, on a steady cadence: the open-source DeepSeek Coder and DeepSeek LLM in late 2023, a first mixture-of-experts model in early 2024, and in May 2024 a model called V2, priced so far below the incumbents that it touched off a price war among Alibaba, Baidu, Tencent, ByteDance, and Zhipu and earned DeepSeek a wary nickname, the Pinduoduo of AI, after the discount e-commerce app that scaled by underselling everyone. The label stuck, and it told you how the rest of the industry had filed the company away. Here was the discounter. Few of them clocked that the discounter was also building something at the frontier.

That something landed on December 26, 2024, with a model called DeepSeek-V3 and a technical report posted to arXiv. The model was a mixture-of-experts giant, 671 billion parameters with only 37 billion firing for any token. It was built with the same export-forced discipline that ran through everything DeepSeek made: an attention scheme that shrank what each chip had to remember, eight-bit training that roughly halved the memory and bandwidth its heaviest calculations demanded, every layer of the stack squeezed because the hardware could not be. The engineering was real, and to the researchers who read the report it was obvious. But the report also carried a number, and it was the number, not the engineering, that the world would seize on a month later.
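For readers who want the mechanism behind “only 37 billion firing for any token,” here is a minimal Python sketch of generic top-k mixture-of-experts routing. A router scores every expert for a token, only the k best are actually run, and their outputs are blended by renormalized weights. It is a toy under loudly stated assumptions: sixteen hypothetical experts that just scale their input, two active per token, and softmax gating. It is not DeepSeek’s router, which uses its own gating and load-balancing scheme.

```python
import math
import random


def top_k_route(scores, k):
    """Select the k highest-scoring experts for one token and turn their
    scores into mixing weights with a softmax over just those k."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}


def moe_forward(x, experts, scores, k):
    """Run only the routed experts and blend their outputs."""
    weights = top_k_route(scores, k)
    # Experts outside `weights` are never called: that is the compute saving.
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy layer: 16 "experts", each of which just scales its input.
NUM_EXPERTS, K = 16, 2
experts = [lambda x, s=s: s * x for s in range(NUM_EXPERTS)]

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # router output for one token
weights = top_k_route(scores, K)
y = moe_forward(1.0, experts, scores, K)

print("experts used:", sorted(weights))
print(f"output: {y:.3f}")
# The ratio quoted for DeepSeek-V3: parameters active per token.
print(f"active fraction: {37 / 671:.1%}")  # prints 5.5%
```

The design choice is the whole point. Total parameters set how much the model can know, but only the active ones set what each token costs to compute.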

It sat in a small table in the report’s introduction. V3’s full official training, the pre-training run plus the shorter context-extension and fine-tuning stages, had consumed 2.788 million H800 GPU-hours. The report converted that, at an assumed rental price of two dollars per GPU-hour, to $5.576 million. The same report said in plain language what that figure left out: the prior research and the experimental runs on architectures, algorithms, and data that came before the final one. And because it priced the run as rented time, the figure also left out the hardware itself, the thousands of chips that had to exist before a single run could begin. It was, as the previous chapter laid out, a marginal cost and not a total one. The team knew this. They wrote it down. The world would read the headline and skip the footnote.
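The conversion in that table is simple enough to redo by hand, and redoing it makes the footnote visible. The Python sketch below uses the report’s stage-by-stage GPU-hours and its own assumed two-dollar rental rate. The 2,048-GPU cluster size is also the report’s. The wall-clock estimate is a back-of-envelope assumption that treats the stages as one continuous run. Note what has no line at all: research, experiments, salaries, and the purchase of the chips.

```python
# H800 GPU-hours per stage, as tabulated in the DeepSeek-V3 technical report.
stage_gpu_hours = {
    "pre-training": 2_664_000,
    "context extension": 119_000,
    "post-training": 5_000,
}
RENTAL_RATE_USD = 2.00  # the report's assumed price per GPU-hour, not a measured cost
CLUSTER_GPUS = 2_048    # cluster size the report describes

total_hours = sum(stage_gpu_hours.values())
total_cost = total_hours * RENTAL_RATE_USD

# Rough wall-clock time if every stage ran back to back on the whole cluster.
days = total_hours / CLUSTER_GPUS / 24

print(f"{total_hours:,} GPU-hours -> ${total_cost:,.0f}")
print(f"~{days:.0f} days on {CLUSTER_GPUS:,} GPUs")
# Nothing above accounts for research, failed runs, staff, or buying the hardware.
```

Running it prints 2,788,000 GPU-hours, $5,576,000, and about 57 days. Every input is the cost of renting time on chips someone else already owns.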

Three and a half weeks later, the company shipped the thing that turned a respected open model into a global event. On January 20, 2025, DeepSeek released R1 and a sibling called R1-Zero, under the permissive MIT license, along with six smaller “distilled” models built on top of Meta’s Llama and Alibaba’s Qwen, ranging from 1.5 billion parameters up to 70 billion, so that almost anyone with a decent computer could run a version of it locally.

R1 was a reasoning model, the kind that thinks step by step before it answers, the breed Chapter 33 traced from OpenAI’s o1 and the surprising discovery that reinforcement learning alone could teach a model to reason its way to the “aha moment.” What set R1 apart was not the capability but the candor. OpenAI had kept two things hidden: the method that produced the behavior, and the reasoning traces themselves, which it scrubbed from the model’s visible output. DeepSeek matched the capability and published the recipe, traces and all. R1-Zero was the more startling of the pair to researchers, because it showed that the reasoning could be coaxed out almost purely through reinforcement learning. Other labs had achieved that result behind closed doors and described it only in vague terms. DeepSeek wrote it down for anyone to reproduce. And the numbers were close to o1’s. On the 2024 AIME mathematics competition, R1 scored 79.8 percent to o1’s 79.2. On the MATH-500 benchmark it hit 97.3 percent. It reached the 96th percentile on Codeforces competitive programming. On the harder science and reasoning tests it trailed o1 by a few points rather than matching it. The picture was not “China beats America.” It was a Chinese open model trading blows with the best closed model in the world, matching it on most measures and losing on a few. For something you could download for free and run yourself, that was astonishing enough.

The pricing made it sharper. Through DeepSeek’s own API, R1 cost about fifty-five cents per million input tokens and $2.19 per million output. o1 charged $15 and $60, roughly twenty-seven times as much on each count. The same capability, open, at a fraction of the cost. American developers tried it over the weekend, found it genuinely good, and told each other so. The app climbed. And on Sunday night Andreessen reached for Sputnik, and on Monday the market reached for the sell button, and DeepSeek Day arrived.
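To see what “a fraction of the cost” meant for a developer, the sketch below prices one hypothetical workload at both rate cards. The R1 rates are the ones quoted above. The o1 rates in the code are OpenAI’s list prices at the time. The token counts are invented for illustration, and discounts for cached input are ignored on both sides. Reasoning models bill their thinking as output, so output-heavy jobs dominate the bill.

```python
# List prices in US dollars per million tokens: (input, output).
# R1 figures are from the text; o1 figures are OpenAI's list prices at the time.
PRICES_PER_MILLION = {
    "deepseek-r1": (0.55, 2.19),
    "openai-o1": (15.00, 60.00),
}


def job_cost(model, input_tokens, output_tokens):
    """Dollar cost of one workload; reasoning tokens are billed as output."""
    price_in, price_out = PRICES_PER_MILLION[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2M prompt tokens in, 1M reasoning-plus-answer tokens out.
r1 = job_cost("deepseek-r1", 2_000_000, 1_000_000)
o1 = job_cost("openai-o1", 2_000_000, 1_000_000)
print(f"R1: ${r1:.2f}   o1: ${o1:.2f}   o1 costs {o1 / r1:.0f}x as much")
```

For this mix the bill comes to $3.29 against $90.00. Because the two price ratios are nearly identical, the multiple stays near twenty-seven whatever the balance of input and output.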

What detonated was the number, stripped of its footnote, rather than the model itself. “$5.6 million” raced across the internet detached from every qualifier the technical report had attached to it. In the retelling it stopped being the cost of one training run and became the cost of the company, the cost of the breakthrough, the cost of catching America. If a frontier-grade model could be built for the price of a nice house in Palo Alto, then the entire investment thesis of the American AI industry, the hundreds of billions being poured into data centers and the chips that filled them, looked like a category error. That was the logic, more felt than reasoned, behind the largest one-day loss in market history. Nvidia’s value did not fall because R1 existed. It fell because of what people decided R1 implied about how much hardware the future would need.

In Doral, Florida, addressing a House Republican retreat, President Trump said the release of DeepSeek’s model “should be a wake-up call for our industries that we need to be laser-focused on competing to win.” He allowed that cheaper AI might, in the end, be “a positive.” It was a measured response by the standards of the day, and it would not stay measured for long.

The American counterattack came on two fronts. The first questioned whether R1 was honestly come by at all. On Fox News on January 28 and 29, David Sacks, the venture capitalist newly installed as the White House’s AI and crypto czar, said there was “substantial evidence” that DeepSeek had “distilled the knowledge out of OpenAI’s models,” meaning it had trained its own model on the outputs of ChatGPT, using the American system as a teacher. OpenAI, for its part, said it was aware that groups were “constantly trying to distill the models of leading US AI companies,” and reporting surfaced that Microsoft’s security researchers had, in the fall of 2024, flagged what looked like unusual data being pulled through the OpenAI interface by accounts they associated with DeepSeek.

The irony was not lost on anyone. OpenAI, a company that had built its models by ingesting an enormous fraction of the public internet without asking, was objecting to being ingested in turn. And the charge, however plausible, was never proven in public. DeepSeek would later deny, in writing, that it had deliberately trained on OpenAI’s outputs, while conceding that the web data any large model swallows now inevitably contains text generated by other AIs, contamination that no one can fully scrub. The accusation hardened into a fixed feature of the debate without ever resolving. It is best treated as what it remained: an unproven allegation, repeated with conviction by interested parties, against a backdrop of an industry in which everyone had, in some sense, trained on everyone.

The second front was about money, and it was the more important one, because it was where the myth could actually be checked. The independent semiconductor analysts at SemiAnalysis published a detailed accounting at the end of January that became the standard rebuttal. The previous chapter walks through its numbers: the fifty thousand Hopper-class chips and the more-than-billion-dollar investment in servers behind the headline. These were estimates, not confessions, and DeepSeek never confirmed them. But the order of magnitude was the point. DeepSeek had not built a frontier lab for $5.6 million. It had built one for hundreds of millions of dollars at the very least, on top of a quant fund’s pre-existing GPU fortune. It had then run its final training pass efficiently enough to report a figure that, taken alone, was true and wildly misleading at the same time.

Both things were true at once, and holding them together is the whole trick of the story. The $5.6 million was real and narrow. The efficiency it represented was real and large. The far larger program behind it was also real. DeepSeek’s achievement was not that it was cheap to be DeepSeek. It was that DeepSeek had wrung frontier results out of second-tier, export-throttled chips by being radically clever about every layer of the stack, and had then shown its work. The myth that broke Nvidia for a day was a misreading. The capability underneath the myth was not.

DeepSeek pressed the openness as a weapon. During the last week of February 2025, in what it called Open Source Week, the company released a run of low-level infrastructure code, the kind of plumbing that labs normally guard as a trade secret: an optimized attention kernel, a library for moving data between experts, a matrix-multiplication engine tuned for its eight-bit training. And it disclosed, almost casually, that the economics of serving R1 and V3 implied a cost-profit margin of 545 percent, with profit measured against the cost of serving rather than as a share of revenue. The company was careful to label the figure theoretical: it was computed as if every token served were billed at the daytime list price. The point was not the exact number. The point was the posture. Here is the model, here are the tools that make it fast, and by the way the unit economics are comfortable enough to be embarrassing. It was an argument, aimed at the closed labs, that secrecy was no longer buying them what they thought.
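The 545 percent figure only makes sense once you see which margin it is. The sketch below computes both conventions from the daily figures DeepSeek published alongside the claim: roughly $87,072 of GPU cost, at an assumed two dollars per GPU-hour, against $562,027 of theoretical revenue. Treat those inputs as the company’s own unaudited numbers, not as verified accounts.

```python
def cost_profit_margin(revenue, cost):
    """Profit divided by cost: the convention behind the 545 percent figure."""
    return (revenue - cost) / cost


def revenue_margin(revenue, cost):
    """Profit divided by revenue: the usual accounting margin, always below 100%."""
    return (revenue - cost) / revenue


# Daily figures as DeepSeek reported them (unaudited; the revenue is hypothetical,
# assuming every token served was billed at full list price).
daily_gpu_cost = 87_072
daily_theoretical_revenue = 562_027

markup = cost_profit_margin(daily_theoretical_revenue, daily_gpu_cost)
margin = revenue_margin(daily_theoretical_revenue, daily_gpu_cost)
print(f"profit / cost:    {markup:.0%}")
print(f"profit / revenue: {margin:.1%}")
```

The same two inputs yield 545 percent or about 84.5 percent, depending only on the denominator. That gap is why the company’s choice of wording mattered.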

If the American reaction was anxiety, the Chinese reaction was embrace. The change in DeepSeek’s official standing over four weeks was vertiginous. On January 20, the very day R1 shipped, Liang had attended a symposium with Premier Li Qiang, a sign that the leadership was watching. Then, on February 17, 2025, Xi Jinping convened a rare meeting with the country’s leading private entrepreneurs in Beijing, and the seating told the story. There was Ren Zhengfei of Huawei, Pony Ma of Tencent, Lei Jun of Xiaomi, Wang Chuanfu of BYD, Wang Xingxing of the robotics company Unitree. There, most symbolically, was Jack Ma, the Alibaba founder who had vanished from public life after his 2020 collision with regulators, summoned back into the front rank as a sign that the long campaign against China’s tech giants was over. And there, a man almost none of them would have recognized a year earlier, was Liang Wenfeng. The reclusive quant who had once said money was never his problem now sat among the titans of Chinese capital, anointed a national champion by the state that had spent the previous half-decade clipping the wings of the men beside him.

The geopolitical reframe is where the story turns from spectacle to something more uncomfortable for the people who had designed the policy. The export controls of 2022 and 2023 had a clear theory: deny China the best chips, and you deny it the best models, because frontier AI was a brute-force game and the chips were the brute. DeepSeek was the counterexample that made the theory wobble. Cut off from the top hardware, DeepSeek had not stalled. It had been forced to innovate at the level of algorithms and systems, and the things it had been forced to invent, the latent-attention compression, the eight-bit training, the disciplined mixture-of-experts, were precisely the efficiency gains that made it look, for one Monday, as if it had leapfrogged labs with ten times its compute. Scarcity had acted as a forcing function. And the open-weight strategy, which a richer and less constrained lab might never have chosen, gave the work maximum reach. A constraint meant to slow China down had helped channel it toward exactly the two strategies, efficiency and openness, that most embarrassed the American incumbents.

That is one reading, and it has loud advocates. Yann LeCun, Meta’s chief AI scientist, offered a different and pointed framing in the days after the panic, arguing that the lesson was not that China had surpassed the United States but that open models had surpassed closed ones. R1 was built on the published ideas of the global research community and released back into it; the right takeaway, in his view, was a vindication of open science, not a national defeat. The competing reading, held just as firmly by the architects of the controls, is that the policy was working as intended, that DeepSeek’s cleverness was the desperate adaptation of a player who could not get the chips to scale the way the leaders could, and that the gap, while narrow, was still a gap. Both readings fit the same facts. The honest position, as of 2026, is that the question of whether the controls succeeded or backfired is genuinely unresolved, and that anyone claiming otherwise is selling a conclusion.

The chip question did not stay abstract. Through 2025 the H20, the weakest of Nvidia’s China parts and the one DeepSeek had leaned on for inference, became a political object in its own right. In April 2025 the Trump administration moved to require licenses for H20 exports, an effective ban that cost Nvidia a roughly $5.5 billion inventory charge. In July it reversed course and let H20 sales resume. Then, in August, came an arrangement with no real precedent: Nvidia and AMD reportedly agreed to hand the US government 15 percent of their China chip revenue in exchange for the licenses to sell there, a quasi-tax on exports that critics across the political spectrum found hard to categorize. The chip that had been banned, unbanned, and then taxed traced, in three moves, the incoherence of a policy improvising in real time against a target that kept adapting.

And China’s own answer to dependence, the homegrown chip, ran into the wall the export controls were partly designed to expose. In the summer of 2025, under pressure from Beijing to prove the domestic stack, DeepSeek tried to train its next reasoning model, R2, on Huawei’s Ascend processors rather than Nvidia’s. Huawei sent an engineering team to sit on site. The run never succeeded. The reporting, from The Information and Reuters and others, described persistent instability, interconnect problems, and the deeper drag of an immature software ecosystem, Huawei’s CANN against Nvidia’s mature CUDA, the same software moat that had protected Nvidia for fifteen years. DeepSeek reverted to Nvidia for training and kept Ascend for the lighter work of inference. R2, reportedly also held back because Liang was unsatisfied with its performance, slipped its expected release and stayed unshipped through the year. DeepSeek continued the V3 line instead, releasing a V3.1 in August 2025 whose new numeric format, the company said, was “designed for the next generation of domestically produced chips to be released soon,” a signal aimed squarely at China’s silicon ambitions, and one that sent Chinese chip stocks up. The signal and the stalled training run bracketed the real state of affairs: Chinese chips could now serve frontier models, but training them still ran on Nvidia.

In September 2025 DeepSeek made one more move that doubled as an argument. R1 appeared, peer-reviewed, on the cover of Nature, the first major large language model to pass through formal scientific review. The paper put a precise figure on the reinforcement-learning stage that had given R1 its reasoning, the part layered on top of the V3 base: roughly $294,000, run on a cluster of 512 H800 GPUs, with the final R1 stage alone taking about eighty hours. It was a smaller, more defensible, more specific number than the viral $5.6 million. It came wrapped in the credibility of peer review, along with a formal written response to the distillation charge. A company from a country accused of imitation had submitted its work to the oldest institution of scientific transparency in the West, and the secretive American labs that had accused it of cheating had submitted nothing. Openness, again, deployed as a flex.
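Set side by side, the two disclosed figures show how small the reasoning stage was relative to the base model it sat on. The sketch below does the arithmetic. The implied GPU-hours assume the Nature figure used the same two-dollar rental rate as the V3 report, which is an assumption here. The “combined” line is still a marginal figure that excludes research, staff, and hardware.

```python
V3_HEADLINE_USD = 5_576_000   # V3 report: full official training at $2/GPU-hour
R1_RL_USD = 294_000           # Nature paper: reasoning stage layered on the V3 base
ASSUMED_RATE_USD = 2.00       # assumption: same rental rate as the V3 report

share = R1_RL_USD / V3_HEADLINE_USD
implied_gpu_hours = R1_RL_USD / ASSUMED_RATE_USD
combined = V3_HEADLINE_USD + R1_RL_USD  # still marginal: no research, staff, or hardware

print(f"reasoning stage: {share:.1%} of the V3 headline figure")
print(f"implied compute: ~{implied_gpu_hours:,.0f} H800 GPU-hours")
print(f"combined marginal cost: ${combined:,}")
```

On these inputs the reasoning stage comes to about 5.3 percent of the V3 figure, or roughly 147,000 GPU-hours. The two disclosed runs together total $5,870,000.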

The longer shadow DeepSeek cast was not over Nvidia, which recovered its losses within months as it became clear the world would buy every chip it could make. The shadow fell over the question of who would supply the world’s open AI. The inversion that Chapter 38 traces, the open frontier tilting Chinese, only accelerated after the crash. Meta’s Llama 4, released in April 2025, underwhelmed, and the American open-weight lead that Llama had once represented kept eroding while the Chinese labs shipped.

The space behind DeepSeek and Qwen filled with other Chinese labs shipping capable open weights at a cadence the West struggled to match. Moonshot AI released Kimi K2 in July 2025, a model with a trillion total parameters and 32 billion active, trained on more than fifteen trillion tokens. It posted strong scores on the practical software-engineering benchmarks that mattered to developers. Zhipu, MiniMax, and ByteDance’s Doubao team pushed out their own. Alibaba kept widening its open-weight Qwen3 family through the year, even as it held its trillion-parameter flagship, Qwen3-Max, back as a closed product. The default substrate for anyone building open AI was increasingly Chinese: the model a startup in Lagos or Lyon or São Paulo reached for first when it wanted weights it could run on its own machines and tune to its own needs. The irony was hard to miss. The country that American policy cast as the imitator had become the supplier of the open commons that the rest of the world built on.

By the middle of 2026, the frontier gap that the controls were meant to widen was, on the leading public benchmarks, measured in months rather than years. Stanford’s annual AI Index had documented the narrowing through 2024 and 2025; the trend did not reverse. The best American closed models still held the very top on the hardest tasks. But the distance had compressed to the point where the strategic story was no longer about a lead. It was about a map being redrawn. China had bet on efficiency and open weights and turned a hardware disadvantage into a distribution advantage. The United States held the chips, the capital, and the closed frontier, and had to decide what to do with them.

The decision, it turned out, had already been telegraphed. Six days before the panic, at the White House, Trump had stood beside the principals of Stargate as they committed to spend up to half a trillion dollars building AI infrastructure on American soil. China’s answer to the scaling race had been to get smaller, cleverer, cheaper, and to give the result away. America’s answer to the efficiency shock was not to get more efficient. It was to get bigger than anyone had ever proposed getting before.